Getting Started with Smart Gallery in U-M Zoom Rooms

Environment

ITS Service Center, U-M Zoom Rooms Support

Issue

What is Smart Gallery and how can I enable it in U-M Zoom Rooms?

Resolution

Zoom Smart Gallery extracts images of regions of interest (for example, a person who is speaking) from a camera's video stream into separate, individual video streams. It also displays the single video stream showing all call participants.

U-M Zoom Rooms defaults to AutoFrame view. However, the Smart Gallery experience can be enabled using the Multi-Stream setting on the Zoom Rooms controller or Zoom Rooms for Touch display.

Enabling Smart Gallery

On a Zoom Rooms controller:

  1. Tap Camera Control on the controller interface.
  2. Tap Multi-Stream. Each conference room camera feed appears on your Zoom Rooms display.

On a Zoom Rooms for Touch display:

  1. Tap More on the display, then tap Settings.
  2. Tap Camera, then tap Open Camera Control.
  3. Tap Multi-Stream. Each conference room camera feed appears on your Zoom Rooms display.

How remote attendees experience Smart Gallery

Remote attendees can see all available video feeds from the Zoom Room when they use the Gallery, Side-by-Side, or Thumbnail view, or when they pin a specific video. Each Zoom Room video feed appears as a distinct meeting participant at full resolution. These participants are named after the Zoom Room, followed by a number: "Conference Room - 2", "Conference Room - 3", etc.

Note: Remote attendees who use Active Speaker view will only see the main camera feed from the Zoom Room, even if participants from the other Zoom Room camera feeds are speaking.
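The naming and view behavior described above can be sketched as a small model. This is an illustration only, not Zoom's implementation: the function names are hypothetical, and it assumes that the main camera feed keeps the room name unchanged and that additional feeds are numbered starting at 2, as in the examples above. Pinning a specific video is not modeled.

```python
# Views in which remote attendees see every Zoom Room feed (from the text above).
MULTI_FEED_VIEWS = {"Gallery", "Side-by-Side", "Thumbnail"}


def feed_names(room_name, feed_count):
    """Return participant names for the feeds a Zoom Room sends to a meeting.

    Assumption: the main feed is named after the room, and each extra
    feed gets a numeric suffix starting at 2.
    """
    if feed_count < 1:
        return []
    extra = [f"{room_name} - {n}" for n in range(2, feed_count + 1)]
    return [room_name] + extra


def visible_feeds(view, feeds):
    """Return the feeds a remote attendee sees in the given view."""
    if view in MULTI_FEED_VIEWS:
        return list(feeds)
    if view == "Active Speaker":
        # Only the main camera feed is shown, even if people in the
        # other feeds are speaking.
        return list(feeds[:1])
    raise ValueError(f"view not covered by this sketch: {view}")


feeds = feed_names("Conference Room", 3)
print(visible_feeds("Gallery", feeds))
print(visible_feeds("Active Speaker", feeds))
```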

Additional Information

When you enable Smart Gallery, the Poly system continuously determines multiple regions of interest based on people in the camera’s view who are, or have previously been, active speakers. The determination of a region of interest depends on:

  • The ability of the system to automatically detect a face in the camera’s view. Best results are achieved when an individual is directly facing the camera, their face isn’t obstructed by other individuals in the camera’s view, and their face is consistently in the view of the camera.
  • The ability of the system to automatically detect an actively speaking person in the camera’s view. Best results are achieved when an individual is speaking towards the camera.
  • The camera’s resolution and field of view, and the individual’s distance from the camera. Best results are achieved when the individual is sitting close to the camera and isn’t obstructed by other individuals in the camera’s view.

When the system determines a region of interest exists, the region of interest is centered on the person and cropped from the overall camera image as tightly as possible, constrained by the camera’s resolution, field of view, and the person’s distance from the camera. 
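As a rough illustration of centering and cropping, the sketch below computes a crop rectangle around a detected person and keeps it inside the camera frame. It is not the Poly system's algorithm: the function name, the margin, and the 16:9 output shape are all assumptions chosen for the example.

```python
def crop_region(frame_w, frame_h, cx, cy, person_w, person_h,
                margin=0.25, aspect=16 / 9):
    """Return (left, top, width, height) of a crop centered on a person.

    cx, cy: center of the detected person, in pixels.
    person_w, person_h: size of the detected person, in pixels.
    margin and aspect are hypothetical tuning values for this sketch.
    """
    # Start with the person's box plus a margin on every side.
    w = person_w * (1 + 2 * margin)
    h = person_h * (1 + 2 * margin)

    # Widen or heighten the box so it matches the output aspect ratio.
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect

    # The crop can never be larger than the camera frame itself.
    scale = min(1.0, frame_w / w, frame_h / h)
    w *= scale
    h *= scale

    # Center on the person, then shift the crop back inside the frame.
    left = min(max(cx - w / 2, 0), frame_w - w)
    top = min(max(cy - h / 2, 0), frame_h - h)
    return round(left), round(top), round(w), round(h)


# A person in the middle of a 1920x1080 frame.
print(crop_region(1920, 1080, 960, 540, 200, 300))
```

A person near the edge of the frame still gets a full-size crop, but the person is no longer exactly centered, because the crop is shifted to stay within the camera image.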

The system presents the regions of interest to the Zoom Rooms application on up to two virtual cameras, based on this priority:

  • The current or most recent actively speaking person in the camera’s view
  • The previous actively speaking person in the camera’s view
  • If there are no active speakers, or the active speakers leave the camera’s field of view, the virtual cameras are stopped
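The priority order above can be sketched as a short selection routine. This is a simplified model, not the Poly system's logic: the function name and the data shapes are hypothetical, and it assumes that when a recent speaker leaves the camera's view, the next most recent speaker still in view takes the slot.

```python
MAX_VIRTUAL_CAMERAS = 2  # "up to two" virtual cameras, per the text above


def assign_virtual_cameras(speaker_history, in_view):
    """Pick who is shown on each virtual camera.

    speaker_history: person identifiers in the order they spoke,
        most recent speaker last (a person may appear more than once).
    in_view: set of identifiers currently in the camera's view.

    Returns a list of at most two identifiers, most recent speaker
    first. An empty list means all virtual cameras are stopped.
    """
    assigned = []
    # Walk backwards so the most recent speaker is considered first.
    for person in reversed(speaker_history):
        if person in in_view and person not in assigned:
            assigned.append(person)
        if len(assigned) == MAX_VIRTUAL_CAMERAS:
            break
    return assigned


history = ["ana", "ben", "ana", "cy"]
print(assign_virtual_cameras(history, {"ana", "ben", "cy"}))
print(assign_virtual_cameras(history, set()))
```

In this model, the first call selects the most recent speaker and the speaker before them, and the second call returns an empty list because no speaker remains in view.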

As different individuals in the camera’s view become the current active speaker, the regions of interest change. The Zoom Rooms application always sends the logical camera video to the Zoom meeting. As the virtual cameras appear and disappear, it also starts or stops sending additional video streams to the meeting. 

Need additional information or assistance? Contact the ITS Service Center.