Videoconferencing Bridges - Technology Overview

Many people consider telemedicine to be synonymous with video teleconferencing, or VTC.  Indeed, VTC has been used in an incredibly diverse range of clinical settings for decades.  Many states and insurance providers will only reimburse for telemedicine if consultations take place over a videoconference, further reinforcing videoconferencing's position as a fundamental part of the telemedicine landscape.

Videoconferencing can potentially involve many complex, interconnected parts.  Many people who use VTC products are not fully aware of what happens behind the scenes, instead looking at videoconferencing as a camera and monitor.  This portion of the toolkit looks at the network bridges that connect two or more VTC systems together.  Additional information on VTC endpoints and software systems can be found in the Videoconferencing Endpoints and Desktop Video Applications toolkits.

What is a Bridge, or Multipoint Control Unit (MCU)?

Videoconferencing has increasingly become associated with video sessions that can include many people on many different systems, all communicating together seamlessly.  The bridge, or MCU, is one of the key pieces of technology that can make these multi-party video calls possible.

MCUs can support many different functions.  Generally, they are devices that allow multiple videoconferencing endpoints to communicate in a conference that includes three or more people, as opposed to the point-to-point video calls that are limited to two participants. It is possible for some endpoints to provide this bridging functionality in a limited manner.

The products are typically licensed with a set number of ports, or live video connections, that can be engaged at once.  This may consist of many people in one large videoconference, or several smaller groups of people engaged in different, simultaneous conferences.
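As a simple illustration of port licensing, the sketch below checks whether a set of simultaneous conferences fits within an MCU's licensed port count.  The function name is hypothetical, not a vendor API; it only captures the arithmetic that each live connection consumes one port.

```python
# Illustrative sketch: every participant consumes one licensed port,
# whether they sit in one large conference or several smaller,
# simultaneous conferences.

def ports_available(licensed_ports, conference_sizes):
    """Return True if all of the conferences can run at the same time."""
    return sum(conference_sizes) <= licensed_ports

# A 20-port license supports one 12-way call plus two 4-way calls,
# but not if one of those smaller calls grows to five participants.
```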


As stated, MCUs help connect people in a videoconferencing system.  Their exact functions may vary by manufacturer and model.

Video Switching

Video switching is the ability of a videoconferencing bridge to show the active speaker in a larger portion of the video image (or possibly the entire image), with people who are not speaking displayed in smaller squares, if at all.  This is often voice-activated, meaning that the current speaker is shown in the main portion of the screen.  On some systems, the active speaker may be set at each individual endpoint, or by a central chair that grants and revokes control of the main video image.  Those acting as the video “chairperson” may also perform other functions, such as dropping people from the call or canceling the entire conference.

If switching is not allowed or is not available, another option is to show everyone in what is colloquially called “Hollywood Squares” or “Brady Bunch” style.  Essentially, everyone is viewed in a small square, tiled in the video stream.  As more people join the conference, the number of squares increases as individual sizes decrease.  Some bridges will output this in a 4:3 aspect ratio, which can cause some cropping issues when displayed on a 16:9 “widescreen” monitor.
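The tiling and cropping behavior described above is straightforward to sketch.  The two functions below (hypothetical names, not any vendor's API) compute a roughly square grid for N participants and the number of rows cropped when a 4:3 bridge output is zoomed to fill the width of a 16:9 display.

```python
import math

def grid_dimensions(participants):
    """Rows and columns for a 'Hollywood Squares' tiled layout."""
    cols = math.ceil(math.sqrt(participants))
    rows = math.ceil(participants / cols)
    return rows, cols

def vertical_crop(screen_w, screen_h):
    """Pixel rows lost when a 4:3 image is scaled to fill a wider
    screen's full width, cropping the top and bottom."""
    scaled_h = screen_w * 3 // 4
    return max(0, scaled_h - screen_h)
```

For a 1920 x 1080 widescreen monitor, a 4:3 image scaled to full width is 1440 pixels tall, so 360 rows must be cropped away.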

Identifiers – Rooms and People

MCUs can provide two potentially significant features related to what are best described as “identifiers” – rooms and people.  Rooms can be incredibly useful for managing video calls.  Rather than needing to know which endpoint to call, or which alias to reach, all parties can agree to call into a virtual “room”.  This can be especially useful for clinicians who are “on call”.

For example, let us consider a rehabilitation clinic with 5 physicians who have all agreed to provide telehealth services.  They decide that one of them will be available on a rotating basis, each taking one day of the week to be “on call” from their own offices.  Remote clinics trying to reach a doctor would be frustrated if they had to call five doctors to find the one who was providing the services each day.  By establishing this virtual room, the remote clinic could call into the room, the on-call doctor could call into the room, and they would be able to seamlessly connect.
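The virtual-room pattern from the example above can be sketched in a few lines.  The class and names below are illustrative only; a real bridge would handle signaling, media, and authentication as well.

```python
# Hypothetical sketch of a virtual room: both sides dial a stable room
# identifier, and the bridge connects whoever is present.

class VirtualRoom:
    def __init__(self, room_id):
        self.room_id = room_id
        self.participants = []

    def dial_in(self, endpoint):
        self.participants.append(endpoint)

    def connected(self):
        # A conversation exists once at least two parties are present.
        return len(self.participants) >= 2

room = VirtualRoom("rehab-on-call")
room.dial_in("remote-clinic-endpoint")
room.dial_in("dr-oncall-office")  # whichever of the five physicians is on call
```

The remote clinic never needs to know which of the five physicians is on call that day; both parties simply meet in the room.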

The other identifier relates to specific people: it allows participants in a multi-party call to see a textual identifier for each person, making it easy to see the name of whoever is speaking.

Selected Communication Mode

In a traditional, point-to-point videoconference, systems would agree on the connection speed.  If one person’s system could not support the call, the parties might have to drop the call and reconnect at a lower speed.  MCUs can provide different data rates to different endpoints, allowing people connecting at mismatched speeds to communicate without having to reconnect on lower settings.  This means that people on higher-bandwidth lines will not be throttled down if a single participant joins the call over a low-speed connection.

Selected Communication Mode also allows an MCU to establish a maximum and minimum call speed.  If an organization determines that it is only willing to perform video consults over high-speed connections, it can design a system that will not allow low-performing endpoints to connect.  Additionally, network resources can be conserved by limiting how much bandwidth a conference is allowed to use.
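The minimum/maximum policy above amounts to a simple per-endpoint rate decision, sketched below with hypothetical names.  Endpoints below the policy floor are refused; faster endpoints are capped at the ceiling rather than forcing everyone down.

```python
# Illustrative sketch of per-endpoint rate selection under a conference
# policy with minimum and maximum call speeds, in kbps.

def negotiate_rate(endpoint_max_kbps, policy_min_kbps, policy_max_kbps):
    """Return the rate offered to an endpoint, or None to refuse it."""
    if endpoint_max_kbps < policy_min_kbps:
        return None  # too slow for this organization's policy
    return min(endpoint_max_kbps, policy_max_kbps)
```

Under a 512–2048 kbps policy, a 384 kbps endpoint is refused, a 768 kbps endpoint connects at 768, and a 4096 kbps endpoint is capped at 2048.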


Audio Mixing

Audio signals can be mixed in a videoconference, allowing multiple people to speak at the same time.  While this can be a distraction, it allows people to interject while a person is talking without having to wait for them to finish, as in face-to-face conversation.

Additionally, people have the ability to call into a videoconference by phone, which allows them to engage in a conversation even if they do not have access to videoconferencing equipment.

Data Transmission

MCUs may have the ability to stream serialized data on a separate data channel, allowing devices such as serial electronic stethoscopes to send data to another endpoint.  This process has not yet been standardized, and many MCUs will handle this differently.  The TTAC has seen endpoints from one manufacturer unable to send serial data between their own endpoints via an MCU, yet the same MCU could send the data between two endpoints of a competitor.


Security

Bridges can provide secure connections, to the degree that they may disallow anyone who does not support encryption from participating in a call.


Cascading

When multiple bridges are connected, they are described as being “cascaded”.  This means that a bridged call from Organization A can join a bridged call from Organization B, and all parties may be connected.  There is a limit of three MCUs between endpoints, however.  This means that Organization A could be cascaded with Organization B, and Organization C could be cascaded with Organization B, effectively bridging the call between A and C.  It would not be possible for Organization D to connect to Organization C and communicate in the bridged call with Organization A.  Note that this is not necessarily a common problem to face when first entering the videoconferencing arena, but may be important to consider, depending on how various partner networks are connected.
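The three-MCU limit can be checked by counting bridges along the call path.  The sketch below models cascade links as a simple graph (organization names from the example above; the function name is illustrative) and finds the fewest MCUs a call must traverse.

```python
from collections import deque

MAX_MCUS_IN_PATH = 3  # limit described above

def mcus_between(links, start, goal):
    """Fewest MCUs on a path from start to goal, inclusive, or None
    if no cascade path exists.  Simple breadth-first search."""
    queue = deque([(start, 1)])
    seen = {start}
    while queue:
        node, count = queue.popleft()
        if node == goal:
            return count
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, count + 1))
    return None

cascade = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
# A-B-C traverses three MCUs (allowed); A-B-C-D would traverse four.
```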

The Standards

Videoconferencing systems rely on a large collection of standards for everything from displaying video to establishing connectivity with endpoints across the globe.  The vast majority of features implemented by videoconferencing systems have a corresponding standard.  Not all manufacturers support all of the standards in the same way, resulting in some proprietary implementations of features that can make interoperability a problem.

One of the main standards-issuing bodies in this field is the International Telecommunication Union, which releases standards through the ITU Telecommunication Standardization Sector (ITU-T).  Many of the standards, including the G-series and H-series standards, are created by this group.

Video Standards

Video standards address the compression and playback of video data.  The standards are used in many areas outside of videoconferencing, with applications in many multimedia systems online and on computers.  The two prominent video standards, H.263 and H.264, are widely adopted and used throughout videoconferencing systems.  Older standards such as H.261 are not used as frequently.

The H.264 standard includes a series of “annexes” that provide additional features or functions to the video standard.  One relevant example, found in Annex G, is Scalable Video Coding, or SVC.  SVC allows separate detail layers, or sub-bitstreams, to be created.  These can be used to deliver a smoother video image on networks with connectivity issues.

Annexes that are not universally supported provide a particular challenge in the videoconferencing world, as the benefits of the annexes will not be seen in systems that have not implemented them.  Similarly, as the annexes may be implemented differently between manufacturers, there is no guarantee that the features will work between systems without use of other intermediary devices.


Video Resolutions

Also related to video standards are a set of standards for the resolution of video images.  A variety of terms are used to describe how many pixels there are in a video image, many of which are seen in areas beyond videoconferencing.  High definition and standard definition are the terms commonly used, but additional phrases such as high resolution, VGA, 720p, and 1080p are often used as well.  Some of the terms are very clearly defined, while others are defined more broadly.

High definition refers to any video signal that is a higher resolution than standard definition video, though this is often specified as vertical pixel counts of 720 and 1080, with respective horizontal pixel counts of 1280 and 1920.  Some additional terms are often attached to these resolutions – progressive or interlaced.  Both of these refer to the method in which the image is sampled from the imaging sensor. 

A progressively scanned image updates the entire image at once, meaning that a paused video image will show a complete, clear picture.  Interlaced scanning samples the image in horizontal bands, alternating between even and odd lines.  On frame one of a video image, all odd lines will be captured, while the next frame will capture all even lines.  A paused video image will therefore display alternating bands of video from the most current frame and the preceding frame.
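The interlacing behavior above can be made concrete with a tiny sketch: two successive fields carry the odd and even lines, and a paused frame weaves them together, mixing bands captured at two different moments.

```python
# Illustrative sketch of interlaced scanning: the frame alternates
# between lines from the odd field and lines from the even field.

def weave(odd_field, even_field):
    """Interleave odd-line and even-line fields into one full frame."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)   # lines 1, 3, 5, ...
        frame.append(even_line)  # lines 2, 4, 6, ...
    return frame

# Odd lines captured at time t=1, even lines at t=2: the paused image
# mixes both moments, producing the banding artifact described above.
frame = weave(["o1", "o3"], ["e2", "e4"])
```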

Standard definition covers video signals that are in either the PAL or NTSC standards.  NTSC is the standard used in the United States, and covers a resolution of 640 x 480 pixels.  There are many other resolutions and associated acronyms that relate to video signals, including 400p, 448p, SIF, CIF, 4SIF, 4CIF, QSIF, QCIF, VGA, XGA, and QVGA.
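A few of the acronyms above have well-known pixel dimensions, tabulated below as a small lookup (width x height).  These are the standard definitions of the formats themselves; individual products may deviate.

```python
# Standard pixel dimensions for common videoconferencing resolutions.
RESOLUTIONS = {
    "QCIF":  (176, 144),
    "CIF":   (352, 288),
    "4CIF":  (704, 576),
    "SIF":   (352, 240),
    "4SIF":  (704, 480),
    "QVGA":  (320, 240),
    "VGA":   (640, 480),
    "XGA":   (1024, 768),
    "720p":  (1280, 720),
    "1080p": (1920, 1080),
}

def pixel_count(name):
    """Total pixels per frame for a named resolution."""
    w, h = RESOLUTIONS[name]
    return w * h
```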

Despite the increased interest in high-definition video, standard definition video still has a place in telemedicine, as it requires less bandwidth and is still the format of choice for many medical imaging devices that connect to videoconferencing endpoints.


Audio Standards

The G.nnn standards (such as G.711, G.722, and G.729) are the audio standards used in videoconferencing.  Support for G.711 is mandated by the H.323 standard, which means that all H.323-compliant systems must support it.

Other G.nnn standards are included in annexes to the H.323 standard, providing various sampling frequencies and bandwidth optimizations.  G.722 provides an increase in the sampling rate of the incoming audio, which increases fidelity at the cost of requiring additional bandwidth.  G.729 requires less bandwidth, as it is optimized for frequencies associated with speech.  There are some questions as to how the G.729 standard may impact transmission of certain sounds associated with telestethoscopy over the standard audio-input and microphone lines (note that this should not impact the use of serial stethoscopes that do not send audio data through an audio input on the endpoint).
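The trade-offs among these codecs can be summarized in a small table.  Sampling rates below are the standard values; bitrates are the commonly used figures (G.722 also supports 48 and 56 kbps modes).

```python
# Approximate characteristics of the audio standards discussed above.
AUDIO_CODECS = {
    "G.711": {"sample_rate_hz": 8000,  "bitrate_kbps": 64},
    "G.722": {"sample_rate_hz": 16000, "bitrate_kbps": 64},  # higher fidelity
    "G.729": {"sample_rate_hz": 8000,  "bitrate_kbps": 8},   # speech-optimized
}

def lowest_bandwidth(codecs=AUDIO_CODECS):
    """Pick the codec that consumes the least bandwidth."""
    return min(codecs, key=lambda name: codecs[name]["bitrate_kbps"])
```

The table makes the text's point visible: G.722 doubles the sampling rate for fidelity, while G.729 cuts bandwidth to one-eighth of G.711 by optimizing for speech frequencies.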

Multimedia Call Control Standards

There are two primary standards used for initiating a call between videoconferencing systems – H.323 and Session Initiation Protocol (SIP).  These standards are designed to manage call signaling, call control, and media streaming.  At this time, SIP and H.323 are not interoperable standards, though calls can occur between systems using the different standards if a gateway infrastructure is in place.

Addressing, or defining a unique identifier for the endpoints, is done differently in SIP and H.323.  SIP addresses are formatted as Uniform Resource Identifiers (e.g., sip:jsmith@example.com).  H.323 addresses can be IP addresses (e.g., 192.0.2.10), H.323 aliases (e.g., jsmith@example.com), or E.164 addresses (e.g., 18005551234).
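A dial plan has to distinguish among these address forms before routing a call.  The sketch below is a deliberately simplified classifier (the function name and patterns are illustrative; real dial plans validate much more).

```python
import re

# Hypothetical sketch: classify an address into the forms described above.
def classify_address(addr):
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", addr):
        return "ip"            # dotted-quad IP address
    if re.fullmatch(r"\d+", addr):
        return "e164"          # all digits, e.g. a phone-style number
    if addr.lower().startswith("sip:") or "@" in addr:
        return "alias-or-uri"  # SIP URI or H.323 alias
    return "unknown"
```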

Content Sharing

Sharing content from a computer is managed through the H.239 standard.  This is an ITU-T recommendation for the H.323 standard, supporting the streaming of a still image (typically taken from a VGA-type input) to another endpoint.  Systems that only use SIP will display the content as a part of the main video, while systems that use H.323 will have the content sent to another screen, if available.  Some calls that are signaled and managed via SIP may use H.323 as a part of their infrastructure, thereby allowing content to be shared.

Serial Data Transfer

Some medical devices, such as electronic stethoscopes, may be capable of providing a serial output.  Some videoconferencing manufacturers support the transmission of this serial data in a separate channel in the course of videoconferencing, allowing for the serial data to be sent alongside standard video and audio content.

It is important to note that at this time vendors do not send serial data in a standardized, agreed-upon way, which results in a lack of interoperability between manufacturers.  In the course of TTAC evaluations, the following was noted:
1) It was not possible to send serial data between two different manufacturers’ endpoints
2) It was not possible to send content via a single manufacturer’s Multipoint Control Unit (MCU) and their respective endpoints in a bridged call, BUT
3) It was possible to send content via that particular manufacturer’s MCU and a different manufacturer’s endpoints


Encryption

Both H.323- and SIP-controlled calls can support encryption.  Encryption is typically done with an implementation of the Advanced Encryption Standard (AES), with some options for signaling encryption using the Secure Sockets Layer (SSL) protocol.  H.323 calls support the H.235 standard, which details requirements for encryption and integrity, and is used in conjunction with the H.245 standard that handles many issues, including authentication.  It is important to ensure that endpoints are properly configured; some systems may drop encryption if the connecting site does not support it, while others may disallow connections from or to unencrypted systems.
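The two configuration behaviors just described can be captured as a small policy function.  The policy names below ("best-effort", "required") are illustrative, not vendor settings.

```python
# Illustrative sketch of endpoint encryption policy: "best-effort"
# silently drops encryption for a peer without AES support, while
# "required" refuses unencrypted connections outright.

def call_mode(local_policy, remote_supports_aes):
    if remote_supports_aes:
        return "encrypted"
    if local_policy == "best-effort":
        return "unencrypted"  # encryption quietly dropped
    if local_policy == "required":
        return "refused"      # unencrypted systems are disallowed
    raise ValueError(f"unknown policy: {local_policy}")
```

For clinical use, the "best-effort" behavior is the one to watch for: a call can appear normal while actually running unencrypted.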

Security may be implemented with a variety of protocols, including Secure Sockets Layer (SSL), Transport Layer Security (TLS), Secure Real-Time Transport Protocol (SRTP), Secure Shell (SSH), and Internet Protocol Security (IPSec). 

NAT Traversal and Firewalls

Firewalls, which are an important part of organizational network infrastructures, can provide several challenges for videoconferencing.  They are capable of restricting what traffic is allowed into and out of a network, and may perform Network Address Translation (NAT), which effectively obfuscates the exact IP addresses of computers within a network.  As endpoints often need to communicate through firewalls, NAT traversal techniques and standards need to be implemented.

The most common ITU-T standard for NAT traversal is H.460.  Methods for traversal include Session Traversal Utilities for NAT (STUN), Traversal Using Relay NAT (TURN), and Interactive Connectivity Establishment (ICE).
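ICE ties these methods together by gathering candidate addresses and ranking them with a priority formula (RFC 5245, section 4.1.2.1), sketched below.  The type-preference values are the commonly used defaults: direct host candidates beat STUN-discovered (server-reflexive) candidates, which beat TURN-relayed candidates.

```python
# Simplified ICE candidate priority, after RFC 5245:
#   priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}

def candidate_priority(cand_type, local_pref=65535, component_id=1):
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_pref << 8)
            + (256 - component_id))
```

The ordering matches ICE's intent: try the cheapest direct path first, and fall back to a relay only when nothing else connects.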


Supporting Infrastructure

While this toolkit focuses largely on the bridges used in videoconferencing, it is important to consider the other hardware that is often associated with videoconferencing.  This additional hardware is often mandatory for any serious videoconferencing deployment.  If managing this infrastructure within an organization is not feasible, options do exist for third-party hosting and support.


Gatekeeper

The functions performed by a gatekeeper will vary by model, but can generally be broken down into a handful of categories – address translation, admission control, bandwidth control, user authentication, and zone management.  Address translation allows an endpoint to be given a more user-friendly alias, with internal addressing and routing handled by the gatekeeper whenever another endpoint tries to dial the alias.  Admission control and bandwidth control limit how many simultaneous calls can take place, and manage bandwidth allocation.  Zones are associated with a single gatekeeper; zone management can help control devices registered in a single zone, as well as how the endpoints communicate with other zones.
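Two of these roles, address translation and admission control, can be sketched together.  The class and names below are illustrative only; real gatekeepers implement the H.323 RAS signaling behind these ideas.

```python
# Hypothetical sketch of a gatekeeper: an alias registry plus a cap on
# simultaneous calls (admission control).

class Gatekeeper:
    def __init__(self, max_calls):
        self.registry = {}     # friendly alias -> routable IP address
        self.max_calls = max_calls
        self.active_calls = 0

    def register(self, alias, ip):
        self.registry[alias] = ip

    def admit(self, alias):
        """Return the routable address if the call is admitted, else None."""
        if alias not in self.registry or self.active_calls >= self.max_calls:
            return None
        self.active_calls += 1
        return self.registry[alias]

gk = Gatekeeper(max_calls=1)
gk.register("clinic-a", "192.0.2.10")
```

With a one-call limit, the first dial to "clinic-a" resolves and is admitted; a second simultaneous attempt is turned away.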

Multipoint Control Unit (MCU)

The MCU connects multiple endpoints in a single call.  Also referred to as “bridges”, these devices are typically required to allow more than two endpoints to join in a conference call.  Note that some manufacturers do provide limited bridging capabilities in their endpoints, which can allow for a handful of simultaneous connections if one of the endpoints supports this functionality.


Gateway

Although standards have been widely adopted by videoconferencing manufacturers, getting devices from different manufacturers to communicate with one another can be a challenge.  Standards have changed, new standards have been introduced, and optional features of some standards have been implemented in different ways by different manufacturers.  Gateways help manage these communication difficulties, connecting and translating (transcoding) between endpoints, MCUs, and other network devices.


Proxy

Videoconferencing often takes place across the boundaries of at least one network, with connections needing to occur between different organizations or between internet-based endpoints and a single organization.  There are certain challenges inherent in connecting these various networks.  Proxies are designed to help mitigate these problems, sitting on the “edge” of a network and managing communication with endpoints, gateways, and gatekeepers.


Conclusion

Videoconferencing technology encompasses an enormous range of devices, standards, and possible challenges.  While standardization has done a lot to improve the ease with which networks and systems can be connected and managed, sizeable differences remain in exactly how all the pieces fit together.

Videoconferencing endpoints often need to work within these larger networks and systems.  While it may seem daunting to consider implementing an enterprise-wide videoconferencing system, these additional elements can help improve user experience, ease management issues, and ensure the ongoing success of a videoconferencing deployment.