Videoconferencing - Standards

Technology Standards - What Are They?

A technology standard establishes a uniform way to engineer and develop technical devices.  By standardizing the many different methods and functions in technical systems or devices, interoperability is simplified.  When manufacturers implement these standards in their products, the standards provide a sort of common language or interface that allows independently designed systems to work together.  As with human language, the "language" of standards may have different interpretations, occasionally resulting in failed communications between systems that support the same standard.  The groups that write standards must make them broad enough to support a wide range of products, yet clear enough to avoid miscommunication between systems using the same standards.

Technology Standards - Who Determines Them?

Numerous standards-issuing bodies work on the standards that are implemented in telehealth systems, though there are a handful that are referenced more often than others, especially in relation to videoconferencing systems.  These standards groups include:

  • Internet Engineering Task Force (IETF)
  • ITU Telecommunication Standardization Sector (ITU-T)
  • World Wide Web Consortium (W3C)
  • European Telecommunications Standards Institute (ETSI)

What is a protocol?

A protocol is a set of rules that govern an interaction. In networking or video teleconferencing, a protocol is a set of rules or standards that specify the manner in which computers or devices communicate with one another to transfer data.  Protocols spell out the timing and format of transmissions.
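
The idea that a protocol fixes the "format of transmissions" can be made concrete with a toy framing sketch. The message layout here is invented purely for illustration: a one-byte message type, a two-byte length, and a payload. The sender and receiver must agree on this layout in advance; that shared agreement is the protocol.

```python
import struct

# Hypothetical wire format: 1-byte type, 2-byte length, then payload.
# "!" selects network (big-endian) byte order, a common protocol convention.
HEADER = struct.Struct("!BH")

def encode(msg_type: int, payload: bytes) -> bytes:
    """Build a frame the receiving side can unambiguously parse."""
    return HEADER.pack(msg_type, len(payload)) + payload

def decode(frame: bytes) -> tuple:
    """Parse a frame back into (message type, payload)."""
    msg_type, length = HEADER.unpack(frame[:HEADER.size])
    return msg_type, frame[HEADER.size:HEADER.size + length]

frame = encode(1, b"hello")
assert decode(frame) == (1, b"hello")
```

Both sides can only interoperate because they implement the same rules; a receiver expecting a different header layout would misread every frame, which is exactly the interoperability problem standards exist to prevent.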

What is a protocol stack?

A protocol stack is a layered set of related protocols that allow devices to communicate with one another regardless of their manufacturer. This term can also refer to the software that processes the protocols. Many users have strong opinions about the benefits of various protocol stacks, and each offers its own advantages and disadvantages.  Coexistence issues also surface, for example, when a small clinic uses one protocol stack and needs a consultation from a specialist using another method.

H.323 Protocol Stack

Many video teleconferencing networks use an umbrella set of ITU-T standards referred to as H.323, a protocol stack that standardizes bidirectional audio-visual streaming over an Internet Protocol (IP) network.  H.323 comprises a collection of other standards, covering call control and signaling, terminal and gateway communication, and audio, video, and data formats.

H.323 Includes:

  • H.225.0 (Call Signaling):  Sets up and tears down the connection between the H.323 endpoints.
  • H.245 (Call Control):  The 'handshake' that follows call setup, in which the two H.323 endpoints exchange capability data and mode preferences, including which codecs to use.
  • RAS (Registration, Admission and Status): Protocol used between endpoints and gatekeepers to handle registration, admission control, bandwidth requests, and status reporting.
  • RTP and RTCP (Real-time Transport Protocol and RTP Control Protocol): RTP carries and sequences the audio and video data packets, while RTCP provides feedback on delivery quality.
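
RTP's sequencing role comes from fields in its fixed 12-byte header, defined in RFC 3550: a sequence number for ordering and loss detection, and a timestamp for playback timing. A minimal sketch of packing that header, assuming no padding, extensions, or contributing sources:

```python
import struct

def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int = 96) -> bytes:
    """Pack a minimal 12-byte RTP fixed header (RFC 3550): version 2,
    no padding/extension/CSRC list, marker bit clear."""
    byte0 = 2 << 6                  # version = 2 in the top two bits
    byte1 = payload_type & 0x7F     # marker bit (MSB) left at 0
    return struct.pack("!BBHII",
                       byte0, byte1,
                       seq & 0xFFFF,            # 16-bit sequence number
                       timestamp & 0xFFFFFFFF,  # 32-bit media timestamp
                       ssrc & 0xFFFFFFFF)       # 32-bit stream identifier

hdr = rtp_header(seq=1, timestamp=3000, ssrc=0x1234)
assert len(hdr) == 12
```

The receiver uses the sequence number to put packets back in order and the timestamp to schedule playback; RTCP reports travel alongside on a separate channel.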

Audio Standards

  • G.711
  • G.722
  • G.723.1
  • G.728
  • G.729
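
The oldest of these, G.711, is built on logarithmic companding: quiet samples get more resolution than loud ones, which suits speech. The sketch below shows the continuous mu-law curve behind G.711's North American/Japanese variant; it is an illustration of the companding idea only, not the bit-exact G.711 encoder, which quantizes each sample to 8 bits.

```python
import math

MU = 255  # mu-law parameter used by G.711

def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] through the logarithmic mu-law curve,
    allocating more resolution to quiet samples."""
    sign = 1.0 if x >= 0 else -1.0
    return sign * math.log1p(MU * abs(x)) / math.log1p(MU)

def mu_law_expand(y: float) -> float:
    """Invert the companding curve on the receiving side."""
    sign = 1.0 if y >= 0 else -1.0
    return sign * ((1 + MU) ** abs(y) - 1) / MU

# Before quantization, the round trip is essentially lossless:
x = 0.25
assert abs(mu_law_expand(mu_law_compress(x)) - x) < 1e-9
```

The later codecs on the list (G.728, G.729) achieve much lower bitrates than G.711 by modeling speech rather than just companding individual samples.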

Video Standards

  • H.261
  • H.262/MPEG-2
  • H.263
  • H.264/MPEG-4 AVC (Advanced Video Coding)

Data Standards

  • T.120 (Data protocols for multimedia conferencing)
  • H.239 (Multiple video channels on the same call, e.g. one for video and one for documents)

Session Initiation Protocol Stack (SIP)

SIP is another common protocol stack.  It was initially designed to set up voice calls over internet-based telephone networks and was later expanded to include video, instant messaging, and application sharing. Market timing and adoption played an important role in the growth of this technology, and many users appreciate the customization and flexibility that SIP provisioning can provide.
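
Unlike H.323's binary signaling, SIP is text-based, so a call-setup request is human-readable, much like HTTP. A minimal INVITE might look like the sketch below; the addresses and identifiers are hypothetical, modeled on the examples in RFC 3261.

```python
# Hypothetical clinic addresses for illustration only.
# SIP lines are separated by CRLF, and the message ends with a blank line.
invite = "\r\n".join([
    "INVITE sip:specialist@remote-clinic.example SIP/2.0",
    "Via: SIP/2.0/UDP clinic.example:5060;branch=z9hG4bK776asdhds",
    "From: <sip:frontdesk@clinic.example>;tag=1928301774",
    "To: <sip:specialist@remote-clinic.example>",
    "Call-ID: a84b4c76e66710@clinic.example",
    "CSeq: 314159 INVITE",
    "Contact: <sip:frontdesk@clinic.example>",
    "Content-Length: 0",
    "",
    "",
])
assert invite.startswith("INVITE") and invite.endswith("\r\n\r\n")
```

In a real call the INVITE would carry an SDP body describing the offered audio and video codecs, which plays roughly the role that H.245 capability exchange plays in H.323.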

Scalable Video Coding

The key to video teleconferencing technology is the digital compression of audio and visual data streams in real time. The hardware or software that compresses and decompresses the data is called a codec (coder/decoder). A large amount of analog data, including color, movement, and sound, is digitized, compressed, and broken into small packets of data. VTC data packets travel over an Internet Protocol (IP) network and are reassembled and decoded at the other endpoints in the conference.
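
The packetize-and-reassemble step described above can be sketched in miniature. The payload and packet size here are arbitrary; the point is that numbered packets let the receiver restore order even when the IP network delivers them out of sequence.

```python
def packetize(data: bytes, size: int = 4) -> list:
    """Split a compressed stream into (sequence number, chunk) packets."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets: list) -> bytes:
    """Sort packets by sequence number and rejoin the original stream."""
    return b"".join(chunk for _, chunk in sorted(packets))

stream = b"compressed-frame"
packets = packetize(stream)
packets.reverse()  # simulate out-of-order arrival over IP
assert reassemble(packets) == stream
```

Real media transport (RTP) works on the same principle but adds timestamps, loss handling, and jitter buffering on the receiving side.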

Scalable video coding (SVC) is an optional feature of the H.264 standard, defined in Annex G.  The SVC extension was standardized in 2007 and has been in use for many years. SVC differs from earlier video codecs in that it encodes the video as a base layer plus enhancement layers, so the stream can be scaled down over connections or devices that cannot handle the full incoming size of HD video. Based on an evaluation of network speed and device capability, only those layers of data that will display well are forwarded to the endpoint.  The signal is essentially 'dumbed down' to the capability of the receiving device.

The three scaling variables that an SVC codec uses to customize data stream delivery are temporal (frames per second), spatial (frame size/resolution), and quality (image fidelity at a given resolution, sometimes called SNR scalability).  Given the capabilities of the end device and delivery network, SVC scales back along these three dimensions to provide the best possible display that the endpoint can handle.
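
The layer-selection idea can be sketched as follows. The layer table and bitrates are hypothetical, and in a real deployment the selection is made by an SVC-aware server or router that simply drops enhancement layers, rather than by application code like this.

```python
# Hypothetical layer ladder: each enhancement layer improves one scaling
# dimension (temporal, spatial, or quality) at a cumulative bandwidth cost.
LAYERS = [  # (cumulative kbps, frames/sec, resolution)
    (300,  15, "480p"),   # base layer
    (600,  30, "480p"),   # + temporal enhancement (more fps)
    (1200, 30, "720p"),   # + spatial enhancement (larger frame)
    (2400, 30, "1080p"),  # + further spatial/quality enhancement
]

def select_layers(available_kbps: int, max_resolution: str) -> tuple:
    """Return the best (kbps, fps, resolution) this receiver can handle."""
    order = {"480p": 0, "720p": 1, "1080p": 2}
    best = LAYERS[0]  # every receiver at least gets the base layer
    for kbps, fps, res in LAYERS:
        if kbps <= available_kbps and order[res] <= order[max_resolution]:
            best = (kbps, fps, res)
    return best

# A constrained link gets fewer layers; a capped display gets fewer still.
assert select_layers(800, "1080p") == (600, 30, "480p")
assert select_layers(5000, "720p") == (1200, 30, "720p")
```

Because each receiver is served only the layers it can use, one encoded stream can feed a desktop codec, a tablet, and a phone simultaneously without re-encoding.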

Comparison of H.263, H.264 (without Annex G) and H.265

Video standards H.263 and H.264 (without Annex G) could provide clear video teleconferencing without delay, frozen screens, dropped connections, or pixelated, choppy images, but they were highly dependent on the quality of the hardware in use and the speed of the network.  Receiving a video call without the proper set-up could be compared to trying to drink from a fire hose: the mass of data being pushed through the connection would often overwhelm it, causing quality-of-service issues, if not outright dropped calls.  The ability to scale down the signal, reducing the bandwidth required to send video, and then algorithmically reconstruct it based upon the capabilities of the receiving device is the true innovation behind implementations of scalable video coding.

The H.265 standard, or High Efficiency Video Coding (HEVC), is the successor to H.264.  HEVC provides even more efficient encoding, cutting the bandwidth needed to send a video signal of comparable quality by roughly 50%.  This gain comes from more sophisticated compression algorithms, which in turn demand more computing power.  HEVC has the potential to make the highest-quality video calls ubiquitous, though adoption will be slow until hardware and network upgrades bring the codec mainstream. The technology exists to display high-definition images, but not every clinic, household, or cell phone has a screen capable of displaying video of this quality.
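
The 50% figure can be made concrete with back-of-the-envelope arithmetic; the 2 Mbps H.264 baseline below is an assumed figure for illustration, not a number from the standard.

```python
# Assumed bitrate for a 1080p H.264 call (illustrative, not normative).
h264_kbps = 2000

# HEVC targets comparable quality at roughly half the bitrate.
hevc_kbps = h264_kbps * 0.5
assert hevc_kbps == 1000.0

# Equivalently, the same 2 Mbps link could carry two HEVC calls
# where it previously carried one H.264 call.
calls_on_link = h264_kbps / hevc_kbps
assert calls_on_link == 2.0
```

For a rural clinic on a constrained connection, that halving can be the difference between a usable HD consult and a degraded one.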

H.265 vs. VP9

Concurrent with HEVC development is Google's open, royalty-free video compression standard, VP9.  Using the same concepts to reduce the bandwidth needed to stream video, VP9 focuses on improving the customer experience when viewing internet-based video, such as video delivered via YouTube. VP9 is incorporated into the latest Chromium, Chrome, and Firefox browsers, and uses SVC-style layered coding to reduce delays and freeze-ups while watching online video content.  While the average bandwidth savings of H.265 and VP9 are quite similar, the fact that TV and computer screen manufacturers can implement VP9 without paying licensing fees may improve its position in the market.


Interoperability

What if one clinic purchases video equipment that implements one standard, while a remote clinic utilizes equipment based upon a different standard?  Will clinicians be able to teleconference with each other?  This question of interoperability is a major factor in any deployment of video teleconferencing technology.  By definition, 'standards-based' video teleconferencing equipment will be compatible with other standards-based systems, but proprietary implementations of certain features can make interoperability a problem. Fortunately, connecting disparate systems and devices continues to get easier as manufacturers improve their products.  Unfortunately, issues may still arise when implementing solutions that span networks, technological frameworks, and standards.  TTAC recommends working with a professional videoconferencing specialist when deploying and supporting videoconferencing equipment, and thoroughly testing devices as a part of the selection and implementation process.