Dedicated (legacy) videoconferencing systems rely on a large collection of standards for everything from displaying video to establishing connectivity with endpoints across the globe. The clear majority of features implemented by videoconferencing systems have a corresponding standard; however, there are newer solutions on the market for which the standards are still in formation. Security standards are consistently being updated to respond to new threats. Not all manufacturers support all the standards in the same way, resulting in some proprietary implementations of features that can make interoperability a problem.
One of the main standards-issuing bodies in this field is the International Telecommunication Union, which releases standards through the ITU Telecommunication Standardization Sector (ITU-T). Many of the standards, including the G. and H. standards, are created by this group. Another group involved is the Moving Picture Experts Group (MPEG), formed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). In 2010, these organizations established the Joint Collaborative Team on Video Coding to work on the next video coding standard, formally designated H.265 and MPEG-H Part 2, though commonly referred to as simply H.265.
H.26x standards utilize technologies covered by patents and may require licensing from those patent holders. In response, there has been a push for a royalty-free alternative video coding standard. Currently, a royalty-free codec called VP9 is seeing increased adoption with the growing popularity of WebRTC solutions. In 2015, the Alliance for Open Media was created with the goal of supporting this effort.
http://aomedia.org/
Video Standards
Video standards address the compression and playback of video data. These standards are used in many areas outside of videoconferencing, with applications in multimedia systems online and on computers. Video compression has developed continually since the 1990s, with the standards published as numbered ITU-T recommendations. H.261 was an early one, followed by H.263 in the late 1990s, and then H.264, which is the prominent standard in use today; for web-based video technology, however, VP9 is rapidly increasing in use. Most dedicated video systems supporting the full H.323 standards and web-based interoperability will support all of these codec standards.
The H.264 standard includes a series of “annexes” that provide additional features or functions. One relevant example is Annex G, known as Scalable Video Coding, or SVC. SVC allows separate detail layers, or sub-bitstreams, to be created; receivers on networks with connectivity issues can decode fewer layers, which translates into a smoother video image.
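To illustrate the layering concept, here is a conceptual TypeScript sketch; the layer names, resolutions, and bitrates are invented for illustration and do not represent a real SVC bitstream.

```typescript
// Conceptual sketch of SVC-style layered streams; the layer names,
// resolutions, and bitrates below are invented for illustration.
interface SvcLayer {
  name: string;
  width: number;
  height: number;
  kbps: number; // cumulative bitrate needed to decode up to this layer
}

const layers: SvcLayer[] = [
  { name: "base", width: 320, height: 180, kbps: 300 },
  { name: "enhancement-1", width: 640, height: 360, kbps: 800 },
  { name: "enhancement-2", width: 1280, height: 720, kbps: 1800 },
];

// Pick the highest layer the available bandwidth can sustain; a congested
// network decodes fewer layers instead of stalling the video entirely.
function selectLayer(availableKbps: number): SvcLayer {
  const usable = layers.filter((layer) => layer.kbps <= availableKbps);
  return usable.length > 0 ? usable[usable.length - 1] : layers[0];
}

console.log(selectLayer(1000).name); // "enhancement-1"
```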
Annexes that are not universally supported provide a challenge in the videoconferencing world, as the benefits of the annexes will not be seen in systems that have not implemented them. Similarly, as the annexes may be implemented differently between manufacturers, there is no guarantee that the features will work between systems without use of other intermediary devices.
Resolution
Also related to video standards are a set of standards for the resolution of video images. A variety of terms are used to describe how many pixels there are in a video image, many of which are seen in areas beyond videoconferencing. High definition and standard definition are the terms commonly used, but additional terms such as high resolution, VGA, 720p, and 1080i are often used as well. Some of the terms are very clearly defined, while others are defined more broadly.
High definition refers to any video signal that is a higher resolution than standard definition video, though this is often specified as vertical pixel counts of 720 and 1080, with respective horizontal pixel counts of 1280 and 1920. Some additional terms are often attached to these resolutions – progressive or interlaced. Both of these refer to the method in which the image is sampled from the imaging sensor.
A progressively scanned image updates the entire image at once, meaning that a paused video image will show a complete, clear picture. Interlaced scanning samples the image in horizontal bands, alternating between even and odd lines. On frame one of a video image, all odd lines will be captured, while the next frame will capture all even lines. As a result, a paused video image will display alternating bands of video from the most current frame and the preceding frame.
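A toy sketch of this weaving behavior, with lines represented as labeled strings rather than pixel rows (a simplification for illustration only):

```typescript
// Toy illustration of interlaced capture: two successive "fields" hold the
// odd and even lines of the scene, and a displayed frame weaves them together.
type Field = string[]; // one entry per captured line

function weave(oddField: Field, evenField: Field): string[] {
  const frame: string[] = [];
  for (let i = 0; i < oddField.length + evenField.length; i++) {
    // Lines 0, 2, 4... come from the even field; 1, 3, 5... from the odd field.
    frame.push(i % 2 === 0 ? evenField[i / 2] : oddField[(i - 1) / 2]);
  }
  return frame;
}

// The odd field was captured first (t0), the even field next (t1), so the
// woven frame mixes two moments in time; this is the banding seen when
// pausing interlaced video.
const frame = weave(["line1@t0", "line3@t0"], ["line0@t1", "line2@t1"]);
console.log(frame); // ["line0@t1", "line1@t0", "line2@t1", "line3@t0"]
```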
Standard definition covers video signals in either the PAL or NTSC standards. NTSC is the standard used in the United States and covers a resolution of 640 x 480 pixels. There are many other resolutions and associated acronyms that relate to video signals, including 400p, 448p, SIF, CIF, 4SIF, 4CIF, QSIF, QCIF, VGA, XGA, and QVGA.
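For reference, a sketch mapping several of these common names to their usual pixel dimensions; these are widely used values, but verify against your equipment's documentation, since some terms vary by region and vendor.

```typescript
// Common resolution names and their usual pixel dimensions.
const resolutions: Record<string, { width: number; height: number }> = {
  QCIF: { width: 176, height: 144 },
  CIF: { width: 352, height: 288 },
  "4CIF": { width: 704, height: 576 },
  QVGA: { width: 320, height: 240 },
  VGA: { width: 640, height: 480 },
  XGA: { width: 1024, height: 768 },
  "720p": { width: 1280, height: 720 },
  "1080p": { width: 1920, height: 1080 },
};

// Total pixel count is a rough proxy for the bandwidth a format demands.
function pixelCount(name: string): number {
  const r = resolutions[name];
  return r ? r.width * r.height : 0;
}

console.log(pixelCount("VGA")); // 307200
```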
Despite the increased interest in high-definition video, standard definition video at 640×480 pixel resolution still has a place in telemedicine, as it requires less bandwidth and remains the format of choice for many medical imaging devices that connect to videoconferencing endpoints.
Audio
Audio codecs used in video are lossy: the compression first eliminates signals above or below the human range of hearing and then discards additional sound data to achieve the high level of compression offered (an 80%-90+% reduction ratio). If a user intends to capture audio for signal processing that incorporates signals outside the range of human hearing, a lossless audio codec should be considered. There are several on the market which are free to use.
H.323 dedicated systems
The G.nnn standards (such as G.711, G.722, and G.729) are the audio standards used in videoconferencing for H.323 systems. Support for G.711 is mandated by the H.323 standard, which means that all H.323-compliant systems must support it. Non-H.323-compliant systems, or those that integrate with H.323, may utilize their own audio codec (such as VP9 solutions leveraging the Opus audio codec).
Other G.nnn standards are included in annexes to the H.323 standard, providing various sampling frequencies and bandwidth optimizations. G.722 provides an increase in the sampling rate of the incoming audio, which increases fidelity at the cost of requiring additional bandwidth. G.729 requires less bandwidth, as it is optimized for frequencies associated with speech. There are some questions as to how the G.729 standard may impact transmission of certain sounds associated with telestethoscopy over the standard audio-input and microphone lines (note that this should not impact the use of serial stethoscopes that do not send audio data through an audio input on the endpoint).
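A minimal sketch of how a capability exchange might settle on a shared audio codec, assuming simple ordered preference lists; the lists shown are illustrative, not taken from a real H.323 exchange. Because G.711 support is mandatory, two compliant systems always share at least one codec.

```typescript
// Sketch of audio codec negotiation between two endpoints; codec lists
// here are illustrative, not from a real capability exchange.
function negotiateAudioCodec(local: string[], remote: string[]): string {
  // Prefer the caller's ordering; fall back through the list.
  for (const codec of local) {
    if (remote.includes(codec)) {
      return codec;
    }
  }
  // Unreachable between H.323-compliant systems: both must list G.711.
  throw new Error("No common audio codec");
}

const callerPrefs = ["G.722", "G.729", "G.711"]; // wideband first
const calleeSupports = ["G.729", "G.711"];
console.log(negotiateAudioCodec(callerPrefs, calleeSupports)); // "G.729"
```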
VP9 solutions
VP9 solutions leverage the Opus audio codec, a royalty-free and open-source codec standardized by the Internet Engineering Task Force as RFC 6716.
https://tools.ietf.org/html/rfc6716
Regardless of which audio codec is used, the primary impact on the user experience for audio comes from the choice of microphone, the speakers, and the ambient sounds in the room. Notably, HVAC systems, fans, and other sounds that typically go unnoticed by those in the room can be amplified by a poor choice of microphone and become a distraction for others in the session. You’ve probably experienced a conference call where someone was calling in from a moving car, with the background wind and traffic creating a distraction.
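In browser-based solutions, some of this background noise can also be mitigated in software. The sketch below uses the standard getUserMedia audio constraints for echo cancellation and noise suppression; these are standard MediaTrackConstraints, though support and effectiveness vary by browser and microphone.

```typescript
// Request a microphone stream with the browser's built-in audio processing
// enabled; a good microphone and quiet room still matter most.
async function captureCleanAudio(): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: true, // suppress far-end audio picked up by the mic
      noiseSuppression: true, // attenuate steady background noise (fans, HVAC)
      autoGainControl: true,  // even out quiet and loud talkers
    },
    video: false,
  });
}
```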
Multimedia Call Control Standards – Signaling
Signaling represents the activity of coordinating communication between two parties. This process handles passing messages to open or close communications, exchanging key data if there are security requirements, checking which codecs and media types can be used, sharing network information, and handling errors. There are several ways to perform this activity, and not all systems do it the same way.
Historically, there were two primary standards used for initiating a call between videoconferencing systems: H.323 and the Session Initiation Protocol (SIP), both of which were initiated in 1996. At this time, SIP and H.323 are not interoperable standards, though calls can occur between systems using the different standards if a gateway infrastructure is in place. Since their creation, many articles have been written comparing the two.
Addressing, or defining a unique identifier for the endpoints, is done differently in SIP and H.323. SIP addresses are formatted as Uniform Resource Identifiers (e.g., my.username@organization.com, 18005551234@organization.com, or jsmith@192.168.1.100). H.323 addresses can be IP addresses or hostnames (e.g., 192.168.1.100 or organization.com), H.323 aliases (e.g., user@192.168.1.100 or user@organization.com), or E.164 addresses (e.g., 18005551234).
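A rough TypeScript sketch of telling these formats apart; the patterns are simplified illustrations, not full validators for the SIP or H.323 grammars. Note that a bare user@host string is ambiguous between a SIP URI and an H.323 alias.

```typescript
type AddressKind = "IP address" | "E.164" | "URI (SIP or H.323 alias)" | "unknown";

// Simplified classification of endpoint address formats.
function classifyAddress(address: string): AddressKind {
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(address)) return "IP address"; // 192.168.1.100
  if (/^\+?\d{7,15}$/.test(address)) return "E.164";                // 18005551234
  if (/^[^@\s]+@[^@\s]+$/.test(address)) return "URI (SIP or H.323 alias)";
  return "unknown";
}

console.log(classifyAddress("18005551234"));           // "E.164"
console.log(classifyAddress("user@organization.com")); // "URI (SIP or H.323 alias)"
```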
With the popularity of WebRTC, which does not include a signaling component, users wanting to build a robust WebRTC solution will need to provide a signaling solution themselves, of which there are several on the market; a minimal sketch of the pattern follows.
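This sketch relays the offer, answer, and ICE candidates over a WebSocket that the application provides itself. The wss:// URL and the message shape are assumptions, since WebRTC defines only the peer connection, not how these messages travel.

```typescript
// "Bring your own signaling": relay session descriptions and ICE candidates
// over an application-provided channel (here, a hypothetical WebSocket server).
const signaling = new WebSocket("wss://signaling.example.com"); // hypothetical URL
const pc = new RTCPeerConnection();

// Send our ICE candidates to the remote peer as they are discovered.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    signaling.send(JSON.stringify({ type: "candidate", candidate: event.candidate }));
  }
};

// Handle the remote peer's messages; the caller side would additionally
// call pc.createOffer() and send it as a { type: "offer" } message.
signaling.onmessage = async (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "offer") {
    await pc.setRemoteDescription(msg.description);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    signaling.send(JSON.stringify({ type: "answer", description: answer }));
  } else if (msg.type === "answer") {
    await pc.setRemoteDescription(msg.description);
  } else if (msg.type === "candidate") {
    await pc.addIceCandidate(msg.candidate);
  }
};
```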
Content Sharing
Sharing content from a computer is managed through the H.239 standard. This is an ITU-T recommendation for the H.323 standard, supporting the streaming of a still image (typically taken from a VGA-type input) to another endpoint. Systems that only use SIP will display the content as a part of the main video, while systems that use H.323 will have the content sent to another screen, if available. Some calls that are signaled and managed via SIP may use H.323 as a part of their infrastructure, thereby allowing content to be shared.
WebRTC solutions and online video will often incorporate some type of screen-sharing or co-browsing experience as one way to share data and connect users. Screen sharing allows one user to observe a portion of, or the entire, screen being shared by another user, which may include the image or other media to be shared. Co-browsing is different in that the solution sends commands to another user’s computer to open a browser and interact with it in real time. There are several ways to achieve this functionality, with numerous offerings on the market.
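For browser-based solutions, screen sharing is typically built on the standard getDisplayMedia API, as in this sketch; the resulting track is sent over the peer connection like any camera feed, though browser support and the exact picker UI vary.

```typescript
// Capture a screen, window, or tab chosen by the user and send it to the
// remote peer over an existing RTCPeerConnection.
async function shareScreen(pc: RTCPeerConnection): Promise<void> {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  for (const track of stream.getVideoTracks()) {
    pc.addTrack(track, stream); // delivered like any other video track
  }
}
```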
Data Transfer
Some medical devices, such as electronic stethoscopes, still send data through a serial output while others have upgraded to USB connections. Some videoconferencing manufacturers support the transmission of this serial data in a separate channel during videoconferencing, allowing for the serial data to be sent alongside standard video and audio content.
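For web-based solutions, one possible route for reading such device data is the Web Serial API, sketched below under the assumption of a Chromium-based browser; Web Serial is not supported everywhere, and dedicated endpoints use their own proprietary channels, so this is illustrative only.

```typescript
// Read a chunk of data from a user-selected serial device (e.g., an
// electronic stethoscope with a serial output) via the Web Serial API.
async function readSerialDevice(): Promise<void> {
  // Cast because Web Serial types are not in the default TypeScript DOM lib;
  // requestPort() must be called from a user gesture (e.g., a button click).
  const serial = (navigator as any).serial;
  const port = await serial.requestPort();
  await port.open({ baudRate: 9600 }); // baud rate depends on the device

  const reader = port.readable.getReader();
  const { value } = await reader.read(); // value is a Uint8Array chunk
  console.log("received", value?.length ?? 0, "bytes");
  reader.releaseLock();
  await port.close();
}
```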
It is important to note that at this time vendors do not send serial data in a standardized, agreed-upon way, which results in a lack of interoperability between manufacturers.
In the course of TTAC evaluations, the following was noted:
1) It was not possible to send serial data between two different manufacturers’ endpoints
2) It was not possible to send content via a single manufacturer’s Multipoint Control Unit (MCU) and their respective endpoints in a bridged call, BUT
3) It was possible to send content via that particular manufacturer’s MCU and a different manufacturer’s endpoints
The recommendation is that if you plan on using a device that requires a serial connection, you should test that particular device in any assessment of new or upgraded technology implementations.
Security
H.323 calls, SIP-controlled calls, and WebRTC-based sessions can all support encryption.
For H.323 and SIP-based services, encryption is typically done with an implementation of the Advanced Encryption Standard (AES), with some options for signaling encryption using the Secure Sockets Layer (SSL) protocol. H.323 calls support the H.235 standard, which details requirements for encryption and integrity and is used in conjunction with the H.245 standard, which handles many issues, including authentication. It is important to ensure that endpoints are properly configured; some systems may drop encryption if the connecting site does not support it, while others may disallow connections from or to unencrypted systems.
WebRTC-based services utilize the Datagram Transport Layer Security (DTLS) protocol (RFCs 6347, 5238, 6983, and 5764), which is a requirement of WebRTC. Note that this encryption is for the application data; WebRTC does not specify the signaling method, so if you desire to encrypt the signaling layer as well, that must be done separately.
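A small sketch of confirming that a peer connection's DTLS transport is established, using the standard getStats API; the exact statistics exposed vary by browser.

```typescript
// Check whether the DTLS handshake on a peer connection has completed.
async function isDtlsConnected(pc: RTCPeerConnection): Promise<boolean> {
  const stats = await pc.getStats();
  let connected = false;
  stats.forEach((report: any) => {
    // "transport" stats carry the DTLS handshake state in browsers.
    if (report.type === "transport" && report.dtlsState === "connected") {
      connected = true;
    }
  });
  return connected;
}
```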
Security may be implemented with a variety of protocols, including Secure Sockets Layer (SSL), Transport Layer Security (TLS), Secure Real-Time Transport Protocol (SRTP), Secure Shell (SSH), and Internet Protocol Security (IPSec).