TTAC

Video Platforms: Information Technology

Communications infrastructure for video related services

Regardless of whether a dedicated, hosted, or hybrid system is chosen, the end user experiences video communication at what is referred to as an endpoint. To capture video there needs to be at least one camera, and to capture audio at least one microphone. To view video the user needs a screen, and to hear audio the user needs speakers. Capturing and viewing are described separately because asynchronous communication is increasingly common and the use of video in healthcare is no longer limited to live two-way video. A DSLR camera can be used to capture video, although it is rather impractical for receiving one. Captured video can be sent along with audio captured from a separate source, such as a phone, and the response viewed on a desktop later. If the goal is efficiency, asynchronous communication can offer some useful benefits for healthcare delivery.

Certainly, to participate in a live two-way conversation, an endpoint needs all of these components. In the past, a camera, microphone, and speakers were considered peripherals, but today, in mobile devices such as smartphones and in all-in-one computers, these technologies come built in. For dedicated and desktop systems, these components are still selected individually and vary significantly in cost and quality.

Whether wired or wireless, the devices will need to be connected to a network that can reach the other party in the conversation. In many instances these connections will be over the public internet or within a virtual private network. Wireless connections introduce additional complexity for large network deployments, where wireless nodes can become saturated and have connection issues, but in ideal cases they do not negatively affect the video.

Dedicated (legacy) systems have their own operating system and interface, but these days most also offer applications and web-based interfaces that can be used on personal computing devices. Much of the effort over the past few years has gone toward achieving interoperability between dedicated systems and solutions that run on mobile and personal devices.

 

Low Bandwidth situations

In 2015 the FCC updated the definition of broadband internet to a minimum of 25 Mbps download and 3 Mbps upload. This is significantly more than is required to support two-way live video, so for the purposes of this toolkit, the term “low bandwidth” refers to access below 2 Mbps download. In areas with this level of internet access, folks are likely limited to Ku band satellite connections or wired connections that rely on Ku band satellite backhaul. For these connections, live video can be a challenge for reasons beyond the data rate alone. Environmental and other issues can result in unreliable connections, dropped signals, and technical challenges beyond the usual.

In Alaska, approximately 60,000 individuals depend on satellite connectivity for internet services. While many have access to the newer Jupiter 2 satellite, which offers 25 Mbps download and 3 Mbps upload, the latency can be about half a second. Another option besides the Jupiter 2 satellite is the Via satellites, which cover most of the US including Alaska and Hawaii, and newer satellites are planned for launch in the next few years. If your location doesn’t have access to either of these, you may depend on the Ku band Horizon 1 satellite, which can provide 1-2 Mbps download and limited upload. For these folks and others with very limited access speeds, this portion of the toolkit is for you.
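The roughly half-second latency figure can be sanity-checked from orbital geometry alone. This minimal sketch (the function name and structure are ours, purely illustrative) computes the theoretical minimum round trip for a request and reply over a geostationary satellite; real-world latency adds ground-network delay on top of this:

```python
# Back-of-the-envelope check of the ~1/2 second latency figure for
# geostationary (GEO) satellite internet.
GEO_ALTITUDE_KM = 35_786        # nominal geostationary orbit altitude
SPEED_OF_LIGHT_KM_S = 299_792   # speed of light in vacuum

def geo_round_trip_latency_s(altitude_km=GEO_ALTITUDE_KM):
    """Minimum round-trip time for a request/response over a GEO satellite.

    The signal crosses the up/down link four times: up to the satellite,
    down to the ground station, then the reply retraces the same path.
    """
    return 4 * altitude_km / SPEED_OF_LIGHT_KM_S

if __name__ == "__main__":
    print(f"Minimum GEO round trip: {geo_round_trip_latency_s():.3f} s")
```

Four crossings of 35,786 km at the speed of light comes to about 0.48 seconds before any terrestrial routing is counted, which is consistent with the half-second figure above.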

The minimum data rate suggested to support live video is 384 Kbps, which is enough to support 640×480 video using dedicated codecs and equipment running on the H.323 standard or WebRTC. If the data rate is interrupted or slows into the 250 Kbps range, the experience will be impacted and packet loss will increase. While a little packet loss is tolerable, at some point the video quality suffers to the point that it is difficult to hold a conversation, especially if the latency is already half a second or so.
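To see why dedicated codecs matter at these rates, compare the suggested 384 Kbps against the uncompressed bitrate of 640×480 video. A small sketch of the arithmetic (function name and the 30 fps / 24-bit color assumptions are ours, not from any standard):

```python
def raw_bitrate_kbps(width, height, fps=30, bits_per_pixel=24):
    """Uncompressed bitrate of a video stream, in kilobits per second."""
    return width * height * fps * bits_per_pixel / 1000

raw = raw_bitrate_kbps(640, 480)   # uncompressed 640x480 at 30 fps
compression = raw / 384            # ratio a codec must achieve to fit 384 Kbps
print(f"Raw: {raw:,.0f} Kbps; required compression ~ {compression:.0f}:1")
```

Under these assumptions the raw stream is roughly 221,000 Kbps, so the codec has to achieve on the order of a 576:1 reduction to fit into 384 Kbps, which is why codec and equipment quality dominate the experience at low bandwidth.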

One option to consider if your data rates are consistently unable to support a live video conversation is to use an asynchronous video method. This approach is much like texting or using Messenger or similar applications, but since the computer does not need to synchronize a two-way live stream, the quality of the video itself can be improved. Dr. Peter Yellowlees, the current President of the American Telemedicine Association and a professor at UC Davis in California, has been using asynchronous video in his telepsychiatry practice and recommends it as an effective way to perform a clinical evaluation.

 

Endpoint specific equipment

Cameras

It would be difficult to find a web camera on the market or in a big-box store today that does not provide at least 640 x 480 pixel resolution for video. Consumer-grade web cameras that provide 720p resolution can usually be found for under $40. More expensive cameras bring useful features such as larger CMOS sensors, higher resolution, and a better ability to capture and manage light. Because of their small sensors, inexpensive cameras require a lot of light to perform at their best, while cameras that can capture more light do well under normal room lighting conditions. Color accuracy is also an important consideration and can vary between cameras. Between the time the light hits the image sensor and the time the data is sent to the computer, the software on the sensor board performs millions of individual calculations, so it is natural that some variation will exist. For the most accurate color rendition, a color management solution should be considered for services like wound care and dermatology, where skin tones, particularly red tones, need to be accurate. This can be as simple as calibrating the monitor or keeping color calibration swatches visible in the video or still pictures.

Speakers

Depending on the setup, most speakers will suffice for generic video conversations. Settings with background noise, or where colleagues work nearby, may benefit from a headset-mounted speaker for privacy or to buffer background noise.

Microphone

While video is certainly the focus of this toolkit, if you can’t understand what is being said on the other end of the conversation, the communication experience is certainly diminished. While there tends to be an acceptance of spending a lot of money on a video camera, audio is often an area where the investment is lower than it probably should be, and the benefits of a quality microphone are often underappreciated. The best speakers in the world can’t compensate for a poorly captured audio signal, and usually users are unaware how they sound to everyone else in a video session or on an audio call.

Fortunately, over the past 20 years, the technology has significantly improved at the lower end of the price range. Although in an ideal environment such as a sound booth a web camera microphone or a laptop microphone will suffice, in situations where the acoustics are less desirable an external microphone can significantly improve the audio quality of the speaker’s voice for the other participants. Some higher-end microphones have a “studio-out” connection for earphones, much like one would see in a professional studio, but if you are using a microphone with this feature, you’re probably already using a higher-end microphone. Most of us have been on a conference call where one participant is at home or in a moving car with a window rolled down and have experienced the difficulty of understanding what is being said. Be conscious of those types of sounds in your own environment (HVAC fans are notoriously a problem) and consider investing in a good microphone so your patients and colleagues can hear you clearly. In situations where there is only a single person at an endpoint, a directional or lapel-mounted microphone can overcome a plethora of obstacles if the laptop or web-cam mounted microphone is insufficient for that setting.

Dedicated video systems were initially designed for large board room installations and will usually come with a microphone designed for the room or cart in which they are installed. Microphone technology can be quite an involved topic, and consulting an audio engineer for a large room installation would be a good investment. If the situation requires more than one microphone, or the room has lots of hard surfaces such as wood tables, or is a smaller room with glass or sheetrock walls, testing the microphone for echo and sound quality is a minimum, and having an audio consultant help with microphone selection may be advisable. For some edge cases, such as ICU monitoring or situations where there is interest in hearing what a patient is saying, which may only be at the level of a whisper, a microphone mounted on the wall across the room will be hard pressed to capture that level of speech, so other options such as secondary microphones placed near the patient may be required. These types of unique cases, if present, should be part of your requirements documents when assessing possible solutions, as a feature to select which microphone feed is active may be desired.

Network Topologies

If you choose to use a hosted video service, then your infrastructure requirements will focus on ensuring that those computers and any dedicated endpoints have enough bandwidth on your network to accommodate the amount of traffic required for the quality of video you are interested in.

When supporting an on-premises system, you’ll likely be working with a vendor who should be asking very detailed questions about your needs and existing network so they can select the correct equipment. How the video data is managed on the internal network will depend on many variables.

Either way, and depending on your organization’s existing network design, there may be other mission-critical uses of that same network that need to be considered and, if necessary, prioritized. Your security infrastructure may need to be customized to allow video traffic from outside the firewall if your system is set up to block video as a precaution.

Larger organizations may prioritize how use of video affects their overall network load while smaller clinics and practices may prioritize simplicity and low-cost. For those using a hosted system, the vendor will dictate the topology of the connection opting for the most efficient way given the state of the network at that time.

As videoconferencing equipment has been sold for over two decades, it is beyond the scope of this toolkit to maintain a list of all the different versions of the equipment. Each vendor markets its equipment, and the roles that equipment performs, under different names. Some common pieces of equipment are:

  1. MCU- Multipoint control unit
  2. SFU – Selective forwarding unit
  3. Router
  4. Switch
  5. Firewall
  6. Gateway
  7. Media server
  8. Network traversal management servers

Network specific equipment

The Multipoint Control Unit (MCU) a.k.a. Bridge

Videoconferencing has increasingly become associated with video sessions that can include many people on many different systems, all communicating together seamlessly.  The bridge, or MCU, is one of the key pieces of technology that makes these multi-party video calls possible. This toolkit will go into some detail here, as the bridge is probably the most expensive portion of the overall investment and takes on the majority of the processing workload to support the video services infrastructure.

MCUs can support many different functions.  Generally, they are devices that allow multiple videoconferencing endpoints to communicate in a conference that includes many people, as opposed to point-to-point video calls, which are usually for two participants. Some endpoints can themselves host multiple connections in a limited manner, up to 3 or 4, but supporting more requires compromises on other factors such as resolution or frame rate.

MCU products are typically licensed based on the number of ports, or live video connections, that can be engaged at once and at what resolution.  This may consist of many people in one large videoconference, or several smaller groups of people engaged in different, simultaneous conferences.

Currently in this market, there are vendors that have either a single model or multiple models of videoconferencing bridges available for purchase. Each model varies depending on the number of video ports, or the number of simultaneous calls, that it can support, and the functionality that it offers. Generally, each model is available at a base price, with mandatory and/or optional costs associated with the purchase. There are also options where the MCU (bridge) is integrated directly into the CODEC, or is available as a cloud-based service, eliminating the need to purchase bridge hardware.

Bridge pricing consists of a few components that are similar across the market.  Software may be needed; the supporting software costs are usually broken down in one of three ways: you may pay a one-time fee for the software, you may pay a one-time fee for the software with a yearly maintenance fee, or you may have to pay a one-time fee for the software, and buy user license keys for each user in addition to yearly maintenance fees. There may also be additional hardware needed to support the videoconferencing bridge. Some vendors may make it mandatory that you purchase one to three years of technical support, and some may have additional warranty periods available for purchase. You may also need to budget for hardware and software upgrades as part of ongoing product maintenance.

Equipment can be purchased from manufacturers, resellers, servicers, and vendors. The pricing you are quoted may be adjustable based on your project and/or organization. It helps to be able to fully explain your project when you reach the point of doing price comparisons, so the seller can give you the most comprehensive pricing information. There may be discounts available based on volume and organization type. Many vendors allow equipment to be purchased based on current need, with projections for expansion in the future.

Additional features, management tools, and network infrastructure supports that may be available include: scheduling packages, tools that allow additional web-based users to participate, redundancy packages, recording services, streaming video services, additional networking hardware and software, hardware or software support for content sharing, additional system management tools, hardware expansion space, maintenance plans, and additional warranty periods.

 

Capabilities

MCUs, as stated, help people connect in a videoconferencing system.  Their exact functions may vary by manufacturer and model. What they do offer is the ability for each endpoint to participate in the communication session while maintaining only one uplink and one downlink to the MCU device. When there are multiple participants in a session, this is significantly more efficient in terms of network load than other solutions.

Graphic showing the network load of MCU vs P2P topology with multiple participants
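The difference the graphic illustrates can be stated numerically: in a full-mesh peer-to-peer call, every endpoint exchanges a stream with every other endpoint, while with an MCU each endpoint keeps a single uplink and downlink regardless of group size. A small illustrative calculation (the function names are ours, not a vendor's):

```python
def mesh_streams_per_endpoint(participants):
    """Full-mesh P2P: each endpoint sends to and receives from everyone else."""
    return 2 * (participants - 1)

def mcu_streams_per_endpoint(participants):
    """MCU topology: one uplink to the bridge, one composited downlink back."""
    return 2

# Per-endpoint stream counts as the conference grows
for n in (2, 5, 10):
    print(f"{n} participants: mesh={mesh_streams_per_endpoint(n)}, "
          f"mcu={mcu_streams_per_endpoint(n)}")
```

With 10 participants, a mesh endpoint handles 18 simultaneous streams while an MCU endpoint still handles only 2, which is why the bridge can absorb most of the network and processing load.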

Video Switching

Video switching is the ability of a videoconferencing bridge to show the active speaker in a larger portion of the video image (or possibly the entire image), with people who are not speaking displayed in smaller squares, if at all.  This is often voice-activated, meaning that the current speaker is shown in the main portion of the screen.  On some systems, the active speaker may be set at each individual endpoint, or by a central chair that grants and revokes the main video image.  Those acting as the video “chairperson” may also perform other functions, such as dropping people from the call or canceling the entire conference.

If switching is not allowed or is not available, another option is to show everyone in what is colloquially called “Hollywood Squares” or “Brady Bunch” style.  Essentially, everyone is viewed in a small square, tiled in the video stream.  As more people join the conference, the number of squares increases as individual sizes decrease.  Some bridges will output this in a 4:3 aspect ratio, which can cause some cropping issues when displayed on a 16:9 “widescreen” monitor.
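The cropping issue mentioned above is easy to quantify: a 4:3 mosaic scaled to fill a 16:9 display without letterboxing keeps only (4:3) ÷ (16:9) = 75% of its vertical content. The sketch below is illustrative only; the near-square tiling rule is a common convention, not any specific bridge's algorithm:

```python
import math

def tile_grid(participants):
    """A typical 'Hollywood Squares' tiling: near-square grid of tiles."""
    rows = math.ceil(math.sqrt(participants))
    cols = math.ceil(participants / rows)
    return rows, cols

def vertical_fraction_kept(source_ar=4/3, display_ar=16/9):
    """Fraction of source height kept when cropping to fill a wider display."""
    return source_ar / display_ar

print(tile_grid(7))              # a 3x3 grid holds 7 participants
print(vertical_fraction_kept())  # 0.75: a quarter of the height is cropped
```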

Selected Communication Mode

In a traditional, point-to-point videoconference, systems would agree on the connection speed.  If one person’s system could not support the call, they would possibly have to drop the call and reconnect at a lower speed.  MCUs provide different data rates to different endpoints, allowing people connecting at mismatched speeds to communicate without having to reconnect on lower settings.  This means that people working on higher-bandwidth lines will not have to be throttled down if a single low-speed person joins the call.

Selected Communication Mode also allows an MCU to establish maximum and minimum call speeds. If an organization determines that it is only willing to perform video consults over high-speed connections, it can design a system that will not allow low-performing endpoints to connect.  Additionally, network resources can be protected from excessive bandwidth use by limiting how much a conference is allowed to consume.

Audio

Audio signals can be mixed in a videoconference, allowing multiple people to speak at the same time.  While this can be a distraction, as with face-to-face conversation, it allows people to interject while a person is talking without having to wait for them to finish.

Additionally, people have the ability to call into a videoconference by phone, which allows them to engage in a conversation even if they do not have access to videoconferencing equipment. This feature can require additional equipment such as a gateway that connects to the telephone system.

Data Transmission

MCUs may have the ability to stream serialized data on a separate data channel, allowing devices such as serial electronic stethoscopes to send data to another endpoint.  This process has not yet been standardized, and many MCUs handle it differently.  The TTAC has seen endpoints from one manufacturer unable to send serial data between their own endpoints via an MCU, yet the same MCU could send the data between two endpoints from a competitor. This feature is increasingly available in software independent of an MCU but remains a feature of some MCUs as well.

Security

Bridges can provide secure connections, to the degree that they may disallow anyone who does not support encryption from participating in a call. This feature is also available in many software-based systems and is not dependent on an MCU, but for older dedicated endpoints, the MCU was required to prevent interruptions or to require that certain parameters, such as encryption, be met in order to participate.

Cascading

When multiple bridges are connected, they are described as being “cascaded”.  This means that a bridged call from Organization A can join in a bridged call from Organization B, and all parties may be connected.  There is a limit of three MCUs between endpoints, however.  This means that Organization A could be cascaded with Organization B, and Organization C could be cascaded with Organization B, effectively bridging the call between A and C.  It would not be possible for Organization D to join to Organization C and communicate in the bridged call with Organization A.  Note that this is not necessarily a common problem to face when first entering the videoconferencing arena, but may be important to consider, depending on how various partner networks are connected.

 

Selective Forwarding Unit (SFU)

This term can refer to a dedicated device or server, or simply to a technology used in video communications. The SFU receives input from all participants, in all their various resolutions and data rates, and then selects how to route that data to the rest of the participants. By design, the SFU passes off CPU-intensive tasks, where needed, to the endpoint devices, and by doing so an SFU is a much less expensive and more scalable option than an MCU. Another benefit is very low delay, as this approach removes the transcoding step.

At first, the description of an SFU may sound like what the MCU does, but the MCU takes in all the streams and combines them to send a single stream (usually) back to each participant, while an SFU simply directs the multiple incoming streams to each participant. The real differences can get technical quickly, but in an SFU’s simplest application, all the users are using the same codecs, so no transcoding (from one codec to another), a CPU-intensive task, needs to be performed. The basic philosophy of the SFU is to rely on the edge unit (the endpoint) having enough CPU power and a sufficiently sophisticated browser or similar application to deal with the complexity.

An MCU does the opposite, taking all the streams and combining them into one. If any of the streams use a different codec, bitrate, or other variant, the MCU takes on the CPU-intensive task of transcoding between the codecs and adjusting for the other variables. This step can also introduce some delay, which may be noticeable.

Large companies and commercial services rely heavily on SFU implementations, but there is still a need for typical MCU services such as transcoding, or for connecting and integrating with legacy or older systems. In many instances, both SFU and MCU technologies are leveraged where needed to achieve the most economical and efficient experience.
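The trade-off between the two designs can also be expressed in terms of what each participant receives and what the server must compute. The sketch below is a deliberately simplified model (names and the one-transcode-per-stream assumption are ours; real products vary):

```python
def downlink_streams(participants, topology):
    """Streams each endpoint must receive: an MCU sends one composite,
    while an SFU forwards every other participant's stream individually."""
    if topology == "mcu":
        return 1
    if topology == "sfu":
        return participants - 1
    raise ValueError(f"unknown topology: {topology}")

def server_transcode_jobs(participants, topology):
    """Simplified model: an MCU decodes and re-encodes every incoming
    stream; an SFU only forwards packets and transcodes nothing."""
    return participants if topology == "mcu" else 0

for topo in ("mcu", "sfu"):
    print(topo, downlink_streams(6, topo), server_transcode_jobs(6, topo))
```

This captures the core point above: the SFU trades heavier endpoint downlinks and decoding for a cheap, low-latency server, while the MCU concentrates the cost in the bridge.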

 

Gatekeeper

The functions performed by a gatekeeper will vary by model, but can generally be broken down into a handful of categories – address translation, admission control, bandwidth control, user authentication, and zone management.  Address translations allow an endpoint to be given a more user-friendly alias, with internal addressing and routing handled by the gatekeeper whenever another endpoint tries to dial the alias.  Admission control and bandwidth control limit how many simultaneous calls can take place, and manage bandwidth allocation.  Zones are associated with a single gatekeeper; zone management can help control devices registered in a single zone, as well as how the endpoints communicate with other zones.

 

Gateway

Even though standardization has been widely adopted by videoconferencing manufacturers, getting devices from different manufacturers to communicate with one another can be a challenge.  Standards have changed, new standards have been introduced, and optional features of some standards have been implemented in different ways by different manufacturers.  Gateways help manage these communication difficulties, connecting and translating (transcoding) between endpoints, MCUs, and other network devices.

 

Proxies

Videoconferencing often takes place across the boundaries of at least one network, with connections needing to occur between different organizations or between internet-based endpoints and a single organization.  There are certain challenges inherent in connecting these various networks.  Proxies are designed to help mitigate these problems, sitting on the “edge” of a network and managing communication with endpoints, gateways, and gatekeepers.

 

NAT Traversal and Firewalls

Firewalls, which are an important part of organizational network infrastructures, can provide several challenges for videoconferencing.  They are capable of restricting what traffic is allowed into and out of a network, and may perform Network Address Translations (NAT), which effectively obfuscates the exact IP address of computers within a network.  As endpoints often need to communicate through firewalls, NAT traversal techniques and standards need to be implemented.

The most common ITU-T standard for NAT traversal is H.460.  Methods for traversal include Session Traversal Utilities for NAT (STUN), Traversal Using Relay NAT (TURN), and Interactive Connectivity Establishment (ICE).
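As a concrete illustration of STUN, the 20-byte Binding Request header defined in RFC 5389 can be built in a few lines. A client sends this packet to a STUN server, which replies with the public address and port the NAT assigned, letting the endpoint learn how it appears from outside the firewall. This is a minimal sketch of the message header only, not a full STUN client:

```python
import os
import struct

STUN_MAGIC_COOKIE = 0x2112A442   # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001         # STUN message type: Binding Request

def stun_binding_request(transaction_id=None):
    """Build a minimal STUN Binding Request (header only, no attributes).

    Header layout: message type (2 bytes), message length (2 bytes),
    magic cookie (4 bytes), transaction ID (12 bytes), all big-endian.
    """
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("STUN transaction IDs are exactly 12 bytes")
    header = struct.pack("!HHI", BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id
```

In practice this packet would be sent over UDP to a STUN server, and TURN and ICE build on the same message format to relay media or pick the best candidate path when a direct connection is blocked.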