TTAC

Videoconferencing – Testing Process

Testing videoconferencing equipment can quickly become an unwieldy process.  Numerous manufacturers have developed diverse product lines that utilize a range of technological standards, all of which may meet an organization’s needs to varying degrees.  TTAC’s perspective is that testing should serve a clear goal and address user needs.  As such, we recommend that the process described below be modified (and likely simplified) to meet your own organizational needs.

This document consists of the following sections:

Description of current testing goals and user needs

Description of the testing environment

Summary of the testing process

Discussion of testing: assumptions and rationale

Testing Goals and User Needs

The goal of this testing effort is to provide clear, usable data for our readers that demonstrate how various network constraints and endpoint platforms impact call quality.  As assessing every form and function of VTC platforms would produce an overwhelming number of testing permutations, the focus instead is on network bandwidth, packet loss, jitter, latency, software product, and a subset of supported standards.

TTAC presumes that readers’ needs are more complex than a simple “good product / bad product” comparison of various vendors’ wares; the exact requirements of each organization vary enough that an assessment painted in artificially narrow strokes would be of little relevance.  Instead, by addressing the issues raised in the goals above, TTAC hopes to simplify readers’ own product assessments by providing data on VTC performance across various networks and platforms.

Users, here meaning the toolkit’s readers, will have more complex needs when executing their own assessments.  As such, the following resources serve both as a model of the testing process and as a dataset that helps clarify what to focus on in future assessments by other organizations.

Description of the Testing Environment

Given the goals of this particular test, and the presumed information-driven needs of the users, TTAC focused on four primary areas when constructing the testing environment – network emulation, computer hardware, product variety, and data collection.

Network Emulation

Emulating a network – intentionally introducing issues such as added latency, constrained bandwidth, and interruptions in the flow of data – is a key part of the testing process for this assessment.  Many organizations will not be interested in emulating a network, opting instead to perform testing on their own real-world networks; however, an emulator is worth reconsidering for any organization deploying videoconferencing to a large number of endpoints with significant variations in connectivity.  As a key goal of TTAC’s assessment is to explore the impact of various network problems, it was critical that network performance be systematically controlled.

A variety of options exist for “emulating” a network connection.  Some products are self-contained appliances purpose-built for this task.  They typically have polished interfaces, support some degree of scripting or automation of emulation functions, and are designed to accurately model a range of network problems.  Their primary downside is cost, which can range from several thousand to tens of thousands of dollars.

TTAC opted instead to utilize a software-based solution, installing the Linux-based WANem product on a VMware virtual machine running on a Dell Optiplex 360 fitted with two Gigabit PCI network interface cards (NICs).  As the software was free and the PC was already on hand, the only financial outlay for this configuration was the purchase of the two NICs, at a cost of approximately $30 each.

As seen in the diagram below, the network emulator was placed between the VTC endpoints, allowing the connection to be “throttled down” as needed.  This allowed for the creation of networks operating at bandwidths from 128 Kbps to 10 Mbps, with latency similar to that found in satellite connections, packet loss as might exist on overloaded networks and poor wireless connections, and jitter as may be introduced by a variety of causes.

[Diagram: test network, with the network emulator placed between the two VTC endpoints and a connection to the public internet]

It is important to note that the above diagram includes a connection to the public internet.  This decision was motivated by the fact that several videoconferencing vendors offered products that were only available via the internet.  TTAC utilized an internet connection as a component of all tests in order to reduce the possibility that variations in the network configuration would cause variations in the test results across products.  The use of a public internet connection introduces some variability into TTAC’s test results and also limited the connection speed that could be utilized during the tests.  Further testing is likely when a higher-bandwidth connection is established within TTAC labs.
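
For readers who wish to approximate this setup without WANem or a commercial appliance, the sketch below shows how comparable network constraints could be applied with the Linux tc/netem tooling that underlies WANem.  This is a minimal illustration rather than the configuration used in TTAC’s tests; the interface name, parameter values, and the netem/tbf combination are assumptions.

    # Sketch: apply bandwidth, latency, jitter, and loss constraints to one
    # interface of a Linux machine sitting between the two VTC endpoints.
    # Requires root privileges; "eth1" and all parameter values are illustrative.
    import subprocess

    IFACE = "eth1"

    def emulate(rate_kbit=512, delay_ms=300, jitter_ms=30, loss_pct=1.0):
        cmds = [
            # Clear any previous emulation settings (ignored if none exist).
            f"tc qdisc del dev {IFACE} root",
            # netem adds fixed delay, random jitter, and random packet loss.
            f"tc qdisc add dev {IFACE} root handle 1: netem "
            f"delay {delay_ms}ms {jitter_ms}ms loss {loss_pct}%",
            # tbf caps the available bandwidth beneath the netem qdisc.
            f"tc qdisc add dev {IFACE} parent 1: handle 2: tbf "
            f"rate {rate_kbit}kbit burst 32kbit latency 400ms",
        ]
        for cmd in cmds:
            subprocess.run(cmd.split(), check=False)

    # Example: a satellite-like 512 Kbps link with 300 ms delay, 30 ms jitter, 1% loss.
    emulate()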

Computer Hardware

Videoconferencing systems increasingly offer software-based endpoints for their products.  As such, the focus of this evaluation was on the performance of these products with standard laptops and household routers.  The equipment used for the tests included:

-Dell Latitude E6420, wired connection – Intel i5 CPU @ 2.50 / 2.50 GHz, 64-bit Windows 7 OS, 8 GB RAM

-Dell Latitude E7240, wired connection – Intel i7 CPU @ 2.10 / 2.70 GHz, 64-bit Windows 7 OS, 8 GB RAM

-ASUS AC66U Wireless Router (used with a wired connection to the Dell Latitude E7240)

Product Variety

As indicated above, the number of software and hardware products available on the market renders the prospect of testing every product permutation unfeasible in the current test set.  The products assessed, chosen as representative of H.264, SVC, and consumer-oriented offerings, consist of the following.

Polycom CMA Desktop implements the H.264 standard for video encoding, but does not include support for Scalable Video Coding (SVC).  Polycom CMA Desktop utilizes the H.323 standard for establishing calls, which enables it to communicate with other H.323 videoconferencing software within an organization.

 

Vidyo Desktop implements the H.264 standard for video encoding, and includes support for Scalable Video Coding (SVC).  Vidyo Desktop utilizes the H.323 standard for establishing calls, which enables it to communicate with other H.323 videoconferencing software and hardware within an organization.  Note that the benefits of SVC are not seen when connecting to non-SVC clients and endpoints.

 

Zoom implements the H.264 standard for video encoding, and includes support for Scalable Video Coding (SVC).  Zoom supports the H.323 standard for establishing calls for an extra fee, which enables it to communicate with other H.323 videoconferencing software and hardware within an organization.  Note that the benefits of SVC are not seen when connecting to non-SVC clients and endpoints.

 

Skype implements the H.264 standard for video encoding, but does not include support for Scalable Video Coding (SVC).  Skype utilizes proprietary protocols for establishing calls, which renders it unable to communicate with other H.323 videoconferencing software and hardware within an organization.

Data Collection

The tests could have been arranged to gather data in numerous ways.  As qualitative assessments of the general experience failed to paint a clear picture of the user experience, TTAC opted to record the output from the Dell laptop that was on the test network.  The HDMI output of the laptop (1080i, 29.96 fps) was run through the HDMI input on the Odyssey 7Q recorder, which captured the audio coming from the far site in the VTC call and the video as perceived by the laptop on the test network.  An additional audio recorder was set up on an Apple MacBook Air with a Jabra PHS001U microphone and QuickTime in order to capture the audio from the near side of the test.  The audio and video were later synchronized in Windows Movie Maker.
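
The clap used for synchronization (see step 5 of the test script below) could also be used to align the two audio sources automatically.  The sketch below is a hypothetical illustration of that approach using cross-correlation; the file names are placeholders, and this automation was not part of the TTAC workflow, which relied on manual alignment.

    # Hypothetical sketch: estimate the offset between the near-side QuickTime
    # audio and the Odyssey 7Q audio track by cross-correlating the waveforms.
    # File names are placeholders; this was not part of the TTAC tests.
    from scipy.io import wavfile
    from scipy.signal import correlate

    rate_near, near = wavfile.read("near_side_quicktime.wav")
    rate_far, far = wavfile.read("odyssey_7q_audio.wav")
    assert rate_near == rate_far, "resample first if the sample rates differ"

    # Collapse to mono floats so the correlation is well behaved.
    def to_mono(x):
        return x.astype(float).mean(axis=1) if x.ndim > 1 else x.astype(float)

    near_m, far_m = to_mono(near), to_mono(far)

    # The lag with the highest correlation corresponds to the clap alignment.
    corr = correlate(near_m, far_m, mode="full")
    lag = int(corr.argmax()) - (len(far_m) - 1)
    print(f"Shift the near-side audio by {lag / rate_near:.3f} seconds")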

Summary of the Testing Process

A standard script was utilized for all tests.  The two testers, in this case Donna Bain and Garret Spargo, would establish a call through the network emulator (see the network diagram, above), then proceed through the following steps:

 

  1. [G]: Start the audio recorder
  2. [G]: Announce that audio was now being recorded
  3. [G]: Start the video recorder
  4. [G]: Announce that video was now being recorded
  5. [G]: Clap loudly (used to synchronize audio and video in post production)
  6. [G]: Request that Donna start the timer on her phone, which was displayed on the recording
  7. [G]: Describe the current configuration of the network and software (to allow for matching results to footage and audio in case of an error when synchronizing footage and reviewing results)
  8. [G]: Announce the beginning of the test
  9. [G]: State the first five letters of the alphabet
  10. [D]: State the next five letters of the alphabet
  11. [G]: State the next five letters of the alphabet
  12. [D]: State the next five letters of the alphabet
  13. [G]: State the next three letters of the alphabet
  14. [D]: State the last three letters of the alphabet
  15. [G]: State the phrase “Now I know my ABCs”
  16. [D]: State the phrase “Next time won’t you sing with me?”
  17. [G]: Request that the test address moderate movement on the screen
  18. [D]: Announce that moderate movement is being tested while waving left arm
  19. [D]: Stop waving arm
  20. [D]: Announce that moderate movement testing is completed
  21. [G]: Request that the test address extensive movement on the screen
  22. [D]: Announce that extensive movement is being tested while waving both arms
  23. [D]: Stop waving arms
  24. [D]: Announce that extensive movement testing is completed
  25. [G]: Announce the conclusion of the test

Various testing behaviors were selected for different reasons, each chosen with an eye to the problems predicted when testing each software application over the different network connections.  The script was designed to highlight the following:

 

  1. The timer was a consistently visible element that would clearly show if there were problems with the video
  2. The timer was constantly moving; codec performance on fine movement would be highlighted here
  3. Back-and-forth reciting of simple text showed how latency impacted exchanges between two parties
  4. Use of very clear text (alphabet) would clearly show when the audio was clipped or garbled
  5. Exchange of longer phrases (one sentence at a time) would show how well the audio codec performed with a different style of speech that had fewer gaps between words and phonemes
  6. Use of moderate and extensive movement patterns would demonstrate how video handled different amounts of movement
  7. Speaking while moving would demonstrate how audio performance was impacted by video performance and vice versa
  8. Ending the movement before announcing the end of the movement tests would demonstrate how quickly audio and video could correct after periods of movement, resolution loss, and possible synchronization issues
  9. Ending the test with an announcement on the near end and minimal input or movement on the far end would further demonstrate how quickly the far end video would correct after the movement stress tests

The recordings were then saved, the tests were run on each software application, and then the network emulator settings were modified for the next round of tests.  When the tests were completed for each set of devices, the video was uploaded to a PC for labeling and review.
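
Although labeling in these tests was done by hand from the configuration announced in step 7 of the script, a small helper along the lines of the sketch below could encode the same information directly into each file name.  The field names and label format are assumptions, offered only as an illustration for readers building their own process.

    # Illustrative helper (not used by TTAC): encode a test run's product and
    # network settings into a file-name label so recordings can be matched
    # back to their configuration during review.
    from dataclasses import dataclass

    @dataclass
    class TestRun:
        product: str
        bandwidth_kbps: int
        latency_ms: int
        jitter_ms: int
        loss_pct: float

        def label(self) -> str:
            return (f"{self.product}_{self.bandwidth_kbps}kbps_"
                    f"{self.latency_ms}mslat_{self.jitter_ms}msjit_{self.loss_pct}pctloss")

    print(TestRun("VidyoDesktop", 512, 300, 30, 1.0).label())
    # VidyoDesktop_512kbps_300mslat_30msjit_1.0pctloss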

Assumptions and Rationale

Each component of this assessment includes a certain number of assumptions that should be addressed for the sake of clarity and completeness.

Device Selection

By refraining from including every product from every manufacturer, it is assumed that a subset of devices will offer meaningful data that can be seen as representative of the current market.  This approach does not address variations in how the technologies are implemented, differences between manufacturers, the impact of legacy equipment, or some of the newest product developments.

As this test does not aim to offer product recommendations or detailed descriptions of how each manufacturer’s product line functions, this is seen as an acceptable limitation.  To reiterate the goal of these tests: TTAC wanted to focus on the impact of general computing hardware, network conditions, and differences in video codecs more than on variations between individual products or manufacturers.

Network Emulation

By using a software-based emulator it is assumed that accurate network modeling will be possible; however, hardware-based network emulators are often seen as more reliable than some of the free software products used in this test.  The most significant criticism concerns how software products emulate packet loss, as certain network problems may not be accurately modeled.  In some situations a router will “dump” a large queue of data packets, meaning that a substantial number of sequentially-ordered packets are lost at once.

Many software products emulate packet loss by dropping random packets at a consistent rate, which may not adequately expose weaknesses in how video codecs are implemented.  Specifically, one of the strengths of SVC is the transmission of redundant packets, which helps compensate for connections with slow and steady packet loss.  This may limit the robustness of the data by not emulating all forms of packet loss.  The data will still be valid, but may not paint a complete picture of the potential costs and benefits of various VTC codecs.
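
As a purely illustrative aside that is not part of the TTAC tests, the toy simulation below shows why this distinction matters: uniformly random drops and bursty “queue dump” drops can produce a similar overall loss rate while behaving very differently, with the bursty trace losing long runs of consecutive packets – the situation that SVC’s redundant packets are least able to repair.

    # Toy comparison of two loss patterns with roughly similar overall loss:
    # independent random drops vs. a two-state (Gilbert-style) burst model.
    import random

    random.seed(1)
    N = 100_000

    # Independent random loss: each packet dropped with probability 0.02.
    uniform = [random.random() < 0.02 for _ in range(N)]

    # Burst loss: rare transitions into a "bad" state where most packets drop.
    bursty, bad = [], False
    for _ in range(N):
        bad = (random.random() < 0.002) if not bad else (random.random() > 0.10)
        bursty.append(bad and random.random() < 0.9)

    def longest_run(trace):
        run = best = 0
        for lost in trace:
            run = run + 1 if lost else 0
            best = max(best, run)
        return best

    for name, trace in (("uniform", uniform), ("bursty", bursty)):
        print(f"{name}: {100 * sum(trace) / N:.1f}% lost, "
              f"longest burst = {longest_run(trace)} packets")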

Video Signal Splitting and Conversion

By inserting numerous devices between the videoconferencing unit and the recorder, it is assumed that there will be a minimal and/or equally-distributed impact on signal quality and temporal variation.  Each conversion may introduce milliseconds of delay and small quantities of signal degradation into the process.

The consistent use of the various conversion and splitting devices is presumed to render the impact on codec performance trivial.  It was determined to be more important to enable monitoring at the same time as recording, and to enable simultaneous recording on a centralized device, rather than allowing each endpoint to use its own recording functionality.  The use of a centralized device will more readily support side-by-side comparisons (including the impacts of network delays and bandwidth limitations) than might be found by recording within endpoint-managed systems.

Non-Standard Performance of Testers

TTAC used two individual performers and a standardized script for its testing efforts.  This allowed for a relatively consistent evaluation of each software product over a range of network conditions.  However, as the testers could not precisely reproduce an identical performance for each test, there will be some degree of variability that may impact the performance of each product in a given test.

By creating a standardized script, this impact has been mitigated as much as could reasonably be expected.  It was decided not to utilize a pre-recorded performance as the testing measure, as such an approach would not allow for reacting to changes in videoconference performance and would also introduce uncontrolled factors that are not commonly seen in a videoconference (e.g. the video would be compressed prior to being received by a codec, a use case very rarely seen in the real world).

Important Note on Testing Process and Results

TTAC does not endorse the use of any products included in this product assessment.  As stated repeatedly throughout this document, the aim of this particular set of tests is to provide data on the impact of bandwidth and hardware specifications rather than demonstrate any particular “type” of VTC endpoint as being superior or inferior to any other.

The process of selecting and implementing any technology is fraught with tough decisions and organization-specific requirements.  There is no one-size-fits-all product on the market, and it is important to keep that in mind.  Should your organization need help in planning its own equipment selection process, please feel free to contact TTAC for additional tips and assistance.