Mastering Video at Scale: The Evolution of Facebook’s Streaming Video Engine (SVE)
Meet the demands of a global user base with low latency, flexibility, and fault tolerance
As video became a central part of social media, Facebook faced the challenge of managing the vast influx of video uploaded by its billions of users. The platform’s vision of a video-first world demanded a robust system that could efficiently handle video uploading, processing, and distribution at an unprecedented scale. To meet these requirements, the system had to be low latency, flexible, and resilient to faults and overloads.
In the early stages, Facebook processed video uploads with the Monolithic Encoding Script (MES). MES served its purpose when video content was simple and limited in volume, but it proved inadequate once Facebook started processing large amounts of video. MES was designed as a sequential, single-threaded system: each video had to be processed in its entirety, one step after another, before the pipeline could move on. As upload volumes increased, this approach caused significant bottlenecks, resulting in high latency and inefficiency.
Facebook’s expansion of video offerings across various applications highlighted the limitations of MES. The platform had to support everything from standard video posts to Facebook 360 videos and Instagram stories. MES’ monolithic nature made accommodating this diversity difficult without extensive manual intervention and customization.
The sequential processing model was also prone to failure under heavy load, such as the spikes caused by viral content and live events. A single MES fault, whether in hardware or software, could halt the video processing pipeline and leave millions of users unable to access content. This fragility called for a more resilient system that could keep operating smoothly even when individual components failed.
Facebook recognized these challenges and set out to build a new framework that could meet the platform’s demands. The new system had to be not only faster but also flexible and reliable. To handle global video uploads, it needed parallel processing to reduce latency, a programmable interface to allow easy customization, and robust fault tolerance mechanisms.
The result was the Streaming Video Engine (SVE), a system designed specifically to scale with Facebook’s video processing needs. SVE exploits parallelism at multiple levels, which speeds up processing and makes more efficient use of resources. Its flexible programming model lets developers integrate new video applications easily, and its fault tolerance mechanisms let the system recover quickly from failures, providing a consistent and reliable video experience.
Streaming Video Engine (SVE)
The Streaming Video Engine (SVE) helped Facebook handle massive video uploads at scale as the company’s video processing needs grew. SVE was designed to meet three core requirements: low latency, flexibility, and robustness. These were essential to enabling users to share videos quickly, reliably, and across many platforms.
Low Latency:
Facebook’s scale requires low latency for video processing, which directly impacts how quickly users can share their content after uploading. Compared to the previous Monolithic Encoding Script (MES), SVE fundamentally changes how video processing is managed.
MES processed videos linearly and sequentially: a video had to be fully uploaded before processing could begin. This caused significant delays, especially for large files. SVE introduces several innovations that significantly reduce latency. First, it overlaps uploading and processing, so processing begins as soon as the first chunk of a video has been uploaded.
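The overlap of upload and processing can be illustrated with a small sketch. It is not SVE code: the function names, the chunk size, and the `upper()` call standing in for real processing are all assumptions made for illustration. A generator plays the role of the uploader, and each chunk is processed as soon as it arrives instead of after the whole file is available.

```python
from typing import Iterator, List


def upload_chunks(data: bytes, chunk_size: int, events: List[str]) -> Iterator[bytes]:
    """Hypothetical uploader: yields each chunk as soon as it 'arrives'."""
    for offset in range(0, len(data), chunk_size):
        events.append(f"uploaded chunk {offset // chunk_size}")
        yield data[offset:offset + chunk_size]


def process_while_uploading(data: bytes, chunk_size: int, events: List[str]) -> List[bytes]:
    """Process every chunk immediately, without waiting for the full upload."""
    processed = []
    for index, chunk in enumerate(upload_chunks(data, chunk_size, events)):
        events.append(f"processed chunk {index}")
        processed.append(chunk.upper())  # stand-in for real video processing
    return processed


events: List[str] = []
result = process_while_uploading(b"abcdefgh", 3, events)
```

In the recorded `events` list, the first chunk is processed before the last chunk has been uploaded, which is the property that removes the upload time from the critical path.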
Second, SVE splits each video into chunks aligned with Group of Pictures (GOP) boundaries, so that each chunk can be encoded independently of the others. This allows a cluster of machines to process chunks simultaneously, which speeds up the overall encoding process. Third, SVE overlaps storage with processing: segments of a video are stored as soon as they are processed, further minimizing delays. Together, these strategies let SVE deliver videos to users several times faster than MES.
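A minimal sketch of GOP-aligned chunking follows, under simplifying assumptions: frames are represented only by their type letter ("I", "P", or "B"), a new GOP starts at every I-frame, and `encode_gop` is a hypothetical stand-in for a real encoder. A thread pool stands in for the cluster of machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_gops(frames: List[str]) -> List[List[str]]:
    """Start a new chunk at every I-frame so each chunk is self-contained."""
    gops: List[List[str]] = []
    for frame in frames:
        if frame == "I" or not gops:
            gops.append([])
        gops[-1].append(frame)
    return gops


def encode_gop(gop: List[str]) -> str:
    """Hypothetical encoder: lowercases the frame letters as placeholder work."""
    return "".join(gop).lower()


def encode_parallel(frames: List[str], workers: int = 4) -> List[str]:
    """Encode all GOP-aligned chunks concurrently, preserving chunk order."""
    gops = split_into_gops(frames)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_gop, gops))


encoded = encode_parallel(list("IPPBIPBIPP"))
```

Because `pool.map` returns results in input order, the encoded chunks can be concatenated directly even though they may finish in any order.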
Flexibility:
Beyond speed, flexibility was a critical requirement, since SVE has to support Facebook’s diverse range of video applications. SVE achieves this flexibility with a Directed Acyclic Graph (DAG) programming model, in which tasks are nodes and the dependencies between them are edges. Each upload is separated into video, audio, and metadata tracks. Because tasks on different tracks do not depend on one another, they can run in parallel rather than in a fixed linear order.
The DAG-based approach lets developers add new processing tasks or modify existing ones without disrupting the rest of the system. A pipeline might, for example, encode a video at several bitrates, extract thumbnails, or run complex computer vision operations. Depending on the application’s needs, these tasks can be chained together or run in parallel.
The DAG model also allows the processing pipeline to be customized for each video, based on the video’s characteristics or the application’s requirements. This flexibility lets SVE adapt readily to new video formats, processing techniques, and application requirements, keeping Facebook at the forefront of video technology.
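To make the DAG idea concrete, here is a small sketch using Python's standard `graphlib` module (Python 3.9+). The task names and the pipeline shape are invented for illustration and are not SVE's actual tasks or API. Each task maps to the set of tasks it depends on, and a new task is added without touching the existing ones.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

Dag = Dict[str, Set[str]]

# Hypothetical pipeline: each task maps to the tasks it depends on.
BASE_PIPELINE: Dag = {
    "split_tracks": set(),
    "encode_video_360p": {"split_tracks"},
    "encode_video_720p": {"split_tracks"},
    "encode_audio": {"split_tracks"},
    "extract_thumbnail": {"split_tracks"},
    "combine_tracks": {"encode_video_360p", "encode_video_720p", "encode_audio"},
}


def with_task(dag: Dag, name: str, depends_on: Set[str]) -> Dag:
    """Return a copy of the DAG with one extra task; the original is untouched."""
    new_dag = {task: set(deps) for task, deps in dag.items()}
    new_dag[name] = set(depends_on)
    return new_dag


def execution_order(dag: Dag) -> List[str]:
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


# Customize the pipeline for one application by adding a vision task.
custom = with_task(BASE_PIPELINE, "detect_objects", {"split_tracks"})
order = execution_order(custom)
```

The encoding, thumbnail, and detection tasks share only `split_tracks` as a dependency, so a scheduler would be free to run them in parallel; `static_order` simply returns one valid sequential ordering.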
Robustness:
Facebook’s global scale demands robust video processing. SVE is therefore built with comprehensive fault tolerance mechanisms that keep it running through failures and overload.
One of SVE’s primary strategies is replication: data is duplicated across machines and data centers, so that if one machine fails, another can take over without interrupting the video processing workflow. For non-deterministic errors, such as temporary network problems or software glitches, SVE retries the failed task. The system manages retries carefully so that they do not significantly affect overall processing time.
SVE handles overloads, which are inevitable with viral content and live events, through a tiered response. First, non-critical tasks are delayed. If that is not enough, processing tasks are redistributed across data centers to balance the load. As a last resort, certain processing tasks may be paused so that the most critical ones take priority, ensuring that essential services remain accessible.
SVE also monitors and logs extensively throughout the video processing pipeline. Real-time monitoring lets the system detect failures as they happen and provides the data needed to diagnose them. By identifying and isolating problems quickly, SVE keeps videos processing and sharing smoothly even under challenging conditions.
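The retry idea can be sketched as follows. The article does not describe SVE's retry policy in detail, so the attempt limit, the exponential backoff, and the `TransientError` type here are assumptions chosen for illustration. Bounding the number of attempts is one simple way to keep retries from dominating processing time.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a network blip)."""


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: retries are bounded
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter is a design choice that makes the function easy to test without real waiting; in normal use the default `time.sleep` applies.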
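The tiered response can be pictured as a simple decision function. The load thresholds, the `Action` names, and the rule that critical tasks always run locally are assumptions made for this sketch; the article only describes the tiers qualitatively.

```python
from enum import Enum


class Action(Enum):
    RUN = "run locally"
    DELAY = "delay until load drops"
    REDIRECT = "send to another data center"
    PAUSE = "pause"


def overload_action(load: float, critical: bool) -> Action:
    """Pick an action from the current load (1.0 = full local capacity).

    Thresholds are hypothetical. Critical tasks always run; non-critical
    tasks are handled more aggressively as load rises.
    """
    if critical or load < 0.8:
        return Action.RUN
    if load < 1.0:
        return Action.DELAY      # tier 1: delay non-critical work
    if load < 1.2:
        return Action.REDIRECT   # tier 2: rebalance across data centers
    return Action.PAUSE          # tier 3: pause to protect critical tasks
```

For example, `overload_action(0.9, critical=False)` selects `Action.DELAY`, while the same load with `critical=True` selects `Action.RUN`.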





