This is a guest post by Ankit Sirmorya. Ankit works as a Machine Learning Lead/Sr. Machine Learning Engineer at Amazon and has led several machine-learning initiatives across the Amazon ecosystem. He has been applying machine learning to solve ambiguous business problems and improve customer experience. For instance, he created a platform for experimenting with different hypotheses on Amazon product pages using reinforcement learning techniques. Currently, he is in the Alexa Shopping organization, where he is developing machine-learning-based solutions that send personalized reorder hints to customers to improve their experience.
Design a video streaming platform similar to Netflix where content creators can upload their video content and viewers can play videos on different devices. We should also be able to store user statistics for each video, such as the number of views, watch duration, and so forth.
The application should be able to meet the following requirements.
We need to build an application that can support traffic at the scale of Netflix. We should also be able to handle the surge of traffic that is often seen when a sequel to a popular web series (e.g., House of Cards, Breaking Bad) comes out. Some of the numbers relevant to capacity planning are listed below.
Netflix comprises multiple microservices. However, the playback service, which responds to user playback queries, receives the most traffic, so it requires the largest number of servers. We can calculate the number of servers required for handling playback requests using the equation below.
Servers in playback microservice = (#playback requests per second × latency) / #concurrent connections per server
Let’s assume that the latency (time taken to respond to a user request) of the playback service is 20 milliseconds and that each server can support a maximum of 10K concurrent connections. Additionally, we need to scale the application for a peak-traffic scenario in which 75% of active users place playback requests. In such a scenario, we will need a total of 150 servers (= 75M × 20 ms / 10K).
Number of videos watched per second = (#active users × #average videos watched daily) / 86,400 ≈ 3,472 (= 100M × 3 / 86,400)
Size of the content stored on a daily basis = #average size of video uploaded every minute × #combinations of resolutions and codecs × 24 × 60 = 36 TB/day (= 2,500 MB × 10 × 24 × 60)
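These back-of-the-envelope estimates can be sanity-checked in a few lines of code. The sketch below simply encodes the three formulas above; all inputs (75M peak playback requests per second, 20 ms latency, 10K connections per server, 100M active users, and so forth) are the assumptions stated in this section, not measured values.

public class CapacityEstimate {
    public static void main(String[] args) {
        // Servers = (playback requests/sec * latency) / connections per server
        double playbackRequestsPerSec = 75_000_000;   // 75% of 100M active users
        double latencySec = 0.020;                    // 20 ms
        double connectionsPerServer = 10_000;
        System.out.printf("Playback servers: %.0f%n",
                playbackRequestsPerSec * latencySec / connectionsPerServer);   // 150

        // Videos watched per second = (active users * avg videos watched daily) / 86,400
        System.out.printf("Videos watched/sec: %.0f%n",
                100_000_000.0 * 3 / 86_400);                                   // ~3,472

        // Daily storage = MB uploaded per minute * encoded formats * minutes per day
        System.out.printf("Storage/day: %.0f TB%n",
                2_500.0 * 10 * 24 * 60 / 1_000_000);                           // 36 TB
    }
}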
There will primarily be two kinds of users of the system: content creators, who upload video content, and viewers, who watch videos. The entire system can be divided into the following components.
Content Delivery Network (CDN): It is responsible for storing the content in locations geographically closest to users. This significantly enhances the user experience, as it reduces the network and geographical distances that our video bits must travel during playback. It also greatly reduces the overall demand on upstream network capacity.
FUN FACT: Open Connect is Netflix's global custom CDN, which delivers Netflix TV shows and movies to members worldwide. It is essentially a network of thousands of Open Connect Appliances (OCAs), which store encoded video/image files and are responsible for delivering the playable bits to client devices. OCAs are made up of tuned hardware and software, are deployed at ISP sites, and are custom-tailored to provide an optimal customer experience.
Fig 0: High-level design (HLD) of Netflix
In the image above, we have shown a bird's-eye view of the overall system, which should be able to meet all the in-scope requirements. The details of each of the component interactions are listed below.
POST /video-contents/v1/videos
{
    videoTitle: Title of the video
    videoDescription: Description of the video
    tags: Tags associated with the video
    category: Category of the video, e.g., Movie, TV Show
    videoContent: Stream of video content to be uploaded
}
GET /video-contents/v1/search-query/
{ user-location: Location of the user performing the search }
GET /video-contents/v1/videos/
{ offset: Time in seconds from the beginning of the video }
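As an illustration of how a client might call the playback endpoint above, we have shown below a minimal sketch using Java 11's built-in HTTP client. The host name, video id, and offset value are placeholders; the actual request/response contract would follow the API definitions above.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PlaybackClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Resume playback 120 seconds into a (hypothetical) video id.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/video-contents/v1/videos/v-123?offset=120"))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}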
Within the scope of this problem, we need to persist video metadata and subtitles in the database. The video metadata can be stored in an aggregate-oriented database; given that the values within the aggregate may be updated frequently, we can use a document-based store like MongoDB. The data model for storing the metadata is shown in the table below.
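To make the aggregate concrete, we have shown below a minimal sketch of the metadata as a MongoDB document, using the org.bson.Document type from the MongoDB Java driver. The exact field set, in particular the cdnUrls sub-document keyed by codec/resolution, is our assumption based on the upload API and the encoding scheme described in this post.

import java.util.List;
import org.bson.Document;

public class VideoMetadataSample {
    public static Document sample() {
        return new Document("videoId", "v-123")
                .append("videoTitle", "House of Cards S01E01")
                .append("videoDescription", "Pilot episode")
                .append("tags", List.of("drama", "politics"))
                .append("category", "TV Show")
                // One CDN URL per codec/resolution combination produced by the encoder.
                .append("cdnUrls", new Document("h264_720p", "https://cdn.example.com/v-123/h264_720p"));
    }
}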
We can use a time-series database such as OpenTSDB, which is built on top of HBase, to store the subtitles. We have shown below a snippet of the data model that can be used to store video subtitles. In this model (let's call it a Media Document), we have provided an event-based representation where each event occupies a time interval on the timeline.
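To make the event representation concrete, here is a minimal sketch of a subtitle event occupying an interval on the video timeline. The field names are illustrative, not the actual NMDB schema.

public record SubtitleEvent(
        String videoId,
        long startMillis,   // where the event begins on the timeline
        long endMillis,     // where the event ends
        String language,
        String text) {

    // True if this event is on screen at the given playback position.
    public boolean activeAt(long positionMillis) {
        return positionMillis >= startMillis && positionMillis < endMillis;
    }
}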
FUN FACT: In this talk, Rohit Puri from Netflix discusses the Netflix Media Database (NMDB), which is modeled around the notion of a media timeline with spatial properties. NMDB is designed to be a highly scalable, multi-tenant media metadata system that can serve near-real-time queries at high read/write throughput. The structure of the media timeline data model is called a "Media Document".
This component will mainly comprise three modules: Content Uploader, CDN Health Checker, and Title Indexer. Each of these modules is a microservice performing a specific task. We have covered the details of each of these modules in the sections below.
This module is executed when a content creator uploads content. It is responsible for distributing the content across the CDN to provide an optimal customer experience.
Fig 1: Sequence Diagram of Content Upload Operation
The diagram above depicts the sequence of operations executed when a content creator uploads video content (a TV show or a movie).
The encoder works by splitting the video file into smaller video segments. These segments are encoded in all possible combinations of codecs and resolutions. In our example, we plan to support four codecs (Cinepak, MPEG-2, H.264, VP8) and three resolutions (240p, 480p, 720p). This implies that each video segment gets encoded in a total of 12 formats (4 codecs × 3 resolutions). These encoded video segments are distributed across the CDN, and the CDN URL is maintained in the data store. The playback API is responsible for finding the most optimal CDN URL based on the input parameters (client's device, bandwidth, and so forth) of a user's request.
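Here is a minimal sketch of generating one encoding job per codec/resolution pair for a video segment. The EncodingJob type and the method name are illustrative.

import java.util.List;

public class EncodingFanOut {
    record EncodingJob(String segmentId, String codec, String resolution) {}

    static List<EncodingJob> jobsFor(String segmentId) {
        List<String> codecs = List.of("Cinepak", "MPEG-2", "H.264", "VP8");
        List<String> resolutions = List.of("240p", "480p", "720p");
        // 4 codecs x 3 resolutions = 12 encoded variants per segment.
        return codecs.stream()
                .flatMap(codec -> resolutions.stream()
                        .map(resolution -> new EncodingJob(segmentId, codec, resolution)))
                .toList();
    }
}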
FUN FACT: Netflix's media processing platform is used for video encoding (FFmpeg), title image generation, media processing (Archer), and so forth. Netflix has developed a tool called MezzFS which collects metrics on data throughput, download efficiency, resource usage, etc. in Atlas, Netflix's in-memory time-series database. They use this data for developing optimizations such as replays and adaptive buffering.
This module ingests the health metrics of the CDNs and persists them in the data store. This data is used by the data plane to pick optimal CDN URLs when users request playback.
Fig 2: Sequence Diagram to check CDN Health Metrics
In the image above, we have shown the sequence of operations executed to gather statistics on the CDN health metrics and the BGP routes. The details about each of the steps in the sequence diagram are listed below.
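We have also shown below a minimal sketch of the health snapshot this module might persist for each CDN node, and of how the data plane could rank nodes by it. All the fields and the ranking rule are assumptions for illustration.

import java.util.Comparator;
import java.util.List;

public class CdnHealthChecker {
    record CdnHealth(String cdnNodeId, double cpuUtilization,
                     double availableBandwidthMbps, boolean reachableViaBgp) {}

    // Data-plane helper: pick the reachable node with the most spare bandwidth.
    static CdnHealth pickOptimal(List<CdnHealth> snapshots) {
        return snapshots.stream()
                .filter(CdnHealth::reachableViaBgp)
                .max(Comparator.comparingDouble(CdnHealth::availableBandwidthMbps))
                .orElseThrow();
    }
}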
This module is responsible for creating indexes of the video titles and updating them in Elasticsearch to enable faster content discovery for end users.
Fig 3: Sequence Diagram to store indexed titles on ElasticSearch
The details of the sequence of operations required for indexing the video titles for content search are listed below.
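As an illustration, here is a minimal sketch of indexing a title document with the Elasticsearch high-level REST client for Java. The index name, document id, and fields are our assumptions.

import java.util.List;
import java.util.Map;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class TitleIndexer {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // One document per video title, so search queries can match on title/tags.
            IndexRequest request = new IndexRequest("video-titles")
                    .id("v-123")
                    .source(Map.of(
                            "videoTitle", "House of Cards S01E01",
                            "tags", List.of("drama", "politics"),
                            "category", "TV Show"));
            client.index(request, RequestOptions.DEFAULT);
        }
    }
}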
This component processes user requests in real time and comprises two major workflows: the Playback Workflow and the Content Discovery Workflow.
This workflow is responsible for orchestrating operations when a user places a playback request. It coordinates between different microservices such as the Authorization Service (for checking user authorization and licensing), the Steering Service (for deciding the best playback experience), and the Playback Experience Service (for tracking events to measure playback experience). The Steering Service ensures the best customer experience by finding the most optimal CDN URL based on request parameters such as the user's device, bandwidth, and so forth. The orchestration process will be handled by the Playback Service, as shown in the image below.
Fig 4: Sequence Diagram of Playback Service
The details about each of the steps in the sequence diagram are listed below, along with a sketch of the orchestration.
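We have shown below a minimal sketch of that orchestration. The service interfaces and method names are illustrative stand-ins for the three microservices named above.

interface AuthorizationService { void checkUserAndLicense(String customerId, String titleId); }
interface SteeringService { String optimalCdnUrl(String titleId, String deviceType, double bandwidthMbps); }
interface PlaybackExperienceService { void recordPlaybackStarted(String customerId, String titleId, String cdnUrl); }

class PlaybackService {
    private final AuthorizationService auth;
    private final SteeringService steering;
    private final PlaybackExperienceService experience;

    PlaybackService(AuthorizationService auth, SteeringService steering,
                    PlaybackExperienceService experience) {
        this.auth = auth;
        this.steering = steering;
        this.experience = experience;
    }

    String handlePlayback(String customerId, String titleId, String deviceType, double bandwidthMbps) {
        auth.checkUserAndLicense(customerId, titleId);                  // throws if not authorized/licensed
        String cdnUrl = steering.optimalCdnUrl(titleId, deviceType, bandwidthMbps);
        experience.recordPlaybackStarted(customerId, titleId, cdnUrl);  // events for playback-experience metrics
        return cdnUrl;                                                  // the client streams from this URL
    }
}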
FUN FACT: As mentioned in this talk by Netflix engineer Suudhan Rangarajan, gRPC is used as the framework for communication between different microservices at Netflix. The advantages it provides over REST include bi-directional streaming, minimal operational coupling, and support across languages and platforms.
This workflow is triggered when a user searches for a video title and comprises two microservices: the Content Discovery Service and the Content Similarity Service. The Content Discovery Service is invoked when a user requests a video title. The Content Similarity Service, on the other hand, returns a list of similar video titles if the exact video title doesn't exist in our data store.
Fig 5: Sequence Diagram of Content Lookup Workflow
We have listed below the details of each of the steps involved in the Content Lookup Workflow.
We can work towards optimizing the latency of the Playback Workflow by caching the CDN information. This cache will be used by the Steering Service to pick the CDNs from which the video content will be served. We can further enhance the performance of the playback operation by making the architecture asynchronous. Let's try to understand this further using the example of the playback API (getPlayData()), which requires customer (getCustomerInfo()) and device information (getDeviceInfo()) to process (decidePlayData()) a video playback request. Suppose each of the three operations (getCustomerInfo(), getDeviceInfo(), and decidePlayData()) depends on a different microservice.
The synchronous implementation of the getPlayData() operation will look similar to the code snippet shown below. Such an architecture comprises two types of thread pools: a request-handler thread pool and client thread pools (one for each microservice). For each playback request, an execution thread from the request-handler thread pool is blocked until the getPlayData() call finishes: each time getPlayData() is invoked, the execution thread interacts with the client thread pools of the dependent microservices and stays blocked until the execution has completely finished. This works for a simple request/response model where latency isn't a concern and the number of clients is limited.
PlayData getPlayData(String customerId, String titleId, String deviceId) {
    // Each call below blocks the request-handler thread until the
    // corresponding microservice responds.
    CustomerInfo custInfo = getCustomerInfo(customerId);
    DeviceInfo deviceInfo = getDeviceInfo(deviceId);
    PlayData playData = decidePlayData(custInfo, deviceInfo, titleId);
    return playData;
}
One way to scale out the playback operation is to split it into independent processes which can be executed in parallel and re-assembled. This can be done using an asynchronous architecture comprising event loops (for handling request-responses and client interactions) along with worker threads. We have shown the asynchronous processing of playback requests in the image below.
Fig 6: Asynchronous architecture of Playback API
We have shown below the code snippet tweaked to effectively leverage the asynchronous architecture, written here in an RxJava-style reactive idiom. For every playback request, the request-handler event loop triggers a worker thread to set up the entire execution flow. After that, one worker thread fetches the customer info from the related microservice while another fetches the device information. Once both worker threads have returned, a separate execution unit zips the two responses together and uses them for the decidePlayData() call. In this process, all context is passed as messages between separate threads. The asynchronous architecture not only leverages the available computational resources effectively but also reduces latency.
Observable<PlayData> getPlayData(String customerId, String titleId, String deviceId) {
    // getCustomerInfo() and getDeviceInfo() now return Observables resolved on
    // worker threads; zip() combines their results once both have arrived.
    return Observable.zip(
            getCustomerInfo(customerId),
            getDeviceInfo(deviceId),
            (custInfo, deviceInfo) -> decidePlayData(custInfo, deviceInfo, titleId)
    );
}
The usage of microservices comes with the caveat of efficiently handling fallbacks, retries, and timeouts while calling other services. We can address these bottlenecks of distributed systems by using the concepts of Chaos Engineering, interestingly devised at Netflix. We can use tools such as Chaos Monkey, which randomly terminates instances in production to ensure that services are resilient to instance failures.
We may introduce chaos in the system by using the concepts of Failure Injection Testing (FIT). This can be done either by introducing latency in the I/O calls or by injecting faults while calling other services. We can then implement fallback strategies, either by returning the latest cached data from the failing service or by using a fallback microservice. We can also use libraries such as Hystrix for isolating the points of access to failing services; Hystrix acts as a circuit breaker if the error threshold gets breached. We should also ensure that the retry timeouts, service-call timeouts, and Hystrix timeouts are in sync.
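Here is a minimal sketch of wrapping a dependency call in a Hystrix command with a cached fallback. Apart from the Hystrix API itself, everything (the suppliers, the group key name, the CustomerInfo type from the earlier snippets) is illustrative.

import java.util.function.Supplier;
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class GetCustomerInfoCommand extends HystrixCommand<CustomerInfo> {
    private final Supplier<CustomerInfo> remoteCall;   // call to the customer microservice
    private final Supplier<CustomerInfo> cachedCopy;   // latest cached response

    public GetCustomerInfoCommand(Supplier<CustomerInfo> remoteCall,
                                  Supplier<CustomerInfo> cachedCopy) {
        super(HystrixCommandGroupKey.Factory.asKey("CustomerService"));
        this.remoteCall = remoteCall;
        this.cachedCopy = cachedCopy;
    }

    @Override
    protected CustomerInfo run() {
        // Primary path; Hystrix applies its timeout and error-threshold circuit breaking here.
        return remoteCall.get();
    }

    @Override
    protected CustomerInfo getFallback() {
        // Invoked when run() fails, times out, or the circuit is open.
        return cachedCopy.get();
    }
}

A caller would invoke new GetCustomerInfoCommand(remote, cache).execute(), which returns either the live response or the fallback.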
FUN FACT: In this presentation by Nora Jones (Chaos Engineer at Netflix), the importance of resilience testing at Netflix and the different strategies used for it are discussed at length. She provides key pointers which engineers should keep in mind while designing microservices for resiliency, to ensure that optimal design decisions are in place on a continuous basis.
A common issue observed while streaming a video is that a subtitle appears on top of text in the video (called the text-on-text issue). The issue is illustrated in the image below. How can we extend the current solution and the data model to detect this issue?
Fig 7: Example of Text-on-Text Issue
We can extend the existing Media Document solution (used for video subtitles) to store the in-video text information as well. We can then run text-in-video detection and subtitle-positioning algorithms on the Media Document data store and persist the results as separate indexes. After that, these indexes will be queried by the Text-on-Text Detection application to identify any overlap, thereby detecting the text-on-text issue.
Fig 8: Text-on-Text Detection Application Flow
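We have shown below a minimal sketch of the overlap check itself: a text-on-text issue exists when a subtitle event and a detected in-video text region overlap both in time and in screen space. The rectangle/interval representation is our assumption for illustration.

public class TextOnTextDetector {
    record Box(double x, double y, double width, double height) {
        boolean intersects(Box other) {
            return x < other.x + other.width && other.x < x + width
                && y < other.y + other.height && other.y < y + height;
        }
    }

    record TimedText(long startMillis, long endMillis, Box box) {}

    // True when the two texts are on screen at the same time AND their boxes overlap.
    static boolean overlaps(TimedText subtitle, TimedText inVideoText) {
        boolean temporalOverlap = subtitle.startMillis() < inVideoText.endMillis()
                               && inVideoText.startMillis() < subtitle.endMillis();
        return temporalOverlap && subtitle.box().intersects(inVideoText.box());
    }
}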
https://www.youtube.com/watch?v=CZ3wIuvmHeM
https://www.youtube.com/watch?v=LkLLpYdDINA
https://www.youtube.com/watch?v=6oPj-DW09DU
https://www.youtube.com/channel/UC8MdvSjinN761VUtnUjHwJw
https://openconnect.netflix.com/Open-Connect-Overview.pdf
https://medium.com/netflix-techblog/simplifying-media-innovation-at-netflix-with-archer-3f8cbb0e2bcb
https://www.youtube.com/watch?v=OQK3E21BEn8
https://medium.com/netflix-techblog/implementing-the-netflix-media-database-53b5a840b42a
Chaos Engineering:
https://www.youtube.com/watch?v=RWyZkNzvC-c
https://www.youtube.com/watch?v=psQzyFfsUGU
https://www.youtube.com/watch?v=x9Hrn0oNmJM
https://medium.com/@narengowda/netflix-system-design-dbec30fede8d