Movio - A Scalable Video On Demand Platform in Microservices

Movio is a VOD Platform Backend Solution Just Like Netflix!

The Monolith version of Movio is CuteTube. Check it here »

The Live Stream version of Movio is ProStream. Check it here »

Read the blog »

Movio Auth Service . Movio API Service . Movio Worker Service

General Information

Movio is a high-performance video streaming platform, just like Netflix, built on a microservices architecture. A viewer can sign up for Movio and watch their favorite videos, as well as upload their own videos for others to watch.

Movio primarily tries to mimic the core backend of established VoD platforms such as Netflix and YouTube. The backend of Movio is built on Django.

Please see the respective service for code review

NOTE: I am currently updating the README of this central repository, so the README update for each service might be delayed. This central repository shares a high-level understanding of the Movio platform. You are encouraged to read the Movio Overall Workflow section below thoroughly for a high-level understanding of the Movio platform.

Movio Auth Service: Movio Auth Service

Movio API Service: Movio API Service

Movio Worker Service: Movio Worker Service

The Movio platform is backed by the following core technologies -

a. Django - The Backend
b. Proven Celery Pipeline - For Asynchronous Video Processing  
c. Postgres - As Database 

d. FFmpeg - Extract Subtitles, Transcode, Segment Videos in MPEG-DASH with Adaptive Bitrate Streaming Technology
e. Gcore CDN - Distribute the Content Globally with Low Latency

f. AWS Translate -  Translate the Subtitle.
h. AWS Lambda - Automate the Subtitle Translation Process 
i. AWS S3 - Store the DASH Segments

j. RabbitMQ - As Message Broker 
k. Celery - As Asynchronous Task Manager 
l. Flower - To monitor Celery Workers 
m. Redis - As Celery Backend 

n. Elasticsearch - To power the searches of Video Metadata and Subtitles 
o. Docker - To Containerize the solution 
p. Cloudflare - As DNS Service 
q. Nginx - As Proxy Manager 
r. Gunicorn - As App Server 

NOTE

A. Containerized Solution

The project is fully containerized using Docker. Running make docker-up starts the full system locally, provided the .envs are in place.

B. Documentation

Please visit the documentation page localhost:{{NGINX HOST PORT}}/doc/ for more information related to API documentation.

C. Progress

Movio currently has three microservices. I am constantly updating the README, so you may see different versions of it!

D. Future Updates

I am planning to add the following features in future updates:

i. A rate limit on the video submission API, implemented with the Token Bucket algorithm

ii. Send a WhatsApp message to the user when the video processing is completed

iii. Send Email to the user when the video processing is completed.
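
Since the rate limit is still planned, the following is only a minimal sketch of how a Token Bucket limiter works in general, not Movio's implementation. The class name, the `capacity` and `rate` parameters and the injectable `clock` are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a multi-process deployment the bucket state would have to live in shared storage (Redis is already part of the stack) rather than in process memory as it does here.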

Movio Overall Workflow

Overall Workflow of Movio


NOTE: Postman APIs Exports are available here: Postman APIs Exports


  • The authentication system of Movio Auth Service is built from scratch. No third-party packages have been used.

  • The Auth Service provides JWT access and refresh token with some additional custom user data encoded in it.
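
To illustrate what issuing a JWT without third-party packages can look like, here is a minimal sketch using only the Python standard library. It assumes HS256 signing and an `exp` claim; the function names and claim names are hypothetical, and the Auth Service's real code may differ.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_jwt(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_jwt(token: str, secret: bytes, now=None) -> dict:
    """Verify the signature and expiry, then return the claims."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if claims.get("exp", float("inf")) < (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

Custom user data travels as extra claims, for example `encode_jwt({"user_id": 7, "type": "access", "exp": time.time() + 300}, secret)`. An access and a refresh token would differ only in their claims and lifetime.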



Workflow of Movio API Service

The Movio API Service is exposed to public access along with the Movio Auth Service. Users can interact with Movio API Service to submit videos for processing and to request video metadata. The service produces RabbitMQ events that are consumed by Movio Worker Service, and it also consumes RabbitMQ events that are produced by Movio Worker Service.

Once Movio API Service consumes a message produced by Movio Worker Service, it updates the status of the video processing in the database. As an extension, I am planning to send a WhatsApp message and an email to the user at this point to inform them that the video was processed successfully.

NOTE: See API Documentation Below for Full API Params and Information.

TL;DR;

Upload API:

  • The user sends a video file along with a title, duration, and description.
  • The video is offloaded to a Celery worker.
  • The Celery worker uploads the video to the S3 bucket, deletes the local file, and creates a RabbitMQ event with some data.
  • This message is consumed by Movio Worker Service to process the video.

Stream API:

  • User passes a video_id and the API provides all the necessary information to play the video such as: CDN URL, and other video metadata.

Search APIs:

i. video title

ii. video description

iii. subtitle keyword

List API:

  • The ListView is a paginated API. It returns all the videos in the database as a paginated response.

Below, I describe app by app how Movio API Service works internally.

APP: Event Manager

  • event_manager is the app responsible for communicating with the cloud RabbitMQ instance that Movio API Service uses to exchange messages with Movio Worker Service.

  • The event_manager app includes a Django Management Command that runs in a separate process to listen to the events exchanged between the Movio services.

  • It produces messages to be consumed by Movio Worker Service and it also listens to the messages that are produced by Movio Worker Service .

  • Video Submission API - /api/v1/app/events/video-upload/ -> API that offloads the video processing to the celery worker.
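
The produce/consume pattern above can be sketched without a broker. The example below only shows how an event could be serialized into a message body and routed to a handler on the consuming side; the event type `video.uploaded`, the payload fields and both function names are hypothetical, and the actual RabbitMQ client calls are deliberately left out.

```python
import json
import uuid


def build_event(event_type: str, payload: dict) -> bytes:
    """Serialize an event as the JSON body of a broker message."""
    envelope = {"id": str(uuid.uuid4()), "type": event_type, "payload": payload}
    return json.dumps(envelope).encode()


def handle_message(body: bytes, handlers: dict) -> bool:
    """Decode a message body and route it by type; return False for unknown types."""
    event = json.loads(body)
    handler = handlers.get(event["type"])
    if handler is None:
        return False
    handler(event["payload"])
    return True
```

A listener process, such as the management command described above, would call something like `handle_message` from its consumer callback and acknowledge the message only once the handler has returned.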

APP: Stream

  • The stream app is responsible for all the streaming related APIs.

  • SingleVideoMetaDataAPI - /api/v1/app/stream/video-metadata/<uuid:video_id>/ -> to get video metadata about a single video provided a video_id.

  • ListView API - /api/v1/app/stream/videos/all/ -> API to get all the videos available in the platform with paginated response.

  • The user needs to use a dash player and send a request to the CDN URL to play the video along with the subtitle.

APP: ES Search (Elasticsearch)

  • The app is responsible for managing all the Elasticsearch-related documents, serializers and views.

  • Movio is powered by Elasticsearch for searching in the platform.

  • Search In Video MetaData - /api/v1/app/search/video-metadata/ -> To search on video title and video description.

  • Search in Video Subtitle - /api/v1/app/search/subtitle/ -> To search on video subtitle.
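
As a rough illustration of the two searches, the request bodies below use the Elasticsearch query DSL (`multi_match` for title and description, `match` for subtitles). The field names `title`, `description` and `subtitle`, the paging parameters and the function names are assumptions; the real index mapping may differ.

```python
def build_metadata_query(text: str, page: int = 1, page_size: int = 10) -> dict:
    """Search the title and description fields with one multi_match query."""
    return {
        "from": (page - 1) * page_size,  # offset of the first hit on this page
        "size": page_size,
        "query": {"multi_match": {"query": text, "fields": ["title", "description"]}},
    }


def build_subtitle_query(text: str) -> dict:
    """Full-text match against the (hypothetical) subtitle field."""
    return {"query": {"match": {"subtitle": text}}}
```

Either dictionary can be sent as the body of a search request against the corresponding index.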



Workflow of Movio Worker Service

The Movio Worker Service is the heart of the Movio Platform. The service, just like the Movio API Service, runs an additional process (a Django Management Command) to listen to the RabbitMQ events that are produced by Movio API Service. The service also produces a message after it finishes the video processing, and that message is consumed by the Movio API Service.

Once the Movio Worker Service consumes a message produced by Movio API Service, a robust Celery Pipeline is activated.

The Celery tasks included in this pipeline are listed below.

NOTE: All The Below Tasks Run Sequentially.

Task 01: Download Video From S3

  • This is the very first task in the pipeline. It downloads the video that was submitted by the user (which was uploaded to S3 by Movio API Service ).

Task 02: Delete Video From S3

  • As the video is downloaded from S3 to process locally, this task deletes the video from S3 bucket as that video is no longer needed.

Task 03: Extract Subtitle From Video

  • This task uses the ffmpeg tool to extract the subtitle from the video and saves it locally for further processing.

Task 04: Upload Subtitle File to S3 for Lambda Processing

  • This task uploads the Subtitle / CC file to an S3 bucket.

AWS Lambda Function for Subtitle Translation

  • The S3 bucket is the trigger point for an AWS Lambda function.

The Lambda function downloads the subtitle file from S3 and translates the subtitle into Bengali, Hindi, French and Spanish.

The Lambda function uses AWS Translate Service to translate the subtitles.

Once the translation is completed, the Lambda function uploads all the subtitle files to another S3 bucket which is designated for production stage.
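
The core of such a Lambda function can be sketched as follows. This is an assumption-laden outline, not the deployed function: `parse_s3_event` reads the standard S3 event notification structure, and `translate_subtitle` expects a client that behaves like `boto3.client("translate")`. The function names and the English source language are hypothetical, and downloading from and uploading to S3 is omitted.

```python
from urllib.parse import unquote_plus

# AWS Translate language codes for the four target languages named above.
TARGET_LANGUAGES = {"bn": "Bengali", "hi": "Hindi", "fr": "French", "es": "Spanish"}


def parse_s3_event(event: dict) -> tuple:
    """Return (bucket, key) from the first record of an S3 event notification."""
    s3 = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 events, so decode them first.
    return s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def translate_subtitle(text: str, translate_client, source: str = "en") -> dict:
    """Translate subtitle text into every target language, keyed by language code."""
    results = {}
    for code in TARGET_LANGUAGES:
        response = translate_client.translate_text(
            Text=text, SourceLanguageCode=source, TargetLanguageCode=code
        )
        results[code] = response["TranslatedText"]
    return results
```

A real function would also need to keep cue timings out of the translated text and split long files into chunks that fit the service's request size limit.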

Task 05: Transcode Video to MP4

  • As MPEG-DASH players are browser-based, a DASH player cannot directly play DASH segments in the .mkv container. Hence, it is necessary to transcode the video into MP4 so that any DASH player can play the segments.

  • This task uses the ffmpeg tool to transcode the video into the MP4 container format.

  • It uses the h264 codec for the video stream and aac for the audio stream.
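
A transcode step like this usually comes down to building one ffmpeg command. The sketch below assembles such a command with `libx264` and `aac`; the function name and output naming are hypothetical, and dropping subtitle streams with `-sn` is an assumption based on subtitles being extracted in a separate task.

```python
from pathlib import Path


def build_transcode_command(source: str, output_dir: str) -> list:
    """Build an ffmpeg argv that re-encodes `source` into an MP4 file."""
    target = Path(output_dir) / (Path(source).stem + ".mp4")
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", source,
        "-c:v", "libx264",     # H.264 video
        "-c:a", "aac",         # AAC audio
        "-sn",                 # assumption: subtitles are handled elsewhere
        str(target),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so that a non-zero ffmpeg exit code fails the task.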

Task 06: Make DASH Segments of the Video with Adaptive Bitrate

  • Once the .MKV video file is transcoded into MP4 container type, it is ready for segmentation.

  • This task segments the video into chunks with adaptive bitrate technology.

Adaptive Bitrate

Adaptive Bitrate (ABR) is a very popular method of video segmentation. ABR makes video streaming smooth, as it accounts for the network bandwidth and network latency of the client.

If the client has a slower network, the ABR technology downgrades the video quality automatically, so the client experiences little or no buffering.

In ABR, the quality level of each segment is chosen automatically by the DASH player.

  • The task segments the video in: 360p, 480p, and 720p with 800 kbps, 1200 kbps and 2400 kbps bitrate respectively.
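
To make the player-side decision concrete, here is a simplified throughput rule over the 360p/480p/720p ladder listed above. It is only an illustration: real DASH players use more elaborate logic, and the `safety` margin and function name are assumptions.

```python
# (height in pixels, bitrate in kbps), taken from the ladder described above.
RENDITIONS = [(360, 800), (480, 1200), (720, 2400)]


def pick_rendition(measured_kbps: float, safety: float = 0.8) -> tuple:
    """Pick the highest rendition whose bitrate fits within a share of the bandwidth."""
    budget = measured_kbps * safety
    affordable = [r for r in RENDITIONS if r[1] <= budget]
    # Fall back to the lowest rendition when even that exceeds the budget.
    return max(affordable, key=lambda r: r[1]) if affordable else RENDITIONS[0]
```

With these numbers, a client measuring about 2000 kbps would be served the 480p rendition, and it would move up to 720p once the measured bandwidth leaves enough headroom for 2400 kbps.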

Task 07: Edit manifest.mpd File

  • Once the video is processed for DASH Segments, it generates an .mpd file where all required information about the video is stored.

  • ffmpeg does not support adding Subtitle information in the manifest.mpd file, hence, we need to manually add the Subtitle information in the manifest.mpd file.

  • This task does exactly that. It opens the manifest.mpd file and adds an AdaptationSet for each subtitle in the manifest.mpd, with the BaseURL of the subtitle file.
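
One way to make this kind of manifest edit is with the standard library's XML module, as sketched below. The sketch assumes WebVTT subtitle files and a single Period; the function name, the `id` values and the nominal `bandwidth` are hypothetical.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
ET.register_namespace("", MPD_NS)  # keep the default namespace on output


def add_subtitle_tracks(mpd_xml: str, subtitles: dict) -> str:
    """Append one text AdaptationSet per language to the first Period.

    `subtitles` maps a language code to the URL of its subtitle file.
    """
    root = ET.fromstring(mpd_xml)
    period = root.find(f"{{{MPD_NS}}}Period")
    for lang, url in subtitles.items():
        adaptation = ET.SubElement(
            period, f"{{{MPD_NS}}}AdaptationSet", {"mimeType": "text/vtt", "lang": lang}
        )
        representation = ET.SubElement(
            adaptation,
            f"{{{MPD_NS}}}Representation",
            {"id": f"subtitle_{lang}", "bandwidth": "256"},
        )
        ET.SubElement(representation, f"{{{MPD_NS}}}BaseURL").text = url
    return ET.tostring(root, encoding="unicode")
```

The player then lists each added AdaptationSet as a selectable subtitle track and fetches the file from its BaseURL.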

Task 08: Upload DASH Segments to S3 and Chain Callback

  • This task is interesting. This task creates sub-tasks within it to utilize the maximum resources of the server.

  • This task splits the segments into batches of 10, and a sub-task is created for each batch.

  • Each sub-task is dedicated to uploading its batch of 10 segments to S3.
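
The batching itself is a small piece of logic. A minimal sketch, with a hypothetical function name and hypothetical segment file names, could look like this:

```python
from itertools import islice


def batch_segments(segments, batch_size=10):
    """Yield consecutive lists of at most `batch_size` segments."""
    iterator = iter(segments)
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

For 25 segment paths this yields three batches of 10, 10 and 5 items, and each batch becomes the argument of one upload sub-task.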

Why Create Sub Task

I am creating sub-tasks in order to use the full capacity of the server. Imagine a video file has 1000 segments and we allocate only one Celery worker to upload them. That is a bottleneck for two reasons:

i. The worker may fail, as uploading 1000 segments in one go may require a much longer connection to the s3_client.

ii. Assigning only one worker does not use the full capabilities of the server. Imagine a server has 8 workers. If we assign only one worker, there is a high chance that several workers will sit idle.

"We pay cloud providers for the cores. Cores are not promised to stay cool" - Someone Wise!

Hence, in order to fully utilize all the capabilities and all the available workers, I am distributing the uploading task in smaller batches and allocating all the available workers to upload.

This way, no worker needs to maintain a connection for a long time, and as many available workers are working simultaneously, the upload is faster.

  • The task creates all the batches as a Celery Group so that all the tasks may run simultaneously.

  • The task also creates a callback chain with two other tasks:

task i. publish message to mq

task ii. cleanup local files

  • Then the task creates a Celery chord, with the Celery group as the header and the callback chain as the callback, so that the callback tasks, ending with cleanup local files, execute only when all the uploading tasks are completed.
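
The fan-out/fan-in shape of a chord can be imitated with the standard library, which makes the control flow easy to see. The sketch below is a stand-in built on `ThreadPoolExecutor`, not Celery code: header jobs run concurrently, and the callbacks run in order only after every header job has finished. The function and parameter names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chord(header_jobs, callbacks, max_workers=8):
    """Run every header job concurrently, then the callbacks one after another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job) for job in header_jobs]
        # result() re-raises a job's exception, so callbacks never run after a failure.
        results = [future.result() for future in futures]
    value = results
    for callback in callbacks:
        # Like a chain, each callback receives the previous return value.
        value = callback(value)
    return value
```

In Celery terms, `header_jobs` corresponds to the group of upload sub-tasks, and `callbacks` to the chain of publish message to mq followed by cleanup local files.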

Task 09: Publish MQ Message

  • This task is the first task in the previous callback chain.

  • It produces message for the Movio API Service with additional video metadata information so that the Movio API Service may update the database.

Task 10: Cleanup Local Files

  • This is the last task in the callback chain of the previous task.

  • This task clears all the processed data from local storage such as Subtitle file, .MKV and .MP4 file and the local DASH Segments.



Architecture of Movio | HLD

Architecture of Movio

Architecture of Movio Auth Service

movio-auth-service-hld

Architecture of Movio API Service

movio-api-service-hld

Architecture of Movio Worker Service

NOTE: Movio Worker Service is only accessible through Message Queue Events. The Nginx Container in Movio Worker Service is only for Healthcheck and for Admin Portal.

movio-worker-service-hld



Key Images of Movio Workflow

In this section, I am adding some key images of Movio workflow.


  • A snapshot of Movio Player with Multiple Subtitle Support

movio-player-with-subtitle


  • The MPEG-DASH Segments are being served through a CDN (Gcore CDN)

The CDN URL is movio-cdn.algocode.site.

I am using an old domain that I purchased for another project of mine, Algocode. It is a microservices backend solution for an online judge, just like LeetCode. No other third-party APIs have been used. The user submits code in C++, and the RCE Engine executes the code in a secure Docker container and generates results such as AC, WA, TLE etc. To learn more about Algocode, please visit Algocode here.

segments-served-from-cdn


  • Segments in S3 Bucket

segments-in-s3-bucket


  • Subtitles in S3 Bucket

subtitles-in-s3-bucket


  • AWS Lambda function

aws-lambda-function


  • DASH Segmentation with Adaptive Bitrate of the Video

dash-segments-with-abr


  • DASH Segments Batch Processing to Upload in S3 Bucket

dash-segments-s3-batch-processing




How to Run Locally

  • In order to watch the video file, you do not need to run any of the services!

  • As all the segments are uploaded to the S3 bucket, and I have already set up Gcore CDN to serve the DASH segments, you only need to run an HTML page which has a DASH player in it.

  • Please copy this already available DASH player from the Movio API Service and run it with the Live Server, and you will be able to see the video with subtitles!

  • Please COPY THIS HTML FILE, and put the long video or short video CDN URL in the url variable. Depending on which URL you use, the long or the short video will be played.

  • If you want to set up local development, you need a few AWS credentials as well as RabbitMQ credentials, Gcore CDN credentials, and Cloudflare DNS.



Watch In Action | YouTube Video

Watch Movio in Action


Video 01: Watch The General Video Playback Demonstration Of Movio

  • In this video, you can watch the general video playback of Movio with Subtitles.
Watch the video

Video 02: Movio High Level Architecture Understanding

  • In this video, you can learn more about the high level architecture of Movio

TimeStamp:

A. Introduction: 00:00

B. Movio Auth Service: 02:05

C. Movio API Service: 03:55

D. Movio Worker Service: 09:30

E. Movio Worker Service Celery Pipeline: 11:40

Watch the video

Video 03: Watch the Big Picture of Movio - How Movio Worker Service, Movio API Service and the Celery Pipeline Work Internally

TimeStamp:

A. Introduction: 00:00

B. Docker Compose Introduction: 02:10

C. Django Middleware for Auth and Video Body Check: 03:55

D. Docker Up Movio API Service: 08:57

E. Show Movio API Service Celery Logs: 11:24

F. Run Django Management Command for API Service: 11:50

G. Docker Up Movio Worker Service: 12:40

F. Show Flower for Celery Monitoring: 13:50

G. Show Movio Worker Service Celery Logs: 14:30

H. Run Django Management Command for Worker Service: 15:00

I. Submit Video to Movio API Service from Postman [POST]: 19:15

J. Celery Task Explanation of Movio API Service: 19:50

K. API Service Management Command for RabbitMQ: 21:10

L. Worker Service Flower: 23:30

Worker Service Celery Task:

M. Worker Service Celery Tasks: 24:00

O. Worker Service Celery Task - Download Video from S3: 24:43

P. Worker Service Celery Task - Delete Video from S3: 25:50

Q. Worker Service Celery Task - Extract Subtitle from Video: 27:40

R. Worker Service Celery Task - Upload Subtitle to Translate for AWS Lambda: 29:40

S. Worker Service Celery Task - Transcode Video to MP4: 32:25

T. Worker Service Celery Task - DASH Segment Video: 34:55

U. Worker Service Celery Task - Edit Manifest to Add Subtitle Information: 39:00

V. Worker Service Celery Task - Upload DASH Segments to S3 Entrypoint and Celery Callback: 39:40

W. Worker Service Celery Task - Sub Task Batch Processing (Celery Chain, Group, Chord): 40:15

X. Worker Service Celery Task - Upload Segments Sub Task (S3 Batch Upload): 46:40

Y. Worker Service Celery Task - MQ Message Publish: 48:15

Movio API Service APIs:

Z. Get Single Video Metadata Information: 49:30

a. Search Video Title/Description/Subtitle Search #Elasticsearch: 51:40

AWS Services:

b. AWS S3 for Segments and Subtitles: 54:18

c. Show DASH Manifest File with Subtitle: 55:10

d. Show English and Bengali Subtitle: 56:00

e. AWS Lambda Function for Subtitle Translation: 56:50

VIDEO PLAY:

See The Video being Served through Gcore CDN:

f. See the Video Segments are Played: 58:00

g. Run Live Server with DASH Player: 58:15

h. See the Video is being Played: 59:00

i. See a Long Video being Played with Seek Backward and Forward with Synced Subtitles through CDN: 01:01:20

Watch the video

Learnings and Challenges

Learning and Challenges

  • It was super fun to build this project. I learnt so many new concepts while building it.

  • I began the project development on 09/09/24 and developed it till 19/09/24. It took a total of 10 days to build the project into what it is right now. A few times I thought of giving up, but I knew that every problem comes with a solution; we just need to pay attention to the problem itself to understand the solution.

  • I am hoping to continue this project and add some more extensions, such as sending an email or WhatsApp message to the user, adding rate limiting to the video submission API with the Token Bucket, Fixed Window or Dynamic Window algorithm, and adding a few more middleware to decrease the load on the views, as I am already using two middleware to authenticate the request and validate the video upload API.




Linux Postman C++ Java Python Django AWS Docker Bash Azure CircleCI Node.js Kafka RabbitMQ Nginx

🔗 Links

Email Me Twitter LinkedIn Hashnode Medium Devto LeetCode