Building ETL and Data Pipelines with Bash, Airflow and Kafka

This course provides you with practical skills to build and manage data pipelines and Extract, Transform, Load (ETL) processes using shell scripts, Airflow and Kafka.

Description

Well-designed and automated data pipelines and ETL processes are the foundation of a successful Business Intelligence platform. Defining your data workflows, pipelines and processes early in the platform design ensures the right raw data is collected, transformed and loaded into desired storage layers and available for processing and analysis as and when required.

This course is designed to provide you with the critical knowledge and skills that Data Engineers and Data Warehousing specialists need to create and manage ETL, ELT, and data pipeline processes.

Upon completing this course, you’ll have a solid understanding of Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) processes. You’ll practice extracting data, transforming it, and loading the transformed data into a staging area; create an ETL data pipeline using Bash shell scripting; build a batch ETL workflow using Apache Airflow; and build a streaming data pipeline using Apache Kafka.
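As a rough illustration of the extract, transform, and load-into-staging steps mentioned above, here is a minimal sketch using only the Python standard library. The sample data, the `staging_visits` table, and all function names are hypothetical and are not taken from the course labs; the course itself uses Bash, Airflow, and Kafka for these steps.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """timestamp,latitude,visitor_id
2024-01-01T00:00:00,12.5,a1
2024-01-01T00:01:00,,b2
2024-01-01T00:02:00,48.1,c3
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing latitude and normalize types."""
    cleaned = []
    for row in rows:
        if not row["latitude"]:
            continue  # skip incomplete records
        cleaned.append(
            (row["timestamp"], float(row["latitude"]), row["visitor_id"].upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the transformed rows into a staging table (assumed name)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_visits "
        "(ts TEXT, latitude REAL, visitor_id TEXT)"
    )
    conn.executemany("INSERT INTO staging_visits VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run the three steps in order and return the staged row count."""
    load(transform(extract(raw_text)), conn)
    return conn.execute("SELECT COUNT(*) FROM staging_visits").fetchone()[0]
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` stages the two complete rows and skips the one with no latitude. Keeping each step in its own function mirrors how the same work would later be split into separate tasks in a scheduler.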

You’ll gain hands-on experience through practice labs throughout the course and work on a real-world-inspired project, building data pipelines with several technologies. You can add the project to your portfolio to demonstrate your ability to perform as a Data Engineer.

This course requires prior experience working with datasets, SQL, relational databases, and Bash shell scripts.

What you'll learn

Describe and differentiate between Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) processes.
Define data pipeline components, processes, tools, and technologies.
Create batch ETL processes using Apache Airflow and streaming data pipelines using Apache Kafka.
Demonstrate understanding of how shell-scripting is used to implement an ETL pipeline.
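The ETL/ELT distinction in the first objective above can be sketched as follows. This is an illustrative example only, assuming an in-memory SQLite database as the target; the table names (`sales_etl`, `sales_raw`, `sales_elt`) and the sample rows are hypothetical. In ETL the data is cleaned before it reaches the target, while in ELT the raw data is loaded first and transformed inside the target with SQL.

```python
import sqlite3

# Hypothetical raw records: (name, amount as text); one amount is invalid.
RAW_ROWS = [("alice", "10.50"), ("bob", "n/a"), ("carol", "7.25")]


def etl(conn, rows):
    """ETL: transform in application code, then load only the clean result."""
    clean = []
    for name, amount in rows:
        try:
            clean.append((name, float(amount)))
        except ValueError:
            continue  # discard rows whose amount is not numeric
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean)


def elt(conn, rows):
    """ELT: load raw data unchanged, then transform inside the target."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    # The transformation runs as SQL in the database, after loading.
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT name, CAST(amount AS REAL) AS amount "
        "FROM sales_raw WHERE amount GLOB '[0-9]*'"
    )
```

Both paths end with the same cleaned rows, but only the ELT path keeps the untouched raw table in the target, so the transformation can be revised and rerun later without extracting the data again.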

Platform

edX

Folders and files

Build a DAG using Airflow.md
ETL using shell scripting.md
ETL_Peer_Review_Assignment.md
Getting started with Apache Airflow.md
Hands-on_Lab-_Monitoring_a_DAG.md
Kafka_hands_on.md
README.md
Streaming data with kafka.md
Streaming_data_peer_review_assignment.md
edx_ibm_ETL_Peer_Review_Assigment.py
edx_ibm_ETL_Server_Access_Log_Processing.py
edx_ibm_dag_anatomy.py
edx_ibm_my_first_dag.py
edx_ibm_simple_example.py
reading-kafka-python-client.md
reading-optional-kafka-msgkey_offset.md

vsvale/Building-ETL-and-Data-Pipelines-with-Bash-Airflow-and-Kafka
