- Designing Data-Intensive Applications
- Fundamentals of Data Engineering
- The Data Warehouse Toolkit
- Cracking the Data Engineering Interview
- Data Engineering with Python
- Data Pipelines with Apache Airflow
- Big Data: Principles and Best Practices of Scalable Real-Time Data Systems
- Basic Skills: Linux, Git & GitHub, Computer Networking, Cloud Computing, Network & Security, Agile Development
- Advanced Skills (Good to Know): Data Lake & Data Warehouse Concepts, REST APIs, Databases (SQL & NoSQL)
- Programming Languages: Python, SQL, Java, Scala
- Databases: PostgreSQL, MongoDB, Neo4j, Redis, Cassandra, Apache HBase, Snowflake, InfluxDB
- Data Ingestion: Apache Kafka, Flume, Logstash, Airbyte, Apache Spark, Talend, Informatica
- Data Transformation: Python, Pandas, SQL, Apache Spark, Hive, dbt, Matillion, Pig
- Data Preprocessing: Apache Spark, Apache Hadoop, Apache Flink
- Data Orchestration: Apache Airflow, Luigi
- Data Storage:
  - Data Lake: AWS S3, Azure Blob Storage, Google Cloud Storage
  - Data Warehouse: Snowflake, Google BigQuery, Amazon Redshift, Apache Hive
- Data Visualization: Tableau, Power BI, Looker
- DataOps: Docker, Kubernetes, Jenkins
- 🐍 Python
- 📊 SQL
- 🛠️ MySQL
- 🌳 MongoDB
- 🔥 PySpark
- 🎈 Bash
- 🌬️ Airflow
- ☕ Apache Kafka
- 🐙 Git
- 🐈 GitHub
- ⚙️ CI/CD basics
- 🏬 Data Warehousing
- 🛠️ dbt
- 🌊 Data Lakes
- 📘 Databricks
- ☁️ Azure Databricks
- ❄️ Snowflake
- 🌪️ Apache NiFi
- 🌐 Debezium
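The Data Transformation entry pairs Python with SQL. As a minimal sketch of how the two are often combined, the example below cleans a few hypothetical raw order records in Python and then aggregates them with SQL in an in-memory SQLite database. It uses only the Python standard library; the records, the `orders` table, and the `clean` and `revenue_by_city` functions are made up for illustration and do not come from any of the tools listed above.

```python
import sqlite3

# Hypothetical raw records, e.g. as they might arrive from an ingestion step.
RAW_ORDERS = [
    {"order_id": "1", "city": " Bengaluru ", "amount": "250.0"},
    {"order_id": "2", "city": "bengaluru", "amount": "100.5"},
    {"order_id": "3", "city": "Mumbai", "amount": ""},  # missing amount
    {"order_id": "4", "city": "MUMBAI", "amount": "80"},
]


def clean(rows):
    """Python step: normalise text, cast types, and drop rows with no amount."""
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records instead of guessing a value
        yield (int(row["order_id"]), row["city"].strip().title(), float(row["amount"]))


def revenue_by_city(rows):
    """SQL step: load the cleaned rows into SQLite and aggregate them per city."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (order_id INTEGER, city TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean(rows))
        return conn.execute(
            "SELECT city, ROUND(SUM(amount), 2) AS revenue "
            "FROM orders GROUP BY city ORDER BY city"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(revenue_by_city(RAW_ORDERS))
```

A real pipeline would read from and write to external systems such as a data lake or warehouse; the in-memory database here only keeps the sketch self-contained.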
- Master Python: https://lnkd.in/d-pZPyf5
- Learn SQL: https://lnkd.in/dzAiRF-x
- Get hands-on with MySQL: https://lnkd.in/ddpSkUhc
- Dive into MongoDB: https://lnkd.in/dHQ4VC2E
- Master PySpark: https://lnkd.in/d7fgs7dE
- Discover Bash, Airflow & Kafka: https://lnkd.in/dDhuEqQE
- Master Git & GitHub: https://lnkd.in/dqJ7J3kN
- Understand CI/CD basics: https://lnkd.in/dcfKBmCa
- Decode Data Warehousing: https://lnkd.in/dPVRDJT5
- Learn dbt: https://lnkd.in/eG9eaEuE
- Understand Data Lakes: https://lnkd.in/dtZKJ4d6
- Explore Databricks: https://lnkd.in/dCBiQXPR
- Learn Azure Databricks: https://lnkd.in/dzmwBs4Y
- Master Snowflake: https://lnkd.in/dDBeddVy
- Explore Apache NiFi: https://lnkd.in/de7bvnSt
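The Data Orchestration entry lists Apache Airflow and Luigi. Both model a pipeline as tasks with dependencies, so that a task only runs once everything upstream of it has finished. The sketch below is not the API of either tool; it assumes Python 3.9+ and uses the standard-library `graphlib.TopologicalSorter` on a hypothetical five-task pipeline (the task names and the `run_pipeline` helper are invented for illustration) to show that dependency-ordered execution.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "quality_check": {"transform"},
    "load_warehouse": {"quality_check"},
}


def run_pipeline(dag, run_task):
    """Run every task only after all of its upstream tasks have finished."""
    order = []
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    while sorter.is_active():
        for task in sorter.get_ready():  # tasks whose dependencies are all done
            run_task(task)
            order.append(task)
            sorter.done(task)  # unblock any tasks that depend on this one
    return order


if __name__ == "__main__":
    print(run_pipeline(PIPELINE, lambda name: print(f"running {name}")))
```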

Tools | Link | Used for | Official Docs | YouTube |
---|---|---|---|---|
DBMS | MySQL, MongoDB | | | |
SQL | https://lnkd.in/dzAiRF-x | | | |
Python | https://lnkd.in/d-pZPyf5 | | | |
Linux | | | | |
Data Warehouse & Lake Concepts | Data Warehouse, Data Lakes | | | |
Data Pipelines | | | | |
dbt | https://lnkd.in/eG9eaEuE | | | |
PySpark | https://lnkd.in/d7fgs7dE | | | |
Kafka | | | | |
Apache NiFi | https://lnkd.in/de7bvnSt | | | |
Airflow | | | | |
Databricks | https://lnkd.in/dCBiQXPR | | | |
Snowflake | https://lnkd.in/dDBeddVy | | | |
Cloud Computing Concepts | | | | |
Distributed Systems Fundamentals | | | | |
AWS | | | | |
Azure | | | | |
GCP | | | | |
Git & GitHub | https://lnkd.in/dqJ7J3kN | | | |
CI/CD | https://lnkd.in/dcfKBmCa | | | |
Jenkins | | | | |
GitHub Actions | | | | |
Terraform | | | | |
SonarQube | | | | |
Docker | | | | |
Kubernetes | | | | |
Power BI | | | | |
Tableau | | | | |
Apache Superset | | | | |
Prometheus | | | | |
Grafana | | | | |
Datadog | | | | |

- Netflix - https://netflixtechblog.medium.com/
- AWS - https://aws.amazon.com/solutions/case-studies/
- GCP - https://cloud.google.com/customers
- Azure - https://azure.microsoft.com/en-us/resources/customer-stories/
- Spotify - https://engineering.atspotify.com/category/data/
- MongoDB - https://www.mongodb.com/blog/all
- Swiggy
  - https://bytes.swiggy.com/the-swiggy-delivery-challenge-part-one-6a2abb4f82f6
  - https://bytes.swiggy.com/swiggy-distance-service-9868dcf613f4
  - https://bytes.swiggy.com/the-tech-that-brings-you-your-food-1a7926229886
- Zomato - https://blog.zomato.com/