In this DataCamp career track, I'm immersing myself in essential data engineering concepts, mastering ETL/ELT workflows, and working with relational databases like PostgreSQL. I'm honing my SQL skills for querying, joining tables, and utilizing advanced techniques like subqueries and aggregation. The track also delves into database design principles, including star and snowflake schemas, and normalization, preparing me to handle tasks like table creation and data consistency. Additionally, I'm learning to install and configure PostgreSQL, manage users, and explore data warehouse technologies like Snowflake, a leading cloud solution for data engineering.
- Understanding Data Engineering
- Introduction to SQL
- Intermediate SQL
- Joining Data in SQL
- Project: Analyzing Students' Mental Health
- Introduction to Relational Databases in SQL
- Database Design
- Data Warehousing Concepts
- Introduction to Snowflake
- Understanding Data Visualization
- Data engineering and big data
- Data engineers vs. data scientists
- The data pipeline
- Data structures
- SQL databases
- Data warehouses and data lakes
- Processing data
- Scheduling data
- Parallel computing
- Cloud computing
- We are the champions
- Databases
- Tables
- Data
- Introducing queries
- Writing queries
- SQL flavors
- Congratulations
- Querying a database
- Query execution
- SQL style
- Filtering numbers
- Multiple criteria
- Filtering text
- NULL values
- Summarizing data
- Summarizing subsets
- Aliasing and arithmetic
- Sorting results
- Grouping data
- Filtering grouped data
- Congratulations
- The ins and outs of INNER JOIN
- Defining relationships
- Multiple joins
- LEFT and RIGHT JOINs
- FULL JOINs
- Crossing into CROSS JOIN
- Self joins
- Set theory for SQL Joins
- At the INTERSECT
- EXCEPT
- Subquerying with semi joins and anti joins
- Subqueries inside WHERE and SELECT
- Subqueries inside FROM
- The finish line
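
The semi join and anti join topics listed above can be illustrated with a small sketch. It runs SQL through Python's built-in `sqlite3` module against an in-memory database; the `countries` and `languages` tables and their rows are made up for illustration and are not taken from the course material.

```python
import sqlite3

# In-memory database with two hypothetical tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE languages (code TEXT NOT NULL, language TEXT NOT NULL);
    INSERT INTO countries VALUES ('JP', 'Japan'), ('FR', 'France'), ('AQ', 'Antarctica');
    INSERT INTO languages VALUES ('JP', 'Japanese'), ('FR', 'French'), ('FR', 'Breton');
""")

# Semi join: keep countries that have at least one match in languages.
# Unlike an INNER JOIN, a country with several languages appears only once.
semi_join = [row[0] for row in conn.execute("""
    SELECT name
    FROM countries
    WHERE code IN (SELECT code FROM languages)
    ORDER BY name
""")]

# Anti join: keep countries that have no match in languages.
# NOT IN is safe here because languages.code is declared NOT NULL.
anti_join = [row[0] for row in conn.execute("""
    SELECT name
    FROM countries
    WHERE code NOT IN (SELECT code FROM languages)
    ORDER BY name
""")]

print(semi_join)
print(anti_join)
conn.close()
```

In this sketch France is returned once by the semi join even though it has two rows in `languages`, which is the main practical difference from an inner join.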
Studying abroad can be both exciting and difficult. But what might be contributing to this? One Japanese international university decided to find out!
Use your data manipulation skills to explore the data from a study on the mental health of international students, and find out which factors may have the greatest impact.
- Introduction to relational databases
- Tables: At the core of every database
- Update your database as the structure changes
- Better data quality with constraints
- Working with data types
- The not-null and unique constraints
- Keys and superkeys
- Primary keys
- Surrogate keys
- Model 1:N relationships with foreign keys
- Model more complex relationships
- Referential integrity
- Roundup
- OLTP and OLAP
- Storing Data
- Database Design
- Star and Snowflake schema
- Normalized and denormalized databases
- Normal forms
- Database views
- Managing views
- Materialized view
- Database roles and access control
- Table partitioning
- Data integration
- Picking a Database Management System (DBMS)
- What is a data warehouse?
- What is the difference between a data warehouse and a data lake?
- Data warehouses support organizational analysis
- What are the different layers of a data warehouse?
- The presentation layer
- Data warehouse architectures
- OLAP and OLTP systems
- Data warehouse data modeling
- Kimball's four-step process
- Slowly changing dimensions
- Row vs. column data store
- ETL and ELT
- Data cleaning
- On-premises and cloud data warehouses
- Data warehouse design example
- Wrap-up
- What is Snowflake?
- Snowflake Architecture
- Snowflake competitors and why use Snowflake
- Connecting to Snowflake and DDL commands
- Snowflake database structures and DML
- Snowflake data types and data type conversion
- Functions, sorting, and grouping
- Joining in Snowflake
- Subquerying and Common Table Expressions
- Snowflake Query Optimization
- Handling semi-structured data
- A plot tells a thousand words
- Histograms
- Box plots
- Scatter plots
- Line plots
- Bar plots
- Dot plots
- Higher dimensions
- Using color
- Plotting many variables at once
- Polar coordinates
- Axes of evil
- Sensory overload
How do Londoners get around? Transport for London (TfL) is a vast public transport network that allows London's residents to efficiently travel around the UK's capital, to the tune of over 1.5 million journeys per day!
In this introductory project, you will work with a Snowflake, Amazon Redshift, Google BigQuery, or Databricks database containing data on journeys and transport types in London between 2010 and 2022. You will write SQL queries to find the most popular transport methods, examine when the London cable car (which connects London's Royal Docks, where the Mayor's office is located, to Greenwich Peninsula, home of the O2 arena) was particularly busy, and identify rare periods when the Underground (also known as "the tube" to locals) was less busy.
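
As a rough idea of what the "most popular transport methods" query might look like, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for the cloud warehouse. The `journeys` table, its column names, and every figure in it are assumptions invented for illustration; the real project schema and data may differ.

```python
import sqlite3

# SQLite stands in for Snowflake/Redshift/BigQuery/Databricks in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and toy numbers, not the real TfL data.
    CREATE TABLE journeys (
        year             INTEGER NOT NULL,
        journey_type     TEXT    NOT NULL,
        journeys_millions INTEGER NOT NULL
    );
    INSERT INTO journeys VALUES
        (2019, 'Bus', 180), (2020, 'Bus', 90),
        (2019, 'Underground', 100), (2020, 'Underground', 40),
        (2019, 'Cable Car', 1), (2020, 'Cable Car', 1);
""")

# Total journeys per transport type, most popular first.
most_popular = conn.execute("""
    SELECT journey_type,
           SUM(journeys_millions) AS total_journeys_millions
    FROM journeys
    GROUP BY journey_type
    ORDER BY total_journeys_millions DESC
""").fetchall()

for journey_type, total in most_popular:
    print(journey_type, total)
conn.close()
```

The same `GROUP BY` and `ORDER BY` pattern, with a `WHERE` filter on the transport type and a different sort direction, would cover the busiest cable car periods and the quietest Underground periods.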