Associate Data Engineer in SQL

In this DataCamp career track, I'm immersing myself in essential data engineering concepts, mastering ETL/ELT workflows, and working with relational databases like PostgreSQL. I'm honing my SQL skills for querying, joining tables, and utilizing advanced techniques like subqueries and aggregation. The track also delves into database design principles, including star and snowflake schemas, and normalization, preparing me to handle tasks like table creation and data consistency. Additionally, I'm learning to install and configure PostgreSQL, manage users, and explore data warehouse technologies like Snowflake, a leading cloud solution for data engineering.

Understanding Data Engineering

1. What is data engineering?

Data engineering and big data
Data engineers vs. data scientists
The data pipeline

2. Storing data

Data structures
SQL databases
Data warehouses and data lakes

3. Moving and processing data

Processing data
Scheduling data
Parallel computing
Cloud computing
We are champions

Introduction to SQL

1. Relational Databases

Databases
Tables
Data

2. Querying

Introducting queries
Writing queries
SQL flavors
Congatulations

Intermediate SQL

1. Selecting Data

Querying a database
Query execution
SQL style

2. Filtering Records

Filtering numbers
Multiple criteria
Filtering text
NULL values

3. Aggregate Functions

Summarizing data
Summarizing subsets
Aliasing and arithmetic

4. Sorting and Grouping

Sorting results
Grouping data
Filtering grouped data
Congratulations

Joining Data in SQL

1. Introducing Inner Joins

The ins and outs of INNER JOIN
Defining relationships
Multiple joins

2. Outer Joins, Cross Joins and Self Joins

LEFT and RIGHT JOINs
FULL JOINs
Crossing into CROSS JOIN
Self joins

3. Set Theory for SQL Joins

Set theory for SQL Joins
At the INTERSECT
EXCEPT

4. Subqueries

Subquerying with semi joins and anti joins
Subqueries inside WHERE and SELECT
Subqueries inside FROM
The finish line

Project : Analyzing Students' Mental Health

Studying abroad can be both exciting and difficult. But what might be contributing to this? One Japanese international university decided to find out!

Use your data manipulation skills to explore the data from a study on the mental health of international students, and find out which factors may have the greatest impact.

Link to the project

Introduction to Relational Databases in SQL

1. Introduction to Relational Database in SQL

Introduction to relational databases
Tables: At the core of every database
Update your goal as the structure changes

2. Enforce data consistency with attribute constraints

Better data quality with constraints
Working with data types
The not-null and unique constraints

3. Uniquely identify records with key constraints

Keys and superkeys
Primary keys
Surrogate keys

4. Glue together tables with foreign keys

Model 1: N relationships with foreign keys
Model more complex relationships
Referential integrity
Roundup

Database Design

1. Processing, Storing and Organizing Data

OLTP and OLAP
Storing Data
Database Design

2. Database Schemas and Normalization

Star and Snowflake schema
Normalized and denormalized databases
Normal forms

3. Database Views

Database views
Managing views
Materilized view

4. Database Management

Database roles and access control
Table partitioning
Data integration
Picking a Database Managment System(DBMS)

Data Warehousing Concepts

1. Data Warehouse Basics

What is data warehouse?
What is the difference between data warehouse and data lake?
Data warehouses support organizational analysis

2. Warehouse Architecture and Properties

What are the different layers of data warehouse?
The presentation layer
Data warehouse architectures
OLAP and OLTP systems

3. Data Warehouse Data Modeling

Data warehouse data modeling
Kimballs four step process
Slowly changing dimentions
Row vs. column data store

4. Implementation and Data Prep

ETL and ELT
Data cleaning
On premise and cloud data warehouse
Data warehouse design example
Wrap-up

Introduction to Snowflake

1. Introduction to Snowflake: Architecture. Competition, and SnowflakeSQL

What is Snowflake?
Snowflake Architecture
Snowflake competitors and why use Snowflake

2. Snowflake SQL and key concepts

Connecting to Snowflake and DDL commands
Snowflake database structures and DML
Snowflake data type and data type conversion
Functions, storing and grouping

3. Advance Snowflake SQL Cencepts

Joining in Snowflake
Subquerying and Common Table Expressions
Snowflake Query Optimization
Handling semi-structed data

Understanding Data Visualization

1. Visualizing distributions

A plot tells a thousands words
Histograms
Box plots

2. Visualizing two variables

Scatter plots
Line plots
Bar plots
Dot plots

3. The colour and the shape

Higher dimentions
Using color
Plotting many variables at once

4. 99 problems but a plot aint one of them

Polar coordinates
Axes of evil
Sensory overload

Project : Explorinf London's Travel Network

How do Londoners get around? Transport for London (TfL) is a vast public transport network that allows London's residents to efficiently travel around the UK's capital, to the tune of over 1.5 million journeys per day!

In this introductory project, you will work with a Snowflake, Amazon Redshift, Google BigQuery, or Databricks database containing data on journeys and transport types in London between 2010 and 2022. You will write SQL queries to find the most popular transport methods, examine when the London cable car (which connects London's Royal Docks, where the Mayor's office is located, to Greenwich Peninsula, home of the O2 arena) was particularly busy, and identify rare periods when the Underground (also known as "the tube" to locals) was less busy.