Skip to content

The Associate Data Engineer career track on DataCamp focuses on the fundamentals of data engineering, including database design and data warehousing, and working with technologies such as PostgreSQL and Snowflake.

Notifications You must be signed in to change notification settings

LinaYorda/data-engineer-sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Associate Data Engineer in SQL

In this DataCamp career track, I'm immersing myself in essential data engineering concepts, mastering ETL/ELT workflows, and working with relational databases like PostgreSQL. I'm honing my SQL skills for querying, joining tables, and utilizing advanced techniques like subqueries and aggregation. The track also delves into database design principles, including star and snowflake schemas, and normalization, preparing me to handle tasks like table creation and data consistency. Additionally, I'm learning to install and configure PostgreSQL, manage users, and explore data warehouse technologies like Snowflake, a leading cloud solution for data engineering.

Table of Contents

Understanding Data Engineering

1. What is data engineering?

  • Data engineering and big data
  • Data engineers vs. data scientists
  • The data pipeline

2. Storing data

  • Data structures
  • SQL databases
  • Data warehouses and data lakes

3. Moving and processing data

  • Processing data
  • Scheduling data
  • Parallel computing
  • Cloud computing
  • We are champions

Introduction to SQL

1. Relational Databases

  • Databases
  • Tables
  • Data

2. Querying

  • Introducting queries
  • Writing queries
  • SQL flavors
  • Congatulations

Intermediate SQL

1. Selecting Data

  • Querying a database
  • Query execution
  • SQL style

2. Filtering Records

  • Filtering numbers
  • Multiple criteria
  • Filtering text
  • NULL values

3. Aggregate Functions

  • Summarizing data
  • Summarizing subsets
  • Aliasing and arithmetic

4. Sorting and Grouping

  • Sorting results
  • Grouping data
  • Filtering grouped data
  • Congratulations

Joining Data in SQL

1. Introducing Inner Joins

  • The ins and outs of INNER JOIN
  • Defining relationships
  • Multiple joins

2. Outer Joins, Cross Joins and Self Joins

  • LEFT and RIGHT JOINs
  • FULL JOINs
  • Crossing into CROSS JOIN
  • Self joins

3. Set Theory for SQL Joins

  • Set theory for SQL Joins
  • At the INTERSECT
  • EXCEPT

4. Subqueries

  • Subquerying with semi joins and anti joins
  • Subqueries inside WHERE and SELECT
  • Subqueries inside FROM
  • The finish line

Project : Analyzing Students' Mental Health

Studying abroad can be both exciting and difficult. But what might be contributing to this? One Japanese international university decided to find out!

Use your data manipulation skills to explore the data from a study on the mental health of international students, and find out which factors may have the greatest impact.

Link to the project

Introduction to Relational Databases in SQL

1. Introduction to Relational Database in SQL

  • Introduction to relational databases
  • Tables: At the core of every database
  • Update your goal as the structure changes

2. Enforce data consistency with attribute constraints

  • Better data quality with constraints
  • Working with data types
  • The not-null and unique constraints

3. Uniquely identify records with key constraints

  • Keys and superkeys
  • Primary keys
  • Surrogate keys

4. Glue together tables with foreign keys

  • Model 1: N relationships with foreign keys
  • Model more complex relationships
  • Referential integrity
  • Roundup

Database Design

1. Processing, Storing and Organizing Data

  • OLTP and OLAP
  • Storing Data
  • Database Design

2. Database Schemas and Normalization

  • Star and Snowflake schema
  • Normalized and denormalized databases
  • Normal forms

3. Database Views

  • Database views
  • Managing views
  • Materilized view

4. Database Management

  • Database roles and access control
  • Table partitioning
  • Data integration
  • Picking a Database Managment System(DBMS)

Data Warehousing Concepts

1. Data Warehouse Basics

  • What is data warehouse?
  • What is the difference between data warehouse and data lake?
  • Data warehouses support organizational analysis

2. Warehouse Architecture and Properties

  • What are the different layers of data warehouse?
  • The presentation layer
  • Data warehouse architectures
  • OLAP and OLTP systems

3. Data Warehouse Data Modeling

  • Data warehouse data modeling
  • Kimballs four step process
  • Slowly changing dimentions
  • Row vs. column data store

4. Implementation and Data Prep

  • ETL and ELT
  • Data cleaning
  • On premise and cloud data warehouse
  • Data warehouse design example
  • Wrap-up

Introduction to Snowflake

1. Introduction to Snowflake: Architecture. Competition, and SnowflakeSQL

  • What is Snowflake?
  • Snowflake Architecture
  • Snowflake competitors and why use Snowflake

2. Snowflake SQL and key concepts

  • Connecting to Snowflake and DDL commands
  • Snowflake database structures and DML
  • Snowflake data type and data type conversion
  • Functions, storing and grouping

3. Advance Snowflake SQL Cencepts

  • Joining in Snowflake
  • Subquerying and Common Table Expressions
  • Snowflake Query Optimization
  • Handling semi-structed data

Understanding Data Visualization

1. Visualizing distributions

  • A plot tells a thousands words
  • Histograms
  • Box plots

2. Visualizing two variables

  • Scatter plots
  • Line plots
  • Bar plots
  • Dot plots

3. The colour and the shape

  • Higher dimentions
  • Using color
  • Plotting many variables at once

4. 99 problems but a plot aint one of them

  • Polar coordinates
  • Axes of evil
  • Sensory overload

Project : Explorinf London's Travel Network

How do Londoners get around? Transport for London (TfL) is a vast public transport network that allows London's residents to efficiently travel around the UK's capital, to the tune of over 1.5 million journeys per day!

In this introductory project, you will work with a Snowflake, Amazon Redshift, Google BigQuery, or Databricks database containing data on journeys and transport types in London between 2010 and 2022. You will write SQL queries to find the most popular transport methods, examine when the London cable car (which connects London's Royal Docks, where the Mayor's office is located, to Greenwich Peninsula, home of the O2 arena) was particularly busy, and identify rare periods when the Underground (also known as "the tube" to locals) was less busy.

Link to the project

About

The Associate Data Engineer career track on DataCamp focuses on the fundamentals of data engineering, including database design and data warehousing, and working with technologies such as PostgreSQL and Snowflake.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published