Student-facing guide for Jean Bartik Computing Symposium (Feb 2024) workshop: "Building Scalable Systems for Data Science"

jinnyyan/jbcs2024

Welcome cadets & midshipmen!

Follow along with the PowerPoint slides and the notebook instructions. Take a stab at the extra credit if you can!

First, create a keys.yaml containing your secrets; copy keys-template.yaml as a starting point for the format.
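
If you want to double-check your keys.yaml before opening the notebooks, a small helper along these lines may be useful. It is only a sketch: the layout of keys-template.yaml is not described here, so the code assumes it is a (possibly nested) mapping whose entries you fill in, and the function names `load_yaml`, `unfilled_keys`, and `check_keys` are made up for illustration. Reading the files assumes PyYAML is installed.

```python
from pathlib import Path


def load_yaml(path):
    """Parse a YAML file into plain Python objects (assumes PyYAML is installed)."""
    import yaml  # third-party: pip install pyyaml

    return yaml.safe_load(Path(path).read_text()) or {}


def unfilled_keys(keys, template, prefix=""):
    """Return dotted names of template entries that are missing or empty in keys."""
    missing = []
    for name, expected in template.items():
        dotted = f"{prefix}{name}"
        value = keys.get(name) if isinstance(keys, dict) else None
        if isinstance(expected, dict):
            # Recurse into nested sections (for example, one block per service).
            missing += unfilled_keys(value or {}, expected, prefix=f"{dotted}.")
        elif value in (None, ""):
            missing.append(dotted)
    return missing


def check_keys(keys_path="keys.yaml", template_path="keys-template.yaml"):
    """Raise ValueError if keys.yaml lacks a value for any entry in the template."""
    problems = unfilled_keys(load_yaml(keys_path), load_yaml(template_path))
    if problems:
        raise ValueError(f"keys.yaml is missing values for: {', '.join(problems)}")
```

Calling `check_keys()` from the repository root would then fail early with the names of any entries you still need to fill in, rather than failing later inside a notebook with a connection error.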

Notebooks

  • 1-playing-with-s3.ipynb - Learn how to pull objects, list bucket contents, and view metadata of objects in MinIO (S3-compatible object storage)
  • 2-postgres101.ipynb - Query and insert into Postgres tables
  • 3-pandas-are-cute.ipynb - Learn some basic data wrangling with pandas
  • 4-your-turn.ipynb - Empty notebook to take your stab at ETL! Connect the dots using code provided in earlier notebooks
  • 5-extra-credit-ideas.ipynb - Work on extra credit here
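
For 4-your-turn.ipynb, the overall shape of an extract-transform-load flow might look like the sketch below. It is not the code from the earlier notebooks: to stay runnable without the workshop endpoints, it uses a small made-up CSV sample in place of an object pulled from MinIO, and an in-memory SQLite database as a stand-in for Postgres. The function names, the sample data, and the `songs` table are all hypothetical.

```python
import io
import sqlite3

import pandas as pd


def extract(raw_bytes):
    """Parse CSV bytes (e.g. the body of an object pulled from MinIO) into a DataFrame."""
    return pd.read_csv(io.BytesIO(raw_bytes))


def transform(df):
    """Basic wrangling: tidy column names, drop incomplete rows and duplicates."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    return out.dropna().drop_duplicates().reset_index(drop=True)


def load(df, conn, table):
    """Write the DataFrame to a SQL table, replacing it if it exists; return rows written."""
    df.to_sql(table, conn, if_exists="replace", index=False)
    return len(df)


# Hypothetical sample standing in for a file downloaded from object storage.
SAMPLE = b"Song Title,Artist,Plays\nA,X,10\nB,Y,\nA,X,10\nC,Z,7\n"

conn = sqlite3.connect(":memory:")  # stand-in for the workshop's Postgres database
rows_written = load(transform(extract(SAMPLE)), conn, "songs")
result = conn.execute("SELECT song_title, plays FROM songs ORDER BY plays").fetchall()
```

In the workshop setting, `extract` would receive the bytes you pulled in notebook 1, and `load` would target the Postgres connection from notebook 2 instead of SQLite. Keeping the three steps as separate functions makes it easier to test the wrangling on a small sample before pointing it at a full dataset.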

Datasets

  1. https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents
  2. https://www.kaggle.com/datasets/kimdaegyeom/5g-traffic-datasets
  3. https://www.kaggle.com/datasets/ceshine/yet-another-chinese-news-dataset
  4. https://www.kaggle.com/datasets/konradb/swedish-civil-air-traffic-control-dataset
  5. https://www.kaggle.com/datasets/sooyoungher/smoking-drinking-dataset
  6. https://www.kaggle.com/datasets/andrewsundberg/college-basketball-dataset
  7. https://www.kaggle.com/datasets/joebeachcapital/57651-spotify-songs
  8. https://www.kaggle.com/datasets/anoopjohny/consumer-complaint-database

See something else you want to explore? Let’s add it to MinIO!

Technologies (endpoints provided separately)
