Course author: Starburst Academy Team
Notebook author: Mark Bauer
This notebook accompanies the course Use PyStarburst for Data Analysis, which is part of the overall course Getting Started with PyStarburst on Starburst Academy.
To reproduce the results and follow along, you'll need to set up your environment in Starburst. You can prepare your environment by visiting: https://academy.starburst.io/getting-started-with-pystarburst/192527.
Explore the notebook: aviation.ipynb.
From the course:
Background
In one of the prerequisites to this tutorial, you created a catalog called tmp_cat, which you connected to an S3 bucket owned by Starburst. The S3 bucket that you connected to also contains some flight data, which you will use in this tutorial. Before you can analyze the data, you'll need to complete a few tasks to prepare your environment.
First, you'll have to add a location privilege to the accountadmin role in Starburst Galaxy to ensure that you can write to the folder within the S3 bucket that contains the flight data. After that, you'll use the Query editor to execute some SQL to create the necessary schema and tables.
The data you'll be working with comprises four CSV files: flights.csv, airports.csv, carriers.csv, and plane-data.csv. You're going to make one table for each file, beginning with flights.csv.
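As a quick orientation, the one-table-per-file plan can be sketched in Python. This is only an illustrative sketch: the table names and the table_name_for helper are assumptions, not names taken from the course. The one thing it highlights is that plane-data.csv needs some adjustment, because a hyphen is not valid in an unquoted SQL identifier.

```python
from pathlib import PurePosixPath

# The four files named in the course text, in the order they are listed.
CSV_FILES = ["flights.csv", "airports.csv", "carriers.csv", "plane-data.csv"]


def table_name_for(csv_file: str) -> str:
    """Derive a table name from a CSV file name.

    Hypothetical naming convention (an assumption, not the course's):
    drop the extension, lowercase, and replace hyphens with underscores
    so the result works as an unquoted SQL identifier.
    """
    stem = PurePosixPath(csv_file).stem  # "plane-data.csv" -> "plane-data"
    return stem.replace("-", "_").lower()


# One table per file, keeping the order of the list above.
TABLES = {csv_file: table_name_for(csv_file) for csv_file in CSV_FILES}

if __name__ == "__main__":
    for csv_file, table in TABLES.items():
        print(f"{csv_file} -> {table}")
```

The actual table names, columns, and storage locations are defined by the SQL in the course itself, so treat this mapping as a checklist of what to create rather than a definition of the schema.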
Source: Use PyStarburst for data analysis
- PyStarburst Documentation: https://docs.starburst.io/clients/python/pystarburst.html
- Starburst Academy: https://academy.starburst.io/
- Starburst Documentation: https://www.starburst.io/
Feel free to reach out for further discussions.
- LinkedIn: markebauer
- GitHub: mebauer
- Portfolio: mebauer.github.io