Solution built for a Junior Analytics Engineer technical assessment. Read the complete challenge proposal here.
The solution prepared educational data for analysis, covering:
- Student profiles (2021-2022)
- School information (2021-2022)
- São Paulo city only

Technologies:
- Python
- Pandas
- SQLite
- Google Sheets

Process steps:
- Quality Check
  - Analyzed data quality in CSV files
  - Used Python, Pandas and Google Sheets
- Header Validation
  - Developed a Python script to validate file headers
  - Compared against data dictionaries
- Manual Corrections
  - Fixed field names
  - Adjusted column positions
  - Removed empty or invalid columns
- Data Preparation
  - Created a Python script for data cleaning
  - Prepared data for SQLite storage
- Database Creation
  - Created SQLite database with tables:
    - educandos: Student profiles (2021-2022)
    - escolas: Municipal schools data (2021-2022)
    - escolas_educandos: Relationship between schools and students

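The header validation, cleaning, and loading steps above can be sketched roughly as follows. This is a minimal illustration, not the script used in the project: the column names (`id_educando`, `raca`, `sexo`, `ano`, `vazio`), the expected-header list, the sample rows, and the helper functions are all hypothetical, and the real data dictionaries are not reproduced here. The sketch uses only the standard-library `csv` and `sqlite3` modules to stay self-contained, whereas the project itself used Pandas.

```python
import csv
import io
import sqlite3

# Hypothetical data dictionary: expected column names, in order.
EXPECTED_HEADERS = ["id_educando", "raca", "sexo", "ano"]


def normalize(name):
    """Strip whitespace and lowercase a header cell."""
    return name.strip().lower()


def validate_headers(actual, expected):
    """Describe how a file header differs from the data dictionary."""
    shared_actual = [c for c in actual if c in expected]
    shared_expected = [c for c in expected if c in actual]
    return {
        "missing": [c for c in expected if c not in actual],
        "unexpected": [c for c in actual if c not in expected],
        "order_ok": shared_actual == shared_expected,
    }


def clean_rows(data_rows, header, expected):
    """Keep only dictionary columns, in dictionary order, with trimmed values."""
    index = {name: i for i, name in enumerate(header)}
    keep = [c for c in expected if c in index]
    rows = [tuple(row[index[c]].strip() for c in keep) for row in data_rows]
    return keep, rows


# Tiny in-memory CSV standing in for a real export (made-up values).
raw_csv = io.StringIO(
    " ID_Educando ,sexo,raca,ano,vazio\n"
    "1,F,parda,2021,\n"
    "2,M,branca,2022,\n"
)
reader = csv.reader(raw_csv)
header = [normalize(c) for c in next(reader)]
report = validate_headers(header, EXPECTED_HEADERS)
columns, rows = clean_rows(list(reader), header, EXPECTED_HEADERS)

# Load the cleaned rows into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE educandos ({', '.join(columns)})")
placeholders = ", ".join("?" for _ in columns)
conn.executemany(f"INSERT INTO educandos VALUES ({placeholders})", rows)
row_count = conn.execute("SELECT COUNT(*) FROM educandos").fetchone()[0]
print(report, columns, row_count)
```

The validation report separates missing columns, unexpected columns, and ordering problems, which mirrors the manual corrections listed above (field names, column positions, empty or invalid columns).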
The prepared database enabled various analyses for sales planning:
- Demographics: Student distribution by race, gender, and age
  - Helped understand customer diversity
  - Guided product development
- Special Education: Distribution of students with special needs
  - Identified opportunities for specialized products
  - Supported inclusive product planning
- Trends: Year-over-year comparison (2021-2022)
  - Helped predict future demand
  - Guided inventory planning
- School Clusters: Groups of schools by characteristics
  - Location-based analysis
  - Size-based segmentation
- Market Segments: Identified distinct customer groups
  - Customized product strategies
  - Targeted marketing approaches

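As an illustration of the demographic and trend analyses listed above, the sketch below runs two aggregate queries against an in-memory SQLite table. The table layout, the column names (`id_educando`, `raca`, `ano`), and the rows are assumptions made up for this example; they are not the project's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of the educandos table.
conn.execute("CREATE TABLE educandos (id_educando INTEGER, raca TEXT, ano INTEGER)")
conn.executemany(
    "INSERT INTO educandos VALUES (?, ?, ?)",
    [
        (1, "parda", 2021),
        (2, "branca", 2021),
        (3, "parda", 2022),
        (4, "parda", 2022),
        (5, "branca", 2022),
    ],
)

# Demographics: student count by race for each year.
distribution = conn.execute(
    """
    SELECT ano, raca, COUNT(*) AS total
    FROM educandos
    GROUP BY ano, raca
    ORDER BY ano, raca
    """
).fetchall()

# Trends: change in student count per race from 2021 to 2022.
# In SQLite a comparison evaluates to 0 or 1, so SUM counts matching rows.
yoy = conn.execute(
    """
    SELECT raca,
           SUM(ano = 2022) - SUM(ano = 2021) AS change
    FROM educandos
    GROUP BY raca
    ORDER BY raca
    """
).fetchall()

print(distribution)
print(yoy)
```

The same `GROUP BY` pattern would extend to gender, age bands, or special-needs flags, provided the corresponding columns exist in the prepared tables.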
Note: This project was developed as part of a technical assessment for a Junior Analytics Engineer position. Some details have been modified to maintain confidentiality.