[🇧🇷 Português] [🇺🇸 English]


📊 Data Analysis and Visualization: São Paulo Municipal Elections - 1st and 2nd Round (2024)

An analysis of voting patterns in São Paulo's 2024 elections, focusing on voter behavior, absenteeism, and geographic trends.


📺 Watch in Full HD on YouTube



Sponsor Mindful AI Assistants


This project provides an in-depth analysis of voting patterns in the 2024 São Paulo municipal elections, with a focus on the first and second rounds of mayoral and city council races. It examines key aspects such as voter behavior, shifts between rounds, and regional variations in voter turnout.

The dataset, manually compiled from official sources, includes over 15,000 entries. The data were gathered with web scraping techniques, then cleaned and explored through exploratory data analysis (EDA). These methods uncover insights into electoral trends and the political dynamics of São Paulo, which can inform future election strategies.

This work was developed as part of the Integrated Project and Storytelling course in the second semester of the undergraduate program in Data Science and Artificial Intelligence at PUC-SP in 2024, under the mentorship of the renowned Professor ✨ Rooney Ribeiro Albuquerque Coelho.

His expertise and unwavering dedication to teaching played a crucial role in deepening our understanding of both data science and the art of storytelling.


Developed by:



To access the full Map, click the Map below:



Power BI Dashboard

Access the dataset and explore the interactive dashboard via the Power BI link below, where you can use dynamic filters for detailed insights and visualizations.

📈 Power BI Dashboard



1. Introduction

This report presents a detailed analysis of the data from São Paulo's 2024 municipal elections, focusing on vote distribution, voter behavior, and the performance of mayoral and councilor candidates. Various visualizations and dashboards are used to explore voting patterns, emerging trends, and the factors influencing electoral outcomes.

2. Study Objectives

The study aims to understand electoral dynamics in São Paulo's urban and peripheral areas, identifying factors determining voter preferences, such as the most-voted parties, candidate profiles, and voting behavior.

3. Theoretical Background

Analyzing electoral data is crucial for understanding voter behavior, party preferences, and political trends across different regions. Data visualization offers a clear and efficient way to identify patterns that can inform future campaigns.

4. Dataset Description

Access Dataset:

The data used in this study were extracted from public sources, providing information on votes by municipality, electoral zone, and political party. The dataset includes details about mayoral and councilor candidates in São Paulo, including the number of votes received by each candidate.

TSE Documentation

Processed Files

👉🏻 Access All Processed Files Here

The following CSV files were processed:

  • address_Mayor.csv
  • Mayor_by_city.csv
  • Mayor_by_city_round_2.csv
  • Mayor.csv
  • address_Councilor.csv
  • Councilor_by_city.csv
  • councilor.csv

Sample Column Structure

Here is an overview of the main columns in the processed CSV files:

  • NM_MUNICIPIO: Municipality name
  • NR_ZONA: Electoral zone number
  • DS_CARGO_PERGUNTA: Election role (Mayor or Councilor)
  • NM_VOTAVEL: Candidate name
  • SG_PARTIDO: Party acronym
  • QT_VOTOS: Number of votes received
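
As a minimal sketch of how one of these files could be loaded and checked against the columns listed above, the helper below reads a CSV and fails early if a documented column is missing. The function name `load_votes`, the comma separator, and the sample rows are assumptions for illustration; the `latin-1` encoding follows the `read_csv` call used later in this README.

```python
import io
import pandas as pd

# Columns documented above; each processed file is assumed to contain them.
EXPECTED_COLUMNS = [
    "NM_MUNICIPIO", "NR_ZONA", "DS_CARGO_PERGUNTA",
    "NM_VOTAVEL", "SG_PARTIDO", "QT_VOTOS",
]

def load_votes(source, sep=",", encoding="latin-1"):
    """Read one processed CSV and fail early if a documented column is missing."""
    df = pd.read_csv(source, sep=sep, encoding=encoding)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Vote counts must be numeric for the aggregations used later on.
    df["QT_VOTOS"] = pd.to_numeric(df["QT_VOTOS"], errors="coerce").fillna(0).astype(int)
    return df

# Tiny in-memory sample (made-up numbers) standing in for a real file.
sample_csv = io.StringIO(
    "NM_MUNICIPIO,NR_ZONA,DS_CARGO_PERGUNTA,NM_VOTAVEL,SG_PARTIDO,QT_VOTOS\n"
    "SÃO PAULO,1,Prefeito,CANDIDATE A,AAA,120\n"
    "SÃO PAULO,2,Prefeito,CANDIDATE B,BBB,80\n"
)
votes = load_votes(sample_csv)
print(votes.head())
```

For a real file, pass its path instead of the in-memory buffer, for example `load_votes("Mayor.csv")`, adjusting `sep` if the file is not comma-separated.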

5. Methodology

The methodology was divided into several steps:

  • Data Preprocessing: Reading and concatenating datasets, cleaning invalid records.
  • Exploratory Data Analysis (EDA): Identifying voting patterns and trends using graphs and tables.
  • Data Visualization: Creating interactive charts with the Plotly library for dynamic result exploration.
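
The preprocessing step above could look roughly like the sketch below. What counts as an "invalid record" is not spelled out in this README, so the rules here (unparseable or negative vote counts, missing candidate names, exact duplicate rows) are assumptions, as are the function name `preprocess` and the sample frames.

```python
import pandas as pd

def preprocess(frames):
    """Concatenate raw frames and drop records that cannot be analysed."""
    df = pd.concat(frames, ignore_index=True)
    # Coerce vote counts; unparseable values become NaN and are dropped below.
    df["QT_VOTOS"] = pd.to_numeric(df["QT_VOTOS"], errors="coerce")
    df = df.dropna(subset=["NM_VOTAVEL", "QT_VOTOS"])
    # Negative counts are treated as invalid (assumption).
    df = df[df["QT_VOTOS"] >= 0].copy()
    # Normalise names so one candidate is not split by stray spaces or case.
    df["NM_VOTAVEL"] = df["NM_VOTAVEL"].str.strip().str.upper()
    df["QT_VOTOS"] = df["QT_VOTOS"].astype(int)
    # Exact duplicate rows are assumed to be extraction artifacts.
    return df.drop_duplicates().reset_index(drop=True)

# Made-up sample frames standing in for two raw files.
part_a = pd.DataFrame({
    "NR_ZONA": [1, 2, 2],
    "NM_VOTAVEL": [" candidate a", "CANDIDATE B", None],
    "QT_VOTOS": ["100", "x", "5"],
})
part_b = pd.DataFrame({
    "NR_ZONA": [1],
    "NM_VOTAVEL": ["CANDIDATE A"],
    "QT_VOTOS": [70],
})
clean = preprocess([part_a, part_b])
print(clean)
```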

6. Exploratory Data Analysis

The exploratory analysis uncovered several interesting trends, such as:

  • The dominance of votes for parties like MDB and PSOL.
  • A geographic vote distribution showing high concentration in central São Paulo and greater support for progressive parties in peripheral areas.
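
One way to check the central versus peripheral pattern is to compare party vote shares between the two groups of zones. The sketch below assumes, purely for illustration, that zones 1 to 6 are "central" and every other zone is "peripheral"; the helper name `party_share_by_area` and the sample numbers are hypothetical.

```python
import pandas as pd

# Hypothetical split: zones 1-6 are treated as central, all others as peripheral.
CENTRAL_ZONES = {1, 2, 3, 4, 5, 6}

def party_share_by_area(df, central_zones=CENTRAL_ZONES):
    """Return each party's share of the votes within central and peripheral zones."""
    df = df.copy()
    is_central = df["NR_ZONA"].isin(central_zones)
    df["AREA"] = is_central.map({True: "central", False: "peripheral"})
    votes = df.pivot_table(index="SG_PARTIDO", columns="AREA",
                           values="QT_VOTOS", aggfunc="sum", fill_value=0)
    # Divide each column by its total so every area sums to 1.0.
    return votes.div(votes.sum(axis=0), axis="columns")

# Made-up numbers for two parties in one central and one peripheral zone.
sample = pd.DataFrame({
    "NR_ZONA": [1, 1, 250, 250],
    "SG_PARTIDO": ["AAA", "BBB", "AAA", "BBB"],
    "QT_VOTOS": [60, 40, 30, 70],
})
shares = party_share_by_area(sample)
print(shares)
```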

7. Charts and Dashboards

7.1. Vote Distribution by Municipality

The vote distribution revealed a large concentration in São Paulo and neighboring urban areas. The analysis indicated the need for specific strategies for peripheral areas.

import plotly.express as px
import pandas as pd

# Reading the dataset
election = pd.read_csv('/path/to/your/data.csv', encoding='latin-1')

# Plotting vote distribution by municipality
fig = px.histogram(election, x="NM_MUNICIPIO", y="QT_VOTOS", 
                   title="Votes by Municipality", 
                   color_discrete_sequence=["#1f77b4"])
fig.update_layout(bargap=0.2)
fig.show()


7.2. Most Voted Mayoral Candidates

Ricardo Nunes (MDB) stood out in central zones, while Guilherme Boulos (PSOL) had strong support in the peripheries.

# Filtering mayoral candidates
mayor = election[(election["DS_CARGO_PERGUNTA"] == "Prefeito") & 
                 (election["NM_MUNICIPIO"] == "SÃO PAULO") & 
                 (election["SG_PARTIDO"] != "#NULO#")].copy()

# Grouping and ordering candidates by votes
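# (only QT_VOTOS is summed, so text columns are not aggregated by accident)
mayor = (mayor.groupby(["NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum()
              .sort_values("QT_VOTOS", ascending=False)
              .reset_index(drop=True))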

# Calculating vote percentages
total_votes = mayor["QT_VOTOS"].sum()
mayor["PERCENTAGE"] = mayor["QT_VOTOS"] / total_votes

# Bar chart
fig = px.bar(mayor, x="NM_VOTAVEL", y="QT_VOTOS", color="SG_PARTIDO", 
             title="Most Voted Mayoral Candidates", 
             color_discrete_sequence=px.colors.qualitative.Dark24)
fig.show()


7.3. Most Voted Councilor Candidates

Vote distribution showed a concentration among local candidates, with highlights for Tabata Amaral (PSB) and Renato Sorriso (PL) in peripheral zones.

# Filtering councilor candidates
councilor = election[(election["DS_CARGO_PERGUNTA"] == "Vereador") & 
                     (election["NM_MUNICIPIO"] == "SÃO PAULO") & 
                     (election["SG_PARTIDO"] != "#NULO#")].copy()

# Grouping and ordering candidates by votes
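# (only QT_VOTOS is summed, so text columns are not aggregated by accident)
councilor = (councilor.groupby(["NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum()
                      .sort_values("QT_VOTOS", ascending=False)
                      .reset_index(drop=True))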

# Calculating vote percentages
total_votes = councilor["QT_VOTOS"].sum()
councilor["PERCENTAGE"] = councilor["QT_VOTOS"] / total_votes

# Bar chart
fig = px.bar(councilor, x="NM_VOTAVEL", y="QT_VOTOS", color="SG_PARTIDO", 
             title="Most Voted Councilor Candidates", 
             color_discrete_sequence=px.colors.qualitative.Dark24)
fig.show()
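
Because a council race has many candidates, a chart of every name is hard to read. A minimal sketch of limiting the view to the top N candidates is shown below; the helper name `top_candidates`, the default of 10, and the sample data are assumptions.

```python
import pandas as pd

def top_candidates(df, n=10):
    """Total votes per candidate and party, keeping only the n largest."""
    totals = df.groupby(["NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum()
    return totals.nlargest(n, "QT_VOTOS").reset_index(drop=True)

# Made-up votes for three candidates across two zones.
sample = pd.DataFrame({
    "NR_ZONA": [1, 2, 1, 2, 1],
    "NM_VOTAVEL": ["CANDIDATE A", "CANDIDATE A", "CANDIDATE B", "CANDIDATE B", "CANDIDATE C"],
    "SG_PARTIDO": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "QT_VOTOS": [10, 15, 40, 5, 30],
})
top2 = top_candidates(sample, n=2)
print(top2)
```

The resulting frame can be passed to `px.bar` in the same way as the full `councilor` frame.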


7.4 Most Voted Mayors by Electoral Zone

Central zones favored Ricardo Nunes, while peripheral zones were dominated by Guilherme Boulos.

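# Zones and neighborhoods (one zone number per neighborhood; both lists need the same length).
# Neighborhood names for zones 6 and 246-252 are not listed yet, so those zones are left out.
areas = pd.DataFrame({
    "ZONE": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5],
    "NEIGHBORHOOD": ["BELA VISTA", "CONSOLACAO", "LIBERDADE", "REPUBLICA", "SE", "BARRA FUNDA", "PERDIZES", "SANTA CECILIA", "BOM RETIRO", "BRAS", "PARI", "AGUA RASA", "BELEM", "MOOCA", "JD PAULISTA"]
})

# Mayoral votes per zone, rebuilt from `election` because `mayor` is already aggregated by candidate
mayor_by_zone = election[(election["DS_CARGO_PERGUNTA"] == "Prefeito") &
                         (election["NM_MUNICIPIO"] == "SÃO PAULO") &
                         (election["SG_PARTIDO"] != "#NULO#")]
mayor_by_zone = mayor_by_zone.groupby(["NR_ZONA", "NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum()

# Merging with the zone/neighborhood table
merged = mayor_by_zone.merge(areas, left_on="NR_ZONA", right_on="ZONE")

# Bar chart
fig = px.bar(merged, x="NEIGHBORHOOD", y="QT_VOTOS", color="SG_PARTIDO", title="Most Voted Mayor by Zone")
fig.show()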
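
To state which candidate led in each zone, rather than reading it off a chart, the leader per zone can be computed directly. This is a sketch under assumptions: the helper name `winner_by_zone` and the sample rows are made up, and ties are resolved by whichever row `idxmax` finds first.

```python
import pandas as pd

def winner_by_zone(df):
    """Return the most-voted candidate in each electoral zone."""
    totals = df.groupby(["NR_ZONA", "NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum()
    # idxmax gives, for each zone, the row label holding the largest total.
    winners = totals.loc[totals.groupby("NR_ZONA")["QT_VOTOS"].idxmax()]
    return winners.sort_values("NR_ZONA").reset_index(drop=True)

# Made-up votes: candidate A leads zone 1, candidate B leads zone 2.
sample = pd.DataFrame({
    "NR_ZONA": [1, 1, 1, 2, 2],
    "NM_VOTAVEL": ["CANDIDATE A", "CANDIDATE A", "CANDIDATE B", "CANDIDATE A", "CANDIDATE B"],
    "SG_PARTIDO": ["AAA", "AAA", "BBB", "AAA", "BBB"],
    "QT_VOTOS": [50, 20, 60, 10, 90],
})
winners = winner_by_zone(sample)
print(winners)
```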


7.5 Most Voted Councilors by Electoral Zone

The analysis revealed candidates like Márcio Chagas (PSOL) and Luana Almeida (PL) performing well in suburban areas.

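# Analyzing most voted councilors by electoral zone
areas = pd.DataFrame({
    "ZONE": [1, 1, 1, 2, 2, 3, 3, 4, 5, 6],
    "NEIGHBORHOOD": ["BELA VISTA", "CONSOLACAO", "LIBERDADE", "MOOCA", "CAMPO BELO", "ITAQUERA", "CID DUTRA", "PIRITUBA", "VILA PRUDENTE", "TATUAPE"]
})

# Councilor votes per zone, rebuilt from `election` because `councilor` is already aggregated by candidate
councilor_by_zone = election[(election["DS_CARGO_PERGUNTA"] == "Vereador") &
                             (election["NM_MUNICIPIO"] == "SÃO PAULO") &
                             (election["SG_PARTIDO"] != "#NULO#")]
councilor_by_zone = councilor_by_zone.groupby(["NR_ZONA", "NM_VOTAVEL", "SG_PARTIDO"], as_index=False)["QT_VOTOS"].sum()

# Merging councilor data with the zone/neighborhood table
councilor_merged = councilor_by_zone.merge(areas, left_on="NR_ZONA", right_on="ZONE")

# Bar chart
fig = px.bar(councilor_merged, x="NEIGHBORHOOD", y="QT_VOTOS", color="SG_PARTIDO", title="Most Voted Councilor by Zone")
fig.show()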


7.6 Most Voted Mayors by Municipality

The municipality-level analysis confirmed Ricardo Nunes' dominance in urban areas and Boulos’ strength in peripheral zones.

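# Grouping mayoral votes by municipality (from `election`, since `mayor` no longer has NM_MUNICIPIO)
municipality = (election[election["DS_CARGO_PERGUNTA"] == "Prefeito"]
                .groupby("NM_MUNICIPIO")["QT_VOTOS"].sum()
                .sort_values(ascending=False)
                .reset_index())

# Bar chart
fig = px.bar(municipality, x="NM_MUNICIPIO", y="QT_VOTOS", title="Most Voted Mayor by Municipality")
fig.show()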


7.7 Most Voted Councilors by Municipality

The analysis showed a strong presence of candidates like Eduardo Suplicy (PT) across several municipalities, reflecting broad political support.

# Grouping councilor votes by municipality (from `election`, since `councilor` no longer has NM_MUNICIPIO)
municipality_councilor = (election[election["DS_CARGO_PERGUNTA"] == "Vereador"]
                          .groupby("NM_MUNICIPIO")["QT_VOTOS"].sum()
                          .sort_values(ascending=False)
                          .reset_index())

# Bar chart
fig = px.bar(municipality_councilor, x="NM_MUNICIPIO", y="QT_VOTOS", title="Most Voted Councilor by Municipality")
fig.show()


7.8 Distribution of Votes by Political Party

The vote distribution charts confirmed the dominance of MDB and PSOL, with PSOL's support growing in peripheral zones.

# Analyzing distribution of votes by party
party_votes = (election.groupby("SG_PARTIDO")["QT_VOTOS"].sum()
               .sort_values(ascending=False)
               .reset_index())

# Bar chart
fig = px.bar(party_votes, x="SG_PARTIDO", y="QT_VOTOS", title="Distribution of Votes by Political Party")
fig.show()


8. Interactive Power BI Dashboards: Click to access the link

8.1 Dashboard 1: Geographic Distribution of Votes

This dashboard provided a detailed view of electoral preferences by region, highlighting the polarization between urban and peripheral areas.

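import pandas as pd
import plotly.express as px

# Choropleth map of vote distribution by municipality.
# Note: to draw municipal boundaries, px.choropleth also needs a `geojson` of the
# municipalities and a matching `featureidkey`; neither is included in this snippet.
df = pd.read_csv('distribution_votes.csv')
fig = px.choropleth(df, locations="municipality", color="votes", hover_name="municipality", title="Geographic Distribution of Votes")
fig.show()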


8.2 Dashboard 2: Candidate Performance by Region

This dashboard was essential for understanding candidate performance across regions, using heatmaps and bar charts.

import pandas as pd
import plotly.express as px

# Scatter plot of candidate performance by electoral zone
df = pd.read_csv('candidates_performance.csv')
fig = px.scatter(df, x="zone", y="votes", color="party", title="Candidate Performance by Electoral Zone")
fig.show()


8.3 Dashboard 3: Voting Analysis by Party

The visualization showed the distribution of votes by party and electoral preferences by zone.

# Bar chart for vote analysis by party
df = pd.read_csv('votes_by_party.csv')
fig = px.bar(df, x="party", y="votes", color="party", title="Vote Analysis by Party")
fig.show()


8.4 Dashboard 4: Voting by Demographic Profile

This dashboard analyzed voting by age, gender, and social class, highlighting preferences of younger voters and lower social classes for progressive candidates.

# Pie chart for voting by age group
df = pd.read_csv('votes_by_age_group.csv')
fig = px.pie(df, names="age_group", values="votes", title="Voting by Age Group")
fig.show()


8.5 Dashboard 5: Voting Comparison Between 2020 and 2024 Elections

The comparison between the two elections revealed significant changes in electoral preferences, with PSOL gaining ground in the peripheries.

# Scatter plot comparing mayoral and councilor votes
df = pd.read_csv('candidates_comparison.csv')
fig = px.scatter(df, x="votes_mayor", y="votes_councilor", color="party", title="Comparison of Mayoral and Councilor Candidates")
fig.show()
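
A comparison between two elections can be made concrete by measuring how each party's vote share changed, in percentage points. The sketch below is illustrative only: the helper name `share_change`, the two input frames, and all numbers are made up, and the 2020 data is not among the processed files listed in this README.

```python
import pandas as pd

def share_change(previous, current):
    """Party vote shares in two elections and their difference in percentage points."""
    def shares(df):
        totals = df.groupby("SG_PARTIDO")["QT_VOTOS"].sum()
        return totals / totals.sum()

    # Outer join on party, so a party present in only one election gets share 0.
    out = pd.concat(
        [shares(previous).rename("share_prev"), shares(current).rename("share_curr")],
        axis=1,
    ).fillna(0.0)
    out["change_pp"] = (out["share_curr"] - out["share_prev"]) * 100
    return out.sort_values("change_pp", ascending=False)

# Made-up totals for an earlier and a later election.
earlier = pd.DataFrame({"SG_PARTIDO": ["AAA", "BBB"], "QT_VOTOS": [50, 50]})
later = pd.DataFrame({"SG_PARTIDO": ["AAA", "BBB", "CCC"], "QT_VOTOS": [30, 60, 10]})
change = share_change(earlier, later)
print(change)
```

Working in shares rather than raw counts keeps the comparison meaningful when turnout differs between the two elections.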


9. Conclusion

The analysis of the 2024 São Paulo municipal election data provided valuable insights into voter behavior and emerging trends. We observed increasing political polarization, with PSOL gaining strength in peripheral areas and MDB maintaining a solid base in central urban areas. Additionally, the analysis revealed a shift in electoral preferences, with growing support for more progressive parties, especially among younger voters and lower social classes.

The analysis of charts and dashboards enabled a more detailed understanding of vote distribution by geography, candidate performance by electoral zone, and vote segmentation by party and demographic profile. The trends observed suggest that future electoral campaigns should focus on more segmented strategies, considering the social and economic characteristics of each region.

Recommendations for future campaigns:

  • Personalize electoral communication for different regions, considering demographic and socioeconomic profiles.
  • Leverage the growth of social media and other digital platforms to connect with younger voters and those with limited access to traditional media.
  • Tailor campaign proposals according to local issues such as security, health, and education, which were decisive factors for votes in various peripheral zones.

10. Extra Material


  • QR Code:
    Scan the code to access the data and visualizations on Power BI.


11. References

  • Superior Electoral Court (TSE)
  • [Electoral Data Source]
  • Articles on electoral data analysis and data visualization

12. How to Run the Project

This project was developed in Python and uses libraries like Pandas, Plotly, and Dash for data analysis and visualization. Follow the instructions below to set up the environment and run the code.


12.1 Requirements

Before running the project, you need to have Python and Git installed on your system.

Download Python Download Git

Additionally, you will need the dependencies listed in the requirements.txt file:

pandas
plotly

12.2 Cloning the Repository

To get started, clone the repository to your computer:

git clone https://github.com/your_user/elections-sp-project.git  
cd elections-sp-project

12.3 Installing Dependencies

Install the required dependencies by running the following command:

pip install -r requirements.txt

12.4 Creating the Executable

To create an executable of the project, you can use PyInstaller. Run the following command to generate the executable:

pyinstaller --onefile electoral_analysis.py

This will create an executable file in the dist/ folder, which can be run directly without needing to install Python.

12.5 Running the Code

After installing the dependencies or creating the executable, run the main script to generate the analyses and visualizations:

python electoral_analysis.py

12.6 Running the Interactive Dashboard

If you wish to view the interactive dashboards using Dash, run the following command:

python app.py

This will open the dashboard in your browser.

13. Contributing

If you'd like to contribute to this project, feel free to fork it, make changes, and submit pull requests. Here are the steps to get started:

  1. Fork this repository.
  2. Create a branch for your feature:
    git checkout -b new-feature
  3. Make the necessary changes and commit:
    git commit -am 'Adds new feature'
  4. Push the branch to the remote repository:
    git push origin new-feature
  5. Open a pull request for review and integration.

Make sure your changes do not break existing functionality and that the tests are up to date.

14. 💌 Team and Contacts

Core Team:

  • Fabiana 🚀 Campanarii - GitHub
  • Pedro 🛰️ Vyctor - GitHub

Contact:




Copyright 2024 Mindful-AI-Assistants. Code released under the MIT license.