This project analyzes Spar Nord Bank's ATM transaction data to optimize their ATM network, enhance security, and improve customer service. Utilizing AWS services and big data tools, a robust ETL pipeline was created to process and analyze over 2.5 million ATM transactions.
The primary goal is to improve ATM management and usage patterns, which is crucial for providing an efficient, secure, and customer-oriented banking experience. The project aims to gain insights into customer withdrawals and ATM usage to minimize costs and mitigate risks associated with fraudulent activities.
- Analyze ATM usage patterns and customer behavior.
- Optimize ATM placement and cash management.
- Enhance security measures and fraud detection.
- Improve overall customer experience.
The dataset consists of over 2.5 million records of ATM transactions, including transaction date and time, amount, location, transaction type, and success status. The data is stored in RDS, extracted using Sqoop, and initially cleaned before being stored in Amazon S3 for further processing.
- Data Extraction: Data is extracted from RDS and stored directly in AWS S3 as CSV files.
- Data Preprocessing: PySpark is used for initial cleaning and transformation of the data.
- Data Storage: Processed data is stored in AWS S3.
- ETL Process: AWS Glue is utilized for Extract, Transform, Load operations from S3 to Redshift.
- Data Warehousing: Transformed data is loaded into AWS Redshift for analysis.
- Analysis: In-depth analysis is performed using Redshift Query Editor.
- Visualization: Interactive dashboards are created using Tableau and Redshift Query Editor V2.
- Implemented Star Schema design.
- Fact Table:
fact_atm_trans
- Dimension Tables:
atm
,date
,location
,card_type
- Data Import: Schema definition and merging of CSV files into a single DataFrame.
- Data Cleaning:
- Data validation
- Removing duplicates and irrelevant columns
- Handling outliers
- Resolving inconsistencies
- Feature Engineering: Created 'full_date_time' attribute.
- Dimension and Fact Table Creation: Separate tables for location, card_type, date, ATM, and transactions.
- Data Lake: Raw and processed data is stored in Amazon S3.
- Metadata Extraction: AWS Glue Crawler extracts metadata into Glue Data Catalog.
- Data Transformation: PySpark scripts in AWS Glue jobs for data transformation.
- Security: VPC and VPC Endpoint are used for secure data processing.
- Data Loading: Transformed data is loaded into Amazon Redshift.
- Verification: Data integrity is checked using Redshift Query Editor V2.
The query result shows the ATMs with the highest percentage of failed transactions helping the bank identify problematic ATMs that need immediate attention or maintenance, improving overall customer experience.
The query result highlights the busiest ATMs helping the bank identify locations that may need additional machines or more frequent maintenance and cash replenishment.
The query result shows the number of unsuccessful transactions per month, helping the bank identify seasonal trends in ATM failures and plan maintenance accordingly.
The query result identifies ATMs handling the highest cash volumes, assisting in cash management and security planning.
The query result helps the bank understand which card types are associated with more failed transactions, potentially leading to improvements in card technology or customer education.
The resultset shows peak transaction hours, helping the bank schedule maintenance and cash replenishment during off-peak times to minimize customer inconvenience.
This visualization shows the popularity of different card types, helping the bank tailor its services and potentially negotiate better terms with card providers.
This viz shows daily transaction patterns, helping the bank understand long-term trends and seasonality in ATM usage.
These reports provide valuable insights that can help the bank optimize ATM placement, improve maintenance schedules, enhance cash management, and ultimately provide better service to customers. By understanding usage patterns, identifying problematic ATMs, and recognizing external factors that affect ATM usage, the bank can make data-driven decisions to improve efficiency and customer satisfaction.
Spar Nord Bank's ATM transaction data is analyzed using AWS services to optimize ATM management and usage. Key findings include:
- Peak transaction hours: 10 AM - 12 PM
- Most used cards: Visa, followed by Mastercard
- Significant impact of temperature on ATM usage
These insights can guide ATM refill schedules, inform partnership strategies, and improve customer comfort at ATM locations. The analysis provides a data-driven foundation for enhancing the bank's ATM network efficiency.
ThePieSparkersDATA228_BDT_project.ipynb
: Jupyter notebook containing the project codeThePieSparkersProject_plan_v3.0.pdf
: Detailed project planThePieSparkers_DATA228_ATM_Transactions_Analysis.pptx
: Project presentation slidesThePieSparkers_DATA228_projectreport.pdf
: Comprehensive project reportatm_data_part1.csv.zip
: First part of the ATM transaction dataset (compressed)atm_data_part2.csv.zip
: Second part of the ATM transaction dataset (compressed)images/
: Directory containing visualization imagesREADME.md
: This file, providing an overview of the project
These files encompass the complete project, including code, documentation, data, and presentation materials.