Data Integration Pipelines for NYC Payroll Data Analytics

The City of New York would like to develop a Data Analytics platform on Azure Synapse Analytics to accomplish two primary objectives:

Analyze how the City's financial resources are allocated and how much of the City's budget is being devoted to overtime.
Make the data available to the interested public to show how the City’s budget is being spent on salary and overtime pay for all municipal employees.

This project will involve creating high-quality data pipelines that are dynamic, can be automated, and monitored for efficient operation. The project will aim to develop pipelines that maintain overall data quality.

The source data resides in Azure Data Lake and needs to be processed in a NYC data warehouse in Azure Synapse Analytics. The source datasets consist of CSV files with Employee master data and monthly payroll data entered by various City agencies.

Summary of Process

Infrastructure was setup: Azure Datalake Gen2, Azure Data Factory, Azure SQL Database and Azure Synapse with a dedicated SQL Server pool.

Data was ingested into Azure Datalake Gen2

Linked services were created in Azure Data Factory to the data sources and destinations

Tables were created in Azure SQL database and Azure synapse. Sql scripts are stored in the sql_scripts folder
Data sets were created in Azure Data Factory

Data flows were set up in Azure Data Factory

Pipelines were set up.

Having set up all pipelines, pipelines were triggered and monitored. Here are the results:

Challenges

Data type errors were resolved by manually setting data types in the Mapping tab of the sink task

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
dataflow		dataflow
dataset		dataset
factory		factory
images		images
linkedService		linkedService
pipeline		pipeline
sql_scripts		sql_scripts
trigger		trigger
publish_config.json		publish_config.json
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Integration Pipelines for NYC Payroll Data Analytics

Summary of Process

Challenges

About

Releases

Packages

Languages

obinnaonyema/azureDataPipelinesNYCPayroll

Folders and files

Latest commit

History

Repository files navigation

Data Integration Pipelines for NYC Payroll Data Analytics

Summary of Process

Challenges

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages