# ETL-Pipeline

Build ETL pipelines for various scenarios.


## Overview

Design, development, and data validation of ETL pipelines with Informatica PowerCenter. The repo covers pipelines for complex scenarios such as incremental loads and slowly changing dimensions (SCD).

## Incremental Load

### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• Incremental loading of data requires a mapping variable
• The control variable can be a timestamp or a unique ID column
• Assign a value to the mapping variable with the SETMAXVARIABLE function
• In the Source Qualifier (SQ), add a SQL override so that data is loaded if and only if the load date is greater than the mapping variable (see the sketch below)

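As an illustration, a Source Qualifier override for this pattern might look like the following Oracle-style sketch, assuming a hypothetical ORDERS source with a LOAD_DATE column and a mapping variable $$LAST_LOAD_DATE (none of these names come from the repo):

```sql
-- Hypothetical SQ override: fetch only rows newer than the last captured load date.
-- PowerCenter substitutes the value of $$LAST_LOAD_DATE at run time.
SELECT ORDER_ID,
       CUSTOMER_ID,
       ORDER_AMOUNT,
       LOAD_DATE
FROM   ORDERS
WHERE  LOAD_DATE > TO_DATE('$$LAST_LOAD_DATE', 'MM/DD/YYYY HH24:MI:SS')
```

Inside the mapping, an Expression port such as SETMAXVARIABLE($$LAST_LOAD_DATE, LOAD_DATE) keeps the variable at the highest load date processed, so the next run resumes where this one stopped.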

### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with the load date in the filter condition (see the example below)
• If the source or target is a flat file, validate the count of records with the grep command

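For example, with the same hypothetical ORDERS source and an ORDERS_TGT target, a count comparison over the load window might be:

```sql
-- Counts for the current load window should match between source and target.
SELECT COUNT(*) AS SRC_CNT FROM ORDERS     WHERE LOAD_DATE > TO_DATE('01/01/2024', 'MM/DD/YYYY');
SELECT COUNT(*) AS TGT_CNT FROM ORDERS_TGT WHERE LOAD_DATE > TO_DATE('01/01/2024', 'MM/DD/YYYY');
```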

## SCD Implementation

### SCD Type 1

#### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• SCD Type 1 keeps no record history: changed records are updated in place, and new records are inserted
• Use a Lookup transformation to check whether the record already exists in the target, then route the source record to insert or update (see the sketch below). A surrogate key (Sequence Generator) is not necessary, since only one unique record per employee is kept

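The insert-or-update behaviour of SCD Type 1 is equivalent to an Oracle-style MERGE; a minimal sketch, assuming a hypothetical EMPLOYEES source and EMPLOYEES_DIM target:

```sql
-- SCD Type 1: overwrite changed attributes in place; no history is kept.
MERGE INTO EMPLOYEES_DIM tgt
USING EMPLOYEES src
ON (tgt.EMP_ID = src.EMP_ID)
WHEN MATCHED THEN UPDATE
  SET tgt.EMP_NAME    = src.EMP_NAME,
      tgt.CITY        = src.CITY,
      tgt.UPDATE_DATE = SYSDATE
WHEN NOT MATCHED THEN
  INSERT (EMP_ID, EMP_NAME, CITY, UPDATE_DATE)
  VALUES (src.EMP_ID, src.EMP_NAME, src.CITY, SYSDATE);
```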

#### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with the necessary SQL queries
• Use the UPDATE_DATE column to filter records and validate the data (see the example below)

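An example check on the same hypothetical tables: every row touched by today's run should now agree with the source.

```sql
-- Any row returned here was updated today but still disagrees with the source.
SELECT d.EMP_ID, d.EMP_NAME, d.CITY, s.EMP_NAME AS SRC_NAME, s.CITY AS SRC_CITY
FROM   EMPLOYEES_DIM d
JOIN   EMPLOYEES     s ON s.EMP_ID = d.EMP_ID
WHERE  d.UPDATE_DATE >= TRUNC(SYSDATE)
AND    (d.EMP_NAME <> s.EMP_NAME OR d.CITY <> s.CITY);
```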

### SCD Type 2

#### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• SCD Type 2 preserves the changes made to the data: full history maintenance is achieved with this design
• Use a Lookup transformation to check whether the record already exists in the target, then insert, update, or expire the source record accordingly (see the sketch below). A surrogate key (Sequence Generator) is used to version the records

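The expire-and-insert logic corresponds roughly to the Oracle-style SQL below, assuming a hypothetical EMPLOYEES_DIM with a surrogate key EMP_SK, an ACTIVE_FLAG, and an EMP_SEQ sequence:

```sql
-- SCD Type 2: close out the current version of a changed record...
UPDATE EMPLOYEES_DIM
SET    ACTIVE_FLAG = 'N',
       END_DATE    = SYSDATE
WHERE  EMP_ID      = :changed_emp_id
AND    ACTIVE_FLAG = 'Y';

-- ...then insert the new version with a fresh surrogate key.
INSERT INTO EMPLOYEES_DIM
       (EMP_SK, EMP_ID, EMP_NAME, CITY, START_DATE, END_DATE, ACTIVE_FLAG)
VALUES (EMP_SEQ.NEXTVAL, :changed_emp_id, :new_name, :new_city, SYSDATE, NULL, 'Y');
```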

#### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with the necessary SQL queries
• Use the active flag to filter active records and compare source against target (see the example below)

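With an active flag in place, the active slice of the dimension should mirror the source; an Oracle-style MINUS makes the comparison direct (table and column names are illustrative):

```sql
-- Should return zero rows: every active dimension row must exist in the source.
SELECT EMP_ID, EMP_NAME, CITY
FROM   EMPLOYEES_DIM
WHERE  ACTIVE_FLAG = 'Y'
MINUS
SELECT EMP_ID, EMP_NAME, CITY
FROM   EMPLOYEES;
```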

### SCD Type 3

#### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• SCD Type 3 provides partial history maintenance: the previous value of a tracked attribute is kept in its own column alongside the current value
• Use a Lookup transformation to check whether the record already exists in the target, then insert or update the source record (see the sketch below). A surrogate key (Sequence Generator) is not necessary, since only one unique record per employee is kept

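For a single tracked attribute such as city, the update amounts to shifting the current value into the previous-value column before overwriting it; a sketch with hypothetical PREV_CITY/CURR_CITY columns:

```sql
-- SCD Type 3: keep one generation of history in a dedicated column.
UPDATE EMPLOYEES_DIM
SET    PREV_CITY   = CURR_CITY,   -- shift the old value sideways
       CURR_CITY   = :new_city,   -- then overwrite with the new value
       UPDATE_DATE = SYSDATE
WHERE  EMP_ID = :changed_emp_id;
```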

#### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with the necessary SQL queries
• Use the previous/current columns and the UPDATE_DATE column to filter records and validate the data (see the example below)

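An example check on the same hypothetical columns: rows updated by today's run should carry the source value in the current column.

```sql
-- Any row returned disagrees with the source after today's run.
SELECT d.EMP_ID, d.PREV_CITY, d.CURR_CITY, s.CITY AS SRC_CITY
FROM   EMPLOYEES_DIM d
JOIN   EMPLOYEES     s ON s.EMP_ID = d.EMP_ID
WHERE  d.UPDATE_DATE >= TRUNC(SYSDATE)
AND    d.CURR_CITY  <> s.CITY;
```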

## F2F (Flat File to Flat File) Pipeline

A flat file is a simple data source in which data is stored in text format. There are two types of flat files:

1. Delimited flat file: fields are separated by a delimiter (e.g., CSV)
2. Fixed-width flat file: fields are defined by fixed column widths

### Design and Development

• Import the source and target definitions into the Designer (both are flat files)
• Design the mapping from source to target with the necessary transformations
• A Joiner transformation is used to combine data from multiple sources
• A Sorter transformation is used to sort the data, with the distinct property set so that only unique records reach the target (see the SQL equivalent below)

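Logically, the Joiner plus Sorter-with-distinct combination behaves like the query below, with the two flat files written as if they were tables (EMP_FILE and DEPT_FILE are assumptions):

```sql
-- Equivalent of the F2F mapping: join the two sources, keep only unique rows.
SELECT DISTINCT e.EMP_ID,
                e.EMP_NAME,
                d.DEPT_NAME
FROM   EMP_FILE  e
JOIN   DEPT_FILE d ON e.DEPT_ID = d.DEPT_ID
ORDER  BY e.EMP_ID;
```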

### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with Unix commands


#### Positive Scenarios

1. Check the record counts in source and target
2. Check for duplicates in the target
3. Check that the transformation logic is applied in the target
4. Use the diff command to check for truncation of data


#### Negative Scenarios

1. Check behaviour with a delimiter different from the one specified in the Source Qualifier
2. Change the datatype of column entries
3. Check the error-handling mechanism (e.g., when the file is absent)


## ETL Pipeline Project

### Source to Target Mapping Sheet and Rules


### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• Use the SUBSTR and INSTR functions to split the full name into first name and last name for the target table (see the sketch below)
• Use a Joiner transformation to join the two tables with the required join type
• Use a Lookup transformation to fetch data from the lookup table
• Use a Filter transformation to limit which rows enter the target table
• Use a Router transformation to route the data according to city

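The same SUBSTR/INSTR logic used in the Expression transformation can be sketched in SQL, splitting a full name on its first space (the CUSTOMERS table and FULL_NAME column are hypothetical):

```sql
-- First name: everything before the first space; last name: everything after it.
SELECT SUBSTR(FULL_NAME, 1, INSTR(FULL_NAME, ' ') - 1) AS FIRST_NAME,
       SUBSTR(FULL_NAME, INSTR(FULL_NAME, ' ') + 1)    AS LAST_NAME
FROM   CUSTOMERS;
```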

### Data Validation

• Check the run properties and session logs for a detailed report
• Check the counts in the source and target tables
• Check that all transformation logic is fulfilled (first/last name split, Total Sales field, null cities replaced, products priced below 2000 excluded)
• Check that the data-loading rules are adhered to (see the example below)

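A few spot checks against those rules might look like this (SALES_TGT and its columns are illustrative, not taken from the mapping sheet):

```sql
-- Each query should return 0 if the corresponding loading rule held.
SELECT COUNT(*) FROM SALES_TGT WHERE CITY IS NULL;           -- null cities must be replaced
SELECT COUNT(*) FROM SALES_TGT WHERE PRODUCT_PRICE < 2000;   -- cheap products must be filtered out
```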


## 🛠 Skills

SQL, ETL, Python, Power BI...

## 🔗 Links

LinkedIn
