# ETL-Pipeline

Build ETL pipelines for various scenarios.


## Overview

Design, development, and data validation of ETL pipelines with Informatica PowerCenter. The repo covers pipelines for complex scenarios such as incremental loads and slowly changing dimensions (SCD).

## Incremental Load

### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• Incremental loading of data requires a mapping variable
• The control variable can be a timestamp or a unique ID column
• Assign a value to the mapping variable with the SETMAXVARIABLE function
• In the Source Qualifier (SQ), add a SQL override so that data is loaded if and only if the load date is greater than the mapping variable (see the sketch below)

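As an illustration, a Source Qualifier override for this pattern might look like the following Oracle-style sketch, assuming a hypothetical ORDERS source with a LOAD_DATE column and a mapping variable $$LAST_LOAD_DATE (none of these names come from the repo):

```sql
-- Hypothetical SQ override: fetch only rows newer than the last captured load date.
-- PowerCenter substitutes the value of $$LAST_LOAD_DATE at run time.
SELECT ORDER_ID,
       CUSTOMER_ID,
       ORDER_AMOUNT,
       LOAD_DATE
FROM   ORDERS
WHERE  LOAD_DATE > TO_DATE('$$LAST_LOAD_DATE', 'MM/DD/YYYY HH24:MI:SS')
```

Inside the mapping, an Expression port such as SETMAXVARIABLE($$LAST_LOAD_DATE, LOAD_DATE) keeps the variable at the highest load date processed, so the next run resumes where this one stopped.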

### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with the load date in the filter condition (see the example below)
• If the source or target is a flat file, validate the count of records with the grep command

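For example, with the same hypothetical ORDERS source and an ORDERS_TGT target, a count comparison over the load window might be:

```sql
-- Counts for the current load window should match between source and target.
SELECT COUNT(*) AS SRC_CNT FROM ORDERS     WHERE LOAD_DATE > TO_DATE('01/01/2024', 'MM/DD/YYYY');
SELECT COUNT(*) AS TGT_CNT FROM ORDERS_TGT WHERE LOAD_DATE > TO_DATE('01/01/2024', 'MM/DD/YYYY');
```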

## SCD Implementation

### SCD Type 1

#### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• SCD Type 1 keeps no record history: changed records are updated in place, and new records are inserted
• Use a Lookup transformation to check whether the record already exists in the target, then route the source record to insert or update (see the sketch below). A surrogate key (Sequence Generator) is not necessary, since only one unique record per employee is kept

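The insert-or-update behaviour of SCD Type 1 is equivalent to an Oracle-style MERGE; a minimal sketch, assuming a hypothetical EMPLOYEES source and EMPLOYEES_DIM target:

```sql
-- SCD Type 1: overwrite changed attributes in place; no history is kept.
MERGE INTO EMPLOYEES_DIM tgt
USING EMPLOYEES src
ON (tgt.EMP_ID = src.EMP_ID)
WHEN MATCHED THEN UPDATE
  SET tgt.EMP_NAME    = src.EMP_NAME,
      tgt.CITY        = src.CITY,
      tgt.UPDATE_DATE = SYSDATE
WHEN NOT MATCHED THEN
  INSERT (EMP_ID, EMP_NAME, CITY, UPDATE_DATE)
  VALUES (src.EMP_ID, src.EMP_NAME, src.CITY, SYSDATE);
```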

#### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with the necessary SQL queries
• Use the UPDATE_DATE column to filter records and validate the data (see the example below)

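An example check on the same hypothetical tables: every row touched by today's run should now agree with the source.

```sql
-- Any row returned here was updated today but still disagrees with the source.
SELECT d.EMP_ID, d.EMP_NAME, d.CITY, s.EMP_NAME AS SRC_NAME, s.CITY AS SRC_CITY
FROM   EMPLOYEES_DIM d
JOIN   EMPLOYEES     s ON s.EMP_ID = d.EMP_ID
WHERE  d.UPDATE_DATE >= TRUNC(SYSDATE)
AND    (d.EMP_NAME <> s.EMP_NAME OR d.CITY <> s.CITY);
```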

### SCD Type 2

#### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• SCD Type 2 preserves the changes made to the data: full history maintenance is achieved with this design
• Use a Lookup transformation to check whether the record already exists in the target, then insert, update, or expire the source record accordingly (see the sketch below). A surrogate key (Sequence Generator) is used to version the records

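The expire-and-insert logic corresponds roughly to the Oracle-style SQL below, assuming a hypothetical EMPLOYEES_DIM with a surrogate key EMP_SK, an ACTIVE_FLAG, and an EMP_SEQ sequence:

```sql
-- SCD Type 2: close out the current version of a changed record...
UPDATE EMPLOYEES_DIM
SET    ACTIVE_FLAG = 'N',
       END_DATE    = SYSDATE
WHERE  EMP_ID      = :changed_emp_id
AND    ACTIVE_FLAG = 'Y';

-- ...then insert the new version with a fresh surrogate key.
INSERT INTO EMPLOYEES_DIM
       (EMP_SK, EMP_ID, EMP_NAME, CITY, START_DATE, END_DATE, ACTIVE_FLAG)
VALUES (EMP_SEQ.NEXTVAL, :changed_emp_id, :new_name, :new_city, SYSDATE, NULL, 'Y');
```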

#### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with the necessary SQL queries
• Use the active flag to filter active records and compare source against target (see the example below)

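With an active flag in place, the active slice of the dimension should mirror the source; an Oracle-style MINUS makes the comparison direct (table and column names are illustrative):

```sql
-- Should return zero rows: every active dimension row must exist in the source.
SELECT EMP_ID, EMP_NAME, CITY
FROM   EMPLOYEES_DIM
WHERE  ACTIVE_FLAG = 'Y'
MINUS
SELECT EMP_ID, EMP_NAME, CITY
FROM   EMPLOYEES;
```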

### SCD Type 3

#### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• SCD Type 3 provides partial history maintenance: the previous value of a tracked attribute is kept in its own column alongside the current value
• Use a Lookup transformation to check whether the record already exists in the target, then insert or update the source record (see the sketch below). A surrogate key (Sequence Generator) is not necessary, since only one unique record per employee is kept

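For a single tracked attribute such as city, the update amounts to shifting the current value into the previous-value column before overwriting it; a sketch with hypothetical PREV_CITY/CURR_CITY columns:

```sql
-- SCD Type 3: keep one generation of history in a dedicated column.
UPDATE EMPLOYEES_DIM
SET    PREV_CITY   = CURR_CITY,   -- shift the old value sideways
       CURR_CITY   = :new_city,   -- then overwrite with the new value
       UPDATE_DATE = SYSDATE
WHERE  EMP_ID = :changed_emp_id;
```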

#### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with the necessary SQL queries
• Use the previous/current columns and the UPDATE_DATE column to filter records and validate the data (see the example below)

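An example check on the same hypothetical columns: rows updated by today's run should carry the source value in the current column.

```sql
-- Any row returned disagrees with the source after today's run.
SELECT d.EMP_ID, d.PREV_CITY, d.CURR_CITY, s.CITY AS SRC_CITY
FROM   EMPLOYEES_DIM d
JOIN   EMPLOYEES     s ON s.EMP_ID = d.EMP_ID
WHERE  d.UPDATE_DATE >= TRUNC(SYSDATE)
AND    d.CURR_CITY  <> s.CITY;
```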

## F2F (Flat File to Flat File) Pipeline

A flat file is a simple data source in which data is stored in text format. There are two types of flat files:

1. Delimited flat file: fields are separated by a delimiter (e.g., CSV)
2. Fixed-width flat file: fields are defined by fixed column widths

### Design and Development

• Import the source and target definitions into the Designer (both are flat files)
• Design the mapping from source to target with the necessary transformations
• A Joiner transformation is used to combine data from multiple sources
• A Sorter transformation is used to sort the data, with the distinct property set so that only unique records reach the target (see the SQL equivalent below)

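Logically, the Joiner plus Sorter-with-distinct combination behaves like the query below, with the two flat files written as if they were tables (EMP_FILE and DEPT_FILE are assumptions):

```sql
-- Equivalent of the F2F mapping: join the two sources, keep only unique rows.
SELECT DISTINCT e.EMP_ID,
                e.EMP_NAME,
                d.DEPT_NAME
FROM   EMP_FILE  e
JOIN   DEPT_FILE d ON e.DEPT_ID = d.DEPT_ID
ORDER  BY e.EMP_ID;
```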

### Data Validation

• Check the run properties and session logs for a detailed report
• Validate source against target with Unix commands


#### Positive Scenarios

1. Check the record counts in source and target
2. Check for duplicates in the target
3. Check that the transformation logic is applied in the target
4. Use the diff command to check for truncation of data


#### Negative Scenarios

1. Check behaviour with a delimiter different from the one specified in the Source Qualifier
2. Change the datatype of column entries
3. Check the error-handling mechanism (e.g., when the file is absent)


## ETL Pipeline Project

### Source to Target Mapping Sheet and Rules


### Design and Development

• Import the source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• Use the SUBSTR and INSTR functions to split the full name into first name and last name for the target table (see the sketch below)
• Use a Joiner transformation to join the two tables with the required join type
• Use a Lookup transformation to fetch data from the lookup table
• Use a Filter transformation to limit which rows enter the target table
• Use a Router transformation to route the data according to city

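The same SUBSTR/INSTR logic used in the Expression transformation can be sketched in SQL, splitting a full name on its first space (the CUSTOMERS table and FULL_NAME column are hypothetical):

```sql
-- First name: everything before the first space; last name: everything after it.
SELECT SUBSTR(FULL_NAME, 1, INSTR(FULL_NAME, ' ') - 1) AS FIRST_NAME,
       SUBSTR(FULL_NAME, INSTR(FULL_NAME, ' ') + 1)    AS LAST_NAME
FROM   CUSTOMERS;
```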

### Data Validation

• Check the run properties and session logs for a detailed report
• Check the counts in the source and target tables
• Check that all transformation logic is fulfilled (first/last name split, Total Sales field, null cities replaced, products priced below 2000 excluded)
• Check that the data-loading rules are adhered to (see the example below)

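A few spot checks against those rules might look like this (SALES_TGT and its columns are illustrative, not taken from the mapping sheet):

```sql
-- Each query should return 0 if the corresponding loading rule held.
SELECT COUNT(*) FROM SALES_TGT WHERE CITY IS NULL;           -- null cities must be replaced
SELECT COUNT(*) FROM SALES_TGT WHERE PRODUCT_PRICE < 2000;   -- cheap products must be filtered out
```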


## 🛠 Skills

SQL, ETL, Python, Power BI...

## 🔗 Links

LinkedIn
