Build ETL Pipelines for Various Scenarios
Design, develop, and validate ETL pipelines using Informatica PowerCenter.
Develop pipelines for complex scenarios such as incremental load and Slowly Changing Dimensions (SCD).
Incremental Load:
• Import source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• For incremental loading of data, use a mapping variable
• The control column can be a timestamp or a unique, increasing ID
• Use the SETMAXVARIABLE function to assign the highest value processed in the run to the mapping variable
• In the Source Qualifier (SQ), add a SQL override so that data is loaded only when the load date is greater than the mapping variable (see the sketch after this list)
• Check run properties and session logs for the detailed report
• Validate source against target using the load date in the filter condition
• If the source or target is a flat file, validate the record count with the grep command
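A minimal sketch of the SQ SQL override, in Oracle-style SQL, assuming a hypothetical ORDERS source with a LOAD_DATE column and a mapping variable named $$LAST_LOAD_DATE (PowerCenter expands the variable textually before the query runs):

    -- Source Qualifier SQL override (hypothetical table and column names)
    SELECT order_id,
           customer_id,
           amount,
           load_date
    FROM   orders
    WHERE  load_date > TO_DATE('$$LAST_LOAD_DATE', 'MM/DD/YYYY HH24:MI:SS')
    -- the format string must match the format the variable is stored in

Inside the mapping, an expression such as SETMAXVARIABLE($$LAST_LOAD_DATE, load_date) keeps the variable at the highest load date processed, so the next run picks up only newer rows.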
SCD Type 1:
• Import source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• SCD Type 1 implementation: no change history is kept; existing records are overwritten, so only inserts and updates are performed
• Use a Lookup transformation to check whether the record already exists in the target; the source record is then either inserted or updated. A surrogate key (Sequence Generator) is not necessary, since the target holds only one record per employee
• Check run properties and session logs for the detailed report
• Validate source against target with the necessary SQL queries
• Use the update_date column to filter records and validate the data (see the sketch after this list)
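A sketch of the validation queries for this scenario, in Oracle-style SQL with hypothetical EMP source and DIM_EMP target names:

    -- Type 1 check: the target must mirror the source exactly (no history)
    SELECT s.emp_id
    FROM   emp s
    JOIN   dim_emp t ON t.emp_id = s.emp_id
    WHERE  s.name <> t.name
       OR  s.city <> t.city;         -- expect zero rows

    -- Isolate rows touched by the current run via update_date
    SELECT COUNT(*)
    FROM   dim_emp
    WHERE  update_date >= TRUNC(SYSDATE);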
SCD Type 2:
• Import source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• SCD Type 2 implementation: changes to the data are preserved, so full history is maintained
• Use a Lookup transformation to check whether the record exists in the target; the source record is then inserted as a new version while the existing version is expired. A surrogate key (Sequence Generator) is used so that each version of a record can be identified and updated
• Check run properties and session logs for the detailed report
• Validate source against target with the necessary SQL queries
• Use the active flag to filter active records and compare source against target (see the sketch after this list)
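A sketch of the active-flag checks, again with hypothetical names (EMP source, DIM_EMP target with an ACTIVE_FLAG column):

    -- Exactly one active version per business key
    SELECT emp_id, COUNT(*)
    FROM   dim_emp
    WHERE  active_flag = 'Y'
    GROUP BY emp_id
    HAVING COUNT(*) <> 1;            -- expect zero rows

    -- Active versions must match the current source values
    SELECT s.emp_id
    FROM   emp s
    JOIN   dim_emp t ON t.emp_id = s.emp_id AND t.active_flag = 'Y'
    WHERE  s.city <> t.city;         -- expect zero rows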
SCD Type 3:
• Import source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• SCD Type 3 implementation: partial history is maintained by storing the previous value next to the current value in a separate column
• Use a Lookup transformation to check whether the record exists in the target; the source record is then either inserted or updated. A surrogate key (Sequence Generator) is not necessary, since the target holds only one record per employee
• Check run properties and session logs for the detailed report
• Validate source against target with the necessary SQL queries
• Use the previous/current value columns and the update_date column to filter records and validate the data (see the sketch after this list)
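A sketch of the previous/current column checks, assuming hypothetical CITY and PREV_CITY columns on the DIM_EMP target:

    -- The current column must match the source
    SELECT s.emp_id
    FROM   emp s
    JOIN   dim_emp t ON t.emp_id = s.emp_id
    WHERE  s.city <> t.city;         -- expect zero rows

    -- After an update, previous and current values should differ
    SELECT emp_id
    FROM   dim_emp
    WHERE  update_date >= TRUNC(SYSDATE)
      AND  prev_city = city;         -- investigate any hits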
Flat file: a simple data source where data is stored as plain text. Types of flat files:
- Delimited flat file: fields are separated by a delimiter (e.g., CSV)
- Fixed-width flat file: each field occupies a fixed number of characters
Flat File Join:
• Import source and target definitions into the Designer (both are flat files)
• Design the mapping from source to target with the necessary transformations
• A Joiner transformation combines data from multiple sources
• A Sorter transformation sorts the data, with the Distinct property enabled so that only unique records reach the target
• Check run properties and session logs for the detailed report
• Validate source against target with Unix commands
Positive Scenarios:
- Check record counts in source and target
- Check for duplicates in the target (see the SQL sketch after these scenarios)
- Check that the transformation logic is applied in the target
- Use the diff command to check for truncation of data
Negative Scenarios:
- Supply a file with a different delimiter than specified in the SQ
- Change the datatype of column entries
- Verify the error-handling mechanism (e.g., when the file is absent)
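If the flat files are additionally staged into database tables for reconciliation (an assumption; the checks above run directly on the files with Unix commands such as grep and diff), the count and duplicate checks can also be phrased in SQL against hypothetical STG_SRC and STG_TGT tables:

    -- Counts should reconcile once the distinct step is accounted for
    SELECT COUNT(*) FROM stg_src;
    SELECT COUNT(*) FROM stg_tgt;

    -- The Sorter's Distinct property should leave no duplicates
    SELECT order_id, COUNT(*)        -- order_id is a hypothetical key
    FROM   stg_tgt
    GROUP BY order_id
    HAVING COUNT(*) > 1;             -- expect zero rows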
Complex Transformations:
• Import source and target definitions into the Designer
• Design the mapping from source to target with the necessary transformations
• Use the SUBSTR and INSTR functions to split the full name into first name and last name before loading the target table (see the sketch after this list)
• Use a Joiner transformation to join the two tables with the required join type
• Use a Lookup transformation to fetch data from the lookup table
• Use a Filter transformation to limit which rows enter the target table
• Use a Router transformation to route the data by city
• Check run properties and session logs for the detailed report
• Check record counts in the source and target tables
• Check that all transformation logic is fulfilled (first/last name split, total sales field, null cities replaced, products priced below 2000 are not loaded)
• Check that the data-loading rules are adhered to
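A sketch of the validation for these rules, in Oracle-style SQL with hypothetical CUSTOMERS and TGT_SALES names (PowerCenter's SUBSTR and INSTR behave like their Oracle counterparts):

    -- Name split, mirroring the mapping's SUBSTR/INSTR expressions
    SELECT SUBSTR(full_name, 1, INSTR(full_name, ' ') - 1) AS first_name,
           SUBSTR(full_name, INSTR(full_name, ' ') + 1)    AS last_name
    FROM   customers;

    -- Filter rule: products priced below 2000 must not reach the target
    SELECT COUNT(*) FROM tgt_sales WHERE product_price < 2000;   -- expect 0

    -- Null-city replacement: no NULL city values should remain
    SELECT COUNT(*) FROM tgt_sales WHERE city IS NULL;           -- expect 0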
SQL, ETL, Python, Power BI...