ETL Testing Course



The data in a Data Warehouse system is loaded with an ETL (Extract, Transform, Load) tool. As the name suggests, it performs the following three operations:
  • Extracts the data from your transactional system, which can be an Oracle, Microsoft, or any other relational database.
  • Transforms the data by performing data cleansing operations, and then
  • Loads the data into the OLAP data warehouse.
You can also extract data from flat files like spreadsheets and CSV files using an ETL tool and load it into an OLAP data warehouse for data analysis and reporting. Let us take an example to understand it better.


Example:

  • Let us assume there is a manufacturing company with multiple departments such as sales, HR, Material Management, EWM, etc. All these departments maintain separate databases for information related to their work, and each database differs in technology, landscape, table names, columns, etc. Now, if the company wants to analyze historical data and generate reports, all the data from these data sources must be extracted and loaded into a Data Warehouse, where it is stored for analytical work.
  • An ETL tool extracts the data from all these heterogeneous data sources, transforms the data (applying calculations, joining fields and keys, removing incorrect data fields, etc.), and loads it into a Data Warehouse. Later, you can use various Business Intelligence (BI) tools to generate meaningful reports, dashboards, and visualizations from this data (a minimal code sketch follows).
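
The following is a minimal sketch of this extract-and-load flow in Python with SQLite. The database, table, and column names (sales.db, hr.db, warehouse.db, orders, employees) are hypothetical placeholders for the departmental sources described above.

    import sqlite3

    def extract(db_path, query):
        # Extract: pull rows from one heterogeneous source database.
        with sqlite3.connect(db_path) as conn:
            return conn.execute(query).fetchall()

    # Each department keeps its own database with its own schema.
    sales = extract("sales.db", "SELECT order_id, amount, order_date FROM orders")
    hr = extract("hr.db", "SELECT emp_id, first_name, last_name FROM employees")

    with sqlite3.connect("warehouse.db") as dw:
        # Load each extract into a staging table for later transformation.
        dw.execute("CREATE TABLE IF NOT EXISTS stg_sales (order_id, amount, order_date)")
        dw.execute("CREATE TABLE IF NOT EXISTS stg_employees (emp_id, first_name, last_name)")
        dw.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", sales)
        dw.executemany("INSERT INTO stg_employees VALUES (?, ?, ?)", hr)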


ETL Process:

Let us now discuss the key steps involved in an ETL procedure in a little more detail:
    Extracting the Data
  • It involves extracting the data from different heterogeneous data sources. Data extraction from a transactional system varies as per the requirement and the ETL tool in use. It is normally done by running scheduled jobs during off-business hours, such as at night or over the weekend.
    Transforming the Data
  • It involves transforming the data into a suitable format that can be easily loaded into a DW system. Data transformation involves applying calculations, joins, and defining primary and foreign keys on the data. For example, if you need the percentage of total revenue, which is not stored in the database, you apply the percentage formula during transformation and then load the derived value. Similarly, if you have the first name and the last name of users in different columns, you can apply a concatenate operation before loading the data (see the sketch after this list). Some data doesn’t require any transformation; such data is known as direct move or pass-through data.
  • Data transformation also involves data correction and cleansing: removing incorrect data, completing incomplete data, and fixing data errors. It also includes enforcing data integrity and reformatting incompatible data before loading it into a DW system.
    Loading the Data into a DW System
  • It involves loading the data into a DW system for analytical reporting and information. The target system can be a simple delimited flat file or a data warehouse.
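
Below is a minimal sketch of the transformations mentioned above (the percent-of-total-revenue calculation, the name concatenation, and a pass-through column), applied to rows already extracted into plain Python structures; the field names are hypothetical.

    rows = [
        {"first_name": "Ada", "last_name": "Lovelace", "revenue": 400.0},
        {"first_name": "Alan", "last_name": "Turing", "revenue": 600.0},
    ]

    total = sum(r["revenue"] for r in rows)
    transformed = []
    for r in rows:
        transformed.append({
            # Concatenate first and last name into a single column.
            "full_name": f"{r['first_name']} {r['last_name']}",
            # Derived column: percent of total revenue, not stored in the source.
            "revenue_pct": round(100.0 * r["revenue"] / total, 2),
            # Pass-through (direct move): loaded unchanged.
            "revenue": r["revenue"],
        })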


ETL Tool Function:

A typical ETL tool-based data warehouse uses a staging area, a data integration layer, and an access layer to perform its functions. It is normally a 3-layer architecture.
  • Staging Layer − The staging layer or staging database is used to store the data extracted from different source data systems.
  • Data Integration Layer − The integration layer transforms the data from the staging layer and moves the data to a database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions tables in a DW system is called a schema.
  • Access Layer − The access layer is used by end-users to retrieve the data for analytical reporting and information.
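
As a rough illustration of the three layers, the sketch below (Python with SQLite, hypothetical table names) shows staged data being arranged into a dimension and a fact table, which an access-layer query then reads for reporting.

    import sqlite3

    with sqlite3.connect("warehouse.db") as dw:
        # Data integration layer: arrange staged data into dimensions and facts.
        dw.execute("CREATE TABLE IF NOT EXISTS dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
        dw.execute("CREATE TABLE IF NOT EXISTS fact_sales (product_id INTEGER, amount REAL)")
        # Access layer: an end-user reporting query over the schema (facts + dimensions).
        report = dw.execute("""
            SELECT d.name, SUM(f.amount)
            FROM fact_sales f JOIN dim_product d ON f.product_id = d.product_id
            GROUP BY d.name
        """).fetchall()
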
ETL testing is done before data is moved into a production data warehouse system. It is sometimes also called table balancing or production reconciliation. It differs from database testing in its scope and in the steps taken to complete it.

The main objective of ETL testing is to identify and mitigate data defects and general errors before the data is processed for analytical reporting.


ETL Testing – Tasks to be Performed

Here is a list of the common tasks involved in ETL Testing −
  • Understand the data to be used for reporting.
  • Review the data model.
  • Validate the source-to-target mapping.
  • Run data checks on the source data.
  • Validate packages and the schema.
  • Verify the data in the target system.
  • Verify the data transformation calculations and aggregation rules.
  • Compare sample data between the source and the target system (a sketch follows this list).
  • Run data integrity and quality checks in the target system.
  • Perform performance testing on the data.
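
Two of these tasks, data count verification and sample data comparison, can be sketched as below, assuming SQLite connections and hypothetical table names (orders in the source, fact_orders in the target).

    import sqlite3

    src = sqlite3.connect("source.db")
    tgt = sqlite3.connect("warehouse.db")

    # Data count verification: source and target should hold the same number of rows.
    src_count = src.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    tgt_count = tgt.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
    assert src_count == tgt_count, f"count mismatch: {src_count} vs {tgt_count}"

    # Sample data comparison: spot-check a handful of records by key.
    for order_id, amount in src.execute("SELECT order_id, amount FROM orders LIMIT 5"):
        loaded = tgt.execute(
            "SELECT amount FROM fact_orders WHERE order_id = ?", (order_id,)
        ).fetchone()
        assert loaded is not None and loaded[0] == amount, f"mismatch for order {order_id}"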


ETL Testing – Operations

ETL testing involves the following operations −
  • Validation of data movement from the source to the target system.
  • Verification of the data counts in the source and the target system.
  • Verification that the data is extracted and transformed as per requirements and expectations.
  • Verification that table relations – joins and keys – are preserved during the transformation (see the key-preservation sketch below).
Common ETL testing tools include QuerySurge, Informatica, etc.
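
The key-preservation check in the last bullet can be sketched as follows: after the load, every foreign key in a fact table should still resolve to a row in its dimension table (hypothetical SQLite tables again).

    import sqlite3

    with sqlite3.connect("warehouse.db") as dw:
        # Count fact rows whose dimension key no longer matches anything.
        orphans = dw.execute("""
            SELECT COUNT(*) FROM fact_sales f
            LEFT JOIN dim_product d ON f.product_id = d.product_id
            WHERE d.product_id IS NULL
        """).fetchone()[0]
        assert orphans == 0, f"{orphans} fact rows lost their dimension key"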


ETL Testing – Performance

ETL performance tuning ensures that an ETL system can handle the expected load of multiple users and transactions. Performance tuning typically involves server-side workload on the ETL system. It is used to test the server response in a multi-user environment and to find bottlenecks, which can exist in the source and target systems, in the mapping between systems, and in configuration such as session management properties.


How to Perform ETL Testing Performance Tuning?

Follow the steps given below to perform ETL testing performance tuning −
  • Step 1 − Find the load that is being transformed in production.
  • Step 2 − Create new data with the same load, or copy the production data to your local performance server.
  • Step 3 − Disable the ETL until you have generated the required load.
  • Step 4 − Take a count of the needed data from the tables of the database.
  • Step 5 − Note down the last run of the ETL, then enable the ETL so that it gets enough stress to transform the entire load created, and run it.
  • Step 6 − After the ETL completes its run, take a count of the data created (a sketch of Steps 4 to 6 follows).
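
Steps 4 to 6 can be sketched as below; run_etl() is a hypothetical placeholder for whatever triggers the actual ETL job, and the table names are illustrative.

    import sqlite3
    import time

    def run_etl():
        pass  # placeholder: invoke the real ETL job here

    with sqlite3.connect("warehouse.db") as dw:
        # Step 4: count the staged input before the run.
        before = dw.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
        # Step 5: enable and run the ETL under the full generated load.
        start = time.time()
        run_etl()
        elapsed = time.time() - start
        # Step 6: count the data created and note the elapsed time.
        after = dw.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
        print(f"transformed {after} of {before} rows in {elapsed:.1f}s")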


ETL Testing – Scalability

The goal of ETL testing is to achieve credible data. Data credibility can be attained by making the testing cycle more effective.

A comprehensive test strategy is the first step in setting up an effective test cycle. The testing strategy should cover test planning for each stage of the ETL process, every time the data moves, and should state the responsibilities of each stakeholder, e.g., business analysts, infrastructure team, QA team, DBAs, developers, and business users.

To ensure testing readiness from all aspects, the key areas a test strategy should focus on are −
  • Scope of testing − Describe the testing techniques and types to be used.
  • Setting up the test environment.
  • Test data availability − It is recommended to have production-like data covering all critical business requirements.
  • Data quality and performance acceptance criteria.

ETL Testing – Data Quality

Checking data quality during ETL testing involves performing quality checks on the data loaded into the target system. It includes the following tests:

Number Check

The number format should be the same across the target system. For example, if the source system stores column numbers in the format x.30 but the target expects only 30, the value has to be loaded into the target column without the x. prefix.

Date Check

The date format should be consistent in both the source and the target systems. For example, it should be the same across all the records. The standard format is yyyy-mm-dd.

Precision Check

The precision value should display as expected in the target table. For example, the value in the source table is 15.2323422, but in the target table it should display as 15.23, or rounded to 15.
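
The Number, Date, and Precision checks above can be sketched as simple predicates; the value shapes are hypothetical.

    import re

    def check_number(value):
        # Number check: the target column should hold "30", not "x.30".
        return not str(value).startswith("x.")

    def check_date(value):
        # Date check: enforce the standard yyyy-mm-dd format.
        return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None

    def check_precision(source_value, target_value, digits=2):
        # Precision check: 15.2323422 in the source should load as 15.23.
        return target_value == round(source_value, digits)

    assert check_number("30") and not check_number("x.30")
    assert check_date("2015-01-31")
    assert check_precision(15.2323422, 15.23)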

Data Check

It involves checking the data as per the business requirements. Records that don’t meet certain criteria should be filtered out.

Example − Only those records whose date_id >= 2015 and Account_Id != ‘001’ should be loaded into the target table.
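
A minimal sketch of this filter, using the criteria from the example above on hypothetical records:

    records = [
        {"date_id": 2016, "Account_Id": "002"},  # meets both criteria: load it
        {"date_id": 2014, "Account_Id": "002"},  # filtered out: date_id < 2015
        {"date_id": 2016, "Account_Id": "001"},  # filtered out: excluded account
    ]
    to_load = [r for r in records if r["date_id"] >= 2015 and r["Account_Id"] != "001"]
    assert len(to_load) == 1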

Null Check

Some columns should have a Null value as per the requirements and the possible values for that field.

Example − The Termination Date column should display Null unless the Active Status column is “T” or “Deceased”.
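
A minimal sketch of this Null check, assuming hypothetical column names active_status and termination_date:

    def termination_date_valid(row):
        # A termination date is only allowed when the status permits one.
        if row["active_status"] in ("T", "Deceased"):
            return True
        return row["termination_date"] is None

    assert termination_date_valid({"active_status": "A", "termination_date": None})
    assert not termination_date_valid({"active_status": "A", "termination_date": "2015-06-01"})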

Other Checks

Common checks, such as verifying that From_Date is not greater than To_Date, can also be performed.
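
A minimal sketch of this check; it assumes dates stored in the standard yyyy-mm-dd format, which compares correctly as text:

    def date_range_valid(row):
        # From_Date should not be greater than To_Date.
        return row["From_Date"] <= row["To_Date"]

    assert date_range_valid({"From_Date": "2015-01-01", "To_Date": "2015-12-31"})
    assert not date_range_valid({"From_Date": "2016-01-01", "To_Date": "2015-12-31"})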