Strategies for Testing Data Warehouse Applications

Testing Goals

  • Data completeness.
    Ensures that all expected data is loaded.
  • Data transformation.
    Ensures that all data is transformed correctly according to business rules and/or design specifications.
  • Data quality.
    Ensures that the ETL application correctly rejects, substitutes default values, corrects or ignores and reports invalid data.
  • Performance and scalability.
    Ensures that data loads and queries perform within expected time frames and that the technical architecture is scalable.
  • Integration testing.
    Ensures that the ETL process functions well with other upstream and downstream processes.
  • User-acceptance testing.
    Ensures the solution meets users' current expectations and anticipates their future expectations.
  • Regression testing.
    Ensures existing functionality remains intact each time a new release of code is completed.

 


Data Completeness

Preconditions.

1. Objects are deinstalled successfully.
2. Objects are installed successfully.
3. -init scripts have been fulfilled.
4. the n.a., n.d. and initial data is in dimensions, facts are empty.


Example 1, DIM. Comparing record counts between source and target data.

Script:

SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT COUNT ( * ) AS cnt
           FROM wrk_table_contacts
         MINUS
         SELECT COUNT ( * ) AS cnt
           FROM mv_table_for_dataload );

Strategy:

1. Run loading process.
2. Run loading process again.
3. Test target minus source.
4. Test source minus target.


Example 2, DIM. The same script but sometimes it's needed to exclude default values.

SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT table_channel_code AS cnt
           FROM dim_table_channels
          WHERE table_channel_id NOT IN (-98, -99)
         MINUS
         SELECT DISTINCT table_channel AS cnt
           FROM wrk_table_contacts
          WHERE table_channel NOT IN ('n.a', 'n.d.') );


Example 3, DIM. Populating the full contents of each field to validate that no truncation occurs at any step in the process.

Script:

 SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT agent_id_ticket
              , agent_id
              , table_theme_id
              , table_channel
              , table_channel_id
              , table_status
           FROM wrk_table_contacts
         MINUS
         SELECT agent_id_ticket
              , agent_id
              , table_theme_id
              , table_channel
              , table_channel_id
              , table_status
           FROM mv_table_for_dataload );


Example 4, FCT. Comparing record counts between source and target data. Sometimes it is hard to compare 1 to 1 source and target tables. Then compare all essences.

Script:

-- Verify total count of tickets
SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT COUNT ( table_srcid ) AS cnt
           FROM mv_table_for_dataload
         MINUS
         SELECT COUNT ( table_srcid ) AS cnt
           FROM fct_table_contacts );


Example 5, FCT.  Find lost data.

-- Verify total count of lost data
SELECT COUNT ( * ) AS LOST
           FROM mv_table_for_dataload a
              , fct_table_contacts b
          WHERE a.table_srcid = b.table_srcid(+)
            AND a.contact_srcid = b.contact_srcid(+)
            AND b.contact_srcid IS NULL );


 

Data Transformation

Example 1. Validate that data types in the warehouse are as specified in the design and/or the data model.

Manually.

Strategy:

1. Review ETL code.
2. Wait  couple of days.
3. Review ETL code again. If there are defects then repeat from 1 point.


Example 2. SCD transformations. Find wrong flags.

Test 1:

SELECT COUNT ( * ) AS ERRORS
  FROM dim_table_contact_actions_scd
 WHERE valid_to = TO_DATE ( '31.12.9999'
                          , 'DD.MM.YYYY' )
   AND active_flag = 'N'
   AND table_contact_action_surr_id NOT IN (-98, -99);

Test 2:

 SELECT COUNT ( * ) AS ERRORS
FROM(SELECT table_contact_action_surr_id, table_contact_action_id, valid_from, valid_to, active_flag cnt
  FROM dim_table_contact_actions_scd
 WHERE valid_to != TO_DATE ( '31.12.9999'
                           , 'DD.MM.YYYY' )
   AND active_flag = 'Y'
   AND table_contact_action_surr_id NOT IN (-98, -99));


Example 3. SCD transformations. Find lost time frames in dimension.

Script:

SELECT SUM ( CASE WHEN valid_to <> NVL ( valid_from_next, '31.12.9999' ) THEN '1' ELSE '0' END ) AS ERRORS
           FROM ( SELECT table_contact_action_surr_id
                       , table_contact_action_id
                       , valid_to
                       , LEAD ( valid_from ) OVER (PARTITION BY table_contact_action_id ORDER BY valid_from ASC)
                            AS valid_from_next
                    FROM dim_table_contact_actions_scd
                   WHERE table_contact_action_id NOT IN (-99, -98) ) );


Example 4. SCD transformations. Verify time filter.

Manually.

...
AND ticket.event_dt >= agent.valid_from(+)
AND ticket.event_dt < agent.valid_to(+) --> if (+) then verify that there is NVL.
...

Don't use BETWEEN function in this case!!!


Data Quality

Example 1. Verify all dimension IDs in fact table, every dim_id.

Script:

SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT product_surr_id
           FROM fct_table_contacts
         MINUS
         SELECT product_surr_id
           FROM dim_products_scd );


Example 2. Verify all level codes in the filters.

...
FROM DIM_PRODUCTS_SCD prod,
...
Verify filter -> WHERE prod.level_code = 'AGENT'


Example 3. Verify comments in code.

...
WHERE system_id = 1
...
Verify filter -> -- source system


Example 4. Verify comments of db objects.

TableStructure.sql - generate the html file with table structure and comments.


Performance and scalability

Example 1.

DataAmount.sql - shows the size and growth index of fact table.


Example 2. Analyse log table.

SELECT *
  FROM lib.vw_log_t
 WHERE SYSTEM like '%Project Title%'
   AND timestamp > TO_DATE('10.05.2010', 'DD.MM.YYYY')
   ORDER BY timestamp DESC;


Regression Testing

Building automation during system testing will make the process of regression testing much smoother.

Fitnesse is a good point.


 
Copyright
Andrey Marchenko