Data Quality Testing

Coalesce provides built-in testing capabilities to assess the quality of your data.

Node Tests

These tests consist of a user-provided SQL SELECT statement, which, if matched by the data in the node's columns, will fail the test. Let's go through a simple example with Snowflake sample data to demonstrate.

Example Node Test

The column C_NATIONKEY is an integer that corresponds to a specific country, and we know that this number must be positive. No countries in the data set should have a negative number, so we will test for this.

  1. Click on Testing
  2. Click on New Test in the Node section to create a new test
  3. In the text box, type in SELECT * FROM {{ ref_no_link('TARGET','STG_CUSTOMER')}} WHERE C_NATIONKEY < 0
    but make sure to replace TARGET with the name of your storage mapping
  4. Uncheck the Continue on Failure toggle
Example Node Test

Example node test

  1. Run the node, it should succeed. You can confirm by opening the Results pane and seeing the different stages.
Example successful node test results

Example successful node test results

  1. Try reversing the < to >, which will match all of our records (nation keys are all positive integers). This will cause the test to fail and the node's data will not be processed.


Before or After?

Node tests can be run before or after the node's transformations, and this is user-configurable.

Column Tests

Column tests are meant to check for null and unique values, and are always run after transformations. This cannot be changed unless using a UDN.

Example Column Test

  1. Delete the node test(s) you made earlier by clicking on their garbage bin icon(s)
  2. Click on TestingColumn
  3. Select both Unique and Null
Example Column Test

Example column test

  1. Now click on Null next to C_NATIONKEY. This will check to ensure all entries for this column are populated (not null).
  2. Run the node and it should succeed, as there's no null entries.
Successful column test

Successful column test

  1. Now try also selecting Unique for the C_NATIONKEY column. This will check to ensure all entries for this column are unique.
  2. Run the node and it will fail, as C_NATIONKEY has repeated entries. Notice that the data was still processed, as column tests will always continue on failure.
Failed column test

Failed column test