Skip to main content

SCD Type 2 Change Detection and String Case

This page explains how Coalesce decides a tracked attribute changed for SCD Type 2 Dimension Nodes, how letter case affects that decision, and what you can do in SQL when you want values such as Monday and monday treated as the same.

How Type 2 Detects a Change

Coalesce uses your Dimension Node Advanced Deploy settings, including change tracking columns and business keys, to build MERGE or equivalent refresh logic for your warehouse. When Coalesce compares the incoming row with the current row, it relies on SQL equality for the columns that participate in that logic. If the database treats the old and new values as not equal, the pipeline can treat the row as changed, which may create a new historical version for Type 2.

Change Tracking Columns

The attributes you selected so that updates create a new historical version instead of only overwriting the current row. Changes to those columns drive Type 2 behavior, together with your business keys and any Last Modified Comparison options your package documents in Advanced Deploy. Coalesce does not apply a separate string comparison mode: detection follows the comparison rules of the warehouse for the data types and expressions in the generated SQL.

String Comparison by Warehouse

Equality for string-like types follows your platform defaults unless you change expressions or collation. The examples below use simple string literals. Always confirm behavior for your data types, collation, and session settings.

In BigQuery:

  • STRING comparisons are case-sensitive by default.
  • Example: 'Monday' != 'monday'.
  • BigQuery SQL has no ILIKE. For case-insensitive pattern matching, use REGEXP_CONTAINS with a case-insensitive regular expression, for example the (?i) flag in the pattern, or normalize with UPPER() or LOWER() and compare those results.
  • UPPER() and LOWER() are available for case normalization in expressions.

When Letter Case Alone Appears as a Change

If a source sends Monday today and monday tomorrow on the same business key, and the changed attribute is change-tracked, the pipeline may open a new Type 2 version because the values are not equal under SQL rules. That is expected when you have not normalized strings upstream.

If your requirement is that only semantic value matters and case-only differences should not create history, you must make those values compare equal before they reach the Dimension Node. Typical approaches include standardizing case in a Stage, View, or column mapping.

What You Cannot Configure Away in the UI

Dimension Advanced Deploy does not expose a setting to turn on case-insensitive Type 2 behavior or to normalize strings inside Coalesce merge comparison for all tracked columns.

Your package may document Exclude Columns from Merge on Snowflake Base Dimension behavior. That option applies to SCD Type 1 merge options: it excludes columns from comparison and update phases for that merge style. It is not a general control to ignore letter case for Type 2.

DDL Case Insensitive applies to external load naming

The DDL Case Insensitive option on External Data Nodes in the External Data Package concerns how column identifiers and load DDL are handled for certain file-based loads. It does not change how a Dimension Node compares cell values during SCD Type 2 refresh. Don't use that toggle expecting different merge value semantics for dimensions.

Normalize Strings Upstream

The supported approach is to apply a consistent expression to any business key or change-tracked attribute where casing would otherwise cause false change signals. Typical patterns use warehouse functions such as UPPER, LOWER, or INITCAP in the layer before the Dimension Node so both the existing target row and the incoming batch share the same canonical form.

Example patterns in a Stage or View select list:

UPPER(customer_name) AS customer_name,
LOWER(email) AS email,
INITCAP(product_title) AS product_title

Pick one standard per column and keep it stable so historic rows and new loads stay aligned.

Optional: Tracked Column Versus Display Column

If you must keep exact source casing for reporting but still want stable Type 2 detection, you can sometimes maintain the following:

  • Tracked column: Normalized expression used in mappings and change detection.
  • Display column: Raw or lightly transformed value for presentation.

This adds modeling and mapping overhead. Verify that business keys, grain, and Advanced Deploy selections still match your Kimball or internal standards and that the generated SQL does what you expect.

Trade-Offs

Use the following table to compare normalizing case upstream with leaving source strings as loaded.

ApproachWhat you gainWhat to watch for
Normalize upstreamFewer spurious Type 2 rows from benign case churnThe stored value in the tracked column may not preserve original letter case unless you keep a separate display field
Leave values as loadedPreserves literal strings from the sourceCase-only flips can inflate history when the warehouse treats them as different values

Troubleshooting

If case behavior causes unexpected Type 2 rows or you relied on the wrong UI control, start with the following table.

IssueResolution
New Type 2 rows appear when only letter case changed on a change-tracked stringNormalize that column upstream with UPPER, LOWER, or INITCAP, or use warehouse-specific case-insensitive comparison in the layer that feeds the Dimension Node so equality matches your intent.
You expected DDL Case Insensitive to fix dimension mergesThat setting applies to External Data Node load DDL and naming, not to Dimension Node value comparison. See External Data Package.

What's Next?