Deleting Records from Fact and Dimension Nodes
Default Behavior
By default, Coalesce Dimension Nodes don’t perform hard deletes. Instead, they use MERGE
statements to manage updates and inserts.
Recommended Approaches
1. Soft Deletes (Recommended)
The most common pattern is to use an IS_ACTIVE or IS_DELETED flag to mark records as inactive rather than physically removing them. This approach:
- Preserves historical data for auditing.
- Allows you to recreate data states at specific points in time.
2. Custom Deletion Logic
You can add deletion handling through custom logic, including:
- Pre-SQL or Post-SQL statements in your Node.
- Custom
MERGE
statements that includeDELETE
clauses when conditions are met. - Source-to-target comparison logic to detect and remove obsolete records.
3. Using Streams for Delete Detection
Snowflake Streams can capture DELETE
operations in source tables. You can use these changes to propagate deletes through your pipeline.
4. Full Refresh Strategy
Some organizations handle deletes with periodic full refreshes. While this removes deleted records, it can be inefficient for larger datasets.
Implementation Considerations
When planning delete strategies, consider the following:
- Fact tables: Often use persistent staging with
MERGE
statements that include delete logic. - Dimension tables: Can be configured for deletes through custom Node types or post-processing.
- Source systems: Ideally, configure source systems to use soft deletes instead of hard deletes when possible.