Streams and Tasks
Overview
Snowflake streams and tasks out of the box.
Installation
- Copy the Package ID:
@coalesce/snowflake/streams-and-tasks - In Coalesce, open the Workspace where you wish to install the package.
- Go to the Build Setting of the Workspace, tab Packages, and click the Install button on the top right of the page.
- Paste the Package ID, and proceed with the installation process.
Description
Snowflake - Streams and Tasks Package - Brief Summary
- Work with Task: Used for intermediate "Work" tables. It automates the cleaning and transformation of data in the middle of a pipeline using a scheduled Task.
- Dimension with Task: Automates the loading of descriptive business data (e.g., Customers, Products). It uses a Task to ensure that attributes are kept up-to-date incrementally.
- Fact with Task: Handles large-scale transactional data. This node uses a Task to process new metrics or events at regular intervals, ensuring large tables stay current without full reloads.
- Task DAG Create Root: Defines the "parent" or starting point of a Directed Acyclic Graph (DAG). This is the first task in a sequence that triggers all other dependent child tasks.
- Task DAG Resume Root: A management node used to "start" or "enable" the entire chain of tasks. In Snowflake, tasks are created in a 'Suspended' state; this node ensures the pipeline is actually running.
- Stream: Directly implements Snowflake's Change Data Capture (CDC). It acts as a bookmark on a source table to track new Inserts, Updates, and Deletes without moving data itself.
- Stream and Insert or Merge: A combined logic node. It reads the "delta" changes from a Stream and applies them to a target table using either a simple INSERT or a MERGE statement.
- Delta Stream Merge: An advanced CDC node designed for high-precision synchronization. It uses the Stream’s metadata (like
METADATA$ACTION) to ensure the target table perfectly mirrors the source, even when complex updates or deletes occur. - Stream for Directory Table: Specifically designed for Unstructured Data. It tracks changes to files (like PDFs or images) stored in a Snowflake Stage, allowing you to trigger processing as soon as a new file is uploaded.
- Insert or Merge with Task: A general-purpose automation node. It encapsulates the SQL logic to update a target table and schedules it to run via a Task, either on a set interval or when data arrives in a stream.
Summary: These nodes work together to create Continuous Data Pipelines. The Stream nodes capture the "Delta" (CDC), and the Task nodes provide the "Schedule" (Orchestration). By using these together, you achieve Transactional Consistency: data is only processed once, and the stream's offset only advances when the task successfully commits.
Nodetypes Config Matrix
Matrix 1: Task-Based Node Types
| Category | Feature | Work with Task | Dimension with Task | Fact with Task | Insert or Merge with Task | Task DAG Create Root | Task DAG Resume Root |
|---|---|---|---|---|---|---|---|
| Development | Development Mode | ✅ | ✅ | ✅ | ✅ | ⬜ | ⬜ |
| Multi Source | ✅ | ✅ | ✅ | ✅ | ⬜ | ⬜ | |
| Create As | ⬜ | ⬜ | ⬜ | ✅ | ⬜ | ⬜ | |
| Logic | Distinct / Group By All | ✅ | ✅ | ✅ | ✅ | ⬜ | ⬜ |
| Order By | ✅ | ✅ | ✅ | ✅ | ⬜ | ⬜ | |
| Truncate Before | ✅ | ⬜ | ✅ | ✅ | ⬜ | ⬜ | |
| Keys | Business / Table Keys | ⬜ | ✅ | ⬜ | ✅ | ⬜ | ⬜ |
| Change Tracking (Type 2) | ⬜ | ✅ | ⬜ | ⬜ | ⬜ | ⬜ | |
| Record Date / Timestamp | ⬜ | ⬜ | ⬜ | ✅ | ⬜ | ⬜ | |
| Cluster Key | ✅ | ✅ | ✅ |