Incremental Loading with Coalesce
Overview
Incremental data loading is a technique used in data warehousing to efficiently update large datasets by only processing and loading new or changed data since the last update, instead of reloading the entire dataset each time.
For example, you're an e-commerce company and only want to load order changes, not all orders every time. You can use the Incremental Node to only update records that have shipped.
Coalesce makes this process easy by offering the Incremental Node Package. By using our Package, you can quickly get started building your pipeline with minimal code and configuration. The package contains the following Node Types:
- Incremental Load - Performs updates by loading only new or changed records. It minimizes data duplication and optimizes resource use by focusing only on changes since the last load.
- Looped Load - Runs multiple loads in a loop, which allows the system to handle large datasets in smaller batches. Each loop processes a subset of data, reducing memory usage and improving performance for larger tables.
- Run View - Creates a view that tracks the most recent runs. The view is refreshed with each execution, allowing for easy identification of the latest updates or changes without reloading the entire dataset.
These Node Types make sure you can follow best practices such as:
- Change detection - Identifying new, modified, or deleted records in the source system since the last data load.
- Efficiency - Reducing processing time and resource usage by only handling changes instead of the full dataset.
- Timestamp or version tracking - Using timestamps or version numbers to determine which records have changed.
- Maintaining historical data - Preserving historical information while updating with new data.
You can learn more about Incremental Processing Strategies in our blog post.
Coalesce uses High Water Mark as a fast and efficient way to process data
Before You Begin
- If you don’t have a Coalesce account, you can sign up through Snowflake Partner Connect.
- This demo uses the Snowflake sample data sets.
- Set up your Storage Locations and Storage Mappings.
- Add your source data.
Install the Incremental Loading Package
You're going to install the Incremental Loading Package.
-
Choose a package from the Coalesce Marketplace.
-
Make note of the Package ID.
-
In the Coalesce App, go to Build Settings > Packages. Click Install.
-
Enter the Package ID you previously copied.
-
Select the version you want to install.
-
Enter a Package Alias. The alias is a unique and descriptive name for the package so you can easily identify it later.
- A package alias can only contain letters, numbers, and dashes. It can’t contain consecutive dashes.
- Package aliases can’t start or end with a dash.
-
Install your package.
Click on Node Types to see the three Node Types included in the package:
- Incremental Load
- Loop Load
- Run View