Transform Onboarding Guide
This guide walks you through onboarding to Coalesce Transform, from account creation through team rollout. Whether you're an admin configuring integrations or a developer building your first pipeline, you'll find the steps you need here. For complete setup instructions and step-by-step configuration guides, see Setup and Configuration.
Who This Guide Is For
Transform onboarding involves two main roles:
- Admins and setup owners: Configure accounts, connect data platforms, create projects and workspaces, and manage environments. See Phase 1: Initial Setup.
- Developers: Build pipelines, add sources, create transformations, deploy, and run refreshes. See Phase 2: Build Your First Pipeline and Phase 3: Deploy and Operate.
Prerequisites
Before starting your Transform onboarding, ensure you have:
- Access to your cloud data warehouse (Snowflake, Databricks, BigQuery, or Fabric)
- A Git repository for version control (or plan to create one)
- Basic understanding of SQL and data transformation concepts
- Admin access to configure integrations and add team members
Google Chrome is the only supported browser for Coalesce. Update your browser before you begin.
Phase 1: Initial Setup
Phase 1 covers account creation, data platform connections, Project and Workspace setup, and version control. For detailed step-by-step guides, see Setup and Configuration.
Account and Access
New Transform accounts can be created through a trial or by contacting Coalesce.
Network requirements: Coalesce connects to your data platform from specific IP addresses. You must allow inbound traffic from Coalesce and outbound traffic to Coalesce domains. See Network Requirements for IP addresses and platform-specific setup (Snowflake network policies, Databricks egress policies, and so on).
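As a rough illustration of what allowlisting means in practice, the sketch below checks whether a source address falls inside a set of allowed CIDR ranges. The ranges shown are reserved documentation placeholders, not Coalesce's actual IP addresses, and the function name is hypothetical; take the real values from Network Requirements.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737).
# These are NOT Coalesce's real addresses; copy the real ones from Network Requirements.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.17/32"]


def is_allowed(address: str, allowed_ranges: list[str]) -> bool:
    """Return True if the address falls inside any allowed CIDR range."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


print(is_allowed("203.0.113.42", ALLOWED_RANGES))  # inside the /24 placeholder range
print(is_allowed("192.0.2.10", ALLOWED_RANGES))    # outside every placeholder range
```

A check like this can help you reason about a network policy before applying it, but the policy itself is still configured on your data platform.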
User management: Organization Admins add team members and assign roles at the organization, project, and environment levels. For SSO and automated provisioning, see Authentication and User and Team Provisioning.
Connect Your Data Platform
Coalesce supports Snowflake, Databricks, BigQuery, and Fabric. Each platform has its own connection flow and credential options.
- Snowflake: Supports username and password, key pair authentication, and OAuth. See Snowflake connection guide.
- Databricks: Unity Catalog required. See Databricks connection guide.
- BigQuery: Service account authentication. See BigQuery connection guide.
- Fabric: See Fabric connection guide.
Credentials are configured per Workspace and per Environment. You'll enter account URLs, usernames, passwords, or keys, and optionally roles and warehouses. See Connection guides for step-by-step setup.
Create Your Project
A Project is the top-level container for your pipelines. Each Project has one Git repository and can contain multiple Workspaces and Environments.
- Create a new Project from the Coalesce dashboard.
- Configure Git integration: choose your provider (GitHub, GitLab, Bitbucket, or Azure DevOps), add your repository URL, and create a personal access token for authentication.
- Use one Git repository per Project. See Create Your Project and Set Up Version Control for details.
Create Your Workspace
A Workspace is your sandbox for development. You build Nodes, run previews, and validate changes before merging to the main branch and deploying.
- From your Project, click Create Workspace.
- Follow the Onboarding Wizard to configure storage locations and mappings.
- Connect the Workspace to your data platform using the credentials from Connect Your Data Platform.
See Create a Workspace and Storage Locations and Storage Mappings for full details. For Workspace strategy (one per branch versus one per developer), see Workspaces and Coalesce Best Practices.
Set Up Version Control
Version control is required before you can commit and deploy. You need your own Git provider account and a personal access token. Configure the following:
- Git provider: Supported providers are listed in Git Integration.
- Access token: Create a token for Coalesce (each person needs their own). See Set Up Your Git Integration.
- Branch strategy: Decide how you'll use branches (for example, feature branches, main for deployment). See Git Branches.
Avoid these common issues:
- Never develop directly in the main development Workspace.
- Never merge into a Workspace that has uncommitted changes.
- Only one person should work on a given branch at a time.
- Designate a single developer to commit to the main branch to avoid conflicts.
- Avoid making breaking commits directly to the main Git branch.
- Only open one instance of the Git modal per Workspace at a time.
- Do not change Git repository settings while there are uncommitted changes.
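Several of the issues above come down to acting while changes are still uncommitted. Coalesce handles commits through its own Git modal, so the following is only a minimal sketch for a local clone of the same repository: it wraps the standard `git status --porcelain` command and reports whether anything is pending. The function names and file paths are hypothetical.

```python
import subprocess


def has_uncommitted_changes(porcelain_output: str) -> bool:
    """Return True if `git status --porcelain` output lists any changed path."""
    return any(line.strip() for line in porcelain_output.splitlines())


def read_git_status(repo_path: str) -> str:
    """Run `git status --porcelain` in repo_path and return its raw output."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example with canned output: one modified file and one untracked file.
sample = " M nodes/stg_orders.yml\n?? nodes/dim_customer.yml\n"
print(has_uncommitted_changes(sample))  # True: two paths are listed
print(has_uncommitted_changes(""))      # False: clean working tree
```

To check a real clone, pass the result of `read_git_status("/path/to/clone")` to `has_uncommitted_changes`.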
Phase 2: Build Your First Pipeline
Learn the Interface
The Build interface is where you design your pipeline. Key areas:
- Projects dashboard: View and switch between Projects and Workspaces.
- Build vs Deploy: Build is for designing and validating; Deploy pushes changes to your data platform.
- Node graph: Visual representation of your pipeline. Nodes are connected by dependencies.
- Sidebar: Edit Node properties, columns, and transforms.
- Problem Scanner: Surfaces errors and warnings before you deploy.
See The Build Interface and What is Transform? for a tour.
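To make the idea of a dependency graph concrete, here is a small sketch that orders a hypothetical set of Nodes so that each one comes after everything it depends on. It uses Python's standard `graphlib` module and is not a description of how Coalesce schedules Nodes internally; the Node names are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each Node maps to the set of Nodes it depends on.
dependencies = {
    "SRC_ORDERS": set(),
    "SRC_CUSTOMERS": set(),
    "STG_ORDERS": {"SRC_ORDERS"},
    "STG_CUSTOMERS": {"SRC_CUSTOMERS"},
    "DIM_CUSTOMER": {"STG_CUSTOMERS"},
    "FCT_ORDERS": {"STG_ORDERS", "DIM_CUSTOMER"},
}

# static_order() yields every Node after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

Several valid orders exist for this graph; the only guarantee is that a Node never appears before one of its dependencies.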
Add Sources
Source Nodes represent tables that already exist in your data platform. You add them before building transformations.
- From the Build screen, click the plus sign (+), then Add Sources.
- Select the tables you want to add, then click Add Sources. You can preview each source before adding.
- Source Nodes appear on the graph. They cannot be modified; they represent raw data.
See Add a Data Source for details. For a guided walkthrough with sample data, see Snowflake Quick Start or Databricks Build Weather Analytics.
Build Transformations
Transformations are built using Nodes. Coalesce includes core Node types (Stage, Dimension, Fact, View, Custom), and the Marketplace offers many more for specific use cases such as incremental loading, data quality, and platform-specific patterns.
Before building from scratch, check the Marketplace for a Node that fits your needs. You may find a pre-built pattern that accelerates development. Core Node types give you a starting point:
- Stage Nodes: Clean and prepare data. Use column transforms to rename columns, change data types, and apply filters.
- Dimension Nodes: Store slowly changing dimensions. Choose Type 1 (overwrite) or Type 2 (historical) based on your needs.
- Fact Nodes: Store transactional or aggregated facts. Use joins and aggregations.
- View Nodes: Virtual tables; no physical storage.
- Custom Nodes: Reusable patterns for advanced use cases.
See Transforms, Nodes, and Stage Nodes for details. For hands-on practice, see Coalesce Foundational Hands-On Guide.
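The difference between Type 1 and Type 2 dimensions is easiest to see on a single changing attribute. The sketch below models both behaviors on in-memory rows; it illustrates the general slowly changing dimension pattern, not the SQL that a Dimension Node generates, and the `valid_from` and `valid_to` column names are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_type1(rows: list[CustomerRow], customer_id: int, city: str) -> None:
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in rows:
        if row.customer_id == customer_id:
            row.city = city


def apply_type2(rows: list[CustomerRow], customer_id: int, city: str, as_of: date) -> None:
    """Type 2: close the current version and append a new one."""
    for row in rows:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == city:
                return  # nothing changed, keep the current version
            row.valid_to = as_of
    rows.append(CustomerRow(customer_id, city, valid_from=as_of))


type1 = [CustomerRow(1, "Austin", date(2024, 1, 1))]
apply_type1(type1, 1, "Denver")
print(len(type1), type1[0].city)  # one row, city overwritten

type2 = [CustomerRow(1, "Austin", date(2024, 1, 1))]
apply_type2(type2, 1, "Denver", date(2024, 6, 1))
print(len(type2), [r.city for r in type2])  # two rows, history kept
```

Type 1 suits attributes where only the latest value matters; Type 2 suits attributes you need to report on as they were at a point in time.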
Create, Run, and Preview
After building your pipeline, validate it before deploying.
- Create: Coalesce generates the SQL for each Node based on your configuration.
- Run: Execute the pipeline (or a subset) to populate tables in your development schema.
- Preview: Inspect data in each Node to verify transformations.
Use the Problem Scanner to catch errors early.
Phase 3: Deploy and Operate
Create Environments
Environments define where you deploy: development, QA, and production. Each Environment has its own Storage Mappings, authentication, and parameters.
- In your Workspace, go to Build Settings > Environments.
- Create Environments for DEV, QA, and Production (or your naming convention).
- For each Environment, configure authentication (username and password, OAuth, or key pair) and storage mappings (database and schema).
- Set parameters if needed (for example, runtime parameters).
See Create Your Environments and Environments for full setup. Each Environment should map to a distinct database and schema.
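One way to sanity-check the "distinct database and schema" guideline is to list your planned mappings and look for collisions. The sketch below does that for a hypothetical set of Environments with one mapping each; the names and the dictionary layout are illustrative assumptions, not a Coalesce export format.

```python
from collections import defaultdict

# Hypothetical mappings: Environment name -> (database, schema).
storage_mappings = {
    "DEV": ("ANALYTICS_DEV", "TRANSFORM"),
    "QA": ("ANALYTICS_QA", "TRANSFORM"),
    "PROD": ("ANALYTICS_DEV", "TRANSFORM"),  # deliberate mistake: same target as DEV
}


def find_shared_targets(
    mappings: dict[str, tuple[str, str]]
) -> dict[tuple[str, str], list[str]]:
    """Return each (database, schema) target used by more than one Environment."""
    by_target = defaultdict(list)
    for env, (database, schema) in mappings.items():
        # Compare case-insensitively (an assumption; identifier case rules vary by platform).
        by_target[(database.upper(), schema.upper())].append(env)
    return {target: envs for target, envs in by_target.items() if len(envs) > 1}


print(find_shared_targets(storage_mappings))
```

An empty result means every Environment writes to its own target; any entry in the result names the Environments that would overwrite each other.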
Deploy
Deployment creates or updates the physical objects (tables, views) in your data platform. Coalesce compares your current Environment state to your desired configuration and executes the necessary DDL.
- Ensure your Workspace is on the main branch and has no uncommitted changes when deploying to production.
- Go to Deploy and select your target Environment.
- Review the deployment plan, then deploy. You can deploy using the Coalesce App, CLI, or third-party tools.
See Deployment Overview and Deploying to an Environment for details. For troubleshooting (storage mapping errors, plan failures, timeouts), see Troubleshooting Deployments and Refreshes.
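Conceptually, comparing current state to desired configuration is a diff that yields a list of create, alter, and drop actions. The following is a simplified sketch of that idea only; it is not Coalesce's deployment planner, and the object names, column lists, and `plan_deployment` function are hypothetical.

```python
def plan_deployment(
    current: dict[str, list[str]], desired: dict[str, list[str]]
) -> list[str]:
    """Compare current and desired objects and list the DDL-style actions needed."""
    plan = []
    # Objects that exist only in the desired configuration must be created.
    for name in sorted(desired.keys() - current.keys()):
        plan.append(f"CREATE {name}")
    # Objects present in both are altered only if their columns differ.
    for name in sorted(desired.keys() & current.keys()):
        if desired[name] != current[name]:
            plan.append(f"ALTER {name}")
    # Objects that exist only in the current state are dropped.
    for name in sorted(current.keys() - desired.keys()):
        plan.append(f"DROP {name}")
    return plan


# Hypothetical state: object name -> column list.
current_state = {
    "STG_ORDERS": ["ORDER_ID", "AMOUNT"],
    "OLD_REPORT": ["ID"],
}
desired_state = {
    "STG_ORDERS": ["ORDER_ID", "AMOUNT", "ORDER_DATE"],
    "FCT_ORDERS": ["ORDER_ID", "CUSTOMER_ID", "AMOUNT"],
}

for action in plan_deployment(current_state, desired_state):
    print(action)
```

Reading the plan as a diff explains why reviewing it matters: a drop or alter in the plan is a change to real objects in the target Environment.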
Refresh and Jobs
Refresh runs the data transformations (DML) defined in your pipeline. Jobs let you run subsets of Nodes on a schedule.
- Create Jobs on the Build page by selecting the Nodes you want to include.
- Deploy before refreshing; Jobs can only run on deployed Nodes.
- Schedule refreshes using the Coalesce Scheduler, CLI, Jobs API, or external tools such as Airflow and GitHub Actions.
See Refresh Your Pipeline and Parameters for runtime parameters (RTP) and other options. For third-party orchestration, see Third-Party DevOps Tools.
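Because Jobs can only run on deployed Nodes, it can help to compare a Job's Node list against what has been deployed before scheduling it. This is a minimal sketch with hypothetical Node names and a hypothetical helper function; it does not call any Coalesce API.

```python
def undeployed_nodes(job_nodes: set[str], deployed_nodes: set[str]) -> set[str]:
    """Return the Nodes a Job references that are not yet deployed."""
    return job_nodes - deployed_nodes


# Hypothetical Node names.
deployed = {"STG_ORDERS", "DIM_CUSTOMER"}
nightly_job = {"STG_ORDERS", "DIM_CUSTOMER", "FCT_ORDERS"}

missing = undeployed_nodes(nightly_job, deployed)
if missing:
    print("Deploy before refreshing:", sorted(missing))
else:
    print("All Job Nodes are deployed.")
```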
Phase 4: Team Rollout and Best Practices
Workspace Strategy
Choose a strategy that fits your team size and workflow:
- One Workspace per branch: Create a new Workspace for each feature branch; delete when merged. Keeps work isolated.
- One Workspace per user: Each developer has their own Workspace; they manage branches from it. Good for smaller teams.
- One Workspace per feature: Separate Workspaces for distinct features. Reduces conflicts.
Deploy to production only from the main branch in the Main Workspace. See Workspaces and Coalesce Best Practices.
Git Practices
- Commit frequently: Use meaningful commit descriptions. Each commit should cover a single unit of work.
- Branch for development: Keep in-progress work on feature branches.
- Communicate before committing: Ensure the Workspace is in a commit-ready state. Only one person per branch at a time.
- Single committer for main: Designate one developer to commit to the main branch to avoid merge conflicts.
See Git Commits and Git Branches for details. For merge conflicts and sync issues, see Git Integration and related troubleshooting docs.
Phased Rollout
Roll out Transform in stages:
- Core data engineers: Complete setup, build the first pipeline, and deploy to DEV and Production.
- Broader data team: Add developers, establish workflows and standards.
- Analysts and consumers: If applicable, enable read-only or limited access for downstream users. See RBAC Roles and Permissions for role options.
Optional: Advanced Paths
AI Features
- Copilot: Use natural language to generate transformations. Paste SQL or describe what you want; Copilot creates Nodes. See Copilot and Migrating SQL to Coalesce with Copilot.
- AI-generated descriptions: Add descriptions to Nodes and columns for documentation and lineage.
See Coalesce AI for the full set of AI capabilities.
Programmatic Setup
- Project APIs: Automate project, Workspace, and Environment creation. See API documentation.
- CLI: Deploy and refresh from the command line. See CLI.
- Automation: Use APIs and CLI for CI/CD, workspace provisioning, and environment management.
Integrations
- Catalog: Sync lineage and documentation to Coalesce Catalog for discovery and governance. See Catalog integration with Coalesce.
- Marketplace packages: Add pre-built Node types and patterns. See Marketplace.
- External orchestrators: Integrate with Airflow, GitHub Actions, GitLab, and others. See Third-Party DevOps Tools.
Best Practices Summary
Before and during onboarding, follow these guidelines:
- Before starting: Confirm that Chrome is up to date, network allowlisting is in place, and you have a Git repository and data platform access. See Coalesce Best Practices.
- Organization setup: Add users, configure SSO if needed, allow inbound and outbound traffic. See Network Requirements and Authentication.
- Project and Workspace setup: One Git repo per Project, use the Onboarding Wizard for Workspaces, map storage correctly. See Setup and Configuration for the full setup flow, or Create Your Project and Create a Workspace for specific steps.
- Environment management: Separate database and schema per Environment. See Create Your Environments.
- Deploy and refresh: Deploy from main branch only, test in lower Environments first, use Jobs for scheduled refreshes. See Deployment Overview and Refresh Your Pipeline.
Get Help
Support Channels
- Shared Slack or Teams channel: Dedicated channel for your team and Coalesce Customer Success.
- Email: support@coalesce.io for quick assistance.
- In-app support: Click the question mark icon for the AI Assistant, or Get Help to open an email to support.
When contacting support, include your Environment ID, run ID, and error details. Use Copy All to Clipboard in the app to capture system information. See Contacting Support for full details.
Self-Service Resources
- Quick Starts: Snowflake Quick Start, Databricks Build Weather Analytics
- Foundational guide: Coalesce Foundational Hands-On Guide
- FAQ and troubleshooting: FAQ and Troubleshooting Deployments and Refreshes
What's Next?
- Coalesce Best Practices for ongoing setup and workflow guidance
- Coalesce Catalog Onboarding Guide to add discovery and governance with Catalog
- Marketplace to explore pre-built Node types and patterns
- Copilot to accelerate development with AI-generated transformations