
Job Timeouts and Run Failures

Use this guide when a deploy, refresh, or Job run stops making progress, ends with a timeout error, or never appears to start. See Troubleshooting Deployments and Refreshes for the main troubleshooting hub. For performance tuning when slowness is the main symptom rather than a hard failure, see Troubleshooting Performance Issues in Coalesce.

Before You Begin

Use the following as you work through timeouts and queue issues:

Tell Stuck or Timed Out Runs Apart From Slow Progress

A long run is not always a failed run. Use these checks before you assume a timeout.

  1. Open the Individual Run for the deploy or refresh and watch whether the Duration values change over a few minutes. If the numbers move and stages complete, the run is likely slow rather than stuck.
  2. Compare against recent runs of similar scope. Large graphs or heavy validation naturally take longer.
  3. If nothing advances for an extended period, or the UI stops updating while other pages in the Coalesce App load normally, treat the run as stuck or timed out and follow the guidance in the next section.
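
The checks above can be sketched as a small helper that classifies a run from a handful of progress observations. This is only an illustration: `ProgressSample`, `classify_run`, and the idea of recording a completed-Node count are hypothetical names and assumptions, not part of Coalesce, and the stall window is whatever "an extended period" means for your workloads.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProgressSample:
    """One observation of a run, noted by hand or by a script (hypothetical shape)."""
    observed_at: datetime
    completed_nodes: int


def classify_run(samples: list[ProgressSample], stall_window: timedelta) -> str:
    """Return 'slow', 'stuck', or 'unknown' for a run that is still open.

    'unknown' means the observations do not yet span the full stall window.
    """
    if len(samples) < 2:
        return "unknown"
    ordered = sorted(samples, key=lambda s: s.observed_at)
    latest = ordered[-1]
    if latest.observed_at - ordered[0].observed_at < stall_window:
        return "unknown"
    # Baseline: the last observation taken at or before the start of the window.
    window_start = latest.observed_at - stall_window
    baseline = [s for s in ordered if s.observed_at <= window_start][-1]
    # Any increase in completed Nodes inside the window counts as progress.
    return "slow" if latest.completed_nodes > baseline.completed_nodes else "stuck"


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    observations = [
        ProgressSample(t0, 10),
        ProgressSample(t0 + timedelta(minutes=5), 10),
        ProgressSample(t0 + timedelta(minutes=10), 10),
    ]
    print(classify_run(observations, timedelta(minutes=10)))
```

A run classified as "slow" belongs with the performance guidance; a run classified as "stuck" is a candidate for the timeout steps in the next section.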

For warehouse sizing, query history, incremental design, and staging patterns when runs are slow but still progressing, see Troubleshooting Performance Issues in Coalesce. That article focuses on optimization; this one focuses on timeouts, blocked refresh, and runs that do not start.

When You See DEADLINE_EXCEEDED or a Frozen UI

You might see DEADLINE_EXCEEDED or a deploy or refresh that appears frozen in the Coalesce App. These outcomes often share common causes:

  • Large scope - Many Nodes or heavy object operations in one run.
  • Data platform latency - The warehouse or engine is slow to return or execute work.
  • Network issues - Unstable connectivity between your environment and Coalesce or your data platform.

Timeout behavior depends on operation type, platform, and how your organization uses Coalesce. There is no single timeout value that applies to every engine or tenant.

What to try:

  1. Retry the run - Transient platform or network issues sometimes clear on a second attempt.
  2. Reduce parallelism - When using the API, lower the parallelism value in the run details so fewer Nodes execute at once, which can ease pressure on the data platform and on long-running steps.
  3. Run a smaller batch - Use Job selectors, subgraphs, or a narrower refresh so each run does less work in one pass. Splitting work across multiple runs is often more reliable than one very large run.
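
As a rough sketch of steps 2 and 3 for API-triggered runs, the helpers below build the run details portion of a request with an explicit parallelism and an optional narrower scope, and list progressively lower parallelism values to try on retries. The field names `environmentID`, `jobID`, and `includeNodesSelector` are assumptions here; confirm them, and the selector syntax, against the Coalesce API reference before use. The function names are hypothetical.

```python
from typing import Any, Optional


def build_run_details(
    environment_id: str,
    parallelism: int,
    job_id: Optional[str] = None,
    include_nodes_selector: Optional[str] = None,
) -> dict[str, Any]:
    """Build a run details object (field names are assumptions, see above)."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    details: dict[str, Any] = {
        "environmentID": environment_id,
        "parallelism": parallelism,
    }
    # Narrow the scope only when the caller asks for it.
    if job_id is not None:
        details["jobID"] = job_id
    if include_nodes_selector is not None:
        details["includeNodesSelector"] = include_nodes_selector
    return details


def parallelism_steps(start: int) -> list[int]:
    """Halve parallelism for each retry until it reaches 1."""
    if start < 1:
        raise ValueError("start must be at least 1")
    steps = [start]
    while steps[-1] > 1:
        steps.append(steps[-1] // 2)
    return steps


if __name__ == "__main__":
    for value in parallelism_steps(16):
        print(build_run_details("<environment-id>", value, job_id="<job-id>"))
```

Halving is an arbitrary choice; any schedule that lowers concurrency between attempts serves the same purpose.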

If timeouts persist after these steps, contact Coalesce Support with the information in What to Send Coalesce Support.

When a Run Never Starts or Stays Queued

If a run does not appear, never leaves Queued, or a scheduled time passes with no new run, check the following:

  • Job not deployed - The Coalesce Scheduler runs deployed Jobs only. If you created or changed a Job in the Workspace but did not commit and deploy it, scheduled runs still use the previously deployed definition, or the Job may not run as you expect. See Scheduling Jobs in Coalesce and Managing Jobs.
  • Scheduler configuration - Confirm the Job Schedule is saved, not paused, and that the cron expression is in UTC. Refresh the Job Schedules page to see recent execution status. Email notifications for failures can be configured on the schedule.
  • Credentials and authentication - Failed authentication can prevent a run from starting or cause immediate failure. Update credentials on the Environment where the Job runs. Scheduled Jobs use the credentials of the user who created or last modified the schedule; if you change authentication type or rotate secrets, you may need to update each affected schedule or integration. For the full pattern and automation options, see Jobs Fail After Changing Authentication or Credentials and the credential notes in Snowflake Key Pair Authentication.
  • API and CLI triggers - Integrations must send valid userCredentials or an equivalent credential payload on each Start Run request. A missing or expired secret often surfaces as an error on the first step rather than an endless queue; confirm secrets and Environment alignment with your integration docs.
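
Because the schedule's cron expression is read as UTC, a schedule written in local time fires at the wrong hour, which can look like a run that never started. The sketch below converts a daily local time to UTC under the assumption of a standard five-field cron expression and a fixed UTC offset; `daily_cron_in_utc` is a hypothetical helper, and it does not follow daylight saving changes.

```python
def daily_cron_in_utc(local_hour: int, local_minute: int, utc_offset_hours: float) -> str:
    """Convert a daily local time to a five-field cron expression in UTC.

    utc_offset_hours is the local zone's offset from UTC, for example
    -5 for UTC-05:00 or 5.5 for UTC+05:30.
    """
    if not (0 <= local_hour <= 23 and 0 <= local_minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    offset_minutes = round(utc_offset_hours * 60)
    # Subtract the offset to move from local time to UTC, wrapping around midnight.
    total_minutes = (local_hour * 60 + local_minute - offset_minutes) % (24 * 60)
    utc_hour, utc_minute = divmod(total_minutes, 60)
    return f"{utc_minute} {utc_hour} * * *"


if __name__ == "__main__":
    # 09:30 local time in a UTC-05:00 zone
    print(daily_cron_in_utc(9, 30, -5))
```

For schedules restricted to particular days of the week, a shift across midnight also moves the day, so check those expressions by hand.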

When Refresh Is Blocked After a Failed Deployment

When deployment is in a failed state, Coalesce blocks refresh until the deployment is repaired or you use an explicit override. For fix, rollback, cautious force refresh, and the forceIgnoreWorkspaceStatus warning, see When Deployment Fails and Refresh Is Blocked. For override procedures and flags, follow Managing Refresh Jobs in Failed Deployment Environments.

What to Send Coalesce Support

Include the following so Support can triage quickly:

  • Environment ID - The target Environment for the run.
  • Run ID - The identifier shown for the deploy, refresh, or Job run in the Coalesce App, in API responses, or in CLI output.
  • Short description - What you were running, such as the deploy, refresh, or Job name; what you observed, for example a frozen UI, DEADLINE_EXCEEDED, or a queued run with no start; and approximately when it happened, with timestamp and timezone.

If the run is tied to a specific Job Schedule or API integration, mention that context in one or two sentences.
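
If you file these reports often, a small template keeps them consistent. The following is a minimal sketch that collects the items listed above into plain text; `SupportReport` and its field names are hypothetical and not a Coalesce format. It rejects timestamps without a timezone so the "when" is unambiguous.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SupportReport:
    """The details Support asks for, gathered in one place (hypothetical shape)."""
    environment_id: str
    run_id: str
    description: str
    observed_at: datetime
    context: str = ""  # optional note about a Job Schedule or API integration

    def render(self) -> str:
        if self.observed_at.tzinfo is None:
            raise ValueError("observed_at must include a timezone")
        lines = [
            f"Environment ID: {self.environment_id}",
            f"Run ID: {self.run_id}",
            f"Description: {self.description}",
            f"Observed at: {self.observed_at.isoformat()}",
        ]
        if self.context:
            lines.append(f"Context: {self.context}")
        return "\n".join(lines)


if __name__ == "__main__":
    report = SupportReport(
        environment_id="<environment-id>",
        run_id="<run-id>",
        description="Nightly refresh ended with DEADLINE_EXCEEDED",
        observed_at=datetime(2024, 5, 1, 14, 30, tzinfo=timezone.utc),
        context="Triggered by a Job Schedule",
    )
    print(report.render())
```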

You can monitor runs and build alerting using the built-in Job Scheduler email options and Coalesce APIs. See Job Notifications and Monitoring for an overview of scheduler notifications, run status endpoints, and retries.

What's Next?