Databricks Quickstart
Before You Begin
- Go through our Databricks Connection Guide to get your account set up with Coalesce.
- Add the AccuWeather data source, forecast_daily_calendar_metric, to your catalog.
- Create a Project and Workspace.
- You can skip creating a version control account.
- If you don't have an account, reach out to our sales team.
Add a Data Source
When you open your Workspace, you land on the Build Interface. This is where you'll build your data pipelines.
- Make sure you are on Nodes.
- Click on the "+" sign > Add Sources.
- Add Sources to SQL Pipeline will open. Find the forecast_daily_calendar_metric table. Double-clicking the source will give you a preview of the data. (You can also preview it directly in Databricks; see the optional query at the end of this section.)
- Then click Add Source.
- The source Node is added to your DAG. Now you can start building.
We support multiple data types from different sources. Take a look at the External Data Package.
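If you'd like to preview the source data directly in Databricks as well, a quick query like the minimal sketch below works. The catalog and schema names are placeholders, not part of this guide; substitute wherever the AccuWeather share is mounted in your workspace.
-- Optional spot-check of the raw source table from a Databricks notebook or SQL editor.
-- <your_catalog>.<your_schema> is a placeholder; substitute your actual location.
SELECT *
FROM <your_catalog>.<your_schema>.forecast_daily_calendar_metric
LIMIT 10;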
Add a Stage Node
- Right-click on the source Node > Add Node > Stage.
- The added Stage Node opens in the Node Editor.
- Next to the name, click the pencil and name it stg_weather_metrics. Databricks Node names are formatted as lowercase.
- Then click Create and Run.
- You've just populated your first table in Databricks. Click on the Results tab to see each stage complete and the SQL executed. (You can also confirm the table from the Databricks side; see the sketch below.)
Create renders and executes all SQL stages of the create template defined in the Node Type definition; Run does the same for the run template.
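If you want to confirm the new table from the Databricks side, a quick DESCRIBE works. This is a minimal sketch; the catalog and schema qualification below is a placeholder for your storage location.
-- Confirm the Stage Node's table exists in Databricks and inspect its columns.
-- <your_catalog>.<your_schema> is a placeholder for your storage location.
DESCRIBE TABLE <your_catalog>.<your_schema>.stg_weather_metrics;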

Add a Transformation
- If the Node isn't open, open stg_weather_metrics by double-clicking the Node, or right-click it and select Edit.
- Sort the columns by name by clicking on the Column Name table header.
- Scroll to the column named temperature.
- Double-click in the Transform column next to temperature.
- Add ROUND({{SRC}}, 1). This uses standard SQL and Helper Tokens. The {{SRC}} token evaluates to the column's fully qualified name and location. After adding, it should update to the correct source reference, for example ROUND(`forecast_hourly_metric`.`temperature`, 1). (A sketch of the rendered SELECT appears after this list.)
- Click Validate Select. Validate Select lets you validate your transformations and joins.
- Scroll down in the Results panel to see the SQL applied to temperature.
- Click Create and Run.
- In the Data panel, scroll over to temperature and see that the value has been rounded to one decimal place.
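For reference, the SQL that Validate Select renders looks roughly like the minimal sketch below once {{SRC}} is resolved. This is illustrative only: the catalog and schema are placeholders for your storage location, and only two columns are shown.
-- Illustrative rendering of the Stage Node's SELECT after the transform is applied.
-- <your_catalog>.<your_schema> is a placeholder for your storage location.
SELECT
  `forecast_hourly_metric`.`city_name` AS `city_name`,
  ROUND(`forecast_hourly_metric`.`temperature`, 1) AS `temperature`
  -- ...the remaining source columns pass through unchanged
FROM <your_catalog>.<your_schema>.`forecast_hourly_metric` AS `forecast_hourly_metric`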
Add a Dimension Node
You'll create a Dimension Node that tracks how weather metrics change over time for each location. First, you'll add some transformations to the Stage Node to prep the data.
- Make sure you're in the stg_weather_metrics Node.
- You need to delete any columns not being used. Un-sort the columns by clicking Column Name until the arrows disappear. Click and drag the following columns so they are at the top:
city_name
country_code
datetime_valid_local
temperature
humidity_relative
wind_speed
precipitation_type_desc
cloud_cover_perc_total
- Click on the first column to be deleted, hold Shift or Command, then select the last column to be deleted. Right-click and Delete the columns.
- Rename the column datetime_valid_local to measurement_time. Double-click the column name to rename it.
- Rename cloud_cover_perc_total to cloud_cover_percent.
- Add the following transformations to standardize the data:
temperature: ROUND({{SRC}}, 1) to round to one decimal place, if not already added.
humidity_relative: ROUND({{SRC}}, 1) to round to one decimal place.
precipitation_type_desc: COALESCE({{SRC}}, 'None') to replace null values.
- Then double-click in the Data Type column for cloud_cover_percent and type INTEGER.
- Click Validate Select to preview the SQL that will be run.
- Then click Create and Run.
- Return to your DAG and create a Dimension Node off stg_weather_metrics. Name it dim_weather_metrics.
- Under Options, select the following as Business Keys by selecting them in the box, then click the arrow to move them over:
city_name
country_code
measurement_time
- In Change Tracking, select the following (you can verify the tracked history later with the query sketch after these steps):
temperature
precipitation_type_desc
wind_speed
cloud_cover_percent
- Then click Create and Run.
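Once the Dimension Node has run, you can sanity-check the change tracking from Databricks. The minimal sketch below only uses the business key columns named above; the table reference assumes you run it against your storage location's catalog and schema.
-- Count how many tracked versions exist per business key in the dimension.
-- Adjust the catalog/schema qualification to your storage location before running.
SELECT
  city_name,
  country_code,
  measurement_time,
  COUNT(*) AS version_count
FROM dim_weather_metrics
GROUP BY city_name, country_code, measurement_time
ORDER BY version_count DESC
LIMIT 20;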

Create a Fact Node
You'll create a Fact Node to track historical data. You'll also use the Join tab.
- Create a Stage Node called stg_weather_facts off the Dimension Node dim_weather_metrics.
- You'll need to create a Join. The Join tab accepts any valid SQL. Here you're joining the source data, forecast_hourly_metric, to dim_weather_metrics. The conditions are that city_name, country_code, and measurement_time must match. Because it is a LEFT JOIN, all source rows are kept even when there is no match; unmatched rows get null values for the dimension columns. (You can check for unmatched rows with the query sketch after these steps.)
FROM {{ ref('ACCUWEATHER', 'forecast_hourly_metric') }} `forecast_hourly_metric`
LEFT JOIN {{ ref('<YOUR-STORAGE-LOCATION-NAME>', 'dim_weather_metrics') }} `dim_weather_metrics`
ON `forecast_hourly_metric`.`city_name` = `dim_weather_metrics`.`city_name`
AND `forecast_hourly_metric`.`country_code` = `dim_weather_metrics`.`country_code`
AND `forecast_hourly_metric`.`datetime_valid_local` = `dim_weather_metrics`.`measurement_time`
- Create and Run the Node.
- Click on the Browser tab to see that stg_weather_facts has a reference to both the source and the Dimension Node.
- Create a Fact Node from stg_weather_facts.
- Name it fct_weather_facts.
- Set the Business Keys as:
city_name
country_code
measurement_time
- Create and Run the Node.
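Because the Stage Node uses a LEFT JOIN, it's worth confirming that the fact rows actually matched a dimension record. The minimal sketch below counts fact rows with no dimension key; as with the other sketches, it assumes you run it against your storage location in Databricks.
-- Fact rows that did not find a matching dimension record carry a null key.
SELECT COUNT(*) AS unmatched_fact_rows
FROM fct_weather_facts
WHERE dim_weather_metrics_key IS NULL;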

Wrap Up
Here's what you did:
You joined the source Node to stg_weather_facts to get raw measurements like humidity and precipitation_type_desc. This is the data you are tracking.
stg_weather_facts will have the dim_weather_metrics_key. This makes sure each row ties to the correct historical weather record. Another way of saying this is: "On this day, time, and place, the weather description was X." For example, "This forecast measurement ties to dim_weather_metrics_key 124 in the fact Node, and that record said it was cloudy, 68°F, and humid at that time."
Databricks Query
You can also query the new tables directly in Databricks. The first query summarizes average humidity by cloud cover; the second shows the average daily temperature per city.
SELECT
d.cloud_cover_percent,
COUNT(*) AS measurement_count,
ROUND(AVG(f.humidity_relative), 1) AS avg_humidity
FROM fct_weather_facts f
JOIN dim_weather_metrics d
ON f.dim_weather_metrics_key = d.dim_weather_metrics_key
GROUP BY d.cloud_cover_percent
ORDER BY d.cloud_cover_percent;

SELECT
f.measurement_time::DATE AS date,
f.city_name,
ROUND(AVG(d.temperature), 1) AS avg_temp
FROM fct_weather_facts f
JOIN dim_weather_metrics d
ON f.dim_weather_metrics_key = d.dim_weather_metrics_key
GROUP BY date, f.city_name
ORDER BY date, f.city_name;
