BigQuery Set Up
Set up BigQuery with the Catalog. Create a Google Service Account, grant the required roles, and add credentials to the Catalog App.
Which Rights Do You Need To Complete the Onboarding?
In order to create the Google Service account, you need to have at minimum one of the following roles:
- Service Account Admin
- Editor Basic
What Will Catalog Do With This Access?
These roles let Catalog read your metadata by querying BigQuery system metadata. Catalog can see the data model, the queries, and the users in that metadata. This access does not let Catalog read your table data.
Need Some More Info?
For more on Google Cloud service accounts and IAM roles, see:
1. Whitelist Catalog's IP on BigQuery
Needed only if you've set up network policies, meaning your BigQuery instance only accepts connections coming from specific IPs.
To allow Catalog to connect to BigQuery, you will first need to whitelist Catalog's IP address.
Here are our fixed IPs:
- For instances on app.us.castordoc.com:
34.42.92.72 - For instances on app.castordoc.com:
35.246.176.138
BigQuery's documentation on configuring public IP connectivity can be found in the Google Cloud docs.
2. Create Catalog User on BigQuery
2.1 Create Google Service Account for Catalog
Create a service account for Catalog from the Google Console. See how to create and manage service accounts.
Make sure to create and download a JSON key for that service account. See how to create service account keys.
2.2 Grant Roles to Google Service Account
Grant the needed access to this new service account with the following roles. See how to grant and change access.
You can find the documentation on the different roles in the BigQuery access control docs.
Make sure to add roles for all projects you would like to see in the Catalog App.
-
BigQuery Read Session User
The
BigQuery Read Session Userrole allows Catalog to create and use read sessions. It allows read capabilities via SQL yet does not grant by itself any data access. -
BigQuery Metadata Viewer
The
metadataViewerrole allows Catalog to fetch the schemas of your data. -
BigQuery Job User
The
jobUserrole is used by Catalog to parse queries and compute lineage and usage stats. -
BigQuery Resource Viewer
The
resourceVieweris also used by Catalog to parse queries and compute lineage and usage stats.
3. Add New Credentials in Catalog App
You must be a Catalog admin to add credentials.
If you use a separate Google Cloud project to host service accounts, update the JSON key. Overwrite the project_id with a project from which the service user can run queries.
You can enter the newly created credentials in the Catalog App integrations settings.
- Go to Settings > Integrations.
- Click BigQuery Add.
- Add the credentials.
.png)
Troubleshooting
This section covers common issues and how to resolve them.
First Ingestion Stays In Progress Longer Than Expected
Large projects, many data sets, or busy query history can extend the first ingestion window. Catalog may still be scanning metadata and query jobs while the UI shows a first-run state.
Resolution:
- Allow at least one full business day before assuming a stall, then confirm the service account still has BigQuery Job User, BigQuery Metadata Viewer, BigQuery Read Session User, and BigQuery Resource Viewer on every GCP project you want in Catalog.
- Verify the JSON key
project_idmatches a project the service account can run jobs from (see the note under Add New Credentials in Catalog App). - If status does not change after roles and project IDs are correct, contact Coalesce Support with your Catalog region and approximate start time of the integration.
Cannot Connect After Saving Credentials
Network policies or VPC restrictions may block Catalog's outbound IPs from reaching BigQuery.
Resolution:
- Confirm you followed Whitelist Catalog's IP on BigQuery and allowed both Catalog fixed IPs for your hostname, whether you use
app.castordoc.comorapp.us.castordoc.com. - Retry credentials in Settings > Integrations after the network team confirms the allowlist.
- If errors persist, capture the error text from the integration screen and contact Coalesce Support.
Lineage or Usage Looks Empty Right After Setup
Query parsing and usage statistics depend on successful job metadata reads in BigQuery.
Resolution:
- Keep BigQuery Job User and BigQuery Resource Viewer on the projects where analysts run queries.
- Wait for ingestion to finish a full cycle so jobs from the last day can appear in Catalog's query sampling windows.
- For ongoing gaps on specific tables, verify table-level metadata is visible to the Catalog service account in the BigQuery console.
Wrong Projects or Data Sets Appear in Catalog
Roles might be granted on only one project while analysts use several.
Resolution:
- Revisit Grant Roles to Google Service Account and apply the same role bundle to each GCP project that should appear.
- Remove or narrow scope only after confirming with your governance team so you do not hide production assets unexpectedly.
Credentials Rejected After Rotating the Service Account Key
Catalog stores the JSON key you uploaded; a rotated key in Google Cloud alone does not update Catalog.
Resolution:
- Download the new JSON key from Google Cloud and replace the full credential payload in Settings > Integrations > BigQuery.
- Save and trigger a connection test if your UI offers one.
- Revoke the old key in Google Cloud after Catalog validates successfully.
Appendix
Queries Run by the Catalog User
-
Databases
WITH catalogs AS (SELECT DISTINCT catalog_nameFROM `{project}.region-{region}.INFORMATION_SCHEMA.SCHEMATA`)SELECTcatalog_name AS database_id,catalog_name AS database_nameFROM catalogs -
Schemas
SELECTcatalog_name AS database_id,catalog_name AS database_name,schema_name,schema_owner,creation_time,last_modified_time,location,CONCAT(catalog_name, '.', schema_name) AS schema_idFROM `{project}.region-{region}.INFORMATION_SCHEMA.SCHEMATA` -
Tables
WITH d AS (SELECTtable_catalog,table_schema,table_name,option_value AS `comment`FROM`{project}.region-{region}.INFORMATION_SCHEMA.TABLE_OPTIONS`WHERE TRUEAND option_name = 'description'AND option_value IS NOT NULLAND option_value != ''AND option_value != '""'),t AS(SELECTtable_catalog,table_schema,table_name,option_value AS tagsFROM`{project}.region-{region}.INFORMATION_SCHEMA.TABLE_OPTIONS`WHERE TRUEAND option_name = 'labels'AND option_value IS NOT NULLAND option_value != ''AND option_value != '""'),dt AS(SELECTcatalog_name,schema_name,option_value AS tagsFROM`{project}.region-{region}.INFORMATION_SCHEMA.SCHEMATA_OPTIONS`WHERE TRUEAND option_name = 'labels'AND option_value IS NOT NULLAND option_value != ''AND option_value != '""')SELECTi.table_catalog AS database_name,i.table_catalog AS database_id,i.table_schema AS `schema_name`,i.table_name AS table_name,i.table_type,i.is_insertable_into,i.is_typed,i.creation_time,d.comment,CONCAT(COALESCE(t.tags, ""),COALESCE(dt.tags, "")) AS tags,CONCAT(i.table_catalog, '.', i.table_schema) AS schema_id,CONCAT(i.table_catalog, '.', i.table_schema, '.', i.table_name) AS table_idFROM`{project}.region-{region}.INFORMATION_SCHEMA.TABLES` AS iLEFT JOIN d ON i.table_catalog = d.table_catalogAND i.table_schema = d.table_schemaAND i.table_name = d.table_nameLEFT JOIN t ON i.table_catalog = t.table_catalogAND i.table_schema = t.table_schemaAND i.table_name = t.table_nameLEFT JOIN dt ON i.table_catalog = dt.catalog_nameAND i.table_schema = dt.schema_name -
Columns
WITH field_path AS (SELECTtable_catalog,table_schema,table_name,column_name,description,field_path,data_typeFROM`{project}.region-{region}.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`)SELECTc.table_catalog AS database_id,c.table_catalog AS database_name,c.table_schema AS `schema_name`,c.table_name AS table_name,f.field_path AS column_name,c.ordinal_position,c.is_nullable,f.data_type,c.is_generated,c.generation_expression,c.is_stored,c.is_hidden,c.is_updatable,c.is_system_defined,c.is_partitioning_column,c.clustering_ordinal_position,f.description AS `comment`,CONCAT(c.table_catalog, '.', c.table_schema) AS schema_id,CONCAT(c.table_catalog, '.', c.table_schema, '.', c.table_name) AS table_id,CONCAT(c.table_catalog, '.', c.table_schema, '.', c.table_name, '.', f.field_path) AS column_idFROM`{{project}}.region-{{region}}.INFORMATION_SCHEMA.COLUMNS` AS cLEFT JOIN field_path AS f ONc.table_catalog = f.table_catalogAND c.table_schema = f.table_schemaAND c.table_name = f.table_nameAND c.column_name = f.column_nameWHERE TRUEAND c.column_name != '_PARTITIONTIME' -
Users
SELECT DISTINCT user_email AS user_emailFROM `{project}.region-{region}.INFORMATION_SCHEMA.JOBS_BY_PROJECT`WHERE TRUEAND DATE(creation_time) >= DATE_ADD(CURRENT_DATE, INTERVAL -30 DAY) -
Queries
SELECTjob_id AS query_id,creation_time,project_id AS database_name,user_email AS user_name,user_email AS user_id,job_type,statement_type,priority,start_time,end_time,query AS query_text,state,reservation_id,total_bytes_processed,total_bytes_billed,total_slot_ms,error_result,cache_hit,destination_table,referenced_tables,labels,parent_job_idFROM `{project}.region-{region}.INFORMATION_SCHEMA.JOBS_BY_PROJECT`WHERE TRUEAND DATE(creation_time) = DATE_ADD(CURRENT_DATE, INTERVAL -1 DAY)AND EXTRACT(hour FROM creation_time) BETWEEN 0 AND 23AND job_type = 'QUERY' -
View DDL
SELECTtable_catalog AS database_name,table_schema AS schema_name,table_name AS view_name,'{{"project_id": "' || table_catalog || '", "dataset_id": "' || table_schema || '", "table_id": "' || table_name || '"}}' as destination_table,view_definitionFROM `{project}.{dataset}.INFORMATION_SCHEMA.VIEWS`
References
Here's a list of all Google Cloud documentation referenced above: