Redshift

Extract Redshift metadata into Catalog using the castor-extractor package.

Prerequisites

Installation Required

Follow the castor-extractor installation instructions before running the extraction.

We strongly recommend creating a dedicated user to extract your metadata.

To create this user, follow the instructions for creating the Catalog user on Redshift.

SSL Certificate Verification

This client connects to Redshift using sslmode=verify-ca, which verifies the server certificate against a trusted certificate authority, so your certificates must be up to date. For more information, see AWS Redshift SSL support.

Run Extraction Script

Once the package has been installed, you should be able to run the following command in your terminal:

castor-extract-redshift [arguments]

The script will run and display logs as follows:

INFO - Extracting `DATABASE` ...
INFO - Results stored to /tmp/catalog/1649083626-database.csv


...

INFO - Extracting `USER` ...
INFO - Results stored to /tmp/catalog/1649083626-user.csv
INFO - Wrote output file: /tmp/catalog/1649083626-summary.json
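
To pick up the results of the most recent run, a small helper can locate the latest summary file. This is only a sketch: the function name latest_summary is hypothetical, and it assumes the file naming shown in the logs above (a numeric prefix of constant length followed by -summary.json), so that sorting by name also sorts by run.

```shell
# Hypothetical helper, not part of castor-extractor.
# Assumes summary files are named <numeric-prefix>-summary.json, as in the logs,
# and that all prefixes have the same number of digits.
latest_summary() {
  # $1: output directory; prints the path of the most recent summary file,
  # or nothing if the directory contains no summary file.
  ls -1 "$1"/*-summary.json 2>/dev/null | sort | tail -n 1
}
```

For example, latest_summary /tmp/catalog prints the path of the newest summary file in /tmp/catalog.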

Credentials

  • -H, --host: hostname
  • -P, --port: port number
  • -d, --database: database name
  • -u, --user: user
  • -p, --password: password

Other Arguments

  • -o, --output: target folder to store the extracted files
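
Putting the credentials and output arguments together, a full invocation might look like the sketch below. The hostname, database, and user are placeholders, and REDSHIFT_PASSWORD is a hypothetical variable used here only to keep the password out of the script. The wrapper function name is also an assumption.

```shell
# Sketch of a complete invocation using only the arguments listed above.
# All values are placeholders; REDSHIFT_PASSWORD is a hypothetical variable.
run_redshift_extraction() {
  castor-extract-redshift \
    --host "redshift.example.com" \
    --port 5439 \
    --database "db_name" \
    --user "extraction_user" \
    --password "${REDSHIFT_PASSWORD}" \
    --output "/tmp/catalog"
}
```

Calling run_redshift_extraction then runs the extraction with these values.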

Optional Arguments

  • --skip-existing: Skip files already extracted instead of replacing them
  • --serverless: Enables extraction for Redshift Serverless

Help

You can also get help with the --help argument.

Use ENV Variables

If you don't want to specify arguments every time, you can set the following environment variables in your .bashrc:

export CASTOR_REDSHIFT_HOST=127.0.0.0
export CASTOR_REDSHIFT_PORT=5439
export CASTOR_REDSHIFT_DATABASE=db_name
export CASTOR_REDSHIFT_USER=extraction_user
export CASTOR_REDSHIFT_PASSWORD=******

# Optional: enable Redshift Serverless
export CASTOR_REDSHIFT_SERVERLESS=true

export CASTOR_OUTPUT_DIRECTORY="/tmp/catalog"

Then the script can be executed without any arguments:

castor-extract-redshift

It can also be executed with partial arguments (the script falls back to your environment variables for any argument you omit):

castor-extract-redshift --output /tmp/catalog
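
When relying on environment variables, it can help to check that they are set before launching the extraction. The function below is a minimal sketch; its name is hypothetical, and it assumes the five credential variables listed above are all needed when no arguments are passed.

```shell
# Hypothetical pre-flight check, not part of castor-extractor.
# Assumes all five credential variables are required when no arguments are given.
check_castor_redshift_env() {
  missing=0
  for name in CASTOR_REDSHIFT_HOST CASTOR_REDSHIFT_PORT \
              CASTOR_REDSHIFT_DATABASE CASTOR_REDSHIFT_USER \
              CASTOR_REDSHIFT_PASSWORD; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  # Returns 0 when every variable is set, 1 otherwise.
  return "$missing"
}
```

It can be chained with the extraction so that the script only runs when every variable is present: check_castor_redshift_env && castor-extract-redshift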