Databricks Destination


With the Databricks Destination, you can ingest event data directly from Segment into your Databricks Lakehouse.

This page will help you get started with syncing Segment events into your Databricks Lakehouse.

Segment has certified the destination for Databricks on AWS and Azure.

Getting started

Before getting started with the Databricks Destination, note the following prerequisites.

  • The target Databricks workspace must have Unity Catalog enabled. Segment doesn’t support the Hive metastore. For more information, see the Databricks guide to enabling Unity Catalog.
  • Segment creates managed tables in Unity Catalog. The service principal that Segment uses needs permission to create schemas in the catalog and to delete, drop, or vacuum tables.
  • Segment supports only OAuth (M2M) for authentication.

Segment recommends that you enable Warehouse Selective Sync, which lets you choose the collections and properties sent to the warehouse. Syncing only the data you need reduces sync duration and compute costs. Learn more about Warehouse Selective Sync.

Warehouse size

A SQL warehouse is required for compute. Segment recommends a warehouse with the following characteristics:

  • Size: small
  • Type: Serverless, otherwise Pro
  • Clusters: Minimum of 2 - Maximum of 6

Segment recommends manually starting your SQL warehouse before setting up your Databricks destination. If the SQL warehouse isn’t running, Segment attempts to start it to validate the connection, and the request may time out when you click the Test Connection button during setup.
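
If you provision warehouses through automation rather than the Databricks UI, the recommended characteristics above can be written down as a request body. The following is a minimal sketch that assumes the field names of the Databricks SQL Warehouses REST API (cluster_size, min_num_clusters, max_num_clusters, warehouse_type, enable_serverless_compute). The function name and the warehouse name are hypothetical, and Segment does not require you to create the warehouse this way.

```python
import json


def recommended_warehouse_spec(name: str, serverless: bool = True) -> dict:
    """Build a warehouse spec that mirrors the recommendations in this section.

    Assumption: field names follow the Databricks SQL Warehouses REST API.
    Check them against the Databricks API reference before use.
    """
    return {
        "name": name,                             # hypothetical warehouse name
        "cluster_size": "Small",                  # Size: small
        "warehouse_type": "PRO",                  # Pro type; serverless is enabled on top of it
        "enable_serverless_compute": serverless,  # Serverless if available, otherwise Pro
        "min_num_clusters": 2,                    # Clusters: minimum of 2
        "max_num_clusters": 6,                    # Clusters: maximum of 6
    }


if __name__ == "__main__":
    print(json.dumps(recommended_warehouse_spec("segment-warehouse"), indent=2))
```

Pass serverless=False if serverless compute isn’t available in your workspace. The spec then describes a Pro warehouse with the same size and cluster limits.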

Set up Databricks in Segment

Use the following steps to set up Databricks in Segment:

  1. Navigate to Connections > Catalog.
  2. Select the Destinations tab.
  3. Under Connection Type, select Storage, and click on the Databricks storage tile.
  4. (Optional) Select one or more sources to connect to the destination.
  5. Follow the steps below to connect your Databricks warehouse.

Connect your Databricks warehouse

Use the five steps below to connect to your Databricks warehouse.

You’ll need read and write warehouse permissions for Segment to write to your database.

Step 1: Name your destination

Add a name to help you identify this warehouse in Segment. You can change this name at any time by navigating to the destination settings (Connections > Destinations > Settings) page.

Step 2: Enter the Databricks compute resources URL

Segment uses your Databricks workspace URL to access your workspace API.

Check your browser’s address bar when inside the workspace. The workspace URL should resemble: https://<workspace-deployment-name>.cloud.databricks.com. Remove any characters after this portion and note the URL for later use.
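
Trimming the address-bar URL by hand is easy to get wrong, so here is a small sketch of the same step in Python. It keeps only the scheme and host and drops the path, query string, and fragment. The function name and the example workspace name are hypothetical.

```python
from urllib.parse import urlsplit


def workspace_base_url(address_bar_url: str) -> str:
    """Return only the scheme and host of a URL copied from the browser."""
    parts = urlsplit(address_bar_url.strip())
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"Not an https workspace URL: {address_bar_url!r}")
    # Everything after the host (path, query string, fragment) is discarded.
    return f"https://{parts.hostname}"


if __name__ == "__main__":
    copied = "https://example-workspace.cloud.databricks.com/sql/warehouses?o=123"
    print(workspace_base_url(copied))
```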

Step 3: Enter a Unity catalog name

This catalog is the target catalog where Segment lands your schemas and tables.

  1. Follow the Databricks guide for creating a catalog. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, “Segment”). Note this name for later use.
  2. Select the catalog you’ve just created.
    1. Select the Permissions tab, then click Grant.
    2. Select the Segment service principal from the dropdown, and check ALL PRIVILEGES.
    3. Click Grant.
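
If you prefer to grant the privilege from a SQL editor instead of the Permissions tab, the steps above correspond to a single GRANT statement. The sketch below only builds that statement as a string. It assumes the Databricks SQL form GRANT ALL PRIVILEGES ON CATALOG ... TO ..., and that the service principal is referenced by its application ID. The function names and example values are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any backticks it contains."""
    return "`" + name.replace("`", "``") + "`"


def grant_all_privileges_sql(catalog: str, principal_application_id: str) -> str:
    """Build a GRANT statement equivalent to checking ALL PRIVILEGES in the UI.

    Assumption: the service principal is identified by its application ID.
    """
    return (
        f"GRANT ALL PRIVILEGES ON CATALOG {quote_identifier(catalog)} "
        f"TO {quote_identifier(principal_application_id)}"
    )


if __name__ == "__main__":
    # Both values are placeholders; substitute your catalog name and application ID.
    print(grant_all_privileges_sql("Segment", "00000000-0000-0000-0000-000000000000"))
```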

Step 4: Add the SQL warehouse details from your Databricks warehouse

Next, add SQL warehouse details about your compute resource.

  • HTTP Path: The connection details for your SQL warehouse.
  • Port: The port number of your SQL warehouse.

Step 5: Add the service principal client ID and OAuth secret

Be sure to note the principal ID and the OAuth secret Databricks generates, as you’ll need to enter them in this step.

Segment uses the service principal to access your Databricks workspace and associated APIs.

  1. Follow the Databricks guide for adding a service principal to your account. You can give the service principal any name, but Segment recommends one that identifies its purpose (for example, “Segment Storage Destinations”). Note the principal application ID that Databricks generates to use in this step. Segment doesn’t require Account admin or Marketplace admin roles.
  2. Follow the Databricks instructions to generate an OAuth secret. Note the secret generated by Databricks to use in this step. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one.
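
Before entering the client ID and secret in Segment, you may want to confirm that they work. The sketch below builds (but does not send) an OAuth M2M token request. It assumes the workspace-level Databricks token endpoint at /oidc/v1/token, the client_credentials grant, and the all-apis scope. The function name and example values are hypothetical, and this is not a description of how Segment itself authenticates.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(workspace_url: str, client_id: str, client_secret: str) -> Request:
    """Build a client-credentials token request for a service principal."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urlencode({"grant_type": "client_credentials", "scope": "all-apis"})
    return Request(
        url=f"{workspace_url.rstrip('/')}/oidc/v1/token",  # assumed token endpoint
        data=body.encode("ascii"),
        headers={
            # The client ID and OAuth secret are sent as HTTP Basic credentials.
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_token_request(
        "https://example-workspace.cloud.databricks.com",  # placeholder workspace URL
        "my-client-id",                                    # placeholder application ID
        "my-oauth-secret",                                 # placeholder secret
    )
    print(request.get_method(), request.full_url)
```

To send the request, pass the returned object to urllib.request.urlopen. A JSON response containing an access token indicates that the credentials are valid.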

Once connected, you’ll see a confirmation screen with next steps and more info on using your warehouse.

View observability metrics about your Databricks Destination with Delivery Overview

Delivery Overview, Segment’s built-in observability tool, is now in public beta for storage destinations. For more information, see the Delivery Overview documentation.

Security

Segment recommends enabling IP allowlists for added security. All Segment users with workspaces hosted in the US who use allowlists in their warehouses must update those allowlists to include the following ranges:

  • 52.25.130.38/32
  • 34.223.203.0/28

Users with workspaces in the EU must allowlist 3.251.148.96/29.
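
When auditing an allowlist or reading connection logs, it can help to check whether a given address falls inside the ranges listed above. The following sketch uses Python’s standard ipaddress module for that check. The dictionary and function names are hypothetical; the CIDR ranges are the ones from this section.

```python
import ipaddress

# CIDR ranges copied from this section, keyed by Segment workspace region.
SEGMENT_RANGES = {
    "US": ["52.25.130.38/32", "34.223.203.0/28"],
    "EU": ["3.251.148.96/29"],
}


def is_segment_ip(address: str, region: str) -> bool:
    """Return True if the address falls inside a Segment range for the region."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in SEGMENT_RANGES[region])


if __name__ == "__main__":
    for candidate in ("34.223.203.7", "34.223.203.16"):
        print(candidate, is_segment_ip(candidate, "US"))
```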

This page was last modified: 21 Oct 2024


