Home
/
Unify
/
Profiles sync
/
Profiles Sync Sample Queries

Profiles Sync Sample Queries

On this page, you’ll find queries that you can run with Profiles Sync to address common use cases.

The examples in this guide are based on a Snowflake installation. If you’re using another warehouse, you may need to adjust the syntax.

About example schemas

The queries on this page use two example schemas:

ps_segment, a schema where Segment lands data
ps_ materialize, a schema with your produced materializations

These schema names may not match your own.

Monitor and diagnose identity graphs

These queries let you view and manage identity graphs, which give you insight into unified customer profiles generated by identity resolution.

Show how many profiles Segment creates and merges per hour

This example queries the id_graph_udpates table to measure the rate at which Segment creates and merges profiles, as well as the type of event that triggered the profile change:

SELECT
    DATE_TRUNC('hour',timestamp) as hr,
    CASE
      WHEN canonical_segment_id=segment_id
      THEN 'profile creation' ELSE 'profile merge'
    END as profile_event,
    triggering_event_type,
    COUNT(DISTINCT triggering_event_id) as event_count
FROM ps_segment.id_graph_updates
GROUP BY 1,2,3

Isolate profiles that have reached an identifier’s maximum configured value

Segment’s configurable identifier limits let you set maximum values for identifiers like email. These maximum configured values help prevent two separate users from being merged into a single Profile.

The following query lets you view Profiles that have reached a configured limit for the email identifier:

WITH agg AS (
    SELECT
        canonical_segment_id,
        COUNT(LOWER(TRIM(external_id_value))) as value_count,
        LISTAGG(external_id_value,', ') as external_id_values
    FROM ps_materialize.external_id_mapping
    WHERE external_id_type='email'
    GROUP BY 1
)
SELECT
    canonical_segment_id,
    external_id_values,
    value_count
FROM agg
WHERE value_count > 5 -- set to your configured limit

Reconstruct a profile’s traits

Identify the source that generated the value for a particular trait for a canonical profile as well as its child profiles

When a merge occurs, Segment selects and associates a single trait value with a profile. This logic depends on how you materialize the profile_traits table.

You can break out a profile, though, to see the trait versions that existed before the merge. As a result, you can identify a particular trait’s origin.

The following example inspects a particular profile, use_XX, and trait, trait_1. The query reports the profile’s last observed trait, its source ID, and any profiles Segment has since merged into the profile:

SELECT * FROM (
  SELECT
    ids.canonical_segment_id,
    ident.segment_id,
    ident.event_source_id,
    ident.trait_1,
    row_number() OVER(PARTITION BY ident.segment_id ORDER BY ident.timestamp DESC) as rn
  FROM ps_segment.identifies as ident
  INNER JOIN ps_materialize.id_graph as ids
ON ids.segment_id = ident.segment_id
AND ids.canonical_segment_id = 'use_XXX'
AND ident.trait_1 IS NOT NULL
) WHERE rn=1

Measure and model your customer base

Pull a complete list of your customers, along with their merges, external identifiers, or traits

The following three snippets will provide a full list of your customers, along with:

The profile IDs merged into that customer:

SELECT
  canonical_segment_id,
  LISTAGG(segment_id, ', ') as associated_segment_ids
FROM ps_materialize.id_graph
GROUP BY 1

The external IDs associated with that customer:

SELECT
  canonical_segment_id,
  LISTAGG(external_id_value || '(' || external_id_type || ')', ', ') as associated_segment_ids
FROM ps_materialize.external_id_mapping
GROUP BY 1

The customer’s traits:

SELECT * FROM ps_materialize.profile_traits WHERE merged_to IS NULL

Pull the latest subscription status set for every profile identifier in the space

Provides the latest subscription status set for all identifiers in the space. This query will not include identifiers that have no subscription status ever set.

SELECT evt1.user_id,  evt1.channel, evt1._id id, evt1.status, evt1.received_at
FROM ps_segment.CHANNEL_SUBSCRIPTION_UPDATED evt1
JOIN (
  SELECT _id, MAX(received_at) AS max_received_at
  FROM ps_segment.CHANNEL_SUBSCRIPTION_UPDATED
  GROUP BY _id
) evt2
ON evt1._id = evt2._id AND evt1.received_at = evt2.max_received_at
ORDER BY 1

Show all pages visited by a user

To get complete user histories, join event tables to the identity graph and aggregate or filter with id_graph.canonical_segment_id:

SELECT
    id_graph.canonical_segment_id,
    pages.*
FROM ps_segment.pages
LEFT JOIN ps_materialize.id_graph
    ON id_graph.segment_id = pages.segment_id
WHERE canonical_segment_id = ‘use_XX..’

Show the complete history of a trait or audience membership associated with a customer

Suppose you want to track a user’s entrances and exits of the audience aud_1. Running the following query would return all qualifying entrance and exits:

SELECT
    id_graph.canonical_segment_id,
    identifies.aud_1,
    identifies.timestamp
FROM ps_segment.identifies
INNER JOIN ps_materialize.id_graph
    ON id_graph.segment_id = identifies.segment_id
    AND identifies.aud_1 IS NOT NULL

This query works with any Trait or Audience membership, whether computed in Engage or instrumented upstream.

FAQs

Can I view Engage Audience membership and Computed Trait values in my Warehouse?

Yes. Engage sends updates to Audience membership (as a boolean) and computed trait value updates as traits on an Identify call that Segment forwards to your data warehouse.

The column name corresponds to the Audience or Trait key shown on the settings page:

Surface these values the same way as any other trait value:

The Trait’s complete history will be in identifies
The Trait’s current state for each customer will be in profile_traits

What is the relationship between `segment_id` and `canonical_segment_id`? Are they unique?

Identity merges change Segment’s understanding of who performed historical events.

For example, if profile_b completed a “Product Purchased” event but Segment understands that profile_b should be merged into profile_a, Segment deduces that profile_a performed that initial “Product Purchased” event.

With that in mind, here’s how to differentiate between segment_id and canonical_segment_id:

segment_id is a unique identifier representing Segment’s understanding of who performed an action at the time the action happened.
canonical_segment_id is a unique identifier representing Segment’s current understanding of who performed that action.

The mapping between these two identifiers materializes in your id_graph table. If a profile has not been merged away, then segment_id is equivalent to canonical_segment_id. If a profile has been merged away, id_graph reflects that state.

As a result, you can retrieve a customer’s complete event history by joining an event table, like product_purchased to id_graph.

For more information, view the Profiles Sync tables guide.

Does Profiles Sync data ever differ from Unify data?

Profiles Sync mimics the materialization performed by Segment Unify. A user’s merges, external IDs, and traits should be expected whether they’re queried in the warehouse, Profile API, or viewed in the UI.

The following edge cases might drive slight (<0.01%) variation:

Data processed by Unify hasn’t yet landed in Profiles Sync.
If you rebuild or use non-incremental materialization for profile_traits, Profiles Sync will fully calculate traits against a user. As a result, Profiles Sync would ensure that all traits reflect the most recently observed value for fully-merged users.

By contrast, Segment Unify and incrementally-built Profiles Sync materializations won’t combine already-computed traits across two merged profiles at the moment of merge. Instead, one profile’s traits will be chosen across the board.

What hash function is used for the external_id_hash field by Profiles Sync?

The external_id_hash is a hash of the external_id_type and external_id_value using SHA-1. This field corresponds to the primary_key for the table: hash (external_id_type and external_id_value). For example, in BigQuery the logic is: TO_HEX(SHA1(concat(external_id_type, external_id_value))) as seg_hash.

This page was last modified: 04 Jul 2024

Need support?

Questions? Problems? Need more info? Contact Segment Support for assistance!

Visit our Support page

Help improve these docs!

Edit this page Request docs change

Was this page helpful?

Get started with Segment

Segment is the easiest way to integrate your websites & mobile apps data to over 300 analytics and growth tools.

Create free account

Profiles Sync Sample Queries

About example schemas

Monitor and diagnose identity graphs

Show how many profiles Segment creates and merges per hour

Isolate profiles that have reached an identifier’s maximum configured value

Reconstruct a profile’s traits

Identify the source that generated the value for a particular trait for a canonical profile as well as its child profiles

Measure and model your customer base

Pull a complete list of your customers, along with their merges, external identifiers, or traits

Pull the latest subscription status set for every profile identifier in the space

Show all pages visited by a user

Show the complete history of a trait or audience membership associated with a customer

FAQs

Can I view Engage Audience membership and Computed Trait values in my Warehouse?

What is the relationship between segment_id and canonical_segment_id? Are they unique?

Does Profiles Sync data ever differ from Unify data?

What hash function is used for the external_id_hash field by Profiles Sync?

Need support?

Help improve these docs!

Was this page helpful?

Get started with Segment

What is the relationship between `segment_id` and `canonical_segment_id`? Are they unique?