Profiles Sync Sample Queries
On this page, you’ll find queries that you can run with Profiles Sync to address common use cases.
The examples in this guide are based on a Snowflake installation. If you’re using another warehouse, you may need to adjust the syntax.
About example schemas
The queries on this page use two example schemas:
ps_segment, a schema where Segment lands data
ps_ materialize, a schema with your produced materializations
These schema names may not match your own.
Monitor and diagnose identity graphs
These queries let you view and manage identity graphs, which give you insight into unified customer profiles generated by identity resolution.
Show how many profiles Segment creates and merges per hour
This example queries the
id_graph_udpates table to measure the rate at which Segment creates and merges profiles, as well as the type of event that triggered the profile change:
DATE_TRUNC('hour',timestamp) as hr,
THEN 'profile creation' ELSE 'profile merge'
END as profile_event,
COUNT(DISTINCT triggering_event_id) as event_count
GROUP BY 1,2,3
Isolate profiles that have reached an identifier’s maximum configured value
Segment’s configurable identifier limits let you set maximum values for identifiers like email. These maximum configured values help prevent two separate users from being merged into a single Profile.
The following query lets you view Profiles that have reached a configured limit for the email identifier:
WITH agg AS (
COUNT(LOWER(TRIM(external_id_value))) as value_count,
LISTAGG(external_id_value,', ') as external_id_values
GROUP BY 1
WHERE value_count > 5 -- set to your configured limit
Reconstruct a profile’s traits
Identify the source that generated the value for a particular trait for a canonical profile as well as its child profiles
When a merge occurs, Segment selects and associates a single trait value with a profile. This logic depends on how you materialize the
You can break out a profile, though, to see the trait versions that existed before the merge. As a result, you can identify a particular trait’s origin.
The following example inspects a particular profile,
use_XX, and trait,
trait_1. The query reports the profile’s last observed trait, its source ID, and any profiles Segment has since merged into the profile:
SELECT * FROM (
row_number() OVER(PARTITION BY ident.segment_id ORDER BY ident.timestamp DESC) as rn
FROM ps_segment.identifies as ident
INNER JOIN ps_materialize.id_graph as ids
ON ids.segment_id = ident.segment_id
AND ids.canonical_segment_id = 'use_XXX'
AND ident.trait_1 IS NOT NULL
) WHERE rn=1
Measure and model your customer base
Pull a complete list of your customers, along with their merges, external identifiers, or traits
The following three snippets will provide a full list of your customers, along with:
- The profile IDs merged into that customer:
LISTAGG(segment_id, ', ') as associated_segment_ids
GROUP BY 1
- The external IDs associated with that customer:
LISTAGG(external_id_value || '(' || external_id_type || ')', ', ') as associated_segment_ids
GROUP BY 1
- The customer’s traits:
SELECT * FROM ps_materialize.profile_traits WHERE merged_to IS NULL
Show all pages visited by a user
To get complete user histories, join event tables to the identity graph and aggregate or filter with
LEFT JOIN ps_materialize.id_graph
ON id_graph.segment_id = pages.segment_id
WHERE canonical_segment_id = ‘use_XX..’
Show the complete history of a trait or audience membership associated with a customer
Suppose you want to track a user’s entrances and exits of the audience
aud_1. Running the following query would return all qualifying entrance and exits:
INNER JOIN ps_materialize.id_graph
ON id_graph.segment_id = identifies.segment_id
AND identifies.aud_1 IS NOT NULL
This query works with any Trait or Audience membership, whether computed in Engage or instrumented upstream.
Can I view Engage Audience membership and Computed Trait values in my Warehouse?
Yes. Engage sends updates to Audience membership (as a boolean) and computed trait value updates as traits on an Identify call that Segment forwards to your data warehouse.
The column name corresponds to the Audience or Trait key shown on the settings page:
Surface these values the same way as any other trait value:
- The Trait’s complete history will be in
- The Trait’s current state for each customer will be in
What is the relationship between
canonical_segment_id? Are they unique?
Identity merges change Segment’s understanding of who performed historical events.
For example, if
profile_b completed a “Product Purchased” event but Segment understands that
profile_b should be merged into
profile_a, Segment deduces that
profile_a performed that initial “Product Purchased” event.
With that in mind, here’s how to differentiate between
segment_idis a unique identifier representing Segment’s understanding of who performed an action at the time the action happened.
canonical_segment_idis a unique identifier representing Segment’s current understanding of who performed that action.
The mapping between these two identifiers materializes in your
id_graph table. If a profile has not been merged away, then
segment_id is equivalent to
canonical_segment_id. If a profile has been merged away,
id_graph reflects that state.
As a result, you can retrieve a customer’s complete event history by joining an event table, like
For more information, view the Profiles Sync tables guide.
Does Profiles Sync data ever differ from Unify data?
Profiles Sync mimics the materialization performed by Segment Unify. A user’s merges, external IDs, and traits should be expected whether they’re queried in the warehouse, Profile API, or viewed in the UI.
The following edge cases might drive slight (<0.01%) variation:
- Data processed by Unify hasn’t yet landed in Profiles Sync.
- If you rebuild or use non-incremental materialization for
profile_traits, Profiles Sync will fully calculate traits against a user. As a result, Profiles Sync would ensure that all traits reflect the most recently observed value for fully-merged users.
By contrast, Segment Unify and incrementally-built Profiles Sync materializations won’t combine already-computed traits across two merged profiles at the moment of merge. Instead, one profile’s traits will be chosen across the board.
What hash function is used for the external_id_hash field by Profiles Sync?
external_id_hash is a hash of the
external_id_value using SHA-1. This field corresponds to the
primary_key for the table:
hash (external_id_type and external_id_value).
For example, in BigQuery the logic is:
TO_HEX(SHA1(concat(external_id_type, external_id_value))) as seg_hash.
This page was last modified: 22 Nov 2023
Questions? Problems? Need more info? Contact Segment Support for assistance!