Question 1

How do you define the identity domain model for Customer 360?

Accepted Answer

We start by separating business semantics from implementation constraints. The identity domain model typically includes entities such as person, account, household, device, and sometimes location or subscription, plus the relationships between them. We define cardinality (one-to-many, many-to-many), relationship direction, and which edges are allowed to drive merges versus only provide context. Next, we map each entity and relationship to concrete data sources and identifiers. For example, a person may be anchored by a verified login ID, while a device relationship may be derived from app instance IDs. We also define which attributes are mastered where (CRM, support, ecommerce) and how updates propagate. Finally, we validate the model against priority use cases: segmentation, suppression, personalization, and measurement. If the model cannot support a use case without unsafe assumptions (e.g., household-based suppression in a shared email scenario), we adjust the model or constrain the use case. The output is a reference model that becomes the basis for data contracts and implementation patterns.
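As a minimal sketch of how such a reference model could be written down, the following Python expresses entities, relationship cardinality, and whether an edge may drive merges. All names here (`EntityType`, `EdgeRole`, `RelationshipType`, `MODEL`, and the three example relationships) are hypothetical and chosen only for illustration; a real model would come out of the modeling work described above.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    PERSON = "person"
    ACCOUNT = "account"
    HOUSEHOLD = "household"
    DEVICE = "device"


class EdgeRole(Enum):
    MERGE = "merge"      # edge is allowed to drive identity merges
    CONTEXT = "context"  # edge only adds context and never merges


@dataclass(frozen=True)
class RelationshipType:
    name: str
    source: EntityType
    target: EntityType
    cardinality: str   # "one-to-many" or "many-to-many", read source -> target
    role: EdgeRole
    anchored_by: str   # identifier the edge is derived from


# Hypothetical reference model; every entry is an illustrative assumption.
MODEL = [
    RelationshipType("logs_in_as", EntityType.PERSON, EntityType.ACCOUNT,
                     "many-to-many", EdgeRole.MERGE, "verified_login_id"),
    RelationshipType("uses_device", EntityType.PERSON, EntityType.DEVICE,
                     "many-to-many", EdgeRole.CONTEXT, "app_instance_id"),
    RelationshipType("has_member", EntityType.HOUSEHOLD, EntityType.PERSON,
                     "one-to-many", EdgeRole.CONTEXT, "postal_address"),
]


def merge_driving_edges(model):
    """Return the names of relationships that are allowed to drive merges."""
    return [r.name for r in model if r.role is EdgeRole.MERGE]
```

Keeping the merge-versus-context decision in the model itself means a reviewer can see at a glance which relationships carry merge authority.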

Question 2

Where should identity resolution run: CDP, data warehouse, or both?

Accepted Answer

The right placement depends on latency requirements, governance needs, and how many downstream systems must consume identity outputs. Running identity resolution in the warehouse (or a dedicated identity service) often provides stronger versioning, testability, and auditability because the logic is expressed as code and can be validated with controlled datasets. It also makes identity outputs reusable across analytics and multiple activation tools. Running identity resolution in the CDP can be appropriate when the CDP is the primary system of record for profiles and you need near-real-time stitching for personalization. The trade-off is that CDP-native rule configuration can be harder to version, test, and reproduce outside the platform. In many enterprises, a hybrid approach works best: foundational identity entities and stable identifiers are resolved upstream (warehouse/lakehouse), while the CDP performs limited, well-governed stitching for real-time events. We document the boundary explicitly: which merges are authoritative, how identity versions are published, and how to prevent conflicting logic across layers.

Question 3

What operational metrics should we monitor for identity resolution?

Accepted Answer

We recommend monitoring metrics that detect both quality drift and unsafe behavior. Core metrics include duplicate rate (multiple profiles representing the same person), merge rate (how often identities are combined), unmerge rate (how often merges are reversed), and profile churn (how frequently a profile’s identifiers or key attributes change). Sudden spikes often indicate a source change, a parsing issue, or an overly permissive rule. Coverage metrics are equally important: percentage of events linked to a known identity, percentage of profiles with verified identifiers, and match yield by source. These show whether identity resolution is improving or degrading as new channels are added. Finally, add guardrails tied to risk: rate of merges driven by weak identifiers, number of profiles exceeding expected identifier counts, and consent propagation failures. We define thresholds, alerting, and investigation runbooks so teams can respond quickly. The goal is to treat identity as an operational system with observable behavior, not a static configuration.
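The metrics above can be sketched as simple ratios with a drift check against a baseline. This is an illustrative example only: the denominators (for instance, merge rate as merges per profile), the `IdentitySnapshot` fields, and the 50% relative-change threshold are assumptions, and each team should agree on its own definitions.

```python
from dataclasses import dataclass


@dataclass
class IdentitySnapshot:
    profiles: int
    duplicate_profiles: int
    merges: int
    unmerges: int
    events: int
    events_linked: int


def rates(s):
    """Compute core and coverage metrics for one reporting period."""
    return {
        "duplicate_rate": s.duplicate_profiles / s.profiles,
        "merge_rate": s.merges / s.profiles,
        "unmerge_rate": s.unmerges / s.merges if s.merges else 0.0,
        "event_link_coverage": s.events_linked / s.events,
    }


def breaches(current, baseline, max_relative_change=0.5):
    """Return metric names whose relative change from baseline exceeds the threshold."""
    flagged = []
    for name, base in baseline.items():
        if base == 0:
            if current[name] > 0:
                flagged.append(name)
            continue
        if abs(current[name] - base) / base > max_relative_change:
            flagged.append(name)
    return flagged
```

A flagged metric would then trigger the alerting and runbook steps rather than an automatic rollback.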

Question 4

How do we change matching rules without breaking downstream reporting and activation?

Accepted Answer

We treat identity rules as versioned artifacts with a controlled release process. Changes start with an impact assessment: which identifiers and sources are affected, expected changes to merge behavior, and which downstream systems consume the identity outputs. We then run the new rules against a representative dataset to compare metrics such as duplicates, merges, and segment membership deltas. For rollout, we typically recommend a phased approach. First, publish the new identity version in parallel (shadow mode) while keeping the current version as the operational default. This allows validation with real traffic and real downstream queries without changing production behavior. Once validated, we coordinate a cutover window and communicate expected changes to stakeholders. We also define rollback criteria and procedures, including how to handle profiles created or merged during the transition. This approach reduces surprises in dashboards, attribution, and campaign audiences while still allowing the identity layer to evolve.
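One way to compare a shadow version with the operational default is to look at which records end up grouped differently, independent of how profile IDs are named. The sketch below assumes each version is available as a hypothetical mapping from record ID to profile ID; the function names are illustrative.

```python
def profile_count(assignment):
    """Number of distinct profiles in a record -> profile mapping."""
    return len(set(assignment.values()))


def regrouped_records(current, candidate):
    """Return records whose set of co-members differs between two versions.

    Comparing co-member sets rather than profile IDs keeps the result stable
    when the candidate version simply renames profiles.
    """
    def co_members(assignment):
        by_profile = {}
        for record, profile in assignment.items():
            by_profile.setdefault(profile, set()).add(record)
        return {rec: frozenset(by_profile[prof]) for rec, prof in assignment.items()}

    before, after = co_members(current), co_members(candidate)
    return sorted(rec for rec in before if before[rec] != after.get(rec))


# Example: the candidate rules merge r1 and r2 and leave r3 alone.
v1 = {"r1": "p1", "r2": "p2", "r3": "p3"}
v2 = {"r1": "pA", "r2": "pA", "r3": "pB"}
changed = regrouped_records(v1, v2)
```

The same comparison can be run per segment to estimate segment membership deltas before cutover.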

Question 5

How do you onboard new data sources into an existing identity strategy?

Accepted Answer

Onboarding starts with a source assessment focused on identifiers, data quality, and update behavior. We identify which identifiers the source provides, whether they are verified, and how stable they are over time. We also evaluate how the source represents entities (person vs account) and whether it introduces new relationships that must be modeled. Next, we define a data contract: required fields, normalization rules, validation checks, and how the source should publish changes (full snapshots vs incremental updates). We map the source into the identity domain model and specify which edges it can create. For example, a support system may contribute verified email and account relationships but should not drive merges based on free-text fields. Finally, we validate match yield and risk using test datasets and monitoring during an initial release. The goal is to integrate the source predictably, without silently changing the meaning of “customer” for downstream teams.
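A data contract of this kind can be sketched as a small structure plus two checks: one for required fields and one that limits which fields may create merge edges. The `SourceContract` shape, the field names, and the rule that only verified emails may merge are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceContract:
    source: str
    required_fields: frozenset
    merge_fields: frozenset  # fields allowed to create merge edges
    delivery: str            # "snapshot" or "incremental"


def missing_fields(record, contract):
    """Return required fields that are absent or empty in the record."""
    return sorted(f for f in contract.required_fields if record.get(f) in (None, ""))


def merge_identifiers(record, contract):
    """Return identifiers this record may contribute as merge edges.

    Free-text or unlisted fields never qualify, and an incomplete or
    unverified record contributes nothing.
    """
    if missing_fields(record, contract) or record.get("email_verified") is not True:
        return {}
    return {f: record[f] for f in sorted(contract.merge_fields) if record.get(f)}


# Hypothetical contract for a support system.
SUPPORT = SourceContract(
    source="support",
    required_fields=frozenset({"ticket_id", "email", "email_verified"}),
    merge_fields=frozenset({"email"}),
    delivery="incremental",
)
```

Here a free-text `notes` field can still be stored for context, but it cannot influence matching because it is not listed in `merge_fields`.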

Question 6

How do you handle identity mapping to activation platforms and ad ecosystems?

Accepted Answer

We start by defining which identifiers are allowed for activation (e.g., hashed email, phone, platform-specific IDs) and the consent requirements for each destination. We then map identity entities to activation needs: some channels require person-level identifiers, while others operate at account or household level. Next, we specify export rules and suppression logic. This includes how to prevent sending identities that are unverified, recently changed, or outside retention windows, and how to ensure opt-outs propagate across all linked identities. We also define how identity versions are represented so that activation systems can handle changes without duplicating audiences. Finally, we address operational concerns: rate limits, incremental exports, and reconciliation. We recommend maintaining an auditable record of what identifiers were exported, when, and under which consent state. This supports compliance inquiries and helps debug discrepancies between CDP audiences and destination platform counts.
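The export rules can be illustrated with a small filter that only emits hashed emails for verified, consented, non-opted-out profiles, and stamps each row with the identity version. The `Profile` fields and the trim-and-lowercase normalization before SHA-256 hashing are assumptions for this sketch; each destination documents its own normalization and hashing requirements.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Profile:
    profile_id: str
    email: str
    email_verified: bool
    consented_destinations: frozenset
    opted_out: bool = False


def hash_email(email):
    """SHA-256 of the trimmed, lowercased email (assumed normalization)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def export_rows(profiles, destination, identity_version):
    """Build export rows for one destination, skipping ineligible profiles."""
    rows = []
    for p in profiles:
        if p.opted_out or not p.email_verified:
            continue
        if destination not in p.consented_destinations:
            continue
        rows.append({
            "profile_id": p.profile_id,
            "hashed_email": hash_email(p.email),
            "destination": destination,
            "identity_version": identity_version,
        })
    return rows
```

Persisting these rows with an export timestamp gives the auditable record of what was sent and under which consent state.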

Question 7

Who should own identity resolution rules and decisions?

Accepted Answer

Ownership should be split between decision rights and implementation responsibility. Typically, a data governance or Customer 360 steering group owns the definition of identity semantics: what constitutes a person, which identifiers are authoritative, and what merge constraints are required for risk management. Platform architecture and data engineering own implementation details: pipelines, performance, testing, and operational monitoring. Marketing operations and analytics stakeholders should be involved because they experience the downstream effects of identity changes in segmentation and reporting. Privacy and security stakeholders must have explicit approval points when identity resolution affects consent propagation, retention, or cross-region data movement. We formalize this as a RACI model and a change workflow. Identity rule changes should have documented rationale, expected impact, validation results, and an approval trail. This reduces ad-hoc changes made under delivery pressure and makes identity behavior stable and explainable over time.

Question 8

How do you make identity decisions auditable and explainable?

Accepted Answer

Auditability requires lineage at both the identity and attribute levels. For identity links, we capture why a match occurred: which identifiers matched, which rule version was applied, and the timestamp and source of the contributing records. For probabilistic links, we record the signals used and the confidence score or threshold that triggered the link. For profile attributes, we define survivorship rules and store provenance: which source provided the current value, what precedence logic selected it, and what alternative values were available. This is especially important for regulated environments and for operational debugging when stakeholders question why a profile changed. We also recommend versioning identity rules and publishing release notes that summarize changes and expected impacts. Combined with monitoring metrics, this creates a traceable chain from a business question (“why did this customer move segments?”) to the underlying identity events and rule decisions that caused the change.
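Attribute-level provenance can be sketched as a survivorship function that returns not just the winning value but also the source, the precedence applied, the rule version, and the alternatives that lost. The source precedence order and all names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeCandidate:
    value: str
    source: str
    observed_at: str  # ISO 8601 date, so string comparison orders by time


# Hypothetical precedence: earlier entries win over later ones.
SOURCE_PRECEDENCE = ["crm", "ecommerce", "support"]


def survive(candidates, rule_version):
    """Pick the surviving value and record why it was chosen."""
    best_rank = min(SOURCE_PRECEDENCE.index(c.source) for c in candidates)
    top = [c for c in candidates if SOURCE_PRECEDENCE.index(c.source) == best_rank]
    winner = max(top, key=lambda c: c.observed_at)  # most recent within the best source
    return {
        "value": winner.value,
        "source": winner.source,
        "rule_version": rule_version,
        "precedence": list(SOURCE_PRECEDENCE),
        "alternatives": [c for c in candidates if c is not winner],
    }
```

Storing the returned record alongside the profile makes it possible to answer "why did this value change?" without re-running the pipeline.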

Question 9

How do you reduce the risk of over-merging identities?

Accepted Answer

Over-merging is usually caused by treating weak or shared identifiers as unique. We mitigate this by classifying identifiers by strength and context, then restricting which identifiers can drive merges. For example, we may allow merges on verified login IDs and confirmed emails, but treat unverified emails, shared phone numbers, or device signals as relationships that add context rather than merge authority. We also implement merge constraints and anomaly detection. Constraints can include limits on how many distinct people can share an identifier, rules that prevent merges across incompatible account contexts, and quarantine paths when a record triggers conflicting signals. Monitoring focuses on spikes in merge rate, unusually large identity clusters, and sudden changes in identifier coverage. Finally, we define unmerge policies and operational procedures. Even with good design, edge cases occur. Having a documented and tested unmerge path reduces the long-term damage of a bad rule and makes teams more confident in evolving the identity layer safely.
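The combination of identifier classification and merge constraints can be illustrated with a small union-find resolver. In this sketch, only identifier types listed in `MERGE_TYPES` can merge records, and any identifier shared by more than `MAX_RECORDS_PER_IDENTIFIER` records is quarantined instead of merged. Both settings and the input shape are assumptions for illustration, not recommended values.

```python
from collections import defaultdict

MERGE_TYPES = {"verified_login_id", "confirmed_email"}  # assumed strong identifiers
MAX_RECORDS_PER_IDENTIFIER = 3                          # assumed sharing limit


def resolve(observations):
    """Cluster records from (record_id, id_type, value) observations.

    Returns (clusters, quarantined_identifiers).
    """
    by_identifier = defaultdict(set)
    for record_id, id_type, value in observations:
        by_identifier[(id_type, value)].add(record_id)

    parent = {record_id: record_id for record_id, _, _ in observations}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    quarantined = []
    for (id_type, value), records in sorted(by_identifier.items()):
        if id_type not in MERGE_TYPES:
            continue  # weak identifiers add context only
        if len(records) > MAX_RECORDS_PER_IDENTIFIER:
            quarantined.append((id_type, value))
            continue
        ordered = sorted(records)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    clusters = defaultdict(set)
    for record_id in parent:
        clusters[find(record_id)].add(record_id)
    return sorted(sorted(c) for c in clusters.values()), quarantined
```

A shared device ID or an over-shared mailbox address therefore never collapses unrelated people into one profile; it is either ignored for merging or routed to review.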

Question 10

How do privacy, consent, and retention affect identity resolution design?

Accepted Answer

Privacy constraints shape what identifiers you can store, how long you can retain them, and how you can link identities across contexts. We start by mapping consent categories and lawful bases to identity operations: collection, linking, enrichment, and activation. This determines which edges in the identity graph are permitted and under what conditions. Retention policies affect both identifiers and derived relationships. If an identifier must be deleted after a period, the identity graph must support edge expiration and downstream propagation so that activation and analytics do not continue using stale links. We also consider regional data residency: identity resolution may need to run in-region, or identity outputs must be partitioned to avoid cross-border linkage. We design for minimization and separation of concerns: store only required identifiers, prefer hashed or tokenized forms where appropriate, and ensure consent state is part of the identity contract. The goal is to make compliance enforceable by architecture, not dependent on manual process.
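Edge expiration can be sketched as a filter applied before identity outputs are published, so that expired or non-consented links never reach activation or analytics. The retention periods, the `Edge` fields, and the choice to drop identifier types with no defined retention are assumptions for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per identifier type, in days.
RETENTION_DAYS = {"hashed_email": 730, "device_id": 90}


@dataclass(frozen=True)
class Edge:
    profile_id: str
    id_type: str
    value: str
    last_seen: date
    consent_granted: bool


def active_edges(edges, today):
    """Keep only edges that are consented and still inside their retention window."""
    kept = []
    for edge in edges:
        limit = RETENTION_DAYS.get(edge.id_type)
        if limit is None or not edge.consent_granted:
            continue  # unknown identifier types are dropped by default
        if today - edge.last_seen > timedelta(days=limit):
            continue  # retention window has passed
        kept.append(edge)
    return kept
```

Because the filter runs as part of publishing, downstream consumers do not need their own retention logic.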

Question 11

What are the concrete deliverables of an identity resolution strategy engagement?

Accepted Answer

Deliverables are designed to be implementable by engineering teams and usable for governance. Typically this includes an identity domain model (entities, relationships, cardinality), an identifier taxonomy with normalization and validation rules, and a match/merge specification covering deterministic rules, optional probabilistic inputs, and merge constraints. We also deliver survivorship rules for key attributes, including precedence and lineage requirements, plus unmerge/split policies and exception handling. On the operational side, we provide monitoring metrics, alert thresholds, and runbooks for investigating anomalies and responding to incidents. Finally, we produce an operationalization plan: where identity resolution runs, how identity versions are published, data contracts for source onboarding, and a phased rollout backlog aligned to priority use cases. The intent is that teams can move directly from strategy to implementation without re-interpreting ambiguous guidance.

Question 12

How long does an identity resolution strategy typically take?

Accepted Answer

Timing depends on the number of sources, complexity of identity semantics, and the level of governance required. For a focused scope with a handful of primary sources (e.g., CRM, web/app events, email platform), strategy definition and specifications commonly take 4–8 weeks. This includes discovery, modeling, rule design, and validation planning. For larger enterprises with many sources, multiple regions, and strict compliance requirements, the work may extend to 8–12 weeks or be structured as phases. In those cases, we prioritize a minimal viable identity model and deterministic rules for the highest-value use cases, then expand coverage iteratively. We also distinguish between “strategy complete” and “fully implemented.” Implementation and rollout often run in parallel with strategy work once the core decisions are made, especially when teams need to stabilize Customer 360 outputs quickly. We plan milestones around decision points and measurable identity metrics rather than arbitrary dates.

Question 13

When should we use deterministic versus probabilistic identity matching?

Accepted Answer

Deterministic matching is preferred when you have strong, verified identifiers and when identity outputs drive operational actions such as suppression, consent enforcement, or regulated communications. Deterministic rules are easier to explain, test, and audit, and they produce stable behavior that downstream teams can rely on. Probabilistic matching can be useful when deterministic identifiers are sparse, especially for analytics use cases that benefit from broader linkage (e.g., cross-device measurement). The key is to constrain where probabilistic links are allowed and to avoid using them as authoritative merges for operational profiles unless you have strong governance and clear risk acceptance. We typically recommend a layered approach: deterministic identity as the core Customer 360, with probabilistic relationships represented as separate edges or annotations that can be used selectively. This allows teams to gain value from weaker signals without contaminating the canonical identity layer or creating hard-to-reverse merges.
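The layered approach can be illustrated by keeping probabilistic links as separate, annotated edges and selecting them per use case. The use-case names, the `Link` shape, and the 0.9 confidence threshold below are hypothetical.

```python
from dataclasses import dataclass

# Assumed set of use cases that must rely on deterministic identity only.
OPERATIONAL_USE_CASES = {"suppression", "consent_enforcement", "regulated_communications"}


@dataclass(frozen=True)
class Link:
    left: str
    right: str
    kind: str          # "deterministic" or "probabilistic"
    confidence: float  # 1.0 for deterministic links


def links_for(use_case, links, min_confidence=0.9):
    """Select the links a use case is allowed to rely on."""
    deterministic = [l for l in links if l.kind == "deterministic"]
    if use_case in OPERATIONAL_USE_CASES:
        return deterministic
    probabilistic = [
        l for l in links
        if l.kind == "probabilistic" and l.confidence >= min_confidence
    ]
    return deterministic + probabilistic
```

Because probabilistic links are never written into the canonical profile, removing or re-scoring them later does not require an unmerge.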

Question 14

How do you version identity rules and manage backward compatibility?

Accepted Answer

We version identity rules the same way you version APIs: with explicit releases, documented changes, and compatibility expectations. Each rule set has an identifier (version number or date-based tag), and identity outputs include the version used to compute matches and survivorship. This makes it possible to reproduce results and compare behavior across releases. Backward compatibility depends on downstream dependencies. If reporting and activation systems cannot tolerate sudden changes, we recommend publishing identity outputs in parallel for a period (v1 and v2) and providing mapping guidance for consumers. For example, segments may be computed against v2 while dashboards continue to use v1 until stakeholders validate the deltas. We also define deprecation policies: how long old versions remain available, how to migrate consumers, and what metrics must be stable before retiring a version. This reduces operational risk and prevents identity logic from becoming an unmanageable set of one-off exceptions.
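A minimal sketch of parallel publishing: each identity output row carries the rule version that produced it, consumers pin the version they read, and a version becomes retirable only when no consumer is pinned to it. The version tags and consumer names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityRecord:
    record_id: str
    profile_id: str
    rule_version: str  # version of the rule set that computed this match


def publish(assignments, rule_version):
    """Stamp every record -> profile assignment with the rule version."""
    return [
        IdentityRecord(record_id, profile_id, rule_version)
        for record_id, profile_id in sorted(assignments.items())
    ]


def retirable_versions(published_versions, consumer_pins):
    """Return published versions that no consumer is still pinned to."""
    in_use = set(consumer_pins.values())
    return sorted(v for v in published_versions if v not in in_use)


# Example: dashboards stay on the old version while segments move ahead.
pins = {"dashboards": "2024.1", "segments": "2024.2"}
```

A real deprecation policy would also check that agreed metrics are stable before retiring a version; this sketch covers only the consumer-pin condition.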

Question 15

How do data quality issues affect identity resolution, and how do you address them?

Accepted Answer

Identity resolution is highly sensitive to data quality because small inconsistencies in identifiers can produce large downstream effects. Common issues include unnormalized emails and phones, placeholder values, recycled identifiers, inconsistent account keys, and late-arriving updates. These lead to under-linking (fragmented profiles) or over-linking (incorrect merges). We address this by defining validation and normalization as part of the identity contract, not as an optional cleanup step. Sources must meet minimum quality thresholds to participate in matching, and we specify how to handle invalid or ambiguous identifiers (reject, quarantine, or treat as non-mergeable context). We also recommend ongoing quality monitoring tied to identity metrics: identifier coverage by source, invalid rate, and changes in match yield. When quality drifts, teams can isolate the source and prevent it from driving merges until it is corrected. This keeps the identity layer stable even when upstream systems are imperfect.
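Normalization and the reject/quarantine decision can be sketched as small functions that run before matching. The placeholder list, the deliberately loose email pattern, and the phone handling (digits only, with a default country code added to 10-digit numbers) are simplifying assumptions; production rules are usually stricter and locale-aware.

```python
import re

# Hypothetical placeholder values seen in source systems.
PLACEHOLDER_EMAILS = {"test@test.com", "none@none.com", "noreply@example.com"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose shape check only


def normalize_email(raw):
    """Return (normalized_email_or_None, decision) for a raw email value."""
    if raw is None:
        return None, "reject"
    email = raw.strip().lower()
    if not EMAIL_PATTERN.match(email):
        return None, "reject"
    if email in PLACEHOLDER_EMAILS:
        return email, "quarantine"  # keep for review, never use for merges
    return email, "accept"


def normalize_phone(raw, default_country_code="1"):
    """Return a '+<digits>' phone string, or None if the value is unusable."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 10:
        digits = default_country_code + digits
    if not 11 <= len(digits) <= 15:
        return None
    return "+" + digits
```

Counting the "reject" and "quarantine" decisions per source gives the invalid rate mentioned above.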

Question 16

How does collaboration typically begin for identity resolution strategy work?

Accepted Answer

Collaboration typically begins with a short alignment and discovery phase focused on scope and evidence. We start with stakeholder interviews across data, architecture, marketing operations, analytics, and privacy to confirm the priority use cases and the decisions that must be made (identity semantics, merge constraints, consent propagation). In parallel, we request a lightweight inventory of sources, identifiers, and current stitching logic. Next, we run a working session to map the current identity flow end-to-end: where identifiers are created, how they are transformed, where merges happen today, and which downstream systems consume identity outputs. We also agree on baseline metrics to measure change, such as duplicate rate and profile churn. From there, we propose a phased plan with clear decision points, required inputs, and a delivery cadence that fits your teams. The first tangible outputs are usually the identity domain model and identifier taxonomy, followed by match/merge specifications and governance controls that engineering can implement incrementally.

See where identity resolution is creating CDP risk

Identity Resolution Strategy

Cross-channel identity stitching with governed matching rules

Unified customer profile architecture for identifiers, merges, and survivorship

Scalable identity foundations for Customer 360 and activation

Inconsistent Identity Stitching Breaks Customer 360

Identity Resolution Strategy Delivery Method

Identity Discovery

Domain Modeling

Identifier Taxonomy

Deterministic vs Probabilistic Match Design

Survivorship Rules

Privacy-Aware Governance and Controls

Operationalization Plan

Core Capabilities for CDP Identity Resolution Design

Identity Domain Model

Identifier Normalization

Deterministic Matching Rules

Probabilistic Matching Inputs

Merge and Split Policies

Survivorship and Lineage

Governance and Monitoring

Surface the identity risks that block reliable CDP outcomes

Delivery Model

Discovery and Inventory

Current-State Assessment

Target Architecture Design

Rule and Policy Specification

Integration and Activation Mapping

Validation and Test Design

Governance and Handover

Business Impact

More Reliable Customer 360

Reduced Duplicate Profiles

Lower Operational Risk

Faster Onboarding of Sources

Improved Consent Enforcement

Better Attribution Consistency

Maintainable Governance Model

Validate CDP identity architecture before scaling changes

Identity Resolution and Customer Data Platform Case Studies

JYSK: Global Retail DXP & CDP Transformation

Organogenesis: Scalable Multi-Brand Next.js Monorepo Platform

Further reading on identity resolution governance

CDP Survivorship Rules: How to Reconcile CRM, Product, and Support Data Without Polluting the Customer Profile

CDP Schema Registry Strategy: How Enterprise Teams Keep Event Contracts Governable Across Channels

CDP Implementation Pitfalls: Why Customer Data Programs Stall After the Pilot

Identity Resolution Pitfalls: How False Merges Damage CDP Trust

Why Customer Data Platforms Fail Without Activation Ownership

Define a stable identity foundation

Oleksiy (Oly) Kalinichenko

CTO at PathToProject
