Lookalike Audience Modeling
“Which users look like our best converters?”
Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example
Which users look like our best converters?
Platform-native lookalike tools operate on limited signals and treat each user in isolation. They miss the behavioral graph: which content users consume, which products they browse, and how their engagement patterns cluster. For a DTC brand spending $20M on acquisition, a 25% improvement in lookalike quality can mean roughly $5M in incremental revenue from the same ad spend, assuming revenue scales with audience quality and baseline acquisition revenue is on par with spend.
Quick answer
Graph neural networks build lookalike audiences by learning deep behavioral similarity across browsing patterns, purchase history, content engagement, and demographic signals. Unlike platform-native tools that match on surface-level demographics, GNN-based audiences capture multi-dimensional behavioral patterns, producing 25-40% higher conversion rates from the same ad spend.
Approaches compared
4 ways to solve this problem
1. Platform-native lookalikes (Meta, Google)
Upload a seed list of converters to the ad platform. The platform finds similar users based on its own signals (demographics, interests, in-platform behavior).
Best for
Fast setup, zero engineering. Works well when your seed list is large (10K+) and your product appeals broadly.
Watch out for
You have no visibility into what 'similar' means. The platform optimizes for its own metrics, not yours. Quality degrades sharply as you scale beyond 1-2% lookalike size.
2. Propensity scoring on CRM data
Train a classification model (logistic regression, XGBoost) on your first-party data to score non-converters by conversion likelihood.
Best for
Teams with rich CRM data that want full control over the model. Interpretable and auditable.
Watch out for
Limited to the features you engineer. Misses behavioral graph signals like 'users who browse similar product sequences' or 'users connected to multiple converters.'
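A minimal sketch of propensity scoring, assuming two hypothetical CRM features (sessions and cart adds in the last 30 days) and toy training data. The hand-rolled gradient-descent loop only keeps the example dependency-free; in practice a library implementation of logistic regression or XGBoost would take its place.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weights, bias):
    """Conversion propensity for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic(rows, labels, lr=0.05, epochs=2000):
    """Fit weights and bias with batch gradient descent on log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical first-party features: [sessions_30d, cart_adds_30d]
train_rows = [[1, 0], [2, 0], [3, 1], [8, 2], [10, 3], [12, 4]]
train_labels = [0, 0, 0, 1, 1, 1]  # 1 = converted
weights, bias = train_logistic(train_rows, train_labels)

# Score non-converters and rank them by predicted propensity.
prospects = {"P1": [11, 3], "P2": [2, 0], "P3": [6, 1]}
scores = {uid: predict(x, weights, bias) for uid, x in prospects.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The model only sees the columns it is given, which is the limitation noted above: any graph signal has to be engineered into a feature first.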
3. Collaborative filtering / embedding similarity
Learn user embeddings from interaction matrices (views, clicks, purchases) and find nearest neighbors to your seed audience in embedding space.
Best for
Captures behavioral co-occurrence patterns. Works well for media and e-commerce with dense interaction data.
Watch out for
Treats each interaction type independently. Cannot combine browsing, purchasing, and demographic signals in a single model without extensive engineering.
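As a rough illustration of the nearest-neighbor step, the sketch below ranks non-seed users by cosine similarity to the centroid of the seed audience. The embeddings are made-up 3-dimensional vectors, and using a single centroid is a simplifying assumption; real systems often search neighbors per seed user with an approximate nearest-neighbor index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def lookalikes(embeddings, seed_ids, top_k):
    """Return the top_k non-seed users closest to the seed centroid."""
    seed = centroid([embeddings[u] for u in seed_ids])
    candidates = [u for u in embeddings if u not in seed_ids]
    candidates.sort(key=lambda u: cosine(embeddings[u], seed), reverse=True)
    return candidates[:top_k]

# Hypothetical user embeddings learned from an interaction matrix.
embeddings = {
    "U1": [0.90, 0.10, 0.00],  # seed converter
    "U2": [0.80, 0.20, 0.10],  # seed converter
    "U3": [0.85, 0.15, 0.05],
    "U4": [0.10, 0.90, 0.20],
    "U5": [0.50, 0.50, 0.00],
}
audience = lookalikes(embeddings, {"U1", "U2"}, top_k=2)
print(audience)
```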
4. KumoRFM (relational graph ML)
Connect users, behaviors, segments, conversions, and demographics into a single graph. The GNN learns holistic user embeddings that encode behavioral similarity across all dimensions simultaneously.
Best for
Highest-quality audiences. Captures behavioral graph similarity (browsing sequences, category affinities, social connections to converters) that no single-table model can represent.
Watch out for
Requires first-party behavioral data in normalized tables. Adds most value when you have multiple interaction types (browse, click, purchase, engage) to connect.
Key metric: Graph-learned lookalike audiences convert 25-40% better than platform-native demographic-based lookalikes at equivalent audience sizes.
Why relational data changes the answer
Audience quality is determined by how well you measure similarity between users. Platform-native lookalikes measure similarity on a handful of demographic and interest signals. CRM-based propensity models add purchase history but still treat each user as an independent row. Neither approach captures the behavioral graph: which product categories a user browses in sequence, which content they engage with deeply, and how their behavior patterns cluster with existing converters.
Relational models encode all of this into a single user embedding. Two users who look nothing alike demographically (different age, different city, different income) but who exhibit the same browsing-to-purchase sequence for the same product categories will end up close in the embedding space. On the RelBench benchmark, relational models score 76.71 vs 62.44 for single-table approaches -- a gap that translates directly to audience quality and downstream conversion rates.
Platform-native lookalikes are like a dating app that matches people by height, age, and city. You get surface-level similarity but miss compatibility. Graph-based audience modeling is like a matchmaker who watches how people spend their weekends, what they read, what they cook, and who their friends are. The behavioral fingerprint is what predicts a real match, not the demographic profile.
How KumoRFM solves this
Graph-powered intelligence for advertising
Kumo encodes users, behaviors, segments, conversions, and demographics into a single graph. The GNN learns user embeddings that capture deep behavioral similarity, not just demographic overlap. PQL's RANK TOP operator surfaces the highest-scoring non-converters, giving media buyers a ready-to-activate audience list ranked by predicted conversion probability.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
USERS
| user_id | signup_date | geo | device |
|---|---|---|---|
| U301 | 2024-06-15 | US-West | iOS |
| U302 | 2024-09-20 | US-East | Android |
| U303 | 2025-01-05 | EU-West | iOS |
BEHAVIORS
| event_id | user_id | action | category | timestamp |
|---|---|---|---|---|
| E601 | U301 | page_view | Electronics | 2025-02-28 |
| E602 | U302 | add_to_cart | Fashion | 2025-03-01 |
| E603 | U303 | page_view | Electronics | 2025-03-01 |
SEGMENTS
| segment_id | user_id | segment_name |
|---|---|---|
| SEG01 | U301 | High-intent |
| SEG02 | U302 | Browsers |
| SEG03 | U303 | New-visitor |
CONVERSIONS
| conversion_id | user_id | value | timestamp |
|---|---|---|---|
| CVR201 | U301 | $320 | 2025-02-28 |
DEMOGRAPHICS
| user_id | age_range | income_tier | interests |
|---|---|---|---|
| U301 | 25-34 | High | Tech, Fitness |
| U302 | 35-44 | Medium | Fashion, Travel |
| U303 | 25-34 | High | Tech, Gaming |
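To make the "single graph" idea concrete, here is a small sketch that turns the BEHAVIORS and CONVERSIONS sample rows into a graph and walks it. This is an illustration in plain Python, not Kumo's internal representation. Promoting the category column to its own node type is an assumption made so that shared categories show up as paths.

```python
from collections import defaultdict

# Sample rows copied from the tables above (subset of columns).
behaviors = [
    {"event_id": "E601", "user_id": "U301", "category": "Electronics"},
    {"event_id": "E602", "user_id": "U302", "category": "Fashion"},
    {"event_id": "E603", "user_id": "U303", "category": "Electronics"},
]
conversions = [{"conversion_id": "CVR201", "user_id": "U301"}]

def build_graph(behaviors, conversions):
    """Nodes are (type, id) pairs; foreign keys become undirected edges."""
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for row in behaviors:
        event = ("event", row["event_id"])
        link(("user", row["user_id"]), event)
        link(event, ("category", row["category"]))
    for row in conversions:
        link(("user", row["user_id"]), ("conversion", row["conversion_id"]))
    return adj

def neighbors(adj, node, kind):
    """Neighbors of node that have the given node type."""
    return {n for n in adj[node] if n[0] == kind}

def users_near_converters(adj):
    """Non-converters reachable via converter -> event -> category -> event -> user."""
    users = [n for n in adj if n[0] == "user"]
    converters = {u for u in users if neighbors(adj, u, "conversion")}
    related = set()
    for user in converters:
        for event in neighbors(adj, user, "event"):
            for category in neighbors(adj, event, "category"):
                for other_event in neighbors(adj, category, "event"):
                    related |= neighbors(adj, other_event, "user")
    return sorted(uid for _, uid in related - converters)

graph = build_graph(behaviors, conversions)
print(users_near_converters(graph))
```

On the sample rows, U303 is linked to the converter U301 through the shared Electronics category. A row-per-user model would only see that link if someone built a feature for it.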
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT BOOL(CONVERSIONS.conversion_id, 0, 30, days)
FOR EACH USERS.user_id WHERE COUNT(CONVERSIONS.*, -365, 0, days) = 0
RANK TOP 100000
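One way to read the query is sketched below in plain Python. It is an interpretation of the query's surface syntax, not Kumo's implementation. The helper names, the anchor dates, and the treatment of each window as start-exclusive and end-inclusive are all assumptions. The scores passed to the ranking step are taken from the sample prediction output.

```python
from datetime import date, timedelta

users = ["U301", "U302", "U303"]
conversions = [
    {"conversion_id": "CVR201", "user_id": "U301", "timestamp": date(2025, 2, 28)},
]

def conversions_in_window(conversions, user_id, anchor, start_days, end_days):
    """Count a user's conversions in (anchor + start_days, anchor + end_days]."""
    lo = anchor + timedelta(days=start_days)
    hi = anchor + timedelta(days=end_days)
    return sum(
        1 for c in conversions
        if c["user_id"] == user_id and lo < c["timestamp"] <= hi
    )

def eligible_users(users, conversions, anchor):
    """WHERE clause: no conversions in the 365 days before the anchor."""
    return [
        u for u in users
        if conversions_in_window(conversions, u, anchor, -365, 0) == 0
    ]

def training_label(conversions, user_id, anchor):
    """PREDICT target: did the user convert in the 30 days after the anchor?"""
    return conversions_in_window(conversions, user_id, anchor, 0, 30) > 0

def rank_top(scores, n):
    """RANK TOP n: keep the n highest-scoring users."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

today = date(2025, 3, 2)  # hypothetical prediction date
print(eligible_users(users, conversions, today))
print(training_label(conversions, "U301", date(2025, 2, 1)))
print(rank_top({"U303": 0.34, "U302": 0.18}, n=1))
```

U301 drops out of the candidate set because it converted within the lookback window, which matches its absence from the sample output.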
Prediction output
Every entity gets a score, updated continuously
| USER_ID | CONVERSION_PROB | RANK | SEGMENT |
|---|---|---|---|
| U303 | 0.34 | 1 | New-visitor |
| U302 | 0.18 | 2 | Browsers |
| U508 | 0.15 | 3 | Re-engaged |
Understand why
Every prediction includes feature attributions — no black boxes
User U303 -- New-visitor segment
Predicted: 34% conversion probability (Rank #1)
Top contributing features
| Feature | Value | Attribution |
|---|---|---|
| Browsing pattern similarity to converters | 92% match | 33% |
| Category affinity overlap | Electronics | 25% |
| Device and geo match to seed audience | iOS + US-West | 18% |
| Session depth last 7 days | 12 pages | 14% |
| Connected users who converted | 3 of 8 | 10% |
Feature attributions are computed automatically for every prediction. No separate tooling required.
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about lookalike audience modeling
How do you build lookalike audiences without third-party data?
Focus on first-party behavioral data: browsing sequences, content engagement, purchase patterns, and email interactions. Graph models extract maximum signal from this data by connecting multiple interaction types through their natural relationships. The richer your first-party behavioral data, the less you depend on third-party audience signals.
What is the best way to find high-value customers for targeting?
Build behavioral similarity models on your first-party data rather than relying on platform-native lookalikes. Graph neural networks learn which behavioral patterns (browsing sequences, category affinities, engagement depth) predict high-value conversion, producing audiences that convert 25-40% better than demographic-only targeting.
How do lookalike audiences scale without losing quality?
Quality degrades with scale because as you expand the audience, you include less-similar users. Graph-based models degrade more gracefully because they measure similarity on a richer set of behavioral dimensions. Where platform lookalikes lose effectiveness at 2-3% expansion, graph-learned audiences maintain quality up to 5-8% because the similarity signal is stronger.
What data do you need for audience modeling?
A seed list of converters, user behavioral events (page views, clicks, add-to-cart, purchases) with timestamps, and user profiles. For best results, add content metadata, product categories, and segment memberships. More connected tables give the model more dimensions of similarity to learn from.
How do you measure lookalike audience quality?
Track incremental conversion rate and incremental ROAS against a holdout group. The only metric that matters is whether the lookalike audience converts at a meaningfully higher rate than a random sample. Graph-based audiences typically show 25-40% higher conversion rates than platform-native lookalikes of the same size.
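The holdout comparison can be sketched as follows. All campaign numbers are hypothetical, and the two-proportion z-test is one common way to check that a difference in conversion rates is not noise; it is an assumption here, since the text does not name a test.

```python
import math

def relative_lift(test_rate, control_rate):
    """Relative improvement of the test rate over the control rate."""
    return (test_rate - control_rate) / control_rate

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (conv_a / n_a - conv_b / n_b) / se

def incremental_roas(test_revenue, control_revenue, test_users, control_users, spend):
    """Revenue above the holdout baseline (scaled to test size) per unit of spend."""
    baseline = control_revenue * test_users / control_users
    return (test_revenue - baseline) / spend

# Hypothetical campaign results: lookalike audience vs. holdout group.
test_users, test_conversions, test_revenue = 50_000, 1_300, 130_000
control_users, control_conversions, control_revenue = 50_000, 1_000, 100_000
spend = 20_000

lift = relative_lift(test_conversions / test_users, control_conversions / control_users)
z = two_proportion_z(test_conversions, test_users, control_conversions, control_users)
iroas = incremental_roas(test_revenue, control_revenue, test_users, control_users, spend)
print(f"lift={lift:.1%}  z={z:.2f}  incremental ROAS={iroas:.2f}")
```

A z value above about 1.96 corresponds to significance at the 5% level for a two-sided test. Lift should be reported alongside it, not on its own.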
Bottom line: A DTC brand spending $20M on acquisition could generate about $5M in incremental revenue by replacing platform-native lookalikes with Kumo's graph-learned audience models, given the 25% quality improvement described above. Behavioral graph similarity outperforms demographic-only targeting by 25-40%.
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.