What data is needed for content personalization?

Kumo connects directly to your existing relational tables: USERS, ARTICLE_VIEWS. No ETL or feature engineering required. Write a PQL query and get explainable predictions in minutes.

Use Case #2 · Multi-Label · Content Personalization

Content Personalization

“For each user, what content categories will they engage with in the next 30 days?”

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example

Content platforms show the same trending articles to everyone or rely on simple tag-based matching. Users who read technology articles about AI infrastructure are shown generic tech news instead of the specific sub-topics they care about. Engagement drops, session duration shrinks, and ad revenue follows. For a major publisher with 50M monthly users, a 10% improvement in engagement translates to $8-12M in additional annual ad revenue.

Quick answer

Content personalization predicts which content categories each user will engage with over a future time window. Graph-based models learn from reading time, scroll depth, sharing behavior, and cross-user patterns to surface relevant content categories, delivering 2-4x engagement lift over rule-based tag matching.

Approaches compared

4 ways to solve this problem

1. Trending / Most Popular

Show the same trending articles to everyone. Rank by recency and aggregate view count. The default for most news and media platforms.

Best for

Breaking news and viral content where universal relevance is high. Also a solid fallback for anonymous or new users.

Watch out for

Zero personalization. A sports fan sees the same feed as a finance reader. Session depth suffers because users skip irrelevant content, reducing time on site and ad impressions.
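As a rough illustration of the "rank by recency and aggregate view count" baseline, the sketch below discounts each article's view count by its age. The exponential half-life decay, the function names, and the view counts are assumptions made for this example; real platforms combine the two signals in many different ways.

```python
from datetime import datetime, timedelta

def trending_score(view_count, published_at, now, half_life_hours=24.0):
    """View count discounted by article age (assumed exponential decay)."""
    age_hours = (now - published_at).total_seconds() / 3600.0
    return view_count * 0.5 ** (age_hours / half_life_hours)

def rank_trending(articles, now, top_k=3):
    """Return the top_k article ids. Every user receives this same list."""
    ranked = sorted(
        articles,
        key=lambda a: trending_score(a["views"], a["published_at"], now),
        reverse=True,
    )
    return [a["article_id"] for a in ranked[:top_k]]

now = datetime(2025, 2, 20, 12, 0)
# Hypothetical view counts and publication times, for illustration only.
articles = [
    {"article_id": "A1", "views": 1000, "published_at": now - timedelta(hours=48)},
    {"article_id": "A2", "views": 400, "published_at": now - timedelta(hours=2)},
    {"article_id": "A3", "views": 600, "published_at": now - timedelta(hours=24)},
]
feed = rank_trending(articles, now)
print(feed)
```

Note that `rank_trending` takes no user argument at all, which is the "zero personalization" limitation in code form.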

2. Tag-Based Matching

Match content tags to user interest tags. If a user reads 'Technology' articles, show more 'Technology' articles. Basic implementation available in most CMS platforms.

Best for

Content platforms with well-tagged articles and users with clear, single-category preferences.

Watch out for

Treats categories as independent. A user who reads about AI infrastructure is also likely interested in cloud architecture, but tag matching does not know this because the categories have different tags. Also creates filter bubbles by never exposing users to adjacent interests.
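A minimal sketch of tag matching, assuming each article and each user carries a plain set of tags. The article ids, tag names, and the `tag_match` helper are hypothetical and not taken from any specific CMS.

```python
def tag_match(user_tags, articles, top_k=5):
    """Recommend articles whose tags overlap the user's interest tags."""
    scored = []
    for article in articles:
        overlap = len(user_tags & article["tags"])
        if overlap > 0:
            scored.append((overlap, article["article_id"]))
    # Highest overlap first; ties broken by article id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored[:top_k]]

articles = [
    {"article_id": "A1", "tags": {"AI infrastructure"}},
    {"article_id": "A2", "tags": {"Cloud architecture"}},
    {"article_id": "A3", "tags": {"AI infrastructure", "Technology"}},
]
user_tags = {"AI infrastructure", "Technology"}
recs = tag_match(user_tags, articles)
print(recs)
```

The cloud architecture article (`A2`) shares no tag with the user, so it is never recommended, however closely related the two topics are in practice.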

3. Collaborative Filtering on Reading History

Find users with similar reading histories and recommend what they read. Standard recommendation approach applied to content.

Best for

Platforms with dense interaction data where 'readers who read X also read Y' patterns are strong.

Watch out for

Only sees the user-article interaction matrix. Cannot incorporate reading time (a 10-second bounce is treated the same as a 5-minute deep read), scroll depth, or sharing behavior. Also fails for new articles with no reads yet.
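The sketch below shows one simple form of collaborative filtering (user-based, with Jaccard similarity over sets of articles read). It is an illustrative variant chosen for brevity, not a description of any particular production system; the user ids, article ids, and function names are hypothetical.

```python
from collections import defaultdict

def jaccard(a, b):
    """Overlap between two sets of article ids, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, reads, top_k=3):
    """Score unseen articles by the similarity of the users who read them."""
    seen = reads[target]
    scores = defaultdict(float)
    for other, other_reads in reads.items():
        if other == target:
            continue
        similarity = jaccard(seen, other_reads)
        for article in other_reads - seen:
            scores[article] += similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, score in ranked[:top_k] if score > 0]

# The only input is who read what: no reading time, scroll depth, or shares.
reads = {
    "u1": {"a1", "a2"},
    "u2": {"a1", "a2", "a3"},
    "u3": {"a9"},
}
recs = recommend("u1", reads)
print(recs)
```

Because the input is a set of reads, a 10-second bounce and a 5-minute deep read are indistinguishable here, and an article that nobody has read yet can never receive a score.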

4. KumoRFM (Graph Neural Networks on Relational Data)

Models the full user-content-interaction graph: reading time, scroll depth, sharing behavior, and cross-user patterns. Discovers category connections (AI infrastructure readers engage with cloud architecture) that tag-based systems miss.

Best for

Content platforms with rich interaction data (reading time, scroll depth, shares) and diverse content categories with hidden cross-category connections.

Watch out for

Requires interaction-level data with engagement depth metrics (reading time, scroll). If your platform only logs page views without duration, the model has fewer quality signals to distinguish deep engagement from accidental clicks.

Key metric: Graph-based content personalization delivers 2-4x engagement lift over tag-based matching, driven by cross-category discovery patterns and interaction quality signals that tag systems cannot capture.

Why relational data changes the answer

User U001 (age 25-34) read 18 Technology articles in the past 30 days, with an average reading time of 4.1 minutes per article. Tag-based matching would simply show more Technology articles. The relational graph reveals richer patterns. First, 82% of users with similar reading profiles also read Technology content, confirming the category affinity. Second, the graph shows a Sports-Tech crossover pattern, because U001 reads sports analytics articles alongside pure tech pieces. Finally, session depth for Technology is 3.2 articles per session (high engagement), and the share rate is 0.15, which is 3x the platform average.

These signals live in different parts of the data. Reading time is in the ARTICLE_VIEWS table. Cross-user patterns require comparing U001's behavior to similar users through the user-article interaction graph. The Sports-Tech crossover pattern emerges from multi-hop graph traversal: users who read both Technology and Sports articles form a distinct cluster with specific sub-topic preferences. Tag-based matching would never surface a sports analytics article to someone tagged as a 'Technology reader,' but the graph model discovers this cross-interest pattern automatically. For publishers with 50M+ monthly users, graph-based content personalization delivers 2-4x engagement lift, translating to $8-12M in additional annual ad revenue from increased session depth and time on site.

Tag-based content matching is like a librarian who shelves books by genre and only recommends within the same genre. A graph-based model is like a librarian who notices that readers of Malcolm Gladwell (social science) also binge-read Michael Lewis (finance narratives) and Mary Roach (popular science). The genres are different, but the reader profiles overlap. The graph discovers these cross-genre connections automatically.

How KumoRFM solves this

Relational intelligence for true personalization

Kumo models the full user-content-interaction graph — reading time, scroll depth, sharing behavior, and cross-user patterns — to predict which content categories each user will engage with next. It discovers that users who read AI infrastructure pieces also engage deeply with cloud architecture content, a connection invisible to tag-based systems that treat categories as independent.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

USERS

user_id	age_group	signup_date
U001	25-34	2024-03-10
U002	35-44	2023-09-22
U003	18-24	2024-08-15

ARTICLE_VIEWS

view_id	user_id	article_id	category	read_sec	timestamp
V001	U001	ART501	Technology	245	2025-02-18
V002	U001	ART302	Sports	180	2025-02-19
V003	U002	ART718	Finance	312	2025-02-18
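To make signals such as "articles read per category" and "average reading time" concrete, the sketch below aggregates them from the three sample ARTICLE_VIEWS rows. This is only an illustration of what the signals mean; the page states that Kumo needs no manual feature engineering, and the `category_profile` helper is a hypothetical name.

```python
from collections import defaultdict

# Rows copied from the ARTICLE_VIEWS sample:
# (view_id, user_id, article_id, category, read_sec, timestamp)
views = [
    ("V001", "U001", "ART501", "Technology", 245, "2025-02-18"),
    ("V002", "U001", "ART302", "Sports", 180, "2025-02-19"),
    ("V003", "U002", "ART718", "Finance", 312, "2025-02-18"),
]

def category_profile(rows):
    """Per (user, category): number of views and average reading time in minutes."""
    totals = defaultdict(lambda: [0, 0])  # [view_count, total_read_sec]
    for _view_id, user_id, _article_id, category, read_sec, _ts in rows:
        entry = totals[(user_id, category)]
        entry[0] += 1
        entry[1] += read_sec
    return {
        key: {"views": count, "avg_read_min": round(total / count / 60, 2)}
        for key, (count, total) in totals.items()
    }

profile = category_profile(views)
for key, stats in sorted(profile.items()):
    print(key, stats)
```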

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT LIST_DISTINCT(ARTICLE_VIEWS.CATEGORY, 0, 30, days)
FOR EACH USERS.USER_ID
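One way to read this query: for each user and a given anchor date, the target is the set of distinct categories that user views in the following 30 days. The sketch below builds that target from the sample rows. The anchor dates, the exclusive-start and inclusive-end window boundaries, and the `list_distinct_label` helper are assumptions for illustration; the PQL documentation is the authority on the exact window semantics.

```python
from datetime import date, timedelta

def list_distinct_label(rows, user_id, anchor, start_days=0, end_days=30):
    """Distinct categories viewed in (anchor + start_days, anchor + end_days]."""
    window_start = anchor + timedelta(days=start_days)
    window_end = anchor + timedelta(days=end_days)
    return sorted({
        category
        for uid, category, day in rows
        if uid == user_id and window_start < day <= window_end
    })

# (user_id, category, view date) taken from the ARTICLE_VIEWS sample rows.
rows = [
    ("U001", "Technology", date(2025, 2, 18)),
    ("U001", "Sports", date(2025, 2, 19)),
    ("U002", "Finance", date(2025, 2, 18)),
]

label = list_distinct_label(rows, "U001", anchor=date(2025, 2, 1))
print(label)
```

Because a user can have several categories in the target at once, this is a multi-label prediction task rather than a single-class one.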

Prediction output

Every entity gets a score, updated continuously

USER_ID	CLASS	SCORE	TIMESTAMP
U001	Technology	0.93	2025-03-12
U001	Sports	0.81	2025-03-12
U002	Finance	0.88	2025-03-12
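Downstream, rows like these are typically grouped per user and cut off at a score threshold before they drive a feed or email digest. The sketch below does this in plain Python; the `min_score` and `top_k` values and the `top_categories` helper are hypothetical choices, not part of the Kumo output.

```python
from collections import defaultdict

# (USER_ID, CLASS, SCORE) from the sample prediction output.
predictions = [
    ("U001", "Technology", 0.93),
    ("U001", "Sports", 0.81),
    ("U002", "Finance", 0.88),
]

def top_categories(rows, min_score=0.5, top_k=3):
    """Per user: categories with score >= min_score, highest score first."""
    by_user = defaultdict(list)
    for user_id, category, score in rows:
        if score >= min_score:
            by_user[user_id].append((score, category))
    return {
        user_id: [category for _, category in sorted(pairs, reverse=True)[:top_k]]
        for user_id, pairs in by_user.items()
    }

feeds = top_categories(predictions)
print(feeds)
```

Raising `min_score` trades coverage for precision: at 0.85, U001 would keep Technology but lose Sports.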

Understand why

Every prediction includes feature attributions — no black boxes

User U001 (age 25-34, signed up 2024-03-10)

Predicted: Will engage with Technology content — score 0.93

Top contributing features

Feature	Value	Attribution
Technology articles read (30 days)	18 articles, avg 4.1 min	36%
Graph neighbors' top category	82% also read Technology	25%
Cross-category signal (Sports + Tech)	Sports-tech crossover pattern	18%
Session depth (Technology)	3.2 articles per session	13%
Share rate (Technology)	0.15 (3x platform avg)	8%

Feature attributions are computed automatically for every prediction. No separate tooling required.

PQL Documentation

Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.

Python SDK

Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.

Explainability Docs

Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.

Frequently asked questions

Common questions about content personalization

How does content personalization differ from content recommendation?

Content recommendation suggests specific articles. Content personalization predicts which categories a user will engage with, enabling personalized feed curation, homepage layout decisions, and email digest content selection. Personalization operates at the category level, which is more useful for editorial and layout decisions than individual article recommendations.

Can personalization work without user login data?

Yes. Graph models can personalize for anonymous users based on their current session behavior (articles viewed, reading time, scroll patterns) by finding similar patterns in logged-in users. A new visitor who spends 3 minutes reading an AI infrastructure article already has a strong behavioral signal for personalization.

How do you avoid creating filter bubbles?

Graph models naturally surface adjacent interests that tag-based systems miss. The Sports-Tech crossover example is exactly this: the model expands the user's content universe to related but not identical categories. You can also set a diversity parameter that ensures a minimum percentage of recommendations come from unexplored categories.
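The text does not say how the diversity parameter is implemented. As one possible interpretation, the sketch below reserves a minimum share of feed slots for categories the user has not explored yet and fills the rest by score. The `diversify` function, its parameter names, and the example categories are all hypothetical.

```python
import math

def diversify(ranked, explored, slots, min_unexplored_share=0.25):
    """Pick `slots` categories from a score-ranked list, reserving a minimum
    share of them for categories the user has not explored."""
    reserve = math.ceil(slots * min_unexplored_share)
    fresh = [c for c in ranked if c not in explored][:reserve]
    familiar = [c for c in ranked if c in explored][: slots - len(fresh)]
    chosen = set(fresh) | set(familiar)
    # Top up by score if either group was too small to fill its share.
    for category in ranked:
        if len(chosen) >= slots:
            break
        chosen.add(category)
    # Preserve the original score ordering in the output.
    return [c for c in ranked if c in chosen]

ranked = ["Technology", "Sports", "Finance", "Science", "Travel"]  # best first
explored = {"Technology", "Sports", "Finance", "Science"}
feed = diversify(ranked, explored, slots=4)
print(feed)
```

Without the reserve, the top four by score would all be categories the user already reads; with it, the lowest-scoring familiar category gives way to an unexplored one.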

What engagement metrics matter most for content personalization?

Reading time and scroll depth are more informative than page views. A 10-second bounce should not count as engagement. Session depth (articles per session) measures sustained interest. Share rate indicates content that is valuable enough to pass along. Graph models weight these interaction quality signals, not just interaction volume.

Bottom line: 2-4x engagement lift over rule-based personalization. For publishers with 50M+ monthly users, this translates to $8-12M in additional annual ad revenue.

Related use cases

Explore more personalization use cases

Use Case #1: Product Recommendations

Use Case #3: Search Ranking

Use Case #7: Notification Reranking

Topics covered

content personalization AI, multi-label classification, content recommendation engine, user engagement prediction, graph neural network content, KumoRFM, predictive query language, media personalization, content feed optimization, article recommendation, reader engagement AI, relational deep learning

From a leadership team with proven experience

Vanja Josifovski

CEO and Co-Founder, ex-CTO Airbnb, ex-CTO Pinterest

Jure Leskovec

Co-Founder & Chief Scientist, Stanford Professor

Hema Raghavan

Co-Founder & Head of Engineering, ex-AI Lead, LinkedIn

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
