What data is needed for content recommendations?

Kumo connects directly to your existing relational tables: SUBSCRIBERS, CONTENT, WATCH_HISTORY, RATINGS, GENRES. No ETL or feature engineering required. Write a PQL query and get explainable predictions in minutes.

Content Recommendations

“What should this subscriber watch next?”

Book a demo and get a free trial of the full platform: research agent, fine-tune capabilities, and forward-deployed engineer support.

A real-world example

What should this subscriber watch next?

Streaming platforms lose $1.5B annually to subscriber churn driven by poor discovery. When users can't find content they enjoy within 60-90 seconds, they disengage. Collaborative filtering alone misses the content graph: genre adjacencies, creator networks, and viewing context (time of day, device, co-viewers). For a platform with 50M subscribers, a 2% engagement lift from better recommendations prevents 200K cancellations worth $24M per year.
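
The figures in the example above can be sanity-checked with simple arithmetic. The sketch below only re-derives numbers from the quoted figures. It assumes the $24M is simply the number of prevented cancellations multiplied by an annual value per retained subscriber, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: annual value saved = cancellations prevented * annual value per subscriber.
subscribers = 50_000_000
cancellations_prevented = 200_000
annual_value_saved = 24_000_000  # USD

value_per_subscriber = annual_value_saved / cancellations_prevented
share_of_base = cancellations_prevented / subscribers

print(f"Implied annual value per retained subscriber: ${value_per_subscriber:.0f}")
print(f"Cancellations prevented as a share of the base: {share_of_base:.1%}")
```

Under that assumption, each retained subscriber is worth $120 per year (about $10 per month), and the 200K prevented cancellations are 0.4% of the 50M subscriber base.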

Quick answer

Graph neural networks predict what subscribers want to watch next by learning patterns across viewing history, ratings, content metadata, and genre relationships simultaneously. Unlike collaborative filtering, which only sees 'users who watched X also watched Y,' GNNs capture viewing context (time of day, device, genre trajectories), producing 15-25% higher engagement than traditional recommendation engines.

Approaches compared

4 ways to solve this problem

1. Collaborative filtering (matrix factorization)

Decompose the user-content interaction matrix into latent factors. The foundation of recommendation systems since the Netflix Prize era.

Best for

Strong baseline when you have dense interaction data. Well-understood, easy to implement, and fast at inference time.

Watch out for

Cold-start problem for new users and new content. Treats all interactions equally -- a user who watched 5% of a movie gets the same signal as one who watched 100%. Cannot incorporate content metadata or viewing context.
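
To make the approach concrete, here is a minimal sketch of matrix factorization trained with stochastic gradient descent. It is a toy illustration, not a production recommender: the interaction triples, the hyperparameters, and the helper names (factorize, mse) are all hypothetical, and unwatched titles are assumed to be sampled as explicit zeros.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def factorize(triples, n_users, n_items, k=2, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Learn k latent factors per user and per item by SGD on squared error."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, target in triples:
            err = target - dot(users[u], items[i])
            for f in range(k):
                uf, vf = users[u][f], items[i][f]
                users[u][f] += lr * (err * vf - reg * uf)
                items[i][f] += lr * (err * uf - reg * vf)
    return users, items

def mse(triples, users, items):
    return sum((t - dot(users[u], items[i])) ** 2 for u, i, t in triples) / len(triples)

# Toy data (hypothetical): 1.0 = watched, 0.0 = sampled as not watched.
# The target is binary, so a 5% view and a 100% view would both be 1.0.
triples = [
    (0, 0, 1.0), (0, 1, 1.0), (0, 2, 0.0), (0, 3, 0.0),
    (1, 0, 1.0), (1, 2, 0.0), (1, 3, 0.0),
    (2, 0, 0.0), (2, 1, 0.0), (2, 2, 1.0), (2, 3, 1.0),
    (3, 0, 0.0), (3, 1, 0.0), (3, 2, 1.0),
]

untrained = factorize(triples, n_users=4, n_items=4, epochs=0)
trained = factorize(triples, n_users=4, n_items=4)
print("training MSE before:", round(mse(triples, *untrained), 3))
print("training MSE after: ", round(mse(triples, *trained), 3))
# Score for the pair (user 1, item 1), which never appears in `triples`.
print("held-out score:", round(dot(trained[0][1], trained[1][1]), 3))
```

The only inputs are user and item indices plus a single target value, which is why this family of models has no place to put genre, device, or time of day. Replacing the binary target with completion percentage is one common mitigation for the equal-weighting issue, but it does not help with cold start.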

2. Content-based filtering (metadata similarity)

Recommend content similar to what the user has watched, based on genre, cast, director, and description embeddings.

Best for

Solves the cold-start problem for new content. Works without collaborative signals from other users.

Watch out for

Creates filter bubbles -- recommends more of the same without discovering new tastes. Cannot capture serendipitous discoveries that collaborative signals reveal.
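
A minimal sketch of metadata similarity, using only the genre column from the sample CONTENT rows shown later on this page. Splitting the genre string on "/" into tags and scoring with cosine similarity are illustrative assumptions. A real system would also use cast, director, and description embeddings.

```python
import math

# content_id -> genre, copied from the sample CONTENT table on this page.
CONTENT = {
    "MOV101": "Action/Thriller",
    "MOV102": "Drama",
    "SER201": "Sci-Fi/Drama",
}

def genre_tags(genre):
    """Split a combined genre label such as 'Sci-Fi/Drama' into a set of tags."""
    return set(genre.split("/"))

def cosine(tags_a, tags_b):
    """Cosine similarity between two binary tag sets."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / math.sqrt(len(tags_a) * len(tags_b))

def most_similar(content_id, catalog):
    """Rank every other title by tag overlap with `content_id`, best first."""
    base = genre_tags(catalog[content_id])
    scored = [
        (cosine(base, genre_tags(genre)), other_id)
        for other_id, genre in catalog.items()
        if other_id != content_id
    ]
    return sorted(scored, reverse=True)

print(most_similar("MOV102", CONTENT))
```

Quiet Garden (Drama) matches Code Black (Sci-Fi/Drama) through the shared Drama tag and scores zero against Neon Heist. The filter bubble is visible even at this scale: a subscriber who has only watched Neon Heist gets a similarity of zero for every other title.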

3. Deep learning hybrids (two-tower models, transformers)

Combine user and content embeddings in a deep model. Learn sequential viewing patterns with transformers. Used by Netflix, YouTube, and Spotify.

Best for

Captures sequential viewing behavior and can model complex user preferences. Scales well to large catalogs.

Watch out for

Requires significant engineering to incorporate multiple signal types (viewing history + ratings + search + browse). Each new signal type requires architectural changes.

4. KumoRFM (relational graph ML)

Connect subscribers, content, watch history, ratings, and genres into a unified graph. The GNN learns nuanced preferences from the full relational structure without custom architecture for each signal type.

Best for

Captures multi-hop patterns like 'users who watched X on mobile at night in this genre cluster tend to engage with Y within 48 hours.' Handles cold-start naturally through content graph connections.

Watch out for

Requires content and interaction data in normalized tables with clear relationships. Adds most value when you have rich metadata beyond just view counts.

Key metric: on the RelBench benchmark, relational models score 76.71 vs. 62.44 for single-table baselines on recommendation tasks over multi-table data.

Why relational data changes the answer

Viewing decisions are shaped by context that lives across multiple tables. The same subscriber watches action movies on the TV at night but comedy clips on their phone during lunch. They rate sci-fi dramas highly but only finish 40% of them, while they complete every rom-com without rating any. Their household shares an account, so some viewing history belongs to their partner. A flat interaction matrix collapses all of this into a single 'subscriber watched content' signal, losing the contextual richness that actually predicts the next watch.

Relational models preserve these connections. They learn that Subscriber SUB001 on their Smart TV after 8pm, having just finished an action thriller they rated 4.5 stars, is a different recommendation context than the same subscriber on mobile during their commute. On the RelBench benchmark, relational models score 76.71 vs 62.44 for single-table approaches. In streaming, that gap translates directly to engagement minutes and retention -- the two metrics that determine platform value.

Collaborative filtering is like a bookstore that tracks which books sell together but has no idea why. It knows that people who bought Book A also bought Book B, but it doesn't know that both books were bought as gifts during the holidays, or that they share an author. Graph-based recommendations read the full catalog: genres connect to subgenres, directors connect to their filmography, actors connect to fan bases, and viewing sessions connect to devices and times. The recommendation comes from the story behind the data, not just the co-occurrence.

How KumoRFM solves this

Graph-powered intelligence for media platforms

Kumo connects subscribers, content, watch history, ratings, and genres into a unified graph. The GNN learns nuanced preferences: not just 'users who watched X also watched Y,' but 'users who watched X on mobile at night in genre cluster Z tend to engage with Y-type content within 48 hours.' PQL's RANK TOP operator delivers a ranked watchlist per subscriber, updated continuously.

From data to predictions

See the full pipeline in action

Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.

Your data

The relational tables Kumo learns from

SUBSCRIBERS

subscriber_id	plan	signup_date	preferred_device
SUB001	Premium	2023-08-15	Smart TV
SUB002	Standard	2024-02-20	Mobile
SUB003	Premium	2024-06-10	Tablet

CONTENT

content_id	title	genre	release_year
MOV101	Neon Heist	Action/Thriller	2025
MOV102	Quiet Garden	Drama	2024
SER201	Code Black	Sci-Fi/Drama	2025

WATCH_HISTORY

watch_id	subscriber_id	content_id	pct_watched	timestamp
W5001	SUB001	MOV101	95%	2025-02-28
W5002	SUB001	SER201	40%	2025-03-01
W5003	SUB002	MOV102	100%	2025-03-01

RATINGS

rating_id	subscriber_id	content_id	score
R301	SUB001	MOV101	4.5
R302	SUB002	MOV102	5.0
R303	SUB003	SER201	3.8

GENRES

genre_id	name	parent_genre	avg_completion_rate
G01	Action/Thriller	Action	72%
G02	Drama	Drama	81%
G03	Sci-Fi/Drama	Sci-Fi	65%
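
One way to picture how these tables become a graph is to treat every row as a node and every foreign key as an edge. The sketch below does that for the sample rows above. It is an illustration of the idea, not Kumo's internal graph construction. The (table, primary_key) node naming and the join of CONTENT.genre to GENRES.name are assumptions.

```python
from collections import defaultdict

# Sample rows copied from the tables above (key columns only).
WATCH_HISTORY = [
    ("W5001", "SUB001", "MOV101"),
    ("W5002", "SUB001", "SER201"),
    ("W5003", "SUB002", "MOV102"),
]
RATINGS = [
    ("R301", "SUB001", "MOV101"),
    ("R302", "SUB002", "MOV102"),
    ("R303", "SUB003", "SER201"),
]
# Assumption: CONTENT.genre joins to GENRES.name, giving one genre_id per title.
CONTENT_GENRE = {"MOV101": "G01", "MOV102": "G02", "SER201": "G03"}

def build_graph():
    """Return an undirected adjacency map over (table, primary_key) nodes."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for watch_id, subscriber_id, content_id in WATCH_HISTORY:
        link(("WATCH_HISTORY", watch_id), ("SUBSCRIBERS", subscriber_id))
        link(("WATCH_HISTORY", watch_id), ("CONTENT", content_id))
    for rating_id, subscriber_id, content_id in RATINGS:
        link(("RATINGS", rating_id), ("SUBSCRIBERS", subscriber_id))
        link(("RATINGS", rating_id), ("CONTENT", content_id))
    for content_id, genre_id in CONTENT_GENRE.items():
        link(("CONTENT", content_id), ("GENRES", genre_id))
    return graph

def within_hops(graph, start, max_hops):
    """Breadth-first search: nodes reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen - {start}

graph = build_graph()
sub001 = ("SUBSCRIBERS", "SUB001")
print(sorted(within_hops(graph, sub001, 2)))
print(("SUBSCRIBERS", "SUB003") in within_hops(graph, sub001, 4))
```

Two hops from SUB001 reach the titles they watched or rated. Four hops reach SUB003, through a path that crosses both WATCH_HISTORY and RATINGS (SUB001 watched Code Black, which SUB003 rated). Multi-hop paths like this are what a graph model can learn from without hand-built joins.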

Write your PQL query

Describe what to predict in 2–3 lines — Kumo handles the rest

PQL

PREDICT BOOL(WATCH_HISTORY.watch_id, 0, 7, days)
FOR EACH SUBSCRIBERS.subscriber_id, CONTENT.content_id
RANK TOP 10

Prediction output

Every entity gets a score, updated continuously

SUBSCRIBER_ID	CONTENT_ID	TITLE	WATCH_PROB	RANK
SUB001	MOV305	Steel Rain	0.78	1
SUB001	SER202	Deep Signal	0.65	2
SUB002	MOV102	Quiet Garden	0.71	1
SUB002	SER201	Code Black	0.52	2
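
The shape of this output, a ranked list per subscriber, can be reproduced from raw scores with a few lines of Python. This is a sketch of per-subscriber top-k ranking in general, not of how PQL's RANK TOP is implemented. The rank_top helper is hypothetical, and the scores are the ones shown in the table above.

```python
from collections import defaultdict

def rank_top(scored_pairs, k):
    """Keep the k highest-scoring items per subscriber and attach a 1-based rank."""
    by_subscriber = defaultdict(list)
    for subscriber_id, content_id, prob in scored_pairs:
        by_subscriber[subscriber_id].append((prob, content_id))
    ranked = []
    for subscriber_id in sorted(by_subscriber):
        top = sorted(by_subscriber[subscriber_id], reverse=True)[:k]
        for rank, (prob, content_id) in enumerate(top, start=1):
            ranked.append((subscriber_id, content_id, prob, rank))
    return ranked

# Scores from the prediction output above, deliberately shuffled.
scored_pairs = [
    ("SUB002", "SER201", 0.52),
    ("SUB001", "SER202", 0.65),
    ("SUB002", "MOV102", 0.71),
    ("SUB001", "MOV305", 0.78),
]
for row in rank_top(scored_pairs, k=10):
    print(row)
```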

Understand why

Every prediction includes feature attributions — no black boxes

Subscriber SUB001 -- Content MOV305 (Steel Rain)

Predicted: 78% watch probability (Rank #1)

Top contributing features

Feature	Value	Attribution
Genre overlap with high-rated content	Action/Thriller	30%
Similar subscribers' completion rate	88%	25%
Director overlap with watched titles	Same director	20%
Time-of-day viewing pattern match	Evening	14%
Content freshness (release recency)	2 weeks old	11%

Feature attributions are computed automatically for every prediction. No separate tooling required.

Frequently asked questions

Common questions about content recommendations

What is the best recommendation algorithm for streaming platforms?

Graph neural networks that operate on the full relational structure (subscribers, content, watch history, ratings, genres) outperform collaborative filtering and even deep learning hybrids. The advantage comes from capturing viewing context (device, time, completion rate) and content relationships (genre adjacency, creator networks) in a single model rather than engineering separate features for each signal.

How do you solve the cold-start problem for new content?

Graph models handle cold-start naturally because new content is connected to existing entities through genre, cast, director, and production metadata. Even with zero viewing data, the model can predict engagement based on how similar content performed for similar subscriber segments. This is structurally impossible with pure collaborative filtering.
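
The intuition can be shown with a hand-written heuristic: score a brand-new title by how completely the subscriber watched existing titles that share its genre. This is a deliberately simple stand-in for what a graph model would learn, not Kumo's method. NEW001 is a hypothetical title with zero views, and the other rows come from the sample tables on this page.

```python
# content_id -> genre. NEW001 is a hypothetical new release with no watch history.
CONTENT_GENRE = {
    "MOV101": "Action/Thriller",
    "MOV102": "Drama",
    "SER201": "Sci-Fi/Drama",
    "NEW001": "Action/Thriller",
}
# (subscriber_id, content_id, fraction watched) from the sample WATCH_HISTORY rows.
WATCHES = [
    ("SUB001", "MOV101", 0.95),
    ("SUB001", "SER201", 0.40),
    ("SUB002", "MOV102", 1.00),
]

def cold_start_score(subscriber_id, new_content_id):
    """Mean completion of the subscriber's watched titles in the new title's genre."""
    genre = CONTENT_GENRE[new_content_id]
    same_genre = [
        pct
        for sub, content_id, pct in WATCHES
        if sub == subscriber_id and CONTENT_GENRE[content_id] == genre
    ]
    return sum(same_genre) / len(same_genre) if same_genre else 0.0

print(cold_start_score("SUB001", "NEW001"))
print(cold_start_score("SUB002", "NEW001"))
```

SUB001 gets a score of 0.95 for NEW001 because of their near-complete view of Neon Heist in the same genre, while SUB002 gets 0.0. No one has watched NEW001, yet the genre link still produces a usable signal.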

How do recommendations affect subscriber churn?

Poor content discovery is the leading driver of streaming churn. When subscribers cannot find content they enjoy within 60-90 seconds of browsing, they disengage. A 15-25% improvement in recommendation quality prevents 2-5% of churn-driven cancellations, worth $20-50M annually for a major platform.

What data do you need for a streaming recommendation engine?

Subscriber profiles, content metadata (genre, cast, director, release date), watch history with completion percentages and timestamps, and ratings. For best results, add device context, household membership, and browse/search behavior. Each additional table adds a dimension of signal the model can learn from.

How do you measure recommendation quality?

Track engagement metrics: completion rate of recommended content, time-to-first-play after opening the app, and sessions where the subscriber found something to watch vs. abandoned. Offline metrics like NDCG and recall matter during development, but business impact shows up in engagement minutes per subscriber and 30-day retention rate.
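
For the offline side, here is a minimal sketch of recall@k and binary-relevance NDCG@k. The recommended list and the set of titles the subscriber actually watched are hypothetical examples, and graded relevance (for instance, weighting by completion rate) is left out for brevity.

```python
import math

def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: discounted gain of hits, normalized by the ideal ordering."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Hypothetical example: three recommendations, two of which were later watched.
recommended = ["MOV305", "SER202", "MOV102"]
watched = {"SER202", "MOV102"}
print("recall@2:", recall_at_k(recommended, watched, 2))
print("NDCG@3:  ", round(ndcg_at_k(recommended, watched, 3), 3))
```

Here recall@2 is 0.5 (one of the two watched titles is in the top two) and NDCG@3 is about 0.693, since both hits sit below the first position. Moving the watched titles to the top of the list would raise NDCG to 1.0.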

Bottom line: For a streaming platform with 50M subscribers, a 2% engagement lift from better content discovery prevents 200K cancellations worth $24M per year. Kumo's graph captures the viewing context, genre relationships, and creator networks that collaborative filtering alone misses.

From a leadership team with proven experience

Vanja Josifovski

CEO and Co-Founder, ex-CTO Airbnb, ex-CTO Pinterest

Jure Leskovec

Co-Founder & Chief Scientist, Stanford Professor

Hema Raghavan

Co-Founder & Head of Engineering, ex-AI Lead, LinkedIn

One Platform. One Model. Infinite Predictions.

KumoRFM

Relational Foundation Model

Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.

For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.
