
Recommendation Systems: From Collaborative Filtering to Graph Transformers

Amazon drives 35% of revenue through recommendations. Netflix saves $1B/year in retention. The technology has evolved through four generations. Each one finds signals the previous generation missed.

TL;DR

  • Recommendations drive 35% of Amazon's revenue, save Netflix $1B/year in retention, and account for 70% of YouTube watch time. The highest-leverage ML application in consumer tech.
  • Four generations: content-based filtering, collaborative filtering, deep learning, and graph-based approaches. Each captures signals the previous generation missed.
  • Cold-start is the oldest unsolved problem. Graph models solve it by connecting new items through genre, brand, and category edges, enabling recommendations from day one.
  • DoorDash measured a 1.8% engagement lift across 30 million users with graph-based recommendations, without building a single manual feature.
  • Foundation models eliminate the feature engineering trap: no pairwise feature computation, no SQL joins across tables. The model reads raw relational data and discovers relevance patterns automatically.

Recommendations are the highest-leverage ML application in consumer tech. Amazon attributes 35% of its revenue to recommendation algorithms. Netflix estimates its system saves $1 billion per year in subscriber retention by reducing churn through personalized content. YouTube reports that 70% of watch time comes from recommended content. Spotify's Discover Weekly drives 30% of all streams for featured artists.

These numbers are not marketing claims. They are measured lifts from A/B tests at scale. The question is not whether recommendations matter. It is whether your recommendation system is capturing the signals available in your data.

The technology has evolved through four generations. Each one extracts patterns that the previous one could not see.

Generation 1: Content-based filtering

The simplest approach: recommend items similar to what the user has liked before. If you watched three action movies, recommend more action movies. If you bought running shoes, recommend running apparel.

Content-based filtering works on item attributes. Each item is represented by a feature vector (genre, price range, brand, color, description keywords). The system finds items with similar feature vectors to the user's historical preferences.
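As a minimal sketch, this reduces to cosine similarity between a user profile and item feature vectors. The one-hot genre features below are hypothetical stand-ins for the richer attribute vectors (brand, price range, keywords) a real system would use:

```python
from math import sqrt

# Hypothetical one-hot genre features: [sci-fi thriller, action, drama]
ITEM_FEATURES = {
    "C-101": [1, 0, 0],  # The Signal
    "C-102": [0, 1, 0],  # Iron Ridge
    "C-103": [0, 0, 1],  # Midnight Bloom
    "C-104": [1, 0, 0],  # Zero Hour
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(liked, k=2):
    # User profile: the mean of the liked items' feature vectors
    profile = [sum(col) / len(liked) for col in zip(*(ITEM_FEATURES[i] for i in liked))]
    scores = {c: cosine(profile, f) for c, f in ITEM_FEATURES.items() if c not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend(["C-101"]))  # Zero Hour ranks first: same genre as The Signal
```

Note that the same mechanism produces the filter bubble: a profile built only from Sci-Fi Thrillers can never score a Drama above zero.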

Strengths

  • No cold-start for new users if you know their preferences
  • Transparent reasoning ("recommended because you liked X")
  • Works with no collaborative data (single-user systems)

Limitations

  • Filter bubble. Users only see items similar to what they have already seen. A runner who might love cycling content never gets exposed to it.
  • Feature dependency. Quality depends on the richness of item metadata. If product descriptions are sparse or categories are coarse, recommendations are generic.
  • No serendipity. Cannot discover that users who like item A also tend to like item B, even if A and B have different attributes.

Here is what the underlying data looks like for a streaming platform. The recommendation signal spans users, content, and interactions.

users

| user_id | name | plan | signup | region |
| --- | --- | --- | --- | --- |
| U-501 | Natalie Reeves | Premium | 2023-05-18 | US-West |
| U-502 | Carlos Mendez | Standard | 2024-02-10 | US-South |
| U-503 | Yuki Tanaka | Premium | 2022-11-03 | US-East |

content

| content_id | title | genre | release | avg_rating |
| --- | --- | --- | --- | --- |
| C-101 | The Signal | Sci-Fi Thriller | 2025-09-01 | 4.2 |
| C-102 | Iron Ridge | Action | 2025-10-15 | 3.8 |
| C-103 | Midnight Bloom | Drama | 2025-11-01 | — |
| C-104 | Zero Hour | Sci-Fi Thriller | 2025-11-20 | — |

C-104 (Zero Hour) is a new release with no ratings yet. This is the cold-start problem: collaborative filtering cannot recommend it.

interactions

| interaction_id | user_id | content_id | type | date | rating |
| --- | --- | --- | --- | --- | --- |
| INT-01 | U-501 | C-101 | Watched | 2025-09-05 | 5 |
| INT-02 | U-501 | C-102 | Watched | 2025-10-18 | 3 |
| INT-03 | U-502 | C-101 | Watched | 2025-09-12 | 4 |
| INT-04 | U-502 | C-103 | Watched | 2025-11-05 | 5 |
| INT-05 | U-503 | C-101 | Watched | 2025-09-08 | 5 |
| INT-06 | U-503 | C-103 | Watched | 2025-11-03 | 4 |
| INT-07 | U-503 | C-102 | Browsed | 2025-11-10 | — |

Users U-501, U-502, and U-503 all rated 'The Signal' (Sci-Fi Thriller) highly. Zero Hour shares the genre but has no interactions yet. Graph-based models connect it through genre edges.

Generation 2: Collaborative filtering

The breakthrough insight: you do not need item attributes to make good recommendations. You just need to know what similar users liked. If users A, B, and C all liked items 1, 2, and 3, and user A also liked item 4, then recommend item 4 to users B and C.
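A toy version of that user-based logic, using Jaccard similarity over liked-item sets. The users A/B/C and items 1–4 are the hypothetical example from the paragraph above, not real data:

```python
# Hypothetical liked-item sets for users A, B, C
LIKES = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3},
    "C": {1, 2, 3},
}

def jaccard(s, t):
    return len(s & t) / len(s | t)

def recommend_for(user, k=1):
    # Score each unseen item by the summed similarity of users who liked it
    scores = {}
    for other, liked in LIKES.items():
        if other == user:
            continue
        sim = jaccard(LIKES[user], liked)
        for item in liked - LIKES[user]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend_for("B"))  # [4] — item 4, via B's overlap with user A
```

Notice that no item attributes appear anywhere: the recommendation comes purely from overlapping behavior.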

This is the approach that powered Amazon's early recommendation engine and the Netflix Prize, the famous $1 million competition that drove a decade of recommendation research.

Matrix factorization

The dominant technique from 2006 to 2016 was matrix factorization. Represent the user-item interaction matrix (users as rows, items as columns, interactions as values) and decompose it into two lower-dimensional matrices. Each user gets a latent vector. Each item gets a latent vector. The dot product of a user vector and an item vector predicts the interaction strength.

Matrix factorization is elegant and efficient. It handles sparse data well (most users interact with a tiny fraction of items) and scales to millions of users and items with techniques like alternating least squares (ALS) and stochastic gradient descent.
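A minimal SGD-based factorization sketch on the toy ratings from the streaming tables above. The learning rate, regularization, and dimensionality are arbitrary illustrative choices, not a production recipe:

```python
import random

# Sparse ratings from the interactions table: (user, item) -> rating.
# C-104 (Zero Hour) has no ratings, so it never gets a latent vector.
RATINGS = {
    ("U-501", "C-101"): 5, ("U-501", "C-102"): 3,
    ("U-502", "C-101"): 4, ("U-502", "C-103"): 5,
    ("U-503", "C-101"): 5, ("U-503", "C-103"): 4,
}

def factorize(ratings, dim=2, lr=0.05, reg=0.02, epochs=1000, seed=0):
    rng = random.Random(seed)
    users = {u for u, _ in ratings}
    items = {i for _, i in ratings}
    # Small random latent vectors for each user and each rated item
    P = {u: [rng.gauss(0, 0.1) for _ in range(dim)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(dim)] for i in items}
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for d in range(dim):
                pu, qi = P[u][d], Q[i][d]
                P[u][d] += lr * (err * qi - reg * pu)
                Q[i][d] += lr * (err * pu - reg * qi)
    return P, Q

P, Q = factorize(RATINGS)
# Predict an unseen (user, item) pair via the dot product of latent vectors
pred = sum(a * b for a, b in zip(P["U-501"], Q["C-103"]))
```

The cold-start limitation is visible in the code itself: `Q` has no entry for C-104, so no dot product can ever be computed for it.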

Limitations

  • Cold-start for new items and users. With no interaction history, there is no latent vector. New items never get recommended until they accumulate enough interactions.
  • Ignores side information. Matrix factorization uses only the interaction matrix. Item attributes, user demographics, temporal patterns, and contextual signals are discarded.
  • Static representation. A user's latent vector summarizes their entire history equally. Recent interests are weighted the same as interests from years ago.

Generation 3: Deep learning

Starting around 2016, deep learning entered recommendations. Neural collaborative filtering replaced dot products with neural networks. Sequence models (RNNs, then transformers) captured temporal dynamics in user behavior. Two-tower architectures enabled efficient retrieval at scale.

Key advances

  • Neural collaborative filtering. Replace the linear dot product with a multi-layer network that can learn non-linear user-item interactions. This captures complex preference patterns that matrix factorization misses.
  • Sequential models. Treat a user's interaction history as a sequence and use transformers (like SASRec) to model temporal dynamics. Recent interactions are weighted more heavily. Interest drift is captured.
  • Side information integration. Deep models can incorporate item features, user features, and contextual signals alongside interaction data. This helps with cold-start.
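The recency-weighting idea can be illustrated with a simple exponential decay over a user's interaction sequence. Sequence models like SASRec learn these weightings via attention rather than fixing them; this is only a hand-coded stand-in, and the half-life value is arbitrary:

```python
from math import exp

def interest_weights(history, half_life=2.0):
    """Weight each item in an interaction sequence by recency.

    The most recent interaction gets weight 1.0; earlier ones decay
    exponentially with their distance from the end of the sequence.
    """
    n = len(history)
    return {item: exp(-(n - 1 - i) / half_life) for i, item in enumerate(history)}

# Natalie's watch sequence from the interactions table
print(interest_weights(["C-101", "C-102", "C-103"]))
```

Under this weighting, older interests fade instead of counting equally forever, which is exactly the interest-drift signal matrix factorization discards.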

What deep learning still misses

Deep learning recommendation models process the user-item interaction graph implicitly through embeddings, but they do not model the full relational structure of the data. Here is what each generation sees for the same user.

what each generation sees for Natalie (U-501)

| Generation | Data used | Recommendation for Natalie | Signal source |
| --- | --- | --- | --- |
| Content-based | Genre: Sci-Fi Thriller (from The Signal) | More Sci-Fi Thrillers | Item attributes only |
| Collaborative | Users who watched The Signal also watched... | Iron Ridge (popular overlap) | User-item matrix only |
| Deep learning | Natalie's watch sequence: Signal, then Iron Ridge | Midnight Bloom (sequence pattern) | Interaction sequence |
| Graph-based | Signal (5-star) + genre edge + U-502, U-503 also loved Signal + they loved Midnight Bloom | Midnight Bloom (0.84), Zero Hour (0.91) | Full relational graph |

Each generation adds a layer of signal. Only the graph-based approach discovers that Zero Hour (a new release with zero interactions) should rank highest for Natalie, because it shares a genre edge with The Signal and all users who rated The Signal highly are in Natalie's graph neighborhood.

These multi-hop, relational signals require a model that represents the full data topology, not just sequences.

Generation 4: Graph-based approaches

The latest generation represents the recommendation problem as a graph. Users, items, categories, brands, and all other entities become nodes. Interactions (purchases, clicks, views, ratings) become edges. The graph captures the full relational structure of the data.

How graph recommendations work

A graph neural network processes this structure by passing messages along edges. In the first layer, each node aggregates information from its direct neighbors. In the second layer, it aggregates from 2-hop neighbors. After several layers, each node's representation encodes information from its entire local neighborhood.

For recommendations, this means a user's representation captures:

  • Their direct interactions (items they bought)
  • The attributes of those items (categories, brands, prices)
  • Other users who bought the same items (collaborative signal via the graph)
  • Items those similar users bought (2-hop recommendations)
  • Temporal patterns (recent vs. historical interactions, trending items)

This captures both content-based and collaborative signals in a single unified model, plus multi-hop relational patterns that neither approach captures alone.
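A bare-bones sketch of the propagation idea, using untrained mean aggregation over a toy graph built from the streaming tables above. A real GNN learns transformation weights at each layer; this only shows how signal travels along edges, including a hypothetical genre node linking The Signal and Zero Hour:

```python
# Edges from the users/content/interactions tables, plus genre edges
EDGES = [
    ("U-501", "C-101"), ("U-501", "C-102"),
    ("U-502", "C-101"), ("U-502", "C-103"),
    ("U-503", "C-101"), ("U-503", "C-103"),
    ("C-101", "G-scifi"), ("C-104", "G-scifi"),  # shared genre node
]

# Build an undirected adjacency map
NEIGHBORS = {}
for a, b in EDGES:
    NEIGHBORS.setdefault(a, set()).add(b)
    NEIGHBORS.setdefault(b, set()).add(a)

def propagate(features, hops):
    """Each hop, a node's vector becomes the mean of itself and its neighbors."""
    for _ in range(hops):
        nxt = {}
        for node, vec in features.items():
            neigh = [features[n] for n in NEIGHBORS[node]] + [vec]
            nxt[node] = [sum(col) / len(neigh) for col in zip(*neigh)]
        features = nxt
    return features

# Seed a one-dimensional "relevance" signal on The Signal and spread it
init = {n: [1.0] if n == "C-101" else [0.0] for n in NEIGHBORS}
out = propagate(init, hops=2)
# After two hops, C-104 (zero interactions) has picked up signal
# through the shared genre node — the cold-start path in miniature.
```

Two hops is enough here because the path The Signal → genre → Zero Hour has length two; deeper stacks reach correspondingly larger neighborhoods.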

Matrix factorization / Deep learning

  • User-item interaction matrix only
  • Cold-start problem for new items/users
  • Ignores multi-hop relational patterns
  • Side information requires manual integration
  • Static or sequence-based user representation

Graph-based (KumoRFM)

  • Full relational structure as a graph
  • Side information connected through graph edges
  • Multi-hop patterns captured naturally
  • Temporal dynamics preserved on edges
  • Unified content-based + collaborative signal

Production evidence

Graph-based recommendation systems have shown significant lifts in production:

  • Pinterest's PinSage (one of the first large-scale graph recommendation models) processes 3 billion nodes and 18 billion edges to recommend pins, delivering measurable engagement improvements over previous approaches
  • DoorDash used graph-based recommendations and saw a 1.8% engagement lift across 30 million users
  • Alibaba's graph recommendation system handles 1 billion items and showed a 10% conversion rate improvement over deep learning baselines

The feature engineering trap in recommendations

Building a recommendation system with traditional ML requires extensive feature engineering. For each user-item pair, you need features like:

  • User's interaction history with this item's category
  • User's average rating for this brand
  • Time since user's last purchase in this category
  • Popularity of this item in user's geographic region
  • Price relative to user's typical purchase range
  • Similarity score to user's top 10 most-interacted items

Each feature requires SQL joins across users, interactions, items, categories, and potentially more tables. For a catalog of 1 million items and 10 million users, there are 10 trillion possible user-item pairs; computing pairwise features even for candidate subsets is computationally expensive, and the feature engineering itself takes weeks.

A foundation model eliminates this entirely. It reads the raw relational tables (users, items, interactions, categories, brands) and generates recommendations without any manual feature engineering. DoorDash's 1.8% engagement lift came without building a single feature.

PQL Query

```
PREDICT interactions.rating > 3
FOR EACH users.user_id, content.content_id
```

The model reads users, content, and interactions as a graph. For the new release 'Zero Hour' (C-104), it connects through the Sci-Fi Thriller genre edge to 'The Signal' (C-101), then to users who rated it highly. Cold-start solved through graph connectivity.

Output

| user_id | content_id | relevance_score | top_signal |
| --- | --- | --- | --- |
| U-501 | C-104 | 0.91 | Rated The Signal 5/5, same genre |
| U-503 | C-104 | 0.88 | Rated The Signal 5/5, browsed Action |
| U-502 | C-104 | 0.72 | Rated The Signal 4/5, prefers Drama |
| U-501 | C-103 | 0.84 | Similar users U-502, U-503 loved it |

The cold-start advantage

Cold-start is the oldest unsolved problem in recommendations. A new user with no history gets generic, low-value recommendations. A new item with no interactions never gets surfaced.

Graph-based approaches mitigate cold-start through graph connectivity. A new user who signs up with demographic information and browses three products is immediately connected in the graph to those products, their categories, their brands, and (through the graph) to other users who interacted with similar items. Even sparse initial interactions provide rich graph context.

A new item is connected to its category, brand, price range, and attributes from the moment it enters the catalog. Graph propagation ensures it appears in recommendations to relevant users even without interaction history, based on its relational position.

Where each generation fits

Content-based filtering

Best for simple catalogs with rich metadata and limited interaction data. Works well for editorial or curated recommendations.

Collaborative filtering / matrix factorization

Good baseline for established catalogs with abundant interaction data. Fast, well-understood, and easy to implement.

Deep learning (sequential, two-tower)

Right for high-scale systems where temporal dynamics matter and you have the engineering capacity for neural model training and serving.

Graph-based / foundation models

Best for complex catalogs with multi-table data (products, categories, brands, user attributes, contextual signals) where the relational structure carries predictive value. Strongest advantage on cold-start, multi-hop discovery, and rapid deployment.

The bottom line

Recommendation systems have evolved from matching item attributes to learning latent factors to modeling full relational graphs. Each generation captures signals the previous one missed. The difference is not marginal: DoorDash measured a 1.8% lift across 30 million users by moving to graph-based recommendations. At DoorDash's scale, that is hundreds of millions in incremental revenue.

The relational structure of your data (users connected to items connected to categories connected to brands connected to other users) is not noise. It is signal. Models that ignore it leave money on the table. Models that exploit it find recommendations that collaborative filtering and deep learning cannot.

If your recommendation system is based on matrix factorization or a two-tower neural model, the graph structure of your data is sitting unused. The question is not whether to upgrade. It is how much lift you are leaving on the table.

Frequently asked questions

What are the main types of recommendation systems?

There are four generations: (1) content-based filtering, which recommends items similar to what you've liked based on item attributes; (2) collaborative filtering, which finds users similar to you and recommends what they liked; (3) deep learning approaches (neural collaborative filtering, embeddings), which learn complex user-item interactions; and (4) graph-based approaches, which model the full relational structure of users, items, and interactions as a graph.

Why are graph-based recommendations more accurate?

Graph-based approaches capture signals that matrix factorization misses: multi-hop patterns (users who bought X also browsed Y and returned Z), temporal dynamics (interest shifts over time), and side information (product categories, user demographics, contextual features) all represented as a unified graph. This relational context lets the model find non-obvious connections that improve relevance.

How much revenue do recommendations drive?

Amazon attributes 35% of its revenue to recommendation algorithms. Netflix estimates that its recommendation system saves $1 billion per year in subscriber retention. YouTube reported that 70% of watch time comes from recommendations. Spotify's Discover Weekly drives 30% of all streams for featured artists.

What is the cold-start problem in recommendations?

Cold-start is the challenge of recommending items to new users (who have no interaction history) or recommending new items (that no one has interacted with). Collaborative filtering fails entirely on cold-start because it relies on interaction history. Graph-based and foundation model approaches mitigate cold-start by using side information (user demographics, item attributes, contextual signals) connected through the graph.

How do foundation models improve recommendations?

A relational foundation model like KumoRFM is pre-trained on billions of relational patterns across diverse databases. It understands universal recommendation signals (recency, frequency, co-occurrence, temporal dynamics) without task-specific training. This means it can generate recommendations from raw relational data in seconds, handle cold-start through side information, and adapt to new catalogs without retraining.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.