
How Reddit Uses Graph Neural Networks for Recommendations

Reddit achieved 4-5 years of iterative recommendation improvement in 2 months using a relational approach. Here's how GNN-based recommendations transform content platforms by seeing the full graph of users, posts, subreddits, and comments.

TL;DR

  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with XGBoost and 63% for LLM+AutoML - with zero feature engineering and zero training time.
  • Content platforms like Reddit have a massive heterogeneous graph: billions of posts, millions of subreddits, hundreds of millions of users, all connected through upvotes, comments, subscriptions, and cross-community activity. Collaborative filtering flattens this into a user-item matrix and loses most of the signal.
  • GNN-based recommendations model the full relational structure - community overlaps, comment thread depth, user interest evolution, cross-modality patterns - and discover signals that years of manual feature engineering miss.
  • Reddit achieved what took 4-5 years of iterative collaborative filtering improvement in just 2 months with relational deep learning. The GNN discovered patterns that hand-crafted features had never captured.
  • On RelBench recommendation benchmarks, KumoRFM achieves 7.29 MAP@K versus GraphSAGE at 1.85 and LightGBM at 1.79 - a 4x improvement that comes from reading the full relational graph instead of a flat feature table.

Reddit is one of the most complex recommendation environments on the internet. Billions of posts across millions of subreddits, hundreds of millions of users with constantly shifting interests, and a community-driven structure where context matters as much as content. A post that thrives in r/MachineLearning might be irrelevant in r/datascience, despite overlapping audiences.

For years, content platforms like Reddit improved their recommendations incrementally: adding new engagement signals, tuning collaborative filtering models, engineering features one at a time. Each iteration cycle took months and yielded small accuracy gains. Then a graph-based approach compressed 4-5 years of that iterative improvement into 2 months.

This article explains why. Not by speculating about Reddit's internal systems, but by analyzing what makes content recommendation fundamentally a graph problem - and why relational deep learning discovers patterns that flat-table approaches structurally cannot.

The headline result: SAP SALT benchmark

The SAP SALT benchmark is an enterprise-grade evaluation where real business analysts and data scientists attempt prediction tasks on SAP enterprise data. It measures how accurately different approaches predict real business outcomes on production-quality enterprise databases with multiple related tables.

| Approach | Accuracy | What it means |
| --- | --- | --- |
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD Data Scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |
SAP SALT benchmark: KumoRFM outperforms expert data scientists by 16 percentage points and LLM+AutoML by 28 percentage points on real enterprise prediction tasks.

KumoRFM scores 91% where PhD-level data scientists with weeks of feature engineering and hand-tuned XGBoost score 75%. The 16 percentage point gap is the value of reading relational data natively instead of flattening it into a single table.

The content recommendation challenge

To recommend content effectively on a platform like Reddit, a system needs to understand multiple interacting signals simultaneously:

  • User interests - inferred from upvotes, comments, subscriptions, time spent reading, and what users choose to skip.
  • Subreddit relationships - similar communities share overlapping user bases (r/MachineLearning and r/datascience), topical connections (r/cooking and r/MealPrepSunday), or cultural similarities.
  • Post quality signals - karma, comment depth, upvote-to-view ratio, whether comments are substantive or shallow.
  • Temporal freshness - a breaking news post matters now; a tutorial is relevant for months. Different content types have different decay curves.
  • Cross-community interests - a user active in r/Python and r/datascience might enjoy a post in r/MLOps that they have never visited.

Each of these signals lives in a different table. Users in one. Posts in another. Subreddits in a third. Comments, votes, and subscriptions each in their own tables. The relationships between these entities - who posted where, who commented on what, which subreddits share members - are where the predictive signal lives.

Why collaborative filtering plateaus for content platforms

Traditional recommendation systems treat the problem as a user-item interaction matrix. User A upvoted posts 1, 3, and 7. User B upvoted posts 1, 3, and 9. Therefore User A might like post 9. This is collaborative filtering, and it works reasonably well for simple product recommendations.
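The user-item view described above can be sketched in a few lines. This is a minimal, illustrative example with made-up data - not any platform's actual model:

```python
import numpy as np

# Toy upvote matrix: rows are users A and B, columns are posts 1, 3, 7, 9.
interactions = np.array([
    [1, 1, 1, 0],  # User A upvoted posts 1, 3, 7
    [1, 1, 0, 1],  # User B upvoted posts 1, 3, 9
], dtype=float)

# Cosine similarity between users based only on shared upvotes.
norms = np.linalg.norm(interactions, axis=1, keepdims=True)
sim = (interactions @ interactions.T) / (norms @ norms.T)

# Score items for User A by similarity-weighted neighbor upvotes,
# masking out posts User A has already seen.
scores = sim[0] @ interactions - interactions[0] * 1e9
best = int(np.argmax(scores))  # column 3, i.e. post 9, recommended to User A
```

Everything the model knows is in that matrix - which is exactly the limitation the next section describes.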

But for content platforms with rich relational structure, collaborative filtering hits fundamental limits:

| Signal | What CF sees | What is lost |
| --- | --- | --- |
| Community structure | User upvoted Post X | Post X is in a subreddit cluster (r/ML, r/datascience, r/AI) with 60% user overlap |
| Comment depth as engagement | User interacted with Post Y | Post Y generated 200+ deep comment threads - a quality signal visible only in the comment graph |
| User interest evolution | User's recent upvotes | User shifted from r/learnpython to r/MachineLearning to r/MLOps over 6 months - a trajectory, not a snapshot |
| Cross-modality patterns | User reads text posts | User who reads ML text posts in r/MachineLearning may engage with ML video tutorials in r/learnmachinelearning |
| Cold-start subreddits | No data (new community) | New subreddit r/LLMOps is topically similar to r/MLOps and r/LangChain - inferrable from description, creator history, and early subscribers |

Collaborative filtering sees the user-item interaction matrix. Everything in the third column - community structure, engagement quality, interest trajectories, cross-modality patterns, and cold-start inference - requires reading the full relational graph.

Each of these blind spots can be addressed individually by engineering features: compute subreddit similarity scores, build comment depth metrics, create user interest trajectory features. But each feature takes weeks to design, implement, validate, and deploy. After 4-5 years of this iterative process, a mature recommendation system has hundreds of hand-crafted features - and still misses interaction patterns that were never explicitly engineered.
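As an illustration of what one such hand-crafted feature looks like, here is a minimal Jaccard overlap between two subreddits' member sets. The names and memberships are invented:

```python
def subreddit_overlap(members_a: set, members_b: set) -> float:
    """Jaccard similarity between two subreddits' member sets."""
    union = members_a | members_b
    if not union:
        return 0.0
    return len(members_a & members_b) / len(union)

# Illustrative member sets, not real data.
ml_members = {"alice", "bob", "carol", "dan"}
ds_members = {"bob", "carol", "erin"}

overlap = subreddit_overlap(ml_members, ds_members)  # 2 shared / 5 total = 0.4
```

This single number takes minutes here but, at production scale, weeks of pipeline work to compute, validate, refresh, and serve - and it is one feature among hundreds.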

The graph approach to content recommendations

Users, posts, subreddits, and comments naturally form a massive heterogeneous graph. Each entity type is a different kind of node. Each relationship - upvotes, subscriptions, authorship, comment replies - is a different kind of edge.

| Node type | Scale | Key attributes |
| --- | --- | --- |
| User | Hundreds of millions | Account age, karma, activity pattern, subscriptions |
| Post | Billions | Title, content type (text/image/video/link), karma, timestamp |
| Subreddit | Millions | Topic, subscriber count, activity level, rules, related communities |
| Comment | Tens of billions | Text, depth in thread, karma, timestamp, parent comment |

Each entity type becomes a node in the graph. The relationships between them - upvotes, posts, subscriptions, replies - become edges. This is the natural structure of the data.
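One common way to store such a graph is as edge lists keyed by (source type, edge type, destination type) - the convention used by heterogeneous-graph libraries such as PyTorch Geometric. A toy sketch with invented IDs:

```python
# Heterogeneous graph as typed edge lists. Node IDs are per-type integers;
# all IDs and edges here are illustrative.
hetero_edges = {
    ("user", "upvotes", "post"): [(0, 10), (1, 10)],
    ("user", "subscribes_to", "subreddit"): [(0, 3), (1, 3)],
    ("user", "writes", "comment"): [(1, 7)],
    ("comment", "replies_to", "post"): [(7, 10)],
    ("post", "posted_in", "subreddit"): [(10, 3)],
}

# Each edge type can carry its own attributes, e.g. timestamps on upvotes,
# which is how temporal information stays attached to the graph.
upvote_timestamps = {(0, 10): "2024-05-01", (1, 10): "2024-05-02"}
```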

A graph neural network processes this structure by passing messages along edges. Information about a subreddit's user base flows to the posts within it. Information about comment quality flows up to the post. Information about a user's subscription patterns flows to their activity predictions. Each message-passing layer lets information travel one hop further through the graph.
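The message-passing step can be sketched without any framework. This toy example uses mean aggregation on a homogeneous four-node graph; real systems use learned weight matrices and separate transformations per edge type:

```python
import numpy as np

# Nodes: 0=user, 1=post, 2=subreddit, 3=comment (types ignored in this sketch).
edges = [(0, 1), (1, 2), (3, 1)]  # directed: source -> destination
h = np.eye(4)                     # one-hot initial node features

def message_pass(h, edges):
    """One simplified mean-aggregation step (no learned weights)."""
    agg = np.zeros_like(h)
    counts = np.zeros(len(h))
    for src, dst in edges:
        agg[dst] += h[src]        # each node collects its in-neighbors' features
        counts[dst] += 1
    counts = np.maximum(counts, 1)
    # Mix each node's own features with the neighborhood average.
    return 0.5 * h + 0.5 * (agg / counts[:, None])

h1 = message_pass(h, edges)   # one hop: the post now carries user and comment info
h2 = message_pass(h1, edges)  # two hops: user info has reached the subreddit
```

After one layer the subreddit node knows nothing about the user; after two, the user's features have propagated through the post - exactly the "one hop further per layer" behavior described above.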

After multiple layers, the GNN has learned representations that encode multi-hop patterns:

  • Community overlap patterns. Users who subscribe to r/MachineLearning and r/statistics have different content preferences than users who subscribe to r/MachineLearning and r/startups - even though both groups subscribe to the same subreddit. The GNN captures this through the subscription edges.
  • Content quality propagation. A post's quality is not just its karma score. It is the depth and substance of its comment threads, the reputation of its commenters, and the engagement patterns of similar posts in related subreddits. These signals propagate through upvote and comment edges.
  • User interest evolution. By preserving temporal information on edges, the GNN learns trajectories: a user moving from beginner to advanced topics, shifting from one domain to another, increasing or decreasing engagement over time.
  • Cold-start inference. A new subreddit with 50 subscribers is connected to the graph through those subscribers' other activity. If its early members are all active in data engineering communities, the GNN infers what content belongs there - without waiting for thousands of interactions.
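The cold-start idea in the last bullet can be sketched directly: count which other communities a new subreddit's early members frequent. All names and memberships here are invented:

```python
from collections import Counter

# Other communities each early subscriber is active in (illustrative data).
member_activity = {
    "alice": ["r/dataengineering", "r/Python"],
    "bob":   ["r/dataengineering", "r/MLOps"],
    "carol": ["r/dataengineering", "r/datascience"],
}
new_subreddit_members = ["alice", "bob", "carol"]

# Tally the neighboring communities reachable through the early members.
neighbor_counts = Counter(
    sub for member in new_subreddit_members for sub in member_activity[member]
)
top_neighbor = neighbor_counts.most_common(1)[0]  # ("r/dataengineering", 3)
```

A GNN does this implicitly and in learned feature space, but the mechanism is the same: the new node borrows signal from the nodes it is connected to.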

4-5 years of improvement in 2 months

The key result: what took 4-5 years of iterative collaborative filtering improvement was achieved in 2 months with relational deep learning. This is not about a better algorithm marginally outperforming the old one. It is about a fundamentally different representation of the data.

Years of manual feature engineering - computing subreddit similarity matrices, building user interest decay functions, engineering comment quality scores, creating cross-community affinity features - were replaced by a model that reads the raw relational structure and discovers these patterns automatically.

The GNN did not just replicate the hand-crafted features. It discovered patterns that years of feature engineering had missed: interaction effects between community structure and content type, temporal patterns in cross-community migration, and engagement signals that only become visible when you model the full graph.

Collaborative Filtering (4-5 years iterative)

  • Flattens data to user-item interaction matrix
  • Each new signal requires manual feature engineering
  • Misses community structure and cross-entity patterns
  • Cold-start requires separate heuristic systems
  • Improvement cycle: months per incremental gain

GNN-Based Recommendations (2 months)

  • Reads the full heterogeneous graph directly
  • Discovers signals automatically from relational structure
  • Captures community overlap, comment quality, interest evolution
  • Cold-start handled through graph connectivity
  • Discovers patterns that years of feature engineering missed

RelBench recommendation benchmarks

The RelBench benchmark provides an independent measure of how different approaches perform on recommendation tasks across real-world relational databases. The results quantify the gap between flat-table approaches and graph-based methods:

| Approach | MAP@K | Approach type | What it reads |
| --- | --- | --- | --- |
| LightGBM | 1.79 | Tabular ML + manual features | Flat feature table |
| GraphSAGE | 1.85 | Basic GNN | Graph structure (limited message passing) |
| KumoRFM | 7.29 | Foundation model for relational data | Full heterogeneous temporal graph |

KumoRFM achieves 4x the MAP@K of both tabular and basic GNN approaches on recommendation tasks. The gap comes from reading the full relational structure with a pre-trained foundation model, not just applying a GNN architecture.

The 4x improvement is not from a better algorithm on the same data. It is from a better representation of the data. LightGBM sees a flat feature table. GraphSAGE sees graph structure but with limited expressiveness. KumoRFM reads the full heterogeneous temporal graph with a model pre-trained on thousands of diverse relational databases.

What each approach captures

| Signal type | Collaborative filtering | GNN-based recommendations |
| --- | --- | --- |
| Direct user-item interactions | Yes (upvotes, clicks) | Yes (plus context from the full graph) |
| Community structure | No (requires manual clustering) | Yes (learned from subscription and activity edges) |
| Content quality (beyond karma) | No (requires engineered features) | Yes (propagated from comment depth and engagement patterns) |
| User interest evolution | Limited (recent window only) | Yes (temporal edges preserve full trajectory) |
| Cross-community discovery | No (limited to co-occurrence) | Yes (multi-hop paths through shared users and topics) |
| Cold-start entities | No (no interaction history) | Yes (inferred from graph connectivity) |
| Cross-modality preferences | No (separate models per content type) | Yes (content type is a node attribute, not a silo) |
| Multi-hop patterns | No (pairwise only) | Yes (user → subreddit → similar subreddit → trending post) |

Collaborative filtering captures direct user-item interactions. GNN-based recommendations capture everything else: the relational structure that determines why a user will engage with content they have never seen.

Building recommendations with PQL

With a relational foundation model, building a content recommendation system does not require months of feature engineering and model iteration. It requires describing what you want to predict.

PQL Query

```
PREDICT engagement
FOR EACH users.user_id, posts.post_id
WHERE posts.created_at > CURRENT_DATE - INTERVAL '7 days'
```

One query replaces the entire recommendation pipeline: user profiling, content scoring, community analysis, and ranking. The foundation model reads raw relational tables - users, posts, subreddits, comments, votes - and discovers which content each user will engage with.

Output

| user_id | post_id | engagement_score | primary_signal |
| --- | --- | --- | --- |
| U-44201 | P-891034 | 0.92 | Community overlap + interest trajectory |
| U-44201 | P-891107 | 0.87 | Cross-community topic match |
| U-44201 | P-892441 | 0.71 | High comment quality in related subreddit |
| U-44201 | P-890022 | 0.13 | Low community relevance |
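Downstream, the output is ordinary tabular data. A hypothetical post-processing step might threshold and rank the scores before serving the feed:

```python
# Hypothetical post-processing of prediction output like the rows above:
# threshold, rank, and serve per user. Scores match the example table.
predictions = [
    ("U-44201", "P-891034", 0.92),
    ("U-44201", "P-891107", 0.87),
    ("U-44201", "P-892441", 0.71),
    ("U-44201", "P-890022", 0.13),
]

feed = sorted(
    (row for row in predictions if row[2] >= 0.5),  # drop low-relevance posts
    key=lambda row: row[2],
    reverse=True,
)
ranked_post_ids = [post_id for _, post_id, _ in feed]
```

The 0.5 cutoff is an arbitrary illustration; in practice the serving layer would tune it against engagement targets.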

Why this matters beyond Reddit

Reddit's experience is a case study in a general pattern. Any platform with rich relational structure - users, items, categories, interactions, temporal dynamics - faces the same fundamental choice: flatten the data into feature tables and iterate for years, or model the relational graph directly and discover patterns in weeks.

E-commerce platforms have customers, products, categories, reviews, and browsing sessions. Streaming services have viewers, content, genres, ratings, and watch patterns. Social networks have users, posts, connections, groups, and engagement events. In every case, the predictive signal lives in the relationships between entities, not in any single flat table.

The 4-5 years vs 2 months result is not specific to Reddit. It is specific to the gap between manually engineering relational patterns into flat features and automatically learning them from the graph structure. That gap exists anywhere relational data powers recommendations.

Frequently asked questions

Why do graph neural networks outperform collaborative filtering for content recommendations?

Collaborative filtering only sees user-item interaction matrices (who upvoted what). GNNs model the full relational structure: users, posts, subreddits, comments, and all the relationships between them. This lets GNNs capture community overlap patterns, content quality signals from comment depth, user interest evolution over time, and cross-modality preferences. On RelBench recommendation benchmarks, KumoRFM achieves 7.29 MAP@K versus 1.85 for GraphSAGE and 1.79 for LightGBM - a 4x improvement.

What is a heterogeneous graph and why does it matter for recommendations?

A heterogeneous graph contains multiple types of nodes and edges. In a content platform like Reddit, users, posts, subreddits, and comments are different node types. Upvotes, subscriptions, authorship, and comment replies are different edge types. This structure preserves the full richness of the data, unlike a flat user-item matrix that discards entity types and relationship semantics. GNNs designed for heterogeneous graphs learn different patterns for each relationship type.

How do GNNs handle cold-start recommendations?

Cold-start is one of the hardest problems for collaborative filtering because new users and new items have no interaction history. GNNs solve this by propagating information through the graph structure. A new subreddit can receive recommendations based on the topics it covers, the users who created it, and its similarity to existing subreddits - even before anyone has interacted with it. A new user can get recommendations based on the subreddits they subscribe to during onboarding, because those subscriptions connect them to the broader graph.

What does '4-5 years of improvement in 2 months' mean in practice?

Traditional recommendation systems improve incrementally through manual feature engineering: adding new interaction signals, tuning decay windows, engineering engagement features, and iterating on model architectures. Each cycle of improvement takes months and yields small accuracy gains. A GNN-based approach using relational deep learning can discover many of these patterns automatically from the graph structure, compressing years of iterative feature engineering work into a single training cycle.

Can GNN-based recommendations work for platforms smaller than Reddit?

Yes. The advantage of GNN-based recommendations scales with data complexity, not data volume. Any platform with multiple entity types (users, items, categories, interactions) and rich relational structure benefits from the graph approach. Smaller platforms often benefit more, because they lack the engineering resources to build and maintain the hundreds of hand-crafted features that large platforms invest in over years.

How does Predictive Query Language (PQL) simplify building recommendation systems?

PQL lets you express recommendation tasks in 2-3 lines instead of building an entire ML pipeline. A query like 'PREDICT engagement FOR EACH users.user_id, posts.post_id' tells the foundation model what to predict and for whom. The model reads the raw relational tables - users, posts, subreddits, comments, votes - discovers the predictive patterns, and returns ranked recommendations. No feature engineering, no model selection, no pipeline maintenance.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.