Customer Success Story:

Reddit Boosts Ad Performance with Graph Neural Networks and Kumo

September 28, 2025

The Challenge:

Reddit’s Ads Machine Learning team sought to improve the performance of their predicted Click-Through Rate (pCTR) model—the backbone of Reddit’s ad ranking system. With billions of ad impressions to manage and a highly connected forum-style network made up of users, posts, advertisers, and subreddits, the team knew that tapping into the relational structure of their data could unlock significant gains in model accuracy, user experience, and revenue.

The Solution:

Inspired by industry leaders who had achieved massive ROI from graph-based modeling, Reddit partnered with Kumo to integrate Graph Embeddings and Graph Neural Networks (GNNs) into their pCTR model.

With Kumo, the Reddit team was able to:

  • Construct a massive relational graph covering their entire active user base, leveraging entities and events like clicks, upvotes, video views, and subreddit memberships.
  • Train a GNN model using Kumo, which efficiently generated embeddings by aggregating signals from billions of nodes and edges—something that would have taken months to build from scratch with open-source tools.
  • Merge the resulting embeddings into their production pCTR model, alongside hundreds of other features, to power real-time ad recommendations at massive scale.

Key Highlights:

  • Huge Feature Lift: Graph embeddings delivered a much larger lift than any other feature Reddit had added to their pCTR model.
  • Rapid Development: Kumo’s built-in modules allowed Reddit to build and deploy a sophisticated GNN architecture in just a few days, versus the months typically required.
  • Fast Experimentation: By experimenting with various edge types, Reddit found that long click events (deeper user engagement) provided 2x better lift than short clicks during A/B testing.

Bonus:

Foundational and Widely Adopted: In the months following the pCTR launch, Reddit adopted Kumo’s graph embeddings as a foundational feature across multiple downstream models—including Shopping pCTR and Light Ranker—consistently seeing similar performance lifts with minimal additional tuning.

1. Representing Reddit as a Graph

Visualize Reddit as a network where each user, subreddit, and ad is a node, and every click, long-click, or view is an edge connecting them. This graph captures the rich context of who interacts with what.
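
As a rough illustration of this representation, the sketch below stores typed nodes and typed edges in a small in-memory structure. The class name `InteractionGraph`, the edge-type strings, and the IDs are hypothetical and chosen only for this example; they are not Reddit's or Kumo's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class InteractionGraph:
    """Toy typed graph: nodes are (type, id) pairs, edges carry an event type."""

    # adjacency[node][edge_type] -> list of neighboring nodes
    adjacency: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(list))
    )

    def add_edge(self, src, edge_type, dst):
        # Store both directions so information can later flow either way.
        self.adjacency[src][edge_type].append(dst)
        self.adjacency[dst]["rev_" + edge_type].append(src)

    def neighbors(self, node, edge_type):
        # .get avoids creating empty entries for unknown nodes or edge types.
        return list(self.adjacency.get(node, {}).get(edge_type, []))


# Hypothetical events: one long click, one subreddit membership, one click.
graph = InteractionGraph()
graph.add_edge(("user", "u1"), "long_click", ("ad", "a1"))
graph.add_edge(("user", "u1"), "member_of", ("subreddit", "s1"))
graph.add_edge(("user", "u2"), "click", ("ad", "a1"))

print(graph.neighbors(("user", "u1"), "long_click"))
print(graph.neighbors(("ad", "a1"), "rev_click"))
```

Keeping the edge type on every connection is what makes it possible to experiment later with which event types (for example, long clicks versus short clicks) should drive training.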

Figure: Representing Reddit as a graph

2. Learning Compact Embeddings

At the heart of our approach is a link‐prediction task: we want our Graph Neural Network (GNN) to distinguish true, high-value interactions—like a user’s long click on an ad—from random pairings. Here’s how it works:

  1. Positive Examples: We treat each long-click event as a positive link between user u and ad a. These edges carry the strongest signal of genuine engagement.
  2. Negative Sampling: For every positive pair, we randomly sample a small set of "negative" ads that the user did not long-click. This keeps training efficient at Reddit scale, even with billions of nodes.
  3. GNN Message Passing: Each node v starts with raw features (e.g., user history, ad metadata). Through rounds of message passing, it aggregates information from its neighbors, producing a final embedding that summarizes its local graph context.
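  4. Softmax Link Prediction: To train, we score the true long-click ad a+ of user u against the sampled negatives a− ∈ N_u. Writing h_v for the final embedding of node v, the probability of the positive link is given by a softmax over dot-products of embeddings:

     $$P(a^+ \mid u) = \frac{\exp(h_u \cdot h_{a^+})}{\exp(h_u \cdot h_{a^+}) + \sum_{a^- \in N_u} \exp(h_u \cdot h_{a^-})}$$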

Minimizing the negative log-likelihood teaches the GNN to pull true interactions together and push random ones apart.
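
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Kumo's implementation: the mean aggregation stands in for a learned GNN layer, negatives are drawn uniformly, and all function names, vectors, and IDs are hypothetical.

```python
import math
import random


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def mean_aggregate(own, neighbor_vecs):
    """One parameter-free round of message passing (assumed stand-in for a
    learned GNN layer): average a node's vector with its neighbors' mean."""
    if not neighbor_vecs:
        return list(own)
    neighbor_mean = [sum(col) / len(neighbor_vecs) for col in zip(*neighbor_vecs)]
    return [(o + m) / 2.0 for o, m in zip(own, neighbor_mean)]


def sample_negatives(all_ads, long_clicked, k, rng):
    """Uniformly sample up to k ads the user did not long-click."""
    candidates = [a for a in all_ads if a not in long_clicked]
    return rng.sample(candidates, min(k, len(candidates)))


def softmax_nll(user_vec, pos_vec, neg_vecs):
    """Negative log-probability of the positive ad under a softmax over
    dot products with the positive and the sampled negatives."""
    scores = [dot(user_vec, pos_vec)] + [dot(user_vec, n) for n in neg_vecs]
    m = max(scores)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]


# Toy, made-up vectors: three ads and one user who long-clicked ad "a1".
rng = random.Random(0)
ad_vecs = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [-1.0, 0.0]}
user_vec = mean_aggregate([1.0, 0.0], [ad_vecs["a1"]])
negatives = sample_negatives(list(ad_vecs), {"a1"}, k=2, rng=rng)
loss = softmax_nll(user_vec, ad_vecs["a1"], [ad_vecs[a] for a in negatives])
print(sorted(negatives), loss)
```

In a real training loop the embeddings would come from a GNN with learned weights, and the loss would be averaged over many positive pairs and minimized by gradient descent. The loss shrinks as the positive dot product grows relative to the negatives, which is the "pull together, push apart" behavior described above.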


3. Using Embeddings in Downstream Models

Embeddings are exported on a regular schedule and stored in a low-latency feature service, making them readily available for downstream use. Because the GNN is inductive, it can generate embeddings for unseen users and ads based on their local graph context, enabling consistent coverage without retraining.
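
To make the inductive property concrete, here is a small sketch, assuming a parameter-free mean aggregator in place of the trained GNN and a plain dictionary in place of the feature service. The function names and the `user:new` node are hypothetical. The point is only that the encoder depends on a node's features and neighbors, not on a per-node lookup table learned at training time.

```python
def inductive_embedding(node, features, neighbors):
    """Embed a node from its own features and its neighbors' features.

    Nothing here is indexed by node id in a learned table, so the same
    function covers nodes that did not exist when the model was trained.
    """
    own = features[node]
    nbrs = [features[n] for n in neighbors.get(node, []) if n in features]
    if not nbrs:
        return list(own)
    mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
    return [(o + m) / 2.0 for o, m in zip(own, mean)]


def export_embeddings(features, neighbors):
    """One scheduled export: recompute every node's vector for the store."""
    return {node: inductive_embedding(node, features, neighbors) for node in features}


# Toy, made-up features and edges.
features = {"ad:a1": [1.0, 0.0], "ad:a2": [0.0, 1.0], "user:u1": [0.5, 0.5]}
neighbors = {"user:u1": ["ad:a1"]}
store = export_embeddings(features, neighbors)

# A new user appears later; only its features and edges are added,
# and the encoder itself is left unchanged.
features["user:new"] = [0.0, 0.0]
neighbors["user:new"] = ["ad:a2"]
store = export_embeddings(features, neighbors)
print(store["user:new"])
```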

These embeddings were integrated into Reddit's prediction pipelines alongside the features already in use. Offline evaluations showed consistent improvements in log-loss and AUC, and online A/B tests confirmed measurable gains in key engagement and monetization metrics.
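
For readers less familiar with the offline metrics mentioned here, the sketch below shows one simple way to append an embedding to an existing feature vector and to compute log-loss and AUC. The helper names are hypothetical, and the labels and predicted probabilities are made-up toy values for illustration, not Reddit results.

```python
import math


def with_embedding(base_features, embedding):
    """Append the graph embedding to an existing feature vector."""
    return list(base_features) + list(embedding)


def log_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def auc(labels, scores):
    """Probability that a random positive outranks a random negative
    (ties count as 0.5). Quadratic in size, fine for a small example."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values only: four impressions, clicked (1) or not (0).
row = with_embedding([0.1, 3.0], [0.5, -0.5])
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.3, 0.4]
print(row, log_loss(labels, probs), auc(labels, probs))
```

Comparing these two numbers for a model trained with and without the embedding columns is the kind of offline check that would precede an online A/B test.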

Impact

Kumo enabled Reddit to quickly integrate graph learning into their ad systems by providing advanced GNN architectures out of the box and scalable infrastructure for training on web-scale graphs. These capabilities eliminated the need to build complex models or distributed training pipelines from scratch—tasks that would have required significant engineering effort using open-source frameworks.

By simplifying the deployment of graph-enhanced models into production, Kumo helped Reddit unlock significant business impact more quickly.
