
How to Build a Recommendation Engine That Actually Works

Most recommendation tutorials stop at collaborative filtering. That is fine if every user has a long purchase history and your catalog never changes. In reality, a large percentage of ecommerce sessions come from users with thin or zero purchase history. Here are five approaches ranked, with honest trade-offs, benchmark numbers, and a practical path from basic to best.

TL;DR

  • Five approaches ranked from weakest to strongest: (1) rule-based/popularity, (2) collaborative filtering, (3) content-based, (4) hybrid deep learning (Netflix/Amazon), (5) graph-based models like KumoRFM that read the full user-product-category-session-review graph.
  • Cold start is where recommendation engines live or die. Collaborative filtering fails completely for new users with no history. Graph models use relational context (signup channel, browsing patterns, demographic connections) to recommend from day one.
  • Netflix runs 200+ recommendation algorithms in a hybrid system. Amazon Personalize is the managed service version. Both still operate on flat interaction tables. Graph-based models read the full relational structure that flat matrices miss.
  • On the SAP SALT enterprise benchmark, KumoRFM scores 91% accuracy vs 75% for PhD data scientists with manual features and 63% for LLM+AutoML. On RelBench, KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for LightGBM.
  • You do not need to build a custom ML pipeline. Connect your relational tables, write a PQL query, and KumoRFM discovers which signals predict what each user will buy next.

Every ecommerce team eventually asks the same question: how do we recommend products that people actually want? The internet is full of tutorials that walk you through building a basic collaborative filter in Python. Those tutorials are not wrong. They are just incomplete. They work on clean demo datasets where every user has rated 50+ movies. They do not work when 60% of your traffic is new or anonymous, your catalog changes weekly, and your CEO wants to know why revenue per session has not moved.

This is a practitioner's guide. Five approaches, ranked honestly, with the trade-offs that the tutorials skip.

Five approaches to recommendations, ranked

Not all recommendation approaches are equal. Here they are, from simplest to most capable, with a direct comparison across the dimensions that matter in production.

| Dimension | Rule-Based / Popularity | Collaborative Filtering | Content-Based | Hybrid Deep Learning | Graph-Based (KumoRFM) |
|---|---|---|---|---|---|
| How it works | Show bestsellers or hand-picked items | Find users with similar purchase/rating history | Match item attributes to user preference profiles | Neural networks combining multiple signal types | Reads the full relational graph: users, items, categories, sessions, reviews, merchants |
| Cold start (new users) | Decent - everyone sees the same popular items | Fails completely - no history means no similar users | Poor - needs user preference data to match against | Partial - can use contextual signals but limited | Strong - uses relational context (signup channel, browsing, demographics) from day one |
| Cold start (new items) | Fails - new items have no popularity data | Fails - new items have no interaction data | Good - can use item attributes immediately | Good - combines content and contextual signals | Strong - reads category, brand, merchant, and catalog graph connections |
| Sparse data handling | Not affected (no user data used) | Poor - needs dense interaction overlap | Moderate - depends on attribute quality | Moderate - helps but still needs training data | Strong - propagates signal through relational connections to fill gaps |
| Personalization depth | None - same recs for everyone | Moderate - based on purchase overlap | Moderate - based on attribute matching | High - multi-signal personalization | Highest - captures user context, session behavior, social graph, and full item relationships |
| Engineering effort | Low - rules and SQL queries | Moderate - matrix factorization, nearest neighbors | Moderate - feature extraction, similarity scoring | Very high - custom neural architectures, large ML teams | Low with foundation model - connect tables, write PQL query |
| Who uses this | Small retailers, early-stage startups | Mid-size ecommerce, media platforms | Content platforms, news sites | Netflix, Amazon, Spotify (200+ person ML teams) | Enterprise teams using KumoRFM |
| Accuracy ceiling | Low - no personalization | Moderate - limited by interaction density | Moderate - limited by attribute quality | High - but requires massive engineering investment | Highest - reads patterns across all connected data |

Five recommendation approaches compared across 8 dimensions. Each level adds capability but also adds complexity, except graph-based foundation models, which add capability while reducing engineering effort.

Approach 1: Rule-based and popularity models

This is where most teams start, and it is not a bad starting point. Show the top-selling items. Show items frequently bought together. Show "customers who viewed X also viewed Y" based on co-occurrence counts.

Popularity models are easy to build, easy to explain, and they establish a baseline. The problem is the ceiling. Everyone sees the same recommendations. There is no personalization. Popularity-based recs do lift revenue, and that is real money, but it is a fraction of what personalized recs can deliver.

  • Best for: Small retailers and early-stage startups that need a baseline with zero ML investment.
  • Watch out for: No personalization means every user sees the same recs. Revenue lift is modest compared to personalized approaches.
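
To make the co-occurrence idea concrete, here is a minimal sketch of "customers who bought X also bought Y" built from raw order baskets. The data, item names, and column names are invented for illustration; a production version would run as a SQL aggregation over your orders table.

```python
from collections import Counter
from itertools import combinations

# Toy order baskets; item and field names are illustrative, not a real schema.
orders = [
    {"order_id": 1, "items": ["shoes", "socks"]},
    {"order_id": 2, "items": ["shoes", "socks", "bottle"]},
    {"order_id": 3, "items": ["shoes", "bottle"]},
]

def co_occurrence(orders):
    """Count how often each pair of items appears in the same order."""
    pairs = Counter()
    for order in orders:
        for a, b in combinations(sorted(set(order["items"])), 2):
            pairs[(a, b)] += 1
    return pairs

def also_bought(item, pairs, k=3):
    """Top-k items most often bought together with `item`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [i for i, _ in scores.most_common(k)]

pairs = co_occurrence(orders)
print(also_bought("shoes", pairs))  # items most co-purchased with shoes
```

Note that the output is identical for every user, which is exactly the ceiling described above: the counts personalize to the item, never to the person.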

Approach 2: Collaborative filtering

Collaborative filtering is the textbook answer. Users who bought similar items in the past will buy similar items in the future. Find the nearest neighbors in the user-item interaction matrix, borrow their preferences, and recommend accordingly.

It works well in one specific scenario: when you have dense interaction data. Netflix circa 2006, when every user had rated dozens of movies, was the ideal case. The Netflix Prize competition (2006-2009) made collaborative filtering famous precisely because the dataset was unusually dense.

In practice, most ecommerce interaction matrices are less than 1% dense. A user has bought 3 items out of a 100,000-item catalog. There is not enough overlap with other users to find meaningful neighbors. And for new users with zero purchases, collaborative filtering returns nothing.

  • Best for: Mid-size ecommerce and media platforms with dense interaction history (users have rated or purchased dozens of items).
  • Watch out for: Fails completely on cold start. If 40-60% of sessions are new or low-activity users, your largest audience segment gets your worst recommendations.
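
A minimal user-based collaborative filter makes both the mechanism and the failure mode visible. This sketch uses toy purchase sets and cosine similarity over binary interactions; real systems use matrix factorization over far larger matrices, but the cold-start behavior is the same.

```python
import math

# Toy user-item interactions (set of purchased items); names are illustrative.
interactions = {
    "alice": {"shoes", "socks", "bottle"},
    "bob":   {"shoes", "socks"},
    "carol": {"watch", "bottle"},
    "dave":  set(),  # brand-new user with zero purchase history
}

def cosine(a, b):
    """Cosine similarity between two binary purchase vectors (as sets)."""
    if not a or not b:
        return 0.0  # an empty history has similarity 0 with everyone
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(user, interactions, k=3):
    """Score unseen items by the similarity of the users who bought them."""
    seen = interactions[user]
    scores = {}
    for other, items in interactions.items():
        if other == user:
            continue
        sim = cosine(seen, items)
        if sim <= 0:
            continue  # no overlap means this neighbor contributes nothing
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("bob", interactions))   # bottle surfaces via alice's overlap
print(recommend("dave", interactions))  # empty list: cold start returns nothing
```

The last line is the point: with no interactions, the matrix has no signal at all, and the system must fall back to popularity.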

Approach 3: Content-based filtering

Content-based filtering matches item attributes to user preferences. If a user bought running shoes, recommend other running shoes based on attributes like brand, price range, cushioning type, and color. It does not need other users' data, so it avoids the worst of the cold-start problem for new items.

The limitation is that it only recommends more of the same. A user who bought running shoes gets more running shoes, never running socks, water bottles, or GPS watches. Content-based filtering cannot discover cross-category patterns because it only reads item attributes, not the broader context of how products relate to each other through user behavior.

  • Best for: Content platforms and news sites where item attributes are rich and well-structured. Handles new-item cold start well.
  • Watch out for: Only recommends more of the same. No cross-category discovery. Still fails on new-user cold start.
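
The "more of the same" limitation is easy to demonstrate. This sketch builds a user profile from the attributes of purchased items and ranks the catalog by Jaccard overlap; the catalog and attribute tags are invented for illustration.

```python
# Toy catalog: item -> attribute tags; names are illustrative.
catalog = {
    "runner_x1":  {"running", "shoe", "cushioned"},
    "runner_x2":  {"running", "shoe", "lightweight"},
    "trail_sock": {"running", "sock"},
    "smartwatch": {"gps", "watch"},
}

def profile(purchased):
    """User preference profile = union of attributes of purchased items."""
    attrs = set()
    for item in purchased:
        attrs |= catalog[item]
    return attrs

def recommend(purchased, k=2):
    """Rank unpurchased items by attribute overlap with the user profile."""
    prefs = profile(purchased)
    scores = {
        item: len(attrs & prefs) / len(attrs | prefs)  # Jaccard similarity
        for item, attrs in catalog.items()
        if item not in purchased
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend({"runner_x1"}))  # other running gear ranks first
```

The GPS watch never surfaces, even though runners buy them constantly: no shared attribute tag means no score, which is the cross-category blindness described above.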

Approach 4: Hybrid deep learning (what Netflix and Amazon actually use)

Netflix does not run one recommendation algorithm. It runs over 200, each specialized for a different signal type, and a meta-algorithm selects which recommendations to show each user. Each row on your Netflix home screen is generated by a different model. The system blends collaborative filtering, content embeddings, sequence models (what you watched recently and in what order), contextual bandits (time of day, device type), and more.

Amazon Personalize is the managed-service version of this approach. It offers real-time personalization through APIs, handling some of the infrastructure complexity, but you still need to structure your data correctly, manage campaigns, and tune recipes.

Hybrid systems deliver the best results of the traditional approaches. The catch: Netflix has a 200+ person ML team. Building and maintaining a hybrid recommendation system at that level requires dedicated infrastructure engineers, ML researchers, and years of iteration. Most ecommerce companies do not have those resources.

  • Best for: Companies with 20-200+ person ML teams and multi-year timelines. Netflix, Amazon, and Spotify operate at this level.
  • Watch out for: Massive engineering investment. Still operates on flat interaction tables, limiting relational signal capture. Cold start is only partially addressed.

Approach 5: Graph-based recommendations (KumoRFM)

Here is where the step change happens. Every approach above operates on some subset of your data: the interaction matrix, the item attribute table, the session log. A graph-based model reads all of it at once, as a connected graph.

Think about what your data actually looks like in your warehouse. You have a users table, an orders table, a products table, a categories table, a sessions table, a reviews table, and a merchants table. These tables are connected by foreign keys. User 1234 placed Order 5678, which contained Product 9012, which belongs to Category "Electronics," which was sold by Merchant "TechStore," and User 1234 wrote Review 3456 for Product 9012 during Session 7890 on a mobile device at 9 PM.

That web of connections is a graph. And it contains far more predictive signal than any single flat table.

  • Best for: Any team that wants near Netflix-level accuracy without building a 200-person ML team. Handles cold start, sparse data, and long-tail items in one pass.
  • Watch out for: Requires relational data in a data warehouse (tables connected by foreign keys). The more tables you connect, the better the results.

Why cold start is the real test

Every recommendation approach looks decent when your user has a long purchase history. The real test is what happens when they do not. Cold start is not an edge case. For most growing ecommerce companies, new and low-activity users are the majority of traffic.

Here is how each approach handles a new user who just signed up and has not bought anything:

  1. Rule-based/popularity: Shows bestsellers. No personalization. Works as a fallback but leaves money on the table.
  2. Collaborative filtering: Returns nothing useful. The user has no interactions, so there are no similar users to borrow from. Most systems fall back to popularity, which defeats the purpose.
  3. Content-based: Cannot work. No user preference profile exists yet. Needs at least one interaction to build a profile.
  4. Hybrid deep learning: Can use contextual signals (device type, time of day, referral source) for a partial cold-start solution. Better than collaborative filtering alone, but still limited without interaction data.
  5. Graph-based (KumoRFM): Reads the relational context that exists even for new users. The user signed up from Austin, Texas through a Google Shopping ad for winter jackets, on an iPhone, at 8 PM. The graph connects this user to geographic, channel, device, and temporal patterns from millions of other users. The model recommends relevant products from the first session, no purchase history required.

This is not a theoretical advantage. In published benchmarks with cold-start scenarios, graph models consistently outperform collaborative filtering on new-user recommendation accuracy. For a growing ecommerce business, that gap translates directly to conversion rate on your largest audience segment.

The benchmark evidence

Talk is cheap. Here are the numbers from third-party benchmarks on real relational data.

| Approach | Accuracy | What it means |
|---|---|---|
| LLM + AutoML | 63% | Language model generates features, AutoML selects model |
| PhD data scientist + XGBoost | 75% | Expert spends weeks hand-crafting features, tunes XGBoost |
| KumoRFM (zero-shot) | 91% | No feature engineering, no training, reads relational tables directly |

SAP SALT benchmark on enterprise relational data. KumoRFM outperforms expert-tuned models by 16 percentage points. The gap comes from relational patterns that flat feature tables structurally cannot contain.

| Approach | AUROC | Feature engineering time |
|---|---|---|
| LightGBM + manual features | 62.44 | 12.3 hours per task |
| KumoRFM zero-shot | 76.71 | ~1 second |
| KumoRFM fine-tuned | 81.14 | Minutes |

RelBench benchmark across 7 databases, 30 prediction tasks. KumoRFM zero-shot outperforms manually engineered LightGBM by 14+ AUROC points. Fine-tuning pushes the gap to nearly 19 points.

What this looks like in practice

Traditional recommendation pipelines require months of work: build an ETL pipeline, engineer features from multiple tables, train and tune models, deploy serving infrastructure, build A/B testing, and maintain everything as your catalog and user base change.

With KumoRFM, you connect your relational tables and write a PQL (Predictive Query Language) query. The model reads the full relational graph and predicts which items each user will interact with next.

Traditional recommendation pipeline

  • Build ETL pipeline to join user, product, order, session, and review tables (2-4 weeks)
  • Engineer features: user purchase history, item popularity, co-occurrence counts, session recency (4-8 weeks)
  • Train collaborative filtering + content-based models separately
  • Build hybrid blending layer to combine model outputs
  • Deploy real-time serving infrastructure with low-latency requirements
  • Rebuild pipeline every time catalog or schema changes
  • Cold start users get popularity fallback (no personalization)

KumoRFM recommendation pipeline

  • Connect to data warehouse: users, orders, products, categories, sessions, reviews
  • Write PQL: PREDICT product_id FOR EACH users.user_id
  • Model reads all tables and discovers predictive patterns automatically
  • Handles warm users, cold-start users, and new items in one pass
  • No feature engineering, no model blending, no graph construction
  • Schema changes handled automatically by the foundation model
  • One platform, one query, all user segments

PQL Query

```
PREDICT product_id
FOR EACH users.user_id
WHERE orders.order_date > '2026-01-01'
```

One PQL query replaces the full recommendation pipeline: feature engineering across 6+ tables, model training, cold-start handling, and scoring. KumoRFM reads raw relational tables and discovers which products each user is most likely to buy next.

Output

| user_id | Top recommendation | Confidence | Signal source |
|---|---|---|---|
| USR-1001 (active buyer) | Wireless Earbuds Pro | 0.89 | Purchase sequence + category affinity + session recency |
| USR-1002 (new user, 0 purchases) | Running Shoes X1 | 0.74 | Signup channel + geo + browse category + similar-user graph |
| USR-1003 (dormant 6 months) | Smart Watch V3 | 0.68 | Historical preferences + new product-category graph connections |
| USR-1004 (1 purchase only) | Phone Case Ultra | 0.81 | Product-to-product graph + category co-purchase patterns |

Why graph models win on sparse and cold-start data

The core insight is simple. A flat interaction matrix is a lossy compression of your actual data. Your database has users connected to orders connected to products connected to categories connected to brands connected to reviews connected to sessions. When you flatten that into a user-item matrix, you throw away most of the signal.

A graph model reads the original structure. Even a user with zero purchases is connected to the graph through their signup attributes, browsing behavior, geographic location, device type, referral channel, and any other data you capture. Every one of those connections provides signal that the model can propagate through the graph to generate recommendations.

This is why graph models show their largest accuracy gains precisely where traditional models struggle most: sparse data and cold start. The denser your interaction data, the smaller the gap between collaborative filtering and graph models. The sparser your data, the larger the gap becomes. Since most real-world ecommerce data is sparse, the gap matters.
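
The propagation idea can be sketched on a toy relational graph. One purchase connects a user to a category node, and the category connects onward to every other product in it; this two-hop reachability is a hand-rolled stand-in for what a trained graph model learns, and the edges and names below are invented for illustration.

```python
# Toy relational graph as an edge list; names are illustrative, not a schema.
edges = {
    "user:ana":   ["product:p1"],                # her only purchase
    "product:p1": ["category:trail_running"],
    "product:p2": ["category:trail_running"],
    "product:p3": ["category:trail_running"],
    "product:p4": ["category:office"],
}

def neighbors(node):
    """Undirected adjacency over the edge list above."""
    out = set(edges.get(node, []))
    out |= {n for n, targets in edges.items() if node in targets}
    return out

def two_hop_products(user):
    """Products reachable in two hops from a user's purchases, minus purchases.

    The hop through a shared category is what fills in sparse data: a single
    purchase connects the user to every other product in that category.
    """
    bought = neighbors(user)
    reachable = set()
    for product in bought:
        for mid in neighbors(product):  # e.g. the product's category node
            reachable |= {n for n in neighbors(mid) if n.startswith("product:")}
    return sorted(reachable - bought)

print(two_hop_products("user:ana"))  # candidates despite a one-item history
```

With one purchase and zero interaction overlap with other users, collaborative filtering has almost nothing to work with, while the graph walk already yields candidates. A real model additionally weights each path rather than treating all hops equally.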

The Netflix/Amazon comparison, honestly

Netflix and Amazon built recommendation systems that work very well. But they did it with hundreds of ML engineers over many years. The Netflix recommendation system is not one algorithm. It is a complex ensemble of 200+ specialized models with a meta-ranking layer on top, backed by custom infrastructure for feature computation, model serving, and online experimentation.

If you have that team and that timeline, build a hybrid system. You will get excellent results.

If you do not have that team (and most companies do not), the question is: what gets you closest to Netflix-quality recommendations with the resources you actually have? A collaborative filtering model with manual feature engineering gets you part of the way. A graph-based foundation model that reads your relational data directly gets you further, faster, with a smaller team.

| Approach | Team size needed | Time to production | Cold-start quality | Accuracy ceiling |
|---|---|---|---|---|
| Rule-based | 1 engineer | 1-2 weeks | Generic (popularity only) | Low |
| Collaborative filtering | 2-3 ML engineers | 2-3 months | Fails (no data = no recs) | Moderate |
| Hybrid deep learning (Netflix-style) | 20-200+ ML engineers | 1-3 years | Partial (contextual signals) | Very high |
| KumoRFM | 1 ML engineer or analyst | Days to weeks | Strong (relational context) | Very high |

Practical comparison of team investment vs. recommendation quality. KumoRFM reaches near Netflix-level accuracy with a fraction of the team size and timeline.

Getting started: a practical path

You do not need to rip out your current system. Here is the practical path from wherever you are to graph-based recommendations:

  1. If you have nothing today: Skip collaborative filtering entirely. Start with KumoRFM. Connect your product catalog, user table, and order history. Write a PQL query. You will have production-quality recommendations, including cold-start coverage, in days instead of months.
  2. If you have a collaborative filtering model: Keep it running. Add KumoRFM as a parallel system. Compare results on cold-start users first, where the gap is largest. Then expand to all users and measure the revenue lift.
  3. If you have a hybrid system: Test KumoRFM against your current system on the segments where your current system is weakest (typically cold start, sparse categories, and long-tail items). The foundation model approach often matches or beats custom hybrid systems on these segments while requiring a fraction of the maintenance.
  4. Regardless of starting point: Connect more tables over time. The more relational context KumoRFM can read (sessions, reviews, categories, merchants, inventory), the more patterns it discovers. Each additional table is a marginal improvement with zero additional feature engineering.

Frequently asked questions

How do I build a recommendation engine that actually works well?

Start by understanding which of the five approaches fits your data maturity: (1) Rule-based and popularity models require no ML but plateau quickly. (2) Collaborative filtering works when you have dense interaction history but fails on cold start. (3) Content-based filtering uses item attributes and works for new items but not new users. (4) Hybrid deep learning, the approach Netflix and Amazon use, combines multiple signals but requires large engineering teams. (5) Graph-based models like KumoRFM read the full relational graph of users, products, categories, sessions, and reviews, handling cold start and sparse data without manual feature engineering. On the RelBench benchmark, KumoRFM zero-shot achieves 76.71 AUROC vs 62.44 for LightGBM with manual features. The fastest path to production-quality recs is connecting your relational tables and writing a single PQL query.

What recommendation engines do companies like Netflix and Amazon use?

Netflix runs a hybrid system with over 200 recommendation algorithms that blend collaborative filtering, content-based signals, deep learning sequence models, and contextual bandits. Each row on the Netflix home screen is generated by a different algorithm, with a meta-algorithm selecting which rows to show each user. Amazon Personalize is the managed service version of Amazon's internal recommendation system, offering real-time personalization through APIs. Both systems still operate primarily on flat interaction tables (user-item matrices), which limits their ability to capture the full relational context. Graph-based models like KumoRFM read the complete relational structure: user-product-category-session-review-merchant, discovering patterns that flat interaction matrices miss.

How do I make recommendations for new users with no purchase history?

This is the cold start problem, and it is where most recommendation engines break down. Collaborative filtering fails completely because it needs historical interactions to find similar users. Content-based filtering needs user preference data it does not have. The standard workarounds are popularity-based fallbacks and onboarding questionnaires, both of which deliver generic results. Graph-based models solve cold start differently. Even a new user with zero purchase history exists in a relational context: they signed up from a specific location, they arrived through a specific channel, they browsed specific categories, their demographic profile connects them to user segments. A graph model reads these relational signals to recommend from day one, without needing any purchase history. In published benchmarks with cold-start scenarios, graph models consistently outperform collaborative filtering on new-user recommendation accuracy.

What ML tools do ecommerce companies use for product recommendations?

The most common ML tools for ecommerce recommendations in 2026 are: Amazon Personalize (managed service, pay-per-API-call, good for teams without ML expertise), Google Recommendations AI (part of Vertex AI, integrates with Google Cloud retail), Algolia Recommend (search-first approach with ML ranking), and custom systems built on TensorFlow or PyTorch (used by large retailers with dedicated ML teams). The emerging category is relational foundation models like KumoRFM, which skip the feature engineering step entirely. Instead of building custom pipelines for each recommendation task, you connect your product catalog, user activity, and transaction tables, then write a predictive query. KumoRFM achieves 91% accuracy on the SAP SALT enterprise benchmark vs 75% for PhD data scientists with manual feature engineering.

What is the cold start problem in recommendation systems?

The cold start problem occurs when a recommendation engine cannot make accurate predictions because it lacks sufficient data about a user or item. There are two variants: user cold start (a new user with no interaction history) and item cold start (a new product with no ratings or purchases). Collaborative filtering is hit hardest because it relies entirely on the interaction matrix. If a user has no interactions, they have no neighbors to borrow preferences from. Content-based approaches handle item cold start better (they can use item attributes) but still struggle with user cold start. Graph-based approaches handle both variants because they use relational context beyond the interaction matrix itself.

Is collaborative filtering still worth using in 2026?

Collaborative filtering is still a reasonable starting point if you have dense interaction data and minimal cold start. Matrix factorization and nearest-neighbor methods are well-understood, fast to train, and easy to explain. But they have hard limits: they fail on cold start, they cannot incorporate side information (item categories, user demographics, session context) without bolting on separate systems, and they scale poorly as your catalog grows. Most teams that start with collaborative filtering eventually move to hybrid or graph-based approaches as their product catalog and user base grow. The ROI of switching is highest when cold start is a significant fraction of your traffic, which it is for most growing ecommerce companies.

How does a graph-based recommendation engine handle sparse data?

In a traditional recommendation system, sparse data means the user-item interaction matrix is mostly empty (a typical ecommerce site has less than 1% density). Collaborative filtering needs dense overlap between users to find meaningful neighbors, so sparsity kills accuracy. A graph-based model sidesteps this by reading the full relational structure, not just the interaction matrix. Even when a user has bought only one product, the graph connects that product to its category, brand, other buyers, related sessions, and review patterns. The model propagates information through these connections to fill in the gaps. This is why graph models show the largest accuracy gains on sparse and cold-start scenarios.

What is the difference between collaborative filtering and graph-based recommendations?

Collaborative filtering operates on a single user-item interaction matrix: who bought what, who rated what. It finds similar users or similar items based on overlapping interactions. Graph-based recommendations operate on the full relational graph: users, items, categories, brands, sessions, reviews, merchants, and all the connections between them. The analogy is the difference between knowing which employees talk to each other (collaborative filtering) vs knowing what department they are in, what projects they work on, who their manager is, and what office they sit in (graph-based). Graph models see the complete organizational picture. They discover patterns like 'users who browse category X in evening sessions from mobile devices tend to buy items in category Y within 48 hours' that collaborative filtering structurally cannot express.

See it in action

KumoRFM delivers predictions on relational data in seconds. No feature engineering, no ML pipelines. Try it free.