Build state-of-the-art predictive AI models and applications on relational data with natural language prompts. Power recommendations, score leads, prevent churn, pre-empt demand. Powered by KumoRFM-2, KumoRFM Fine-Tuning, and Kumo Coding Agent Skills.

TL;DR

  • What’s new: Kumo Coding Agent Skills, an open-source library that teaches any LLM coding tool (Claude Code, OpenAI Codex) to build advanced predictive models with the Kumo SDK.
  • What you can do: Go from a raw multi-table dataset to working predictions using natural-language prompts inside your favorite notebook/IDE (VS Code, Jupyter, Cursor) or terminal.
  • Why it works: Kumo’s abstractions (Graphs, Predictive Query Language, Explainability) act as guardrails for the LLM. The agent doesn’t need to invent features or decide on a model architecture or algorithm. The result: terse, readable, first-try-runnable code that produces accurate predictions.
  • One SDK, two modes: the same skills drive KumoRFM-2, Kumo’s foundation model, for instant, training-free predictions, and KumoRFM Fine-Tuning for state-of-the-art accuracy on your data.
  • Available today. Skills repo · Setup guide · Free API key · KumoRFM-2 announcement

Video 1: Claude Code, with the Kumo Coding Agent Skills loaded, building an interactive, end-to-end predictive application on KumoRFM-2.

Coding agents changed software. Predictive AI is next.

Coding agents like Claude Code and OpenAI Codex have transformed software development. In seconds, you can “vibe-code” a new application that queries a database and surfaces insights.

But here’s the harder question: are coding agents actually good at helping you build better predictive models, faster, without a deep ML skill set?

Honest answer: not by default.

Point a coding agent at PyTorch or XGBoost and ask it to build a fraud model on your relational data. Our internal benchmarks tell a consistent story: hundreds of lines of code that may run, but are nearly impossible to debug, and even harder to scale to production data volumes (TB and beyond).

The problem isn’t the agent. It’s the abstractions the agent has to work with, and the platform necessary to scale.

Figure 1. Same task, two approaches. A best-in-class coding agent asked to build a credit-card fraud model on relational data. Top: with PyTorch (sprawling feature engineering, label definition, neural net layers, and evaluation code). Bottom: with the Kumo Coding Agent Skills and KumoRFM Fine-Tuning (a predictive query and a few SDK calls).

Figure 2. Lines of code authored by the same coding agent for the same fraud-prediction task: 1,014 (PyTorch) and 542 (XGBoost) versus 55 with Kumo. After debugging the PyTorch and XGBoost models, Kumo also achieved roughly 10% higher accuracy.

Kumo gives the agent the right abstractions

A useful analogy: could a coding agent build Kubernetes from scratch? Probably, though not without much strife. Should it? No. Kubernetes already provides the right abstractions for orchestration; coding agents are most powerful when they build on top of those abstractions, not when they reinvent them.

The same logic applies to predictive AI on relational data. Kumo’s abstractions provide:

  • Graphs: a declarative representation of your tables, foreign-key relationships, and column semantics (categorical, numeric, timestamp, text).
  • Predictive Query Language (PQL): a SQL-like, declarative way to express what to predict instead of how.
  • KumoRFM-2: a pre-trained relational foundation model that runs predictions in a single forward pass, no training required.
  • KumoRFM Fine-Tuning: when you need every last point of accuracy, the same abstractions tune a model on your data.
  • Performance and explainability: historical-data evaluation without manually crafting train/test splits or worrying about temporal leakage, plus state-of-the-art per-prediction explanations.
  • Enterprise-grade systems: the scalable training, serving, and graph engines and the security protocols that make all of the above production-ready.
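To make the PQL abstraction concrete, here is the shape of a churn-style predictive query, sketched as a Python string. The customers/orders schema is hypothetical and the exact syntax should be checked against Kumo’s PQL reference; the point is that the query states only the what:

```python
# Hypothetical schema: a `customers` table and an `orders` table with a
# foreign key orders.customer_id -> customers.customer_id.
# The query states WHAT to predict; Kumo decides HOW to compute it.
churn_query = (
    "PREDICT COUNT(orders.*, 0, 30, days) = 0 "   # target: zero orders in the next 30 days
    "FOR EACH customers.customer_id "             # one prediction per customer
    "WHERE COUNT(orders.*, -90, 0, days) > 0"     # scope: customers active in the prior 90 days
)
print(churn_query)
```

Notice what is absent: no feature list, no model choice, no train/test split. Those decisions live behind the abstraction.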

The coding agent doesn’t need to know low-level PyTorch APIs or how to scale an experiment into a multi-terabyte training job. It needs to know how to compose Kumo’s primitives correctly, and that is exactly what the Kumo Coding Agent Skills teach it.

The result is generated code that is terse, readable, debuggable, and correct by construction. The Kumo abstractions act as guardrails that meaningfully reduce LLM hallucination and prevent entire classes of subtle ML bugs from ever appearing in your notebook.

Figure 3. Real mistakes the coding agent made on the same fraud task (wrong label granularity, label leakage, and broken class-imbalance handling) when working with raw XGBoost or PyTorch. With Kumo Coding Agent Skills and KumoRFM Fine-Tuning, none of these mistakes occurred: the abstractions and skill rules prevent them by construction.

What Kumo Coding Agent Skills actually do

First, what’s a coding agent skill?

A coding agent skill is a small playbook (a set of markdown files) that teaches an LLM coding tool (Claude Code, Codex, Cursor) how to do a specific task well. The agent loads it on demand, follows the recipe, and stays inside the guardrails the skill defines. Skills are how you turn a general-purpose coding agent into a domain expert.
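For illustration, a skill is typically a folder containing a SKILL.md whose frontmatter tells the agent when to load it. The example below is hypothetical (not an actual file from the Kumo repo), following the common Claude Code skill convention:

```markdown
---
name: write-pql
description: Write or fix a Kumo predictive query (PQL). Load when the
  user wants to define or debug a prediction target on relational data.
---

# Writing predictive queries

1. Identify the entity to predict for and the target aggregation.
2. Anchor the query in time; never reference data after the anchor.
3. Validate the query against the graph schema before running it.
```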

The Kumo skill set

The Kumo Coding Agent Skills make Claude Code or Codex an expert in Kumo’s SDK, abstractions, and platform. They embody best practices and deep domain knowledge from the engineers and researchers who built the Kumo platform, architected the Kumo models, and developed PyTorch Geometric, the most popular graph neural network library.

The skills cover the full predictive-modeling lifecycle:

| You want to… | The skill it loads |
| --- | --- |
| Scope a brand-new prediction task | `scope-prediction-task` |
| Connect to data and explore your schema | `explore-data` |
| Build a relational graph from your tables | `build-graph` |
| Get instant predictions with no training | `rfm-predict` |
| Write or fix a predictive query (PQL) | `write-pql` |
| Train a fine-tuned, state-of-the-art model | `train-model` |
| Debug a failed prediction | `debug-prediction` |
| Improve a weak model's performance | `iterate-model` |
| Decide between RFM pre-trained and fine-tuning | `rfm-vs-training` |

What this looks like in practice

We asked Claude Code (with the Kumo Coding Agent Skills loaded) a single natural-language question inside a fresh VS Code notebook:

“Load the RelBench F1 dataset and predict which drivers will finish in the top three of the next race.”

RelBench F1 is a complex relational dataset spanning decades of Formula 1 driver statistics and race results. Modeling it traditionally requires careful, time-consuming feature engineering and architecture decisions before the first prediction.

Figure 4. The RelBench F1 dataset: nine related tables of Formula 1 history, a typical relational structure. Source: relbench.stanford.edu.

Here’s what the agent did, with no further human input:

  1. Read the Kumo skills to determine the correct SDK calls and Predictive Query syntax.
  2. Loaded the dataset, profiled the tables, and built a Kumo Graph from the foreign-key relationships.
  3. Formulated a Predictive Query targeting the top-three finishers of the next race.
  4. Iterated in its execution sandbox until every cell ran cleanly.
  5. Wrote the final notebook cell by cell, including all imports.
  6. Returned predictions with probabilities and explanations: Mark Webber, Lewis Hamilton, and Sebastian Vettel.

No documentation reading. No PyTorch code. No feature engineering. No train/test-split gymnastics. Just a question and a runnable notebook with a defensible prediction.
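Step 2 is where most hand-rolled pipelines go wrong, so its shape is worth a closer look. Here is a plain-data sketch of the graph the agent derives; the table and key names follow RelBench F1’s public schema, but this is an illustrative subset, not the Kumo SDK’s actual graph object:

```python
# Illustrative subset of the RelBench F1 relational graph: tables plus
# the foreign-key links that connect them.
graph_spec = {
    "tables": {
        "drivers": {"primary_key": "driverId"},
        "races":   {"primary_key": "raceId", "time_column": "date"},
        "results": {"primary_key": "resultId"},
    },
    "links": [
        # (child table, foreign-key column, parent table)
        ("results", "driverId", "drivers"),
        ("results", "raceId", "races"),
    ],
}

# A sanity check of the kind the agent runs: every link must join two
# registered tables.
for child, fk, parent in graph_spec["links"]:
    assert child in graph_spec["tables"], f"unknown child table {child}"
    assert parent in graph_spec["tables"], f"unknown parent table {parent}"
print(f"{len(graph_spec['links'])} foreign-key links validated")
```

Because the graph is declared rather than hand-wired into feature code, the same spec serves both instant KumoRFM-2 predictions and fine-tuning.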

Figure 5. End-to-end notebook for the RelBench F1 task, generated by Claude Code with the Kumo Coding Agent Skills and KumoRFM-2 in VS Code. (A) the original natural-language prompt; (B) the agent’s confirmation that it ran the full workflow in its sandbox before writing cells; (C) the resulting predictive query; (D) the model invocation.

Why the agent gets it right (a note for the technical reader)

A few design choices make this combination genuinely robust rather than just demo-friendly:

  • Hierarchical abstractions match how the agent thinks. Kumo’s predictive queries separate what to predict from how to compute it. Graphs separate what your data is from how the model consumes it. LLMs are good at the what; Kumo handles the how.
  • Temporal leakage is structurally prevented. Predictive queries are anchored in time, and KumoRFM replays historical states up to a specified anchor timestamp. The agent cannot accidentally leak the future into training context, because the abstraction won’t let it.
  • Skills load on demand and are version-tracked against the SDK. The agent only pulls the context it needs for the current sub-task, keeping reasoning focused. Self-maintenance workflows (validate-freshness, verify-content) keep guidance in sync with the package.
  • Same SDK, two product modes. KumoRFM-2 is the pre-trained foundation model: instant predictions, no training, a strong baseline that already outperforms most best-of-breed models. KumoRFM Fine-Tuning lifts KumoRFM-2 by an additional ~6% on published benchmarks.
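The temporal-anchoring guarantee can be illustrated with a toy stdlib sketch (this is not Kumo internals): model inputs are computed only from events at or before the anchor timestamp, and labels only from the window after it, so leaking the future is impossible by construction:

```python
from datetime import datetime, timedelta

# Toy event log: (customer_id, order timestamp) pairs.
events = [
    ("c1", datetime(2024, 1, 5)),
    ("c1", datetime(2024, 2, 20)),
    ("c2", datetime(2024, 1, 10)),
    ("c2", datetime(2024, 3, 1)),
]

anchor = datetime(2024, 2, 1)      # the point in time we predict from
horizon = timedelta(days=30)       # the prediction window

def history(customer):
    # Model inputs: only events at or before the anchor are visible.
    return [t for c, t in events if c == customer and t <= anchor]

def label(customer):
    # Label: did the customer order inside (anchor, anchor + horizon]?
    return any(c == customer and anchor < t <= anchor + horizon
               for c, t in events)

print("c1:", len(history("c1")), label("c1"))
print("c2:", len(history("c2")), label("c2"))
```

Every historical anchor yields a training example the same way, which is what replaces hand-built train/test splits.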

Figure 6. Out of the box, KumoRFM-2 outperforms the strongest prior models on published relational benchmarks by 3.3% to 8.0% with zero training. Fine-tuning on your data improves performance by a further 12.7% on the SAP SALT enterprise benchmark.

The business case

If you lead any team that could benefit from predictions (data science, analytics, engineering, or even marketing, GTM, and adoption), the practical implications are:

  • Predictions are no longer gated by ML expertise, or even by code. Anyone fluent in working with an AI agent in natural language can now use KumoRFM-2 to create new predictions that enable better decision-making.
  • Time-to-first-prediction collapses from months to minutes. Manual feature engineering, model selection, and pipeline plumbing (the work that historically gated every new use case) are automated end-to-end.
  • Analysts and ML-curious engineers can ship models without becoming Machine Learning experts or needing to learn new syntax.
  • Senior data scientists get leverage. The same team runs 10x more experiments with the Kumo Coding Agent and Foundation Model and reserves fine-tuning for the cases that actually move the business metric.
  • Developers can embed predictions directly into applications. The Kumo SDK and Predictive queries are the same primitives a developer calls from a backend service, so personalization, fraud scoring, churn signals, and recommendations become a single API call from your app.
  • Production isn't a separate project. With KumoRFM-2, there's no model to train: the prediction in your notebook is the production prediction. With Fine-Tuning, the trained model ships as-is. Same SDK, same predictive query, same Kumo platform. No rewrite, no handoff.

Getting started

Kumo Coding Agent Skills are available today and work in VS Code Notebooks, Jupyter, PyCharm, Cursor, or your terminal, with Claude Code and OpenAI Codex. We’ve published setup guides for everyone from “I’ve never used a coding agent” to “I’m a Claude Code power user.”

Three steps to your first prediction:

  1. Install the SDK: `pip install kumoai`
  2. Add the skills to your project:

```sh
cd your-project
git clone https://github.com/kumo-ai/kumo-coding-agent.git
```
  3. Then, in your notebook, just ask:

“Load my orders and customers tables from S3 and predict which customers will churn in the next 30 days.”

The agent takes it from there.


Kumo's abstractions, paired with modern coding agents, give data scientists, developers, and anyone building with agents real superpowers: better models, in production, fast.