The Next Frontier in Predictive Modeling: Graph Transformers and the New Shape of Machine Learning

October 30, 2025
Zack Drach

A quiet paradigm shift is underway in machine learning. For most of the last decade, progress came from building bigger or more efficient models: gradient-boosted trees, deep neural networks, and transformer architectures for language and vision. The focus was on scale and speed.

Now, the shift is conceptual. Instead of seeing data as rows in a table, models are beginning to learn directly from relationships — the way entities interact, influence, and depend on each other over time. This change is redefining predictive modeling itself.

Graph Neural Networks (GNNs) and, more recently, Graph Transformers are leading this shift. They make it possible to capture relational and temporal structure in complex datasets, from supply chains and energy grids to molecules and experiments.

Kumo’s platform operationalizes this idea, bringing these research-grade architectures into the hands of data scientists in a form that fits naturally alongside Python’s open-source ecosystem.

From Tables to Networks

Traditional predictive workflows assume that data points are independent: customers, transactions, products, and time steps are all treated as separate rows. That assumption simplifies modeling, but it erases the connected nature of real systems.

In practice, few datasets are truly independent.

  • Demand for one product often depends on related products, suppliers, and promotions.
  • Sensor readings are correlated across networks of equipment.
  • Molecular interactions depend on shared structural or biochemical properties.

Graph-based learning changes this representation. Each entity becomes a node, each relationship an edge. The model learns not only from what each node looks like, but from how it connects to others. Context becomes part of the data.

Image: Relational tables forming a graph and time-series per node
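
To make this concrete, here is a minimal sketch of the idea, using networkx and two hypothetical tables (customers and orders) rather than Kumo's API: rows become nodes, and foreign-key references become edges.

python
import networkx as nx

# Hypothetical rows from two related tables.
customers = [{"id": "C1"}, {"id": "C2"}]
orders = [
    {"id": "O1", "customer_id": "C1", "product": "P9"},
    {"id": "O2", "customer_id": "C1", "product": "P9"},
    {"id": "O3", "customer_id": "C2", "product": "P4"},
]

G = nx.Graph()

# Each entity becomes a node...
for c in customers:
    G.add_node(c["id"], kind="customer")
for o in orders:
    G.add_node(o["id"], kind="order", product=o["product"])
    # ...and each foreign-key relationship becomes an edge.
    G.add_edge(o["id"], o["customer_id"])

# A model can now see that C1's two orders share a product,
# context that independent rows would have erased.
print(list(G.neighbors("C1")))  # ['O1', 'O2']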

Forecasting with Structure

Forecasting is one of the clearest examples of how this shift plays out.

Classic approaches (ARIMA, Prophet, LightGBM) are designed to forecast each time series independently. That’s fine for isolated signals, but in many systems, time series are coupled.

Sales of related SKUs rise and fall together. A delay in one supplier can ripple downstream through an entire logistics network. Temperature sensors on connected equipment display shared patterns of drift.

Graph Transformers treat these correlations as part of the model itself. Each entity has a time series and connections to other entities. The model learns from both, using temporal encoders to capture local history and graph encoders to share information across related entities.

Image: Graph + past-sequence encoder architecture
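
As an illustrative sketch (plain PyTorch and PyTorch Geometric, not Kumo's internal architecture), the pattern looks like this: a GRU summarizes each entity's history, and an attention-based graph layer mixes those summaries across edges.

python
import torch
import torch.nn as nn
from torch_geometric.nn import TransformerConv

class TemporalGraphEncoder(nn.Module):
    """Toy encoder: per-entity temporal encoding, then cross-entity attention."""
    def __init__(self, in_dim, hidden_dim, heads=4):
        super().__init__()
        self.temporal = nn.GRU(in_dim, hidden_dim, batch_first=True)
        self.graph = TransformerConv(hidden_dim, hidden_dim, heads=heads, concat=False)

    def forward(self, series, edge_index):
        # series: [num_entities, seq_len, features], each entity's local history
        _, h = self.temporal(series)   # h: [1, num_entities, hidden_dim]
        h = h.squeeze(0)
        # Attention over graph edges shares information between related entities.
        return self.graph(h, edge_index)

# 4 entities, 12 past time steps, 3 features each; edges couple related entities.
series = torch.randn(4, 12, 3)
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 3]])
out = TemporalGraphEncoder(in_dim=3, hidden_dim=32)(series, edge_index)
print(out.shape)  # torch.Size([4, 32])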

This combination allows for richer, context-aware forecasts: models that understand how systems move together, not just how individual signals evolve in isolation.

Image: Forecast comparison. Generative Graph Transformer outputs retain high-frequency detail compared with Prophet and baseline models. To learn more, check out the full blog post.

Why Graph Transformers Work So Well

Graph Neural Networks already made it possible to model structured relationships, but their reach is often limited to local neighborhoods. As GNNs grow deeper, node representations tend to blur together, a challenge known as over-smoothing.

Graph Transformers solve this through attention-based mixing, allowing information to travel across distant nodes while maintaining local specificity.

Image: GNN message passing vs. global attention
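
The contrast is easy to see in code. In this illustrative sketch (plain PyTorch, not Kumo's implementation), one round of message passing mixes only direct neighbors, while attention lets every node reach every other node in a single step.

python
import torch
import torch.nn as nn

num_nodes, dim = 100, 64
h = torch.randn(1, num_nodes, dim)  # node embeddings (a batch of one graph)

# Local message passing: one layer mixes only direct neighbors, so signal
# from a node k hops away needs k stacked layers and is averaged at every
# step along the way (the root of over-smoothing).
adj = (torch.rand(num_nodes, num_nodes) < 0.05).float()  # random adjacency
deg = adj.sum(-1, keepdim=True).clamp(min=1)
local = (adj / deg) @ h.squeeze(0)  # mean over direct neighbors

# Global attention: every node attends to every other node in one step,
# so distant context arrives without being diluted hop by hop.
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
mixed, weights = attn(h, h, h)
print(local.shape, mixed.shape, weights.shape)
# torch.Size([100, 64]) torch.Size([1, 100, 64]) torch.Size([1, 100, 100])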

The result is an architecture that can model both fine-grained and system-wide dynamics in a single framework, effectively combining the reasoning strength of graphs with the representational power of transformers.

A Declarative Way to Work

Building these models from scratch used to take thousands of lines of code: managing joins, temporal windows, leakage control, negative sampling, and MLOps. The Kumo SDK replaces that boilerplate with a few simple, declarative steps.

Here’s an example drawn from a real biotech workflow: predicting protein–ligand interactions using a knowledge graph of five connected tables.

python
import kumoai as kumo

# Initialize the Kumo SDK
kumo.init(...)
connector = kumo.SnowflakeConnector(...)

# Create Kumo tables
proteins = kumo.Table.from_source_table(
    source_table=connector.table("PROTEINS"),
    primary_key="PROTEIN_ID"
).infer_metadata()

ligands = kumo.Table.from_source_table(
    source_table=connector.table("LIGANDS"),
    primary_key="LIGAND_ID"
).infer_metadata()

interactions = kumo.Table.from_source_table(
    source_table=connector.table("PROTEIN_LIGAND_INTERACTIONS"),
    primary_key="BIOACTIVITY_ID"
).infer_metadata()

protein_interactions = kumo.Table.from_source_table(
    source_table=connector.table("PROTEIN_PROTEIN_INTERACTIONS"),
    primary_key="PPI_ID"
).infer_metadata()

ligand_similarities = kumo.Table.from_source_table(
    source_table=connector.table("LIGAND_LIGAND_SIMILARITIES"),
).infer_metadata()

# Create a graph connecting the tables
bioactivity_graph = kumo.Graph(
    tables={
        "proteins": proteins,
        "ligands": ligands,
        "interactions": interactions,
        "protein_interactions": protein_interactions,
        "ligand_similarities": ligand_similarities
    },
    edges=[
        dict(src_table="interactions", fkey="PROTEIN_ID", dst_table="proteins"),
        dict(src_table="interactions", fkey="LIGAND_ID", dst_table="ligands"),
        dict(src_table="protein_interactions", fkey="PROTEIN1_ID", dst_table="proteins"),
        dict(src_table="ligand_similarities", fkey="LIGAND1_ID", dst_table="ligands"),
        dict(src_table="ligand_similarities", fkey="LIGAND2_ID", dst_table="ligands")
    ]
)

# Define the prediction task as a predictive query
link_prediction_pquery = kumo.PredictiveQuery(
    graph=bioactivity_graph,
    query=(
        "PREDICT LIST_DISTINCT(interactions.LIGAND_ID)\n"
        "RANK TOP 10\n"
        "FOR EACH proteins.PROTEIN_ID\n"
    )
)

# Train the link prediction model
link_trainer = kumo.Trainer(link_prediction_pquery.suggest_model_plan())
link_training_job = link_trainer.fit(
    graph=bioactivity_graph,
    train_table=link_prediction_pquery.generate_training_table(non_blocking=True)
)

That’s the full pipeline in about 60 lines: data connection, feature generation, model selection, training. The same task in CatBoost or even PyTorch would require thousands of lines: joining and aggregating the data into a single flat training table (dropping information in the process), then engineering complex features to coax the model into picking up the signal that was lost.

The graph transformer automatically learns features that would otherwise be hand-crafted:

  • Protein–protein and ligand–ligand relationships through multi-hop connections.
  • Similarity-based effects that emerge from shared molecular structure.
  • Temporal patterns that evolve across experiments without manual windowing.

The result is not just fewer lines of code, but a clearer way to express what the model should learn.

To learn more, you can check out the documentation for the Kumo platform as well as Kumo’s Python SDK.

Control Where It Matters: Advanced Options for Power Users

While Kumo handles much of the engineering overhead, it’s not a black box. The best results still depend on domain expertise, particularly in how the relational graph is constructed and how the model explores it.

Data scientists and subject-matter experts define the network of learning: which entities are included, which relationships connect them, and how far information should travel. This design shapes what the model can see and what it can infer.

Kumo’s Model Planner exposes the same level of configurability found in open-source frameworks like PyTorch Geometric (github.com/pyg-team/pytorch_geometric), which Kumo’s researchers helped author. Every layer, aggregation, and sampling step can be adjusted with defaults that align to proven research practices.

For example, the num_neighbors parameter controls neighborhood sampling: how many connected nodes each entity attends to during training. Increasing it broadens context, allowing the model to capture long-range dependencies; constraining it enforces locality and reduces noise.

python
model_plan = pquery.suggest_model_plan()

model_plan.override(
    neighbor_sampling=dict(
        num_neighbors=[
            {"hop": 1, "default": 10},
            {"hop": 2, "default": 5},
            {"hop": 3, "default": 2},
        ]
    ),
    model_architecture=dict(
        graph_transformer=dict(
            channels=[256],
            num_heads=[8],
            num_layers=[3],
        )
    ),
)
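
With the sampling fan-out above, context grows multiplicatively per hop: each entity draws at most 10 first-hop neighbors, 10 × 5 = 50 nodes at hop two, and 10 × 5 × 2 = 100 at hop three, roughly 160 sampled nodes per prediction rather than the full graph.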

Neighborhood size, message-passing depth, and attention heads together control the effective “field of view” for each node, a powerful lever for domain experts who know which connections carry real signal.

Other options include:

  • Edge-type weighting, letting certain relationships (e.g., ligand–ligand vs. assay–ligand) carry more influence.
  • Custom aggregation functions, such as mean, sum, or attention-based pooling (see the sketch after this list).
  • Feature selection and masking, useful when some columns represent experimental noise.
  • Embedding sharing and reuse across tasks, allowing one trained graph to seed another.
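
For intuition, the first two options map onto familiar constructs in PyTorch Geometric. The sketch below is illustrative (hypothetical relation names and dimensions, not Kumo's model-plan syntax): each relation type gets its own convolution and aggregation function.

python
import torch
from torch_geometric.nn import HeteroConv, SAGEConv

# Per-relation convolutions let different edge types carry different
# influence; aggr picks the aggregation function per relation, and the
# outer aggr controls how per-relation messages combine at each node.
conv = HeteroConv({
    ("ligand", "similar_to", "ligand"): SAGEConv((-1, -1), 64, aggr="mean"),
    ("ligand", "binds", "protein"):     SAGEConv((-1, -1), 64, aggr="sum"),
}, aggr="sum")

x_dict = {
    "ligand": torch.randn(5, 16),
    "protein": torch.randn(3, 16),
}
edge_index_dict = {
    ("ligand", "similar_to", "ligand"): torch.tensor([[0, 1], [1, 2]]),
    ("ligand", "binds", "protein"):     torch.tensor([[0, 2, 4], [0, 1, 2]]),
}
out = conv(x_dict, edge_index_dict)
print({k: v.shape for k, v in out.items()})  # ligand: [5, 64], protein: [3, 64]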

This interface is what makes declarative relational modeling practical. It removes repetitive work without removing the role of expertise.

Looking Ahead

Graph Transformers represent more than another incremental model improvement. They reflect a broader rethinking of how predictive systems understand the world, as networks of influence rather than lists of records. In short, machine learning is starting to model the world as it actually is: connected.
