A random walk on a graph is a sequence of nodes generated by starting at a node and repeatedly moving to a uniformly random neighbor, capturing the local and semi-local structure of the graph through the statistical properties of the walk. Random walks are foundational to graph ML. They power node embedding methods (DeepWalk, node2vec), subgraph sampling strategies (GraphSAINT), positional encodings for GNNs, and classical algorithms like PageRank. Despite their simplicity, random walks encode rich structural information that more complex methods build upon.
Why it matters for enterprise data
Random walks on enterprise relational graphs traverse the foreign-key structure naturally. A walk starting at a customer node might follow: customer → order → product → order → another customer. This path captures the relationship between two customers who bought the same product, exactly the collaborative filtering signal that powers recommendations.
The power of random walks is that they explore multi-hop paths automatically, without explicit path enumeration. Because each logical relationship in a relational graph typically passes through an intermediate row (customer → order → product is two edges), a 10-step walk covers roughly five entity-level relationships, discovering connections that would require deeply nested SQL joins to enumerate manually.
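The multi-hop traversal described above can be sketched in a few lines of plain Python; the toy graph and node IDs here are invented for illustration:

```python
import random

# Toy relational graph as an adjacency list (hypothetical IDs):
# customers c1, c2; orders o1, o2; products p1, p2
graph = {
    "c1": ["o1"], "c2": ["o2"],
    "o1": ["c1", "p1", "p2"], "o2": ["c2", "p1"],
    "p1": ["o1", "o2"], "p2": ["o1"],
}

def uniform_random_walk(graph, start, walk_length, rng=random):
    """Start at `start` and repeatedly hop to a uniformly random neighbor."""
    walk = [start]
    for _ in range(walk_length):
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

random.seed(0)
walk = uniform_random_walk(graph, "c1", 10)
print(walk)  # e.g. ['c1', 'o1', ...] -- alternates entity types as it wanders
```

Each step crosses one foreign-key edge, so the walk strings together customer, order, and product rows without any join being written.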
How random walks work
import torch
from torch_cluster import random_walk  # random_walk lives in torch_cluster, which PyG uses internally

# `data` is a torch_geometric.data.Data object; data.edge_index has shape [2, num_edges]
# Generate random walks starting from nodes 0, 1, 2
start_nodes = torch.tensor([0, 1, 2])
walk_length = 10

# Each row is a walk: [start, step1, step2, ..., step10]
walks = random_walk(
    data.edge_index[0], data.edge_index[1],
    start=start_nodes,
    walk_length=walk_length,
)

# walks.shape = [3, 11] (3 walks, 11 nodes each including the start)
# Example walk from customer node 0:
# [0, 42, 108, 42, 0, 15, 203, 15, 0, 42, 108]
# Node 0 (customer) -> Node 42 (order) -> Node 108 (product)
# -> back to 42 -> back to 0 -> ...
# The walk naturally traverses relational structure
Random walks traverse the relational graph automatically. A walk from a customer visits its orders and products, revealing purchasing patterns.
Concrete example: product co-occurrence from walks
Consider 100,000 random walks of length 20 starting from customer nodes in an e-commerce graph:
- Products that frequently co-occur within the same walk are related (co-purchased, same category, complementary)
- Customers that co-occur within the same walk (connected through shared products) are likely similar
- The co-occurrence frequency is a measure of relatedness: products appearing in the same walk 500 times are more related than those appearing 5 times
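The co-occurrence counting described above can be sketched directly. The graph below is a small invented stand-in (direct customer-product edges in place of the customer → order → product chain), not real e-commerce data:

```python
import random
from collections import Counter
from itertools import combinations

# Toy graph: customers <-> products, standing in for the full relational chain
graph = {
    "c1": ["p1", "p2"], "c2": ["p1", "p3"], "c3": ["p3"],
    "p1": ["c1", "c2"], "p2": ["c1"], "p3": ["c2", "c3"],
}

def walk_from(node, length, rng):
    walk = [node]
    for _ in range(length):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

rng = random.Random(42)
cooccur = Counter()
for _ in range(1000):  # many short walks from each customer node
    for start in ("c1", "c2", "c3"):
        walk = walk_from(start, 6, rng)
        products = {n for n in walk if n.startswith("p")}
        for a, b in combinations(sorted(products), 2):
            cooccur[(a, b)] += 1  # products seen in the same walk

# Products reachable through shared customers co-occur most often
print(cooccur.most_common(3))
```

Products linked by a common customer (p1 and p2 via c1) rack up far higher counts than products that can only meet through a long detour (p2 and p3), which is exactly the relatedness signal the bullets describe.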
This is the foundation of DeepWalk and node2vec: treat walks as sentences, nodes as words, and train a Word2Vec-style model on the co-occurrence patterns. The result is node embeddings where similar nodes are close in embedding space.
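A minimal skip-gram with negative sampling over walks looks like the following. This is an illustrative sketch: the random "walks" tensor and all hyperparameters are fabricated placeholders, and real pipelines typically use gensim's Word2Vec or torch_geometric.nn.Node2Vec instead:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_nodes, dim, window = 50, 16, 2
# Placeholder for real walks: a [num_walks, walk_len] tensor of node IDs
walks = torch.randint(num_nodes, (200, 11))

emb = torch.nn.Embedding(num_nodes, dim)  # "word" (node) vectors
ctx = torch.nn.Embedding(num_nodes, dim)  # "context" vectors
opt = torch.optim.Adam(list(emb.parameters()) + list(ctx.parameters()), lr=0.01)

# Build (center, context) pairs: nodes within `window` positions of each other,
# exactly as Word2Vec pairs words within a sentence window
centers, contexts = [], []
L = walks.size(1)
for i in range(L):
    for j in range(max(0, i - window), min(L, i + window + 1)):
        if i != j:
            centers.append(walks[:, i])
            contexts.append(walks[:, j])
centers = torch.cat(centers)
contexts = torch.cat(contexts)

for _ in range(5):  # skip-gram with negative sampling
    negs = torch.randint(num_nodes, contexts.shape)  # random negative nodes
    pos_score = (emb(centers) * ctx(contexts)).sum(-1)
    neg_score = (emb(centers) * ctx(negs)).sum(-1)
    loss = F.binary_cross_entropy_with_logits(pos_score, torch.ones_like(pos_score)) \
         + F.binary_cross_entropy_with_logits(neg_score, torch.zeros_like(neg_score))
    opt.zero_grad(); loss.backward(); opt.step()

embeddings = emb.weight.detach()  # [num_nodes, dim] node embeddings
```

Training pushes co-occurring (center, context) pairs together and random negatives apart, so nodes that share walk neighborhoods end up close in embedding space.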
Types of random walks
- Uniform: Each neighbor has equal transition probability. Simple, captures general structure.
- Biased (node2vec): Transition probabilities depend on the previous node. The return parameter p and in-out parameter q bias walks toward BFS-like (local) or DFS-like (exploratory) behavior.
- Weighted: Transition probabilities proportional to edge weights. A transaction edge with amount $10,000 gets higher probability than one with $10.
- Temporal: Only follow edges with timestamps after the current step's time. Respects temporal ordering in enterprise data.
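The node2vec bias can be made concrete: given the previous node t and current node v, each neighbor x of v is weighted 1/p if x = t (return), 1 if x is also a neighbor of t (stay close), and 1/q otherwise (move away). A small sketch on an invented graph:

```python
import random

graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def node2vec_step(prev, curr, p, q, rng=random):
    """Pick the next node with node2vec's second-order bias."""
    weights = []
    for x in graph[curr]:
        if x == prev:              # returning to the previous node
            weights.append(1.0 / p)
        elif prev in graph[x]:     # x is also a neighbor of prev (distance 1)
            weights.append(1.0)
        else:                      # moving farther from prev (distance 2)
            weights.append(1.0 / q)
    return rng.choices(graph[curr], weights=weights, k=1)[0]

# High p discourages backtracking; low q encourages outward (DFS-like) moves
random.seed(1)
walk = ["a", "b"]
for _ in range(8):
    walk.append(node2vec_step(walk[-2], walk[-1], p=4.0, q=0.5))
print(walk)
```

Setting p=4.0 and q=0.5 here makes the walk wander outward; swapping to p=0.5, q=4.0 would keep it oscillating near its start.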
Limitations and what comes next
- No feature utilization: Standard random walks only use graph topology. Node features are ignored during walk generation. GNN-based approaches incorporate features directly in message passing.
- Transductive: Walk-based embeddings (DeepWalk, node2vec) exist only for nodes present during training; new nodes require re-running walks and re-training.
- Walk length vs. coverage trade-off: Short walks capture local structure but miss long-range patterns. Long walks capture more but become noisy as they wander far from the start node.