
Node2Vec: Node Embeddings Learned from Biased Random Walks

Node2Vec generates biased random walks on a graph and trains a skip-gram model to learn node embeddings in which nodes that frequently co-occur in walks end up close in vector space. It is one of the most widely used unsupervised graph embedding methods.

TL;DR

  • Node2Vec learns unsupervised node embeddings by generating biased random walks, then training a Word2Vec-style skip-gram model on the walk sequences.
  • Parameters p and q control walk behavior: low p favors backtracking (BFS-like local walks), low q favors outward exploration (DFS-like walks). p=1, q=0.5 is a common starting point.
  • Transductive: produces fixed embeddings for a fixed node set. New nodes require retraining. For production systems with new entities, use inductive GNNs instead.
  • Best for unsupervised tasks (visualization, clustering, quick baselines) on topology-only graphs. GNNs outperform Node2Vec on supervised tasks with node features.
  • In PyG: the Node2Vec class handles walk generation, skip-gram training, and embedding lookup. Train with model.loader() and retrieve embeddings with model().

Node2Vec is an unsupervised graph embedding method that learns dense vector representations for nodes by performing biased random walks on the graph and training a skip-gram model to predict co-occurring nodes within each walk. Introduced by Grover and Leskovec in 2016, it extends DeepWalk with two hyperparameters (p, q) that control whether walks stay local (capturing structural roles) or explore broadly (capturing community membership). Node2Vec embeddings are useful for clustering, visualization, and as pre-trained features for downstream classifiers.

Why it matters for enterprise data

Node2Vec provides a fast, unsupervised way to extract graph structure into vectors that standard ML models can consume. For enterprise teams without deep GNN expertise, Node2Vec offers a path from relational graph to predictions:

  1. Build graph from relational database (foreign keys = edges)
  2. Run Node2Vec to get embeddings per entity
  3. Feed embeddings as features to XGBoost, LightGBM, or logistic regression
  4. Predict churn, fraud, or recommendations using standard ML pipelines

This “graph feature engineering” approach captures relational patterns without requiring a full GNN pipeline. It is a practical first step for teams exploring graph ML.
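Step 1 of this recipe can be sketched directly; the foreign-key pairs below are a hypothetical stand-in for rows pulled from a transactions table:

```python
import torch

# Hypothetical foreign-key pairs, e.g. (customer_id, merchant_id)
# rows selected from a transactions table
fk_pairs = [(0, 3), (1, 3), (1, 4), (2, 4)]

# Foreign keys become edges in COO format: shape [2, num_edges]
edge_index = torch.tensor(fk_pairs, dtype=torch.long).t().contiguous()

# Add reverse edges so the graph is undirected
edge_index = torch.cat([edge_index, edge_index.flip(0)], dim=1)
```

An edge_index tensor like this is exactly what the Node2Vec constructor consumes in step 2.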

How Node2Vec works

Step 1: Generate biased random walks

Starting from every node, generate multiple walks of fixed length (e.g., 20 steps). The bias parameters p and q determine transition probabilities:

  • p (return parameter): Controls probability of returning to the previous node. Low p = more likely to backtrack = BFS-like local exploration.
  • q (in-out parameter): Controls probability of moving to nodes farther from the previous node. Low q = more likely to explore outward = DFS-like exploration.
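The transition bias above can be sketched as a single walk step; the adjacency-dict representation and function name are illustrative, not PyG API:

```python
import random

def biased_step(graph, prev, curr, p, q):
    """One Node2Vec walk step from curr, biased by where prev is.
    graph: adjacency dict mapping node -> set of neighbors."""
    neighbors = list(graph[curr])
    weights = []
    for x in neighbors:
        if x == prev:             # return to the previous node: weight 1/p
            weights.append(1.0 / p)
        elif x in graph[prev]:    # stay near the previous node: weight 1
            weights.append(1.0)
        else:                     # move outward: weight 1/q
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```

Low p inflates the return weight; low q inflates the outward weight, matching the BFS/DFS intuition above.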

Step 2: Train skip-gram model

Treat each walk as a “sentence” and each node as a “word.” Train a skip-gram model to predict context nodes (within a window) from a center node. Nodes that frequently co-occur in walks get similar embeddings.
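Concretely, the windowing can be sketched as a small helper (PyG does this internally; the function name here is illustrative):

```python
def context_pairs(walk, window):
    """Turn one walk into (center, context) skip-gram training pairs,
    pairing each node with every other node inside the window."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

context_pairs([4, 7, 2], window=1)
# → [(4, 7), (7, 4), (7, 2), (2, 7)]
```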

node2vec_pyg.py
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import Node2Vec

# Load an example graph (any PyG Data object with edge_index works)
data = Planetoid(root='/tmp/Cora', name='Cora')[0]

# Initialize Node2Vec (walk sampling requires the torch_cluster package)
model = Node2Vec(
    data.edge_index,
    embedding_dim=128,       # 128-dim embeddings
    walk_length=20,          # 20-step walks
    context_size=10,         # skip-gram window
    walks_per_node=10,       # 10 walks per node
    num_negative_samples=1,  # 1 negative per positive
    p=1.0,                   # return parameter
    q=0.5,                   # in-out parameter
    num_nodes=data.num_nodes,
)

# Training
loader = model.loader(batch_size=128, shuffle=True, num_workers=4)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
for epoch in range(100):
    for pos_rw, neg_rw in loader:
        optimizer.zero_grad()
        loss = model.loss(pos_rw, neg_rw)
        loss.backward()
        optimizer.step()

# Get embeddings for all nodes
model.eval()
with torch.no_grad():
    embeddings = model()  # shape: [num_nodes, 128]

# Use embeddings for downstream tasks
from sklearn.cluster import KMeans
clusters = KMeans(n_clusters=10).fit_predict(embeddings.cpu().numpy())

PyG's Node2Vec handles walk generation and skip-gram training. The resulting embeddings can be clustered, visualized, or fed to any standard ML model.
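As a sketch of one such downstream use, cosine similarity over the embedding table finds a node's nearest neighbors. Random embeddings stand in for trained ones here:

```python
import torch
import torch.nn.functional as F

# Stand-in for the trained model() output
embeddings = torch.randn(1000, 128)

emb = F.normalize(embeddings, dim=1)  # unit-length rows
sims = emb @ emb.t()                  # pairwise cosine similarities
# 5 most similar nodes to node 0 (topk(6) because node 0 matches itself)
top5 = sims[0].topk(6).indices[1:]
```

In a recommendation or fraud setting, these neighbors are the entities whose graph context most resembles the query node's.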

Concrete example: customer similarity from transaction graphs

A bank runs Node2Vec on a customer-merchant bipartite graph:

  • 500,000 customer nodes, 50,000 merchant nodes
  • 10M transaction edges
  • p=1, q=0.5 (moderate outward exploration)
  • 128-dimensional embeddings, 20-step walks, 10 walks per node

After training, customers with similar spending patterns (same merchants, same categories) have similar embeddings. K-means clustering on embeddings reveals 15 natural customer segments: luxury shoppers, grocery-focused, travel-heavy, etc. These segments emerge from transaction topology alone, without any customer features or labels.
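One practical detail in a setup like this: Node2Vec expects a single node ID space, so merchant IDs must be offset past the customer IDs. A toy-scale sketch with made-up transaction pairs:

```python
import torch

num_customers = 5   # toy scale; the example above uses 500,000
# Made-up transactions as (customer_id, merchant_id) pairs
txns = [(0, 0), (1, 0), (1, 2), (4, 1)]

# Merchants get IDs num_customers, num_customers+1, ... so both
# node sets share one ID space in a single bipartite graph
src = torch.tensor([c for c, _ in txns])
dst = torch.tensor([m + num_customers for _, m in txns])
edge_index = torch.stack([torch.cat([src, dst]), torch.cat([dst, src])])
```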

Limitations and what comes next

  1. Transductive only: Fixed embedding table. New customers get no embedding until the entire model is retrained. Production systems require inductive approaches.
  2. No feature usage: Ignores node attributes. A customer's age, income, and account type are invisible to Node2Vec. GNNs incorporate these features directly.
  3. p, q sensitivity: Optimal p and q values vary by task and graph. Grid search over p, q is necessary, adding to development time.

For supervised enterprise tasks, GNNs (GCNConv, GATConv, SAGEConv) outperform Node2Vec because they use node features and are optimized end-to-end for the prediction target. KumoRFM takes this further with a pre-trained graph transformer that generalizes across relational databases.

Frequently asked questions

What is Node2Vec?

Node2Vec is an unsupervised graph embedding method that learns dense vector representations for nodes. It generates biased random walks on the graph, then trains a skip-gram model (similar to Word2Vec) to predict which nodes co-occur in walks. The result is an embedding where nodes with similar structural roles or community membership are close together. Introduced by Grover and Leskovec in 2016.

What do the p and q parameters control?

Parameter p controls the likelihood of returning to the previous node (backtracking). Low p encourages revisiting, producing BFS-like walks that stay local. Parameter q controls the likelihood of exploring outward to new neighborhoods. Low q encourages exploration, producing DFS-like walks that venture further. p=1, q=1 gives uniform random walks. Common starting point: p=1, q=0.5 for community-focused embeddings.

When should I use Node2Vec vs. a GNN?

Use Node2Vec when: you have no node features (topology only), you want unsupervised embeddings for exploration, or you need a quick baseline. Use GNNs when: you have node features, you have labeled training data, you need inductive learning (new nodes at inference), or you need state-of-the-art accuracy. GNNs generally outperform Node2Vec on supervised enterprise tasks.

Is Node2Vec inductive or transductive?

Node2Vec is transductive. It learns a fixed embedding table with one row per node. New nodes that appear after training have no embedding. You must retrain on the full graph to include new nodes. For production systems where new entities arrive continuously, inductive GNN methods (GraphSAGE) are preferred.

How does Node2Vec scale to large enterprise graphs?

Node2Vec scales well because random walk generation is embarrassingly parallel and the skip-gram model trains with negative sampling (O(1) per training sample). It can handle graphs with millions of nodes. The main bottleneck is storing the embedding table: 1M nodes x 128 dimensions x 4 bytes = 512 MB. For 100M+ node graphs, approximate methods or GNN alternatives are needed.
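The memory estimate above follows from simple arithmetic:

```python
def table_mb(num_nodes, dim, bytes_per_float=4):
    """Size of a dense float32 embedding table in megabytes."""
    return num_nodes * dim * bytes_per_float / 1e6

table_mb(1_000_000, 128)   # → 512.0
```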

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.