
Node2Vec: Node Embeddings Learned from Biased Random Walks

Node2Vec generates biased random walks on a graph and trains a skip-gram model to learn node embeddings in which nodes that frequently co-occur in walks end up close in vector space. It is one of the most widely used unsupervised graph embedding methods.

TL;DR

  • Node2Vec learns unsupervised node embeddings by generating biased random walks, then training a Word2Vec-style skip-gram model on the walk sequences.
  • Parameters p and q control walk behavior: low p favors backtracking (BFS-like local walks), low q favors outward exploration (DFS-like walks). p=1, q=0.5 is a common starting point.
  • Transductive: produces fixed embeddings for a fixed node set. New nodes require retraining. For production systems with new entities, use inductive GNNs instead.
  • Best for unsupervised tasks (visualization, clustering, quick baselines) on topology-only graphs. GNNs outperform Node2Vec on supervised tasks with node features.
  • In PyG: the Node2Vec class handles walk generation, skip-gram training, and embedding lookup. Train with model.loader() and retrieve embeddings with model().

Node2Vec is an unsupervised graph embedding method that learns dense vector representations for nodes by performing biased random walks on the graph and training a skip-gram model to predict co-occurring nodes within each walk. Introduced by Grover and Leskovec in 2016, it extends DeepWalk with two hyperparameters (p, q) that control whether walks stay local (capturing structural roles) or explore broadly (capturing community membership). Node2Vec embeddings are useful for clustering, visualization, and as pre-trained features for downstream classifiers.

Why it matters for enterprise data

Node2Vec provides a fast, unsupervised way to extract graph structure into vectors that standard ML models can consume. For enterprise teams without deep GNN expertise, Node2Vec offers a path from relational graph to predictions:

  1. Build graph from relational database (foreign keys = edges)
  2. Run Node2Vec to get embeddings per entity
  3. Feed embeddings as features to XGBoost, LightGBM, or logistic regression
  4. Predict churn, fraud, or recommendations using standard ML pipelines

This “graph feature engineering” approach captures relational patterns without requiring a full GNN pipeline. It is a practical first step for teams exploring graph ML.
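Step 1 of this recipe can be sketched directly; the foreign-key pairs below are a hypothetical stand-in for rows pulled from a transactions table:

```python
import torch

# Hypothetical foreign-key pairs, e.g. (customer_id, merchant_id)
# rows selected from a transactions table
fk_pairs = [(0, 3), (1, 3), (1, 4), (2, 4)]

# Foreign keys become edges in COO format: shape [2, num_edges]
edge_index = torch.tensor(fk_pairs, dtype=torch.long).t().contiguous()

# Add reverse edges so the graph is undirected
edge_index = torch.cat([edge_index, edge_index.flip(0)], dim=1)
```

An edge_index tensor like this is exactly what the Node2Vec constructor consumes in step 2.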

How Node2Vec works

Step 1: Generate biased random walks

Starting from every node, generate multiple walks of fixed length (e.g., 20 steps). The bias parameters p and q determine transition probabilities:

  • p (return parameter): Controls probability of returning to the previous node. Low p = more likely to backtrack = BFS-like local exploration.
  • q (in-out parameter): Controls probability of moving to nodes farther from the previous node. Low q = more likely to explore outward = DFS-like exploration.
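The transition bias above can be sketched as a single walk step; the adjacency-dict representation and function name are illustrative, not PyG API:

```python
import random

def biased_step(graph, prev, curr, p, q):
    """One Node2Vec walk step from curr, biased by where prev is.
    graph: adjacency dict mapping node -> set of neighbors."""
    neighbors = list(graph[curr])
    weights = []
    for x in neighbors:
        if x == prev:             # return to the previous node: weight 1/p
            weights.append(1.0 / p)
        elif x in graph[prev]:    # stay near the previous node: weight 1
            weights.append(1.0)
        else:                     # move outward: weight 1/q
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```

Low p inflates the return weight; low q inflates the outward weight, matching the BFS/DFS intuition above.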

Step 2: Train skip-gram model

Treat each walk as a “sentence” and each node as a “word.” Train a skip-gram model to predict context nodes (within a window) from a center node. Nodes that frequently co-occur in walks get similar embeddings.
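Concretely, the windowing can be sketched as a small helper (PyG does this internally; the function name here is illustrative):

```python
def context_pairs(walk, window):
    """Turn one walk into (center, context) skip-gram training pairs,
    pairing each node with every other node inside the window."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

context_pairs([4, 7, 2], window=1)
# → [(4, 7), (7, 4), (7, 2), (2, 7)]
```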

node2vec_pyg.py
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import Node2Vec

# Load an example graph (any PyG Data object with edge_index works)
data = Planetoid(root='/tmp/Cora', name='Cora')[0]

# Initialize Node2Vec (walk sampling requires the torch_cluster package)
model = Node2Vec(
    data.edge_index,
    embedding_dim=128,       # 128-dim embeddings
    walk_length=20,          # 20-step walks
    context_size=10,         # skip-gram window
    walks_per_node=10,       # 10 walks per node
    num_negative_samples=1,  # 1 negative per positive
    p=1.0,                   # return parameter
    q=0.5,                   # in-out parameter
    num_nodes=data.num_nodes,
)

# Training
loader = model.loader(batch_size=128, shuffle=True, num_workers=4)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
for epoch in range(100):
    for pos_rw, neg_rw in loader:
        optimizer.zero_grad()
        loss = model.loss(pos_rw, neg_rw)
        loss.backward()
        optimizer.step()

# Get embeddings for all nodes
model.eval()
with torch.no_grad():
    embeddings = model()  # shape: [num_nodes, 128]

# Use embeddings for downstream tasks
from sklearn.cluster import KMeans
clusters = KMeans(n_clusters=10).fit_predict(embeddings.cpu().numpy())

PyG's Node2Vec handles walk generation and skip-gram training. The resulting embeddings can be clustered, visualized, or fed to any standard ML model.
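As a sketch of one such downstream use, cosine similarity over the embedding table finds a node's nearest neighbors. Random embeddings stand in for trained ones here:

```python
import torch
import torch.nn.functional as F

# Stand-in for the trained model() output
embeddings = torch.randn(1000, 128)

emb = F.normalize(embeddings, dim=1)  # unit-length rows
sims = emb @ emb.t()                  # pairwise cosine similarities
# 5 most similar nodes to node 0 (topk(6) because node 0 matches itself)
top5 = sims[0].topk(6).indices[1:]
```

In a recommendation or fraud setting, these neighbors are the entities whose graph context most resembles the query node's.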

Concrete example: customer similarity from transaction graphs

A bank runs Node2Vec on a customer-merchant bipartite graph:

  • 500,000 customer nodes, 50,000 merchant nodes
  • 10M transaction edges
  • p=1, q=0.5 (moderate outward exploration)
  • 128-dimensional embeddings, 20-step walks, 10 walks per node

After training, customers with similar spending patterns (same merchants, same categories) have similar embeddings. K-means clustering on embeddings reveals 15 natural customer segments: luxury shoppers, grocery-focused, travel-heavy, etc. These segments emerge from transaction topology alone, without any customer features or labels.
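One practical detail in a setup like this: Node2Vec expects a single node ID space, so merchant IDs must be offset past the customer IDs. A toy-scale sketch with made-up transaction pairs:

```python
import torch

num_customers = 5   # toy scale; the example above uses 500,000
# Made-up transactions as (customer_id, merchant_id) pairs
txns = [(0, 0), (1, 0), (1, 2), (4, 1)]

# Merchants get IDs num_customers, num_customers+1, ... so both
# node sets share one ID space in a single bipartite graph
src = torch.tensor([c for c, _ in txns])
dst = torch.tensor([m + num_customers for _, m in txns])
edge_index = torch.stack([torch.cat([src, dst]), torch.cat([dst, src])])
```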

Limitations and what comes next

  1. Transductive only: Fixed embedding table. New customers get no embedding until the entire model is retrained. Production systems require inductive approaches.
  2. No feature usage: Ignores node attributes. A customer's age, income, and account type are invisible to Node2Vec. GNNs incorporate these features directly.
  3. p, q sensitivity: Optimal p and q values vary by task and graph. Grid search over p, q is necessary, adding to development time.

For supervised enterprise tasks, GNNs (GCNConv, GATConv, SAGEConv) outperform Node2Vec because they use node features and are optimized end-to-end for the prediction target. KumoRFM takes this further with a pre-trained graph transformer that generalizes across relational databases.

Frequently asked questions

What is Node2Vec?

Node2Vec is an unsupervised graph embedding method that learns dense vector representations for nodes. It generates biased random walks on the graph, then trains a skip-gram model (similar to Word2Vec) to predict which nodes co-occur in walks. The result is an embedding where nodes with similar structural roles or community membership are close together. Introduced by Grover and Leskovec in 2016.

What do the p and q parameters control?

Parameter p controls the likelihood of returning to the previous node (backtracking). Low p encourages revisiting, producing BFS-like walks that stay local. Parameter q controls the likelihood of exploring outward to new neighborhoods. Low q encourages exploration, producing DFS-like walks that venture further. p=1, q=1 gives uniform random walks. Common starting point: p=1, q=0.5 for community-focused embeddings.

When should I use Node2Vec vs. a GNN?

Use Node2Vec when: you have no node features (topology only), you want unsupervised embeddings for exploration, or you need a quick baseline. Use GNNs when: you have node features, you have labeled training data, you need inductive learning (new nodes at inference), or you need state-of-the-art accuracy. GNNs generally outperform Node2Vec on supervised enterprise tasks.

Is Node2Vec inductive or transductive?

Node2Vec is transductive. It learns a fixed embedding table with one row per node. New nodes that appear after training have no embedding. You must retrain on the full graph to include new nodes. For production systems where new entities arrive continuously, inductive GNN methods (GraphSAGE) are preferred.

How does Node2Vec scale to large enterprise graphs?

Node2Vec scales well because random walk generation is embarrassingly parallel and the skip-gram model trains with negative sampling (O(1) per training sample). It can handle graphs with millions of nodes. The main bottleneck is storing the embedding table: 1M nodes x 128 dimensions x 4 bytes = 512 MB. For 100M+ node graphs, approximate methods or GNN alternatives are needed.
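The memory estimate above follows from simple arithmetic:

```python
def table_mb(num_nodes, dim, bytes_per_float=4):
    """Size of a dense float32 embedding table in megabytes."""
    return num_nodes * dim * bytes_per_float / 1e6

table_mb(1_000_000, 128)   # → 512.0
```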

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.