What is a social graph?

A social graph represents users as nodes and their relationships as edges. Relationships can be symmetric (friendship, mutual follow) or directed (follow, message). Node features include demographics, activity metrics, and content preferences. Edge features include interaction frequency, recency, and type.

What can GNNs predict on social graphs?

Friend/follow recommendations (link prediction), community detection (clustering), content virality (will this post go viral based on the poster's graph position?), user influence scoring, fake account detection, and churn prediction (users whose friends left are more likely to leave).

How large are production social graphs?

Very large. Facebook's social graph has billions of nodes and hundreds of billions of edges. Processing requires distributed GNN training, neighbor sampling, and mini-batch processing. Production systems use GraphSAGE-style sampling to make GNN training and inference tractable at this scale.

Social Graphs: Users as Nodes, Relationships as Edges | Kumo.ai

A social graph represents the users of a social platform as nodes and their relationships as edges. Friendships, follows, messages, likes, shares, and group memberships are all edges connecting user nodes. GNNs on social graphs learn that a user's behavior, preferences, and outcomes are heavily influenced by their network neighborhood, enabling predictions that non-graph methods cannot make.
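To make the node-and-edge picture concrete, here is a minimal sketch of a social graph as an in-memory data structure. The `SocialGraph` class, the user names, and the `posts_per_week` feature are all hypothetical illustrations, not part of any particular library; production systems store graphs in far more compact, distributed formats.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """Toy social graph: users are nodes, typed relationships are edges."""
    # out_edges[user] -> list of (neighbor, edge_type) pairs
    out_edges: dict = field(default_factory=lambda: defaultdict(list))
    # features[user] -> dict of node features (hypothetical names)
    features: dict = field(default_factory=dict)

    def add_user(self, user, **features):
        self.features[user] = features
        self.out_edges.setdefault(user, [])  # isolated users still appear

    def add_edge(self, src, dst, edge_type, symmetric=False):
        self.out_edges[src].append((dst, edge_type))
        if symmetric:
            # Friendship-style edges are stored in both directions.
            self.out_edges[dst].append((src, edge_type))

    def neighbors(self, user, edge_type=None):
        """Outgoing neighbors, optionally filtered by relationship type."""
        return [n for n, t in self.out_edges[user]
                if edge_type is None or t == edge_type]


g = SocialGraph()
for name in ("ana", "ben", "cy"):
    g.add_user(name, posts_per_week=3)
g.add_edge("ana", "ben", "friend", symmetric=True)  # symmetric relationship
g.add_edge("cy", "ana", "follow")                   # directed: cy follows ana

print(g.neighbors("ana"))           # ana's outgoing edges of any type
print(g.neighbors("cy", "follow"))  # only cy's follow edges
```

Storing symmetric relationships as two directed edges keeps a single code path for neighbor lookups, at the cost of doubling the storage for those edges.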

Structure of social graphs

Social graphs have distinctive structural properties:

Power-law degree distribution: Most users have few connections, while a few have millions (influencers, public figures). This creates hubs that dominate message passing.
High clustering: Friends of friends are likely to be friends themselves (triadic closure). Social graphs have clustering coefficients 100-1000x higher than random graphs.
Community structure: Dense clusters of interconnected users correspond to friend groups, schools, workplaces, and interest communities.
Small world: Despite billions of nodes, the average shortest path is 4-6 hops (six degrees of separation).
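Two of these properties, the skewed degree distribution and the short average path length, can be measured directly. The sketch below does so on a small hand-made undirected graph; the edge list and node names are hypothetical, and the all-pairs breadth-first search used here is only practical for toy graphs, not for graphs with billions of nodes.

```python
from collections import Counter, deque

# Hypothetical toy graph: one hub plus a few small friend groups.
edges = [
    ("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("hub", "e"),
    ("a", "b"), ("d", "e"), ("e", "f"),
]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def degree_histogram(adj):
    """Map degree -> number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_shortest_path(adj):
    """Mean hop distance over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


print(degree_histogram(adj))       # one node of degree 5, the rest degree 1-3
print(average_shortest_path(adj))  # under 2 hops on this toy graph
```

Even in this seven-node example the hub shortens most paths, which is the same mechanism that keeps path lengths small in much larger graphs.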

GNN applications on social graphs

Friend/follow recommendation

The most direct application: link prediction. Given two users who are not yet connected, predict the probability that they will connect. GNNs encode each user based on their network neighborhood, then score the similarity of candidate pairs. This outperforms “people you may know” heuristics based on mutual friends alone.
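The encode-then-score pattern can be sketched without any training. In the example below, two rounds of neighbor averaging stand in for a GNN encoder and cosine similarity stands in for a learned scoring function; both are simplifying assumptions, and the users and feature vectors are hypothetical. A real system would learn the aggregation weights from observed links.

```python
import math

# Hypothetical toy data: undirected friendships and 2-d user features.
adj = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy"},
    "cy": {"ana", "ben", "dee"},
    "dee": {"cy", "eli"},
    "eli": {"dee"},
}
features = {
    "ana": [1.0, 0.0],
    "ben": [0.9, 0.1],
    "cy": [0.8, 0.2],
    "dee": [0.1, 0.9],
    "eli": [0.0, 1.0],
}


def aggregate(adj, h):
    """One message-passing round: average a node's vector with its neighbors'."""
    out = {}
    for node, nbrs in adj.items():
        group = [h[node]] + [h[n] for n in sorted(nbrs)]
        dim = len(h[node])
        out[node] = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    return out


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_candidates(adj, h, user):
    """Rank users not yet connected to `user` by embedding similarity."""
    candidates = [v for v in adj if v != user and v not in adj[user]]
    return sorted(((cosine(h[user], h[v]), v) for v in candidates), reverse=True)


# Two rounds of aggregation give each user a view of their 2-hop neighborhood.
embeddings = aggregate(adj, aggregate(adj, features))
ranking = score_candidates(adj, embeddings, "ana")
for score, candidate in ranking:
    print(f"{candidate}: {score:.3f}")
```

Here `dee` ranks above `eli` for `ana`, because `dee` shares a neighbor with her and so absorbs more of her neighborhood's features during aggregation.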

Community detection

Identifying clusters of tightly connected users: friend groups, interest communities, or organizational units. GNN representations naturally cluster similar users, and applying k-means or spectral clustering to GNN embeddings produces high-quality community assignments.
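As an illustration of the clustering step, the sketch below runs a plain k-means (Lloyd's algorithm) over a handful of 2-d vectors. The vectors are hypothetical stand-ins for GNN embeddings, and the initialization (the first k users in sorted order) is a deliberate simplification to keep the example deterministic; real pipelines typically use a library implementation with a more robust initialization.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on a dict of name -> vector; returns name -> cluster id."""
    names = sorted(points)
    # Simplified deterministic initialization: first k points in sorted order.
    centers = [list(points[n]) for n in names[:k]]
    assign = {}
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_assign = {}
        for n in names:
            dists = [sum((a - b) ** 2 for a, b in zip(points[n], c)) for c in centers]
            new_assign[n] = dists.index(min(dists))
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [points[n] for n in names if assign[n] == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical embeddings, as if produced by a trained GNN encoder.
embeddings = {
    "ana": [0.90, 0.10], "cy": [0.80, 0.20], "eli": [0.85, 0.15],
    "ben": [0.10, 0.90], "dee": [0.20, 0.80], "fay": [0.15, 0.85],
}
labels = kmeans(embeddings, k=2)
print(labels)
```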

Fake account detection

Fake accounts have distinctive graph signatures: they connect to many real users (to appear legitimate), but the users they connect to are rarely connected to one another, because the fake account lacks organic social context. GNNs learn these structural anomalies, detecting fakes that pass feature-based checks.
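The signature described here (many connections, few links among the connected users) corresponds to a high degree combined with a low local clustering coefficient. The sketch below computes that signal by hand on a hypothetical toy graph. It is a fixed heuristic with arbitrary thresholds, shown only to illustrate the kind of structural pattern a GNN could learn from labeled data; it is not a GNN and not a production detector.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = sorted(adj[node])
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(nbrs, 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)


def flag_suspicious(adj, min_degree=3, max_clustering=0.1):
    """Nodes with many neighbors whose neighbors rarely know each other.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    return sorted(
        node for node, nbrs in adj.items()
        if len(nbrs) >= min_degree and local_clustering(adj, node) <= max_clustering
    )


# Hypothetical graph: two organic friend groups plus one account ("bot")
# that links to unrelated users across the graph.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("b", "d"),  # group 1
    ("e", "f"), ("f", "g"), ("e", "g"),                          # group 2
    ("bot", "a"), ("bot", "e"), ("bot", "h"), ("bot", "i"),
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(local_clustering(adj, "b"))    # organic user: most friends know each other
print(local_clustering(adj, "bot"))  # none of the bot's contacts are connected
print(flag_suspicious(adj))
```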

Content virality prediction

Will a post go viral? The content matters, but the poster's position in the social graph matters more. A post from a user bridging two large communities has higher viral potential than the same post from a user deep within a single cluster. GNNs learn these structural predictors.
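One simple way to express "bridging two communities" as a feature is to count how a user's neighbors are spread across communities. The sketch below assumes community labels are already available (for example, from a clustering step); the labels, user names, and the `min_per_side` threshold are hypothetical. A GNN would learn richer positional signals than this hand-written check.

```python
from collections import Counter


def community_spread(adj, community, user):
    """Count how many of a user's neighbors fall in each community."""
    return Counter(community[n] for n in adj[user])


def bridges_communities(adj, community, user, min_per_side=2):
    """True if the user has at least `min_per_side` neighbors in 2+ communities."""
    spread = community_spread(adj, community, user)
    return sum(1 for count in spread.values() if count >= min_per_side) >= 2


# Hypothetical neighbor lists and community labels.
adj = {
    "bridge": {"a1", "a2", "b1", "b2"},   # neighbors in both communities
    "insider": {"a1", "a2", "a3"},        # neighbors in one community only
}
community = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}

print(community_spread(adj, community, "bridge"))
print(bridges_communities(adj, community, "bridge"))
print(bridges_communities(adj, community, "insider"))
```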

Scale challenges

Production social graphs are among the largest graphs in existence:

Facebook: 3B+ user nodes, 200B+ edges
Twitter/X: 500M+ user nodes, 100B+ follow edges
LinkedIn: 1B+ member nodes, 20B+ connection edges

Full-graph GNN training is impractical at this scale, because the graph and its intermediate activations do not fit in the memory of a single machine. Production systems use: (1) neighbor sampling (sample 10-25 neighbors per hop), (2) mini-batch training (process subgraphs), and (3) distributed training across hundreds of GPUs.
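Neighbor sampling is what keeps a mini-batch bounded even when a seed user is a hub with millions of connections. The following is a minimal sketch of GraphSAGE-style sampling in plain Python, assuming an in-memory adjacency dict; the function name, the toy graph, and the fanout values are illustrative, and production systems would use a graph library's sampler over distributed storage instead.

```python
import random


def sample_neighborhood(adj, seeds, fanouts, rng):
    """Keep at most fanouts[i] neighbors per node at hop i.

    Returns one list of (src, dst) edges per hop, where messages would flow
    from src to dst when aggregating toward the seed nodes.
    """
    frontier = list(seeds)
    layers = []
    for fanout in fanouts:
        edges, next_frontier = [], set()
        for node in frontier:
            nbrs = sorted(adj.get(node, ()))
            # Take every neighbor when there are few, otherwise a random subset.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for nbr in picked:
                edges.append((nbr, node))
                next_frontier.add(nbr)
        layers.append(edges)
        frontier = sorted(next_frontier)
    return layers


# Hypothetical graph: a hub with 1,000 followers arranged in a ring.
adj = {"hub": {f"f{i}" for i in range(1000)}}
for i in range(1000):
    adj[f"f{i}"] = {"hub", f"f{(i + 1) % 1000}"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
layers = sample_neighborhood(adj, seeds=["hub"], fanouts=[10, 5], rng=rng)
print([len(edges) for edges in layers])
```

With fanouts of 10 and 5, the sampled subgraph for one seed holds at most 10 + 10 × 5 = 60 edges, regardless of how many followers the hub has.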

Privacy considerations

Social graph structure alone reveals sensitive information. Research has shown that graph topology can predict political affiliation (85% accuracy), sexual orientation, and health conditions, even without access to user profiles or content. GNN deployments on social data must address:

Differential privacy in GNN training to limit information leakage
Federated learning to keep graph data on user devices
Audit mechanisms to prevent discriminatory predictions based on graph structure

Key Takeaways

1. Social graphs have users as nodes and relationships as edges. They exhibit power-law degrees, high clustering, community structure, and small-world properties that GNNs exploit.
2. Key tasks: friend recommendation (link prediction), community detection, fake account detection, virality prediction, and churn prediction. All benefit significantly from graph structure.
3. The social influence effect: connected users influence each other's behavior. Churn risk increases 5-15% per churned connection. GNNs propagate these signals naturally.
4. Scale is the primary challenge: billions of nodes, hundreds of billions of edges. Neighbor sampling, mini-batching, and distributed training are required for production.
5. Privacy is critical: graph structure reveals sensitive attributes even without features. GNN deployments require differential privacy, federated learning, and bias auditing.
