A social graph represents users as nodes and their relationships as edges on a social platform. Friendships, follows, messages, likes, shares, and group memberships are all edges connecting user nodes. GNNs on social graphs learn that a user's behavior, preferences, and outcomes are heavily influenced by their network neighborhood, which supports predictions that methods ignoring graph structure tend to miss.
Structure of social graphs
Social graphs have distinctive structural properties:
- Power-law degree distribution: Most users have few connections, a few have millions (influencers, public figures). This creates hubs that dominate message passing.
- High clustering: Friends of friends are likely to be friends (triadic closure). Social graphs have clustering coefficients 100-1000x higher than random graphs with the same number of nodes and edges.
- Community structure: Dense clusters of interconnected users correspond to friend groups, schools, workplaces, and interest communities.
- Small world: Despite billions of nodes, the average shortest path is 4-6 hops (six degrees of separation).
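Two of the properties above, the skewed degree distribution and the short average path length, can be measured directly from an edge list. The following is a minimal sketch on a toy graph, using only the Python standard library; the node names and the helper functions (`build_adjacency`, `degree_histogram`, `average_shortest_path`) are illustrative assumptions, and the all-pairs BFS shown here would not scale to a production graph.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_histogram(adj):
    """Map each degree value to the number of nodes that have it."""
    hist = {}
    for nbrs in adj.values():
        hist[len(nbrs)] = hist.get(len(nbrs), 0) + 1
    return hist


def average_shortest_path(adj):
    """Mean BFS distance over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy graph (hypothetical): one hub, one triangle (hub, a, b),
# and one peripheral chain (hub, d, e).
EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
         ("a", "b"), ("d", "e")]
ADJ = build_adjacency(EDGES)
HIST = degree_histogram(ADJ)        # one node of degree 4, the rest 1 or 2
AVG_PATH = average_shortest_path(ADJ)
```

Even in this six-node example the hub holds the largest degree and keeps every pair of users within a few hops, which is the same pattern that lets hubs dominate message passing at scale.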
GNN applications on social graphs
Friend/follow recommendation
The most direct application: link prediction. Given two users who are not yet connected, predict the probability that they will connect. GNNs encode each user based on their network neighborhood, then score the similarity of candidate pairs. This outperforms “people you may know” heuristics based on mutual friends alone.
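The encode-then-score pattern can be sketched without any learned weights. The example below is an illustration under strong assumptions, not a trained GNN: each user starts with a one-hot identity vector, a single round of mean aggregation stands in for a GNN layer, and cosine similarity scores candidate pairs. The user names and the functions `mean_aggregate`, `cosine`, and `score_pair` are hypothetical.

```python
import math


def mean_aggregate(adj, features):
    """One message-passing round: average a node's vector with its neighbors'."""
    out = {}
    for node, nbrs in adj.items():
        group = [features[node]] + [features[n] for n in sorted(nbrs)]
        dim = len(features[node])
        out[node] = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    return out


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_pair(embeddings, u, v):
    """Score a candidate link by the similarity of the two user embeddings."""
    return cosine(embeddings[u], embeddings[v])


# Toy graph (hypothetical): bob and cat share the friend ann; dan and eve
# form a separate pair.
ADJ = {
    "ann": {"bob", "cat"},
    "bob": {"ann"},
    "cat": {"ann"},
    "dan": {"eve"},
    "eve": {"dan"},
}
USERS = sorted(ADJ)
# Assumption: one-hot identity features, since the toy users have no profiles.
FEATURES = {u: [1.0 if u == w else 0.0 for w in USERS] for u in USERS}

EMB = mean_aggregate(ADJ, FEATURES)
SCORE_BOB_CAT = score_pair(EMB, "bob", "cat")   # share a neighbor
SCORE_BOB_EVE = score_pair(EMB, "bob", "eve")   # no shared neighborhood
```

With identity features and one untrained layer, the score reduces to a neighborhood-overlap measure, so it behaves much like a mutual-friends heuristic. A trained model would add real user features, learned weights, and more hops, which is where any gain over the heuristic would come from.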
Community detection
Identifying clusters of tightly connected users: friend groups, interest communities, or organizational units. GNN representations naturally cluster similar users, and applying k-means or spectral clustering to GNN embeddings produces high-quality community assignments.
Fake account detection
Fake accounts have distinctive graph signatures: they connect to many real users (to appear legitimate), but those users are rarely connected to one another, because the fake account lacks organic social context. GNNs can learn these structural anomalies and detect fakes that pass feature-based checks.
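One way to read this signature is as a low local clustering coefficient: the fraction of an account's neighbor pairs that are themselves connected. The sketch below computes that single hand-crafted feature on a toy graph. It is an assumption about what the structural anomaly looks like, not a detector; a GNN would learn such patterns from data rather than use a fixed formula. The account names and helper functions are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of the node's neighbor pairs that are directly connected."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)


# Toy graph (hypothetical): "real" sits inside a friend group whose members
# know each other; "fake" has attached itself to three unrelated users.
EDGES = [
    ("real", "a"), ("real", "b"), ("real", "c"),
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("fake", "a"), ("fake", "x"), ("fake", "y"),
]
ADJ = build_adjacency(EDGES)
REAL_SCORE = local_clustering(ADJ, "real")
FAKE_SCORE = local_clustering(ADJ, "fake")
```

Both accounts have three connections, so a degree-based check cannot separate them, while the clustering feature does. Legitimate low-clustering accounts also exist (for example, a new user or a brand page), so a feature like this would be one input among many rather than a decision rule.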
Content virality prediction
Will a post go viral? The content matters, but the poster's position in the social graph matters more. A post from a user bridging two large communities has higher viral potential than the same post from a user deep within a single cluster. GNNs learn these structural predictors.
Scale challenges
Production social graphs are among the largest graphs in existence:
- Facebook: 3B+ user nodes, 200B+ edges
- Twitter/X: 500M+ user nodes, 100B+ follow edges
- LinkedIn: 1B+ member nodes, 20B+ connection edges
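Full-graph GNN training is impractical at this scale, because the graph and its intermediate activations do not fit in the memory of a single device. Production systems typically combine: (1) neighbor sampling (sample 10-25 neighbors per hop), (2) mini-batch training (process one sampled subgraph at a time), and (3) distributed training across hundreds of GPUs.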
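Neighbor sampling, item (1) above, can be sketched as follows: starting from a batch of seed users, keep at most a fixed number of neighbors per node at each hop, so the size of the sampled subgraph is bounded by the fanouts rather than by hub degree. This is a simplified illustration using only the standard library; the function name `sample_neighbors`, the fanout values, and the toy graph are assumptions, and production samplers in GNN libraries are considerably more involved.

```python
import random


def sample_neighbors(adj, seeds, fanouts, rng):
    """Sample a multi-hop subgraph around the seed nodes.

    Returns one list of (source, target) edges per hop, where messages
    flow from the sampled neighbor (source) to the node that picked it.
    """
    layers = []
    frontier = list(seeds)
    for fanout in fanouts:
        edges = []
        next_frontier = set()
        for node in frontier:
            nbrs = sorted(adj.get(node, ()))
            # Keep every neighbor if there are few, otherwise a random subset.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for nbr in picked:
                edges.append((nbr, node))
                next_frontier.add(nbr)
        layers.append(edges)
        frontier = sorted(next_frontier)
    return layers


# Toy graph (hypothetical): a hub account with 1,000 followers.
ADJ = {"hub": {f"u{i}" for i in range(1000)}}
for i in range(1000):
    ADJ[f"u{i}"] = {"hub"}

# Fanouts of 10 then 5 are illustrative; a fixed seed keeps the run repeatable.
LAYERS = sample_neighbors(ADJ, ["hub"], [10, 5], random.Random(0))
```

The hub has 1,000 neighbors, but the first hop contributes only 10 edges to the mini-batch. The cost of a batch therefore depends on the batch size and the fanouts, not on the largest degree in the graph.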
Privacy considerations
Social graph structure alone reveals sensitive information. Research has shown that graph topology can predict political affiliation (85% accuracy), sexual orientation, and health conditions, even without access to user profiles or content. GNN deployments on social data must address:
- Differential privacy in GNN training to limit information leakage
- Federated learning to keep graph data on user devices
- Audit mechanisms to prevent discriminatory predictions based on graph structure