The business problem
Influencer marketing has grown to a $21 billion market, yet 70% of campaigns fail to achieve target ROI. The core problem: brands select influencers based on follower count and engagement rate, not actual influence. A user with 1 million followers but low propagation reach can generate fewer conversions than a user with 50,000 highly engaged, action-taking followers.
True influence is a network property. It depends not just on how many followers you have, but on who those followers are, how connected they are to each other, and how likely they are to propagate your content further. This is inherently a graph problem.
Why flat ML fails
- Follower count is not influence: A user with 1M followers where 95% are bots has less influence than one with 10K highly active, well-connected followers. Flat models cannot see follower quality.
- No propagation modeling: Influence is about cascade depth, not just direct reach. If your followers share your content, and their followers share it further, total reach can grow multiplicatively with each hop, which per-user features do not capture.
- No topic specificity: A tech influencer typically has little influence on fashion content. The graph captures topic-specific connections: who follows whom for which topics.
- No network position: Users who bridge communities (connecting otherwise disconnected groups) have outsized influence. This is a purely structural property of the graph.
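The gap between follower count and propagation reach can be illustrated with a toy example. The sketch below is not the article's method: it uses a hypothetical five-follower and two-follower account and assumes, unrealistically, that every follower reshares. It compares direct follower count with the number of distinct accounts reached within two hops.

```python
from collections import deque

# Hypothetical toy data: followers[u] lists the accounts that follow u.
followers = {
    "big": ["b1", "b2", "b3", "b4", "b5"],  # many followers, none with an audience
    "small": ["s1", "s2"],                    # few followers, each with an audience
    "s1": ["x1", "x2", "x3"],
    "s2": ["x3", "x4", "x5"],
}

def cascade_reach(seed, followers, max_hops=2):
    """Count distinct accounts reached within max_hops.

    Assumption for illustration only: every follower reshares.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for follower in followers.get(user, []):
            if follower not in seen:
                seen.add(follower)
                queue.append((follower, hops + 1))
    return len(seen) - 1  # exclude the seed itself

for user in ("big", "small"):
    print(user, len(followers[user]), cascade_reach(user, followers))
# big 5 5
# small 2 7
```

In this toy graph the account with fewer direct followers reaches more people, because its followers have audiences of their own. A flat feature table that stores only follower_count cannot express that difference.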
The relational schema
Node types:
User (id, follower_count, avg_engagement, topic_vector)
Content (id, type, topic, timestamp, text_emb)
Community (id, size, topic_focus, density)
Edge types:
User --[follows]--> User (since_date, interaction_freq)
User --[shared]--> Content (timestamp)
User --[created]--> Content (timestamp)
User --[member_of]--> Community
Content --[reply_to]--> Content

The social graph captures who follows whom (with interaction frequency), content creation/sharing cascades, and community membership.
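As a rough illustration of how this schema could be turned into model inputs, the sketch below maps raw IDs to contiguous indices and builds one source/destination index list per edge type, keyed by (source type, relation, destination type). The IDs and rows are hypothetical, and only three of the edge types are shown.

```python
# Hypothetical raw rows; a real pipeline would read these from tables.
user_ids = ["u_ann", "u_bob", "u_cat"]
content_ids = ["c_1", "c_2"]
follows_rows = [("u_bob", "u_ann"), ("u_cat", "u_ann"), ("u_ann", "u_cat")]  # (follower, followed)
created_rows = [("u_ann", "c_1"), ("u_cat", "c_2")]  # (author, content)
shared_rows = [("u_bob", "c_1")]                     # (sharer, content)

def index_map(ids):
    """Map each raw ID to a contiguous integer index for its node type."""
    return {raw_id: i for i, raw_id in enumerate(ids)}

def to_coo(rows, src_map, dst_map):
    """Convert (src_id, dst_id) rows into [src_indices, dst_indices]."""
    src = [src_map[s] for s, _ in rows]
    dst = [dst_map[d] for _, d in rows]
    return [src, dst]

user_idx = index_map(user_ids)
content_idx = index_map(content_ids)

edge_index_dict = {
    ("user", "follows", "user"): to_coo(follows_rows, user_idx, user_idx),
    ("user", "created", "content"): to_coo(created_rows, user_idx, content_idx),
    ("user", "shared", "content"): to_coo(shared_rows, user_idx, content_idx),
}

for edge_type, (src, dst) in edge_index_dict.items():
    print(edge_type, src, dst)
```

In PyG, each of these lists would become a torch.long tensor of shape [2, num_edges]. Node indices are local to each node type, so index 0 in the user map and index 0 in the content map refer to different entities.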
PyG architecture: GATv2Conv for influence scoring
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv, HeteroConv, Linear


class InfluenceGNN(torch.nn.Module):
    def __init__(self, hidden_dim=128, heads=4):
        super().__init__()
        # Lazy (-1) input sizes are inferred on the first forward pass.
        self.user_lin = Linear(-1, hidden_dim)
        self.content_lin = Linear(-1, hidden_dim)
        self.community_lin = Linear(-1, hidden_dim)

        # Heads are concatenated, so each layer outputs hidden_dim features.
        # Self-loops are disabled on edges between different node types,
        # where source and destination indices refer to different entities.
        self.conv1 = HeteroConv({
            ('user', 'follows', 'user'): GATv2Conv(
                hidden_dim, hidden_dim // heads, heads=heads),
            ('user', 'shared', 'content'): GATv2Conv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('user', 'created', 'content'): GATv2Conv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('user', 'member_of', 'community'): GATv2Conv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
        }, aggr='sum')

        self.conv2 = HeteroConv({
            ('user', 'follows', 'user'): GATv2Conv(
                hidden_dim, hidden_dim // heads, heads=heads),
            ('user', 'shared', 'content'): GATv2Conv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
            ('user', 'member_of', 'community'): GATv2Conv(
                hidden_dim, hidden_dim // heads, heads=heads,
                add_self_loops=False),
        }, aggr='sum')

        # Predict influence reach (regression)
        self.influence_head = torch.nn.Sequential(
            Linear(hidden_dim, 64),
            torch.nn.ReLU(),
            Linear(64, 1),
        )

    def forward(self, x_dict, edge_index_dict):
        # Project each node type to the shared hidden size, building a new
        # dict so the caller's inputs are not modified in place.
        x_dict = {
            'user': self.user_lin(x_dict['user']),
            'content': self.content_lin(x_dict['content']),
            'community': self.community_lin(x_dict['community']),
        }
        # Note: 'follows' is the only relation whose destination is 'user',
        # so user embeddings aggregate only from followers. Reverse relations
        # would be needed for content and community signals to reach users.
        x_dict = {k: F.elu(v) for k, v in
                  self.conv1(x_dict, edge_index_dict).items()}
        x_dict = self.conv2(x_dict, edge_index_dict)
        return self.influence_head(x_dict['user']).squeeze(-1)

GATv2Conv learns an attention weight for each follow edge, so messages from different followers contribute unequally to a user's embedding. The model predicts expected cascade reach per user, which can then be used to rank candidate seed users for a campaign.
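To make the per-edge weights more concrete, here is a minimal pure-Python sketch of the GATv2 scoring rule for a single attention head and a single target user: each incoming edge gets a score a^T LeakyReLU(W_src x_src + W_dst x_dst), and the scores are normalised with a softmax over that user's incoming edges. The features and weight values are hypothetical and hand-picked (in a real model they are learned), and bias terms are omitted.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def leaky_relu(v, slope=0.2):
    return [t if t > 0 else slope * t for t in v]

def gatv2_attention(target, sources, W_src, W_dst, att):
    """Return softmax-normalised attention over the sources of one target."""
    h_dst = matvec(W_dst, target)
    scores = []
    for x in sources:
        h_src = matvec(W_src, x)
        # GATv2 applies the nonlinearity before the attention vector.
        hidden = leaky_relu([a + b for a, b in zip(h_src, h_dst)])
        scores.append(sum(a * h for a, h in zip(att, hidden)))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d features and hand-picked weights.
W_src = [[1.0, 0.0], [0.0, 1.0]]
W_dst = [[0.5, 0.0], [0.0, 0.5]]
att = [1.0, -1.0]
creator = [1.0, 1.0]                                  # target user
follower_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # source users

weights = gatv2_attention(creator, follower_feats, W_src, W_dst, att)
print([round(w, 3) for w in weights])
```

The weights sum to 1 across the target's incoming follow edges, and with these hand-picked values the first follower receives the largest share. A multi-head layer repeats this computation with separate parameters per head.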
Expected performance
Influence prediction is a regression task (cascade reach). Because campaigns mainly need users ranked correctly, Spearman rank correlation between predicted and actual cascade size is a common metric:
- Follower count (heuristic): ~0.25 Spearman correlation
- LightGBM (flat features): ~0.45 Spearman correlation
- GNN (GATv2Conv): ~0.65 Spearman correlation
- KumoRFM (zero-shot): ~0.67 Spearman correlation
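For reference, Spearman correlation is the Pearson correlation of the ranks, so it rewards getting the ordering of users right rather than the exact reach values. The sketch below computes it in plain Python with average ranks for ties. The reach numbers are hypothetical and are not taken from the results above.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(predicted, actual):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(actual))

# Hypothetical predicted and observed cascade sizes for five users.
predicted_reach = [120.0, 15.0, 980.0, 40.0, 300.0]
actual_reach = [90, 30, 5000, 20, 250]

rho = spearman(predicted_reach, actual_reach)
print(round(rho, 3))  # 0.9
```

Only two users are swapped in the ranking here, so the correlation stays high even though the predicted value for the largest cascade is far from the observed one.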
Or use KumoRFM in one line
PREDICT cascade_reach FOR user
USING user, content, interaction, community

One PQL query. KumoRFM constructs the social graph and predicts propagation reach per user for campaign optimization.