How do GNNs model influence propagation in social networks?

GNNs model influence propagation naturally because message passing is itself a form of propagation. A 2-layer GNN lets each user's prediction incorporate information from their 2-hop social neighborhood. An attention layer (GATConv, or the GATv2Conv variant used in the architecture section) learns which connections carry more influence, helping distinguish influential friends from passive followers.
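To make the 2-hop claim concrete, here is a minimal sketch of message passing without learned weights, in plain Python. The chain graph, the `propagate` helper, and the mean aggregation are illustrative assumptions, not the PyG implementation.

```python
# Toy follower graph: an edge (src, dst) means information flows src -> dst.
edges = [(0, 1), (1, 2), (2, 3)]
num_nodes = 4


def propagate(signal, edges, num_nodes):
    """One round of mean-aggregation message passing (no learned weights)."""
    incoming = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        incoming[dst].append(signal[src])
    # Each node averages its own value with the messages it received.
    return [
        (signal[i] + sum(msgs)) / (1 + len(msgs))
        for i, msgs in enumerate(incoming)
    ]


# Only node 0 carries a signal at the start.
h0 = [1.0, 0.0, 0.0, 0.0]
h1 = propagate(h0, edges, num_nodes)  # one layer: signal reaches node 1
h2 = propagate(h1, edges, num_nodes)  # two layers: signal reaches node 2
```

After two rounds, node 2 (two hops from node 0) has a non-zero value while node 3 (three hops away) is still at zero, which is the sense in which a 2-layer GNN sees a 2-hop neighborhood.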

What is the influence maximization problem?

Given a budget of K seed users to target with a campaign, which K users maximize the total reach? The problem is NP-hard under standard diffusion models such as independent cascade and linear threshold, so exact solutions are impractical at scale. GNNs approximate the solution by learning node-level influence scores that predict how far information will spread from each user, enabling greedy seed selection.
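The greedy step can be sketched as follows. This assumes the model (or a downstream simulation) yields a predicted audience set per candidate, which is a hypothetical input introduced for illustration; with only a scalar score per user, greedy selection reduces to taking the top K scores.

```python
def greedy_seeds(predicted_audience, k):
    """Pick up to k seeds greedily by marginal gain in covered users.

    predicted_audience: dict mapping candidate user -> set of users that
    candidate is expected to reach (hypothetical input for this sketch).
    """
    covered, seeds = set(), []
    candidates = dict(predicted_audience)
    for _ in range(min(k, len(candidates))):
        # Marginal gain = users this candidate adds beyond those already covered.
        best = max(candidates, key=lambda u: len(candidates[u] - covered))
        if not candidates[best] - covered:
            break  # No remaining candidate adds new reach.
        seeds.append(best)
        covered |= candidates.pop(best)
    return seeds, covered


audience = {
    "alice": {"u1", "u2", "u3", "u4"},
    "bob": {"u1", "u2", "u3"},
    "carol": {"u5", "u6"},
}
seeds, covered = greedy_seeds(audience, k=2)
```

In this toy input, ranking by raw audience size would pick alice and bob, whose audiences overlap almost entirely. The greedy rule picks alice and carol instead and covers all six users.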

How do you train a GNN for influence prediction without hand-labeled ground truth?

Use historical cascade data: past content shares, retweets, or product adoption events. Each cascade provides a ground-truth propagation tree. Train the GNN to predict which users will be activated in a cascade given the seed users. The cascade dataset provides natural supervision.
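As a rough illustration of turning cascades into supervision, the sketch below derives a per-user regression target (mean cascade reach) from a share log. The event format `(cascade_id, user, parent_user)` and the function name are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical share log: (cascade_id, user, parent_user); parent None = seed.
events = [
    ("c1", "alice", None),
    ("c1", "bob", "alice"),
    ("c1", "carol", "bob"),
    ("c2", "alice", None),
    ("c2", "dave", "alice"),
    ("c3", "bob", None),
]


def cascade_reach_labels(events):
    """Return the mean number of activated users (seed excluded) per seed user."""
    seed_of = {}
    size = defaultdict(int)
    for cascade_id, user, parent in events:
        if parent is None:
            seed_of[cascade_id] = user
        else:
            size[cascade_id] += 1
    reaches = defaultdict(list)
    for cascade_id, seed in seed_of.items():
        reaches[seed].append(size[cascade_id])
    return {seed: sum(r) / len(r) for seed, r in reaches.items()}


labels = cascade_reach_labels(events)
```

Here alice seeded two cascades that activated two and one other users, so her target is 1.5, while bob's single cascade reached nobody. Keeping the parent links instead of only the counts would support the activation-prediction formulation described above.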

What attention mechanism works best for social influence?

GATv2Conv (the revised attention mechanism from Brody et al., 2021) is generally preferred because it computes dynamic attention: how a node ranks its neighbors can change with the node doing the attending, whereas the original GAT produces the same neighbor ranking for every node. For influence, this means the attention weight can capture how receptive user B is to influence from user A, which varies by user pair.
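The difference is easiest to see in the scoring functions, written here in the notation of Brody et al. (with $h_i$ the features of the attending node, $h_j$ a neighbor, $W$ a weight matrix, and $a$ the attention vector):

```latex
% Original GAT: the nonlinearity is applied after the attention vector a
e_{\mathrm{GAT}}(h_i, h_j) = \mathrm{LeakyReLU}\left( a^{\top} \left[ W h_i \,\Vert\, W h_j \right] \right)

% GATv2: the nonlinearity sits between W and a
e_{\mathrm{GATv2}}(h_i, h_j) = a^{\top} \, \mathrm{LeakyReLU}\left( W \left[ h_i \,\Vert\, h_j \right] \right)

% Both normalize the scores over the neighborhood N(i)
\alpha_{ij} = \frac{\exp\left( e(h_i, h_j) \right)}{\sum_{j' \in \mathcal{N}(i)} \exp\left( e(h_i, h_{j'}) \right)}
```

In the GAT form the score splits into one term that depends only on $h_i$ and one that depends only on $h_j$, and LeakyReLU is monotonic, so the neighbor ordering is the same for every $i$. Moving the nonlinearity inside removes that restriction.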

Can KumoRFM predict social influence?

Yes. KumoRFM takes your social platform data (users, connections, content, interactions) and predicts influence reach or engagement with one PQL query. It automatically constructs the social graph and captures propagation patterns.

Social Influence Prediction with PyG: Attention Layers for Propagation Modeling | PyG Guide

The business problem

Influencer marketing has grown to a $21 billion market, yet 70% of campaigns fail to achieve target ROI. The core problem: brands select influencers based on follower count and engagement rate, not actual influence. A user with 1 million followers but low propagation reach generates fewer conversions than a user with 50,000 highly engaged, action-taking followers.

True influence is a network property. It depends not just on how many followers you have, but on who those followers are, how connected they are to each other, and how likely they are to propagate your content further. This is inherently a graph problem.

Why flat ML fails

Follower count is not influence: A user with 1M followers where 95% are bots has less influence than one with 10K highly active, well-connected followers. Flat models cannot see follower quality.
No propagation modeling: Influence is about cascade depth, not just direct reach. If your followers share your content, and their followers share it further, total reach compounds with every hop and can far exceed your direct audience.
No topic specificity: A tech influencer may have little or no influence on fashion content. The graph captures topic-specific connections: who follows whom for which topics.
No network position: Users who bridge communities (connecting otherwise disconnected groups) have outsized influence. This is a purely structural property of the graph.

The relational schema

schema.txt

Node types:
  User      (id, follower_count, avg_engagement, topic_vector)
  Content   (id, type, topic, timestamp, text_emb)
  Community (id, size, topic_focus, density)

Edge types:
  User    --[follows]-->    User      (since_date, interaction_freq)
  User    --[shared]-->     Content   (timestamp)
  User    --[created]-->    Content   (timestamp)
  User    --[member_of]-->  Community
  Content --[reply_to]-->   Content

The social graph captures who follows whom (with interaction frequency), content creation/sharing cascades, and community membership.

PyG architecture: GATv2Conv for influence scoring

influence_model.py

import torch
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv, HeteroConv, Linear


class InfluenceGNN(torch.nn.Module):
    # Forward edges from the schema plus reverse edges into 'user'. The rev_*
    # edge types are an assumption: they must exist in the data as flipped
    # copies of the forward edges. Without them, content and community
    # information never flows back to the user embeddings.
    EDGE_TYPES = [
        ('user', 'follows', 'user'),
        ('user', 'shared', 'content'),
        ('user', 'created', 'content'),
        ('user', 'member_of', 'community'),
        ('content', 'rev_shared', 'user'),
        ('content', 'rev_created', 'user'),
        ('community', 'rev_member_of', 'user'),
    ]

    def __init__(self, hidden_dim=128, heads=4):
        super().__init__()
        # Lazy input sizes (-1) are resolved on the first forward pass, so run
        # one batch through the model before creating the optimizer.
        self.user_lin = Linear(-1, hidden_dim)
        self.content_lin = Linear(-1, hidden_dim)
        self.community_lin = Linear(-1, hidden_dim)

        def make_layer():
            # heads * (hidden_dim // heads) keeps the output size at hidden_dim.
            # Self-loops are only added when source and target share a node
            # type; on bipartite edges they would link unrelated nodes.
            return HeteroConv({
                (src, rel, dst): GATv2Conv(
                    hidden_dim, hidden_dim // heads, heads=heads,
                    add_self_loops=(src == dst))
                for src, rel, dst in self.EDGE_TYPES
            }, aggr='sum')

        self.conv1 = make_layer()
        self.conv2 = make_layer()

        # Predict influence reach (regression)
        self.influence_head = torch.nn.Sequential(
            Linear(hidden_dim, 64),
            torch.nn.ReLU(),
            Linear(64, 1),
        )

    def forward(self, x_dict, edge_index_dict):
        # Build a new dict rather than mutating the caller's x_dict.
        x_dict = {
            'user': self.user_lin(x_dict['user']),
            'content': self.content_lin(x_dict['content']),
            'community': self.community_lin(x_dict['community']),
        }

        x_dict = {k: F.elu(v) for k, v in
                  self.conv1(x_dict, edge_index_dict).items()}
        x_dict = self.conv2(x_dict, edge_index_dict)

        # One scalar reach prediction per user node.
        return self.influence_head(
            x_dict['user']).squeeze(-1)

GATv2Conv learns a directional attention weight for each follow edge. The model predicts expected cascade reach per user, and those scores can then drive greedy seed selection for campaigns.

Expected performance

Influence prediction is a regression task (cascade reach). Spearman rank correlation between predicted and actual cascade size is a common metric, since what matters for seed selection is ranking users correctly:

Follower count (heuristic): ~0.25 Spearman correlation
LightGBM (flat features): ~0.45 Spearman correlation
GNN (GATv2Conv): ~0.65 Spearman correlation
KumoRFM (zero-shot): ~0.67 Spearman correlation
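For reference, Spearman correlation is the Pearson correlation of the two rank vectors. The sketch below is a dependency-free version with average ranks for ties; the sample predictions and cascade sizes are made up for illustration and are unrelated to the figures above.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted, actual):
    """Pearson correlation of the rank vectors (undefined for constant input)."""
    rp, ra = average_ranks(predicted), average_ranks(actual)
    n = len(rp)
    mp, ma = sum(rp) / n, sum(ra) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(rp, ra))
    var_p = sum((p - mp) ** 2 for p in rp)
    var_a = sum((a - ma) ** 2 for a in ra)
    return cov / (var_p * var_a) ** 0.5


predicted = [0.9, 0.2, 0.5, 0.7]   # model scores per user (illustrative)
actual = [120, 3, 40, 15]          # observed cascade sizes (illustrative)
rho = spearman(predicted, actual)
```

In practice `scipy.stats.spearmanr` does the same job; the hand-rolled version is only meant to show what the metric measures.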

Or use KumoRFM in one query

KumoRFM PQL

PREDICT cascade_reach FOR user
USING user, content, interaction, community

One PQL query. KumoRFM constructs the social graph and predicts propagation reach per user for campaign optimization.

Key Takeaways

1. Social influence is a graph propagation problem. GNN message passing naturally models information spread, learning which connections carry real influence vs. passive following.
2. GATv2Conv learns directional attention: how receptive user B is to user A's content, capturing the asymmetric nature of social influence.
3. GNNs improve cascade size prediction (Spearman correlation ~0.65) dramatically over flat models (~0.45). Network position and propagation patterns are invisible to feature-based approaches.
4. Historical cascades (shares, retweets) provide natural training data. Each cascade is a ground-truth propagation tree for supervision.
5. KumoRFM delivers competitive influence predictions with one PQL query. No social graph construction, no cascade feature engineering, no attention layer tuning.
