
Social Influence: Attention Layers for Propagation Modeling

Influencer marketing is a $21B market, but 70% of campaigns underperform because brands pick influencers by follower count, not actual influence. Here is how to build a GNN that predicts real propagation reach.


TL;DR

  • Social influence is a graph propagation problem. GNN message passing naturally models how information spreads through social networks, learning which connections carry real influence.
  • GATv2Conv learns directional attention weights per edge: how receptive user B is to user A's influence, which varies by relationship quality and topic alignment.
  • On cascade prediction benchmarks, GNNs improve Spearman correlation with actual cascade size by 0.2+ over flat-table baselines. Network position and propagation patterns drive the improvement.
  • Historical cascade data (shares, retweets, adoptions) provides natural training supervision. Each cascade is a ground-truth propagation tree.
  • KumoRFM predicts influence reach with one PQL query, automatically constructing the social graph and capturing propagation patterns without cascade feature engineering.

The business problem

Influencer marketing has grown to a $21 billion market, yet 70% of campaigns fail to achieve target ROI. The core problem: brands select influencers based on follower count and engagement rate, not actual influence. A user with 1 million followers but low propagation reach generates fewer conversions than a user with 50,000 highly engaged, action-taking followers.

True influence is a network property. It depends not just on how many followers you have, but on who those followers are, how connected they are to each other, and how likely they are to propagate your content further. This is inherently a graph problem.

Why flat ML fails

  • Follower count is not influence: A user with 1M followers where 95% are bots has less influence than one with 10K highly active, well-connected followers. Flat models cannot see follower quality.
  • No propagation modeling: Influence is about cascade depth, not just direct reach. If your followers share your content, and their followers share it further, the total reach is exponentially larger.
  • No topic specificity: A tech influencer has zero influence on fashion content. The graph captures topic-specific connections: who follows whom for which topics.
  • No network position: Users who bridge communities (connecting otherwise disconnected groups) have outsized influence. This is a purely structural property of the graph.
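
The bridging effect in the last bullet is measurable directly from graph structure. A minimal sketch using networkx (an assumption of this example, not part of the PyG pipeline) on a toy two-community graph:

```python
import networkx as nx

# Two tight 4-node communities joined by a single bridge user (node 4):
# nodes 0-3 form one clique, nodes 5-8 another, node 4 connects them.
G = nx.barbell_graph(4, 1)

# Betweenness centrality counts how many shortest paths pass through
# each node -- the structural signature of a community bridge.
centrality = nx.betweenness_centrality(G)
bridge = max(centrality, key=centrality.get)
print(bridge)  # → 4, the bridge node
```

A flat feature table sees node 4 as just another low-degree user; only the graph reveals its outsized position.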

The relational schema

schema.txt
Node types:
  User      (id, follower_count, avg_engagement, topic_vector)
  Content   (id, type, topic, timestamp, text_emb)
  Community (id, size, topic_focus, density)

Edge types:
  User    --[follows]-->    User      (since_date, interaction_freq)
  User    --[shared]-->     Content   (timestamp)
  User    --[created]-->    Content   (timestamp)
  User    --[member_of]-->  Community
  Content --[reply_to]-->   Content

The social graph captures who follows whom (with interaction frequency), content creation/sharing cascades, and community membership.

PyG architecture: GATv2Conv for influence scoring

influence_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATv2Conv, HeteroConv, Linear

class InfluenceGNN(torch.nn.Module):
    def __init__(self, hidden_dim=128, heads=4):
        super().__init__()
        self.user_lin = Linear(-1, hidden_dim)
        self.content_lin = Linear(-1, hidden_dim)
        self.community_lin = Linear(-1, hidden_dim)

        # One attention module per relation, aggregated by sum.
        # add_self_loops is disabled on the bipartite relations,
        # where self-loops are undefined and would raise an error.
        self.conv1 = HeteroConv({
            ('user', 'follows', 'user'): GATv2Conv(
                hidden_dim, hidden_dim // heads, heads=heads),
            ('user', 'shared', 'content'): GATv2Conv(
                (hidden_dim, hidden_dim), hidden_dim // heads,
                heads=heads, add_self_loops=False),
            ('user', 'created', 'content'): GATv2Conv(
                (hidden_dim, hidden_dim), hidden_dim // heads,
                heads=heads, add_self_loops=False),
            ('user', 'member_of', 'community'): GATv2Conv(
                (hidden_dim, hidden_dim), hidden_dim // heads,
                heads=heads, add_self_loops=False),
        }, aggr='sum')

        self.conv2 = HeteroConv({
            ('user', 'follows', 'user'): GATv2Conv(
                hidden_dim, hidden_dim // heads, heads=heads),
            ('user', 'shared', 'content'): GATv2Conv(
                (hidden_dim, hidden_dim), hidden_dim // heads,
                heads=heads, add_self_loops=False),
            ('user', 'member_of', 'community'): GATv2Conv(
                (hidden_dim, hidden_dim), hidden_dim // heads,
                heads=heads, add_self_loops=False),
        }, aggr='sum')

        # Predict influence reach (regression)
        self.influence_head = torch.nn.Sequential(
            Linear(hidden_dim, 64),
            torch.nn.ReLU(),
            Linear(64, 1),
        )

    def forward(self, x_dict, edge_index_dict):
        x_dict['user'] = self.user_lin(x_dict['user'])
        x_dict['content'] = self.content_lin(x_dict['content'])
        x_dict['community'] = self.community_lin(
            x_dict['community'])

        x_dict = {k: F.elu(v) for k, v in
                  self.conv1(x_dict, edge_index_dict).items()}
        x_dict = self.conv2(x_dict, edge_index_dict)

        return self.influence_head(
            x_dict['user']).squeeze(-1)

GATv2Conv learns directional influence weights per follow edge. The model predicts expected cascade reach per user, enabling greedy seed selection for campaigns.

Expected performance

Influence prediction is a regression task (cascade reach). Spearman rank correlation with actual cascade size is the standard metric:

  • Follower count (heuristic): ~0.25 Spearman correlation
  • LightGBM (flat features): ~0.45 Spearman correlation
  • GNN (GATv2Conv): ~0.65 Spearman correlation
  • KumoRFM (zero-shot): ~0.67 Spearman correlation
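
The metric itself is cheap to compute with scipy. A sketch on synthetic heavy-tailed cascade sizes, comparing an informative model against a shuffled baseline (the numbers produced here are illustrative, not the benchmark figures above):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Cascade sizes are heavy-tailed: most posts die, a few go viral.
actual = rng.lognormal(mean=2.0, sigma=1.5, size=500)

# An informative model: noisy estimate of log cascade size.
pred_good = np.log(actual) + 0.5 * rng.normal(size=500)
# An uninformative baseline: scores with no relation to outcomes.
pred_shuffled = rng.permutation(actual)

rho_good, _ = spearmanr(pred_good, actual)
rho_base, _ = spearmanr(pred_shuffled, actual)
print(f"model: {rho_good:.2f}, shuffled baseline: {rho_base:.2f}")
```

Spearman is preferred over Pearson here because campaign planning only needs the ranking of users by reach, and rank correlation is robust to the heavy tail.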

Or use KumoRFM in one line

KumoRFM PQL
PREDICT cascade_reach FOR user
USING user, content, interaction, community

One PQL query. KumoRFM constructs the social graph and predicts propagation reach per user for campaign optimization.

Frequently asked questions

How do GNNs model influence propagation in social networks?

GNNs naturally model influence propagation because message passing IS propagation. A 2-layer GNN lets each user's prediction incorporate information from their 2-hop social neighborhood. The attention mechanism (GATConv) learns which connections carry more influence, automatically distinguishing influential friends from passive followers.

What is the influence maximization problem?

Given a budget of K seed users to target with a campaign, which K users maximize the total reach? This is NP-hard with traditional methods. GNNs approximate the solution by learning node-level influence scores that predict how far information will spread from each user, enabling greedy seed selection.
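
The final step, seed selection from learned scores, is simple in its top-K form. A sketch (note: full greedy influence maximization also discounts overlapping audiences between chosen seeds; plain top-K ignores that overlap):

```python
import numpy as np

def select_seeds(influence_scores, k):
    """Pick the k users with the highest predicted cascade reach."""
    scores = np.asarray(influence_scores)
    # argsort ascending, reverse for descending, keep the top k indices
    return np.argsort(scores)[::-1][:k].tolist()

scores = [0.2, 3.1, 0.7, 2.4, 1.9]
print(select_seeds(scores, k=2))  # → [1, 3]
```

A closer approximation to the greedy algorithm would re-score remaining candidates after each pick, penalizing users whose audiences the current seed set already covers.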

How do you train a GNN for influence prediction without ground truth?

Use historical cascade data: past content shares, retweets, or product adoption events. Each cascade provides a ground-truth propagation tree. Train the GNN to predict which users will be activated in a cascade given the seed users. The cascade dataset provides natural supervision.
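
Deriving those labels from raw share events is a small aggregation. A pandas sketch with hypothetical column names (content_id, sharing_user, creator_id are assumptions for illustration, not a fixed schema):

```python
import pandas as pd

# Toy share log: each row is one user resharing one piece of content.
shares = pd.DataFrame({
    'content_id': [10, 10, 10, 11, 11, 12],
    'sharing_user': [1, 2, 3, 4, 5, 6],
})
# Who created each root piece of content.
creators = pd.DataFrame({
    'content_id': [10, 11, 12],
    'creator_id': [100, 101, 100],
})

# Cascade size per content item, attributed to its creator.
cascade_size = (shares.groupby('content_id').size()
                .rename('reach').reset_index())
labels = (creators.merge(cascade_size, on='content_id')
                  .groupby('creator_id')['reach'].mean())
print(labels.to_dict())  # → {100: 2.0, 101: 2.0}
```

The resulting per-user mean reach (or a quantile of it) becomes the regression target for the GNN.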

What attention mechanism works best for social influence?

GATv2Conv (the corrected attention mechanism from Brody et al., 2021) is preferred because it computes dynamic attention that depends on both source and target features. For influence, this means the attention weight captures how receptive user B is to influence from user A, which varies by user pair.

Can KumoRFM predict social influence?

Yes. KumoRFM takes your social platform data (users, connections, content, interactions) and predicts influence reach or engagement with one PQL query. It automatically constructs the social graph and captures propagation patterns.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.