Graph anomaly detection finds nodes, edges, and subgraphs that deviate from learned normal patterns in graph-structured data. Unlike point-based anomaly detection (e.g., Isolation Forest or autoencoders on tabular data), graph anomaly detection also considers structural context. A data point might look normal in isolation but become suspicious once you see who it connects to, how those connections formed, and whether the local graph structure matches expectations.
Three types of graph anomalies
Node anomalies
A node whose features or structural position deviates from similar nodes. Examples: an employee account with normal access patterns but connected to an unusual number of sensitive document nodes. A network device with normal traffic volume but connecting to IP addresses that no peer device connects to.
Edge anomalies
A connection that should not exist or that appears at an unexpected time. Examples: a new edge between a junior employee and executive-level resources. A transaction edge between two accounts with no shared context (different countries, industries, and connection histories).
Subgraph anomalies
A group of nodes forming an unusual local structure. Examples: a dense cluster of accounts in an otherwise sparse transaction graph (potential fraud ring). A set of network devices forming a communication pattern that deviates from the organizational hierarchy (potential data exfiltration).
Approaches
Reconstruction-based: graph autoencoders
A graph autoencoder (GAE) encodes nodes into low-dimensional embeddings using a GNN encoder, then reconstructs the adjacency matrix (and optionally node features) from these embeddings. The reconstruction loss for normal nodes is low (the model learns their patterns). Anomalous nodes have high reconstruction error because they do not fit the learned normal structure.
This approach is fully unsupervised: train only on the graph data, no anomaly labels needed. The anomaly score is the reconstruction error.
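A minimal numpy sketch of the scoring step, assuming a GAE has already been set up: the random weight matrix stands in for a trained GNN encoder, and `gae_anomaly_scores` is a hypothetical helper, not a library function. The encoder aggregates neighbor features, an inner-product decoder predicts edge probabilities, and each node's anomaly score is its row-wise reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

def gae_anomaly_scores(A, X, W):
    """Score nodes by adjacency reconstruction error.

    A: (n, n) binary adjacency matrix
    X: (n, d) node features
    W: (d, k) encoder weights (stand-in for a trained GNN layer)
    """
    # GCN-style normalization with self-loops.
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(deg, deg))
    # Encode: aggregate neighbor features, project, apply nonlinearity.
    Z = np.tanh(A_norm @ X @ W)
    # Decode: inner-product decoder gives edge probabilities.
    A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    # Per-node anomaly score: mean squared error over the node's row.
    return ((A - A_rec) ** 2).mean(axis=1)

# Toy graph: a 4-node clique (nodes 0-3) plus a sparse pair (4, 5).
A = np.zeros((6, 6))
for i in range(4):
    for j in range(4):
        if i != j:
            A[i, j] = 1.0
A[4, 5] = A[5, 4] = 1.0
X = rng.normal(size=(6, 8))
W = rng.normal(size=(8, 4))  # untrained weights, illustration only
scores = gae_anomaly_scores(A, X, W)
```

In a real system the encoder weights come from minimizing this reconstruction error over the whole graph; nodes whose rows still reconstruct poorly after training are the anomalies.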
Contrastive-based: neighborhood agreement
Contrastive methods learn that a normal node's embedding should agree with its neighborhood's aggregated embedding. If a node's features say “I am a low-risk retail account” but its neighborhood says “you are connected to multiple flagged accounts,” the disagreement signals an anomaly.
The contrastive loss encourages node embeddings to be similar to a local summary (positive pairs) and dissimilar to random nodes (negative pairs). At inference, nodes with low agreement to their local context are flagged.
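The inference step can be sketched in a few lines, assuming embeddings already come from a contrastively trained encoder; `neighborhood_agreement` is a hypothetical helper. It compares each node's embedding to the mean of its neighbors' embeddings via cosine similarity, so a node whose embedding points away from its local context gets a low agreement score.

```python
import numpy as np

def neighborhood_agreement(Z, A):
    """Cosine agreement between each node embedding and its
    neighborhood's mean embedding; low values flag anomalies.

    Z: (n, k) node embeddings from a trained encoder
    A: (n, n) binary adjacency matrix (no self-loops)
    """
    deg = A.sum(axis=1, keepdims=True)
    ctx = (A @ Z) / np.maximum(deg, 1)  # local summary per node
    num = (Z * ctx).sum(axis=1)
    denom = np.linalg.norm(Z, axis=1) * np.linalg.norm(ctx, axis=1)
    return num / np.maximum(denom, 1e-12)

# Toy check: three nodes share an embedding direction, one points away.
Z = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [-1.0, 0.0]])
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
agree = neighborhood_agreement(Z, A)  # node 3 scores lowest
```

During training, the same similarity appears inside the contrastive loss: it is maximized for (node, own-neighborhood) positive pairs and minimized for (node, random-neighborhood) negative pairs.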
One-class approaches
Deep SVDD (Support Vector Data Description) adapted for graphs: learn a GNN that maps normal nodes close to a learned center point in embedding space, so anomalous nodes land far from the center. The training objective minimizes the volume of a hypersphere enclosing the normal node embeddings, typically by minimizing the mean squared distance of embeddings to the center.
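The objective and scoring rule reduce to a squared distance to a center. A minimal sketch, assuming the embeddings `Z_normal` come from a GNN and using the common initialization of the center as the mean embedding (`svdd_scores` and `svdd_loss` are hypothetical helper names):

```python
import numpy as np

def svdd_scores(Z, center):
    """Squared distance from each embedding to the hypersphere
    center; larger distance = more anomalous."""
    return ((Z - center) ** 2).sum(axis=1)

def svdd_loss(Z, center):
    """One-class training objective: shrink the mean distance of
    (assumed normal) embeddings to the center."""
    return svdd_scores(Z, center).mean()

rng = np.random.default_rng(1)
Z_normal = rng.normal(0.0, 0.1, size=(50, 8))  # tight normal cluster
center = Z_normal.mean(axis=0)                 # common center init
loss = svdd_loss(Z_normal, center)
z_outlier = center + 3.0                       # embedding far from center
outlier_score = svdd_scores(z_outlier[None], center)[0]
```

In the full method, the GNN weights (not just the center) are optimized to shrink this loss, with safeguards (e.g., no bias terms) to prevent the trivial solution of mapping every node to the center.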
Why structural context matters: examples
- Network security: a server with normal traffic volume but connecting to 200 unique external IPs when peer servers connect to 5-10. The volume is normal; the degree distribution deviation is the anomaly.
- Financial networks: an account that individually passes all AML checks but is 2 hops from 5 sanctioned entities through intermediary accounts that also only connect to sanctioned entities. The fraud ring pattern is visible only in graph structure.
- Manufacturing: a component with normal quality metrics but sourced from a supplier whose other components have elevated defect rates. The supply chain graph reveals the risk.
- Social networks: a user account with natural-looking posting patterns but structurally identical to known bot accounts (same follower/following ratio, connected to the same set of seed accounts, similar graph neighborhood structure).
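The network security example above is the simplest structural signal to compute: compare a node's degree against its peer group. A toy sketch (the counts are illustrative, and `degree_zscore` is a hypothetical helper; production systems would use richer structural features than raw degree):

```python
import numpy as np

def degree_zscore(degrees):
    """Standard score of each node's degree against its peer group;
    a large positive z flags a degree-distribution anomaly."""
    d = np.asarray(degrees, dtype=float)
    return (d - d.mean()) / d.std()

# Peer servers contact 5-10 unique external IPs; one contacts 200.
unique_ip_counts = [5, 7, 6, 9, 8, 10, 5, 200]
z = degree_zscore(unique_ip_counts)  # last server stands out
```

Traffic volume alone would miss this server; only its position in the connection graph makes it anomalous.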
Challenges
- Label scarcity: anomalies are rare and expensive to label. Unsupervised and semi-supervised methods dominate.
- Class imbalance: even when labels exist, anomalies often make up less than 1% of nodes. Training requires careful loss weighting or sampling strategies.
- Dynamic graphs: what is normal changes over time, so anomaly thresholds must adapt. Temporal GNNs address this by conditioning node embeddings on edge timestamps and event order.
- Explainability: flagging an anomaly is not enough; analysts need to understand why it was flagged. Graph explainability methods identify which edges and neighbor features contributed to the anomaly score.
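The loss-weighting fix for class imbalance mentioned above can be sketched as a binary cross-entropy with an up-weighted positive (anomaly) class; `weighted_bce` is a hypothetical helper, and the inverse-prevalence weight is one common heuristic, not the only choice:

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight):
    """Binary cross-entropy that up-weights the rare anomaly class."""
    eps = 1e-7
    p = np.clip(y_pred, eps, 1 - eps)
    loss = -(pos_weight * y_true * np.log(p)
             + (1 - y_true) * np.log(1 - p))
    return loss.mean()

# 1 anomaly among 100 nodes: weight positives by inverse prevalence.
y = np.zeros(100)
y[0] = 1.0
preds = np.full(100, 0.1)  # model barely reacts to the anomaly
pos_weight = (y == 0).sum() / (y == 1).sum()  # 99.0 here
balanced = weighted_bce(y, preds, pos_weight)
plain = weighted_bce(y, preds, 1.0)
```

With `pos_weight = 1.0` the single missed anomaly barely moves the loss; with inverse-prevalence weighting, missing it dominates the gradient, which is the intended effect.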