
Real Estate Valuation: GNN on Location + Transaction Graphs

The US residential real estate market is worth $47 trillion. Automated valuation models (AVMs) still rely on comparable sales selected by distance. Here is how to build a GNN that values properties using the full spatial and transactional context.

TL;DR

  • Property valuation is a spatial graph regression problem. A property's value depends on its neighborhood: nearby sales, schools, transit, and amenities form a location graph.
  • SAGEConv on the spatial graph aggregates comparable sale prices, neighborhood features, and amenity proximity into property-level value predictions.
  • On property valuation benchmarks, GNNs reduce median absolute error from ~12% (hedonic regression) to ~6-8%. Spatial context from comparable sales and amenities provides the largest lift.
  • Temporal transaction features capture market dynamics: appreciation trends, days-on-market signals, and seasonal patterns.
  • KumoRFM predicts property values with one PQL query, automatically discovering spatial relationships and comparable sale patterns from your real estate data.

The business problem

The US residential real estate market is worth $47 trillion. Accurate property valuation underpins mortgage lending ($2.5T annually), insurance pricing, property tax assessment, and investment decisions. A 5% valuation error on a $500K home is $25K, enough to cause lending losses or mispriced insurance.

Traditional AVMs use hedonic regression: predict price from property features (bedrooms, bathrooms, square footage, lot size, age). They incorporate “comparable sales” as manually selected features. But the selection of comparables is itself the hard problem, and the spatial relationships between properties, schools, and amenities are complex and multidimensional.
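To make the baseline concrete: a hedonic model is just ordinary least squares on property features. A minimal sketch on synthetic data (every coefficient and number below is illustrative, not from a real market):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic property features: beds, baths, sqft, age
n = 500
X = np.column_stack([
    rng.integers(1, 6, n),         # beds
    rng.integers(1, 4, n),         # baths
    rng.uniform(600, 4000, n),     # sqft
    rng.uniform(0, 80, n),         # age in years
])
# Illustrative price: mostly driven by sqft, discounted by age
price = 50_000 + 150 * X[:, 2] - 800 * X[:, 3] + rng.normal(0, 20_000, n)

# Hedonic regression = least squares on the feature matrix plus intercept
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
pred = A @ coef

mdape = np.median(np.abs((pred - price) / price))
print(f"hedonic MdAPE on synthetic data: {mdape:.1%}")
```

Note what is missing: nothing about a property's neighbors enters the fit. That independence assumption is exactly the gap the graph approach closes.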

Why flat ML fails

  • Comparable selection: Flat models use distance-based comparable selection as a preprocessing step. The “right” comparables depend on property type, condition, and market conditions, making rule-based selection suboptimal.
  • No spatial context: A property 0.1 miles from a top-rated school is worth more than one 1.5 miles away. Flat models encode this as “distance_to_school = 0.1” but miss the school quality propagated through the spatial graph.
  • No neighborhood dynamics: Gentrifying neighborhoods show rapid appreciation. The spatial graph captures this through recent comparable sales at increasing prices.
  • Amenity interactions: Near a park is good. Near a park and a transit station is better. The spatial graph naturally captures multi-amenity interactions through message passing.

The relational schema

schema.txt
Node types:
  Property    (id, beds, baths, sqft, lot, year_built, type)
  Neighborhood (id, median_income, crime_rate, walkability)
  School      (id, rating, type, enrollment)
  Amenity     (id, type, quality_score)

Edge types:
  Property --[near]--> Property    (distance_m)
  Property --[in]-->   Neighborhood
  Property --[zoned]--> School     (distance_m)
  Property --[close_to]--> Amenity (distance_m, walk_min)
  Property --[sold]-->  Property   (price, date)  # self-loop with txn

Properties connected by proximity, with neighborhood, school, and amenity context. Transaction edges carry sale prices for comparable-based valuation.

PyG architecture: SAGEConv for spatial valuation

valuation_model.py
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv, HeteroConv, Linear

class ValuationGNN(torch.nn.Module):
    def __init__(self, hidden_dim=128):
        super().__init__()
        # Project every node type into a shared hidden space
        self.property_lin = Linear(-1, hidden_dim)
        self.neighborhood_lin = Linear(-1, hidden_dim)
        self.school_lin = Linear(-1, hidden_dim)
        self.amenity_lin = Linear(-1, hidden_dim)

        # Context relations are reversed so messages flow INTO properties.
        # With the forward direction ('property', 'in', 'neighborhood'),
        # HeteroConv would update neighborhood nodes, and the property
        # embeddings fed to the regressor would never see neighborhood,
        # school, or amenity features. Add the reversed edges to the data
        # with T.ToUndirected() or by hand.
        self.conv1 = HeteroConv({
            ('property', 'near', 'property'): SAGEConv(
                hidden_dim, hidden_dim),
            ('neighborhood', 'rev_in', 'property'): SAGEConv(
                hidden_dim, hidden_dim),
            ('school', 'rev_zoned', 'property'): SAGEConv(
                hidden_dim, hidden_dim),
            ('amenity', 'rev_close_to', 'property'): SAGEConv(
                hidden_dim, hidden_dim),
        }, aggr='mean')

        # Second hop over property proximity: the neighborhood
        # of comparables
        self.conv2 = HeteroConv({
            ('property', 'near', 'property'): SAGEConv(
                hidden_dim, hidden_dim),
        }, aggr='mean')

        self.regressor = torch.nn.Sequential(
            Linear(hidden_dim, 64),
            torch.nn.ReLU(),
            Linear(64, 1),
        )

    def forward(self, x_dict, edge_index_dict):
        x_dict = {
            'property': self.property_lin(x_dict['property']),
            'neighborhood': self.neighborhood_lin(x_dict['neighborhood']),
            'school': self.school_lin(x_dict['school']),
            'amenity': self.amenity_lin(x_dict['amenity']),
        }

        # conv1 pulls context into properties; conv2 spreads it
        # among nearby properties
        x_dict = {k: F.relu(v) for k, v in
                  self.conv1(x_dict, edge_index_dict).items()}
        x_dict = self.conv2(x_dict, edge_index_dict)

        return self.regressor(x_dict['property']).squeeze(-1)

SAGEConv aggregates comparable sales, school quality, and amenity proximity. Two hops capture neighborhood-level context: not just direct comparables but the neighborhood of comparables.

Expected performance

Property valuation is a regression task. The standard metric is Median Absolute Percentage Error (MdAPE):

  • Hedonic regression: ~12% MdAPE
  • LightGBM (flat features): ~9% MdAPE
  • GNN (spatial graph): ~6-7% MdAPE
  • KumoRFM (zero-shot): ~6% MdAPE
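MdAPE is the median of |predicted − actual| / actual; unlike mean APE, it is robust to a handful of badly mispriced outliers. A quick numpy sketch with made-up prices:

```python
import numpy as np

def mdape(pred, actual):
    """Median Absolute Percentage Error."""
    pred, actual = np.asarray(pred), np.asarray(actual)
    return np.median(np.abs((pred - actual) / actual))

actual = np.array([500_000, 350_000, 720_000, 410_000])
pred   = np.array([540_000, 330_000, 700_000, 450_000])
print(f"MdAPE: {mdape(pred, actual):.1%}")  # → MdAPE: 6.9%
```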

Or use KumoRFM in one line

KumoRFM PQL
PREDICT sale_price FOR property
USING property, neighborhood, school, amenity, transaction

One PQL query. KumoRFM discovers spatial relationships and comparable patterns for property valuation.

Frequently asked questions

Why are GNNs better than hedonic pricing models for real estate?

Hedonic models (linear regression on property features) treat each property independently. GNNs model the spatial context: nearby comparable sales, neighborhood amenities, school quality, and transit access. A property's value depends heavily on its graph neighborhood, not just its bedrooms and square footage.

What graph structure represents real estate valuation?

Properties are nodes connected to nearby properties (geographic proximity), neighborhoods, schools, transit stations, and amenities. Transaction edges carry sale prices and dates. The graph captures both the property itself and the spatial context that drives value.

How do you handle spatial relationships in a GNN?

Connect properties within a radius (e.g., 0.5 miles) with edges weighted by distance. This creates a spatial graph where the GNN aggregates information from nearby properties. K-nearest-neighbor graphs (connect each property to its K closest) work well when density varies across areas.
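If torch-cluster is installed, both constructions are one call each with `torch_geometric.nn.radius_graph` and `knn_graph`. A dependency-free sketch in plain PyTorch, on random placeholder coordinates (assumed to be projected x, y positions in meters):

```python
import torch

torch.manual_seed(0)
coords = torch.rand(200, 2) * 5000        # toy (x, y) positions in meters
pairwise = torch.cdist(coords, coords)    # [200, 200] distance matrix

# Radius graph: connect properties within ~800 m (~0.5 miles)
mask = (pairwise < 800.0) & ~torch.eye(200, dtype=torch.bool)
edge_index_radius = mask.nonzero().t()    # shape [2, num_edges]

# KNN graph: each property connects to its 8 nearest neighbors
# (self is always the closest point, so take k+1 and drop column 0)
knn = pairwise.topk(9, largest=False).indices[:, 1:]
src = torch.arange(200).repeat_interleave(8)
edge_index_knn = torch.stack([src, knn.reshape(-1)])

# Distance as an edge weight the GNN can use
edge_dist = pairwise[edge_index_knn[0], edge_index_knn[1]]
```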

Can GNN valuations handle market dynamics?

Yes. Temporal features on transaction edges (sale date, days on market) capture market trends. The GNN learns that recent comparable sales in the neighborhood are more relevant than older ones, and can model appreciation/depreciation trends from the temporal graph structure.
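A sketch of turning a raw transaction into temporal edge features; the 180-day recency half-life is an illustrative choice, not a fitted value:

```python
from datetime import date

def transaction_features(sale_price, sale_date, as_of, half_life_days=180):
    """Edge features for a comparable-sale edge, relative to valuation date."""
    age_days = (as_of - sale_date).days
    # Exponential recency weight: a sale half_life_days old counts half
    recency = 0.5 ** (age_days / half_life_days)
    return {
        'price': sale_price,
        'age_days': age_days,
        'recency_weight': recency,
        'sale_month': sale_date.month,  # seasonal signal
    }

feat = transaction_features(480_000, date(2024, 1, 15), as_of=date(2024, 7, 13))
print(feat['age_days'], round(feat['recency_weight'], 3))  # → 180 0.5
```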

How does KumoRFM handle property valuation?

KumoRFM takes your real estate database (properties, transactions, neighborhoods, amenities) and predicts property values with one PQL query. It automatically discovers spatial relationships and comparable sale patterns.

Learn more about graph ML

PyTorch Geometric is the open-source foundation for graph neural networks. Explore more layers, concepts, and production patterns.