The business problem
The US residential real estate market is worth $47 trillion. Accurate property valuation underpins mortgage lending ($2.5T annually), insurance pricing, property tax assessment, and investment decisions. A 5% valuation error on a $500K home is $25K, enough to cause lending losses or mispriced insurance.
Traditional AVMs use hedonic regression: predict price from property features (bedrooms, bathrooms, square footage, lot size, age). They incorporate “comparable sales” as manually selected features. But the selection of comparables is itself the hard problem, and the spatial relationships between properties, schools, and amenities are complex and multidimensional.
Why flat ML fails
- Comparable selection: Flat models use distance-based comparable selection as a preprocessing step. The “right” comparables depend on property type, condition, and market conditions, making rule-based selection suboptimal.
- No spatial context: A property 0.1 miles from a top-rated school is worth more than one 1.5 miles away. Flat models encode this as “distance_to_school = 0.1” but miss the school quality propagated through the spatial graph.
- No neighborhood dynamics: Gentrifying neighborhoods show rapid appreciation. The spatial graph captures this through recent comparable sales at increasing prices.
- Amenity interactions: Near a park is good. Near a park and a transit station is better. The spatial graph naturally captures multi-amenity interactions through message passing.
The relational schema
Node types:
Property (id, beds, baths, sqft, lot, year_built, type)
Neighborhood (id, median_income, crime_rate, walkability)
School (id, rating, type, enrollment)
Amenity (id, type, quality_score)
Edge types:
Property --[near]--> Property (distance_m)
Property --[in]--> Neighborhood
Property --[zoned]--> School (distance_m)
Property --[close_to]--> Amenity (distance_m, walk_min)
Property --[sold]--> Property (price, date) # self-loop with txnProperties connected by proximity, with neighborhood, school, and amenity context. Transaction edges carry sale prices for comparable-based valuation.
PyG architecture: SAGEConv for spatial valuation
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv, HeteroConv, Linear
class ValuationGNN(torch.nn.Module):
def __init__(self, hidden_dim=128):
super().__init__()
self.property_lin = Linear(-1, hidden_dim)
self.neighborhood_lin = Linear(-1, hidden_dim)
self.school_lin = Linear(-1, hidden_dim)
self.amenity_lin = Linear(-1, hidden_dim)
self.conv1 = HeteroConv({
('property', 'near', 'property'): SAGEConv(
hidden_dim, hidden_dim),
('property', 'in', 'neighborhood'): SAGEConv(
hidden_dim, hidden_dim),
('property', 'zoned', 'school'): SAGEConv(
hidden_dim, hidden_dim),
('property', 'close_to', 'amenity'): SAGEConv(
hidden_dim, hidden_dim),
}, aggr='mean')
self.conv2 = HeteroConv({
('property', 'near', 'property'): SAGEConv(
hidden_dim, hidden_dim),
('property', 'in', 'neighborhood'): SAGEConv(
hidden_dim, hidden_dim),
('property', 'zoned', 'school'): SAGEConv(
hidden_dim, hidden_dim),
}, aggr='mean')
self.regressor = torch.nn.Sequential(
Linear(hidden_dim, 64),
torch.nn.ReLU(),
Linear(64, 1),
)
def forward(self, x_dict, edge_index_dict):
x_dict['property'] = self.property_lin(
x_dict['property'])
x_dict['neighborhood'] = self.neighborhood_lin(
x_dict['neighborhood'])
x_dict['school'] = self.school_lin(x_dict['school'])
x_dict['amenity'] = self.amenity_lin(x_dict['amenity'])
x_dict = {k: F.relu(v) for k, v in
self.conv1(x_dict, edge_index_dict).items()}
x_dict = self.conv2(x_dict, edge_index_dict)
return self.regressor(
x_dict['property']).squeeze(-1)SAGEConv aggregates comparable sales, school quality, and amenity proximity. Two hops capture neighborhood-level context: not just direct comparables but the neighborhood of comparables.
Expected performance
Property valuation is a regression task. The standard metric is Median Absolute Percentage Error (MdAPE):
- Hedonic regression: ~12% MdAPE
- LightGBM (flat features): ~9% MdAPE
- GNN (spatial graph): ~6-7% MdAPE
- KumoRFM (zero-shot): ~6% MdAPE
Or use KumoRFM in one line
PREDICT sale_price FOR property
USING property, neighborhood, school, amenity, transactionOne PQL query. KumoRFM discovers spatial relationships and comparable patterns for property valuation.