Property Valuation
“What is this property worth?”
Book a demo and get a free trial of the full platform: research agent, fine-tuning capabilities, and forward-deployed engineer support.
A real-world example
What is this property worth?
Traditional AVMs (Automated Valuation Models) rely on comparable sales within a radius, missing the neighborhood graph: how proximity to specific amenities, school districts, transit, and commercial corridors affects value. Median AVM error is 5-8%, meaning a $500K home has a $25K-$40K uncertainty band. For a portfolio lender with $10B in real estate exposure, reducing valuation error by 2 percentage points prevents $50M in over-lending losses and missed opportunities annually.
Quick answer
AI property valuation uses graph neural networks to predict property values by modeling the full neighborhood network: amenity proximity, school district effects, transit access, and market momentum. Traditional AVMs rely on comparable sales within a radius and have 5-8% median error. Graph-based models cut that error by 2 or more percentage points by capturing non-linear value effects (a new restaurant cluster raising nearby home values, a school rating improvement rippling through the district). For a portfolio lender with $10B in real estate exposure, this prevents $50M in annual over-lending losses.
Approaches compared
4 ways to solve this problem
1. Comparable Sales (Comps)
Value a property based on recent sales of similar properties within a radius. The foundation of traditional appraisals and the simplest AVM approach.
Best for
Homogeneous neighborhoods with frequent transactions and similar housing stock.
Watch out for
Breaks down in heterogeneous areas, thin markets (few recent sales), or rapidly changing neighborhoods. A comp from 6 months ago in a neighborhood with new transit access is already stale. Also cannot capture the non-linear value effects of specific amenities (proximity to a top-rated school vs. an average one).
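As a minimal sketch (all numbers illustrative), a radius-based comps AVM reduces to filtering recent nearby sales and scaling the median price per square foot to the subject property. Note how it simply errors out in a thin market:

```python
from statistics import median

# Hypothetical sketch of a radius-based comps AVM. Data is illustrative.
def comps_estimate(subject_sqft, comps, max_distance_mi=1.0, max_age_days=180):
    """Median $/sqft of qualifying comps, scaled to the subject's size."""
    usable = [
        c for c in comps
        if c["distance_mi"] <= max_distance_mi and c["age_days"] <= max_age_days
    ]
    if not usable:
        raise ValueError("no qualifying comps (thin market)")
    price_per_sqft = median(c["sale_price"] / c["sqft"] for c in usable)
    return price_per_sqft * subject_sqft

comps = [
    {"sale_price": 485_000, "sqft": 2_400, "distance_mi": 0.3, "age_days": 90},
    {"sale_price": 510_000, "sqft": 2_500, "distance_mi": 0.6, "age_days": 45},
    {"sale_price": 620_000, "sqft": 3_200, "distance_mi": 2.5, "age_days": 30},  # too far: excluded
]
print(round(comps_estimate(2_400, comps)))  # -> 487300
```

Everything beyond the radius and recency filters is invisible to this estimator, which is exactly the failure mode described above.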
2. Hedonic Regression
Build a regression model estimating value from property attributes (sqft, bedrooms, year built) and location factors (zip code, school district). The standard AVM approach used by Zillow, Redfin, and lenders.
Best for
Large markets with sufficient transaction data to estimate coefficients for property and location features.
Watch out for
Treats location as a categorical variable (zip code) rather than a continuous spatial relationship. Cannot represent that a property 0.2 miles from a metro station is worth 15% more than one 0.8 miles away, while one 1.5 miles away gets no premium at all. These non-linear spatial relationships require graph-level representation.
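The non-linearity described above can be made concrete. The step function below (thresholds and premiums are illustrative, loosely based on the figures on this page) is the kind of spatial effect a single linear distance coefficient cannot reproduce:

```python
# Illustrative non-linear transit premium: a hedonic model's single
# "distance" coefficient would have to fit a straight line through this.
def transit_premium(distance_mi: float) -> float:
    """Fractional value premium as a function of distance to a metro station."""
    if distance_mi <= 0.25:
        return 0.15   # walkable: full premium
    if distance_mi <= 0.8:
        return 0.08   # short walk: partial premium
    if distance_mi <= 1.0:
        return 0.03   # marginal
    return 0.0        # beyond ~1 mile: no premium at all

base_value = 500_000
for d in (0.2, 0.8, 1.5):
    print(d, round(base_value * (1 + transit_premium(d))))
```

The premium collapses in steps rather than declining linearly, so the regression either overvalues distant properties or undervalues close ones.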
3. Gradient-Boosted Trees (XGBoost AVMs)
Train XGBoost or similar models on property features, transaction history, and engineered location features. Captures non-linear relationships better than hedonic regression.
Best for
Markets with rich feature data and enough transaction volume to train robust models. Current industry best practice for production AVMs.
Watch out for
Location features must be manually engineered (distance to nearest school, transit, retail). Cannot capture how amenity clusters interact (a coffee shop next to a bookstore next to a park creates more value than each amenity independently). Also misses the propagation of market momentum through neighborhood networks.
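A sketch of that manual feature engineering, assuming only latitude/longitude is available for properties and amenities (names and coordinates are illustrative):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3956 * 2 * asin(sqrt(a))

def distance_features(prop, amenities):
    """One engineered 'distance to nearest X' feature per amenity type."""
    feats = {}
    for a in amenities:
        d = haversine_mi(prop["lat"], prop["lon"], a["lat"], a["lon"])
        key = f"dist_nearest_{a['type'].lower()}_mi"
        feats[key] = min(d, feats.get(key, float("inf")))
    return feats

prop = {"lat": 38.90, "lon": -77.00}
amenities = [
    {"type": "School",  "lat": 38.905, "lon": -77.00},
    {"type": "Transit", "lat": 38.91,  "lon": -77.01},
]
print(distance_features(prop, amenities))
```

Each feature enters the model independently, which is the limitation above: the tree ensemble never sees that the school, transit stop, and retail cluster sit next to each other.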
4. Graph Neural Networks (Kumo's Approach)
Connect properties, transactions, neighborhoods, amenities, and market data into a real estate graph. GNNs learn value dynamics from the full neighborhood network, including amenity interactions and market momentum propagation.
Best for
Complex markets with diverse neighborhoods, changing amenity landscapes, and non-linear spatial value effects.
Watch out for
Requires structured data linking properties to amenities, school districts, and transit. The graph approach adds the most value in markets with rapid neighborhood evolution. In stable, homogeneous markets, traditional AVMs perform comparably.
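To make the graph idea concrete, here is a toy version of the real estate graph as an adjacency list, with one hand-rolled neighbor-aggregation step. A real GNN learns the aggregation weights from transaction data rather than averaging, so treat this purely as an illustration (node IDs follow the sample tables on this page; signal values are made up):

```python
# Toy neighborhood graph: a property linked to its amenities and neighborhood.
edges = {
    "PROP001": ["AMN01", "AMN02", "AMN03", "NBH01"],
    "AMN01":   ["PROP001"],   # Oak Park Elementary
    "AMN02":   ["PROP001"],   # Metro Station
    "AMN03":   ["PROP001"],   # Shopping Center
    "NBH01":   ["PROP001"],   # Oak Park neighborhood
}

# A scalar "desirability" signal per node (illustrative numbers).
signal = {"PROP001": 0.0, "AMN01": 0.9, "AMN02": 0.7, "AMN03": 0.6, "NBH01": 0.8}

def aggregate(node):
    """Mean of neighbor signals: one message-passing step."""
    nbrs = edges[node]
    return sum(signal[n] for n in nbrs) / len(nbrs)

print(aggregate("PROP001"))  # pools school, transit, retail, and market context
```

Stacking several such steps is what lets value effects propagate through the network (amenity to property to neighborhood) instead of decaying along a fixed radius.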
Key metric: Graph-based AVMs reduce median valuation error from 5-8% to 3-4% by capturing neighborhood network effects. For a $10B portfolio, a 2-percentage-point error reduction prevents $50M in annual over-lending losses.
Why relational data changes the answer
Property value is not an intrinsic attribute of the building. It is a function of the building's position in a network of relationships: 0.4 miles from Oak Park Elementary (rated 8.2), 0.8 miles from the metro station, 0.3 miles from the shopping center, in a neighborhood with 2.1 months of inventory and +4.2% year-over-year price growth. Change any of these relationships and the value changes. Add a new transit stop within 0.5 miles, and values jump 10-15%. Lose a school rating point, and values drop 5-8%. These effects propagate through the neighborhood graph, not through a radius.
Traditional AVMs capture some of this through engineered features (distance to school, distance to transit), but cannot represent the interaction effects. A new restaurant cluster does not just add value to the nearest properties. It shifts foot traffic patterns, increases demand for the neighborhood, and raises rents for commercial spaces that in turn affect residential desirability. Graph-based models represent these cascading effects. SAP's SALT benchmark shows 91% accuracy for graph models vs 63% for gradient-boosted trees on relational tasks, and RelBench shows a similar gap (76.71 vs 62.44). In property valuation, reducing median error from 5-8% to 3-4% means tighter confidence intervals on every valuation, which directly prevents over-lending and missed opportunities.
Valuing a property by looking only at the building and recent comps is like valuing a stock by looking only at the company's financials without considering its industry, competitors, or macro environment. The building is the same, but its value changes based on the neighborhood network around it. Just as a company's stock price depends on its position in the market ecosystem, a property's value depends on its position in the neighborhood graph.
How KumoRFM solves this
Graph-powered intelligence for real estate
Kumo connects properties, transactions, neighborhoods, amenities, and market data into a real estate graph. The GNN learns how value propagates through the neighborhood network: how a new restaurant cluster affects nearby residential values, how school rating changes ripple through associated properties, and how transit access creates non-linear value premiums. PQL predicts current market value per property with built-in confidence intervals.
From data to predictions
See the full pipeline in action
Connect your tables, write a PQL query, and get predictions with built-in explainability — all in minutes, not months.
Your data
The relational tables Kumo learns from
PROPERTIES
| property_id | type | sqft | bedrooms | year_built |
|---|---|---|---|---|
| PROP001 | Single Family | 2,400 | 4 | 2005 |
| PROP002 | Condo | 1,100 | 2 | 2018 |
| PROP003 | Single Family | 3,200 | 5 | 1995 |
TRANSACTIONS
| txn_id | property_id | sale_price | date | days_on_market |
|---|---|---|---|---|
| TXN201 | PROP001 | $485,000 | 2023-06-15 | 22 |
| TXN202 | PROP002 | $320,000 | 2024-01-10 | 45 |
| TXN203 | PROP003 | $620,000 | 2022-09-20 | 18 |
NEIGHBORHOODS
| neighborhood_id | name | median_income | school_rating | crime_index |
|---|---|---|---|---|
| NBH01 | Oak Park | $95,000 | 8.2 | Low |
| NBH02 | Downtown Lofts | $78,000 | 6.5 | Medium |
| NBH03 | Hillcrest | $120,000 | 9.1 | Low |
AMENITIES
| amenity_id | type | name | distance_to_prop001 |
|---|---|---|---|
| AMN01 | School | Oak Park Elementary | 0.4 mi |
| AMN02 | Transit | Metro Station | 0.8 mi |
| AMN03 | Retail | Shopping Center | 0.3 mi |
MARKET_DATA
| neighborhood_id | month | median_price_sqft | inventory_months | yoy_change |
|---|---|---|---|---|
| NBH01 | 2025-02 | $245 | 2.1 | +4.2% |
| NBH02 | 2025-02 | $380 | 3.8 | -1.5% |
| NBH03 | 2025-02 | $285 | 1.5 | +6.8% |
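These tables are ordinary relational data. A quick sketch using Python's built-in sqlite3 (with an illustrative subset of the rows above) shows the kind of cross-table join, here price per square foot from PROPERTIES and TRANSACTIONS, that the graph representation generalizes:

```python
import sqlite3

# Load a subset of the sample tables into an in-memory database and join them.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE properties (property_id TEXT PRIMARY KEY, type TEXT, sqft INT);
CREATE TABLE transactions (txn_id TEXT, property_id TEXT, sale_price INT, date TEXT);
INSERT INTO properties VALUES
  ('PROP001', 'Single Family', 2400),
  ('PROP002', 'Condo', 1100);
INSERT INTO transactions VALUES
  ('TXN201', 'PROP001', 485000, '2023-06-15'),
  ('TXN202', 'PROP002', 320000, '2024-01-10');
""")

rows = con.execute("""
    SELECT p.property_id, p.sqft, t.sale_price,
           t.sale_price * 1.0 / p.sqft AS price_per_sqft
    FROM properties p JOIN transactions t USING (property_id)
    ORDER BY p.property_id
""").fetchall()
for r in rows:
    print(r)
```

Kumo consumes exactly this kind of schema: the foreign keys that drive the join become the edges of the real estate graph.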
Write your PQL query
Describe what to predict in 2–3 lines — Kumo handles the rest
PREDICT AVG(TRANSACTIONS.sale_price, 0, 90, days) FOR EACH PROPERTIES.property_id
Prediction output
Every entity gets a score, updated continuously
| PROPERTY_ID | TYPE | LAST_SALE | ESTIMATED_VALUE | CONFIDENCE |
|---|---|---|---|---|
| PROP001 | Single Family | $485,000 | $528,000 | +/- 2.8% |
| PROP002 | Condo | $320,000 | $308,000 | +/- 4.1% |
| PROP003 | Single Family | $620,000 | $695,000 | +/- 2.2% |
Understand why
Every prediction includes feature attributions — no black boxes
Property PROP003 -- 5BR Single Family in Hillcrest
Predicted value: $695,000 (+12.1% since last sale)
Top contributing features
Neighborhood YoY price appreciation
+6.8%
28% attribution
School district rating improvement
9.1 (was 8.7)
24% attribution
Low inventory in neighborhood
1.5 months
20% attribution
Comparable sales in last 90 days
$280/sqft avg
17% attribution
New transit access within 1 mile
Metro opened 2024
11% attribution
Feature attributions are computed automatically for every prediction. No separate tooling required. Learn more about Kumo explainability
PQL Documentation
Learn the Predictive Query Language — SQL-like syntax for defining any prediction task in 2–3 lines.
Python SDK
Integrate Kumo predictions into your pipelines. Train, evaluate, and deploy models programmatically.
Explainability Docs
Understand feature attributions, model evaluation metrics, and how to build trust with stakeholders.
Frequently asked questions
Common questions about property valuation
How accurate are AI-based property valuations?
The best graph-based AVMs achieve 3-4% median absolute percentage error (MdAPE) in data-rich markets with frequent transactions. This compares to 5-8% for traditional comp-based AVMs and 4-6% for XGBoost AVMs. Accuracy varies by market: dense urban areas with high transaction volumes see 2-3% MdAPE, while rural areas with thin transaction data may be 8-12%. The key is how much relational context (amenities, transit, school data) is available.
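MdAPE itself is straightforward to compute; a minimal sketch with illustrative predictions and realized sale prices:

```python
from statistics import median

# Median absolute percentage error (MdAPE), the AVM accuracy metric quoted above.
def mdape(predicted, actual):
    return median(abs(p - a) / a for p, a in zip(predicted, actual))

predicted = [528_000, 308_000, 695_000]
actual    = [540_000, 300_000, 670_000]
print(f"{mdape(predicted, actual):.1%}")  # -> 2.7%
```

The median (rather than the mean) keeps a handful of badly mis-valued outliers from dominating the headline number, which is why AVM vendors report MdAPE.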
Can AI property valuation replace human appraisals?
For portfolio monitoring, refinancing, and initial screening, yes. For purchase transactions and regulatory compliance, not yet in most jurisdictions. GSE guidelines still require human appraisals for many loan types. The practical path is: AI for initial valuation and screening (fast, cheap, scalable), human appraisals for high-stakes decisions where regulatory or contractual requirements demand them. Many lenders use AI valuations for 60-70% of their portfolio monitoring and reserve human appraisals for the remainder.
How does AI valuation handle properties with no recent comparable sales?
This is where graph-based models shine. In thin markets with few comps, traditional AVMs fail because they cannot find similar recent sales. Graph-based models value the property based on its neighborhood network context: amenity proximity, school district quality, transit access, and market momentum. Even without a direct comp, the model infers value from the property's position in the graph. Median error in thin markets is typically 5-7% for graph-based models vs 10-15% for comp-based methods.
What data sources improve property valuation accuracy?
Beyond MLS transaction data, the highest-value additions are: school quality data (explains 15-20% of residential value variance), transit proximity and ridership data, business license data (new restaurants, retail), permit data (renovations, new construction), and demographic trends (population growth, income shifts). Most of this data is publicly available but fragmented across dozens of sources. The data integration challenge is often harder than the modeling challenge.
How quickly does AI valuation adapt to market changes?
Graph-based models update valuations as new data arrives: transactions, market indicators, amenity changes. In a rapidly shifting market (interest rate changes, new transit opening), the model adapts within 2-4 weeks of sufficient transaction data reflecting the new conditions. Traditional comp-based methods need 3-6 months of post-change transaction data. This faster adaptation is critical for lenders managing portfolios in volatile markets.
Bottom line: A portfolio lender with $10B in real estate exposure prevents $50M in annual losses by reducing valuation error by 2 percentage points. Kumo's real estate graph captures amenity impacts, school district effects, and market momentum that radius-based comp models miss.
Related use cases
Explore more real estate use cases
Topics covered
One Platform. One Model. Infinite Predictions.
KumoRFM
Relational Foundation Model
Turn structured relational data into predictions in seconds. KumoRFM delivers zero-shot predictions that rival months of traditional data science. No training, feature engineering, or infrastructure required. Just connect your data and start predicting.
For critical use cases, fine-tune KumoRFM on your data using the Kumo platform and Research Agent for 30%+ higher accuracy than traditional models.