Urban Mobility Index

Python · TensorFlow/Keras · GIS · Walkscore API · scikit-learn · Walkability

Spring 2022
Exploring Urban Data with Machine Learning
Columbia GSAPP
Prof. Boyeong Hong

Team:
Kirthi Balakrishnan,
Kit Nga Chou,
Lizzie Lee,
Michelle Chen

An ML experiment predicting walkability from bus-stop and intersection densities across 774 neighborhoods in nine cities - and an honest account of where it works (the cities it trained on) and where it doesn't (the ones it hasn't seen).

Approach

Walkability for the neighborhoods Walkscore never rated

Method

Machine Learning
Walkscore API
Street Network Analysis

584 training neighborhoods
190 held-out neighborhoods across 3 cities

Walkscore is a great metric - when it exists. For neighborhoods Walkscore.com hasn't pre-computed, planners have no number to work with. We trained an ML model that predicts a walkability score from street-network shape and transit density, with the aim that any neighborhood could have one.

Does It Travel?

Trained on six cities, tested on three it never saw

Every dot is one of the 190 held-out neighborhoods, plotting its real Walk Score against the prediction of the model as refit from the project's committed data. On the six cities it trained on, the model explains about 40% of the variance; on the three held-out cities, the predictions barely climb with the real score. Fit is not transfer.

≈0.4

R² on the six cities it trained on

≈0

R² on three cities it never saw

190

Neighborhoods held out
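To make the two R² figures concrete: R² compares a model's squared error with that of a baseline that always predicts the mean of the scores under test. The sketch below uses made-up numbers, not the project's data, and the function name `r_squared` is ours. It shows how predictions that track the real score land near 1, while predictions that ignore it land at 0.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only; none of these come from the project.
real_scores = [30.0, 45.0, 60.0, 75.0, 90.0]
tracking = [36.0, 48.0, 60.0, 72.0, 84.0]  # rises with the real score
flat = [60.0, 60.0, 60.0, 60.0, 60.0]      # ignores the real score

r2_tracking = r_squared(real_scores, tracking)  # close to 1
r2_flat = r_squared(real_scores, flat)          # 0: no better than the mean
```

Note that R² on unseen data can also go negative, which means the predictions are worse than simply guessing the mean.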

Data Pipeline

Reverse-engineering Walkscore.com's methodology

Three inputs feed the model: road-network screenshots from Google Maps (classified with Keras), bus stop locations from the Overpass API, and intersection node counts pulled from OpenStreetMap. We trained on six cities - Boulder, Ann Arbor, Chicago, Washington DC, New York, and San Francisco - and held out three more for testing: Madison, Seattle, and Tulsa.
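The split matters because it is by city, not by random row: no neighborhood from a held-out city may leak into training. A minimal sketch of that idea follows. The city names come from the text above; the record layout and the `split_by_city` helper are assumptions for illustration, not the project's code.

```python
TRAIN_CITIES = {
    "Boulder", "Ann Arbor", "Chicago",
    "Washington DC", "New York", "San Francisco",
}
HELD_OUT_CITIES = {"Madison", "Seattle", "Tulsa"}


def split_by_city(rows):
    """Split neighborhood records so no held-out city appears in training."""
    train = [row for row in rows if row["city"] in TRAIN_CITIES]
    held_out = [row for row in rows if row["city"] in HELD_OUT_CITIES]
    return train, held_out


# Hypothetical records; the field names are assumed, not the project's schema.
rows = [
    {"city": "Chicago", "neighborhood": "A"},
    {"city": "Seattle", "neighborhood": "B"},
    {"city": "Boulder", "neighborhood": "C"},
    {"city": "Tulsa", "neighborhood": "D"},
]
train_rows, held_out_rows = split_by_city(rows)
```

A random row-level split would let the model see part of every city, which tends to flatter the score and hides exactly the transfer problem this project reports.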

DENSITY MAPS

Intersection density heatmaps (top) and raw node extractions (bottom) for three training cities: Washington DC, New York City, and San Francisco.

INTERSECTION NODES

OpenStreetMap intersection nodes, extracted and converted to densities per neighborhood.

01 Road-network snapshots. Google Maps screenshots per neighborhood, decoded with a custom polygon reader and classified with Keras.
02 Transit density. Bus stops from OpenStreetMap's Overpass API, counted per square kilometer and per 1,000 residents.
03 Street connectivity. Intersection nodes extracted from OSM street networks, converted to densities the same two ways.
04 Ground truth. Walk Scores scraped for all 584 training neighborhoods; a linear regression maps the densities onto the score.
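Steps 02 and 03 come down to a count divided two ways. The sketch below shows one way that could look. It builds an Overpass QL query for bus-stop nodes (OpenStreetMap tags them `highway=bus_stop`) and turns a raw count into the two densities. The bounding-box query is an assumption, since the project presumably clipped to neighborhood polygons, and the function names and example numbers are hypothetical. No network request is sent.

```python
def bus_stop_query(south, west, north, east):
    """Overpass QL for bus-stop nodes in a bounding box (south, west, north, east)."""
    return (
        "[out:json];"
        f'node["highway"="bus_stop"]({south},{west},{north},{east});'
        "out;"
    )


def densities(count, area_sqkm, population):
    """Express a raw feature count per square kilometer and per 1,000 residents."""
    if area_sqkm <= 0 or population <= 0:
        raise ValueError("area and population must be positive")
    return {
        "per_sqkm": count / area_sqkm,
        "per_1000_residents": count * 1000 / population,
    }


# Hypothetical neighborhood: 48 bus stops, 4 square kilometers, 12,000 residents.
query = bus_stop_query(41.8, -87.7, 41.9, -87.6)
bus_density = densities(count=48, area_sqkm=4.0, population=12000)
```

The same `densities` call would apply to intersection node counts, giving the four predictors the regression uses.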

The Tool

INTERACTIVE TOOL

Enter any US address to get a predicted walkability score with a feature-importance breakdown.

What the Model Learned

And what it couldn't

The regression uses four predictors: bus stops and intersections, each per square kilometer and per 1,000 residents. On the neighborhoods it was fit on, those densities explain roughly 40% of the variation in Walk Score (R-squared 0.38 as originally reported; 0.42 when refit from the project's committed data), with a typical miss of about 17 points on the 0-100 scale. In plain terms: the model gets the broad strokes of a familiar city and misses the fine grain. The held-out scatter above is the harder test - cities the model never trained on - and there the densities alone explain almost none of the variation (R-squared near zero). One detail the committed data does support: bus-stop density carries more predictive signal on its own than intersection density does.
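One way to check a claim like "bus-stop density carries more signal on its own" is to fit a one-predictor line for each feature and compare the results. The sketch below does that in plain Python with made-up values for five neighborhoods. It is a simplification of the four-predictor regression, not the project's code. It reports root-mean-square error as the "typical miss", which is our assumption about the metric, since the page does not name one.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def evaluate(x, y):
    """Return (R-squared, RMSE) of the one-predictor fit on its own data."""
    intercept, slope = fit_line(x, y)
    predicted = [intercept + slope * xi for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Hypothetical values for five neighborhoods; not the project's data.
walk_score = [40.0, 55.0, 62.0, 78.0, 90.0]
bus_stops_per_sqkm = [2.0, 4.0, 5.0, 8.0, 10.0]
intersections_per_sqkm = [60.0, 40.0, 90.0, 50.0, 80.0]

bus_r2, bus_rmse = evaluate(bus_stops_per_sqkm, walk_score)
int_r2, int_rmse = evaluate(intersections_per_sqkm, walk_score)
```

In this toy data the bus-stop fit has the higher R-squared and the smaller error by construction. On real data, the same comparison run on the held-out cities would be the more telling one.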

Separately, we ran an exploratory clustering pass on the same features - K-Means, Agglomerative, and Gaussian Mixture - to see whether neighborhoods group into recognizable urban types. Gaussian Mixture drew the boundaries that best matched what we see on the ground.

K-MEANS

K-Means - clean, fast, but oversimplifies the messier urban cores.

AGGLOMERATIVE

Agglomerative - picks up nested structure but produces some lopsided clusters.

GAUSSIAN MIXTURE

Gaussian Mixture - the winner. Its boundaries match what you see on the ground.
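A comparison like the one above can be set up in a few lines of scikit-learn. This is a minimal sketch on synthetic stand-ins for the four density features, not the project's data or code. The number of groups (three) and the scaling step are assumptions, since the page does not state either.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: three blobs in a four-feature space
# (bus stops and intersections, each per sq km and per 1,000 residents).
centers = np.array(
    [[2.0, 1.0, 40.0, 20.0], [8.0, 4.0, 90.0, 45.0], [15.0, 7.0, 150.0, 70.0]]
)
features = np.vstack(
    [c + rng.normal(scale=[0.5, 0.3, 5.0, 3.0], size=(30, 4)) for c in centers]
)

# Put all features on a comparable scale before measuring distances.
scaled = StandardScaler().fit_transform(features)

n_groups = 3  # assumed; the page does not say how many clusters were used

labels = {
    "kmeans": KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(scaled),
    "agglomerative": AgglomerativeClustering(n_clusters=n_groups).fit_predict(scaled),
    "gaussian_mixture": GaussianMixture(n_components=n_groups, random_state=0).fit_predict(scaled),
}
```

Each entry in `labels` assigns every neighborhood to a group. Mapping those groups back onto the city is what lets you judge by eye which method draws the most plausible boundaries.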

DATASET

Prepared dataset: 584 neighborhoods with Walk Score, area, population, and bus-stop and intersection densities per square kilometer and per 1,000 residents.

Limitations & Next Steps

The training cities skew dense and coastal - and the held-out test shows what that costs: a model that explains about 40% of the variation in Walk Score at home explains almost none of it in Madison, Seattle, or Tulsa. Bus stops and intersections are part of what makes a place walkable, not the whole recipe. To get serious, the next version needs more cities, more input variables (block size, street trees, sidewalk width), and a usable web frontend so a planner can paste an address and get a number back.
