Building Precise Maps with Disser

By Brandon Martin-Anderson 08 Apr 2014

Spatially aggregated statistics are pretty great, but what if you want more precision? Here at Conveyal we built a utility to help with that: aggregate-disser. Let me tell you how it works.

Let’s start with a classic aggregated data set - the block-level population counts from the US Census. Here’s a choropleth map of total population for blocks around lower Manhattan and Brooklyn. The darkest shapes contain about five thousand people.


One simple way to disaggregate population is to scatter the aggregated units randomly within the aggregating geometry. Here’s the above map disaggregated into individual people:
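To make the idea concrete, here is a minimal Python sketch of random placement inside a block. It is not disser's actual code: it assumes a block is a simple polygon without holes, given as a list of (x, y) vertices, and it uses rejection sampling against the bounding box. The function names and the L-shaped block are made up for illustration.

```python
import random

def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) falls inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_points_in_polygon(ring, count, rng=None):
    """Draw `count` points uniformly inside the ring by rejection sampling.

    Assumes the ring encloses a non-zero area; otherwise this never finishes.
    """
    rng = rng or random.Random()
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < count:
        # Pick a candidate in the bounding box, keep it only if it is inside.
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points

# Hypothetical L-shaped block holding 30 people.
block = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
people = random_points_in_polygon(block, 30, random.Random(42))
```

Rejection sampling is wasteful for long, thin shapes, but it keeps the sketch short and gives a uniform distribution over the polygon.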


It’s difficult to see the individual people, so let’s zoom in to the Lower East Side. The snow-angel shape near the center is the middle of the large planned Stuy Town development.


If the goal of disaggregation is to make a reasonable guess at the data in its pre-aggregated form, we’ve done an okay job. There’s an obvious flaw with this map, though. People aren’t evenly distributed over a block - they’re concentrated into residential buildings.

The PLUTO dataset released by the New York City Department of City Planning provides detailed information on every parcel in New York City, including total residential and commercial square footage. By combining the US Census block-level shapefile with the NYC PLUTO we can refine our disaggregation.

For example, say the PLUTO shapefile contains three buildings that overlap with some specific block in the Census shapefile. The PLUTO shapefile indicates that the three buildings contain 2000, 1000, and 0 residential square feet. Let’s say the census block contains 30 people. One building has ⅔ of the total square footage, so we give it ⅔ of the people, or 20 people. The next building has ⅓ of the square footage, so we give it ⅓, or 10 people. The last building gets no people, because it has no residential square footage.
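The arithmetic above can be sketched in a few lines of Python. This is an illustration only: the function name is made up, and the largest-remainder rounding used to keep whole people is an assumption, since this post doesn't say how disser handles fractional shares.

```python
def allocate(total, weights):
    """Split an integer total across weights proportionally, in whole units."""
    weight_sum = sum(weights)
    if weight_sum == 0:
        # No residential area anywhere: nothing to weight by.
        return [0] * len(weights)
    exact = [total * w / weight_sum for w in weights]
    counts = [int(share) for share in exact]
    # Hand out any units lost to truncation, largest fractional part first
    # (an assumed rounding rule, not necessarily what disser does).
    leftover = total - sum(counts)
    by_remainder = sorted(
        range(len(weights)),
        key=lambda i: exact[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts

# The three buildings from the example: 2000, 1000, and 0 residential sq ft.
print(allocate(30, [2000, 1000, 0]))  # prints [20, 10, 0]
```

The rounding step only matters when shares don't divide evenly; for example, 10 people over three equal buildings have to come out as some arrangement of 4, 3, and 3.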

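To do this with the disaggregation tool, we use a command of the form:

$ java -jar aggregate-disser.jar indicator_shapefile indicator_propname diss_shapefile diss_propname output_file

Here the indicator shapefile holds the aggregated values (the census blocks), and the diss shapefile holds the shapes we disaggregate into (the parcels), weighted by diss_propname. An optional switch such as “--discrete” or “--shapefile” goes before the input files.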

For example, to disaggregate census blocks into PLUTO parcels we’d execute:

$ java -jar aggregate-disser.jar --discrete /path/to/data/census.shp POP10 /path/to/data/pluto.shp ResArea output.csv

Here’s this CSV opened with QGIS, centered on Manhattan’s Lower East Side. Note that Stuy Town is now a large homogeneous rectangle, because the PLUTO dataset counts all of Stuy Town as a single parcel, whereas the Census divides it into several blocks.


In the last command we used the switch “--discrete” to direct the disaggregator to output discrete points distributed throughout the aggregation shapes. Alternatively we could have omitted the switch to get a set of aggregation shape centroids, like so:

$ java -jar aggregate-disser.jar ./data/tabblock2010_36_pophu/tabblock2010_36_pophu.shp POP10 ./data/Manhattan/pluto.shp ResArea output.csv


Or we can use the “--shapefile” switch to get shapefile output.

$ java -Xmx1024M -jar aggregate-disser.jar --shapefile ./data/tabblock2010_36_pophu/tabblock2010_36_pophu.shp POP10 ./data/Manhattan/pluto.shp ResArea plutoout.shp


This raises an interesting possibility. New York City provides a shapefile of every building outline in all five boroughs. Each shape doesn’t have much metadata, but we can use it to further refine the census data already refined by the PLUTO parcels.

$ java -jar aggregate-disser.jar --discrete ./data/plutoout.shp mag ./data/Building_Perimeter_Outlines/nycbuildings.shp SHAPE_AREA buildingsout.csv


Finally the buildings in Stuy Town gain some definition! Through this process we now have a reasonably precise estimate of the home location of every person in Manhattan.

Mapping Jobs using LODES data

We can apply a similar process to establish the location of jobs as reported by the Census Bureau through the LODES dataset. By merging LODES CSVs into a block shapefile we can produce a choropleth of jobs. For this map we shift north to Midtown and the south end of Central Park, centered on Columbus Circle. The darkest polygons have about twenty thousand(!) jobs.
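The merge step is a plain join on block ID. Here is a rough Python sketch, under several assumptions: that the LODES workplace CSV has a block ID column named w_geocode and a total-jobs column named C000, that the block shapefile's ID attribute is BLOCKID10, and that the merged attribute is called total. The block IDs and job counts are invented, and blocks are shown as plain dicts, not real shapefile records.

```python
import csv
import io
from collections import defaultdict

def jobs_by_block(csv_file, block_col="w_geocode", jobs_col="C000"):
    """Sum job counts per block ID from a LODES-style CSV (assumed column names)."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row[block_col]] += int(row[jobs_col])
    return dict(totals)

# Invented sample rows standing in for a real LODES file.
sample = io.StringIO(
    "w_geocode,C000\n"
    "360610001001000,120\n"
    "360610001001001,45\n"
    "360610001001000,30\n"
)
totals = jobs_by_block(sample)

# Block records standing in for shapefile features (assumed BLOCKID10 attribute).
blocks = [{"BLOCKID10": "360610001001000"}, {"BLOCKID10": "360610001001002"}]
for block in blocks:
    # Blocks with no matching CSV row get zero jobs.
    block["total"] = totals.get(block["BLOCKID10"], 0)
```

Keeping the IDs as strings avoids losing leading zeros in state codes that start with 0.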


PLUTO parcels have properties corresponding to total commercial, factory, retail, and office space. Using disser, it’s possible to use the sum of several properties as the disaggregation property:

$ java -jar ./build/libs/aggregate-disser.jar --shapefile ./data/lodes_blocks_projected.shp total ./data/Manhattan/projected_mh.shp ComArea+OfficeArea+RetailArea+FactryArea lodes_parcels.shp
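One way to picture what the “+” syntax computes is the following Python sketch. It is a guess at the behavior, not disser's implementation: the function name is hypothetical, the parcel values are made up, and treating a missing property as zero is an assumption.

```python
def weight_from_expression(expression, properties):
    """Sum the properties named in an expression like "ComArea+OfficeArea"."""
    names = [name.strip() for name in expression.split("+")]
    # Missing or empty properties count as zero (an assumption).
    return sum(float(properties.get(name) or 0) for name in names)

# Made-up attribute values for a single parcel.
parcel = {
    "ComArea": 12000,
    "OfficeArea": 8000,
    "RetailArea": 3000,
    "FactryArea": 0,
    "ResArea": 50000,
}
weight = weight_from_expression("ComArea+OfficeArea+RetailArea+FactryArea", parcel)
print(weight)  # prints 23000.0
```

A single property name with no “+” falls out as the one-term case, so the same function would cover the earlier ResArea example.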


From there we repeat the familiar process of disaggregating into individual buildings:


Putting the disaggregated Census and LODES datasets together, we can make a map of every job, and every bed, in Manhattan. In the following map blue dots represent jobs, and red dots represent residential beds.