What I Discovered About Opportunity Zones From Analyzing Half a Million Data Points

There has been a lot of buzz about Opportunity Zones recently, and understandably so: they are the newest federal effort to encourage long-term investment in low-income urban and rural census tracts. Once designated as a qualified Opportunity Zone, a tract can receive investment through Opportunity Funds, which are created specifically to invest in these areas.

However, despite the existence of 8,700 Opportunity Zones spanning 12% of all census tracts in the United States, there has been surprisingly little analysis to identify ‘ideal’ areas to invest in. Many articles published to date consist of top 10 lists recommending the major US metropolitan areas anyone could probably guess, which seems a disservice to the large number of areas designated as qualified Opportunity Zones.

My analysis sought to develop a ranking system for all Opportunity Zones based on 5 economic indicators, and to see if there were areas of above-average growth not commonly discussed. I believe the results show that many Opportunity Zones across the US rank highly on all 5 indicators, and that many of them are not located on the coasts.

In aggregate, I found that the Northeast and Southern regions of the US had the best-performing areas overall, while pockets in the Midwest and West performed significantly better than other similarly grouped areas. I also found that Unemployment and Median Value of Owner-Occupied Homes were the biggest differentiators in the dataset, with ranking deviations almost double those of the other 3 indicators examined.

Method to the Madness

From the start, I wanted to take a more analytical approach to examining Opportunity Zones and determine whether there were highly desirable investment areas that had flown under the radar. To begin, I pulled 5-year (2011–2016) estimates for 5 datasets from the US Census Bureau:

Total Population
% of Unemployment for the population 16 and over
Median Value of Owner-Occupied Homes
Median Contract Rent
Median Household Income (HHI) in the past 12 months

For this analysis the datasets were unweighted, meaning that all 5 rankings per census tract matter equally in the final ‘rank’ for each Opportunity Zone. Weighting the attributes, or changing which ones are included, may result in profoundly different outcomes. Based on the attributes examined, there are indeed many Opportunity Zones showing favorable trends, and they can be found across the entire United States.

The Results

After pulling the data I assigned a rank to every Opportunity Zone per attribute, 8,700 being the ‘best’, 1 being the ‘worst’. Once each Opportunity Zone had ranks for all 5 attributes I averaged all 5 ranks into a ‘Total Average Rank’ for each Opportunity Zone:

Per-attribute ranks and ‘Total Average Rank’ by Opportunity Zone
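
The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the tract names and values are made up, and ranking unemployment in reverse (so that a lower rate earns a better rank) is an assumption about how ‘best’ was defined for that attribute.

```python
from statistics import mean

# Made-up values for three hypothetical tracts, for illustration only.
tracts = {
    "tract_a": {"population": 0.5, "unemployment": 6.0, "home_value": 1.2, "rent": 1.0, "income": 0.8},
    "tract_b": {"population": 0.1, "unemployment": 12.0, "home_value": -0.5, "rent": 0.4, "income": 0.2},
    "tract_c": {"population": 0.3, "unemployment": 9.0, "home_value": 0.3, "rent": 1.5, "income": 0.5},
}

# Assumption: lower unemployment is "better", so that attribute is ranked in reverse.
LOWER_IS_BETTER = {"unemployment"}


def rank_attribute(values, lower_is_better=False):
    """Return {tract_id: rank}; rank 1 is the worst, rank N is the best."""
    ordered = sorted(values, key=values.get, reverse=lower_is_better)
    return {tract_id: position for position, tract_id in enumerate(ordered, start=1)}


def total_average_rank(tracts):
    """Rank every attribute separately, then average the ranks with equal weight."""
    attributes = list(next(iter(tracts.values())))
    ranks = {tract_id: [] for tract_id in tracts}
    for attribute in attributes:
        values = {tract_id: row[attribute] for tract_id, row in tracts.items()}
        attribute_ranks = rank_attribute(values, attribute in LOWER_IS_BETTER)
        for tract_id, rank in attribute_ranks.items():
            ranks[tract_id].append(rank)
    return {tract_id: mean(tract_ranks) for tract_id, tract_ranks in ranks.items()}


total_ranks = total_average_rank(tracts)
for tract_id, score in total_ranks.items():
    print(tract_id, score)
```

With 8,700 tracts instead of 3, the same logic yields ranks from 1 to 8,700 per attribute. Swapping the plain mean for a weighted mean is where attribute weighting would enter.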

After doing this for all the Opportunity Zones, I wanted to see how it would look on a map. Since census tracts are really small geographically, I aggregated the results up to US counties, which reduced the dataset to a seemingly more reasonable 2,000 or so points. The result:

Map of ‘Total Average Rank’ aggregated to US counties
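
For reference, rolling tracts up to counties can be sketched as below. It relies on the fact that the first 5 digits of an 11-digit census tract GEOID are the state and county FIPS codes. Using a simple mean as the county value is an assumption, and the tract GEOIDs and ranks are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def average_rank_by_county(tract_ranks):
    """Group tract-level ranks by county (first 5 GEOID digits) and average them."""
    by_county = defaultdict(list)
    for geoid, rank in tract_ranks.items():
        by_county[geoid[:5]].append(rank)
    return {county: mean(ranks) for county, ranks in by_county.items()}


# Illustrative GEOIDs and made-up 'Total Average Rank' values.
tract_ranks = {
    "04013010101": 5200.0,
    "04013010102": 4800.0,
    "36061000100": 7000.0,
}
county_ranks = average_rank_by_county(tract_ranks)
print(county_ranks)
```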

Looking at this visualization we can see... well, nothing. There is too much going on to yield meaningful insights from this map. It also made me question the lack of regionality taken into account: the ‘best’ census tracts are likely influenced by economic trends in neighboring regions over the 5-year timeframe.

Based on this I created code to automatically group together census tracts in similar areas. K-means clustering is a great method to achieve just this and after reading Carl Anderson’s post on weighted K-means analysis, I adapted it to my use case. My goal was to use each Opportunity Zone’s ‘Total Average Rank’ in order to influence where the center of each cluster would be drawn on the map.

To explain using Carl’s example, if there are 4 points on a map and one has a much higher ‘Total Average Rank’ than the others, the new center point will lean toward the highest-ranked point:

Unweighted vs. Weighted
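
The same idea in numbers: a weighted center is the average of the point coordinates, with each point counted in proportion to its weight. The sketch below uses 4 made-up points on a unit square and made-up weights standing in for ‘Total Average Rank’.

```python
def weighted_center(points, weights):
    """Return the weighted mean (x, y) of a list of 2D points."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

# Equal weights: the center sits in the middle of the square.
unweighted = weighted_center(points, [1, 1, 1, 1])

# The last point carries 5x the weight, so the center is pulled toward it.
weighted = weighted_center(points, [1, 1, 1, 5])

print(unweighted, weighted)
```

Here the unweighted center is (0.5, 0.5), while the weighted center moves to (0.75, 0.75), closer to the heavily weighted corner.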

The last thing I needed to do was figure out how many ‘clusters’ I wanted to create. Determining the ‘correct’ number of clusters is still more art than science, relying on the person’s knowledge of the problem and qualitative insights to determine an ideal number. I settled on 25, which provided the best tradeoff between visual digestibility and evenly segmenting the US into areas large enough to generate macro-level regional insights. Here’s the final output overlaid with a Voronoi visualization to better ‘show’ how the regions were divided up:

This map illustrates 2 things:

How the 8,700 Opportunity Zones were divided up into 25 regions
The center points in each region weighted towards concentrations of highly ranked Opportunity Zones
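
Both properties follow from how weighted K-means works, which can be sketched in plain Python. This is a simplified illustration and not the adapted code used for the maps: it treats coordinates as flat (x, y) pairs, starts from the first k points rather than a random or k-means++ initialization, and uses made-up points and weights. Assigning each point to its nearest center is the same rule that defines Voronoi regions, which is why the overlay lines up with the clusters.

```python
import math


def weighted_kmeans(points, weights, k, iterations=50):
    """Lloyd's algorithm with weighted centroid updates.

    points: list of (x, y) tuples; weights: positive numbers, one per point.
    Returns (centers, labels).
    """
    # Assumption: a simple deterministic start from the first k points.
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (a Voronoi partition).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the weighted mean of its members.
        for c in range(k):
            members = [
                (p, w) for p, w, label in zip(points, weights, new_labels) if label == c
            ]
            if not members:
                continue  # an empty cluster keeps its previous center
            total = sum(w for _, w in members)
            centers[c] = (
                sum(w * p[0] for p, w in members) / total,
                sum(w * p[1] for p, w in members) / total,
            )
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centers, labels


# Two obvious groups; the last point has 3x the weight of the others.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
weights = [1, 1, 1, 3]
centers, labels = weighted_kmeans(points, weights, k=2)
print(centers, labels)
```

In the second group the center lands at y = 10.75 rather than the midpoint 10.5, because the heavier point pulls it upward. In the analysis, ‘Total Average Rank’ plays the role of the weight and k is 25.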

To provide more context, here are the final breakdowns of how each cluster scored overall, higher ranks being ‘best’:

Overall scores by cluster

There were many interesting insights from the analysis, summarized as follows:

At the highest level, the Northeast and Southern regions (as defined by the US Census) had the best-performing areas overall (4,683 average), with the Midwest and West ranks averaging around 4,216
Despite lower regional averages, the Midwest and West claim the top 3 ranked clusters on the map (10/4/5), with average ranks almost twice as high as the lowest-performing clusters (15/18)
The most significant attribute affecting rankings appears to be the unemployment rate: unemployment rates in Clusters 5 & 10 were 8.56% and 6.65%, respectively, almost half the average for all 25 clusters (13.70%)
Average 5-year population increases were also the highest in Clusters 4 & 5 (0.46% and 0.58%) vs. the average 0.16% growth across all clusters
Cluster 4 also had the highest increase in median value of owner-occupied homes at 1.10%, which is more impressive given that 13 clusters had negative appreciation, the lowest being Cluster 18 (-2.22%)
Median HHI was fastest growing in Cluster 10 at 1.02%, easily topping the next best cluster (Cluster 9 in the South at 0.58%) and almost 4x the average for all clusters (0.26%)
Median contract rent saw the highest increases in Cluster 4 (1.53%), closely followed by Clusters 8/10/9/11 (all between 1.22% and 1.32%)

Overlaying the findings with other datasets, such as major US cities, provides perspective on the degree to which major metropolitan areas influence the ‘center’ of each region. For example, Cluster 6’s center falls almost directly on top of Phoenix despite large surrounding areas eligible for Opportunity Zones.

Opportunity Zone regions (left) vs. clusters with major city populations in orange (right)

There also appear to be several strategies for how states chose to designate Opportunity Zones. For example, the chart below shows the count of Opportunity Zones in each state (Y-axis) along with the ‘Total Average Rank’ of each (X-axis). Half of the Opportunity Zones were located in 8 states (CA/NY/TX/FL/IL/OH/PA/MI), and these Zones tended to be highly concentrated around major cities within these states. Midwestern states tended to designate far fewer Opportunity Zones, although their Zones tended to span much larger areas than their coastal counterparts. Southeastern states tended to designate a large number of Zones in both urban and non-urban areas.

States like NY also had a surprisingly large concentration of highly rated Zones, as seen in the skew in the graph below. Overall, ND and WY contained the highest-rated Zones of any state, while states like IL and NV had the lowest-ranked concentrations.

Count of Opportunity Zones (Y-axis) vs. ‘Total Average Rank’ (X-axis) by state
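
A concentration figure like ‘half of the Zones in 8 states’ comes from a simple cumulative count: sort states by number of Zones, largest first, and keep adding until the running total reaches the target share. A minimal sketch follows, using made-up counts for four hypothetical states rather than the real designations.

```python
def states_covering_share(zone_counts, share=0.5):
    """Return the fewest states (largest first) whose Zones reach `share` of the total."""
    total = sum(zone_counts.values())
    running = 0
    chosen = []
    for state, count in sorted(zone_counts.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(state)
        running += count
        if running >= share * total:
            break
    return chosen


# Made-up counts for hypothetical states "AA" through "DD".
zone_counts = {"AA": 50, "BB": 30, "CC": 15, "DD": 5}

half = states_covering_share(zone_counts)
three_quarters = states_covering_share(zone_counts, share=0.75)
print(half, three_quarters)
```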

... More Data and Maps?!

As a final note, this article was intentionally brief on the technical steps taken to generate the final dataset. Steps such as managing missing data in the datasets (imputation), creating ranks, and aggregating clusters were not discussed in detail but were major components in arriving at the final analysis. If there is enough interest I would be happy to dive into the mechanics in a future post. In the meantime, I have included an interactive ArcGIS map below so you can explore the clusters along with other datasets, as a jumping-off point for further analysis!

ARCGIS MAP WITH VORONOI AND LAYERS