Data "Completeness"

 

Comparing Completeness

Continuing our investigation of data quality, this week we look at data "completeness". How can we check that the information we are given is not just accurate but actually covers the areas we need? Going out to the field, taking measurements, and compiling the records ourselves would be time-consuming and exorbitantly expensive. Instead, we can compare our dataset against a reference dataset of known higher quality.

For this project, we were tasked with comparing the completeness of the road network maintained by Jackson County, Oregon, with the federal road network. The US Census Bureau's TIGER (Topologically Integrated Geographic Encoding and Referencing) linefiles are a reliable dataset, which we will use as the reference (baseline) in this analysis. The method follows Haklay's 2010 study, "How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets".


         
County road network versus TIGER linefiles


Each road network is clipped to an overlying grid, and the total road length is then calculated within each grid cell. Each cell can then be used to compare "completeness"; the network with more road length recorded in a cell is considered the more complete one there. The difference in length between the two datasets is calculated as a percentage and classified into ranges. Blue-purple cells are those where the locally sourced road network records more length, orange-brown cells are those where the TIGER linefiles record more, and the neutral shade indicates similar lengths in the two datasets.
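To make the per-cell comparison concrete, here is a minimal sketch in Python. It is not the workflow I ran in ArcGIS Pro, just an illustration of the arithmetic. The formula (difference as a percentage of the county length), the 5% "similar" threshold, the function names, and the sample lengths are all assumptions made for this example.

```python
from typing import Optional


def percent_difference(county_km: float, tiger_km: float) -> Optional[float]:
    """Signed difference in road length as a percentage of the county length.

    Positive: the county network is longer in this cell.
    Negative: the TIGER network is longer.
    Returns None when the county has no roads in the cell, because the
    percentage is undefined (this denominator is an assumed convention).
    """
    if county_km == 0:
        return None
    return (county_km - tiger_km) / county_km * 100.0


def classify(pct: Optional[float], tolerance: float = 5.0) -> str:
    """Bin a percent difference into a map class (the tolerance is hypothetical)."""
    if pct is None:
        return "no county roads"
    if pct > tolerance:
        return "county more complete"
    if pct < -tolerance:
        return "TIGER more complete"
    return "similar"


# Made-up totals per grid cell: cell id -> (county km, TIGER km)
cells = {1: (12.0, 10.0), 2: (8.0, 10.0), 3: (10.0, 10.2), 4: (0.0, 3.0)}

results = {
    cell_id: classify(percent_difference(county, tiger))
    for cell_id, (county, tiger) in cells.items()
}

for cell_id, label in results.items():
    print(cell_id, label)
```

In a real run, the per-cell totals would come from intersecting each road layer with the grid and summing segment lengths by cell id. Cells with no county roads need separate handling, since the percentage cannot be computed for them.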


Here is a simplified tabular representation of the two datasets. As presented, 151 grid cells record greater total road length in the TIGER linefiles. Ten percent of the study area shows similar completeness for the two networks. However, there are large areas where the local mapping provides more complete coverage.


Further analysis is needed to explain this discrepancy. Given where the cities are located within the county, my hypothesis is that the TIGER mapping procedures record the length of each side of a divided highway, whereas the county records these roads as a single entity.

Further Reading:

Haklay, M. (2010). How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design, 37(4), 682–703. https://doi.org/10.1068/b35097

Pairwise Intersect Tool from Esri

US Census TIGER/Line Shapefiles

