Overhead imagery training data quality control: Methods for deep feature label anomaly detection

Spatial analysis of large remotely sensed imagery (RSI) training datasets for within-class variation and between-class separability is key to uncovering problems of data diversity and potential bias, not only when vetting datasets for use but also while the dataset is being created. Project managers of complex imagery annotation campaigns have a largely unmet need for tools that continuously monitor for labeling anomalies that may stem from human bias or error. This presentation outlines a deep-feature change detection approach that uses Geospatial Fréchet Distance (GFD) to automatically measure significant regional changes in image label appearance (i.e., within-class variance). An experimental setup is designed to test GFD's spatial anomaly detection capabilities on the xBD dataset, a multi-class satellite imagery dataset for disaster damage classification with labels created by volunteer imagery annotators. Requirements for follow-on analysis of regional subsets of dataset image labels will be discussed, pointing the way toward more detailed investigation of the causes of labeling errors.
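The abstract does not spell out how GFD is computed, so the following is only a minimal sketch of the general idea, assuming the distance is the standard Fréchet distance between Gaussians fitted to two sets of deep features (the form used by the Fréchet Inception Distance). The function name `frechet_distance`, the region arrays, and the synthetic features are hypothetical stand-ins; in practice the features would come from a pretrained network applied to the labeled image chips of one class in each region.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    Hypothetical helper; not taken from the presentation itself.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative when
    # both covariances are positive semi-definite. Clipping guards
    # against small negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


# Synthetic stand-ins for deep features of one label class in three regions.
rng = np.random.default_rng(0)
reference_region = rng.normal(size=(500, 8))
similar_region = rng.normal(size=(500, 8))
shifted_region = rng.normal(loc=1.5, size=(500, 8))  # simulated appearance shift

d_similar = frechet_distance(reference_region, similar_region)
d_shifted = frechet_distance(reference_region, shifted_region)
print(f"similar region: {d_similar:.3f}, shifted region: {d_shifted:.3f}")
```

A region whose distance to the reference is much larger than that of its neighbors would be a candidate for the follow-on label review the abstract describes. How a "significant" change is thresholded is not stated in the source and would need to be chosen empirically.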