Collection of Open Source GIScience Projects
This week, we considered the question of how to manage error and uncertainty in geographic analyses. I have encountered uncertainty before in my geography classes at Middlebury. For instance, I enjoyed the section in the Longley et al. (2005) reading on confusion matrices, which I recognized from the Remote Sensing and Land Use course I took in the Fall 2020 semester. Both that course and this one have involved comparing results against similar external outcomes: in Remote Sensing, we compared tree cover classifications to those of the Hansen Tree Cover Loss Dataset, and in this course we compared our hospital catchments to those of the Dartmouth Health Atlas. I have also externally validated origin-destination (OD) matrix results against Google Maps outputs and found that the answers generated by my own analyses were slightly off, which got me thinking about how datasets with different lineages can produce unsuspected errors. I considered uncertainty and error again when compiling land cover raster data for my thesis. Because most of my data came from the Vermont Open Geodata Portal (a vetted data provider), and because I truthfully feel unprepared to rigorously evaluate data for errors, I more or less assumed that mistakes in these data would be minimal and did not validate beyond a general visual “validation” of the land cover classifications against aerial imagery in Google Earth Engine.
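To make the confusion matrix idea concrete, here is a minimal sketch of the kind of accuracy assessment we did in Remote Sensing, for a hypothetical two-class (forest / non-forest) map. The class labels and sample values are illustrative assumptions, not data from my actual coursework:

```python
import numpy as np

# Hypothetical class labels: 0 = non-forest, 1 = forest (illustrative only)
reference = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])  # ground-truth samples
predicted = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])  # classifier output

n_classes = 2
# Build the confusion matrix: rows = reference class, columns = predicted class
cm = np.zeros((n_classes, n_classes), dtype=int)
for r, p in zip(reference, predicted):
    cm[r, p] += 1

overall_accuracy = np.trace(cm) / cm.sum()
producers_accuracy = np.diag(cm) / cm.sum(axis=1)  # 1 - omission error
users_accuracy = np.diag(cm) / cm.sum(axis=0)      # 1 - commission error

print(cm)
print(f"Overall accuracy: {overall_accuracy:.2f}")
print(f"Producer's accuracy by class: {producers_accuracy}")
print(f"User's accuracy by class: {users_accuracy}")
```

Reading the diagonal against the row and column totals is what makes this such a useful check: producer's accuracy flags classes the map is missing, while user's accuracy flags classes the map is over-claiming.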
Figure 6.1 in the reading for this week reminded me of conversations we had in my cartography class about the importance of recognizing the power of maps as a representational tool. Maps are subject to distortion based on the conceptions and views of the maker, and that subjectivity must be acknowledged by the map viewer (and, hopefully, by the cartographer as well). These human preconceptions determine how a phenomenon is measured and represented. This is another reason why the external review process is so important and should include not only a reproduction or replication of results but also a consideration of what might be distorting the analysis. Geographic researchers should be open to considering their own implicit biases and should work to minimize their effect on the work at hand.
I appreciated that the reading acknowledged that errors are to be expected in GIS analyses. Supplying caveats in reported results is necessary, but I can also see how this is a fine line to walk, especially given the current academic pressure to publish significant and “confident” results. In addition, when numerous caveats accompany a given finding, it must be difficult to make “advancements” in the field of geography without a great deal of replication (which, of course, links back to reproducibility and replicability as core tenets of good science).
Ultimately, it is necessary to acknowledge uncertainty, especially when publishing work that others might rely on or build upon. Contained within that necessity is the responsibility to have some idea of the scope of error’s impact on output (Longley et al. 2005) and the need to assess the quality of the data. Analyzing the distributions of data (overall and within units of analysis), checking against similar data, and analyzing the impacts of scale on results are all important to include as checks and/or caveats in a final analysis. Learning methods of internal and external validation should be a core aspect of geographic education, especially for those going on to do research. I really appreciated learning about confusion matrices in Remote Sensing; without some form of validating an output, a researcher is placing blind trust in the results of their analysis.
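As a sketch of what “checking against similar data” might look like in practice, here is a minimal example of externally validating travel-time estimates against a reference source. All of the numbers are hypothetical; in a real workflow the reference values might come from something like the Google Maps outputs I described above:

```python
import numpy as np

# Hypothetical travel times (minutes) for the same origin-destination pairs:
# one set from my own network analysis, one from an external reference
# source (e.g., Google Maps). All values are illustrative.
my_times = np.array([12.0, 34.5, 8.2, 21.7, 45.1])
reference_times = np.array([11.0, 36.0, 9.0, 20.5, 48.0])

diff = my_times - reference_times
mae = np.mean(np.abs(diff))                               # mean absolute error
mean_pct = np.mean(np.abs(diff) / reference_times) * 100  # mean % deviation
bias = np.mean(diff)                                      # systematic over/underestimate

print(f"Mean absolute error: {mae:.1f} min")
print(f"Mean percent deviation: {mean_pct:.1f}%")
print(f"Mean bias: {bias:+.1f} min")
```

Even a simple summary like this distinguishes random disagreement (high MAE, bias near zero) from a systematic offset (consistent bias), which points to different sources of error in the two datasets’ lineages.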
Readings:
Longley, P. A., M. F. Goodchild, D. J. Maguire, and D. W. Rhind. 2005. Geographic Information Systems and Science. 2nd ed. Chichester: Wiley. Print.