Emma Clinton's Portfolio

Collection of Open Source GIScience Projects



Managing Error and Uncertainty

Summarize the analytical techniques applied and how the results of those techniques were communicated in text, numbers, tables, or data visualizations

Wang et al. (2016) used wildfire-related Twitter data to test how well social media data can “reveal situational awareness.” As wildfire managers continue to incorporate new methods of wildfire detection into their protocols, human-centric data on human activity and on information sharing and use may be a useful addition. The study analyzes spatial and temporal patterns of wildfire-related Twitter activity, attempts to characterize wildfire based on tweet content, and parses out the role of opinion leaders in these crises (Wang et al., 2016).

Using R, the authors mined the tweet text for fire-related and spatial keywords and then “cleaned” the results. Kernel density estimation (KDE) was used to detect “hotspots” in the spatial tweet data, and dual KDE was performed to account for the influence of population. Term frequencies were compared using k-means clustering to identify terms that frequently appeared within the same document. A social network analysis of retweet activity was also carried out in R, but the authors do not clearly explain how it was done.
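To make the dual KDE step concrete, here is a minimal sketch in Python (not the authors' R code, which is not available). It assumes that dual KDE means dividing a tweet density surface by a population density surface, which is one common formulation; the paper does not confirm that a ratio was used. The point coordinates, grid, bandwidth, and the `floor` guard are all hypothetical values chosen for illustration.

```python
import math


def kde_surface(points, grid_x, grid_y, bandwidth):
    """Gaussian kernel density at every grid node (one row per grid_y value)."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    surface = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2.0 * bandwidth ** 2))
                for px, py in points
            )
            row.append(norm * total)
        surface.append(row)
    return surface


def dual_kde(tweets, population, grid_x, grid_y, bandwidth, floor=1e-9):
    """Ratio of tweet density to population density (assumed formulation)."""
    tweet_surface = kde_surface(tweets, grid_x, grid_y, bandwidth)
    pop_surface = kde_surface(population, grid_x, grid_y, bandwidth)
    # The floor avoids division by zero where population density is negligible.
    return [
        [t / max(p, floor) for t, p in zip(tweet_row, pop_row)]
        for tweet_row, pop_row in zip(tweet_surface, pop_surface)
    ]


# Hypothetical data: tweets cluster near (2, 2); population clusters near (8, 8).
tweets = [(2.0, 2.0), (2.5, 2.0), (2.0, 2.5), (8.0, 8.0)]
population = [(2.0, 2.0), (8.0, 8.0), (8.0, 8.5), (8.5, 8.0), (7.5, 8.0), (8.0, 7.5)]
grid = [float(i) for i in range(11)]  # 11 x 11 grid of nodes from 0 to 10

ratio = dual_kde(tweets, population, grid, grid, bandwidth=1.0)

# ratio[row][col] is indexed by y first, then x.
print("ratio at (2, 2):", round(ratio[2][2], 3))
print("ratio at (8, 8):", round(ratio[8][8], 3))
```

In this toy setup the node at (2, 2) scores higher than the node at (8, 8) even though both have tweets nearby, because the tweets at (2, 2) are numerous relative to the local population. The choice of bandwidth and of how the two surfaces are combined drives the result, which is why those details matter for reproducing the maps.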

The temporal evolution and spatial characteristics of wildfire tweets were compared with the timeline and impact areas of the wildfire event itself. The results of the text mining analysis and the dual KDE are displayed as bar graphs and a heat map, with the fire’s temporal and spatial information conveyed in a table. The results of the term frequency analysis are likewise displayed in a graph and a table. The “hub” and “retweeter” results of the social analysis are represented using line graphs and a sort of hub-and-spoke model (which is pretty effective).

Consider whether you consider this research paper to be reproducible and whether you consider this paper to be replicable. Refer to the National Academies of Sciences, Engineering and Medicine definitions of reproducibility and replicability and our prior discussions about GIS as a Science.

Because most people can get a Twitter Developer account, the keyword filtering step of the analysis seems fairly reproducible, unless certain information has since been removed from the site. The methods are moderately well documented. The description of the kernel density estimation (KDE) could be more explicit about what is meant by “intensity level” in the creation of the raster map, as this seems to be an integral part of the methods. The temporal aspect is also unclear: I assume they made a single raster map of population and a single raster map of “intensity,” but some clarity on this would be helpful. For the text mining step of converting a word to its “base form,” a table of the original words and the forms they were converted to would be useful for reproducibility. The authors tend to explain their methods by citing the studies they used to design them, which makes reproducing or replicating their work that much more difficult and raises questions about access to the works referenced in this paper. I’m also assuming that this was all done in R, but there is little documentation of the k-means clustering methodology and software. For fires where coordinate information is not available, it is unclear how the fires’ locations were “inferred.” The social network analysis R package (“igraph”) is referenced, but the description of how the figures for this part of the analysis were generated is sparse. The data breaks used in the dual KDE analysis would also be useful for replication, and there is little information about the clusters that the authors chose to display and how specifically they were generated or coded. Overall, this study seems to be a good framework for some form of replication (although it wouldn’t be very exact), but the holes in the methods section make the potential for reproduction fairly low, especially since the R code is not provided, which makes it impossible to check the original study.
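As an illustration of the “base form” table suggested above, the following Python sketch records which raw tokens collapse to which base form. The suffix rules are a hypothetical toy normalizer, not the stemmer or lemmatizer the authors used (the paper does not name one), and the token list is made up.

```python
from collections import defaultdict

# Hypothetical suffix rules for illustration only; a real study would name
# the stemmer or lemmatizer it relied on.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def to_base_form(word):
    """Lowercase a word and strip the first matching suffix (toy rule set)."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require at least three remaining characters so short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def build_conversion_table(tokens):
    """Map each base form to the sorted list of raw tokens that produced it."""
    table = defaultdict(set)
    for token in tokens:
        table[to_base_form(token)].add(token.lower())
    return {base: sorted(variants) for base, variants in sorted(table.items())}


# Made-up tokens standing in for words mined from tweets.
tokens = ["fires", "Fire", "burning", "burned", "evacuated", "evacuating", "homes"]
conversion_table = build_conversion_table(tokens)

for base, variants in conversion_table.items():
    print(f"{base:10s} <- {', '.join(variants)}")
```

Publishing a table like this alongside the paper would let a reader see exactly which tweet words were merged before term frequencies and clusters were computed.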

Readings:

Wang, Z., X. Ye, and M. H. Tsou. 2016. Spatial, temporal, and content analysis of Twitter for wildfire hazards. Natural Hazards 83 (1):523–540. https://github.com/GIS4DEV/literature/blob/master/Spatial%20%2C%20temporal%20%2C%20and%20content%20analysis%20of%20Twitter.pdf
