I. Introduction
The coronavirus disease (COVID-19) pandemic has brought an unparalleled opportunity for spatial analysis. More than ever, people are creating maps to document the space-time diffusion of the COVID-19 pandemic.
Since the 17th century, disease mapping has been considered a vital tool for tracking and combating the spread of disease. With the advent of computerised geographic information systems, the possibilities for analysing, visualising and detecting patterns of disease increased dramatically (Kamel Boulos & Geraghty, 2020). More recently, Web-based tools have revolutionised health geography, further expanding these technical capacities (Kamel Boulos & Geraghty, 2020).
However, despite recent technical advances in Geographic Information Systems (GIS) and spatial statistics, the spatial analysis of epidemiological surveillance data faces a number of challenges, especially when dealing with near-real-time data on an emergent disease. This paper aims to summarize the key challenges for the spatial analysis of the COVID-19 pandemic and to discuss possible solutions, based on the evidence and practices available in the first months of the pandemic (December 2019 to June 2020). Although these challenges are interconnected, for convenience this paper treats them as separate issues.
II. Challenges and perspectives of spatial analysis: the case of COVID-19 pandemic
Challenge nº 1: Protection of geoprivacy
Safeguarding patient privacy while preserving the spatial resolution required for spatial analysis and cluster detection is a major challenge in disease mapping. Patients' location and health data are considered identifiable personal information. They are therefore subject to the General Data Protection Regulation (EU) 2016/679 (GDPR), a regulation that aims to give individuals control over their personal data. Mobile phone geolocation data are increasingly used to track human movement and to assess the efficacy of lockdown measures. These data are subject to the same rules and raise even more data protection questions.
In response, a variety of methods have been proposed to mask patients' geolocation (Chen et al., 2017). The most common geomasking method is spatial aggregation. This is the approach of the Directorate-General of Health (Direção-Geral da Saúde, DGS), which discloses data only at the municipal level. The DGS applies an additional restriction by omitting municipalities with fewer than three cases, which makes spatial analysis harder and introduces a further layer of uncertainty. Disaggregating the data is nearly impossible. The suppressed counts, however, could be addressed using, for instance, imputation methods for masked count data.
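The suppression rule, and one possible way of filling the withheld counts, can be sketched in Python. This is a minimal illustration under stated assumptions. It is neither the DGS procedure nor a published imputation method.

The sketch assumes that a reliable higher-level total is available (the hypothetical `known_total`, e.g. a regional or national count). It spreads the unexplained residual over the suppressed municipalities in proportion to population, capping each at two cases. All names and figures are hypothetical. A model-based (e.g. multiple) imputation would also quantify the added uncertainty.

```python
from typing import Dict, Optional


def suppress_small_counts(counts: Dict[str, int], threshold: int = 3) -> Dict[str, Optional[int]]:
    """Mimic the disclosure rule: counts below `threshold` are withheld (None)."""
    return {area: (c if c >= threshold else None) for area, c in counts.items()}


def impute_suppressed(published: Dict[str, Optional[int]],
                      population: Dict[str, int],
                      known_total: int,
                      threshold: int = 3) -> Dict[str, float]:
    """Hypothetical helper: spread the unexplained residual over suppressed areas
    in proportion to population, never exceeding threshold - 1 per area."""
    cap = threshold - 1
    hidden = [a for a, c in published.items() if c is None]
    residual = known_total - sum(c for c in published.values() if c is not None)
    if residual < 0 or residual > cap * len(hidden):
        raise ValueError("known_total is inconsistent with the suppression rule")

    alloc = {a: 0.0 for a in hidden}
    open_areas = set(hidden)
    remaining = float(residual)
    # Capped proportional allocation: areas whose share would exceed the cap are
    # fixed at the cap, and what is left is re-spread over the remaining areas.
    while remaining > 1e-9 and open_areas:
        pop = sum(population[a] for a in open_areas)
        shares = {a: remaining * population[a] / pop for a in open_areas}
        overflow = {a for a, s in shares.items() if s > cap}
        if not overflow:
            alloc.update(shares)
            remaining = 0.0
        else:
            for a in overflow:
                alloc[a] = float(cap)
                remaining -= cap
            open_areas -= overflow

    return {a: (float(c) if c is not None else alloc[a]) for a, c in published.items()}


true_counts = {"A": 10, "B": 2, "C": 1, "D": 5}
population = {"A": 50_000, "B": 8_000, "C": 2_000, "D": 20_000}
published = suppress_small_counts(true_counts)
print(published)  # {'A': 10, 'B': None, 'C': None, 'D': 5}
print(impute_suppressed(published, population, known_total=18))
# {'A': 10.0, 'B': 2.0, 'C': 1.0, 'D': 5.0}
```

In this toy case the allocation happens to recover the true counts. In general it yields plausible expected values, not the hidden counts themselves.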
Challenge nº 2: Low spatial resolution and high geographical uncertainty
In Portugal, individual-level data on COVID-19 cases have so far been publicly disclosed by sub-region (NUTS 3, level 3 of the Nomenclature of Territorial Units for Statistics). Aggregated data (counts) are available by municipality. Using NUTS 3 sub-regions, or even municipalities, as spatial units of analysis may conceal local outbreaks and important socioeconomic and biophysical variation.
COVID-19 spatial analyses based on aggregated data may therefore be affected by the Modifiable Areal Unit Problem (MAUP) (Openshaw & Taylor, 1979). The MAUP arises when the scale (the size and number of spatial units) or the zoning used to partition the same study area changes the study conclusions, namely the geographical patterns observed and the magnitude of associations. The larger the geographical units, the more likely it is that associations found at the aggregate level will diverge from those at the individual level, leading to the so-called ecological fallacy (Aikins & Ribeiro, 2020). Choosing the ideal spatial resolution for a particular investigation is difficult. Analysing the same data at multiple geographical scales is one way of assessing the potential impact of the MAUP.
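The scale effect of the MAUP can be illustrated with a toy example; all municipalities, regions and counts are hypothetical. In this sketch, one small municipality with a local outbreak is merged into a larger region. At the regional level its rate is diluted. Region R1 also appears worse than R2, even though two of R1's three municipalities have lower rates than either municipality in R2.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical data: municipality -> (region, cases, population)
MUNICIPALITIES: Dict[str, Tuple[str, int, int]] = {
    "M1": ("R1", 300, 20_000),   # small municipality with a local outbreak
    "M2": ("R1", 20, 90_000),
    "M3": ("R1", 15, 90_000),
    "M4": ("R2", 60, 100_000),
    "M5": ("R2", 55, 100_000),
}


def rate_per_100k(cases: int, population: int) -> float:
    return 100_000 * cases / population


def municipal_rates(units: Dict[str, Tuple[str, int, int]]) -> Dict[str, float]:
    return {m: rate_per_100k(c, p) for m, (_, c, p) in units.items()}


def regional_rates(units: Dict[str, Tuple[str, int, int]]) -> Dict[str, float]:
    """Sum cases and population within each region before computing the rate."""
    totals = defaultdict(lambda: [0, 0])
    for region, cases, pop in units.values():
        totals[region][0] += cases
        totals[region][1] += pop
    return {r: rate_per_100k(c, p) for r, (c, p) in totals.items()}


muni = municipal_rates(MUNICIPALITIES)
region = regional_rates(MUNICIPALITIES)
print({m: round(r, 1) for m, r in muni.items()})
# {'M1': 1500.0, 'M2': 22.2, 'M3': 16.7, 'M4': 60.0, 'M5': 55.0}
print({r: round(v, 1) for r, v in region.items()})
# {'R1': 167.5, 'R2': 57.5}
```

Running the same comparison at both levels is a simple version of the multi-scale check described above.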
A second issue is the Uncertain Geographic Context Problem (UGCoP). Case data are reported by the patient's municipality of occurrence, but relying only on this location can introduce substantial uncertainty into research results. People may spend considerable time in other municipalities (e.g. at work or while commuting) and may acquire the disease there (Ribeiro, 2018).
Finally, a common problem in spatial analyses of rare events (e.g. diseases) is the well-known Problem of Small Numbers. Rates calculated for areas with small populations and few cases are statistically unstable, so random fluctuations produce unreliable rates (Pina, Alves, Ribeiro, & Olhero, 2010). Spatial smoothing methods and the calculation of uncertainty intervals are widely used solutions to this problem.
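One way to make the UGCoP concrete is to compare two exposure measures. A residence-only exposure uses the local incidence of the home municipality alone. An activity-weighted exposure weights local incidence by the share of time a person spends in each municipality.

The sketch below uses hypothetical incidence values and time shares, and `activity_weighted_exposure` is an illustrative helper, not an established API. In practice the time shares would have to come from travel surveys or mobility data, with their own completeness limitations.

```python
from typing import Dict

# Hypothetical cumulative incidence per 100,000 population
INCIDENCE: Dict[str, float] = {"A": 50.0, "B": 400.0, "C": 120.0}


def activity_weighted_exposure(time_shares: Dict[str, float],
                               incidence: Dict[str, float]) -> float:
    """Average local incidence over the places a person spends time in."""
    if abs(sum(time_shares.values()) - 1.0) > 1e-6:
        raise ValueError("time shares must sum to 1")
    return sum(share * incidence[m] for m, share in time_shares.items())


# A hypothetical resident of A who works in B
commuter = {"A": 0.60, "B": 0.35, "C": 0.05}
residence_only = INCIDENCE["A"]
weighted = activity_weighted_exposure(commuter, INCIDENCE)
print(residence_only, round(weighted, 1))  # 50.0 176.0
```

For this commuter, the residence-based measure understates exposure more than threefold, which is the kind of contextual uncertainty the UGCoP describes.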
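Both remedies can be sketched in a few lines of Python; the data are hypothetical.

The smoothing function is the global (aspatial) empirical Bayes estimator with method-of-moments priors, often attributed to Marshall (1991). It shrinks each crude rate towards the overall rate, more strongly when the population is small. Spatial variants estimate the prior from each area's neighbours instead, which this sketch does not do.

The interval is an exact (Garwood) 95% Poisson confidence interval for the observed count, found here by bisection. It can be divided by the population to obtain an interval for the rate.

```python
import math
from typing import List, Tuple


def global_empirical_bayes(cases: List[int], population: List[int]) -> List[float]:
    """Shrink crude rates towards the overall rate (global, method-of-moments EB)."""
    k = len(cases)
    raw = [o / n for o, n in zip(cases, population)]
    total_pop = sum(population)
    m = sum(cases) / total_pop                                   # prior mean
    s2 = sum(n * (r - m) ** 2 for r, n in zip(raw, population)) / total_pop
    a = max(s2 - m / (total_pop / k), 0.0)                       # prior variance
    smoothed = []
    for r, n in zip(raw, population):
        w = a / (a + m / n) if a > 0 else 0.0   # small population -> small weight
        smoothed.append(m + w * (r - m))
    return smoothed


def _poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu); adequate for the small counts at issue here."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect_decreasing(f, lo: float, hi: float) -> float:
    """Root of a decreasing function f on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_interval(observed: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Garwood) confidence interval for a Poisson count."""
    hi_bound = observed + 10 * math.sqrt(observed + 1) + 10
    upper = _bisect_decreasing(
        lambda mu: _poisson_cdf(observed, mu) - alpha / 2, 0.0, hi_bound)
    if observed == 0:
        return 0.0, upper
    lower = _bisect_decreasing(
        lambda mu: _poisson_cdf(observed - 1, mu) - (1 - alpha / 2), 0.0, float(observed))
    return lower, upper


cases = [0, 5, 50, 3]
population = [500, 10_000, 100_000, 800]
smoothed = global_empirical_bayes(cases, population)
lo, hi = exact_poisson_interval(cases[3])
rate_ci_per_100k = (100_000 * lo / population[3], 100_000 * hi / population[3])
```

In this example, the area with 800 inhabitants is pulled strongly towards the overall rate, while the area with 100,000 inhabitants keeps almost its crude rate.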
Challenge nº 3: Lack of completeness and representativeness of patient and covariate data
Case data include only confirmed cases, and confirmed case counts alone are not enough to capture the true magnitude of the COVID-19 pandemic. The true number of undetected cases has yet to be ascertained. In Europe, however, the ratio of total estimated cases to observed cases has been estimated at around 2.3 (Böhning, Rocchetti, Maruotti, & Holling, 2020). Compiling datasets that include suspected, probable, and negative test counts could substantially improve our understanding of COVID-19 space-time dynamics (Desjardins, Hohl, & Delmelle, 2020). Data on deaths may be subject to the same issues. Deaths among persons infected with SARS-CoV-2 who were never diagnosed with COVID-19 may be missed.
To better understand these space-time dynamics, data on population mobility and social networks are increasingly used. In the absence of universal, full-coverage datasets (at least in Portugal and other European nations), population mobility and social networks are being tracked using anonymized mobile phone location data. Yet these datasets may also suffer from limited completeness and representativeness. Previous research has shown that mobile phone users and social media users are unevenly distributed by age, gender, and geography (Wesolowski, Eagle, Noor, Snow, & Buckee, 2012).
Finally, over the past couple of months we have witnessed an exponential increase in web-based surveys collecting data on the COVID-19 pandemic. Voluntary web-based recruitment introduces important selection bias, first by excluding people without internet access and second through self-selection. For instance, the overrepresentation of women and highly educated individuals is a common problem with this recruitment strategy (Rossi et al., 2020). In such cases, weighting adjustments can reduce the bias caused by a lack of population representativeness.
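The practical meaning of that ratio can be shown with simple arithmetic. Purely for illustration, assume that the Europe-wide ratio of about 2.3 applies to an area with 1,000 confirmed cases (a hypothetical figure). Roughly 2,300 infections would then be expected, so less than half of them (about 43.5%, i.e. 1/2.3) were detected. Applying a continental average to a specific area is itself an assumption, since ascertainment varies with local testing.

```python
def estimated_infections(confirmed: int, ratio: float = 2.3) -> float:
    """Scale confirmed cases by an estimated total-to-observed ratio."""
    return confirmed * ratio


def detected_share(ratio: float = 2.3) -> float:
    """Fraction of infections that the confirmed counts capture."""
    return 1 / ratio


confirmed = 1_000                      # hypothetical confirmed cases
total = estimated_infections(confirmed)
undetected = total - confirmed
print(round(total), round(undetected), round(100 * detected_share(), 1))
# 2300 1300 43.5
```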
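A minimal sketch of one such adjustment, post-stratification weighting, is shown below. Each respondent receives a weight equal to the population share divided by the sample share of their stratum, so under-represented groups count more.

The strata (sex only), the population shares and the survey responses are all hypothetical. Real applications usually weight on several variables at once (e.g. with raking). Weighting also cannot correct for self-selection on characteristics that are not used to define the strata.

```python
from collections import Counter
from typing import Dict, List


def poststratification_weights(strata: List[str],
                               population_shares: Dict[str, float]) -> Dict[str, float]:
    """Weight per stratum = population share / sample share."""
    n = len(strata)
    counts = Counter(strata)
    return {s: population_shares[s] / (c / n) for s, c in counts.items()}


def weighted_mean(values: List[float], strata: List[str],
                  weights: Dict[str, float]) -> float:
    w = [weights[s] for s in strata]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)


# Hypothetical survey: 70 women and 30 men; outcome 1 = reports symptoms
strata = ["F"] * 70 + ["M"] * 30
outcome = [1] * 14 + [0] * 56 + [1] * 3 + [0] * 27
population_shares = {"F": 0.52, "M": 0.48}   # hypothetical census shares

weights = poststratification_weights(strata, population_shares)
unweighted = sum(outcome) / len(outcome)
weighted = weighted_mean(outcome, strata, weights)
print(round(unweighted, 3), round(weighted, 3))  # 0.17 0.152
```

Because women report symptoms more often in this toy sample and are over-represented, the unweighted estimate is biased upwards. Weighting restores each group's population share.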
Challenge nº 4: Geographical comparisons may be affected by different sources of bias
Differences in the availability and practice of SARS-CoV-2 testing may contribute to spatial disparities in COVID-19 incidence across territories. In addition, if screening practices change over time, incidence may appear to rise suddenly in certain regions simply because screening has increased. Consequently, estimates of incidence, case-fatality rates, and incidence trends at country, regional, and municipality level may not be directly comparable across jurisdictions.
For instance, as reported by a recent study on COVID-19 spatiotemporal diffusion in the US (Desjardins et al., 2020), the state of New York has a testing rate of 4.9 tests per 1000 population, three times higher than the national average. This high level of testing contributed to better ascertainment of cases and partially explains why the state presents a high cumulative incidence of COVID-19.
Similarly, differences in the number of deaths may reflect geographical inequalities in testing and disease coding practices, but they can also reflect differences in population age structure. All else being equal, jurisdictions with older populations will have more deaths, which is neither novel nor unexpected. Crude case-fatality and mortality rates therefore cannot be directly compared. To avoid misleading conclusions, age-specific rates should be used instead, or age-standardized rates should be calculated. These give the rate that would have been observed if the jurisdictions being compared shared the same (standard) age distribution.
An ecological analysis assesses the associations between disease incidence and area-level variables of interest (e.g. social or environmental covariates), and such associations are the goal of many spatial analyses. These studies are particularly prone to bias for two reasons. The first is spatial autocorrelation, i.e. the tendency of nearby units to have more similar values than distant ones. The second is the general lack of aggregated data on health determinants. Spatial models that account for the spatial structure of the data are therefore required to correctly estimate the effects of these socioeconomic and environmental correlates (Pina et al., 2010).
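Direct age standardization can be illustrated with two hypothetical jurisdictions that share identical age-specific mortality rates but differ in age structure. Their crude rates differ, while their directly standardized rates, computed with a common standard population, coincide.

The two broad age groups and the standard shares (0.8 and 0.2) are assumptions chosen for readability. Published comparisons typically use finer age bands and an established standard such as the European Standard Population.

```python
from typing import Dict

PER = 100_000

# Hypothetical deaths and populations by age group
young = {"deaths": {"0-64": 9, "65+": 20}, "pop": {"0-64": 90_000, "65+": 10_000}}
old = {"deaths": {"0-64": 7, "65+": 60}, "pop": {"0-64": 70_000, "65+": 30_000}}
STANDARD_SHARES = {"0-64": 0.8, "65+": 0.2}   # hypothetical standard population


def crude_rate(deaths: Dict[str, int], pop: Dict[str, int]) -> float:
    return PER * sum(deaths.values()) / sum(pop.values())


def direct_standardized_rate(deaths: Dict[str, int], pop: Dict[str, int],
                             standard: Dict[str, float]) -> float:
    """Weight each age-specific rate by the standard population share of that age group."""
    return PER * sum(share * deaths[g] / pop[g] for g, share in standard.items())


for name, j in (("young", young), ("old", old)):
    print(name, crude_rate(j["deaths"], j["pop"]),
          round(direct_standardized_rate(j["deaths"], j["pop"], STANDARD_SHARES), 1))
# young 29.0 48.0
# old 67.0 48.0
```

Both jurisdictions have 10 deaths per 100,000 under 65 and 200 per 100,000 at 65 and over. The more than twofold gap in crude rates is therefore entirely due to age structure.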
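A common diagnostic for spatial autocorrelation, for example in the residuals of a non-spatial model, is global Moran's I. The sketch below implements the statistic with binary contiguity weights (w_ij = 1 for neighbours, 0 otherwise) on a hypothetical chain of six areas.

Values close to the expectation of -1/(n - 1) (here -0.2) suggest no autocorrelation. Clearly positive values indicate clustering of similar values. Moran's I is a diagnostic, not a substitute for the spatial models mentioned above. Tested implementations, for example in PySAL, also provide permutation-based inference.

```python
from typing import Dict, List


def morans_i(values: List[float], neighbours: Dict[int, List[int]]) -> float:
    """Global Moran's I, assuming binary, symmetric contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours.values())          # sum of all weights
    cross = sum(dev[i] * dev[j] for i, nb in neighbours.items() for j in nb)
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical chain of six areas: each area borders the previous and next one
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

clustered = morans_i([1, 2, 3, 4, 5, 6], chain)      # similar values side by side
alternating = morans_i([1, 6, 1, 6, 1, 6], chain)    # neighbours always differ
print(round(clustered, 2), round(alternating, 2))     # 0.6 -1.0
```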
Challenge nº 5: True interdisciplinarity is still missing
Research on COVID-19 is not limited to the health and biological sciences; it has attracted scientists from various fields, including geographers. Nonetheless, very few research projects integrating the health sciences (e.g. public health) with the social and earth sciences (e.g. geography) have been conducted.
Interdisciplinary work is widely recognized as a breeding ground for innovation and it is, possibly, the only way of understanding complex problems, such as the COVID-19 pandemic. Health researchers, who hold vast knowledge on the biological mechanisms of disease transmission and health surveillance systems, need to team up with geographers (and other scientists) who are widely known for bridging social sciences and natural sciences, for their proficiency in GIS that could be used to track contagion, and for their understanding of human-environment relationships. But the same applies to geographers, who should get familiar with medical and biological terms, epidemiological research methods, causality frameworks, epidemic models, and so on.
As C. S. Lewis (1898-1963) put it, two heads are better than one, not because either is infallible, but because they are unlikely to go wrong in the same direction. The current COVID-19 pandemic should be used to foster truly interdisciplinary research.
III. Conclusion
Spatial analysis tools help monitor and manage public health. When conducting these analyses, it is crucial to protect patients' privacy by masking their geolocation, while choosing a spatial resolution that limits stigma and geographic uncertainty. At the beginning of an outbreak, or whenever case numbers are small, spatial smoothing methods and uncertainty intervals should be considered to avoid random fluctuation and unreliable rates. Health geography can only mirror reality when data quality is assured, so the completeness and representativeness of data are essential for understanding the space-time dynamics of a disease. Geographical comparisons can be useful for assessing the evolution of the disease between and within areas, but they are only viable if possible sources of bias are taken into account. Finally, in the face of a public health crisis, collaboration among experts in the biomedical, social, and natural sciences is essential for making faster, well-informed decisions.