Original research
Systematic review of best practices for GPS data usage, processing, and linkage in health, exposure science and environmental context research
  1. Amber L Pearson (1),
  2. Calvin Tribby (2),
  3. Catherine D Brown (3),
  4. Jiue-An Yang (2),
  5. Karin Pfeiffer (4),
  6. Marta M Jankowska (2)

  1. CS Mott Department of Public Health, Michigan State University, Flint, MI, USA
  2. Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California, USA
  3. Department of Geography, Environment and Spatial Sciences, Michigan State University, East Lansing, Michigan, USA
  4. Department of Kinesiology, Michigan State University, East Lansing, Michigan, USA

Correspondence to Dr Amber L Pearson; apearson{at}msu.edu

Abstract

Global Positioning System (GPS) technology is increasingly used in health research to capture individual mobility and contextual and environmental exposures. However, the tools, techniques and decisions for using GPS data vary from study to study, making comparisons and reproducibility challenging.

Objectives The objectives of this systematic review were to (1) identify best practices for GPS data collection and processing; (2) quantify reporting of best practices in published studies; and (3) discuss examples found in reviewed manuscripts that future researchers may employ for reporting GPS data usage, processing and linkage of GPS data in health studies.

Design A systematic review.

Data sources Electronic databases searched (24 October 2023) were PubMed, Scopus and Web of Science (PROSPERO ID: CRD42022322166).

Eligibility criteria Included peer-reviewed studies published in English met at least one of the criteria: (1) protocols involving GPS for exposure/context and human health research purposes and containing empirical data; (2) linkage of GPS data to other data intended for research on contextual influences on health; (3) associations between GPS-measured mobility or exposures and health; (4) derived variable methods using GPS data in health research; or (5) comparison of GPS tracking with other methods (eg, travel diary).

Data extraction and synthesis We examined 157 manuscripts for reporting of best practices including wear time, sampling frequency, data validity, noise/signal loss and data linkage to assess risk of bias.

Results We found that 8% of the studies did not disclose the GPS device model used, only 12.1% reported the per cent of GPS data lost by signal loss, only 15.7% reported the per cent of GPS data considered to be noise and only 68.2% reported the inclusion criteria for their data.

Conclusions Our recommendations for reporting on GPS usage, processing and linkage may be transferrable to other geospatial devices, with the hope of promoting transparency and reproducibility in this research.

PROSPERO registration number CRD42022322166.

  • imputation
  • geographic information system
  • accelerometer
  • built environment
  • mobility
  • exposome

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information. All data are available in online supplemental file 2.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

STRENGTHS AND LIMITATIONS OF THIS STUDY

  • This systematic review combined a standard database search with citation assessment of review articles to capture a comprehensive set of articles.

  • Article types included association-focused studies, methodological development, feasibility studies and tracking tool comparisons, providing a broad scope of Global Positioning System (GPS) applications in human health research.

  • We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses reporting guidelines to report on results.

  • We did not consider the necessity of differential reporting for certain health outcomes or subpopulations studied.

  • A potential limitation of this review was the omission of studies using mobile phone apps to collect GPS data.

Introduction

Global Positioning System (GPS) devices are increasingly used in health research to quantify health risks or negative exposures (eg, air pollutants, exposure to fast-food restaurants) and positive exposures or outcomes (eg, green spaces, outdoor physical activity), often embedding GPS data in environmental and contextual features. Applications of using GPS to track participants in air pollution, physical activity, active living, drug and alcohol addiction, obesity and exposomics research are being developed.1–4 Inclusion of mobility in exposure research offers the opportunity to expand our understanding of how movement through space and time contributes to healthy and unhealthy exposures. The use of GPS has emerged as a more accurate and specific measure of individual mobility and results in ‘dynamic’ exposures as compared with ‘static’ exposure measures. Static measures are commonly taken from a home or administrative unit at one point in time,5 6 leading to a ‘stationary bias’.7 GPS measurement of mobility should theoretically bring about a better alignment of dose and response relationships between contextual exposures and health outcomes.8 However, increased application of GPS technology is ushering in new and varied study designs, data collection methods and analytical processing pipelines. This makes cross-study comparison and identification of emergent findings across the literature difficult.

In general, GPS-based health research aims to quantify risks and benefits of environmental and contextual features and typically involves deploying GPS devices to be worn for various lengths of time by participants. Crucial steps in this branch of research consist of the subsequent cleaning, processing and linking of GPS data to other measures, such as survey data, anthropometrics, physical activity data, neighbourhood characteristics or spatially explicit environmental exposures. Increasingly, participants wear or use multiple devices, including accelerometers, personal air pollution monitors and wearable cameras, and data from these devices are linked to GPS data. At each stage in this research practice, decisions on data handling and processing are made which may influence the measurement of outcomes, risks or behaviours, and ultimately the study findings. Yet little research discusses the impact of such decisions or how to report them, or evaluates best practices for these steps.

The significant variation in techniques used and methodological aspects reported in GPS-based health and exposure research makes building evidence consensus difficult. Few studies examine how differences in data collection methodology or data processing may affect relationships between health outcomes and exposure measures, although there are important exceptions (eg, ref 9). Still, through the process of collecting and analysing GPS data, researchers have several methodological choices which clearly impact the quality and completeness of data collected. For example, several aspects of GPS usage are directly controlled by the researcher, including the choice of GPS device, which has been shown to be important for measurement of location accuracy, consistency and duration,10 11 although most research-grade devices have similar performance when unobstructed.12 Similarly, participant instruction, compliance and length of GPS wear time have also been shown to be important factors in generating reliable and representative mobility data.13–15 Other aspects of GPS usage relate to nuisances with the technology that affect completeness and accuracy of the data, including positional accuracy, uncertainty and missing data. For example, one aspect of GPS data collection involves characterising the amount of noise in the data (ie, error in calculation of the device location due to the low number of satellites available or multipath errors where GPS signals are reflected off buildings). Noise may then be filtered and removed from the dataset by researchers, based on some acceptable positional, altitude or speed error thresholds. Besides noise removal, missing data can also be the result of signal loss, which may occur in similar scenarios as noise or due to errors in operating the GPS device. In such cases, the resulting dataset includes gaps in the time series. 
Some studies fill these gaps using an imputation method (eg, last known location up to a specified time limit), which has been shown to affect the linkage process and ultimately data loss.16 Yet, it is unknown how consistently spatial health and exposure research studies report these aspects of GPS usage and processing.

As the use of GPS devices in health and exposure research continues to increase, there is a considerable need to identify best methodological practices for data collection and processing. Without consistent methodological reporting, it will become impossible to gauge quality of studies and comparability of results. We define data collection and processing as the steps and procedures employed to collect GPS data, clean it and prepare it for analysis in human health research; this definition does not include applying data transformations to create new variables (eg, trip classification, time spent at home, etc). Identifying collection and processing best practices will aid authors in reporting, making amalgamation and meta-analysis of results easier, and will increase reproducibility across studies. In an effort to promote transparency, replicability and rigour in studies using GPS devices worn by individuals to study human health (mirroring recent advances in physical activity17 and life course epidemiology18 research), this systematic literature review aimed to: (1) identify and review best practices for GPS data collection and processing; (2) quantify reporting of GPS data best practice elements in published studies; and (3) discuss best practice applications with examples found in reviewed manuscripts that future researchers may employ for reporting GPS data usage, processing and linkage. Importantly, this systematic review is the first step in what is ultimately a two-step process aimed at first understanding the current state of best practices and reporting, and second, building research community consensus on which practices should be reported in research using GPS for the measurement of human mobility and/or exposure assessment for the purposes of health-related research. The focus of this review is the first step.

Methods

Best practice manuscripts

First, eight best practice manuscripts were selected based on the two senior authors' (first and last) familiarity with the literature. When reviewing best practice manuscripts, themes for relevant considerations and issues related to GPS data usage, processing and linkage were extracted and tallied across the articles (n=8).9 12 13 19–23 Themes were discussed and agreed on by the senior authors based on their combined experience of 40+ years of GPS data collection and processing. Some best practice manuscripts included empirical data to showcase these issues, while others were primarily conceptual. For each theme, subelements or specific practices discussed in the best practice manuscripts were used as data extraction elements for the reviewed manuscripts. The subthemes were defined based on the stage at which they are employed in GPS data collection, processing or linkage. These practices then formed the basis of risk of bias assessment for the reviewed manuscripts.

We identified eight themes related to GPS usage and processing considerations among the best practice manuscripts (table 1). Under each theme, multiple ways of reporting this issue or decision/practice were found. For example, some manuscripts reported methods of imputation while others reported the percentage of GPS coordinates that were imputed. The most common themes were GPS data missingness and noise considerations (in 88% of manuscripts), followed by participant compliance and sampling frequency.

Table 1

Themes and specific best practices discussed as relevant for GPS data collection and processing, identified in best practice manuscripts (n=8)

The first theme identified in the best practice manuscripts was the model/brand of GPS device used. While research-grade devices appear to perform similarly in terms of accuracy when unobstructed, battery life, satellite information and fix time may vary between units.12 Thus, reporting the model/brand of the device may be useful to make comparisons across studies, and to evaluate the study protocol for wear time and participant compliance. The second theme was sampling frequency, or epoch, for capturing coordinates. The best practice manuscripts discussed the variety of sampling frequencies commonly observed (from 1 s to 5 s) and the influence this decision has on processing time and costs.20 23 Consideration of the study population (eg, children) may influence the sampling frequency selected and may also depend on the importance of fine precision in location detection.12 The third theme was wear time. Considerations include the research question of interest and the rarity of the behaviour or exposure under study.12 Yet, some claim that studying behaviours that occur in specific places (eg, physical activity in a park) or seasonal variation in behaviour may require much longer wear time20 than the typical GPS study (often 4 days which may or may not include a weekend day,13 or 7 days22). The fourth theme was GPS data missingness, which may be the result of signal loss, battery issues, non-compliance or memory storage capacity. One study reported missing data for around 70% of the total monitoring time21 and another reported over 17% missing for signal loss alone.19 The fifth theme was noise considerations, whereby signals are scattered by buildings of certain materials,21 or when indoors.13 Detection of noise in the GPS data may include filtering for unrealistic speed and acceleration values.12 The sixth theme was imputation, which entails estimation of coordinates for times with missing GPS data. 
Most commonly, nearest neighbour or the last known valid point13 19 20 methods were reported as imputation options. The seventh theme was linkage of GPS data to a variety of other data, including other sensor data (eg, personal air pollution monitor).22 The linkage process itself may also result in data loss for the analytical phase.19 The eighth theme was data inclusion, which may vary by subpopulation (eg, age) and lead to substantial data loss.19 Importantly, one manuscript empirically tested the potential for non-compliance to protocols to bias results, whereby certain ethnic or socioeconomic groups may be more likely to have lower rates of compliance and thus not be included in analysis.9

We did not tally the reporting of GPS device locational accuracy as an issue because there can be variation in performance among research-grade devices, though previous research indicated that most were able to detect location within metres when signal was unobstructed.24 Smoothing is discussed in two best practice manuscripts, but its definition is ambiguous and varies by researcher, ranging from upsampling to remove possible errors to using kernel density estimators after initial data processing.12 20 Kerr et al define it as reduction of ‘random noise in complex datasets by focusing on the primary pattern in the data and replacing points outside that pattern with plausible points that match the pattern’12 (p 536). Jankowska et al20 discuss smoothing of data through kernel density functions, which may be considered a postprocessing procedure to develop activity space metrics. The question of smoothing may better be understood in the context of noise processing decisions. While some manuscripts described how noise was identified, few of them disclosed how it was rectified. A better consensus on the definition, methods and reportable results of smoothing may be needed. Thus, due to the lack of clear definitions, we elected not to include smoothing as a theme.

Information sources, search strategies and keywords used in systematic review

An extensive search of electronic databases was conducted, including PubMed, Scopus and Web of Science, for relevant studies in the English language that focus on GPS data cleaning, processing or linkage and human health. We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for reporting.25 The search terms used were: (gps OR ‘geographic positioning system’ AND NOT ‘general practitioner*’ OR ‘general practice*’) AND (clean* OR imput* OR link* OR process* OR filter* OR join* OR stitch*) AND (human OR public) AND health. Exact search strategies, limits and filters for each database are provided in online supplemental file 1. The initial search was intended to capture a broad range of studies across disciplines concerned with GPS data and human health (n=3182) (see figure 1 for a flow diagram of this process). Studies not published in English were excluded at the search stage. Studies that were not peer-reviewed original research were excluded at the screening stage. A protocol of this review strategy was registered with PROSPERO (ID: CRD42022322166). While we did not include reviews in this systematic review, we did conduct backward citation searching by checking the bibliographies of reviews that appeared in our search and cited methods in included manuscripts for additional eligible studies which met at least one of the inclusion criteria. The search was conducted on 24 October 2023 and included any publications prior to that date.

Figure 1

Flow chart of systematic review process. GPS, Global Positioning System.

Eligibility criteria

This systematic review focuses on the published best practices for GPS data usage, processing and linkage in public health research as related to environmental and contextual exposures. Few studies explicitly focus on best practices, and instead include analytical details in methodological sections of a study. We first compiled the few best practice manuscripts. These were identified as manuscripts focused on discussion of best practices in the usage and processing of GPS data (with or without empirical data included). These manuscripts differ from review manuscripts in that the focus was not on systematically reviewing existing literature. Then, to cast a wide net to obtain as many studies as possible that may inform this review, included studies were required to meet at least one of the following criteria: (1) feasibility/pilot studies or protocols involving GPS in populations for exposure/context and health research purposes and containing empirical data; (2) the development of a novel linkage of GPS data to other data intended for research on contextual influences on health; (3) associations between GPS-measured mobility or exposures and health outcomes; (4) derived variable methods (including algorithms) using GPS data in health research; or (5) comparison of GPS tracking with other methods (eg, travel diary). We permitted all manuscripts using the same cohort to be included because different research questions might yield different processing protocols. Existing literature reviews and commentaries on existing research were excluded.

Not included in the scope of this review were: (1) studies on GPS devices not worn by humans (eg, ref 26); (2) exclusive environmental measurement without any health component (eg, ref 27); (3) anonymised GPS data not linked to individuals (eg, ref 28); (4) the use of GPS to monitor people for healthcare or emergency services (eg, dementia patient tracking) (eg, ref 29); (5) comparisons of geocoding techniques or the spatial accuracy of GPS devices (eg, ref 30); (6) studies not containing empirical GPS data (eg, ref 31); and (7) studies that used a mobile phone or smartwatch for GPS tracking (eg, ref 32) because of the heterogeneity in these devices/apps and their unknown calibration, and source(s) of locational data (ie, potential reliance on cell towers rather than satellites).

Data extracted from reviewed manuscripts

Using the themes and subelements found in the best practice manuscripts (table 1, P1–P8), we extracted information about adherence to or reporting of each subelement from all other studies included in our review. The full dataset can be found in online supplemental file 2.

P1–P4 (GPS device, sampling frequency, wear time, GPS missing data)

For P1, GPS brand and model were identified. For non-commonly used brands, custom devices or those that were inclusive of other monitoring devices (eg, air pollution devices), we classified brand and model as ‘other’. For studies noting more than one type of GPS device, we noted both devices’ brands and models. For P2, the sampling frequency in seconds was extracted. P3 wear time was coded by days of wear time as specified in the protocol or study guidelines, which does not necessarily indicate adherence by participants. Some protocols had unique wear periods by subgroups within a study or indicated a range of days. In these cases, the average wear time was reported. For protocols requesting less than a day of wear time, we calculated the proportion of a 24-hour day. P4 was coded as the per cent of GPS signal loss or missing data before any imputation was performed.
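The coding rules for P3 and P4 can be illustrated with a minimal Python sketch. The function names are hypothetical, and the assumption that the expected number of fixes is known (eg, monitoring time divided by the sampling interval) is ours, not a rule stated in the extraction protocol.

```python
def code_wear_time_days(min_days=None, max_days=None, hours=None):
    """Code protocol wear time in days (P3).

    - a sub-day protocol is expressed as a proportion of a 24-hour day
    - a range of days is replaced by its average
    - a single value is returned unchanged
    """
    if hours is not None:
        return hours / 24.0
    if max_days is None:
        return float(min_days)
    return (min_days + max_days) / 2.0


def percent_missing(expected_fixes, recorded_fixes):
    """Per cent of GPS fixes missing before any imputation (P4).

    expected_fixes is an assumed input, eg, monitoring seconds divided
    by the sampling interval in seconds.
    """
    if expected_fixes <= 0:
        raise ValueError("expected_fixes must be positive")
    return 100.0 * (expected_fixes - recorded_fixes) / expected_fixes


print(code_wear_time_days(hours=6))                  # 6-hour protocol
print(code_wear_time_days(min_days=5, max_days=7))   # protocol asking for 5 to 7 days
print(percent_missing(1000, 880))
```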

P5 and P6 (noise and imputation)

As there is no overarching definition of ‘noise’ in the GPS and health literature, we define noise generally as GPS data that is not missing but is likely erroneous due to signal issues from interference in the environment or satellite connectivity. The method of noise detection was included if the study specified how noise was identified (eg, rapid speed changes, satellite inaccuracy readings, rapid elevation changes). Some studies reported visually assessing the data and removing points they considered erroneous. For these studies the method of noise detection was coded as ‘manual’. If no specific discussion of noise was included, the subsequent subcategories in P5 were coded as ‘not applicable’ (n/a). Noise removal or correction thresholds (P5a) were extracted if specified (eg, altitude >800 m), as well as the per cent of points identified as noise (P5b). If a specific tool (P5c) was indicated in the manuscript to handle noise, it was coded as the tool name or the custom software or toolbox. For some studies the authors also indicated additional manual noise cleaning, which was also indicated under P5c. For manuscripts that cited commonly used tools, we included default noise parameters if they were findable and unless otherwise specified by the manuscript authors.
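As an illustration of threshold-based noise detection and of the quantity coded as P5b, the following Python sketch flags fixes by speed and elevation change. It is not the procedure of any particular tool: the default thresholds are the speed >130 km/hour and delta elevation >1000 m values reported in the Results as PALMS defaults, while the function names, the tuple layout of a fix and the choice to compare each fix with the last unflagged fix are assumptions made for this example.

```python
import math
from datetime import datetime, timedelta


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude pairs."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def flag_noise(fixes, max_speed_kmh=130.0, max_delta_elev_m=1000.0):
    """Return one boolean per fix; True marks a fix treated as noise.

    fixes: time-sorted list of (timestamp, lat, lon, elevation_m).
    Each fix is compared with the most recent unflagged fix, so a single
    spike does not also condemn the valid fix that follows it.
    The first fix is assumed valid (a simplification of this sketch).
    """
    flags = [False] * len(fixes)
    last_ok = 0
    for i in range(1, len(fixes)):
        t0, lat0, lon0, elev0 = fixes[last_ok]
        t1, lat1, lon1, elev1 = fixes[i]
        hours = (t1 - t0).total_seconds() / 3600.0
        if hours <= 0:
            flags[i] = True  # duplicate or out-of-order timestamp (assumed rule)
            continue
        speed_kmh = haversine_m(lat0, lon0, lat1, lon1) / 1000.0 / hours
        if speed_kmh > max_speed_kmh or abs(elev1 - elev0) > max_delta_elev_m:
            flags[i] = True
        else:
            last_ok = i
    return flags


t = datetime(2023, 5, 1, 8, 0, 0)
track = [
    (t, 42.0000, -83.0, 200.0),
    (t + timedelta(seconds=15), 42.0001, -83.0, 201.0),
    (t + timedelta(seconds=30), 42.5000, -83.0, 201.0),  # implausible jump
    (t + timedelta(seconds=45), 42.0002, -83.0, 202.0),
]
flags = flag_noise(track)
pct_noise = 100.0 * sum(flags) / len(flags)  # value a study would report as P5b
print(flags, pct_noise)
```

Reporting the thresholds passed to such a filter, together with the resulting per cent of points flagged, corresponds to subelements P5a and P5b.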

P6 (imputation performed) indicated if the manuscript specified its imputation choices (yes or none) or did not mention imputation. If a manuscript indicated no imputation was performed, or did not specify if imputation was performed, all other P6 columns were coded as n/a. P6a indicated the method used (eg, last known point), P6b identified the imputation threshold used to impute missing data, P6c identified the specific tool or algorithm used for imputation and, finally, P6d indicated the per cent of GPS points imputed.
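The subelements P6a, P6b and P6d can be made concrete with a short Python sketch of last-known-point imputation. This is one possible reading of that method, not the algorithm of a specific tool; the function name, the regular sampling epoch and the example threshold of 30 s are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def impute_last_known(fixes, epoch_s=15, max_gap_s=600):
    """Fill gaps by repeating the last known location (P6a).

    fixes: time-sorted list of (timestamp, lat, lon).
    Gaps are filled at the sampling epoch, but only up to max_gap_s
    seconds after the last observed fix (the imputation threshold, P6b).
    Returns a list of (timestamp, lat, lon, imputed_flag).
    """
    step = timedelta(seconds=epoch_s)
    out = []
    for i, (ts, lat, lon) in enumerate(fixes):
        out.append((ts, lat, lon, False))
        if i + 1 == len(fixes):
            break
        next_ts = fixes[i + 1][0]
        limit = ts + timedelta(seconds=max_gap_s)
        fill_ts = ts + step
        while fill_ts < next_ts and fill_ts <= limit:
            out.append((fill_ts, lat, lon, True))
            fill_ts += step
    return out


t = datetime(2023, 5, 1, 8, 0, 0)
observed = [
    (t, 42.0, -83.0),
    (t + timedelta(seconds=15), 42.0001, -83.0),
    (t + timedelta(seconds=75), 42.0005, -83.0),  # 60 s gap before this fix
]
filled = impute_last_known(observed, epoch_s=15, max_gap_s=30)
n_imputed = sum(1 for row in filled if row[3])
pct_imputed = 100.0 * n_imputed / len(filled)  # value a study would report as P6d
print(n_imputed, pct_imputed)
```

In this example the 60 s gap is only partly filled: two points fall within the 30 s threshold and one epoch remains missing, which is why the threshold and the per cent of points imputed are both worth reporting.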

P7 (linkage)

Data considered as ‘linked’ to GPS data were defined as data collected concurrently with the GPS device. This excluded postprocessing linkage with Geographic Information System (GIS) data as well as survey responses or biometrics collected and then appended to individual datasets. For linkage we focused on if data were linked to GPS and the type of data linked, as well as the epoch or interval of linkage (P7a). If more than one data type was linked to GPS, each data type and epoch were recorded in seconds and averaged. When linkage was reported at the trip or location level, those epochs were not used to calculate median values. We also identified the tool used for data linkage (P7b) and the percentage of data lost due to linkage if specified (P7c). For studies that did not link any other data to GPS data, we indicated ‘none’, and all following P7 categories were coded as n/a.
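To show what a linkage epoch (P7a) and linkage data loss (P7c) mean in practice, the sketch below joins GPS fixes to accelerometer epochs in Python. It is an assumed, simplified linkage: the function names, the 60 s epoch, the rule of keeping the last fix in each epoch and the definition of loss as accelerometer epochs without any GPS fix are illustrative choices, not the behaviour of a named tool.

```python
from datetime import datetime, timedelta

REFERENCE = datetime(1970, 1, 1)


def floor_to_epoch(ts, epoch_s):
    """Round a timestamp down to the start of its epoch."""
    secs = int((ts - REFERENCE).total_seconds())
    return REFERENCE + timedelta(seconds=secs - secs % epoch_s)


def link_to_epochs(gps_fixes, accel_epochs, epoch_s=60):
    """Link GPS fixes to accelerometer epochs.

    gps_fixes: list of (timestamp, lat, lon).
    accel_epochs: dict mapping epoch start time to activity counts.
    Returns (linked_rows, pct_lost), where linked_rows holds
    (epoch_start, counts, lat, lon) and pct_lost is the per cent of
    accelerometer epochs that had no GPS fix to link to.
    """
    last_fix = {}
    for ts, lat, lon in sorted(gps_fixes):
        last_fix[floor_to_epoch(ts, epoch_s)] = (lat, lon)  # keep last fix per epoch
    linked = [
        (start, counts) + last_fix[start]
        for start, counts in sorted(accel_epochs.items())
        if start in last_fix
    ]
    if not accel_epochs:
        return linked, 0.0
    pct_lost = 100.0 * (len(accel_epochs) - len(linked)) / len(accel_epochs)
    return linked, pct_lost


t = datetime(2023, 5, 1, 10, 0, 0)
accel = {t + timedelta(minutes=m): 100 * (m + 1) for m in range(4)}
gps = [
    (t + timedelta(seconds=5), 42.0000, -83.0),
    (t + timedelta(seconds=35), 42.0001, -83.0),
    (t + timedelta(seconds=70), 42.0002, -83.0),
    (t + timedelta(seconds=200), 42.0003, -83.0),
]
linked, pct_lost = link_to_epochs(gps, accel, epoch_s=60)
print(len(linked), pct_lost)
```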

For themes P5 (noise), P6 (imputation) and P7 (linkage), many studies cite other manuscripts for details on processing procedures or decisions. If details on the themes and subelements could be found in those cited manuscripts, they were included as data for the original manuscript.

P8 (data inclusion)

Data inclusion was coded as ‘specified’ or ‘not specified’ (P8) to indicate if the manuscript had indicated how authors deemed a data point, wear period/day or participant’s compliance as valid for study inclusion. If specified, P8a and P8b provide further detail for noted wear periods/days or participant compliance required. For some types of studies, especially feasibility studies, the nature of the research did not require a data inclusion criterion (eg, all available data were used). For such studies we indicated ‘n/a’. For many studies, the authors did not indicate if inclusion criteria were specific to GPS data or more generally applied to linked data (eg, accelerometer non-wear periods). We chose to include any noted data inclusion criteria. Other studies collected data over several periods and we noted the minimum requirements for each data collection period. Similarly, for P8c—per cent of participants lost—many studies did not indicate why participants were lost. Thus, for studies that reported participants lost to GPS data issues, that number is reported, while for studies that did not differentiate, the total number of participants lost is reported.
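A data inclusion rule of the kind coded under P8a and P8b might be expressed as in the following Python sketch. The defaults of 480 minutes per valid day and 2 valid days are the medians reported in the Results and are used here only as example values; the function names and the approach of estimating minutes of data from the number of fixes and the sampling epoch are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def valid_days(fix_times, epoch_s=60, min_minutes=480):
    """Return the calendar days with at least min_minutes of GPS data (P8a).

    Minutes of data are approximated as the number of fixes on that day
    multiplied by the sampling epoch.
    """
    fixes_per_day = defaultdict(int)
    for ts in fix_times:
        fixes_per_day[ts.date()] += 1
    return sorted(
        day for day, n in fixes_per_day.items()
        if n * epoch_s / 60.0 >= min_minutes
    )


def participant_included(fix_times, epoch_s=60, min_minutes=480, min_valid_days=2):
    """Apply a person-level criterion: enough valid days (P8b)."""
    return len(valid_days(fix_times, epoch_s, min_minutes)) >= min_valid_days


# Three days of one-minute fixes lasting 9 hours, 8 hours and 2 hours.
times = []
for day, minutes in ((1, 540), (2, 480), (3, 120)):
    start = datetime(2023, 5, day, 8, 0, 0)
    times += [start + timedelta(minutes=m) for m in range(minutes)]

days = valid_days(times)
included = participant_included(times)
print(days, included)
```

The per cent of participants failing such a criterion is the figure coded as P8c.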

In addition to practices identified through the best practice manuscript themes, we also extracted year of publication, name of the journal, focal health outcome, risk or behaviour, and type of data linked with GPS data. The best practices were divided into GPS usage practices, or those related to the collection of GPS data in a study, and GPS processing practices, or those related to preparing GPS data for analysis after collection. For each manuscript included, we calculated the total number of practices reported separately for GPS usage and GPS processing practices.
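The per-manuscript totals can be thought of as simple counts, as in this Python sketch. The four usage items follow the practices named in the Results (brand, model, sampling frequency, wear time); the five processing items are our assumed reading of the general processing themes (P4–P8), and all identifiers are hypothetical.

```python
USAGE_PRACTICES = ("brand", "model", "sampling_frequency", "wear_time")
# Assumed mapping of the five general processing themes (P4 to P8).
PROCESSING_PRACTICES = (
    "missing_data", "noise", "imputation", "linkage", "data_inclusion",
)


def reporting_scores(reported):
    """Count reported practices for one manuscript.

    reported: dict mapping practice name to True/False; absent keys
    are treated as not reported.
    Returns (usage_score, processing_score), ranging 0-4 and 0-5.
    """
    usage = sum(1 for p in USAGE_PRACTICES if reported.get(p, False))
    processing = sum(1 for p in PROCESSING_PRACTICES if reported.get(p, False))
    return usage, processing


example = {
    "brand": True, "model": True, "sampling_frequency": True, "wear_time": True,
    "noise": True, "linkage": True, "imputation": False,
}
scores = reporting_scores(example)
print(scores)
```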

Results

Study selection process

Two reviewers (lead and senior author) conducted title and abstract screening of the articles using the program Covidence. Inclusion/exclusion conflicts between reviewers were identified in Covidence and were resolved in a meeting whereby inclusion criteria were reviewed, and reasons for not meeting criteria were discussed. A total of 255 publications progressed to full-text screening by reviewers (figure 1). Next, bibliographies of excluded review manuscripts33–46 were checked and methods cited in included manuscripts were checked for possible inclusion. This process yielded an additional 36 manuscripts that were reviewed in full. Finally, within included manuscripts, if additional methods manuscripts were cited as key information sources for GPS data processing, we reviewed those manuscripts (n=21 additional manuscripts). We excluded manuscripts based on the following exclusion criteria: GPS data collection was planned but not yet carried out (eg, protocol without empirical data) (n=10), GPS devices were not used for exposure/contextual measures (eg, the GPS was only used for identifying the coordinates of an individual) (n=52), a mobile phone app was used for collecting coordinates (n=49), GPS device was not used for human mobility (n=7), only anonymised GPS data collected (n=1), a sole focus on GPS device comparison (n=3), a review paper or commentary (n=18), abstract only retrieved (n=11) or not available in English (n=2). Ultimately, 157 total manuscripts were selected for inclusion in this review.

Characteristics of studies included in the systematic review

Table 2 identifies characteristics of studies included. Out of the 157 publications included in this review, 107 were associations between GPS-measured mobility or exposures and health outcomes,33 47–152 11 were comparisons of GPS tracking with other methods (eg, travel diary),153–162 22 were feasibility/pilot studies or protocols involving GPS in populations for exposure/context and health research purposes and containing empirical data,41 163–183 5 were focused on the development of a novel linkage of GPS data to other data intended for research on contextual influences on health16 184–187 and 12 were focused on derived variable methods (including algorithms) using GPS data in health research.14 188–198 All papers were published from 2007 onwards. The most common journals of publication were Health & Place and International Journal of Environmental Research and Public Health, followed by American Journal of Preventive Medicine and International Journal of Behavioral Nutrition and Physical Activity (data not shown in tabular form).

Table 2

Characteristics of studies included in analyses (n=157)

Of the focal health outcomes and risks, almost half (45.9%) evaluated physical activity, followed by mobility (34.4%), neighbourhood-built environment exposures (29.9%) and other outcomes (eg, therapeutic experience, asthma, community participation) (17.8%) (table 2). The most commonly linked data were accelerometry (54.1%), followed by GIS data (49.0%) and travel diary/log (29.9%).

Consistency in reporting of best practices

Results of evaluating the selected four GPS usage practices and five GPS processing practices are shown in table 3. In evaluating practices reported for GPS usage, 93.6% reported brand of GPS device (most commonly Qstarz, followed by GlobalSat and Garmin), 91.7% reported model (most commonly Qstarz BT-1000XT) and 88.5% reported GPS sampling frequency (median=15 s). All but one study reported GPS days of wear time (median=7).

Table 3

Practices reported in included manuscripts (n=157)

In evaluating practices reported for GPS processing, only 52.9% of studies reported identifying noise and, more specifically, their method for noise detection (most commonly speed, elevation and satellite accuracy). Of those that included noise identification, most reported the threshold for noise detection (75.9%) and the tool used for detection (83.1%, most commonly personal activity location measurement system (PALMS)). The noise threshold was most commonly speed >130 km/hour or delta elevation >1000 m (PALMS default parameters). Only 15.7% of these studies reported the per cent of GPS points considered to be noise (median=0.4%). About 31% of studies reported whether they employed imputation. Of those that reported imputing missing GPS data, all reported the imputation method, and 81.6% reported the tool. Almost 70% reported the imputation threshold, but only 15.7% reported the per cent of points imputed (median=15.5%). Of studies that conducted data linkage with GPS data (n=132), all reported which data were linked. However, only 87.9% reported the linkage epoch (median=60 s), 61.4% reported the tool for linkage (most commonly PALMS) and 20.5% reported the data loss through the linkage process (median=11%). About 68% of studies reported the criteria for GPS data inclusion. Yet only 47.7% reported the minutes of data required to be a valid day (median=480) and 58.1% reported the number of valid days required (median=2). Over 90% reported participants lost by compliance criteria (median=3.6%). Of the included studies in this review, 81.5% reported all four GPS usage practices while <5% reported all general GPS processing practices (not including subthemes a–c/d, which were rarely reported).

Reporting of GPS usage practices showed no clear pattern over time and no trend towards increased reporting (figure 2A), although all studies conducted prior to 2015 and in 2020 reported all practices. Likewise, reporting of GPS processing practices did not show a clear temporal trend (figure 2B). Overall, however, average reporting scores for GPS processing practices were much lower than those for GPS usage practices.

Figure 2

Average scores and 95% CIs for Global Positioning System (GPS) usage (high=4, low=0; A) and processing (high=5, low=0; B) practices reported, by year.

Discussion

The aim of this review was to identify best practices of GPS data usage, processing and linkage in spatial health and exposure research, and assess the current state of reporting those practices. We explored the recommendations for reporting methods from best practice literature and then quantified reporting of GPS data best practice elements in published studies. To our knowledge, this is the first systematic review focused on the current state of GPS data usage, processing and linkage reporting, mirroring efforts in allied sciences to promote scientific transparency and replicability.

The themes identified in best practice manuscripts included the model/brand of GPS device used, sampling frequency, wear time, GPS data missingness, noise considerations, imputation, linkage of GPS data to a variety of other data and data inclusion criteria. These themes were each identified in at least 50% of the best practice manuscripts and were then used to assess reporting practices in our systematic review manuscripts. Of all papers included in the review, 81.5% reported GPS usage practices (P1–P3); however, only five papers (3.2%) reported on all GPS processing practices (P4–P6, not including subcomponents).

For our first practice, reporting GPS brand and model, 8% of the studies in this review did not disclose the GPS device model used, while 6% did not report brand. This limits understanding of the capability or comparability of devices across studies, as devices have different levels of locational precision and varying lengths of time to acquire a signal.20 Still, most research-related devices yield similar accuracy when unobstructed.12 If researchers deploy previously unvalidated devices, this might be an important limitation or weakness of the study’s ability to measure relationships between location and health outcomes. Similarly, reporting GPS device wear protocols is important, as they may affect the reliability and generalisability of findings. We found that all but one study reported the amount of requested wear time, but only 68.2% reported the inclusion criteria for their data, whether at the data point, day or person level. Because raw GPS data require so much processing, specifying what is considered a ‘valid’ point can clarify the quality of the data, as well as assist the researcher in reporting thresholds and other aspects of data cleaning within a manuscript. While some studies operated at a subday level, understanding the parameters for inclusion of observation days is relevant because a study that requires 3 days of GPS wear time may find stronger or weaker associations between exposures and health outcomes than a study that requires 7 days. The 3-day study may overestimate or underestimate a given exposure if, for example, the observation period does not capture a representative sample of the extent of participants’ activity spaces (eg, weekdays only). Previous research has found that at least 14 days of valid GPS data are required to obtain a representative sample of participants’ activity spaces.14 However, this may differ depending on the risks or outcomes of interest or the population under study. Future research is needed into GPS wear protocols and processing steps which may affect associations with specific health outcomes or exposures.
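As an illustration of how day-level and person-level inclusion criteria might be made explicit, the sketch below applies the median values reported in this review (480 minutes per valid day and 2 valid days) to one participant's GPS timestamps. Counting a minute as observed when it contains at least one fix is an assumption of this sketch, as are the function names and the example data; individual studies may define valid wear differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MIN_MINUTES_PER_DAY = 480  # median valid-day requirement among reviewed studies
MIN_VALID_DAYS = 2         # median number of valid days required


def minutes_per_day(timestamps):
    """Count distinct clock minutes containing at least one fix, per calendar day."""
    minutes = defaultdict(set)
    for ts in timestamps:
        minutes[ts.date()].add((ts.hour, ts.minute))
    return {day: len(observed) for day, observed in minutes.items()}


def valid_days(timestamps, min_minutes=MIN_MINUTES_PER_DAY):
    """Return the sorted calendar days that meet the minutes-per-day criterion."""
    per_day = minutes_per_day(timestamps)
    return sorted(day for day, n in per_day.items() if n >= min_minutes)


def participant_included(timestamps, min_minutes=MIN_MINUTES_PER_DAY, min_days=MIN_VALID_DAYS):
    """Person-level inclusion: enough valid days under the stated criteria."""
    return len(valid_days(timestamps, min_minutes)) >= min_days


# Made-up participant with one fix per minute: 500, 480 and 100 minutes on three days.
example_timestamps = []
for day_offset, n_minutes in [(0, 500), (1, 480), (2, 100)]:
    start = datetime(2023, 1, 2, 8, 0) + timedelta(days=day_offset)
    example_timestamps += [start + timedelta(minutes=i) for i in range(n_minutes)]

example_valid = valid_days(example_timestamps)
example_included = participant_included(example_timestamps)
print(len(example_valid), example_included)
```

Passing the thresholds as parameters makes it straightforward to rerun the same data under a stricter criterion (eg, `min_days=3`) and report how many participants would be lost.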

Assessment, processing and reporting of missing data via signal loss, noise or linkage was highly inconsistent among studies. Review and reporting of missing data are important aspects of assessing possible bias in a study, especially if GPS or linked data missingness leads to removal of participants from a study. Additional sources of missing data can occur when using GPS models that automatically turn off when they do not sense activity or lose satellite signal, or when a researcher decides to exclude data outside of a specific study area. We found that only 12.1% of studies reported the per cent of GPS data lost by signal loss, only 15.7% reported the per cent of GPS data considered to be noise and none reported how much data were removed when restricting data to a specific study area (not tallied or shown in tables). Of studies reporting the amount of data loss from missing GPS data, values ranged from 0.1% to 70% of data missing. Larger amounts of missing data may indicate a poorer estimate of GPS-derived metrics, affecting the quality of a study and the ability to compare results with other studies. Delineation of GPS errors due to device or satellite reception issues and methods for either removing or correcting such errors are important to report in studies because they may affect the strength of associations of environmental exposures or behavioural contexts with health outcomes. For example, noisy GPS data or missing GPS data may occur in dense urban areas, where heat island effects or air pollution may be the strongest. Further, participants with specific characteristics may more often travel outside of a study area. Missing GPS data may underestimate participants’ environmental exposures and may bias the associations with the health outcome of interest. Further research is needed into the spatial variation in GPS positional errors and how they relate to specific exposures,199 which was beyond the scope of this review. Closely related to missing data is the decision of whether or not to impute missing or noisy data. While many studies chose to ignore missing data, some research has found that this can bias results, particularly when linking GPS to other data resources like accelerometers. At a minimum, reporting whether imputation was performed (only 30.6% of studies reported this) will help in identifying whether a study may be prone to potential biases.16
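One simple way to quantify data lost to signal loss is to compare the number of fixes observed with the number expected from the sampling interval, as in the Python sketch below. It assumes the device was meant to record continuously between the first and last fix, which will overstate loss for devices that switch off when idle or for non-wear periods; the function names, the 15 s interval default and the example data are illustrative assumptions.

```python
from datetime import datetime, timedelta


def percent_signal_loss(timestamps, interval_s=15):
    """Per cent of expected sampling slots with no fix between first and last fix."""
    if len(timestamps) < 2:
        return 0.0
    ordered = sorted(timestamps)
    start = ordered[0]
    span_s = (ordered[-1] - start).total_seconds()
    expected = int(span_s // interval_s) + 1
    # Several fixes falling in the same slot are counted once.
    observed = len({int((ts - start).total_seconds() // interval_s) for ts in ordered})
    return 100.0 * (expected - observed) / expected


def find_gaps(timestamps, max_gap_s=15):
    """Return (gap_start, gap_end) pairs where consecutive fixes are > max_gap_s apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if (later - earlier).total_seconds() > max_gap_s
    ]


# Made-up track: fixes at 0, 15 and 30 s, then signal lost until 90 and 105 s.
t0 = datetime(2023, 1, 2, 8, 0, 0)
example_track = [t0 + timedelta(seconds=s) for s in (0, 15, 30, 90, 105)]
example_loss = percent_signal_loss(example_track)
example_gaps = find_gaps(example_track)
print(example_loss, example_gaps)
```

The list of gaps can also feed an imputation step, for example by imputing only gaps shorter than a stated threshold, which is the kind of threshold this review found was reported by fewer than 70% of studies that imputed.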

Additionally, we noted that a fair amount of GPS data was lost (median 11%) through linkage to other devices or GIS data. The potential implications of these lost data varied by study type. For example, some studies were only interested in physical activity monitored by accelerometers and therefore used the accelerometer epoch as the standard for linkage. The lost GPS data (which were not matched to accelerometer data) were then likely to have minimal impact on the quantification of physical activity and potentially minimal impact on study results. However, if the focus was to assess where, or in what types of environments, physical activity was occurring, the data lost due to linkage could bias the results. In an existing review of studies using GPS units with accelerometers or travel diaries, 17 had missing or unusable data ranging from 2.5% to 95% after linkage.15 We also noted that very few studies specified how data were kept after linkage, for example, whether only data that included both GPS and the linked data resource were kept, or whether all GPS data were kept regardless of linkage. Although beyond the scope of this review, we further note that simplistic linkage of GPS data to other sources may lead to uncertainty in estimates of exposure. For example, some studies used a simple intersection between GPS and GIS layers to determine exposure, which assumes that there is no positional error in the GPS or GIS data. This may lead to misclassification of exposure. For example, in a study evaluating time spent in a park, if the GPS points fall outside of the GIS park perimeter, linkage may misrepresent exposure time. Sensitivity analyses based on varying distance thresholds may help determine how variations in distance between GPS data and GIS layers may bias exposure estimates.200 201 It is possible that missing data which result in removal of participants may bias the results of studies (although this was not formally assessed here). Very few manuscripts compared the characteristics of their retained sample with those of participants who had to be removed due to either GPS missing/noisy data or linkage issues (eg, missing at random analysis), with notable exceptions (eg, ref 9). In fact, few manuscripts reported how many participants were lost due to GPS data issues, instead amalgamating all lost participants together, regardless of reason for exclusion.
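The data loss through linkage can be tallied at the time of linkage, as in the following Python sketch, which matches GPS fixes to accelerometer epochs of 60 s (the median linkage epoch in this review) and keeps only fixes with a matching epoch. The record layouts, the function names and the keep-only-matched rule are assumptions of this sketch and do not reproduce any particular tool; the epoch length is assumed to divide a day evenly.

```python
from datetime import datetime, timedelta

EPOCH_S = 60  # median linkage epoch among reviewed studies


def epoch_start(ts, epoch_s=EPOCH_S):
    """Floor a timestamp to the start of its epoch, counted from midnight."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    floored = seconds - seconds % epoch_s
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(seconds=floored)


def link_gps_to_epochs(gps_fixes, accel_epochs, epoch_s=EPOCH_S):
    """Link (datetime, lat, lon) fixes to a dict of epoch-start -> activity counts.

    Returns the linked records and a small report of how many GPS fixes
    were dropped because no accelerometer epoch matched.
    """
    linked = []
    unlinked = 0
    for ts, lat, lon in gps_fixes:
        key = epoch_start(ts, epoch_s)
        if key in accel_epochs:
            linked.append((ts, lat, lon, accel_epochs[key]))
        else:
            unlinked += 1
    total = len(gps_fixes)
    report = {
        "gps_fixes": total,
        "linked": len(linked),
        "unlinked": unlinked,
        "percent_lost": 100.0 * unlinked / total if total else 0.0,
    }
    return linked, report


# Made-up data: three accelerometer epochs and four GPS fixes, one outside wear time.
base = datetime(2023, 1, 2, 8, 0, 0)
example_epochs = {base + timedelta(minutes=m): counts for m, counts in [(0, 120), (1, 340), (2, 15)]}
example_gps = [
    (base + timedelta(seconds=10), 42.0000, -83.0000),
    (base + timedelta(seconds=40), 42.0001, -83.0000),
    (base + timedelta(seconds=65), 42.0002, -83.0001),
    (base + timedelta(minutes=5), 42.0010, -83.0005),  # no matching accelerometer epoch
]
example_linked, example_report = link_gps_to_epochs(example_gps, example_epochs)
print(example_report)
```

Stating the rule for what is kept after linkage and the resulting percentage lost addresses two of the items that few reviewed studies specified.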

To promote reporting of practices and methods in this research area, we created an example table for best practice reporting (online supplemental table S1). This table provides examples of ways for authors to report practices, for reviewers to evaluate them and for readers to identify GPS data considerations and potential biases. The table was designed based on real examples from the reviewed literature and makes use of the reporting themes identified in this review.

Limitations

Although we attempted to carry out an exhaustive review, there were certain aspects of GPS reporting and processing which we were not able to evaluate. For example, we did not consider the necessity of differential reporting for certain health outcomes. Future research may wish to provide guidance on an outcome-by-outcome basis (eg, for physical activity, depression, asthma). Moreover, our findings were not separated by subpopulations being studied (eg, child vs adult), though we do understand the need to modify methods should they not be appropriate for the population of interest. Thus, future research may wish to review best practices for each subpopulation and provide relevant guidance. Another potential limitation to our review was the omission of studies using mobile phone apps and smartwatches to collect GPS data. Although these devices are becoming commonplace, the decision was made to focus on ‘research grade’ GPS devices because mobile phone apps often have unknown calibration and source(s) of locational data, and because these apps lack homogeneity. However, by identifying best practices among research-related GPS devices, these practices can be transferred as applicable to mobile phone or smartwatch data collection and processing. Last, our restriction to evaluating studies published in English only is a limitation of this review, and future studies pulling from non-English literature would be valuable.

Future research and policy implications

Though this review was focused on identifying best practices and assessing the current state of reporting on those practices, several ancillary areas of future research remain. For example, it is unknown how much uncertainty may be introduced to the estimation of exposure by not correcting or removing locational noise, or how GPS wear protocols and processing steps could affect the detected associations with health outcomes. Future research may usefully estimate the magnitude of the effect of each of these practices and/or data loss on overall uncertainty or bias using a meta-analysis or similar approach. Perhaps the most evident need in future research, based on our findings, is a consensus on which practices should be reported, regardless of study design or research focus, and which practices may be optional. As mentioned in the Introduction section, this second step in our research process will make use of the themes identified in the current systematic review in order to build consensus among experts. With such a consensus, future geospatial health and exposure research will be more comparable, reliable and reproducible.

Because studies using GPS data may be used to quantify harmful exposures, and thus inform policies aimed at protecting the public from those exposures, the designation of minimum reporting for comparisons across studies would allow us to ensure that policies are based on the best available science. Furthermore, efforts to standardise reporting could enable meta-analyses that pool findings and inform guidance for policy.

Conclusions

In summary, because there is currently no consensus for the optimal use or reporting of GPS data in spatial health and exposure research, studies tend to report what they feel is essential, yielding such variety that comparisons across studies are challenging. Throughout this review process, we found a lack of consistency in both reporting and methods. Some manuscripts were meticulous in identifying and reporting their process and procedures, either in the main text or appendix. For other manuscripts, we had considerable difficulty finding processing decisions, criteria or other critical information. This review underscores that the current state of GPS usage and processing practices reporting has significant room for improvement. Details pertaining to acquiring and processing of GPS data are vital so that future studies can fully assess the methods used, identify quality of data inclusion, compile findings in a meta-analysis or draw comparisons across studies.

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information. All data are available in online supplemental file 2.

Ethics statements

Patient consent for publication

Acknowledgments

The authors thank Dr David Berrigan, Phil Hurvitz and Steve Mooney for help in conceptualising this research and providing critical feedback.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Correction notice This article has been corrected since it was published. The funding grant number has been corrected.

  • Contributors ALP: conceptualisation, data curation, investigation, methodology, formal analysis, resources, visualisation, writing—original draft, writing—review and editing. CDB, CT, J-AY: data curation, writing—review and editing. KP: conceptualisation, writing—review and editing. MMJ: conceptualisation, data curation, investigation, methodology, resources, visualisation, writing—original draft, writing—review and editing, guaranteed the content of the manuscript.

  • Funding AP is funded by the National Cancer Institute (NCI) of the National Institutes of Health (NIH) R01 CA239187.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.