EMA-CleanR: Ecological Momentary Assessment (EMA) Data Processing in R

DOI: 10.5281/zenodo.17982076

Summary

Research teams that use Ecological Momentary Assessment (EMA) surveys often need to clean and pre-process the resulting data before analysis. Dr. Sarah Sperry and Victoria Murphy of the Emotion and Temporal Dynamics (EmoTe) Lab at the University of Michigan created EMA-CleanR, an R-based program for efficient pre-processing, cleaning, and visualization of EMA survey data. This article documents how to use EMA-CleanR to pre-process EMA data.

 

Screenshot of EMA-CleanR output in HTML

Screenshot of EMA-CleanR visualizations

 

Setup and Usage

EMA-CleanR is written in R Markdown, which allows R code to be organized into sections and displayed alongside tables and plots, making the data analysis steps easier to follow. The code is easiest to view as HTML (a web page), which can be generated by opening the EMA-CleanR.Rmd file in RStudio and clicking Knit. The code takes as input a single CSV file named "EMA-Data.csv" with certain required columns. Each EMA item (question) must be in a separate column, and the column headings should begin with "EMA_". Customizable parameters can be configured in the YAML-formatted section at the top of the .Rmd file.
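
For illustration, the first few lines of an input CSV might look like the sketch below. The column names match the requirements described in this article; the participant IDs, survey names, timestamps, and item values are made up, and your export's datetime format may differ.

    participantidentifier,surveyname,start_datetime,end_datetime,EMA_01,EMA_02,EMA_03
    101,Morning Survey,2024-03-04 10:02:13,2024-03-04 10:04:55,3,1,2
    101,Afternoon Survey,2024-03-04 14:01:40,2024-03-04 14:03:02,2,4,1
    205,Morning Survey,2024-03-04 10:10:05,2024-03-04 10:12:47,1,1,3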

 

Project Setup

  • Download the code from GitHub at: https://github.com/DepressionCenter/EMA-CleanR

    File structure
     
  • Replace EMA-Data.csv with your own file.
    • Ensure it has at least these columns: participantidentifier, surveyname, start_datetime, end_datetime.
    • There should be one column per EMA item (question), and the column headings should start with "EMA_" (e.g. EMA_01, EMA_02, etc.). This prefix is configurable under Parameters (more on that later).
    • Each row represents one survey taken by one participant at one point in time.

  • Open EMA-CleanR.Rmd with RStudio. If asked, install any missing packages.

  • Edit the parameters at the top if needed (e.g. input file name), in the YAML section.

 

Configure Project Parameters

To customize the analysis for your particular study, set the project parameters at the top of EMA-CleanR.Rmd, in the YAML-formatted section. After changing the parameters, click the Knit button again to re-run the analysis.

  • input_file: The name of your input CSV. It defaults to "EMA-Data.csv" which contains sample data.
  • input_file_has_headers: Indicates whether your CSV has column headers.
  • output_dir: The sub-directory (relative to the .Rmd file) where the output CSVs will be stored.
  • late_survey_cutoff_hour: If you want to allow late survey responses after midnight, set this parameter to at least one hour before the start of your first daily survey. Any responses received between midnight and this hour will be counted under the previous day. In the sample data, the earliest survey occurs at 10AM, so this parameter is set to 9 to allow late responses up until 9am. To disable, set it to 0.
  • ignore_surveys: Specific survey names to ignore (e.g. practice surveys).
  • surveys_per_day: Number of surveys per day. In some EMA software, each daily prompt time must be scheduled as a separate survey. Defaults to 4.
  • total_surveys_in_study: Total number of surveys. In this example, 4 surveys per day x 28 days = 112 surveys.
  • ema_item_prefix: All EMA columns (one per question) must start with this prefix. Defaults to "EMA_" (e.g. EMA_01, EMA_02, EMA_03, etc.)
  • ema_item_labels: Optional: set up friendly names for each of your EMA items. For example, EMA_01: "Nervous/Anxious" will make the code display "Nervous/Anxious" in graphs instead of "EMA_01".
  • participant_group_map: Optional: diagnosis codes, cohorts or groups mapped to the first letter/digit of the participant ID. For example, "1": "Cohort 1" will group all participant IDs starting with 1 (10, 101, 10002, etc.) as "Cohort 1".
  • plot_colors: U-M Colors for graphs. Go Blue!
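
Taken together, a params block using the settings above might look like the following sketch. The values and exact structure are illustrative (for example, the output_dir value, the second item label, and the hex color codes are assumptions); check the shipped EMA-CleanR.Rmd for the actual defaults.

    params:
      input_file: "EMA-Data.csv"
      input_file_has_headers: true
      output_dir: "output"            # example value
      late_survey_cutoff_hour: 9      # 0 disables the late-survey correction
      ignore_surveys: ["Practice Survey", "Test Survey"]
      surveys_per_day: 4
      total_surveys_in_study: 112     # 4 surveys per day x 28 days
      ema_item_prefix: "EMA_"
      ema_item_labels:
        EMA_01: "Nervous/Anxious"
        EMA_02: "Sad/Down"            # example label
      participant_group_map:
        "1": "Cohort 1"
        "2": "Cohort 2"
      plot_colors: ["#00274C", "#FFCB05"]   # U-M blue and maize (example)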

 

Run the Analysis

  • In RStudio, open EMA-CleanR.Rmd and set it up per the instructions above.
  • To run the analysis, click the "Knit" button (or press Ctrl+Shift+K) to generate a new EMA-CleanR.html file, which contains a walk-through of the analysis and visualizations of your data. (A console-based alternative is sketched after this list.)

     
  • The output directory will contain exports of the data analysis in CSV format.

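
If you prefer to knit from the R console rather than the Knit button, rmarkdown::render() produces the same HTML report. This is a minimal sketch that assumes your working directory is the project folder; the alternate input file name is hypothetical.

    # Knit the report from the console (equivalent to clicking Knit in RStudio)
    library(rmarkdown)
    render("EMA-CleanR.Rmd", output_file = "EMA-CleanR.html")

    # Parameters in the YAML header can also be overridden at render time,
    # for example to point at a different input file (hypothetical name):
    render("EMA-CleanR.Rmd", params = list(input_file = "MyStudy-EMA.csv"))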

 

Navigating the Analysis

After running the analysis by clicking the Knit button in RStudio, you will be shown a web page containing the code, comments, tables and plots. This page is also saved locally as EMA-CleanR.html, which can be opened in any web browser.


  • Use the left menu to navigate through the different sections of the code.
  • Code blocks are displayed in gray, and the results of each block in white.
  • You can hide individual sections by clicking the Hide buttons. On the top right of the page, you can also click the blue Code button to show or hide all code blocks (useful for printing to PDF).

 

Files

  • /images: Contains screenshots and background images used in the demo files, this knowledge base article, and on GitHub.
  • /styles: Contains CSS style sheets with University of Michigan colors and digital accessibility features used when generating the HTML output.
  • .gitignore, .nojekyll: Internal files used by Git during check-in and by GitHub Pages when publishing the sample output. Do not modify.
  • index.html: Redirection page for the demo site. Do not modify.
  • LICENSE, NOTICE: Copyright and license information.
  • README.md: This is the home page shown in the GitHub repo.
  • EMA-Data.csv: Contains sample EMA data.
  • EMA-CleanR.Rmd: Contains the R/Markdown code for the project. Use this file to configure the input parameters and to run the analysis. Best opened in RStudio.
  • EMA-CleanR.html: Contains the output of the analysis in HTML. This file is refreshed every time you click the Knit button in the .Rmd file.

 

Code Walk-Through

The following is a short walk-through of the code. More information can be found in the comments throughout the code.

 

YAML Parameter Block

This block contains the project parameters used when running the analysis, in YAML format.

  • params: These are the input parameters to use when running the analysis. See the Parameters section above for more details.
  • output: These are configuration settings for the HTML output.
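
As an illustration of how these two keys fit together, the top of the .Rmd file is organized roughly like the sketch below. The title, html_document options, and style sheet path shown are examples of typical settings, not necessarily the exact ones in the repository.

    ---
    title: "EMA-CleanR"
    output:
      html_document:
        toc: true               # section menu on the left
        toc_float: true
        code_folding: show      # enables the Hide / Code buttons
        css: styles/umich.css   # hypothetical path; see the /styles folder
    params:
      input_file: "EMA-Data.csv"
      # ... remaining parameters (see Configure Project Parameters above)
    ---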

 

Project Setup

This section sets up the display options for code blocks going forward, reads the YAML parameters and converts them to R variables, and loads packages.

  • Global chunk options. Suppresses warnings/messages, echoes code for reproducibility.
  • Load required packages. Loads the R libraries used by the project (dplyr, corrplot, rlang, etc.). A full list of libraries is displayed.
    • Note that these external libraries each have their own licenses. Please refer to their individual websites or the R CRAN repository for license information.
  • Set project parameters. This section reads the parameters from YAML and adds them to the global variables. This makes project setup easier by unifying all parameters at the top, in an easy-to-read format.
    • The parameter values are shown in a white box, so everyone can see what assumptions were made when running the analysis.
    • Decision Point: Creates the output directory if it does not already exist.
  • Global functions. This section creates functions for common tasks used throughout the code, such as generating color gradients using the colors defined in the parameters.
  • Load data sets. This section reads the input file (EMA-Data.csv, or whichever file is named in the input_file parameter). A simplified sketch of these setup steps appears after this list.
    • The ingested row count is shown for a quick sanity check.
    • Assumption: This file already contains combined participant data, with one row per survey taken per participant. This is often the case when exporting directly from mobile technology platforms or clinical trial management systems.
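
A simplified sketch of what this setup section does; the variable and object names are illustrative, not necessarily those used in the actual code.

    # Promote selected YAML parameters to global variables
    input_file <- params$input_file
    output_dir <- params$output_dir

    # Decision point: create the output directory if it does not already exist
    if (!dir.exists(output_dir)) dir.create(output_dir, recursive = TRUE)

    # Read the combined EMA export and show the ingested row count as a sanity check
    ema_raw <- read.csv(input_file, header = params$input_file_has_headers,
                        stringsAsFactors = FALSE)
    nrow(ema_raw)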

 

Cleanup EMA Data File

This section performs data cleaning, de-duplication, and grouping, and creates global variables for accessing the cleaned data.

Remove practice and test surveys

  • Filters out surveys named in ignore_surveys.
  • Assumption: Surveys to be excluded are correctly listed; could miss unexpected test names.
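
A minimal sketch of this step, reusing the ema_raw data frame from the setup sketch above (the actual code may use different object names):

    library(dplyr)

    # Drop practice/test surveys listed in the ignore_surveys parameter
    ema_clean <- ema_raw %>%
      filter(!surveyname %in% params$ignore_surveys)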

Sort dataset and filter invalid participant IDs

  • Removes rows with NA, empty, or invalid string identifiers ("NA", "null", etc.).
  • Special Case: Catches common issues with identifier entry errors.
  • The remaining rows after participant ID cleaning are shown.
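
A hedged sketch of the participant ID filter; the exact list of invalid placeholder strings checked by the real code is an assumption.

    # Placeholder strings that sometimes appear in identifier columns
    invalid_ids <- c("", "NA", "N/A", "null", "NULL")

    ema_clean <- ema_clean %>%
      arrange(participantidentifier, start_datetime) %>%
      filter(!is.na(participantidentifier),
             !trimws(as.character(participantidentifier)) %in% invalid_ids)

    nrow(ema_clean)  # remaining rows after participant ID cleaning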

Cleanup dates and de-duplicate

  • Parses start/end datetime as POSIX objects.
  • Creates date-only column (start_day).
  • Identifies and merges duplicate records (same participant, date, end time).
  • Decision Point: Merges partial responses via coalesce(); does not arbitrarily drop duplicates, but combines them.
  • The remaining rows after de-duplication are shown.
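
A simplified sketch of the date cleanup and de-duplication. The actual code may parse dates differently and merges duplicates with coalesce(); the summarise() call below approximates that by keeping the first non-missing value in each column.

    library(dplyr)
    library(lubridate)

    ema_clean <- ema_clean %>%
      mutate(
        start_datetime = ymd_hms(start_datetime),   # parse as POSIXct
        end_datetime   = ymd_hms(end_datetime),
        start_day      = as.Date(start_datetime)    # date-only column
      ) %>%
      # Duplicates = same participant, same day, same end time; merge rather than drop
      group_by(participantidentifier, start_day, end_datetime) %>%
      summarise(across(everything(), ~ first(na.omit(.x))), .groups = "drop")

    nrow(ema_clean)  # remaining rows after de-duplication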

Create participant groups

  • Prefix-based mapping for participant groups (diagnosis/cohort). The mapping is configured in the participant_group_map parameter.
  • Assumption: Group distinction is encoded in the participant ID format; the first character (letter or digit) of the participant ID determines the group.
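
A minimal sketch of the prefix-based grouping, assuming participant_group_map is a named list such as list("1" = "Cohort 1"); the new column names are illustrative.

    # Look up the group label from the first character of each participant ID
    group_lookup <- unlist(params$participant_group_map)

    ema_clean <- ema_clean %>%
      mutate(
        group_key         = substr(as.character(participantidentifier), 1, 1),
        participant_group = unname(group_lookup[group_key])
      )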

Identify and clean EMA items

  • Identifies survey items by prefix (per the ema_item_prefix parameter) and removes rows where all items are NA.
  • Decision Point: Only retains rows with some valid EMA response.
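
A minimal sketch of how the EMA item columns can be identified and empty rows dropped; the real code may differ in the details.

    # Columns whose names start with the configured prefix, e.g. "EMA_"
    ema_items <- grep(paste0("^", params$ema_item_prefix),
                      names(ema_clean), value = TRUE)

    # Keep only rows that have at least one non-missing EMA response
    ema_clean <- ema_clean %>%
      filter(if_any(all_of(ema_items), ~ !is.na(.x)))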

Create variables

  • Days in Study: Calculates day-in-study per participant, allowing for varying start dates.
  • Late Survey Correction: If a survey was completed after midnight but before late_survey_cutoff_hour, it is counted toward the previous day (implemented using lag()).
  • Survey Sequence: Assigns sequential survey number within each participant.
  • Weekday/Weekend Classification: Adds columns for day type for context in analyses.
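
A simplified sketch of these derived variables. Note that the actual code implements the late-survey correction with lag(), whereas the sketch below takes a simpler hour-based shortcut; the new column names are illustrative.

    library(dplyr)
    library(lubridate)

    ema_clean <- ema_clean %>%
      group_by(participantidentifier) %>%
      arrange(start_datetime, .by_group = TRUE) %>%
      mutate(
        # Count responses between midnight and the cutoff toward the previous day
        adjusted_day  = if_else(hour(start_datetime) < params$late_survey_cutoff_hour,
                                start_day - 1, start_day),
        # Day in study relative to each participant's own first survey
        day_in_study  = as.integer(adjusted_day - min(adjusted_day)) + 1,
        survey_number = row_number(),   # sequential survey within participant
        day_type      = if_else(weekdays(adjusted_day) %in% c("Saturday", "Sunday"),
                                "Weekend", "Weekday")
      ) %>%
      ungroup()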

Time to completion (TTC) flag

  • Calculates time difference (seconds) for survey completion.
  • Sets cutoff for unusually high completion times: mean + 2 SD.
    • TTCFlag_High: Flag for abnormally slow surveys.
  • Sets minimum for abnormally fast completion: < 1 second per EMA item.
    • TTCFlag_Low: Flag for possibly inattentive (too fast) responses.
  • Assumption: Typical response times cluster around group mean, distributions are reasonable.
  • Plots a histogram of time differences.
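
A minimal sketch of the time-to-completion flags, reusing the ema_items vector from the earlier sketch; the intermediate variable names are illustrative.

    # Seconds between survey start and end
    ema_clean <- ema_clean %>%
      mutate(ttc_seconds = as.numeric(difftime(end_datetime, start_datetime,
                                               units = "secs")))

    # Unusually slow: more than mean + 2 standard deviations
    high_cutoff <- mean(ema_clean$ttc_seconds, na.rm = TRUE) +
      2 * sd(ema_clean$ttc_seconds, na.rm = TRUE)

    # Unusually fast: less than 1 second per EMA item
    low_cutoff <- length(ema_items) * 1

    ema_clean <- ema_clean %>%
      mutate(TTCFlag_High = ttc_seconds > high_cutoff,
             TTCFlag_Low  = ttc_seconds < low_cutoff)

    # Histogram of time differences
    hist(ema_clean$ttc_seconds, main = "Time to completion (seconds)", xlab = "Seconds")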

 

Flags

See code comments for details.

 

Compliance

See code comments for details.

 

Item Distribution

See code comments for details.

 

Correlation

See code comments for details.
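
The project's exact approach lives in the code comments, but as a hedged illustration, a correlation matrix of the EMA items could be computed and drawn with corrplot (one of the packages loaded earlier) roughly like this, assuming the items are numeric:

    library(corrplot)

    # Pairwise correlations between EMA items, ignoring missing responses pairwise
    item_cor <- cor(ema_clean[, ema_items], use = "pairwise.complete.obs")
    corrplot(item_cor, method = "color", type = "lower")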

 

Output

See code comments for details.

 

 


About the Author

Gabriel Mongefranco is a Mobile Data Architect at the University of Michigan's Eisenberg Family Depression Center. Gabriel has over a decade of experience with automation, data analytics, database architecture, dashboard design, software development, and technical writing. He supports U-M researchers with data cleaning, data pipelines, automation and enterprise architecture for wearables and other mobile technologies.

