In this series of posts, I’ll show how to use models available in our repository of methods for analyzing accelerometer data.

Our Purpose

Today’s post is an easy one. Similar to prior posts highlighting methods in our repository, I will show how to use Hibbing et al.’s two-regression model. In this model, ENMO (in a 1-s epoch) and the the coefficient of variation of ENMO are used to determine which of two equations are used to predict METs.

Calculating Features

As always, we first load necessary packages. The most important package here is ‘TwoRegression’ by Paul Hibbing, which has a number of helpful functions to implement two-regression models or just calculate coefficient of variation. However, I have downloaded the latest version from GitHub (version 0.1.3) so I am unsure if the function I will use today is available in the latest CRAN version (0.1.2).

devtools::install_github("paulhibbing/TwoRegression",force=TRUE)
library(TwoRegression)
tryCatch(library(dplyr),error=function(e){install.packages("dplyr");library(dplyr)})
tryCatch(library(data.table),error=function(e){install.packages("data.table");library(data.table)})
tryCatch(library(stringr),error=function(e){install.packages("stringr");library(stringr)})
tryCatch(library(lubridate),error=function(e){install.packages("lubridate");library(lubridate)})

Hibbing et al., provide a few different algorithms, most of which use data from the ActiGraph GT9X’s inertial measurement unit (IMU). I don’t have IMU data, so I’m going to tell the function to implement algorithm 1, which only uses ENMO calculated from raw acceleration data.

While the TwoRegression package makes this super easy, I need to make a function that can process a whole folder of files, not just an individual file and I need this function to do a few things:

  • Predict METs using Hibbing’s algorithm 1
  • Classify the predicted METs as moderate-to-vigorous activity (or not)
  • Extract the participant’s ID number from the file name
  • Extract data from a certain time interval

The first part can be done using the ‘hibbing18_twoReg_process’ function where ‘x’ is my file of interest (e.g., ‘File1.csv’).

rawdata<-hibbing18_twoReg_process(x, "Hip", x, 1, verbose=TRUE)

The second step is done using the ‘cut’ function. The ‘2. 999999’ ensures that a value of 3 is classified as MVPA.

rawdata$actclass<-cut(as.vector(rawdata$Hip_Algorithm1_METs),breaks=c(-Inf,2.999999, Inf),labels=c("sedlight","mvpa"))

The third part is a bit study-specific but I’ll include it because I often need to do this and it changes every time based on how the files were named. In the files I was using today, the names are like “AL_Hip_V1_001RAW” where “V1” is the visit number and the “001” part is the participant ID. This code first creates a ‘file’ variable that is the full filename (“AL_Hip_V1_001RAW.csv”). Then the ‘visit’ and ‘id’ variables are extracted using ‘str_sub.’ In the ‘visit’ example, I am creating a variable that starts 12 characters back from the last character in “file”, and ends 11 characters back from the last character in “file.” I have it this way instead of just specifying which character numbers I am interested in or starting the count from the beginning of “file” because the filename length and exact position of the target characters can change (e.g., because another file is “AL_LW_V1_001RAW.csv”).

rawdata$file<-x
rawdata$visit<-str_sub(rawdata$file,str_length(rawdata$file)-12,str_length(rawdata$file)-11)
rawdata$id<-str_sub(rawdata$file,str_length(rawdata$file)-9,str_length(rawdata$file)-7)

Now I can use “id” and “visit” to extract data from a certain time period. In this case, I want to extract data from a laboratory visit and exclude times before and after the visit. I use a similar approach to extract particular wear dates when measuring habitual physical activity (e.g., if the person received the monitor on March 1 and returned it March 21, but actually only wore it March 2 to March 9). In this example, I had an ‘xlsx’ file with the ‘start’ and ‘stop’ datetimes. After merging the weartimes data with my raw acceleration data, I created an interval using the ‘lubridate’ package and only kept data that fell within that interval.

    id       date visit               start                stop
1  001 2015-01-15    V1 2015-01-15 14:00:00 2015-01-15 15:19:30
2  001 2015-01-22    V2 2015-01-22 13:21:00 2015-01-22 14:40:00
3  002 2015-01-16    V1 2015-01-16 11:10:00 2015-01-16 12:27:30
4  002 2015-01-20    V2 2015-01-20 13:30:00 2015-01-20 14:50:00
5  003 2015-01-17    V1 2015-01-17 08:35:00 2015-01-17 09:49:00
6  003 2015-01-22    V2 2015-01-22 16:33:00 2015-01-22 17:53:00

weartimes<-readxl::read_xlsx('startstop2.xlsx')
rawdata<-merge(weartimes,rawdata, by=c("id","visit"), all=TRUE)
rawdata$int<-lubridate::interval(lubridate::ymd_hms(rawdata$start),lubridate::ymd_hms(rawdata$stop))
rawdata$Timestamp<-ymd_hms(rawdata$Timestamp)
datacut<-subset(rawdata, rawdata$Timestamp %within% rawdata$int)

I put this all together in a single function, get a list of ‘csv’ files in my target folder (because I also have gt3x files in there), and apply the function to each file. You will see I added one last line to the function using “select” so that I don’t end up with a bunch of extraneous variables. Voila!

data_hibbing<-function(x) {
  rawdata<-hibbing18_twoReg_process(x, "Hip", x, 1, verbose=TRUE)
  rawdata<-select(rawdata,c("Timestamp","Hip_Algorithm1_METs"))
  rawdata$actclass<-cut(as.vector(rawdata$Hip_Algorithm1_METs),breaks=c(-Inf,2.999999, Inf),labels=c("sedlight","mvpa"))
  
  rawdata$file<-x
  rawdata$visit<-str_sub(rawdata$file,str_length(rawdata$file)-12,str_length(rawdata$file)-11)
  rawdata$id<-str_sub(rawdata$file,str_length(rawdata$file)-9,str_length(rawdata$file)-7)
  
  rawdata<-merge(weartimes,rawdata, by=c("id","visit"), all=TRUE)
  rawdata$int<-lubridate::interval(lubridate::ymd_hms(rawdata$start),lubridate::ymd_hms(rawdata$stop))
  rawdata$Timestamp<-ymd_hms(rawdata$Timestamp)
  datacut<-subset(rawdata, rawdata$Timestamp %within% rawdata$int)
  datacut<-select(datacut,"id","visit","Timestamp","Hip_Algorithm1_METs","actclass")
 return(datacut)
}

filenames<-(list.files(pattern="*.csv"))
data<-try(lapply(filenames,data_hibbing) %>% bind_rows())

We end up with a data.frame of all participants, with predicted METs and activity intensity for each 1-s epoch.

   id visit           Timestamp Hip_Algorithm1_METs actclass
1 001    V1 2015-01-15 14:00:00                1.25 sedlight
2 001    V1 2015-01-15 14:00:01                1.25 sedlight
3 001    V1 2015-01-15 14:00:02                1.25 sedlight
4 001    V1 2015-01-15 14:00:03                1.25 sedlight
5 001    V1 2015-01-15 14:00:04                1.25 sedlight
6 001    V1 2015-01-15 14:00:05                1.25 sedlight

In this case, I actually wanted the total number of minutes each participant spent in MVPA according to the two-regression model, not epoch-level data. I can easily do this using ‘dplyr’ to select a subset of variables I want to keep, count the number of epochs in each level of ‘actclass’ (sed/light, MVPA), then collapse to the participant-level while creating a variable called ‘minutes’ which is the number of epochs (in this case, the number of seconds), divided by 60, which gives me the number of minutes at each level of ‘actclass.’ In the final portion of this pipe, I use ‘pivot_wider’ so that there is one row per participant instead of one row per intensity per participant.

data<-data_hibb %>%
  select(id,Timestamp,actclass) %>%
  count(id,actclass) %>% group_by(id) %>%
  mutate(minutes=n/60)%>% tidyr::pivot_wider(id_cols=c("id"),names_from=c("actclass"),values_from=c("minutes"),names_prefix="hib_")
id hib_sedlight hib_mvpa
001        105.8 52.73333

That’s all, folks!