Using Brian Helsel’s agcounts package, I generate ActiGraph counts in R for gt3x, csv, NHANES, and GENEActiv files.
Our Purpose
ActiGraph recently made their activity count algorithm available in Python, which Dr Helsel implemented in R as the agcounts package. Today’s post has two parts. First, I compare agcounts output to Actilife output. Second, I show a slightly modified version of the package code that allows for three additional file types- Actilife raw data in csv format, NHANES data in csv format, and bin files from the GENEActiv device.
Using agcounts to get activity counts
The main agcounts function is ‘get_counts’, which takes a gt3x file and calculates activity counts over the user-specified epoch. This is going to be really useful for streamlining data analysis and it’s super-easy to implement!
get_counts(path="file.gt3x",frequency=30,epoch=5,write.file=FALSE,return.data=TRUE)
As always, we can apply this function to a whole list of files. Here, I’ve defined a short function, ‘f’, which will apply ‘get_counts’ and add on the id number as a new variable. If I wanted to do additional steps like removing non-wear time, I could do so directly in this function. ‘lapply’ is then used to apply the function ‘f’ to my list of filenames.
setwd("C:/Users/clevengerka/myfiles/")
filenames<-(list.files(pattern="*.gt3x"))
f<-function(x){
data<-get_counts(x,frequency=60,epoch=60,write.file=FALSE,return.data = TRUE)
data$id<-x
return(data)
}
data<-try(lapply(filenames,f)) %>% bind_rows()
head(data)
time Axis1 Axis2 Axis3 Vector Magnitude id
1 2015-01-15 13:30:00 278 0 276 392 AL001.gt3x
2 2015-01-15 13:31:00 0 0 0 0 AL001.gt3x
3 2015-01-15 13:32:00 0 0 0 0 AL001.gt3x
4 2015-01-15 13:33:00 0 0 0 0 AL001.gt3x
5 2015-01-15 13:34:00 0 0 0 0 AL001.gt3x
6 2015-01-15 13:35:00 0 0 0 0 AL001.gt3x
Comparing agcounts to ActiLife
It’s important to know whether agcounts is actually comparable to the output we would normally get from Actilife, so I compared the two across six epochs using 66 files from a lab-based study involving structured and semi-structured activities. The means between ActiLife-generated and agcounts-generated activity counts were similar and the differences came down to rounding. (I want to note that preliminary tests using free-living data are a bit more complicated, so further testing is needed.)
Vector Magnitude
Epoch (in seconds) | ActiLife | agcounts |
---|---|---|
1 | 47.72 | 47.68 |
5 | 230.66 | 230.48 |
10 | 456.88 | 456.53 |
15 | 682.43 | 681.95 |
30 | 1357.01 | 1356.09 |
60 | 2702.93 | 2700.86 |
Axis 1
Epoch (in seconds) | ActiLife | agcounts |
---|---|---|
1 | 25.27 | 25.23 |
5 | 126.38 | 126.18 |
10 | 252.84 | 252.44 |
15 | 379.37 | 378.79 |
30 | 759.21 | 758.08 |
60 | 1519.90 | 1517.54 |
Modifying the get_counts function to allow other file types
Sometimes I work with file types other than gt3x, so my second purpose today was to make a few modifications to the ‘get_counts’ function to allow for three other file types.
- While it is ideal to only work with the gt3x files (e.g., to reduce storage needs), sometimes collaborators send me the Actilife-export raw data in csv format. In this format, there are 10 rows of header information (e.g., start and end date) before the actual acceleration data.
- NHANES raw acceleration data will be released as csv files, but the format isn’t the same as the Actilife-generated csv format.
- I may want to calculate activity counts for raw acceleration data collected using a non-ActiGraph device, such as the GENEActiv, which stores data in bin files.
To modify the ‘get_counts’ function for my purpose, I first want to see the background code for the function, which you can do using the below code (I’m not going to post the whole thing here because it is long…).
View(get_counts)
I then define the following function, which is based on the ‘get_counts’ function. I only made minor changes to allow for reading in other file types. Specifically, I use the ‘file_ext’ function from the tools package to apply different initial steps based on the file extension (gt3x, csv, or bin). Because I want the function to work for both Actilife-generated and NHANES csv files, the portion for csv files has a further nested if statement which basically says “if reading the file using ‘read_AG_raw’ fails, then try using ‘read.csv.”
get_counts2<-function (path, frequency, epoch, lfe_select = FALSE, write.file = TRUE,
return.data = FALSE, verbose = FALSE)
{
if (!frequency %in% seq(30, 100, 10)) {
stop(paste0("Frequency has to be 30, 40, 50, 60, 70, 80, 90 or 100 Hz"))
}
if (verbose) {
print(paste0("------------------------- ", "Reading ActiGraph GT3X File for ",
basename(path), " -------------------------"))
}
if(tools::file_ext(path)=="gt3x"){
raw <- read.gt3x::read.gt3x(path = path, asDataFrame = TRUE, imputeZeroes = TRUE)
}else if(tools::file_ext(path)=="csv"){
raw <- try(AGread::read_AG_raw(path, skip=10, return_raw=TRUE))
if(class(raw)=="data.frame"){
raw<-raw[,3:6]
names(raw)<-c("time","X","Y","Z")
}else if(class(raw)=="try-error"){
raw<-read.csv(path)
raw$HEADER_TIMESTAMP<-lubridate::ymd_hms(raw$HEADER_TIMESTAMP)
names(raw)<-c("time","X","Y","Z")
}
}else if(tools::file_ext(path)=="bin"){
raw<-GENEAread::read.bin(path)
raw<-as.data.frame(raw$data.out)
raw<-select(raw,c("timestamp","x","y","z"))
raw$timestamp<-as.POSIXct(raw$timestamp, origin="1970-01-01")
names(raw)<-c("time","X","Y","Z")
}
if(tools::file_ext(path)=="gt3x"){
start <- as.POSIXct(format(attr(raw, "start_time"), "%Y-%m-%d %H:%M:%S"),
"%Y-%m-%d %H:%M:%S", tz = "America/Chicago")
end <- as.POSIXct(format(attr(raw, "stop_time"), "%Y-%m-%d %H:%M:%S"),
"%Y-%m-%d %H:%M:%S", tz = "America/Chicago")
raw <- .check_idle_sleep(raw = raw, frequency = frequency,
epoch = epoch, verbose = verbose)
} else if(tools::file_ext(path)!="gt3x"){
start <- lubridate::ymd_hms(first(raw$time))
}
downsample_data <- .resample(raw = raw, frequency = frequency,
verbose = verbose)
bpf_data <- .bpf_filter(downsample_data = downsample_data,
verbose = verbose)
trim_data <- .trim_data(bpf_data = bpf_data, lfe_select = lfe_select,
verbose = verbose)
downsample_10hz <- .resample_10hz(trim_data = trim_data,
verbose = verbose)
epoch_counts <- data.frame(t(.sum_counts(downsample_10hz = downsample_10hz,
epoch_seconds = epoch, verbose = verbose)))
epoch_counts <- epoch_counts[c("Y", "X", "Z")]
colnames(epoch_counts) <- c("Axis1", "Axis2", "Axis3")
epoch_counts$`Vector Magnitude` <- round((sqrt(epoch_counts$Axis1^2 +
epoch_counts$Axis2^2 + epoch_counts$Axis3^2)))
first.time.obs <- as.POSIXct(format(raw[1, "time"], "%Y-%m-%d %H:%M:%S"),
"%Y-%m-%d %H:%M:%S", tz = "America/Chicago")
if (start == first.time.obs) {
epoch_counts <- cbind(time = seq(from = start, by = epoch,
length.out = nrow(epoch_counts)), epoch_counts)
}
if (start != first.time.obs) {
last.missing.time <- as.POSIXct(format(raw[1, "time"],
"%Y-%m-%d %H:%M:%S"), "%Y-%m-%d %H:%M:%S", tz = "America/Chicago") -
epoch
missing.timestamps <- cbind(time = rev(seq(to = start,
from = last.missing.time, by = -epoch)), matrix(0,
ncol = 4, nrow = length(rev(seq(to = start, from = last.missing.time,
by = -epoch)))))
colnames(missing.timestamps) <- c("time", "Axis1", "Axis2",
"Axis3", "Vector Magnitude")
epoch_counts <- cbind(time = seq(from = first.time.obs,
by = epoch, length.out = nrow(epoch_counts)), epoch_counts)
epoch_counts <- rbind(missing.timestamps, epoch_counts)
epoch_counts$time <- as.POSIXct(epoch_counts$time, origin = "1970-01-01",tz="America/Chicago")
}
if (write.file) {
if (lfe_select) {
name <- paste0("AG ", epoch, "s", " LFE ", "Epoch Counts")
}
if (!lfe_select) {
name <- paste0("AG ", epoch, "s", " Epoch Counts")
}
if (!dir.exists(paste0(dirname(path), "/", name))) {
dir.create(paste0(dirname(path), "/", name))
}
write.csv(epoch_counts, file = paste0(dirname(path),
"/", name, "/", strsplit(basename(path), "[.]")[[1]][1],
".csv"), row.names = FALSE)
}
if (return.data) {
return(epoch_counts)
}
}
I can then test this new ‘get_counts2’ function on each of the four file types.
#gt3x, like above
test_gt3x<-get_counts2(“file.gt3x”,frequency=60,epoch=5,write.file=FALSE,return.data = TRUE)
time Axis1 Axis2 Axis3 Vector Magnitude
1 2015-01-15 13:30:00 242 0 276 367
2 2015-01-15 13:30:05 0 0 0 0
3 2015-01-15 13:30:10 0 0 0 0
4 2015-01-15 13:30:15 0 0 0 0
5 2015-01-15 13:30:20 0 0 0 0
6 2015-01-15 13:30:25 0 0 0 0
#Actilife csv
test_csvraw<-get_counts2(“fileRAW.csv”,frequency=60,epoch=5,write.file=FALSE,return.data = TRUE)
time Axis1 Axis2 Axis3 Vector Magnitude
1 2015-01-15 07:30:00 0 0 0 0
2 2015-01-15 07:30:05 0 0 0 0
3 2015-01-15 07:30:10 0 0 0 0
4 2015-01-15 07:30:15 0 0 0 0
5 2015-01-15 07:30:20 0 0 0 0
6 2015-01-15 07:30:25 0 0 0 0
#NHANES csv
test_csvnhanes<-get_counts2(“GT3XPLUS-AccelerationCalibrated-2x5x0.NEO1G86032672.2000-01-04-14-02-00-000-P0000.sensor.csv.csv”,frequency=80,epoch=5,write.file=FALSE,return.data = TRUE)
time Axis1 Axis2 Axis3 Vector Magnitude
1 2000-01-04 08:02:00 0 0 0 0
2 2000-01-04 08:02:05 0 0 0 0
3 2000-01-04 08:02:10 0 0 0 0
4 2000-01-04 08:02:15 0 0 0 0
5 2000-01-04 08:02:20 0 0 0 0
6 2000-01-04 08:02:25 0 0 0 0
#GENEActiv bin file
test_ga<-get_counts2(“file.bin”,frequency=60,epoch=5,write.file=FALSE,return.data = TRUE)
time Axis1 Axis2 Axis3 Vector Magnitude
1 2015-01-15 02:30:05 0 0 0 0
2 2015-01-15 02:30:10 0 0 0 0
3 2015-01-15 02:30:15 0 0 0 0
4 2015-01-15 02:30:20 0 0 0 0
5 2015-01-15 02:30:25 0 0 0 0
6 2015-01-15 02:30:30 0 0 0 0
That’s it! I’m looking forward to seeing how this package progresses and is used in future research.