2

I am stuck in a seemingly easy task. Imagine the following data.table:

dt1 <- data.table(ID = as.factor(c("202E", "202E", "202E")), 
              timestamp = as.POSIXct(c("2017-05-02 00:00:00",
                                       "2017-05-02 00:15:00",
                                       "2017-05-02 00:30:00")), 
              acceleration_raw = c("-0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141",
                                   "-0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117", 
                                   "-0.750 0.586 0.117 -0.773 0.586 0.117 -0.773 0.609 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.750 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117"))

Created on 2022-11-17 with reprex v2.0.2

The idea is that I want to separate the acceleration_raw column into 3 different ones: acc_x, acc_y and acc_z. Each row of acceleration_raw is a string of characters, eventually leading to 120 numeric observations. I want to separate acceleration_raw and then take every the value from the first row and forth with a step of 3 and put it to acc_x, every value from the second row and forth and put it to acc_y, and finally every value from the third row and on and put it to acc_z.

I tried to first separate acceleration_raw with separate_rows from dplyr:

library('tidyverse')
library('data.table')

dt1 <- dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F)

Created on 2022-11-17 with reprex v2.0.2

And after that:

library('tidyverse')
library('data.table')

dt1 <- dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F) %>%
  mutate(acc_x = seq(acceleration_raw, from = 1, to = length(dt1), by = 3), 
         acc_y = seq(acceleration_raw, from = 2, to = length(dt1), by = 3), 
         acc_z = seq(acceleration_raw, from = 3, to = length(dt1), by = 3))
#> Warning in seq.default(acceleration_raw, from = 1, to = length(dt1), by = 3):
#> first element used of 'length.out' argument
#> Error in `mutate()`:
#> ! Problem while computing `acc_x = seq(acceleration_raw, from = 1, to =
#>   length(dt1), by = 3)`.
#> Caused by error in `ceiling()`:
#> ! non-numeric argument to mathematical function

Created on 2022-11-17 with reprex v2.0.2

Any suggestions on how to proceed?

1
  • Can you put an example of your desired result? how many rows do you expect must have the resulting df? Commented Nov 17, 2022 at 19:24

2 Answers 2

1

You could use pivot_wider and unnest:

library(tidyverse)

dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F) %>%
  mutate(id = rep(c("acc_x", "acc_y", "acc_z"), times = nrow(.) / 3)) %>% 
  pivot_wider(names_from = id, values_from = acceleration_raw, values_fn = list) %>% 
  unnest(cols = c("acc_x", "acc_y", "acc_z"))

This returns

# A tibble: 120 × 5
   ID    timestamp           acc_x  acc_y acc_z
   <fct> <dttm>              <chr>  <chr> <chr>
 1 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 2 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 3 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 4 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 5 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 6 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 7 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 8 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
 9 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
10 202E  2017-05-02 00:00:00 -0.703 0.656 0.164
# … with 110 more rows
Sign up to request clarification or add additional context in comments.

2 Comments

Tbh: This looks very odd.
This one looks very good. ` mutate(id = rep(c("acc_x", "acc_y", "acc_z"), times = nrow(.) / 3)) `
1

I was not able pivot_wider without NA's, so the solution is not optimal:

library(tidyverse)

dt1 %>% 
  as_tibble() %>% 
  separate_rows(acceleration_raw, sep = " ") %>% 
  group_by(group =  as.integer(gl(n(), n()/3, n()))) %>% 
  mutate(id = row_number()) %>% 
  mutate(group = case_when(group == 1 ~ "acc_x",
                           group == 2 ~ "acc_y", 
                           group == 3 ~ "acc_z")) %>% 
  pivot_wider(names_from = group, values_from = acceleration_raw) %>% 
  mutate(acc_y = lead(acc_y,n()/3),
         acc_z = lead(acc_z,n()/3*2)) %>% 
  na.omit()
# A tibble: 120 x 6
   ID    timestamp              id acc_x  acc_y  acc_z 
   <fct> <dttm>              <int> <chr>  <chr>  <chr> 
 1 202E  2017-05-02 00:00:00     1 -0.703 -0.703 -0.750
 2 202E  2017-05-02 00:00:00     2 0.656  0.680  0.586 
 3 202E  2017-05-02 00:00:00     3 0.164  0.117  0.117 
 4 202E  2017-05-02 00:00:00     4 -0.703 -0.680 -0.773
 5 202E  2017-05-02 00:00:00     5 0.656  0.680  0.586 
 6 202E  2017-05-02 00:00:00     6 0.164  0.117  0.117 
 7 202E  2017-05-02 00:00:00     7 -0.703 -0.680 -0.773
 8 202E  2017-05-02 00:00:00     8 0.656  0.680  0.609 
 9 202E  2017-05-02 00:00:00     9 0.164  0.117  0.117 
10 202E  2017-05-02 00:00:00    10 -0.703 -0.680 -0.773
# ... with 110 more rows
# i Use `print(n = ...)` to see more rows

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.