I would like to use map() from the purrr package to iterate over a subset of the variables in my data frame. Is there a standard and convenient approach for that? Take the following example dataset:

library(data.table)
library(purrr)
library(tidyr)  # provides separate(), used below
dt <- data.table(id= c("Alpha 1","Alpha 2","Alpha 3","Beta 1"),
id2= c("gamma 1","gamma 2","gamma 3","Delta 1") ,
y = rnorm(4))
        id     id2          y
1: Alpha 1 gamma 1 -1.1184009
2: Alpha 2 gamma 2  0.4347047
3: Alpha 3 gamma 3  0.2318315
4:  Beta 1 Delta 1  1.2640080

I would like to split my id columns every time there is a space (" "). The final dataset should look like this.

      id numberid   id2 numberid2           y
1: Alpha        1 gamma         1 -1.45772675
2: Alpha        2 gamma         2 -1.07430118
3: Alpha        3 gamma         3 -0.53454071
4:  Beta        1 Delta         1 -0.05854228

I know how to do this one column at a time:

dt_m <- dt%>%separate(id,
         sep=" ", c("id","numberid"))
      id numberid     id2          y
1: Alpha        1 gamma 1  2.0789930
2: Alpha        2 gamma 2 -0.2528485
3: Alpha        3 gamma 3  0.1332267
4:  Beta        1 Delta 1  1.9299524

But I would like to iterate this using map over a number of columns. Does anyone know a convenient way to

  1. iterate with map over a set of columns, returning a data frame

  2. and use the column names both for indexing and as character strings (to paste together "numberid" and "numberid2")?

I have tried something like this, but it produces an empty data frame:

vars <- c("id","id2")
dt2 <- dt%>%map_df(vars,~separate(.x,sep=" ", c((.x), "number")))

Thanks a lot for your help.

3 Answers

Use cSplit from the splitstackshape package, which lets you do this for multiple columns in one go.

splitstackshape::cSplit(dt, c('id', 'id2'), sep = ' ')

#            y  id_1 id_2 id2_1 id2_2
#1:  0.4037779 Alpha    1 gamma     1
#2: -0.3753461 Alpha    2 gamma     2
#3:  0.8014951 Alpha    3 gamma     3
#4: -1.3539683  Beta    1 Delta     1
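
If particular column names are wanted, one option is to rename the result afterwards with setnames() from data.table. This is only a sketch: it assumes cSplit names the new columns id_1, id_2, id2_1 and id2_2 as in the output above, and `out` is a name introduced here for illustration.

```r
library(data.table)
library(splitstackshape)

dt <- data.table(
  id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
  id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
  y   = rnorm(4)
)

# Split both columns on the space, as in the answer
out <- cSplit(dt, c("id", "id2"), sep = " ")

# Rename the generated columns; the old names assume the <column>_<n> pattern
setnames(out,
         old = c("id_1", "id_2", "id2_1", "id2_2"),
         new = c("id", "numberid", "id2", "numberid2"))
```

Because setnames() works by reference, `out` is renamed in place and no reassignment is needed.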

3 Comments

Thanks this is really useful to deal with the issue at hand!
Dear @RonakShah, I just have a follow-up question: is it possible to specify the resulting column names (e.g. "id", "numberid") using cSplit?
No, I don't think you can do that within cSplit.

I think a more typical tidyverse approach with separate() would be to pivot to long format, separate, and then pivot back to wide, but as you asked for a map() solution you can do the following. Note also that you're using data.table, which has different indexing behavior from a data frame or tibble.

library(data.table)
library(tidyverse)

vars <- c("id","id2")

imap(vars, ~separate(dt[, .x, with = FALSE], .x, sep=" ", c(.x, paste0("numberid", .y))))  %>%
  bind_cols(dt[, setdiff(names(dt), vars), with = FALSE])

      id numberid1   id2 numberid2           y
1: Alpha         1 gamma         1 -0.69201999
2: Alpha         2 gamma         2 -0.39839537
3: Alpha         3 gamma         3 -1.24125212
4:  Beta         1 Delta         1 -0.02165367

Alternatively:

dt %>%
  rowid_to_column() %>%
  pivot_longer(-c(y, rowid)) %>%
  separate(value, c("id", "number")) %>%
  pivot_wider(names_from = name, values_from = c(id, number))
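
Another way to stay within purrr, offered as a sketch rather than as part of the approaches above, is to fold over the column names with reduce(), so that each step splits one column of the accumulated data frame. The names `df`, `acc` and `col_name` are introduced here for illustration, and the example uses a plain data frame with fixed `y` values instead of rnorm().

```r
library(purrr)
library(tidyr)

# Illustrative data: same id columns as the question, fixed y values
df <- data.frame(
  id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
  id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
  y   = c(0.1, 0.2, 0.3, 0.4)
)

vars <- c("id", "id2")

# Start from df and split one column per step; col_name holds the column
# name as a string, so it serves both to pick the column and to build the
# new names (this assumes df has no column literally called "col_name")
res <- reduce(
  vars,
  function(acc, col_name) {
    separate(acc, col_name,
             into = c(col_name, paste0("number", col_name)),
             sep = " ")
  },
  .init = df
)

names(res)
```

Since every step returns a whole data frame, no bind_cols() is needed afterwards, and the new columns stay next to the ones they were split from.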

2 Comments

Thanks a lot for your reply. Would you mind showing the pivot approach, and how you would index if I were using a data frame? Given the code complexity, my sense is that map is not very convenient for iterating over variables.
See update. Re my comment about indexing and data.table - that was just to explain why dt[, .x, with = FALSE] was used instead of dt[.x]. See cran.r-project.org/web/packages/data.table/vignettes/…

An option with fread from data.table

library(data.table)
nm1 <- names(dt)[1:2]
nm2 <- paste0('number', nm1)
nm3 <- c(rbind(nm1, nm2))
setnames(dt[, c(list(y), lapply(.SD,  function(x) 
      fread(text = x))), .SDcols= nm1], c("y", nm3))[]
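
A related data.table idiom, shown here as a sketch and not as the fread approach above, is to loop over the column names and split each one with tstrsplit(), assigning by reference. The loop variable `v` and the helper `new_cols` are names introduced for illustration.

```r
library(data.table)

dt <- data.table(
  id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
  id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
  y   = rnorm(4)
)

vars <- c("id", "id2")

for (v in vars) {
  # Target names: the original column plus a "number"-prefixed companion
  new_cols <- c(v, paste0("number", v))
  # tstrsplit() returns one vector per piece; := overwrites v and adds the new column
  dt[, (new_cols) := tstrsplit(get(v), " ", fixed = TRUE)]
}

names(dt)
```

The number columns come back as character here; tstrsplit() also has a `type.convert` argument if numeric columns are preferred. The new columns are appended at the end, so setcolorder() may be needed to get a particular column order.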
