map() to iterate over columns of dataframe

Question

I would like to use map() from the purrr package to iterate over a subset of variables of my data frame. Is there a standard and convenient approach for that? Take the following example dataset:

library(data.table)
library(purrr)
library(tidyr)  # provides separate(), used below
dt <- data.table(id= c("Alpha 1","Alpha 2","Alpha 3","Beta 1"),
id2= c("gamma 1","gamma 2","gamma 3","Delta 1") ,
y = rnorm(4))
        id     id2          y
1: Alpha 1 gamma 1 -1.1184009
2: Alpha 2 gamma 2  0.4347047
3: Alpha 3 gamma 3  0.2318315
4:  Beta 1 Delta 1  1.2640080

I would like to split my id columns every time there is a space (" "). The final dataset should look like this.

      id numberid   id2 numberid2           y
1: Alpha        1 gamma         1 -1.45772675
2: Alpha        2 gamma         2 -1.07430118
3: Alpha        3 gamma         3 -0.53454071
4:  Beta        1 Delta         1 -0.05854228

I know how to do this one column at the time:

dt_m <- dt%>%separate(id,
         sep=" ", c("id","numberid"))
      id numberid     id2          y
1: Alpha        1 gamma 1  2.0789930
2: Alpha        2 gamma 2 -0.2528485
3: Alpha        3 gamma 3  0.1332267
4:  Beta        1 Delta 1  1.9299524

But I would like to iterate this using map over a number of columns. Does anyone know a convenient way to

iterate with map over a set of columns, returning a data frame
and use the columns both for indexing and as a character string (to paste number"id" and number"id2")?

I have tried something like this but it produces an empty data frame

vars <- c("id","id2")
dt2 <- dt%>%map_df(vars,~separate(.x,sep=" ", c((.x), "number")))

thanks a lot for your help

This might be helpful: tstrsplit to different columns in one round
– markus, Commented Dec 16, 2020 at 11:53
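The tstrsplit() route mentioned in this comment can be sketched roughly as follows. This is a minimal sketch, not the commenter's code: it assumes every value contains exactly one space, uses fixed values for y instead of rnorm() so the result is reproducible, and leaves the new number columns as character.

```r
library(data.table)

# Example data from the question, with fixed y values (an assumption for reproducibility)
dt <- data.table(id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
                 id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
                 y   = c(0.1, 0.2, 0.3, 0.4))

vars <- c("id", "id2")

for (v in vars) {
  # Split on a literal space; overwrite the original column with the first piece
  # and store the second piece in a new "number<column>" column
  dt[, c(v, paste0("number", v)) := tstrsplit(dt[[v]], " ", fixed = TRUE)]
}

# New columns are appended at the end, so reorder to match the desired layout
setcolorder(dt, c("id", "numberid", "id2", "numberid2", "y"))
```

Because := modifies dt by reference, no reassignment is needed inside the loop.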

Ronak Shah · Accepted Answer · 2020-12-16 12:13:08Z

3

Use cSplit which will allow you to do this for multiple columns in one go.

splitstackshape::cSplit(dt, c('id', 'id2'), sep = ' ')

#            y  id_1 id_2 id2_1 id2_2
#1:  0.4037779 Alpha    1 gamma     1
#2: -0.3753461 Alpha    2 gamma     2
#3:  0.8014951 Alpha    3 gamma     3
#4: -1.3539683  Beta    1 Delta     1

answered Dec 16, 2020 at 12:13


3 Comments

Alex Over a year ago

Thanks this is really useful to deal with the issue at hand!

Alex Over a year ago

Dear @RonakShah, I just have a follow up question, is it possible to specify the resulting column names (e.g. "id" "numberid") using cSplit?

Ronak Shah Over a year ago

No, I don't think you can do that within cSplit.
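One possible workaround is to rename the columns after the split with data.table's setnames(). The sketch below assumes that cSplit names its output columns with the _1 and _2 suffixes shown in the answer's output and that each value splits into exactly two pieces; the fixed y values are an assumption for reproducibility.

```r
library(data.table)
library(splitstackshape)

dt <- data.table(id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
                 id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
                 y   = c(0.1, 0.2, 0.3, 0.4))

vars <- c("id", "id2")
res  <- cSplit(dt, vars, sep = " ")

# Interleave the names: id_1, id_2, id2_1, id2_2 -> id, numberid, id2, numberid2
old <- c(rbind(paste0(vars, "_1"), paste0(vars, "_2")))
new <- c(rbind(vars, paste0("number", vars)))

setnames(res, old, new)  # renames by reference
```

The c(rbind(...)) idiom interleaves two vectors of equal length, so the old and new names line up pairwise.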

iroha · Accepted Answer · 2020-12-16 12:41:50Z

1

I think a more typical tidyverse approach using separate() would be to pivot to long format, separate, and then pivot back to wide, but as you asked for a map() solution you can do the following. Note also that you're using data.table, which has different indexing behavior from a data frame or tibble.

library(data.table)
library(tidyverse)

vars <- c("id","id2")

imap(vars, ~separate(dt[, .x, with = FALSE], .x, sep=" ", c(.x, paste0("numberid", .y))))  %>%
  bind_cols(dt[, setdiff(names(dt), vars), with = FALSE])

      id numberid1   id2 numberid2           y
1: Alpha         1 gamma         1 -0.69201999
2: Alpha         2 gamma         2 -0.39839537
3: Alpha         3 gamma         3 -1.24125212
4:  Beta         1 Delta         1 -0.02165367

Alternatively:

dt %>%
  rowid_to_column() %>%
  pivot_longer(-c(y, rowid)) %>%
  separate(value, c("id", "number")) %>%
  pivot_wider(names_from = name, values_from = c(id, number))
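If the column names requested in the question (numberid, numberid2) matter, another purrr option is reduce(), which threads the data through one separate() call per column. This is a sketch on a plain data frame rather than a data.table; split_one is a hypothetical helper name, the fixed y values are an assumption, and the column name is passed to separate() as a variable holding a string, as in the imap() call above.

```r
library(purrr)
library(tidyr)

df <- data.frame(id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
                 id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
                 y   = c(0.1, 0.2, 0.3, 0.4),
                 stringsAsFactors = FALSE)

vars <- c("id", "id2")

# Hypothetical helper: split one column on a space and name the second piece
# "number<column>"
split_one <- function(data, col_name) {
  separate(data, col_name,
           into = c(col_name, paste0("number", col_name)),
           sep = " ")
}

# Start from df and apply split_one once for each name in vars
out <- reduce(vars, split_one, .init = df)
```

reduce() fits here because each step needs the result of the previous one, whereas map() would produce one independent piece per column that then has to be bound back together.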

edited Dec 16, 2020 at 12:41

answered Dec 16, 2020 at 12:01


2 Comments

Alex Over a year ago

Thanks a lot for your reply. Would you mind showing the pivot approach, and how you would index if I were using a data frame? Given the code complexity, my sense is that map is not very convenient for iterating over variables.

iroha Over a year ago

See update. Re my comment about indexing and data.table - that was just to explain why dt[, .x, with = FALSE] was used instead of dt[.x]. See cran.r-project.org/web/packages/data.table/vignettes/…
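The indexing difference mentioned here can be illustrated with a small sketch. It assumes a cut-down version of the example data and shows a few common ways to select columns by a character vector; the variable names by_with, by_dots, by_sd and by_df are only illustrative.

```r
library(data.table)

dt <- data.table(id  = c("Alpha 1", "Beta 1"),
                 id2 = c("gamma 1", "Delta 1"),
                 y   = c(0.5, -0.5))
vars <- c("id", "id2")

# data.table: columns are selected in j (the second argument)
by_with <- dt[, vars, with = FALSE]      # treat vars as a vector of column names
by_dots <- dt[, ..vars]                  # ".." prefix looks vars up outside dt
by_sd   <- dt[, .SD, .SDcols = vars]     # .SD restricted to the named columns

# Plain data frame: single-bracket indexing with a character vector selects columns
by_df <- as.data.frame(dt)[vars]

# Note: dt[vars] is not column selection in data.table; a character vector in i
# is interpreted as a join on the key, so it is not equivalent to the forms above
```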

akrun · Accepted Answer · 2020-12-16 17:37:54Z

1

An option with fread from data.table

library(data.table)
nm1 <- names(dt)[1:2]
nm2 <- paste0('number', nm1)
nm3 <- c(rbind(nm1, nm2))
setnames(dt[, c(list(y), lapply(.SD,  function(x) 
      fread(text = x))), .SDcols= nm1], c("y", nm3))[]

answered Dec 16, 2020 at 17:37
