How do I select columns conditional to the value of another column? (dplyr)

Question

Name1    Col1    Col2    Col3    Name2    Col4    Col5    Col6    Col7
John     A       A       A       Alex       B       B       B      1
Alex     B       B       B       John       A       A       A      0

Looking at the data frame above, I would like to select data based on the value of Col7. Specifically, if Col7 = 1, I want to select Cols 1, 2, and 3; if Col7 = 0, I want to select Cols 4, 5, and 6. Cols 4, 5, and 6 hold the same variables as Cols 1, 2, and 3, just associated with Alex instead of John (in row 1). Therefore, John's data is selected both times, and the same holds for every pair.

I was thinking some form of select in dplyr would work, but I am having trouble with the conditional selection aspect.

My final data frame would appear as follows:

Name1    Col1    Col2    Col3
John      A       A       A
John      A       A       A

Omar · Accepted Answer · 2018-03-03 02:13:26Z

Hi, try something very basic (combining filter and select_at):

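library(dplyr)

# Rows where Col7 == 1: keep the first set of columns
df1 <- df %>% 
  filter(Col7 == 1) %>% 
  select_at(vars(Name = Name1, Col1, Col2, Col3))
# Rows where Col7 == 0: keep the second set, renamed to match the first
df2 <- df %>% 
  filter(Col7 == 0) %>% 
  select_at(vars(Name = Name2, Col1 = Col4, Col2 = Col5, Col3 = Col6))
df <- bind_rows(df1, df2)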

You get exactly the data frame you seek:

> df
  Name Col1 Col2 Col3
1 John    A    A    A
2 John    A    A    A
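As a possible alternative to splitting and re-binding, the choice can also be made row by row inside a single pipeline. The sketch below is only an illustration under assumptions: it rebuilds the question's data by hand, and the result name `res` is made up here. It keeps the rows in their original order.

```r
library(dplyr)

# Example data reconstructed from the question (assumed to be character columns)
df <- data.frame(
  Name1 = c("John", "Alex"),
  Col1 = c("A", "B"), Col2 = c("A", "B"), Col3 = c("A", "B"),
  Name2 = c("Alex", "John"),
  Col4 = c("B", "A"), Col5 = c("B", "A"), Col6 = c("B", "A"),
  Col7 = c(1L, 0L),
  stringsAsFactors = FALSE
)

# For each row, take the value from the first or second column set depending on Col7
res <- df %>%
  transmute(
    Name = if_else(Col7 == 1, Name1, Name2),
    Col1 = if_else(Col7 == 1, Col1, Col4),
    Col2 = if_else(Col7 == 1, Col2, Col5),
    Col3 = if_else(Col7 == 1, Col3, Col6)
  )
```

This only stays readable for a handful of columns; with many columns per set, a reshape-based approach is probably less repetitive.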

answered Mar 3, 2018 at 2:13


Frank · Accepted Answer · 2018-03-02 20:45:08Z

You can use melt from data.table or reshape2 and then do an inner join on the condition:

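library(data.table)
setDT(d)  # d is the data frame from the question

d[, row := .I]  # row id, so each melted row can be traced back
md = melt(d, id.vars = c("row", "Col7"), 
  measure.vars = Map(c, 1:4, 5:8),  # pair columns 1-4 with columns 5-8
  variable.factor = FALSE,
  variable.name = "colset",
  value.name = names(d)[1:4])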
#    row Col7 colset Name1 Col1 Col2 Col3
# 1:   1    1      1  John    A    A    A
# 2:   2    0      1  Alex    B    B    B
# 3:   1    1      2  Alex    B    B    B
# 4:   2    0      2  John    A    A    A

cond = data.table(Col7 = 0:1, colset = c("2", "1"))
#    Col7 colset
# 1:    0      2
# 2:    1      1

res = md[cond, on=names(cond), nomatch=0]
#    row Col7 colset Name1 Col1 Col2 Col3
# 1:   2    0      2  John    A    A    A
# 2:   1    1      1  John    A    A    A

This approach extends to more than two sets of columns, e.g. with meas=Map(c, 1:4, 5:8, 9:12).
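To make the three-set case concrete, here is a rough sketch on made-up data. Everything in it is hypothetical (the columns `Name1`..`Name3`, `Val1`..`Val3`, the selector `pick`, and the person "Sam"); it only illustrates the same melt-then-join idea with one more column set. The set index is converted to integer before joining so that the join does not depend on the type melt returns for it.

```r
library(data.table)

# Hypothetical data with three column sets; `pick` says which set to keep per row
d <- data.table(
  Name1 = c("John", "Alex", "Sam"),  Val1 = c("A", "B", "C"),
  Name2 = c("Alex", "Sam", "John"),  Val2 = c("B", "C", "A"),
  Name3 = c("Sam", "John", "Alex"),  Val3 = c("C", "A", "B"),
  pick  = c(1L, 3L, 2L)
)
d[, row := .I]

# One list element per output column, one entry per column set
md <- melt(d, id.vars = c("row", "pick"),
  measure.vars = list(c("Name1", "Name2", "Name3"),
                      c("Val1", "Val2", "Val3")),
  variable.name = "colset",
  value.name = c("Name", "Val"))

# Normalise the set index to integer, whatever type melt returned
md[, colset := as.integer(as.character(colset))]

# Lookup table: which column set to keep for each value of pick
cond <- data.table(pick = 1:3, colset = 1:3)
res <- md[cond, on = c("pick", "colset"), nomatch = 0]
setorder(res, row)
```

Only the lookup table `cond` encodes the selection rule, so a different mapping from selector value to column set would just mean editing that table.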

alistaire · Accepted Answer · 2018-03-02 20:51:11Z

Here's a tidyverse (more tidyr than dplyr) approach. It's fairly verbose, as your original data is not in a tidy form, so most of the code is just getting to a long form, cleaning, and spreading back to wide form.

library(tidyverse)

df <- tibble(Name1 = c("John", "Alex"), 
             Col1 = c("A", "B"), Col2 = c("A", "B"), Col3 = c("A", "B"), 
             Name2 = c("Alex", "John"), 
             Col4 = c("B", "A"), Col5 = c("B", "A"), Col6 = c("B", "A"), 
             Col7 = c(1L, 0L))

df %>% 
    # reshape to long form
    gather(col, col_val, num_range('Col', 1:6)) %>% 
    gather(name_var, name, contains('Name')) %>% 
    # clean, subset, clean for spreading
    mutate(col = parse_number(col), 
           name_var = parse_number(name_var)) %>% 
    filter(ifelse(Col7 == 1, 
                  col %in% 1:3 & name_var == 1, 
                  col %in% 4:6 & name_var == 2)) %>% 
    mutate(col = paste0('Col', (col - 1) %% 3 + 1), 
           name_var = 'Name') %>% 
    # reshape back to wide form
    spread(name_var, name) %>% 
    spread(col, col_val) %>% 
    # clean
    select(-Col7)
#> # A tibble: 2 x 4
#>   Name  Col1  Col2  Col3 
#>   <chr> <chr> <chr> <chr>
#> 1 John  A     A     A    
#> 2 John  A     A     A
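With a newer tidyr, the same long-form idea can probably be written more compactly using pivot_longer and its ".value" sentinel. This is a sketch under assumptions, not the answer's original method: the intermediate names with a `_1`/`_2` suffix, the helper column `set`, and the result name `res` are all introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Example data reconstructed from the question
df <- tibble(Name1 = c("John", "Alex"),
             Col1 = c("A", "B"), Col2 = c("A", "B"), Col3 = c("A", "B"),
             Name2 = c("Alex", "John"),
             Col4 = c("B", "A"), Col5 = c("B", "A"), Col6 = c("B", "A"),
             Col7 = c(1L, 0L))

res <- df %>%
  # Rename to "<variable>_<set>" so both column sets share variable names
  rename(Name_1 = Name1, Col1_1 = Col1, Col2_1 = Col2, Col3_1 = Col3,
         Name_2 = Name2, Col1_2 = Col4, Col2_2 = Col5, Col3_2 = Col6) %>%
  # One row per original row and set; ".value" keeps Name, Col1, Col2, Col3 as columns
  pivot_longer(-Col7, names_to = c(".value", "set"), names_sep = "_") %>%
  # Keep set 1 where Col7 == 1, set 2 otherwise
  filter(set == if_else(Col7 == 1, "1", "2")) %>%
  select(-Col7, -set)
```

The explicit rename is the price for the untidy column names; if the columns already followed a consistent naming pattern, that step could be dropped.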

sm925 · Accepted Answer · 2018-03-02 20:55:30Z

In base R:

df <- read.table(text = "Name1    Col1    Col2    Col3    Name2    Col4    Col5    Col6    Col7
John     A       A       A       Alex       B       B       B      1
Alex     B       B       B       John       A       A       A      0", 
                 header = TRUE, stringsAsFactors = FALSE)


# Rows with Col7 == 1 keep the first set of columns
df1 <- df[df$Col7 == 1, c("Name1", "Col1", "Col2", "Col3")]
# Rows with Col7 == 0 keep the second set of columns
df2 <- df[df$Col7 == 0, c("Name2", "Col4", "Col5", "Col6")]

# Give both pieces the same column names so they can be stacked
colnames(df2) <- colnames(df1)

df <- rbind(df1, df2)

   Name1 Col1 Col2 Col3
1  John    A    A    A
2  John    A    A    A
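Stacking the two pieces puts the Col7 == 1 rows before the Col7 == 0 rows. If the original row order matters, one option is to start from the first column set and overwrite only the rows that need the second set. This is a minimal sketch; the names `set1`, `set2`, `use_second`, and `res` are made up for illustration.

```r
# Example data reconstructed from the question
df <- data.frame(
  Name1 = c("John", "Alex"),
  Col1 = c("A", "B"), Col2 = c("A", "B"), Col3 = c("A", "B"),
  Name2 = c("Alex", "John"),
  Col4 = c("B", "A"), Col5 = c("B", "A"), Col6 = c("B", "A"),
  Col7 = c(1L, 0L),
  stringsAsFactors = FALSE
)

set1 <- c("Name1", "Col1", "Col2", "Col3")
set2 <- c("Name2", "Col4", "Col5", "Col6")

# Rows that should take their values from the second column set
use_second <- df$Col7 == 0

# Start with the first set everywhere, then overwrite the flagged rows
# (the replacement is matched to columns by position, not by name)
res <- df[, set1]
res[use_second, ] <- df[use_second, set2]
```

Because the replacement is positional, `set1` and `set2` have to list corresponding columns in the same order.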

edited Mar 2, 2018 at 20:55

answered Mar 2, 2018 at 20:49
