3
Name1    Col1    Col2    Col3    Name2    Col4    Col5    Col6    Col7
John     A       A       A       Alex     B       B       B       1
Alex     B       B       B       John     A       A       A       0

Looking at the data frame above, I would like to select columns based on the value of Col7. Specifically, if Col7 = 1, I want to select Cols 1, 2, and 3 (along with the matching name column); if Col7 = 0, I want Cols 4, 5, and 6 instead. Cols 4, 5, and 6 hold the same variables as Cols 1, 2, and 3, just associated with the other person in the pair (Alex instead of John in row 1). Therefore, John's data is selected both times, and the same will hold for every pair.

I was thinking some form of select in dplyr would work, but I am having trouble with the conditional selection.

My final data frame would appear as follows:

Name1    Col1    Col2    Col3
John     A       A       A
John     A       A       A

4 Answers

2

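Try something very basic, combining filter and select:

library(dplyr)

# Rows where Col7 == 1: keep the first set of columns
df1 <- df %>% 
  filter(Col7 == 1) %>% 
  select(Name = Name1, Col1, Col2, Col3)
# Rows where Col7 == 0: keep the second set, renamed to match the first
df2 <- df %>% 
  filter(Col7 == 0) %>% 
  select(Name = Name2, Col1 = Col4, Col2 = Col5, Col3 = Col6)
# Stack the two pieces (the Col7 == 1 rows come first)
df <- bind_rows(df1, df2)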

This gives exactly the data frame you are after:

> df
  Name Col1 Col2 Col3
1 John    A    A    A
2 John    A    A    A
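
If the original row order matters, note that the filter-and-bind approach above returns the Col7 == 1 rows first. One possible alternative is to choose between the two column sets row by row with if_else(). This is only a minimal sketch: it rebuilds the question's data as a tibble so it runs on its own, and the name res is my own choice, not something from the question.

```r
library(dplyr)

# The question's data, rebuilt here so the sketch is self-contained
df <- tibble(
  Name1 = c("John", "Alex"),
  Col1 = c("A", "B"), Col2 = c("A", "B"), Col3 = c("A", "B"),
  Name2 = c("Alex", "John"),
  Col4 = c("B", "A"), Col5 = c("B", "A"), Col6 = c("B", "A"),
  Col7 = c(1L, 0L)
)

# For each row, take the first set when Col7 == 1 and the second set otherwise.
# transmute() keeps only the columns created here, in the original row order.
res <- df %>%
  transmute(
    Name = if_else(Col7 == 1, Name1, Name2),
    Col1 = if_else(Col7 == 1, Col1, Col4),
    Col2 = if_else(Col7 == 1, Col2, Col5),
    Col3 = if_else(Col7 == 1, Col3, Col6)
  )
```

if_else() needs both branches to have the same type, so this assumes each pair of columns (Col1/Col4 and so on) is stored the same way, which holds for the character columns here.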

1

You can use melt from data.table and then do an inner join on the condition:

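library(data.table)
setDT(d)  # d is the question's data frame; convert it to a data.table in place

# Keep a row id so each original row can be tracked after reshaping
d[, row := .I]
# Melt both column sets at once: column positions 1:4 pair up with 5:8
md = melt(d, id.vars = c("row", "Col7"), 
  measure.vars = Map(c, 1:4, 5:8), 
  variable.factor = FALSE,
  variable.name = "colset",
  value.name = names(d)[1:4])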
#    row Col7 colset Name1 Col1 Col2 Col3
# 1:   1    1      1  John    A    A    A
# 2:   2    0      1  Alex    B    B    B
# 3:   1    1      2  Alex    B    B    B
# 4:   2    0      2  John    A    A    A

cond = data.table(Col7 = 0:1, colset = c("2", "1"))
#    Col7 colset
# 1:    0      2
# 2:    1      1

res = md[cond, on=names(cond), nomatch=0]
#    row Col7 colset Name1 Col1 Col2 Col3
# 1:   2    0      2  John    A    A    A
# 2:   1    1      1  John    A    A    A

This approach extends to more than two sets of columns, e.g. by passing Map(c, 1:4, 5:8, 9:12) as the measure columns and giving cond one row per value of the selector column.
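
As a rough illustration of the three-set case, here is a sketch on made-up data. The table d3, its column names (Name1/Val1 and so on), and the selector column pick are all hypothetical; only the melt-then-join pattern is taken from the code above.

```r
library(data.table)

# Hypothetical data: three column sets plus a selector `pick` (1, 2 or 3)
d3 <- data.table(
  Name1 = c("John", "Alex", "Mary"), Val1 = c("A", "B", "C"),
  Name2 = c("Alex", "Mary", "John"), Val2 = c("B", "C", "A"),
  Name3 = c("Mary", "John", "Alex"), Val3 = c("C", "A", "B"),
  pick  = c(1L, 3L, 2L)
)
d3[, row := .I]

# One measure group per output column:
# positions 1, 3, 5 hold the names and positions 2, 4, 6 hold the values
md3 <- melt(d3, id.vars = c("row", "pick"),
            measure.vars = Map(c, 1:2, 3:4, 5:6),
            variable.factor = FALSE,
            variable.name = "colset",
            value.name = c("Name", "Val"))

# Lookup table: which column set to keep for each value of pick
cond3 <- data.table(pick = 1:3, colset = c("1", "2", "3"))

# Inner join keeps only the (pick, colset) combinations listed in cond3
res3 <- md3[cond3, on = names(cond3), nomatch = 0]
setorder(res3, row)  # restore the original row order
```

In this made-up data every row points at the set holding John's values, so res3 should contain John / A three times. A different rule for which set to keep only requires changing cond3.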


1

Here's a tidyverse (more tidyr than dplyr) approach. It's fairly verbose, as your original data is not in a tidy form, so most of the code is just getting to a long form, cleaning, and spreading back to wide form.

library(tidyverse)

df <- tibble(Name1 = c("John", "Alex"), 
             Col1 = c("A", "B"), Col2 = c("A", "B"), Col3 = c("A", "B"), 
             Name2 = c("Alex", "John"), 
             Col4 = c("B", "A"), Col5 = c("B", "A"), Col6 = c("B", "A"), 
             Col7 = c(1L, 0L))

df %>% 
    # reshape to long form
    gather(col, col_val, num_range('Col', 1:6)) %>% 
    gather(name_var, name, contains('Name')) %>% 
    # clean, subset, clean for spreading
    mutate(col = parse_number(col), 
           name_var = parse_number(name_var)) %>% 
    filter(ifelse(Col7 == 1, 
                  col %in% 1:3 & name_var == 1, 
                  col %in% 4:6 & name_var == 2)) %>% 
    # fold column numbers 1:3 and 4:6 onto Col1..Col3, keeping their order
    mutate(col = paste0('Col', (col - 1) %% 3 + 1), 
           name_var = 'Name') %>% 
    # reshape back to wide form
    spread(name_var, name) %>% 
    spread(col, col_val) %>% 
    # clean
    select(-Col7)
#> # A tibble: 2 x 4
#>   Name  Col1  Col2  Col3 
#>   <chr> <chr> <chr> <chr>
#> 1 John  A     A     A    
#> 2 John  A     A     A


1

In base R:

df <- read.table(text = "Name1    Col1    Col2    Col3    Name2    Col4    Col5    Col6    Col7
John     A       A       A       Alex       B       B       B      1
Alex     B       B       B       John       A       A       A      0", 
                 header = TRUE, stringsAsFactors = FALSE)


# Logical indexing keeps every row with a given Col7 value
df1 <- df[df$Col7 == 1, c("Name1", "Col1", "Col2", "Col3")]
df2 <- df[df$Col7 == 0, c("Name2", "Col4", "Col5", "Col6")]

# Rename the second set so rbind() can stack it under the first
colnames(df2) <- colnames(df1)

df <- rbind(df1, df2)

   Name1 Col1 Col2 Col3
1  John    A    A    A
2  John    A    A    A
