Replace multiple column values with values from other columns if pattern matches (row-wise) in R

Question

Hello, folks!

I have tried to find a solution to the following problem, which I think should be pretty simple. Perhaps it is (for some of you), but I haven't been able to solve it yet. What I want is to modify all zeros and ones in columns 6 to 10, row by row: each 0 should be replaced by that row's value in column 4 (allele1), and each 1 by that row's value in column 5 (allele2).
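Put differently, the rule is a per-row lookup: code 0 picks allele1 and code 1 picks allele2. A minimal sketch of that idea for a single row, using the values of row name3 from the example data (allele1 = "G", allele2 = "T", codes 1 0 1 1 0); the variable names are illustrative only:

```r
# One row: position 1 holds allele1, position 2 holds allele2
alleles <- c("G", "T")
codes   <- c(1, 0, 1, 1, 0)   # the 0/1 values of col6..col10 for that row

# Adding 1 turns code 0 into index 1 (allele1) and code 1 into index 2 (allele2)
alleles[codes + 1]
# [1] "T" "G" "T" "T" "G"
```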

Here is a reproducible example:

# Creating dataframe vectors
chr= rep(10,10)
id= paste0("name", 1:10)
pos= seq(1,1000, length.out = 10)
allele1= c("T","T","G","G","C","T","C","C","G","C")
allele2= c("A","A","T","T","C","T","C","C","T","T")
col6= sample(c(0,1),10, TRUE)
col7= sample(c(0,1),10, TRUE)
col8= sample(c(0,1),10, TRUE)
col9= sample(c(0,1),10, TRUE)
col10= sample(c(0,1),10, TRUE)

df= data.frame(chr,id, pos, allele1, allele2, col6, col7, col8, col9, col10)
df

   chr     id  pos allele1 allele2 col6 col7 col8 col9 col10
1   10  name1    1       T       A    1    1    1    1     1
2   10  name2  112       T       A    0    0    0    1     1
3   10  name3  223       G       T    1    0    1    1     0
4   10  name4  334       G       T    1    1    0    1     1
5   10  name5  445       C       C    0    0    1    0     1
6   10  name6  556       T       T    0    1    0    1     1
7   10  name7  667       C       C    0    1    0    0     1
8   10  name8  778       C       C    0    0    1    1     1
9   10  name9  889       G       T    1    1    1    1     0
10  10 name10 1000       C       T    0    1    1    0     1

Given this input, I would expect:

df
   chr     id  pos allele1 allele2 col6 col7 col8 col9 col10
1   10  name1    1       T       A    A    A    A    A     A
2   10  name2  112       T       A    T    T    T    A     A
3   10  name3  223       G       T    T    G    T    T     G
4   10  name4  334       G       T    T    T    G    T     T
5   10  name5  445       C       C    C    C    C    C     C
6   10  name6  556       T       T    T    T    T    T     T
7   10  name7  667       C       C    C    C    C    C     C
8   10  name8  778       C       C    C    C    C    C     C
9   10  name9  889       G       T    T    T    T    T     G
10  10 name10 1000       C       T    C    T    T    C     T

I have tried using the functions 'within' and 'apply' inside a for loop, but it seems I am indexing incorrectly. I bet this task is much easier in Perl, but I'd really like to use R for practice.

Here's an example of the code I've tried:

within(df, {
  for(i in 1:nrow(df)){
  df[i,6:length(df)]= ifelse(df[i,6:length(df)] == 0, df[i,4],df[i,5])
  }
})

for(i in 1:nrow(df)){
  df[,6:length(df)]= apply(df[,6:length(df)]==0,2,ifelse,df[i,4],df[i,5])
}
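Both attempts go wrong in ways that can be named: within() evaluates the loop in its own environment, so df[i, ...] = ... only modifies a local copy (and the result of within() is never assigned back), while the second loop takes df[i, 4] and df[i, 5], the alleles of row i alone, and applies them to every row, overwriting the previous result on each pass. A corrected row-wise loop might look like the sketch below. It assumes, as in the question, that columns 4 and 5 hold allele1/allele2 and columns 6 to 10 hold the 0/1 codes; to keep it self-contained it rebuilds only the first three rows of the example data.

```r
# First three rows of the example data shown above
df <- data.frame(
  chr = 10, id = paste0("name", 1:3), pos = c(1, 112, 223),
  allele1 = c("T", "T", "G"), allele2 = c("A", "A", "T"),
  col6 = c(1, 0, 1), col7 = c(1, 0, 0), col8 = c(1, 0, 1),
  col9 = c(1, 1, 1), col10 = c(1, 1, 0),
  stringsAsFactors = FALSE
)

out <- df                                  # write into a copy, read codes from df
for (i in seq_len(nrow(df))) {
  codes <- unlist(df[i, 6:10])             # 0/1 codes of row i only
  # as.character() guards against allele columns stored as factors
  out[i, 6:10] <- ifelse(codes == 0,
                         as.character(df$allele1[i]),
                         as.character(df$allele2[i]))
}
out
```

The loop works, but it recodes one row at a time; the answers below do the same job column-wise in a single pass.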

I would appreciate any help!

Sincerely yours

www · Accepted Answer · 2017-07-14 21:35:02Z

2

Solution 1

We can use mutate_at from the dplyr package. df2 is the final output.

# Load package
library(dplyr)

# Process the data
df2 <- df %>%
  mutate_at(.vars = vars(contains("col")), 
            .funs = function(Col){
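              # Take the alleles from df explicitly (as character, in case they are
              # factors); a bare allele1/allele2 here is looked up where this function
              # was defined (the global vectors), not in df
              Col2 <- ifelse(Col == 1, as.character(df$allele2), as.character(df$allele1))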
              return(Col2)
            })
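mutate_at() still works, but since dplyr 1.0.0 it has been superseded by across(). A sketch of the same recoding with across(), assuming dplyr 1.0 or later; because the lambda is written inside mutate(), allele1 and allele2 refer to the columns of the data frame being modified. The three-row data frame is just the first rows of the question's example.

```r
library(dplyr)

# First three rows of the question's example data
df <- data.frame(
  chr = 10, id = paste0("name", 1:3), pos = c(1, 112, 223),
  allele1 = c("T", "T", "G"), allele2 = c("A", "A", "T"),
  col6 = c(1, 0, 1), col7 = c(1, 0, 0), col8 = c(1, 0, 1),
  col9 = c(1, 1, 1), col10 = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# Recode every column whose name starts with "col": 1 -> allele2, 0 -> allele1
df2 <- df %>%
  mutate(across(starts_with("col"), ~ ifelse(.x == 1, allele2, allele1)))
df2
```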

Solution 2

We can also combine tidyr and dplyr: reshape the code columns to long format, recode them, then reshape back to wide. df3 is the final output.

library(dplyr)
library(tidyr)
df3 <- df %>%
  mutate(allele1 = as.character(allele1), allele2 = as.character(allele2)) %>%
  gather(Col, Value, contains("col")) %>%
  mutate(Value = ifelse(Value == 1, allele2, allele1)) %>%
  spread(Col, Value) %>%
  select(colnames(df))
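In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same long-then-wide approach with the newer verbs, assuming tidyr 1.0 or later and character allele columns (here guaranteed by stringsAsFactors = FALSE on a three-row excerpt of the question's data):

```r
library(dplyr)
library(tidyr)

# First three rows of the question's example data
df <- data.frame(
  chr = 10, id = paste0("name", 1:3), pos = c(1, 112, 223),
  allele1 = c("T", "T", "G"), allele2 = c("A", "A", "T"),
  col6 = c(1, 0, 1), col7 = c(1, 0, 0), col8 = c(1, 0, 1),
  col9 = c(1, 1, 1), col10 = c(1, 1, 0),
  stringsAsFactors = FALSE
)

df3 <- df %>%
  # one row per (id, code column) pair
  pivot_longer(starts_with("col"), names_to = "Col", values_to = "Value") %>%
  # recode within the long table: 1 -> allele2, 0 -> allele1
  mutate(Value = ifelse(Value == 1, allele2, allele1)) %>%
  # back to one column per code column
  pivot_wider(names_from = Col, values_from = Value)
df3
```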

Data Preparation

# Set seed for reproducibility
set.seed(123)

# Creating dataframe vectors
chr= rep(10,10)
id= paste0("name", 1:10)
pos= seq(1,1000, length.out = 10)
allele1= c("T","T","G","G","C","T","C","C","G","C")
allele2= c("A","A","T","T","C","T","C","C","T","T")
col6= sample(c(0,1),10, TRUE)
col7= sample(c(0,1),10, TRUE)
col8= sample(c(0,1),10, TRUE)
col9= sample(c(0,1),10, TRUE)
col10= sample(c(0,1),10, TRUE)

df= data.frame(chr,id, pos, allele1, allele2, col6, col7, col8, col9, col10)

edited Jul 14, 2017 at 21:35

answered Jul 14, 2017 at 21:21

www



3 Comments

Cainã Max Couto da Silva Over a year ago

Awesome! Thanks a lot!! Actually, I had to replace .vars with .cols because of an error asking for that argument, but then it worked perfectly. If anyone knows other ways to do this, I'd be equally grateful, since I only recently started learning and using R.

www Over a year ago

.cols is deprecated in recent versions of dplyr. If you update dplyr to 0.7.1, .vars is the recommended argument to use.

Cainã Max Couto da Silva Over a year ago

Indeed, I installed this package when I first started with R and haven't updated it since. Thanks again, @ycw!

User2321 · Accepted Answer · 2017-07-14 21:31:14Z

2

You could try the following:

chr= rep(10,10)
id= paste0("name", 1:10)
pos= seq(1,1000, length.out = 10)
allele1= c("T","T","G","G","C","T","C","C","G","C")
allele2= c("A","A","T","T","C","T","C","C","T","T")
set.seed(1) #for reproducibility
col6= sample(c(0,1),10, TRUE)
col7= sample(c(0,1),10, TRUE)
col8= sample(c(0,1),10, TRUE)
col9= sample(c(0,1),10, TRUE)
col10= sample(c(0,1),10, TRUE)

df= data.frame(chr,id, pos, allele1, allele2, col6, col7, col8, col9, col10, stringsAsFactors = FALSE)

Note that, as mentioned in the comments (thanks @ycw), I set stringsAsFactors = FALSE here to avoid the factor conversion. Otherwise ifelse returns the factors' integer codes instead of the allele letters.
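The integer problem is easy to reproduce in isolation. In the sketch below, two small factors stand in for allele1 and allele2 (the values are made up for illustration): ifelse() returns the underlying integer codes, and converting to character first gives back the letters. Since R 4.0.0, data.frame() no longer turns strings into factors by default, but the question's code relies on the older default.

```r
# Hypothetical stand-ins for allele1 and allele2 stored as factors
a1 <- factor(c("G", "T"))   # levels: G, T
a2 <- factor(c("T", "A"))   # levels: A, T
codes <- c(0, 1)

# With factors, ifelse() fills in integer codes, not letters
ifelse(codes == 0, a1, a2)
# [1] 1 1

# Converting to character first gives the intended alleles
ifelse(codes == 0, as.character(a1), as.character(a2))
# [1] "G" "A"
```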

> df
   chr     id  pos allele1 allele2 col6 col7 col8 col9 col10
1   10  name1    1       T       A    0    0    1    0     1
2   10  name2  112       T       A    0    0    0    1     1
3   10  name3  223       G       T    1    1    1    0     1
4   10  name4  334       G       T    1    0    0    0     1
5   10  name5  445       C       C    0    1    0    1     1
6   10  name6  556       T       T    1    0    0    1     1
7   10  name7  667       C       C    1    1    0    1     0
8   10  name8  778       C       C    1    1    0    0     0
9   10  name9  889       G       T    1    0    1    1     1
10  10 name10 1000       C       T    0    1    0    0     1

df[, c(6:10)] <- lapply(df[, c(6:10)], function(x) ifelse(x == 0, df[, 4], df[, 5]))

> df
   chr     id  pos allele1 allele2 col6 col7 col8 col9 col10
1   10  name1    1       T       A    T    T    A    T     A
2   10  name2  112       T       A    T    T    T    A     A
3   10  name3  223       G       T    T    T    T    G     T
4   10  name4  334       G       T    T    G    G    G     T
5   10  name5  445       C       C    C    C    C    C     C
6   10  name6  556       T       T    T    T    T    T     T
7   10  name7  667       C       C    C    C    C    C     C
8   10  name8  778       C       C    C    C    C    C     C
9   10  name9  889       G       T    T    G    T    T     T
10  10 name10 1000       C       T    C    T    C    C     T
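Since the asker also asked for other ways, here is a fully vectorized base-R alternative that is not part of the original answers: a sketch assuming the 0/1 codes sit in columns 6 to 10 and the alleles are character. It builds a two-column allele matrix and indexes it with a matrix of (row, code + 1) pairs, so no loop or per-column function is needed.

```r
# First three rows of the question's example data
df <- data.frame(
  chr = 10, id = paste0("name", 1:3), pos = c(1, 112, 223),
  allele1 = c("T", "T", "G"), allele2 = c("A", "A", "T"),
  col6 = c(1, 0, 1), col7 = c(1, 0, 0), col8 = c(1, 0, 1),
  col9 = c(1, 1, 1), col10 = c(1, 1, 0),
  stringsAsFactors = FALSE
)

codes   <- as.matrix(df[, 6:10])            # n x 5 matrix of 0/1 codes
alleles <- cbind(df$allele1, df$allele2)    # n x 2: column 1 = allele1, column 2 = allele2
                                            # (factors here would give integer codes)

# One (row, allele column) pair per cell, in column-major order
idx <- cbind(as.vector(row(codes)), as.vector(codes) + 1)

# Look the alleles up and refill the code columns in the same order
df[, 6:10] <- matrix(alleles[idx], nrow = nrow(codes))
df
```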

edited Jul 14, 2017 at 21:31

answered Jul 14, 2017 at 21:28

User2321


2 Comments

www Over a year ago

Good answer. But be careful: because allele1 and allele2 are factors, this solution fills in integers. Convert allele1 and allele2 to character before running this code.

User2321 Over a year ago

Ah yes, I did that in my R session but forgot to add it to the solution! Thank you!
