
Hello, folks!

I have been trying to solve the following problem, which I thought would be pretty simple. Perhaps it is (for some of you), but I haven't managed to solve it yet. What I want is to modify all zeros and ones in columns 6 to 10, replacing each 0 with the value from the fourth column (allele1) and each 1 with the value from the fifth column (allele2), row by row.

Here is a reproducible example:

# Creating dataframe vectors
chr= rep(10,10)
id= paste0("name", 1:10)
pos= seq(1,1000, length.out = 10)
allele1= c("T","T","G","G","C","T","C","C","G","C")
allele2= c("A","A","T","T","C","T","C","C","T","T")
col6= sample(c(0,1),10, TRUE)
col7= sample(c(0,1),10, TRUE)
col8= sample(c(0,1),10, TRUE)
col9= sample(c(0,1),10, TRUE)
col10= sample(c(0,1),10, TRUE)

df= data.frame(chr,id, pos, allele1, allele2, col6, col7, col8, col9, col10)
df

   chr     id  pos allele1 allele2 col6 col7 col8 col9 col10
1   10  name1    1       T       A    1    1    1    1     1
2   10  name2  112       T       A    0    0    0    1     1
3   10  name3  223       G       T    1    0    1    1     0
4   10  name4  334       G       T    1    1    0    1     1
5   10  name5  445       C       C    0    0    1    0     1
6   10  name6  556       T       T    0    1    0    1     1
7   10  name7  667       C       C    0    1    0    0     1
8   10  name8  778       C       C    0    0    1    1     1
9   10  name9  889       G       T    1    1    1    1     0
10  10 name10 1000       C       T    0    1    1    0     1

Given this output, I would expect:

df
   chr     id  pos allele1 allele2 col6 col7 col8 col9 col10
1   10  name1    1       T       A    A    A    A    A     A
2   10  name2  112       T       A    T    T    T    A     A
3   10  name3  223       G       T    T    G    T    T     G
4   10  name4  334       G       T    T    T    G    T     T
5   10  name5  445       C       C    C    C    C    C     C
6   10  name6  556       T       T    T    T    T    T     T
7   10  name7  667       C       C    C    C    C    C     C
8   10  name8  778       C       C    C    C    C    C     C
9   10  name9  889       G       T    T    T    T    T     G
10  10 name10 1000       C       T    C    T    T    C     T

I have tried using the functions 'within' and 'apply' inside a for loop, but it seems I am indexing incorrectly. I bet this task is much easier in Perl, but I'd really like to use R for practice.

Here's an example of the code I've tried:

within(df, {
  for(i in 1:nrow(df)){
  df[i,6:length(df)]= ifelse(df[i,6:length(df)] == 0, df[i,4],df[i,5])
  }
})

for(i in 1:nrow(df)){
  df[,6:length(df)]= apply(df[,6:length(df)]==0,2,ifelse,df[i,4],df[i,5])
}

I would appreciate any help!

Sincerely yours

2 Answers

Solution 1

We can use mutate_at from the dplyr package. df2 is the final output.

# Load package
library(dplyr)

# Process the data
df2 <- df %>%
  mutate_at(.vars = vars(contains("col")), 
            .funs = function(Col){
              Col2 <- ifelse(Col == 1, allele2, allele1)
              return(Col2)
            })
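
For readers on a newer dplyr: the scoped verbs such as mutate_at were superseded by across() in dplyr 1.0.0. The following is a minimal sketch of the same idea, assuming dplyr >= 1.0.0 and a small hypothetical toy data frame (not the question's random data) so that the result is deterministic. The names toy and toy2 are illustrative only.

```r
library(dplyr)

# Hypothetical toy data: two allele columns plus two 0/1 indicator columns
toy <- data.frame(
  id      = paste0("name", 1:4),
  allele1 = c("T", "G", "C", "G"),
  allele2 = c("A", "T", "C", "T"),
  col6    = c(1, 0, 0, 1),
  col7    = c(0, 0, 1, 1),
  stringsAsFactors = FALSE  # keep the alleles as character, not factor
)

# For every column whose name starts with "col":
# 1 -> value of allele2 in that row, 0 -> value of allele1 in that row
toy2 <- toy %>%
  mutate(across(starts_with("col"), ~ ifelse(.x == 1, allele2, allele1)))
```

The lambda sees each selected column as .x, while allele1 and allele2 are looked up in the same data frame, so the replacement stays row-wise.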

Solution 2

We can use functions from both tidyr and dplyr. df3 is the final output.

library(dplyr)
library(tidyr)
df3 <- df %>%
  mutate(allele1 = as.character(allele1), allele2 = as.character(allele2)) %>%
  gather(Col, Value, contains("col")) %>%
  mutate(Value = ifelse(Value == 1, allele2, allele1)) %>%
  spread(Col, Value) %>%
  select(colnames(df))
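
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. Here is a sketch of the same reshape-replace-reshape approach with those functions, again on a hypothetical toy data frame (the names toy and toy3 are assumptions for illustration). Note that the result is a tibble rather than a plain data.frame.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two allele columns plus two 0/1 indicator columns
toy <- data.frame(
  id      = paste0("name", 1:4),
  allele1 = c("T", "G", "C", "G"),
  allele2 = c("A", "T", "C", "T"),
  col6    = c(1, 0, 0, 1),
  col7    = c(0, 0, 1, 1),
  stringsAsFactors = FALSE
)

toy3 <- toy %>%
  # Long format: one row per (id, indicator column) pair
  pivot_longer(cols = starts_with("col"), names_to = "Col", values_to = "Value") %>%
  # Replace each indicator by the matching allele of its row
  mutate(Value = ifelse(Value == 1, allele2, allele1)) %>%
  # Back to wide format, one column per original indicator column
  pivot_wider(names_from = Col, values_from = Value)
```

Unlike spread, pivot_wider creates the new columns in order of first appearance, so the final select(colnames(df)) step should not be needed here.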

Data Preparation

# Set seed for reproducibility
set.seed(123)

# Creating dataframe vectors
chr= rep(10,10)
id= paste0("name", 1:10)
pos= seq(1,1000, length.out = 10)
allele1= c("T","T","G","G","C","T","C","C","G","C")
allele2= c("A","A","T","T","C","T","C","C","T","T")
col6= sample(c(0,1),10, TRUE)
col7= sample(c(0,1),10, TRUE)
col8= sample(c(0,1),10, TRUE)
col9= sample(c(0,1),10, TRUE)
col10= sample(c(0,1),10, TRUE)

df= data.frame(chr,id, pos, allele1, allele2, col6, col7, col8, col9, col10)

3 Comments

Awesome! Thanks a lot!! Actually, I had to replace ".vars" with ".cols" because of an error asking for that argument, but it worked perfectly. If someone knows other ways to do this, I'd be equally grateful, since I have only recently started learning, practicing and using R.
.cols is deprecated in the latest version of dplyr. If you update your dplyr to 0.7.1, .vars is the recommended argument to use.
Indeed, I installed this package at the very beginning and haven't used or updated it since. Thanks again, @ycw !
2

You could try the following:

chr= rep(10,10)
id= paste0("name", 1:10)
pos= seq(1,1000, length.out = 10)
allele1= c("T","T","G","G","C","T","C","C","G","C")
allele2= c("A","A","T","T","C","T","C","C","T","T")
set.seed(1) #for reproducibility
col6= sample(c(0,1),10, TRUE)
col7= sample(c(0,1),10, TRUE)
col8= sample(c(0,1),10, TRUE)
col9= sample(c(0,1),10, TRUE)
col10= sample(c(0,1),10, TRUE)

df= data.frame(chr,id, pos, allele1, allele2, col6, col7, col8, col9, col10, stringsAsFactors = FALSE)

Note that, as mentioned in the comments (thanks @ycw), I set stringsAsFactors = FALSE here to avoid the conversion to factors. Otherwise ifelse returns the integer factor codes instead of the characters.
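
To see the factor issue in isolation, here is a small sketch with hypothetical vectors (a1, a2 and flag are made up for illustration). Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so this mainly matters on older R versions or when the columns are factors for some other reason.

```r
# Hypothetical allele vectors stored as factors
a1   <- factor(c("T", "G", "C"))
a2   <- factor(c("A", "T", "C"))
flag <- c(1, 0, 1)

# ifelse drops the factor attributes, so this yields integer level codes
codes <- ifelse(flag == 0, a1, a2)

# Converting to character first gives the allele letters
alleles <- ifelse(flag == 0, as.character(a1), as.character(a2))
alleles
# [1] "A" "G" "C"
```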

> df
   chr     id  pos allele1 allele2 col6 col7 col8 col9 col10
1   10  name1    1       T       A    0    0    1    0     1
2   10  name2  112       T       A    0    0    0    1     1
3   10  name3  223       G       T    1    1    1    0     1
4   10  name4  334       G       T    1    0    0    0     1
5   10  name5  445       C       C    0    1    0    1     1
6   10  name6  556       T       T    1    0    0    1     1
7   10  name7  667       C       C    1    1    0    1     0
8   10  name8  778       C       C    1    1    0    0     0
9   10  name9  889       G       T    1    0    1    1     1
10  10 name10 1000       C       T    0    1    0    0     1

df[, c(6:10)] <- lapply(df[, c(6:10)], function(x) ifelse(x == 0, df[, 4], df[, 5]))

> df
   chr     id  pos allele1 allele2 col6 col7 col8 col9 col10
1   10  name1    1       T       A    T    T    A    T     A
2   10  name2  112       T       A    T    T    T    A     A
3   10  name3  223       G       T    T    T    T    G     T
4   10  name4  334       G       T    T    G    G    G     T
5   10  name5  445       C       C    C    C    C    C     C
6   10  name6  556       T       T    T    T    T    T     T
7   10  name7  667       C       C    C    C    C    C     C
8   10  name8  778       C       C    C    C    C    C     C
9   10  name9  889       G       T    T    G    T    T     T
10  10 name10 1000       C       T    C    T    C    C     T

2 Comments

Good answer. But be careful: if allele1 and allele2 are factors, this solution fills in integers. Convert allele1 and allele2 to character before running this code.
Ah yes, I did that in my R session but forgot to add it to the solution! Thank you!
