My data frame looks like this:

      A S1 S2 S3 S4
1   ex1  1  0  0  0
2   ex2  0  1  0  0
3   ex3  0  0  1  0
4   ex4  1  0  0  0
5   ex5  0  0  0  1
6   ex6  0  1  0  0
7   ex7  1  0  0  0
8   ex8  0  1  0  0
9   ex9  0  0  1  0
10 ex10  1  0  0  0

I need to convert it to a single factor column, like this:

A   Type
ex1 S1
ex2 S2
ex3 S3
ex4 S1
ex5 S4
ex6 S2
ex7 S1
ex8 S2
ex9 S3
ex10 S1

Can anybody help with this problem?

4 Answers

Assuming d is the data frame and every row contains exactly one 1, the new column can be obtained with:

d$type <- names(d[-1])[apply(d[-1] == 1, 1, which)]
d[c(1, 6)]
#       A type
# 1   ex1   S1
# 2   ex2   S2
# 3   ex3   S3
# 4   ex4   S1
# 5   ex5   S4
# 6   ex6   S2
# 7   ex7   S1
# 8   ex8   S2
# 9   ex9   S3
# 10 ex10   S1
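
To try this on the question's data and get an actual factor (which is what the question asks for), here is a minimal sketch. The way d is rebuilt below is an assumption that mirrors the table in the question, and the approach still relies on each row having exactly one 1; otherwise apply() would return a list instead of a vector.

```r
# Rebuild the question's data (assumed layout: label column A followed by
# indicator columns S1..S4, exactly one 1 per row).
expected <- c("S1", "S2", "S3", "S1", "S4", "S2", "S1", "S2", "S3", "S1")
d <- data.frame(A = paste0("ex", 1:10), stringsAsFactors = FALSE)
for (s in paste0("S", 1:4)) d[[s]] <- as.integer(expected == s)

# Position of the single 1 in each row, counted within columns S1..S4
idx <- apply(d[-1] == 1, 1, which)

# Map positions to column names and store the result as a factor
d$type <- factor(names(d)[2:5][idx], levels = names(d)[2:5])

d[c("A", "type")]
```

Fixing the levels to all four indicator names keeps a level even if no row happens to use it.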

5 Comments

+1, nice! Just one question: is it fully vectorized although you also use apply in your answer?
Well, not fully. But a lot of the time you can put sapply and apply in there to make it faster.
Hm, I can imagine that using sapply instead of apply (where possible) is faster, but what do you mean by using apply to make it faster? Not sure I understand.
I mean when it's between the [ and ].
Yeah, I just figured that's what you mean. You're right there - good point.
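
On the vectorization point: if avoiding apply altogether matters, base R's max.col() is one option. This is a sketch of a different technique from the answer above, not the answerer's method; the data frame d is rebuilt here from the question's table as an assumption.

```r
# Question's data, reconstructed for a self-contained example
d <- data.frame(
  A  = paste0("ex", 1:10),
  S1 = c(1, 0, 0, 1, 0, 0, 1, 0, 0, 1),
  S2 = c(0, 1, 0, 0, 0, 1, 0, 1, 0, 0),
  S3 = c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
  S4 = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
)

# max.col() returns, for every row of a matrix, the column index of the
# row maximum; ties.method = "first" makes the result deterministic.
m <- as.matrix(d[-1])
type <- colnames(m)[max.col(m, ties.method = "first")]

data.frame(A = d$A, Type = type)
```

Note that a row of all zeros would silently be labelled with the first column, so this only fits data where each row really has one 1.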
You can use apply to find the position of the maximum in columns 2 to 5 of each row and then return the corresponding column name:

df$Type <- apply(df[2:5], 1, function(x) names(df)[which.max(x) + 1])

Afterwards, you can delete the columns you don't need anymore:

df <- df[,-c(2:5)]

2 Comments

It works for small datasets with 5 columns, but with a large dataset it throws the error "object of type 'closure' is not subsettable". Any idea?
Not sure, did you check that all relevant columns are either logical or numeric or integer? As long as the input is the same type, there should be no problems with larger data sets.
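
One common cause of that message, offered as a guess since the failing code isn't shown: if no data frame named df exists in the session (for example, because the large dataset was loaded under another name), then df refers to the built-in F-density function stats::df, and subsetting a function fails with exactly this kind of error. A small sketch that reproduces it:

```r
# stats::df is a function (a "closure"), so subsetting it raises an error.
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(stats::df[2:5], error = function(e) conditionMessage(e))
print(msg)

# A quick sanity check before running the answer's code on your own object:
is.data.frame(stats::df)  # FALSE for the built-in function
```

Checking is.data.frame() on your own object before the apply call tells you whether you are working with the data or with the function.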
You could also use reshape2 (assuming dat is your data set):

library(reshape2)
dat <- melt(dat, id = "A")
dat[dat$value > 0, 1:2]

If df is the data frame, you could try:

data.frame(A=df$A, Type=rep(names(df)[-1], nrow(df))[!!t(df[,-1])])
      A Type
1   ex1   S1
2   ex2   S2
3   ex3   S3
4   ex4   S1
5   ex5   S4
6   ex6   S2
7   ex7   S1
8   ex8   S2
9   ex9   S3
10 ex10   S1

Also:

   names(df)[-1][t(df[-1])*seq_len(ncol(df)-1)]
 [1] "S1" "S2" "S3" "S1" "S4" "S2" "S1" "S2" "S3" "S1"
