Converting multiple boolean columns to single factor column

Question

My data frame looks like this:

      A S1 S2 S3 S4
1   ex1  1  0  0  0
2   ex2  0  1  0  0
3   ex3  0  0  1  0
4   ex4  1  0  0  0
5   ex5  0  0  0  1
6   ex6  0  1  0  0
7   ex7  1  0  0  0
8   ex8  0  1  0  0
9   ex9  0  0  1  0
10 ex10  1  0  0  0

I need to convert it to a single factor column, like this:

A   Type
ex1 S1
ex2 S2
ex3 S3
ex4 S1
ex5 S4
ex6 S2
ex7 S1
ex8 S2
ex9 S3
ex10 S1

Can anybody help with this?

Rich Scriven · Accepted Answer · 2019-02-27 22:37:48Z

3

Assuming d is the data, the new column could be obtained with

d$type <- names(d[-1])[apply(d[-1] == 1, 1, which)]
d[c(1, 6)]
#       A type
# 1   ex1   S1
# 2   ex2   S2
# 3   ex3   S3
# 4   ex4   S1
# 5   ex5   S4
# 6   ex6   S2
# 7   ex7   S1
# 8   ex8   S2
# 9   ex9   S3
# 10 ex10   S1
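
Since the question asks for a factor and the one-liner above yields a character column, here is a minimal sketch of a variant that rebuilds the example data, checks the assumption the one-liner relies on (exactly one 1 per row), and wraps the result in factor(). The name flags is only illustrative, and the level order is an assumption taken from the column order.

```r
# Rebuild the example data from the question.
d <- data.frame(
  A  = paste0("ex", 1:10),
  S1 = c(1, 0, 0, 1, 0, 0, 1, 0, 0, 1),
  S2 = c(0, 1, 0, 0, 0, 1, 0, 1, 0, 0),
  S3 = c(0, 0, 1, 0, 0, 0, 0, 0, 1, 0),
  S4 = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
)

# Logical matrix with one row per observation and one column per flag.
flags <- d[-1] == 1

# apply(..., which) only returns a plain vector if every row has exactly one TRUE.
stopifnot(all(rowSums(flags) == 1))

flag_names <- names(d)[-1]
idx <- apply(flags, 1, which)

# Assumption: factor levels follow the original column order S1..S4.
d$type <- factor(flag_names[idx], levels = flag_names)
d[c("A", "type")]
```

If a row had zero or several flags set, apply() would return a list rather than a vector, so the stopifnot() check fails early instead of producing a confusing indexing error.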

edited Feb 27, 2019 at 22:37

answered Jun 23, 2014 at 8:45

Rich Scriven


5 Comments

talat Over a year ago

+1, nice! Just one question: is it fully vectorized although you also use apply in your answer?

Rich Scriven Over a year ago

Well, not fully. But a lot of the time you can put sapply and apply in there to make it faster.

talat Over a year ago

Hm, I can imagine that using sapply instead of apply (where possible) is faster, but what do you mean by using apply to make it faster? Not sure I understand.

Rich Scriven Over a year ago

I mean when it's between the [ and ]

talat Over a year ago

Yeah, I just figured that's what you mean. You're right there - good point.

talat · Accepted Answer · 2014-06-23 08:16:39Z

2

You can use apply and check the max in the columns 2-5 and then return the corresponding column name:

df$Type <- apply(df[2:5], 1, function(x) names(df)[which.max(x)+1] )

Afterwards, you can delete the columns you don't need anymore:

df <- df[,-c(2:5)]
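
The approach above hard-codes columns 2-5 and, because which.max() returns 1 for an all-zero row, would label such a row with the first flag column. A minimal sketch of a variant that selects the flag columns by name and uses base R's max.col() in place of apply() with which.max() (a swapped-in technique, not the code above). The data frame dat and its contents are made up for illustration.

```r
# Hypothetical data: the third row has no flag set at all.
dat <- data.frame(
  A  = c("ex1", "ex2", "ex3"),
  S1 = c(1, 0, 0),
  S2 = c(0, 1, 0),
  S3 = c(0, 0, 0)
)

# Select the flag columns by name instead of by position.
flag_cols <- setdiff(names(dat), "A")
m <- as.matrix(dat[flag_cols])

# max.col() gives the column index of each row maximum;
# ties.method = "first" makes the result deterministic.
type <- flag_cols[max.col(m, ties.method = "first")]

# Rows without any flag would otherwise map to the first column.
type[rowSums(m) == 0] <- NA

dat$Type <- factor(type, levels = flag_cols)
dat <- dat[c("A", "Type")]
dat
```

Keeping the columns by name in the last step also avoids dropping the wrong ones if the column order ever changes.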

answered Jun 23, 2014 at 8:16

talat


2 Comments

Samsudhin Over a year ago

It works for small datasets with 5 columns, but with a large dataset it throws the error "object of type 'closure' is not subsettable". Any idea?

talat Over a year ago

Not sure, did you check that all relevant columns are either logical or numeric or integer? As long as the input is the same type, there should be no problems with larger data sets.
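
One possible cause of that error, offered as a guess since the failing code is not shown: if the large data set is stored under a different name and no object called df exists, then df refers to the F-distribution density function from the stats package, and subsetting a function fails in this way. A small sketch that reproduces the situation by referring to stats::df explicitly:

```r
# stats::df is a function (a "closure"), not a data frame.
is_function <- is.function(stats::df)

# Subsetting a function is an error; capture the message instead of stopping.
msg <- tryCatch(
  {
    stats::df[2:5]
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(is_function)
print(msg)
```

Checking is.data.frame() on the object before running the apply() line is a quick way to rule this out.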

David Arenburg · Accepted Answer · 2014-06-23 08:23:37Z

2

You could also do this (if dat is your data set):

library(reshape2)
dat <- melt(dat, id = "A")
dat[dat$value > 0, 1:2]
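
Note that melt() stacks the data column by column, so the filtered rows come out grouped by S1, S2, ... rather than in the original order. A rough sketch of the same long-format idea using base R's stack() instead of reshape2 (a substitution, in case the package is not installed), with the original order restored at the end. The small data frame and the names long and res are made up for illustration.

```r
# Hypothetical data with three flag columns.
dat <- data.frame(
  A  = c("ex1", "ex2", "ex3", "ex4"),
  S1 = c(1, 0, 0, 1),
  S2 = c(0, 1, 0, 0),
  S3 = c(0, 0, 1, 0)
)

# stack() returns the columns `values` and `ind`, stacked column by column,
# so A has to be repeated once per flag column.
long <- data.frame(
  A = rep(dat$A, times = ncol(dat) - 1),
  stack(dat[-1])
)

# Keep only the rows where the flag is set.
res <- long[long$values > 0, c("A", "ind")]
names(res)[2] <- "Type"

# Restore the row order of the original data (assumes A is unique).
res <- res[match(dat$A, res$A), ]
rownames(res) <- NULL
res
```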

answered Jun 23, 2014 at 8:23

David Arenburg


akrun · Accepted Answer · 2014-06-23 10:01:31Z

0

You could try the following, assuming df is the data frame:

data.frame(A=df$A, Type=rep(names(df)[-1], nrow(df))[!!t(df[,-1])])
#       A Type
# 1   ex1   S1
# 2   ex2   S2
# 3   ex3   S3
# 4   ex4   S1
# 5   ex5   S4
# 6   ex6   S2
# 7   ex7   S1
# 8   ex8   S2
# 9   ex9   S3
# 10 ex10   S1

Also:

names(df)[-1][t(df[-1])*seq_len(ncol(df)-1)]
# [1] "S1" "S2" "S3" "S1" "S4" "S2" "S1" "S2" "S3" "S1"
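
To see why the first, logical-index version works, it may help to split it into steps. This is only an illustrative breakdown on a made-up three-row frame; the names mask and labels are not part of the answer's code.

```r
# Hypothetical data with exactly one flag per row.
dat <- data.frame(
  A  = c("ex1", "ex2", "ex3"),
  S1 = c(1, 0, 0),
  S2 = c(0, 0, 1),
  S3 = c(0, 1, 0)
)

flag_names <- names(dat)[-1]

# After t(), each column of the matrix is one original row, so reading the
# matrix column by column walks through the data row by row.
mask <- t(dat[-1]) == 1

# The flag names repeated once per row line up with that reading order.
labels <- rep(flag_names, times = nrow(dat))

# Logical indexing keeps the label at each position where a flag is set.
type <- labels[mask]
type
```

This relies on each row having exactly one flag set; with more or fewer, type would no longer have one element per row.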

edited Jun 23, 2014 at 10:01

answered Jun 23, 2014 at 9:04

akrun
