How to create a factor from a binary indicator matrix?

Question

Say I have the following matrix mat, which is a binary indicator matrix for the levels A, B, and C for a set of 5 observations:

mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

> mat
     A B C
[1,] 1 0 0
[2,] 1 0 0
[3,] 0 1 0
[4,] 0 1 0
[5,] 0 0 1

I want to convert that into a single factor such that the output is equivalent to fac, defined as:

> fac <- factor(rep(LETTERS[1:3], times = c(2,2,1)))
> fac
[1] A A B B C
Levels: A B C

Extra points if you get the labels from the colnames of mat, but a set of numeric codes (e.g. c(1,1,2,2,3)) would also be acceptable as desired output.

Tomas · Accepted Answer · 2011-10-11 16:30:07Z

15

An elegant solution using matrix multiplication (and the shortest so far):

as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
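The multiplication works because each row holds a single 1, so the product of a row with the column positions 1, 2, 3 is the position of that 1. A minimal sketch of the intermediate step, assuming every row has exactly one 1; the unordered test matrix mat2 is the one suggested in the comments on the other answers, while drop(), seq_len() and the explicit levels argument are additions for illustration, not part of the original answer:

```r
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]
mat2 <- rbind(mat, c(1, 0, 0))   # extra "A" row, so levels are not grouped

# Row-wise dot product with (1, 2, 3) gives the column position of each 1.
codes <- drop(mat2 %*% seq_len(ncol(mat2)))
codes                            # 1 1 2 2 3 1

# Index the column names by position; explicit levels keep unused levels too.
fac2 <- factor(colnames(mat2)[codes], levels = colnames(mat2))
fac2                             # A A B B C A
```

The codes vector is also the set of numeric codes the question mentions as acceptable output.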

4 Comments

Gavin Simpson Over a year ago

seq_len(ncol(mat)) would be more robust, but as your answer is simple, elegant and deals with the possibility of an unordered indicator matrix, you get the Accept. The ordering could easily be solved in the other solutions, but that will add to their length. Thanks, Tomas.

Tomas Over a year ago

@Gavin, thanks. Regarding robustness - how is seq_len more robust? You mean the case when ncol(mat) == 0? In that case it wouldn't work either.

Gavin Simpson Over a year ago

I know, but 1:ncol(mat) gives c(1, 0) in that case, and seq_len(ncol(mat)) returns a zero-length integer vector, which is the right answer. You could imagine cases where 1:ncol(mat) might work but give the wrong answer whilst seq_len(ncol(mat)) would cause it to fail appropriately. I'm just always wary of 1:foo where foo is computed.

Tomas Over a year ago

@Gavin, thanks, good note. But as I tested it now, you don't need to worry about ncol(). It seems it will never return something smaller than 1 without an error (which is quite expected behaviour).
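The difference between the two forms is easy to check. A small sketch, using a hypothetical zero-column matrix called empty (not from the thread); note that ncol() does return 0 for such a matrix, so the edge case can arise in practice:

```r
# A matrix with five rows and no columns (hypothetical edge case).
empty <- matrix(numeric(0), nrow = 5, ncol = 0)
n <- ncol(empty)     # 0, no error

bad  <- 1:n          # c(1, 0): the colon operator counts down
good <- seq_len(n)   # integer(0): an empty sequence

length(bad)          # 2
length(good)         # 0
```

With seq_len(), code that loops over or multiplies by the sequence sees an empty vector instead of the two spurious indices 1 and 0.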

Andrie · Accepted Answer · 2011-10-11 14:04:54Z

8

This solution uses the arr.ind = TRUE argument of which(), which returns the matching positions as array (row, column) indices. The column indices are then used to index the colnames:

> factor(colnames(mat)[which(mat==1, arr.ind=TRUE)[, 2]])
[1] A A B B C
Levels: A B C

Decomposing into steps:

> which(mat==1, arr.ind=TRUE)
     row col
[1,]   1   1
[2,]   2   1
[3,]   3   2
[4,]   4   2
[5,]   5   3

Use the values of the second column, i.e. which(...)[, 2], to index colnames:

> colnames(mat)[c(1, 1, 2, 2, 3)]
[1] "A" "A" "B" "B" "C"

Finally, wrap the result in factor() to convert it to a factor.

3 Comments

Tomas Over a year ago

This will not work if the rows are not grouped by level; try it on the matrix mat2 = rbind(mat, c(1, 0, 0)).

Tomas Over a year ago

The problem is that which() scans the matrix by columns, not by rows. You can fix it by transposing the matrix (swapping rows and columns): factor(colnames(mat2)[which(t(mat2)==1, arr.ind=TRUE)[,1]]). Maybe there is a better way to tell which() to go by rows instead of by columns!

Arun Over a year ago

This can be overcome by taking the transpose: rownames(which(t(mat2) == 1, arr.ind = TRUE)) gives "A", "A", "B", "B", "C", "A".
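The ordering problem and the transpose fix from these comments can be compared side by side. A sketch using the mat2 test matrix from the comments; the variable names by_col, pos and by_row are only for illustration:

```r
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]
mat2 <- rbind(mat, c(1, 0, 0))   # observation 6 belongs to level A

# which() scans column by column, so hits come out grouped by level
# and the original row order is lost.
by_col <- factor(colnames(mat2)[which(mat2 == 1, arr.ind = TRUE)[, 2]])
by_col   # A A A B B C

# After transposing, each column is one observation, so the scan follows
# the original row order; the first index column now holds the level position.
pos <- which(t(mat2) == 1, arr.ind = TRUE)
by_row <- factor(colnames(mat2)[pos[, 1]])
by_row   # A A B B C A
```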

mdsumner · Accepted Answer · 2011-10-11 14:07:37Z

5

One way is to repeat each column name once per row, index that vector directly with the matrix (converted to logical), and then wrap the result in factor to restore the levels:

> factor(rep(colnames(mat), each = nrow(mat))[as.logical(mat)])
[1] A A B B C
Levels: A B C

If this is from model.matrix, the colnames have fac prepended, and so this should work the same but removing the extra text:

factor(gsub("^fac", "", rep(colnames(mat), each = nrow(mat))[as.logical(mat)]))
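A round trip through model.matrix shows where the prefix comes from. This is a sketch that assumes the indicator matrix was built with the formula ~ fac - 1 (one column per level, no intercept); the names mm and rec are only for illustration:

```r
fac <- factor(rep(LETTERS[1:3], times = c(2, 2, 1)))

# Indicator matrix with one column per level; column names get the
# variable name "fac" prepended.
mm <- model.matrix(~ fac - 1)
colnames(mm)   # "facA" "facB" "facC"

# Recover the factor, stripping the "fac" prefix from the labels.
rec <- factor(gsub("^fac", "", rep(colnames(mm), each = nrow(mm))[as.logical(mm)]))
rec            # A A B B C
```

As with the plain version, this relies on the rows being grouped by level.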

1 Comment

Tomas Over a year ago

Again, this will not work if the rows are not grouped by level; try it on the matrix mat2 = rbind(mat, c(1, 0, 0)).

Gavin Simpson · Accepted Answer · 2011-10-11 14:08:20Z

4

You could use something like this:

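# For each row, find the position of the first 1 (NA if the row has none)
lvls <- apply(mat, 1, function(currow) match(1, currow))
# Map positions to labels taken from the column names
fac <- factor(lvls, levels = seq_len(ncol(mat)), labels = colnames(mat))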
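Because this approach works row by row, it does not depend on the rows being grouped by level. A sketch on a hypothetical matrix mat3 (not from the thread) that adds an out-of-order row and an all-zero row, to show that a row without a 1 becomes NA:

```r
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]
# Hypothetical test case: one out-of-order "A" row and one row with no 1.
mat3 <- rbind(mat, c(1, 0, 0), c(0, 0, 0))

# match() returns the position of the first 1 in each row, or NA if absent.
lvls <- apply(mat3, 1, function(currow) match(1, currow))
lvls   # 1 1 2 2 3 1 NA

fac3 <- factor(lvls, levels = seq_len(ncol(mat3)), labels = colnames(mat3))
fac3   # A A B B C A <NA>
```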

answered Oct 11, 2011 at 14:06 by Nick Sabbe · edited Oct 11, 2011 at 14:08 by Gavin Simpson

Ramnath · Accepted Answer · 2011-10-11 23:22:06Z

1

Here is another one

factor(rep(colnames(mat), colSums(mat)))

1 Comment

Tomas Over a year ago

This will not work if the rows are not grouped by level; try it on the matrix mat2 = rbind(mat, c(1, 0, 0)).
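The reason is that colSums() keeps only the count per level and discards which row each 1 came from. A short sketch on the mat2 test matrix; the names counts and res are only for illustration:

```r
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]
mat2 <- rbind(mat, c(1, 0, 0))   # true sequence: A A B B C A

counts <- colSums(mat2)          # A = 3, B = 2, C = 1
res <- factor(rep(colnames(mat2), counts))
res                              # A A A B B C: right counts, wrong order
```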
