17

Say I have the following matrix mat, which is a binary indicator matrix for the levels A, B, and C for a set of 5 observations:

mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

> mat
     A B C
[1,] 1 0 0
[2,] 1 0 0
[3,] 0 1 0
[4,] 0 1 0
[5,] 0 0 1

I want to convert that into a single factor such that the output is equivalent to fac, defined as:

> fac <- factor(rep(LETTERS[1:3], times = c(2,2,1)))
> fac
[1] A A B B C
Levels: A B C

Extra points if you get the labels from the colnames of mat, but a set of numeric codes (e.g. c(1,1,2,2,3)) would also be acceptable as desired output.

5 Answers

15

Elegant solution with matrix multiplication (and shortest up to now):

as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
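To see why the multiplication works: each row contains a single 1, so multiplying by the vector of column positions returns the position of that 1. The following is a minimal sketch of the intermediate step, using the mat from the question; passing levels = colnames(mat) to factor() is my own addition (not part of the answer) so that the levels follow the column order rather than alphabetical order.

```r
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

# Row i times c(1, 2, 3) equals the column index of the single 1 in row i.
# drop() turns the resulting 5 x 1 matrix into a plain vector.
idx <- drop(mat %*% seq_len(ncol(mat)))
idx
# [1] 1 1 2 2 3

# Index the column names and fix the level order explicitly.
fac2 <- factor(colnames(mat)[idx], levels = colnames(mat))
fac2
# [1] A A B B C
# Levels: A B C
```

Because the index is computed row by row, the result follows the row order of mat even when the rows are not grouped by level.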

4 Comments

seq_len(ncol(mat)) would be more robust, but as your answer is simple, elegant and deals with the possibility of an unordered indicator matrix, you get the Accept. The ordering could easily be solved in the other solutions, but that will add to their length. Thanks Thomas.
@Gavin, thanks. Regarding robustness - how is seq_len more robust? You mean the case when ncol(mat) == 0? In that case it wouldn't work either.
I know, but 1:ncol(mat) gives 1,0 in that case, and seq_len(ncol(mat)) returns a zero length integer vector - which is the right answer. You could imagine cases where 1:ncol(mat) might work but give the wrong answer whilst seq_len(ncol(mat)) would cause it to fail appropriately. I'm just always wary of 1:foo where foo is computed.
@Gavin, thanks, good note. But as I tested it now, you don't need to worry with ncol(). It seems it will never return something smaller than 1 without an error (which is quite expected behaviour).
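For reference, a small sketch of the difference discussed in these comments. The zero-column matrix named empty is a constructed example of my own, not something from the thread; it shows that ncol() can return 0 when such a matrix is built explicitly.

```r
n <- 0L

# The colon operator counts down when the right-hand side is smaller.
1:n
# [1] 1 0

# seq_len() returns an empty integer vector instead.
seq_len(n)
# integer(0)

# A matrix with zero columns can be constructed directly (illustrative only).
empty <- matrix(numeric(0), nrow = 5, ncol = 0)
ncol(empty)
# [1] 0
```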
8

This solution makes use of the arr.ind=TRUE argument of which, returning the matching positions as array locations. These are then used to index the colnames:

> factor(colnames(mat)[which(mat==1, arr.ind=TRUE)[, 2]])
[1] A A B B C
Levels: A B C

Decomposing into steps:

> which(mat==1, arr.ind=TRUE)
     row col
[1,]   1   1
[2,]   2   1
[3,]   3   2
[4,]   4   2
[5,]   5   3

Use the values of the second column, i.e. which(...)[, 2] and index colnames:

> colnames(mat)[c(1, 1, 2, 2, 3)]
[1] "A" "A" "B" "B" "C"

Finally, wrap the result in factor() to convert it to a factor.

3 Comments

Will not work if the factors are not ordered, try it on matrix mat2 = rbind(mat, c(1, 0, 0)).
The problem is that which() scans the matrix by columns, not by rows. You can fix it by transposing the matrix (swapping rows and columns): factor(colnames(mat2)[which(t(mat2)==1, arr.ind=TRUE)[,1]]). There may be a better way to make which() go by rows rather than by columns.
This can be overcome by taking its transpose: rownames(which(t(mat2) == 1, arr.ind=T)) = "A", "A", "B", "B", "C", "A".
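A short sketch of the ordering problem raised in these comments, using the mat2 suggested above. Sorting the positions by their row column is one possible alternative to transposing; it is my own variation, not something proposed in the thread.

```r
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]
mat2 <- rbind(mat, c(1, 0, 0))   # sixth observation belongs to level A again

# which() walks the matrix column by column, so the hits come out
# grouped by column rather than in row order.
pos <- which(mat2 == 1, arr.ind = TRUE)
by_column <- colnames(mat2)[pos[, "col"]]
by_column
# [1] "A" "A" "A" "B" "B" "C"

# Reordering the positions by row restores the observation order.
pos <- pos[order(pos[, "row"]), ]
by_row <- factor(colnames(mat2)[pos[, "col"]], levels = colnames(mat2))
by_row
# [1] A A B B C A
# Levels: A B C
```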
5

One way is to repeat each column name once per row, so that the names line up with the matrix entries in column-major order, index that vector directly with the matrix, and then wrap the result in factor to restore the levels:

factor(rep(colnames(mat), each = nrow(mat))[as.logical(mat)])
[1] A A B B C
Levels: A B C

If this is from model.matrix, the colnames have fac prepended, and so this should work the same but removing the extra text:

factor(gsub("^fac", "", rep(colnames(mat), each = nrow(mat))[as.logical(mat)]))
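As an illustration of the model.matrix case, here is a hedged round-trip sketch. It assumes the indicator matrix was built with the formula ~ fac - 1 (no intercept), so that there is one column per level, and it uses the matrix-multiplication indexing from another answer rather than the rep() approach above. The names mm, labs and back are illustrative only.

```r
fac <- factor(rep(LETTERS[1:3], times = c(2, 2, 1)))

# Without an intercept, model.matrix() gives one indicator column per level,
# named by pasting the variable name and the level together.
mm <- model.matrix(~ fac - 1)
colnames(mm)
# [1] "facA" "facB" "facC"

# Strip the variable-name prefix to recover the level labels.
labs <- sub("^fac", "", colnames(mm))

# Row-wise column index of the single 1, then back to a factor.
back <- factor(labs[drop(mm %*% seq_len(ncol(mm)))], levels = labs)
identical(back, fac)
# [1] TRUE
```

With the default treatment contrasts and an intercept, the first level has no column of its own, so this sketch would not apply as written.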

1 Comment

Again, it will not work if the factors are not ordered, try it on matrix mat2 = rbind(mat, c(1, 0, 0)).
4

You could use something like this:

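# For each row, find the position of the first 1
lvls <- apply(mat, 1, function(currow) match(1, currow))
# Map the positions to the column names; levels follow the column order
fac <- factor(lvls, levels = seq_len(ncol(mat)), labels = colnames(mat))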
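Since apply() works row by row, this approach should keep the observation order even when the rows are not grouped by level. The sketch below checks that on the mat2 mentioned in the comments on other answers, and compares the result with base R's max.col(), which I am suggesting here as a possible vectorised alternative; it is not part of this answer.

```r
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]
mat2 <- rbind(mat, c(1, 0, 0))   # unordered case: level A appears again last

# Position of the first 1 in each row.
lvls <- apply(mat2, 1, function(currow) match(1, currow))
fac2 <- factor(lvls, levels = seq_len(ncol(mat2)), labels = colnames(mat2))
fac2
# [1] A A B B C A
# Levels: A B C

# max.col() returns the column of the row maximum; with one 1 per row
# that is the same index. ties.method = "first" makes it deterministic.
all(lvls == max.col(mat2, ties.method = "first"))
# [1] TRUE
```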

1

Here is another one

factor(rep(colnames(mat), colSums(mat)))

1 Comment

Will not work if the factors are not ordered, try it on matrix mat2 = rbind(mat, c(1, 0, 0)).
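Pulling the points from this thread together, one could wrap the row-wise approach in a small helper that first checks the input really is an indicator matrix. This is only a sketch: the function name indicator_to_factor and the fallback to numeric labels when there are no column names are my own assumptions, not something any of the answers propose.

```r
# Hypothetical helper: convert a one-hot indicator matrix to a factor.
indicator_to_factor <- function(m) {
  # Every entry must be 0 or 1, and every row must contain exactly one 1.
  stopifnot(is.matrix(m), all(m %in% c(0, 1)), all(rowSums(m) == 1))
  labs <- colnames(m)
  # Fall back to numeric codes as labels when there are no column names.
  if (is.null(labs)) labs <- as.character(seq_len(ncol(m)))
  # Row-wise column index of the single 1, mapped to the labels.
  factor(labs[drop(m %*% seq_len(ncol(m)))], levels = labs)
}

mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]
mat2 <- rbind(mat, c(1, 0, 0))

res <- indicator_to_factor(mat2)
res
# [1] A A B B C A
# Levels: A B C
```

A row with two 1s, or with none, makes the helper stop with an error instead of silently returning a wrong level.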
