Create a binary indicator matrix (Boolean matrix) in R

Question

I have a list of data recording attendance at conferences, like this:

Event                     Participant  
ConferenceA               John   
ConferenceA               Joe  
ConferenceA               Mary    
ConferenceB               John  
ConferenceB               Ted  
ConferenceC               Jessica

I would like to create a binary attendance indicator matrix in the following format:

Event        John  Joe  Mary  Ted  Jessica  
ConferenceA  1     1    1     0    0  
ConferenceB  1     0    0     1    0  
ConferenceC  0     0    0     0    1

Is there a way to do this in R?

Possible duplicate of Reshaping a column from a data frame into several columns using R. Searching for "[r] create indicator matrix" also brings up several other relevant questions (using xtabs, table, or one of the reshape2 functions). – IRTFM, commented Jul 2, 2013 at 17:20
I initially voted to close, but this question has a much better title and is much easier to understand visually (both the original data and the expected output are shown). – Tyler Rinker, commented Jul 2, 2013 at 17:36

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2013-07-02 17:09:52Z


Assuming your data.frame is called "mydf", simply use table:

> table(mydf)
             Participant
Event         Jessica Joe John Mary Ted
  ConferenceA       0   1    1    1   0
  ConferenceB       0   0    1    0   1
  ConferenceC       1   0    0    0   0

If someone could have attended a conference more than once, table would return a count greater than 1 for that cell. In that case, simply recode every value greater than 1 to 1, like this:

temp <- table(mydf)
temp[temp > 1] <- 1  # recode repeat attendance to a plain 0/1 indicator
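A minimal self-contained sketch of that recoding step. The data is a hypothetical variant of mydf (called mydf2 here) in which Ted is listed twice for ConferenceB, invented only so that table produces a count of 2:

```r
# Hypothetical data: Ted appears twice for ConferenceB
mydf2 <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Ted", "Jessica")
)

temp <- table(mydf2)
temp["ConferenceB", "Ted"]   # 2: Ted is counted twice

# Cap every count above 1 at 1 to get a pure 0/1 indicator table
temp[temp > 1] <- 1
temp["ConferenceB", "Ted"]   # now 1
```

Because temp is still a table, the recoded result prints with the same Event/Participant layout as before.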

Note that this returns a table. If you want a data.frame to be returned, use as.data.frame.matrix:

> as.data.frame.matrix(table(mydf))
            Jessica Joe John Mary Ted
ConferenceA       0   1    1    1   0
ConferenceB       0   0    1    0   1
ConferenceC       1   0    0    0   0
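A small sketch of why as.data.frame.matrix is used rather than plain as.data.frame: on a table, as.data.frame dispatches to as.data.frame.table and returns the long format again. The block rebuilds the mydf shown below with data.frame() so it runs on its own:

```r
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")
)
tab <- table(mydf)

# Long format: one row per Event/Participant combination plus a Freq column
long <- as.data.frame(tab)
dim(long)   # 15 rows (3 events x 5 participants), 3 columns

# Wide format: events as row names, participants as columns
wide <- as.data.frame.matrix(tab)
dim(wide)   # 3 rows, 5 columns
```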

In the above, "mydf" is defined as:

mydf <- structure(list(Event = c("ConferenceA", "ConferenceA", 
  "ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"), 
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")), 
  .Names = c("Event", "Participant"), class = "data.frame", 
  row.names = c(NA, -6L))

Please share your data in a similar manner in the future (the structure() call above is the kind of output dput(mydf) prints).
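A quick sketch of how to produce such a definition yourself: dput() prints a structure() call, and evaluating that text recreates the object. The exact printed text can vary between R versions, so only the round trip is checked here (recreated is just an illustrative name):

```r
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica"),
  stringsAsFactors = FALSE
)

# Print a copy-pasteable definition of the object
dput(mydf)

# Pasting that text back into R recreates an identical object
recreated <- eval(parse(text = capture.output(dput(mydf))))
identical(recreated, mydf)
```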

edited Jul 2, 2013 at 17:09

answered Jul 2, 2013 at 17:02


2 Comments

IRTFM Over a year ago

+1 for as.data.frame.matrix, since plain as.data.frame would reverse the table operation (it returns the long format, one row per Event/Participant combination with a Freq column).

Brian Over a year ago

@Ananda Awesome! Works perfectly!

Tyler Rinker · Answer · 2014-02-27 02:05:06Z

@Ananda's answer is much better, but I thought I'd add another approach using the qdap package. It only shines when "someone would have attended a conference more than once".

As Ananda pointed out, that can happen, so I added such a case to the data (Ted attends ConferenceB twice). In this situation, using the adjmat function and pulling out its Boolean matrix can be helpful.

Data with a double attendee:

dat <- read.table(text = "Event        Participant
ConferenceA  John
ConferenceA  Joe
ConferenceA  Mary
ConferenceB  John
ConferenceB  Ted
ConferenceB  Ted
ConferenceC  Jessica", header = TRUE)

A table of counts:

library(qdap)
wfm(dat[, 1], dat[, 2], lower.case = FALSE)

##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   2
## conferenceC       1   0    0    0   0

With mtabulate (from qdapTools, which qdap loads):

with(dat, mtabulate(split(Participant, Event)))

##             Jessica Joe John Mary Ted
## ConferenceA       0   1    1    1   0
## ConferenceB       0   0    1    0   2
## ConferenceC       1   0    0    0   0
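If qdap isn't available, the same split-then-count idea can be sketched in base R. This illustrates the idea only and is not mtabulate's actual implementation; it rebuilds dat from above as a data.frame so the block is self-contained, and uses one shared set of participant levels so every event gets the same columns (people and counts are just illustrative names):

```r
dat <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA", "ConferenceB",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Ted", "Jessica")
)

# One shared set of levels so each per-event count has the same columns
people <- sort(unique(dat$Participant))

# Split participants by event, count each group, then put events on the rows
counts <- t(sapply(split(dat$Participant, dat$Event),
                   function(x) table(factor(x, levels = people))))
counts
```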

A Boolean matrix:

adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean

##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   1
## conferenceC       1   0    0    0   0
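For comparison, a minimal base R sketch of the same Boolean matrix without qdap, assuming dat as above (rebuilt here as a data.frame). This swaps in a plain comparison rather than anything adjmat does internally: compare the table of counts with 0 and multiply the logical result by 1 to get 0/1 values.

```r
dat <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA", "ConferenceB",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Ted", "Jessica")
)

counts <- table(dat)          # Ted is counted twice for ConferenceB
boolean <- (counts > 0) * 1   # TRUE/FALSE becomes 1/0; dimnames are kept
boolean
```

Unlike the wfm output above, the row names keep their original capitalization.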

AnilGoyal · Answer · 2021-02-25 15:13:59Z


One more base R way, using the xtabs function:

xtabs(~ Event + Participant, data = mydf)

             Participant
Event         Jessica Joe John Mary Ted
  ConferenceA       0   1    1    1   0
  ConferenceB       0   0    1    0   1
  ConferenceC       1   0    0    0   0

# where mydf is defined as:
mydf <- structure(list(Event = c("ConferenceA", "ConferenceA", 
                                 "ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"), 
                       Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")), 
                  .Names = c("Event", "Participant"), class = "data.frame", 
                  row.names = c(NA, -6L))

answered Feb 25, 2021 at 15:13

