3

I have a list of data indicating attendance at conferences, like this:

Event                     Participant  
ConferenceA               John   
ConferenceA               Joe  
ConferenceA               Mary    
ConferenceB               John  
ConferenceB               Ted  
ConferenceC               Jessica  

I would like to create a binary indicator attendance matrix of the following format:

Event        John  Joe  Mary  Ted  Jessica  
ConferenceA  1     1    1     0    0  
ConferenceB  1     0    0     1    0  
ConferenceC  0     0    0     0    1  

Is there a way to do this in R?

2
  • possible duplicate of Reshaping a column from a data frame into several columns using R ... and searching on [r] create indicator matrix brought up several others that were relevant (with either xtabs, table or one of the 'reshape2' functions). Commented Jul 2, 2013 at 17:20
  • 1
    I initially voted to close but this example has a much better title and is much better visually to understand (original output and expected output are shown). Commented Jul 2, 2013 at 17:36

3 Answers

11

Assuming your data.frame is called "mydf", simply use table:

> table(mydf)
             Participant
Event         Jessica Joe John Mary Ted
  ConferenceA       0   1    1    1   0
  ConferenceB       0   0    1    0   1
  ConferenceC       1   0    0    0   0

If there is a chance that someone attended a conference more than once, table will return a count greater than 1 for that cell. In that case, recode all values greater than 1 to 1, like this:

temp <- table(mydf)
temp[temp > 1] <- 1
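
As an alternative sketch (not part of the original answer), repeated rows can be dropped with unique() before tabulating, so that no cell can exceed 1. The name dup_df and its duplicated row are made up for illustration.

```r
# Hypothetical data: Ted is listed twice for ConferenceB.
dup_df <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Ted", "Jessica")
)

# unique() removes duplicated rows, so each Event/Participant pair
# is counted at most once by table().
binary <- table(unique(dup_df))
binary
```

Either approach should give the same 0/1 result; deduplicating first just avoids the separate recoding step.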

Note that this returns a table. If you want a data.frame to be returned, use as.data.frame.matrix:

> as.data.frame.matrix(table(mydf))
            Jessica Joe John Mary Ted
ConferenceA       0   1    1    1   0
ConferenceB       0   0    1    0   1
ConferenceC       1   0    0    0   0
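
To get closer to the layout in the question (Event as a regular column, participants in order of first appearance), something like the following might work. This is only a sketch: the name wide and the factor-level trick are my own additions, and the data is redefined here so the block runs on its own.

```r
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")
)

# table() orders columns by factor level, so set the levels in order of
# first appearance instead of the default alphabetical order.
mydf$Participant <- factor(mydf$Participant, levels = unique(mydf$Participant))

wide <- as.data.frame.matrix(table(mydf))

# Move the row names into an ordinary "Event" column.
wide <- cbind(Event = rownames(wide), wide)
rownames(wide) <- NULL
wide
```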

In the above, "mydf" is defined as:

mydf <- structure(list(Event = c("ConferenceA", "ConferenceA", 
  "ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"), 
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")), 
  .Names = c("Event", "Participant"), class = "data.frame", 
  row.names = c(NA, -6L))

Please share your data in a similar manner in the future.


2 Comments

+1 for as.data.frame.matrix since as.data.frame would reverse the table operation.
@Ananda Awesome! Works perfectly!
1

@Ananda's answer is way better, but I thought I'd throw out another approach using qdap. It only shines when "someone would have attended a conference more than once", as Ananda pointed out.

I included such a case in the data below. Here, using the adjmat function and pulling out the Boolean matrix could be helpful.

Data With Double Attendee:

dat <- read.table(text="Event                     Participant
ConferenceA               John
ConferenceA               Joe
ConferenceA               Mary
ConferenceB               John
ConferenceB               Ted
ConferenceB               Ted
ConferenceC               Jessica", header=TRUE)

A table of counts:

library(qdap)
wfm(dat[, 1], dat[, 2], lower.case = FALSE)

## > wfm(dat[, 1], dat[, 2], lower.case = FALSE)
##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   2
## conferenceC       1   0    0    0   0

With mtabulate:

with(dat, mtabulate(split(Participant, Event)))

##             Jessica Joe John Mary Ted
## ConferenceA       0   1    1    1   0
## ConferenceB       0   0    1    0   2
## ConferenceC       1   0    0    0   0

A Boolean matrix:

adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean

## > adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean
##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   1
## conferenceC       1   0    0    0   0


0

One more base R way, using the xtabs function:

xtabs(~mydf$Event+mydf$Participant)

             mydf$Participant
mydf$Event    Jessica Joe John Mary Ted
  ConferenceA       0   1    1    1   0
  ConferenceB       0   0    1    0   1
  ConferenceC       1   0    0    0   0

#using data
mydf <- structure(list(Event = c("ConferenceA", "ConferenceA", 
                                 "ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"), 
                       Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")), 
                  .Names = c("Event", "Participant"), class = "data.frame", 
                  row.names = c(NA, -6L))
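
A possible variation on the xtabs call above (a sketch, not the answerer's code): passing the data frame through the data argument keeps the formula short and labels the dimensions Event and Participant rather than mydf$Event and mydf$Participant. The name tab is made up for illustration.

```r
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")
)

# Column names in the formula are looked up in `data`.
tab <- xtabs(~ Event + Participant, data = mydf)

# Cap counts at 1 in case an attendee appears more than once per event.
tab[tab > 1] <- 1
tab
```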

