
I have a dataframe (call it df) of accidents. Each row has an accident number (x), a person involved in that accident (y), and the type of accident (z), so one accident can span several rows. It looks something like this:

x            y          z
accident #1  person A   accident type #1
accident #1  person A   accident type #2
accident #2  person A   accident type #1
accident #2  person B   accident type #2
accident #2  person B   accident type #3
accident #3  person C   accident type #1

In the above case, person A was involved in two accidents, and accident #1 had two accident types for person A. Person B was involved in accident #2 along with person A, but that was person B's only accident, with two accident types. Person C was also involved in only one accident.
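To make the example reproducible, the sample can be rebuilt as a data frame. The same snippet shows why the filter has to count distinct accidents rather than rows: a person can have several rows (accident types) for one accident. This is a base-R sketch that assumes x, y and z are plain character columns holding the values from the table; `rows_per_person` and `accidents_per_person` are illustrative names, not part of the question.

```r
# Sample data copied from the table above (assumed to be character columns)
df <- data.frame(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1"),
  stringsAsFactors = FALSE
)

# Rows per person: this also counts accident types, so it overstates A and B
rows_per_person <- table(df$y)
print(rows_per_person)        # person A: 3, person B: 2, person C: 1

# Distinct accidents per person: the quantity the question is about
accidents_per_person <- tapply(df$x, df$y, function(v) length(unique(v)))
print(accidents_per_person)   # person A: 2, person B: 1, person C: 1
```

Only person B and person C have a distinct-accident count of 1, which matches the expected output.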

I want the subset of people who have been involved in exactly one accident, keeping every accident type for each of them. So using the above example, I would want this:

x            y          z
accident #2  person B   accident type #2
accident #2  person B   accident type #3
accident #3  person C   accident type #1

How might I do this in R?

  • Indent four spaces to make code blocks, or highlight and press CTRL+K. Commented May 1, 2017 at 17:42
  • please google this, as it's quite a common operation Commented May 1, 2017 at 17:47
  • @MichaelChirico I'm new to R and am unsure what exactly to Google. Nothing I've found matches my specific case. Commented May 1, 2017 at 17:50
  • Then explicate what you've found, and why it doesn't apply -- demonstrating effort goes a long way Commented May 1, 2017 at 17:51
  • stackoverflow.com/questions/17421776/… Commented May 1, 2017 at 18:00

2 Answers


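You can do this with the dplyr package: group by person (y) with group_by, then use filter to keep only the groups with exactly one distinct accident number (n_distinct(x) == 1). Because the filter condition is evaluated per group, all of the rows (accident types) of the people who pass are kept. Counting distinct accidents rather than rows matters here, since one accident can appear on several rows: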

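library(dplyr)

df %>%
  group_by(y) %>%                  # one group per person
  filter(n_distinct(x) == 1) %>%   # keep people with exactly one distinct accident
  ungroup()                        # drop the grouping from the result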
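For readers without dplyr, the same grouped filter can be sketched in base R with `ave()`, which returns one value per row computed within that row's group, much as the grouped `filter()` evaluates `n_distinct(x)` for each person. This is an alternative to the answer above, not part of it; it assumes the question's sample data with character columns, and `n_accidents` and `single` are illustrative names.

```r
# Sample data from the question (assumed to be character columns)
df <- data.frame(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1"),
  stringsAsFactors = FALSE
)

# For each row, the number of distinct accidents of that row's person.
# ave() is applied to row indices so that the result stays an integer vector.
n_accidents <- ave(seq_len(nrow(df)), df$y,
                   FUN = function(i) length(unique(df$x[i])))

# Keep every row (every accident type) of people with exactly one accident
single <- df[n_accidents == 1, ]
print(single)
```

Applying `ave()` to row indices rather than to `df$x` directly avoids the counts being coerced to character, since `ave()` returns a vector of the same type as its first argument.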

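We can use data.table. Grouping by y and keeping .SD (all of that person's rows) only when uniqueN(x) == 1 gives the same subset; setcolorder() then restores the original column order, because grouping moves y to the front: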

library(data.table)
setcolorder(setDT(df)[, .SD[uniqueN(x) == 1], by = y], names(df))[]
#            x        y                z
#1: accident #2 person B accident type #2
#2: accident #2 person B accident type #3
#3: accident #3 person C accident type #1
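The data.table expression evaluates `.SD[uniqueN(x) == 1]` once per person and stacks the results. The same split-apply-combine pattern can be sketched in base R with `split()`, `Filter()` and `rbind()`; this is an illustrative equivalent rather than the answerer's code, it assumes the question's sample data with character columns, and `per_person`, `kept` and `result` are made-up names. One difference: `split()` orders the groups by sorted y, whereas data.table's `by` keeps the order of first appearance (the two coincide for this sample).

```r
# Sample data from the question (assumed to be character columns)
df <- data.frame(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1"),
  stringsAsFactors = FALSE
)

# One sub-data-frame per person, playing the role of .SD inside by = y
per_person <- split(df, df$y)

# Keep only the people with exactly one distinct accident (like uniqueN(x) == 1)
kept <- Filter(function(g) length(unique(g$x)) == 1, per_person)

# Stack the surviving groups; columns stay in the original x, y, z order
result <- do.call(rbind, kept)
rownames(result) <- NULL
print(result)
```

Because each group keeps all of its columns in their original order, no `setcolorder()` step is needed in this version.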
