Removing rows from a dataframe based on a conditional statement?

Question

I have a data frame (call it df) of accidents. Each row has an accident number, an identifier for a person involved, and the type of accident. It looks something like this:

x             y          z
accident #1   person A   accident type #1
accident #1   person A   accident type #2
accident #2   person A   accident type #1
accident #2   person B   accident type #2
accident #2   person B   accident type #3
accident #3   person C   accident type #1

In the above case, person A was involved in two accidents, and the first of those accidents has two accident types recorded for person A. Person B was involved in the same accident as person A (accident #2), but was only involved in that one accident, with two accident types. Person C was also involved in only one accident.

I want to collect the subset of people who have only been involved in one accident. However, I want to include all of their accident types. So using the above example, I would want this:

x             y          z
accident #2   person B   accident type #2
accident #2   person B   accident type #3
accident #3   person C   accident type #1

How might I do this in R?

Indent four spaces to make code blocks, or highlight and press CTRL+K
– Frank, Commented May 1, 2017 at 17:42
@MichaelChirico I'm new to R and am unsure what exactly to Google. Nothing I've found matches my specific case.
– Sydney Maples, Commented May 1, 2017 at 17:50
Then explicate what you've found, and why it doesn't apply -- demonstrating effort goes a long way
– MichaelChirico, Commented May 1, 2017 at 17:51

David Robinson · Accepted Answer · 2017-05-01 17:50:27Z

3

You can do this with the dplyr package: group by person (y) with group_by, then use filter with n_distinct to keep only the groups that contain a single distinct accident (x):

library(dplyr)
df %>%
  group_by(y) %>%
  filter(n_distinct(x) == 1) %>%
  ungroup()
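
For anyone who wants to try this end to end, here is a minimal reproducible sketch. The data frame is an assumption: it is typed in by hand from the table in the question, with plain character columns, so the real data may differ.

```r
library(dplyr)

# Toy data copied by hand from the question's table (an assumption about
# how the real df is laid out; all columns are plain character vectors).
df <- data.frame(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(y) %>%                  # one group per person
  filter(n_distinct(x) == 1) %>%   # keep people with exactly one distinct accident
  ungroup()                        # drop the grouping so later steps are ungrouped

print(result)
```

Because the filter condition is a single TRUE or FALSE per group, every row for a person is either kept or dropped together, which is why all accident types survive for person B.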


akrun · Answer · 2017-05-02 03:11:40Z

0

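We can use data.table. Group by y and return each group's rows (.SD) only when the group contains a single unique x. The grouping column comes first in the result, so setcolorder restores the original column order. Note that setDT converts df to a data.table by reference.

library(data.table)
setcolorder(setDT(df)[, .SD[uniqueN(x) == 1], by = y], names(df))[]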
#            x        y                z
#1: accident #2 person B accident type #2
#2: accident #2 person B accident type #3
#3: accident #3 person C accident type #1
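
If the one-liner is hard to read, the same idea can be spelled out in separate steps. This is only a sketch under assumptions: the table dt is typed in by hand from the question, and the names counts, n_accidents, and one_accident are made up for illustration.

```r
library(data.table)

# Toy data copied by hand from the question's table (an assumption).
dt <- data.table(
  x = c("accident #1", "accident #1", "accident #2",
        "accident #2", "accident #2", "accident #3"),
  y = c("person A", "person A", "person A",
        "person B", "person B", "person C"),
  z = c("accident type #1", "accident type #2", "accident type #1",
        "accident type #2", "accident type #3", "accident type #1")
)

# Step 1: count the distinct accidents for each person.
counts <- dt[, .(n_accidents = uniqueN(x)), by = y]

# Step 2: pull out the people with exactly one accident (a character vector).
one_accident <- counts[n_accidents == 1, y]

# Step 3: keep every row belonging to those people.
result <- dt[y %in% one_accident]

print(result)
```

Filtering the original table in the last step leaves the columns in their original order, so no setcolorder call is needed here.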
