
I have searched SO, and although there are many Q&As about conditionally removing rows, none of them fit my problem.

I have a data.frame containing longitudinal measurements of variables x, y, etc., at various time points time, for several subjects id. Some subjects experience an event ev (coded 1 at the time of the event, otherwise 0). I would like to reduce the initial data.frame to:

  • 1) All rows for subjects that have not experienced an event (OK, that's easy), but also include
  • 2) For subjects that have experienced an event, all rows strictly prior to the event (that is, all rows with times less than the time of that individual's event).

so that,

testdf<-data.frame(id=c(rep("A",4),rep("B",4),rep("C",4) ),
                   x=c(NA, NA, 1,2, 3, NA, NA, 1, 2, NA,NA, 5), 
                   y=rev(c(NA, NA, 1,2, 3, NA, NA, 1, 2, NA,NA, 5)),
                   time=c(1,2,3,4,0.1,0.5,10,20,3,2,1,0.5),
                   ev=c(0,0,0,0,0,1,0,0,0,0,0,1))

would reduce to

   id  x  y time ev
1   A NA  5  1.0  0
2   A NA NA  2.0  0
3   A  1 NA  3.0  0
4   A  2  2  4.0  0
5   B  3  1  0.1  0
6   C  2  2  3.0  0
7   C NA  1  2.0  0
8   C NA NA  1.0  0
1 Comment

  • Note that condition 2 implies condition 1, if condition 2 is written as "all rows prior to an event". Commented Jan 26, 2013 at 15:11

4 Answers


Here's a solution with subset and ave:

subset(testdf, !ave(ev, id, FUN = cumsum))
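To see why this one-liner works, here is a sketch of the intermediate step, using the testdf from the question (only the id and ev columns matter for the filter):

```r
testdf <- data.frame(id = c(rep("A", 4), rep("B", 4), rep("C", 4)),
                     ev = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1))

# ave() applies cumsum within each id, so the result stays 0 until a
# subject's first event and becomes positive from the event row onward:
with(testdf, ave(ev, id, FUN = cumsum))
# [1] 0 0 0 0 0 1 1 1 0 0 0 1
```

subset() negates this vector (!0 is TRUE), so it keeps exactly the rows strictly before each subject's first event, and all rows for subjects with no event.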

A solution in base R:

> do.call(rbind, by(testdf, testdf$id, function(x) x[cumsum(x$ev) == 0,]))
     id  x  y time ev
A.1   A NA  5  1.0  0
A.2   A NA NA  2.0  0
A.3   A  1 NA  3.0  0
A.4   A  2  2  4.0  0
B     B  3  1  0.1  0
C.9   C  2  2  3.0  0
C.10  C NA  1  2.0  0
C.11  C NA NA  1.0  0

1 Comment

Or, testdf[with(testdf, ave(ev, id, FUN = cumsum)) == 0, ]

Here is an example using ddply from the plyr package:

> library(plyr)
> ddply(testdf, .(id), function(z) z[cumsum(z$ev) == 0, ])
  id  x  y time ev
1  A NA  5  1.0  0
2  A NA NA  2.0  0
3  A  1 NA  3.0  0
4  A  2  2  4.0  0
5  B  3  1  0.1  0
6  C  2  2  3.0  0
7  C NA  1  2.0  0
8  C NA NA  1.0  0


This solution using data.table works on your testdf. The idea is to use cumsum within each id to flag the rows at and after the first event, and keep only the rows where the cumulative sum is still 0.

require(data.table)
dt <- data.table(testdf, key=c("id"))
dt.out <- dt[, .SD[cumsum(ev) == 0], by=id]
> dt.out

#    id  x  y time ev
# 1:  A NA  5  1.0  0
# 2:  A NA NA  2.0  0
# 3:  A  1 NA  3.0  0
# 4:  A  2  2  4.0  0
# 5:  B  3  1  0.1  0
# 6:  C  2  2  3.0  0
# 7:  C NA  1  2.0  0
# 8:  C NA NA  1.0  0
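As a usage note, the same filter can also be written with data.table's .I (row-index) idiom, which computes the row numbers to keep per group and then subsets the table once instead of building a .SD copy for each group. This is an alternative sketch, not part of the answer above:

```r
require(data.table)

# testdf as defined in the question
testdf <- data.frame(id = c(rep("A", 4), rep("B", 4), rep("C", 4)),
                     x = c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5),
                     y = rev(c(NA, NA, 1, 2, 3, NA, NA, 1, 2, NA, NA, 5)),
                     time = c(1, 2, 3, 4, 0.1, 0.5, 10, 20, 3, 2, 1, 0.5),
                     ev = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1))

dt <- data.table(testdf, key = "id")

# .I holds the original row numbers; collect those whose within-group
# cumulative event count is still 0, then subset the table once:
idx <- dt[, .I[cumsum(ev) == 0], by = id]$V1
dt.out <- dt[idx]
```

The result is the same eight rows; on large tables the index-based subset tends to be cheaper than materializing .SD per group.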

