
I have a data frame that contains data relating to the score of different events. There can be a number of scoring events for one game. What I would like to do is subset the occasions when the score goes above 5 or below -5. I would also like to get the last row for each ID. So for each ID, I would have one or more rows, depending on whether the score goes above 5 or below -5. My actual data set contains many other columns of information, but if I learn how to do this then I'll be able to apply it to anything else that I may want to do.

Here is a sample data set:

ID Score Time
1    0    0
1    3    5
1    -2   9
1    -4   17
1    -7   31
1    -1   43
2    0    0
2    -3   15
2    0    19
2    4    25
2    6    29
2    9    33
2    3    37
3    0    0
3    5    3
3    2    11
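
For readers who want to run the answers below, here is one way to load this sample into R. It is a minimal sketch that assumes the data frame is called Data, the name used in the attempt further down; some answers call it df instead.

```r
# Build the sample data frame from the table above.
# read.table() parses the numeric columns as integers, not factors.
Data <- read.table(text = "
ID Score Time
1    0    0
1    3    5
1   -2    9
1   -4   17
1   -7   31
1   -1   43
2    0    0
2   -3   15
2    0   19
2    4   25
2    6   29
2    9   33
2    3   37
3    0    0
3    5    3
3    2   11
", header = TRUE)

str(Data)  # confirm all three columns are numeric
```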

So for this data set, I would hopefully get this output:

ID Score Time
1   -7    31    
1   -1    43
2    6    29 
2    9    33
2    3    37
3    2    11

So at the very least, for each ID there will be one line printed with the last score for that ID, regardless of whether the score goes above 5 or below -5 during the event (this occurs for ID 3).

My attempt can subset when the value goes above 5 or below -5, I just don't know how to write code to get the last line for each ID:

Data[Data$Score > 5 | Data$Score < -5, ]

Let me know if you need any more information.

2 Comments
  • What do you want to happen to a row that satisfies both conditions? Should it appear once or twice? Commented Jan 26, 2017 at 21:12
  • Preferably just once. If it appears twice, that isn't an issue; I'm sure there is a way to delete duplicate rows. Commented Jan 26, 2017 at 21:16

4 Answers


You can use rle to grab the last row for each ID. Check out ?rle for more information about this useful function.

Data2 <- Data[cumsum(rle(Data$ID)$lengths), ]
Data2
#   ID Score Time
#6   1    -1   43
#13  2     3   37
#16  3     2   11

To combine the two conditions, use rbind.

Data2 <- rbind(Data[Data$Score > 5 | Data$Score < -5, ], Data[cumsum(rle(Data$ID)$lengths), ])

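To get rid of rows that satisfy both conditions, drop the duplicated rows. Because rbind alters repeated row names to keep them unique, compare the row contents with duplicated rather than the rownames.

Data2 <- Data2[!duplicated(Data2), ]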

You can also sort if desired, of course.
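
The combine, de-duplicate, and sort steps can also be done on row indices before subsetting, which keeps the original row order. This is a minimal sketch along the lines of the first comment under this answer. The names extreme, last, and keep are illustrative, and like the rle approach it assumes that the rows for each ID are contiguous.

```r
# Sample data from the question
Data <- data.frame(
  ID    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3),
  Score = c(0, 3, -2, -4, -7, -1, 0, -3, 0, 4, 6, 9, 3, 0, 5, 2),
  Time  = c(0, 5, 9, 17, 31, 43, 0, 15, 19, 25, 29, 33, 37, 0, 3, 11)
)

# Row numbers where the score is outside [-5, 5]
extreme <- which(Data$Score > 5 | Data$Score < -5)

# Row number of the last row in each run of identical IDs
last <- cumsum(rle(Data$ID)$lengths)

# Union of both index sets: unique() drops rows that meet both
# conditions, sort() restores the original row order
keep <- sort(unique(c(extreme, last)))

Data2 <- Data[keep, ]
Data2
```

Since each row number appears only once in keep, no separate de-duplication step is needed afterwards.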


5 Comments

In your rbind code, you could simplify the whole line down to df[with(df, c(which(Score > 5 | Score < -5), cumsum(rle(ID)$lengths))), ]
Whenever I run this code, I get the warnings "1: In Ops.factor(Score, 5) : ‘>’ not meaningful for factors" and "2: In Ops.factor(Score, -5) : ‘<’ not meaningful for factors", even though I previously converted Score to numeric with DF <- transform(Table, Score = as.numeric(as.character(Score))), and class(DF$Score) returns "numeric".
@useR How did you read in your data? Make sure that you read it in so that the Score field is not a factor as you need to do numeric comparisons with it.
I left my computer for the weekend, came back, and it's now working! Thanks very much. The only issue I faced was that the output wasn't grouped together. By that, I mean it printed all of the times when the score was above 5 or below -5, and then it printed the final value for each ID. I solved this by sorting the data frame by ID: Data3 <- Data2[order(Data2$ID),]
@useR If you want the rows in the same exact order as the original dataframe, then you can sort by the rownames, I believe.
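
The factor warning reported in the comments above usually means the comparison ran on a column that was still a factor. The following is a minimal sketch of the conversion and a check, using a small hypothetical Table rather than the commenter's actual data.

```r
# Hypothetical input in which Score ended up as a factor
Table <- data.frame(
  ID    = c(1, 1, 2),
  Score = factor(c("3", "-7", "6"))
)

is.factor(Table$Score)  # TRUE: numeric comparisons are not meaningful yet

# Convert via character. Calling as.numeric() directly on a factor
# returns the internal level codes, not the original values.
Table$Score <- as.numeric(as.character(Table$Score))

# The comparison now works on real numbers
outside <- Table[Table$Score > 5 | Table$Score < -5, ]
outside
```

Running the comparison on the converted object, and not on the original one that still holds the factor, is the step that is easy to miss.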

Here's a go at it in data.table, where df is your original data frame.

library(data.table)
setDT(df)

df[df[, c(.I[!between(Score, -5, 5)], .I[.N]), by = ID]$V1]
#    ID Score Time
# 1:  1    -7   31
# 2:  1    -1   43
# 3:  2     6   29
# 4:  2     9   33
# 5:  2     3   37
# 6:  3     2   11

We are grouping by ID. The between function finds the values between -5 and 5 (bounds included by default), and we negate that to get our desired values outside that range. We then use a .I subset to get the row indices per group for those. Then .I[.N] gives us the row number of the last entry, per group. We use the V1 column of that result as our row subset for the entire table. If the last row of a group is also outside the range, its index appears twice, so take unique values of V1 if unique rows are desired.

Note: .I[c(which(!between(Score, -5, 5)), .N)] could also be used in the j entry of the first operation. Not sure if it's more or less efficient.
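
To make the intermediate step visible, this sketch runs the inner grouped call on its own and then applies unique to the indices. It assumes the data.table package is installed; idx and rows are illustrative names.

```r
library(data.table)

# Sample data from the question
df <- data.table(
  ID    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3),
  Score = c(0, 3, -2, -4, -7, -1, 0, -3, 0, 4, 6, 9, 3, 0, 5, 2),
  Time  = c(0, 5, 9, 17, 31, 43, 0, 15, 19, 25, 29, 33, 37, 0, 3, 11)
)

# Inner call: per ID, the row numbers of out-of-range scores
# followed by the row number of the group's last row
idx <- df[, c(.I[!between(Score, -5, 5)], .I[.N]), by = ID]
idx  # two columns: ID and V1, where V1 holds row numbers into df

# unique() guards against a last row that is also out of range
rows <- unique(idx$V1)
df[rows]
```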

Addition: Another method, one that uses only logical values and will never produce duplicate rows in the output, is

df[df[, .I == .I[.N] | !between(Score, -5, 5), by = ID]$V1]
#    ID Score Time
# 1:  1    -7   31
# 2:  1    -1   43
# 3:  2     6   29
# 4:  2     9   33
# 5:  2     3   37
# 6:  3     2   11

2 Comments

Whenever I try to run this code I get the error: Error in [.data.frame(DF, , .I == .I[.N] | !between(Score, -5, 5), by = ID) : unused argument (by = ID). This happens even though ID is definitely a column name; colnames(DF) returns "ID" "Score" "Time".
@useR Did you run library(data.table)? The package needs to be installed and loaded.

Here is another base R solution.

df[as.logical(ave(df$Score, df$ID,
                  FUN=function(i) abs(i) > 5 | seq_along(i) == length(i))), ]

   ID Score Time
5   1    -7   31
6   1    -1   43
11  2     6   29
12  2     9   33
13  2     3   37
16  3     2   11

abs(i) > 5 | seq_along(i) == length(i) constructs a logical vector that returns TRUE for each element that fits your criteria. ave applies this function to each ID. The resulting logical vector is used to select the rows of the data.frame.
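
The as.logical wrapper is needed because ave writes the function's results back into a copy of its first argument, so the TRUE/FALSE values come back as 1/0 in the same numeric type as Score. A short sketch that inspects this intermediate vector (flags is an illustrative name):

```r
# Sample data from the question
df <- data.frame(
  ID    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3),
  Score = c(0, 3, -2, -4, -7, -1, 0, -3, 0, 4, 6, 9, 3, 0, 5, 2),
  Time  = c(0, 5, 9, 17, 31, 43, 0, 15, 19, 25, 29, 33, 37, 0, 3, 11)
)

# Per ID: flag scores outside [-5, 5] and the last row of the group
flags <- ave(df$Score, df$ID,
             FUN = function(i) abs(i) > 5 | seq_along(i) == length(i))

flags              # numeric 0/1 values, one per row of df
class(flags)       # numeric, because it inherits the type of df$Score

# Convert back to logical before using it as a row selector
df[as.logical(flags), ]
```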


Here's a tidyverse solution. Not as concise as some of the above, but easier to follow.

library(tidyverse)
lastrows  <- Data %>% group_by(ID) %>% top_n(1, Time)
scorerows <- Data %>% group_by(ID) %>% filter(!between(Score, -5, 5))
bind_rows(scorerows, lastrows) %>% arrange(ID, Time) %>% unique()

# A tibble: 6 x 3
# Groups:   ID [3]
#      ID Score  Time
#   <int> <int> <int>
# 1     1    -7    31
# 2     1    -1    43
# 3     2     6    29
# 4     2     9    33
# 5     2     3    37
# 6     3     2    11
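
Note that top_n(1, Time) selects the row with the largest Time, which matches the last row only when Time increases within each ID; top_n has also been superseded by slice_max in recent dplyr versions. As an alternative sketch, assuming dplyr is installed, a single filter can pick the last row by position and avoids the bind and de-duplicate steps:

```r
library(dplyr)

# Sample data from the question
Data <- data.frame(
  ID    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3),
  Score = c(0, 3, -2, -4, -7, -1, 0, -3, 0, 4, 6, 9, 3, 0, 5, 2),
  Time  = c(0, 5, 9, 17, 31, 43, 0, 15, 19, 25, 29, 33, 37, 0, 3, 11)
)

result <- Data %>%
  group_by(ID) %>%
  # keep out-of-range scores, plus the last row of each group by position
  filter(abs(Score) > 5 | row_number() == n()) %>%
  ungroup()

result
```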
