I have a dataframe (race_df) containing many records. Each record has up to three trials, and each trial can be repeated up to five times. Below is an example of the data:

Record   Trial   Start    End    Speed     Number
     1       2       1      4       12         9
     1       2       4      6       11        10
     1       3       1      3       10        17
     2       1       1      5       14         5

I have the following code that calculates the longest 'Distance' and the maximum 'Number' for each Record and Trial:

library(dplyr)

getInfo <- function(race_df) {
  # Longest distance (End - Start) per Record and Trial
  race_distance <- as.data.frame(race_df %>% group_by(Record, Trial) %>% summarise(Max.Distance = max(End - Start)))
  # Largest Number per Record and Trial
  race_max_number <- as.data.frame(race_df %>% group_by(Record, Trial) %>% summarise(Max.Number = max(Number)))
  # Combine both summaries on their shared Record and Trial columns, then sort
  rd_rmn_merge <- merge(x = race_distance, y = race_max_number)
  total_summary <- rd_rmn_merge[order(rd_rmn_merge$Record, rd_rmn_merge$Trial), ]
  return(list(race_distance, race_max_number, total_summary))
}

list_summary <- getInfo(race_df)
total_summary <- list_summary[[3]]

list_summary gives me an output like this:

 [[1]]
 Record   Trial    Max.Distance  
      1       2       3       
      1       3       2     
      2       1       4      

 [[2]]
 Record    Trial    Max.Number
      1       2       10
      1       3       17
      2       1        5

 [[3]]
 Record  Trial    Max.Distance    Max.Number 
      1       2        3             10
      1       3        2             17
      2       1        4              5

I am now trying to find the longest distance together with its corresponding 'Number', regardless of whether that Number is the maximum. So Record 1, Trial 2 should look like this instead:

Record   Trial     Max.Distance  Corresponding Number
     1       2          3                9

Eventually I would like to create a function that takes 'Record' and 'Trial' as arguments and looks them up in the 'race_df' dataframe, to make searching for a specific record and trial's longest distance easier.

Any help on this would be much appreciated.

2 Answers

The data (in case anyone else wants to offer their solution):

df <- data.frame( Record = c(1,1,1,2),
                  Trial = c(2,2,3,1),
                  Start = c(1,4,1,1),
                  End = c(4,6,3,5),
                  Speed = c(12,11,10,14),
                  Number = c(9,10,17,5))

Here's a tidyverse solution:

library(tidyverse)
df %>% 
  mutate(Max.Distance = End - Start) %>%   # distance covered by each repeat
  select(-Start, -End, -Speed) %>%
  group_by(Record) %>% 
  nest() %>%                               # one nested data frame per Record
  mutate(data = map(data, ~ filter(.x, Max.Distance == max(Max.Distance)))) %>%  # keep the longest row(s)
  unnest(data)

The output:

  Record Trial Number Max.Distance
   <dbl> <dbl>  <dbl>        <dbl>
1      1     2      9            3
2      2     1      5            4

Note: if you want to keep all of your columns in the final data frame, just remove the select(...) step.
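If the longest distance is wanted for each Record and Trial combination rather than for each Record, a grouped filter may be enough, without nest() or map(). This is only a minimal sketch under that assumption: the data frame is redefined so the block runs on its own, and the name longest_per_trial is made up for illustration.

```r
library(dplyr)

df <- data.frame(Record = c(1, 1, 1, 2),
                 Trial  = c(2, 2, 3, 1),
                 Start  = c(1, 4, 1, 1),
                 End    = c(4, 6, 3, 5),
                 Speed  = c(12, 11, 10, 14),
                 Number = c(9, 10, 17, 5))

# longest_per_trial is a hypothetical name, not part of the answer above
longest_per_trial <- df %>%
  mutate(Max.Distance = End - Start) %>%         # distance of every repeat
  group_by(Record, Trial) %>%                    # one group per Record/Trial pair
  filter(Max.Distance == max(Max.Distance)) %>%  # keep the longest repeat(s); ties are all kept
  ungroup() %>%
  select(Record, Trial, Max.Distance, Number)

longest_per_trial
```

For Record 1, Trial 2 this keeps the repeat with distance 3 and Number 9, which is the row asked for in the question.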


2 Comments

Hi, thanks for responding :) I tried implementing this code but received this error: Error in library(tidyverse) : there is no package called ‘tidyverse’. After a bit of searching I installed it with install.packages('tidyverse', dependencies=TRUE, type="source"), and it worked!
Glad to hear it

I hope I have understood what your function is supposed to do. In the end it should take a record and a trial and return the row(s) with the maximum distance, right? So it boils down to two filters:

  1. filter rows for the record and trial.
  2. filter the row inside that subset that has the maximum distance

Between those two filters we have to calculate the distance, although I suggest moving that step outside the function because it is basically a one-time operation.

library(dplyr)

race_df <- data.frame(Record = c(1, 1, 1, 2), Trial = c(2, 2, 3, 1),
                 Start = c(1, 4, 1, 1), End = c(4, 6, 3, 5), Speed = c(12, 11, 10, 14),
                 Number = c(9, 10, 17, 5))

get_longest <- function(df, record, trial){
  df %>% 
    filter(Record == record & Trial == trial) %>%   # 1. rows for this record and trial
    mutate(Distance = End - Start) %>%              # distance of each repeat
    filter(Distance == max(Distance)) %>%           # 2. row(s) with the longest distance
    select(Number, Distance)
}
get_longest(race_df, 1, 2)
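The suggestion to calculate the distance outside the function could look roughly like the sketch below. The names race_dist and get_longest_pre are hypothetical, and the function assumes the data frame passed in already has a Distance column.

```r
library(dplyr)

race_df <- data.frame(Record = c(1, 1, 1, 2), Trial = c(2, 2, 3, 1),
                      Start = c(1, 4, 1, 1), End = c(4, 6, 3, 5),
                      Speed = c(12, 11, 10, 14), Number = c(9, 10, 17, 5))

# One-time step: add the Distance column to the whole data frame
race_dist <- race_df %>% mutate(Distance = End - Start)

# get_longest_pre is a hypothetical variant that expects a Distance column
get_longest_pre <- function(df, record, trial) {
  df %>%
    filter(Record == record, Trial == trial) %>%  # rows for this record and trial
    filter(Distance == max(Distance)) %>%         # row(s) with the longest distance
    select(Number, Distance)
}

get_longest_pre(race_dist, 1, 2)
```

With this split, repeated lookups reuse the same precomputed column. A Record/Trial pair that is not in the data should come back as zero rows, and max() on the empty subset may also emit a warning.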

4 Comments

Hi, thanks for your response! I only want Distance and Number. How would I remove Start, End and Speed from the output?
See my edit. If that solves your problem, you could accept my answer. PS: I suggest you have a look at the data transformation chapter of Hadley Wickham's free ebook "R for Data Science". dplyr is very powerful!
Thanks for your suggestion and edit! I tried implementing the code with select but got this error: Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘select’ for signature ‘"data.frame"’. I'm not entirely sure what that means and am now looking for examples on the internet.
After using install.packages('tidyverse', dependencies=TRUE, type="source"), it now works. Thanks very much.
