I have a dataframe like so:

TS         Device1.max   Device2.max   Device3.max   Device4.max
18:02:44   FALSE         FALSE         TRUE          FALSE
18:02:45   TRUE          TRUE          FALSE         FALSE
18:02:46   FALSE         FALSE         FALSE         TRUE
18:02:47   FALSE         FALSE         FALSE         FALSE
18:02:48   FALSE         FALSE         FALSE         FALSE
18:02:49   FALSE         FALSE         FALSE         FALSE
18:02:50   FALSE         FALSE         FALSE         FALSE
18:02:51   FALSE         FALSE         FALSE         FALSE
18:02:52   FALSE         FALSE         FALSE         TRUE
18:02:53   FALSE         TRUE          FALSE         FALSE
18:02:54   FALSE         FALSE         FALSE         FALSE

To get the TRUE/FALSE columns I used the following code:

df$Device1.max = ifelse(df$Device1 == max(df$Device1), 'true','false')
df$Device2.max = ifelse(df$Device2 == max(df$Device2), 'true','false')
df$Device3.max = ifelse(df$Device3 == max(df$Device3), 'true','false')
df$Device4.max = ifelse(df$Device4 == max(df$Device4), 'true','false')

For simplicity I am only showing 4 device columns. I have about a hundred device columns on which I would like to do this comparison, and it won't be feasible to write a hundred ifelse statements. How do I do the comparison using a regex or a generic column name, assuming all the device columns of interest have names starting with something like "Device"?

I then want to find the row around which the maximum number of Device.max columns are TRUE within +/-1 row. Algorithmically, I would create an index column and filter to a data frame where only the TRUE values are present. Then I would check how many columns have indices within 1 row of each other. In the above case, rows 1, 2 & 3 have 4 columns satisfying the TRUE condition, whereas rows 9 & 10 have only 2 columns satisfying it. Therefore my expected output would be:

     TS      Device1.max    Device2.max   Device3.max         Device4.max
    18:02:44     FALSE        FALSE          TRUE               FALSE
    18:02:45     TRUE         TRUE          FALSE               FALSE
    18:02:46     FALSE        FALSE         FALSE               TRUE

However, this method seems very iterative and inefficient. Is there a better way to do it using data frame functions in R?

1 Answer

This code should answer the first (TRUE/FALSE) question:

r <- c();
colum <- c();
for (colu in 2:ncol(example_table)){
  # overwrite the column with a logical vector: TRUE where the value equals the column maximum
  example_table[ ,colu] <- example_table[ ,colu]==max(example_table[ ,colu])
  val <- which(example_table[ ,colu]) # row indexes of the TRUE values
  r <- append(r,val) # append row indexes
  colum <- append(colum,rep(colu,length(val))) # a column can contain more than one TRUE, so repeat its index once per match
}
true_values <- cbind(r,colum) # two-column matrix of (row, column) positions

Output:

> example_table
     V1 V2 V3 V4
1 18:02  5  8  1
2 14:05  7  1  7
3 19:27  7  6  1

# After the for loop:

> example_table
     V1    V2    V3    V4
1 18:02 FALSE  TRUE FALSE
2 14:05  TRUE FALSE  TRUE
3 19:27  TRUE FALSE FALSE

> true_values
     r colum
[1,] 2     2
[2,] 3     2
[3,] 1     3
[4,] 2     4

Here r is the row index and colum is the column index of each TRUE value. Note that example_table[ ,colu]==max(example_table[ ,colu]) returns a logical vector with one TRUE/FALSE per row, and that colum <- append(colum,rep(colu,length(val))) keeps r and colum the same length, so that cbind() pairs each row index with its column index.
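For about a hundred columns, the same comparison can also be written without an explicit loop and without overwriting the raw readings. This is only a minimal sketch: the data frame `df` and its values are made up for illustration, and it assumes the raw readings are numeric columns whose names start with `Device`.

```r
# Hypothetical data: a timestamp column plus numeric device readings
df <- data.frame(
  TS      = c("18:02:44", "18:02:45", "18:02:46"),
  Device1 = c(5, 7, 7),
  Device2 = c(8, 1, 6),
  Device3 = c(1, 7, 1)
)

# Names of the columns that start with "Device" (case-sensitive regex)
dev_cols <- grep("^Device", names(df), value = TRUE)

# For each selected column, flag the rows equal to that column's maximum
# and store the result in a new column named "<column>.max"
df[paste0(dev_cols, ".max")] <- lapply(df[dev_cols], function(x) x == max(x))
```

Note that running the selection a second time would also match the new `.max` columns, so a stricter pattern may be needed depending on how the real columns are named. If the readings can contain NA, `max(x, na.rm = TRUE)` is probably what you want.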

For the second question, you now have the row indexes that contain TRUE values. You can then write code that selects a row when the rows directly above and below it also contain a TRUE value (the any() function should work well here), and then subset() the original data frame with those row indexes.
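One possible way to sketch that second step is shown here. It reflects one reading of the question's criterion: for every row, count how many columns have a TRUE in the window from the row above to the row below, then keep the window around the row with the highest count. The logical matrix `flags` is rebuilt by hand from the table in the question, and the names `flags`, `n_cols`, `centre` and `keep` are only illustrative.

```r
# Logical flags copied from the question's table (18:02:44 to 18:02:54)
ts <- sprintf("18:02:%02d", 44:54)
flags <- matrix(FALSE, nrow = 11, ncol = 4,
                dimnames = list(NULL, paste0("Device", 1:4, ".max")))
flags[2, 1]        <- TRUE   # Device1.max
flags[c(2, 10), 2] <- TRUE   # Device2.max
flags[1, 3]        <- TRUE   # Device3.max
flags[c(3, 9), 4]  <- TRUE   # Device4.max

n <- nrow(flags)

# For each row i, count the columns with at least one TRUE in rows i-1 to i+1
n_cols <- sapply(seq_len(n), function(i) {
  win <- max(1, i - 1):min(n, i + 1)
  sum(colSums(flags[win, , drop = FALSE]) > 0)
})

# First row with the highest count, and the +/-1 window around it
centre <- which.max(n_cols)
keep   <- max(1, centre - 1):min(n, centre + 1)

result <- data.frame(TS = ts[keep], flags[keep, , drop = FALSE])
```

With the question's data the window is centred on 18:02:45 and covers all 4 columns, so `result` holds the three rows 18:02:44 to 18:02:46. `which.max()` returns only the first maximum, so ties would need extra handling.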

3 Comments

Thank you so much for that clever solution! I have a couple of questions: 1. Are the semicolons when initializing r and colum mandatory? It seemed to work for me without them. 2. Is it possible to assign a different column name when doing the check? Like df[ ,colu_max] <- df[ ,colu]==max(df[ ,colu]) ?
I was able to do it using the paste function in the column names. Thank you so much for getting me this far!
1. No, the semicolons are not necessary. 2. No, you can't change it until the loop has finished. You can change the "colu" variable in the loop, but it must be changed in every expression that uses it. Once the loop is done, you can, as you said, change the column names using the colnames(your data frame) function.
