Compare multiple values with multiple values in an R data frame

Question

I have a data frame with 2 columns, "time" and "a".

df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))

How can I check whether the values changed over time? I need a new column "comp" in the data frame that shows whether each value in column "a" is the same as the two values before it and the two values after it in the same column. So the result could look like this:

df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 8, 9), a = c(3, 8, 2, 2, 2, 2, 2, 4, 5), comp = c(F, F, F, F, T, F, F, F, F))

In the end I need to run this comparison on a column with about 3 million observations.

Bas · Accepted Answer · 2020-05-03 09:42:29Z

3

Using the tidyverse:

library(tidyverse)

df %>% 
  arrange(time) %>% 
  mutate(comp = a == lag(a) & a == lag(a, 2) & a == lead(a) & a == lead(a, 2))

#   time a  comp
# 1    1 3 FALSE
# 2    2 8 FALSE
# 3    3 2 FALSE
# 4    4 2 FALSE
# 5    5 2  TRUE
# 6    6 2 FALSE
# 7    7 2 FALSE
# 8    8 4 FALSE
# 9    9 5 FALSE



2 Comments

Bolle Over a year ago

Nice and easy solution :) What if I would like to compare specific values? For example, the result should only be TRUE if the unchanged values are twos (as in my df example) or, as an alternative condition, sixes.

Bas Over a year ago

You can just keep adding conditions to the right-hand side of the comp = ... statement, such as & a == 2.
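As a hedged sketch of that suggestion: the version below assumes "twos or sixes" means the value must be one of c(2, 6), which is expressed with %in% rather than a single ==. The helper column name unchanged and the coalesce() call are illustrative additions, not part of the original answer; coalesce() turns any NA produced by lag()/lead() at the edges into FALSE.

```r
library(dplyr)

df <- data.frame(time = 1:9, a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))

res <- df %>%
  arrange(time) %>%
  mutate(
    # TRUE where the value equals its two predecessors and two successors;
    # may be NA near the ends, where lag()/lead() have nothing to return
    unchanged = a == lag(a) & a == lag(a, 2) & a == lead(a) & a == lead(a, 2),
    # extra condition: only accept runs of 2s or 6s (assumed reading of the comment)
    comp = a %in% c(2, 6) & coalesce(unchanged, FALSE)
  )

res$comp
```

Only row 5 satisfies both conditions in this example, so the result is the same as in the answer above.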

Ronak Shah · Accepted Answer · 2020-05-03 10:22:31Z

3

A similar solution to @Bas using data.table

library(data.table)
setDT(df)[, comp := a == shift(a) & a == shift(a, 2) & 
                  a == shift(a, type = 'lead') & a == shift(a, 2, type = 'lead')]

#   time a  comp
#1:    1 3 FALSE
#2:    2 8 FALSE
#3:    3 2 FALSE
#4:    4 2 FALSE
#5:    5 2  TRUE
#6:    6 2 FALSE
#7:    7 2 FALSE
#8:    8 4 FALSE
#9:    9 5 FALSE



2 Comments

Bolle Over a year ago

Thanks for your data.table solution! What if I want to check, for example, the next 20 values? There must be a better solution than writing shift(a, 1), shift(a, 2), etc., because if I try setDT(df)[, comp := a == shift(a, type = "lag", n = c(1:2)) & a == shift(a, type = "lead", n = c(1:2))] I get an error: 'list' object cannot be coerced to type 'double'

Ronak Shah Over a year ago

You can use rolling operations. Check out zoo's rollapply function (?rollapply). You may ask a new question if you have trouble implementing it for your data.
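The error in the comment arises because shift() returns a list when n has more than one element. A possible way to stay within data.table, shown here as a sketch rather than the answerer's method, is to compare a against each shifted vector and combine the results with Reduce(). The variable names dt, n, neighbours and same are illustrative, and n is assumed to be at least 2 so that shift() really does return a list.

```r
library(data.table)

dt <- data.table(time = 1:9, a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))
n <- 2  # neighbours on each side; assumed >= 2 so shift() returns a list

dt[, comp := {
  # list of 2 * n shifted copies of a: n lags followed by n leads
  neighbours <- c(shift(a, 1:n, type = "lag"), shift(a, 1:n, type = "lead"))
  # element-wise: does a equal every one of its shifted copies?
  same <- Reduce(`&`, lapply(neighbours, function(s) s == a))
  # rows near the ends can come out as NA; treat those as FALSE
  !is.na(same) & same
}]

dt$comp
```

Setting n <- 20 checks 20 values on each side without spelling out every shift() call.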

Dominic van Essen · Accepted Answer · 2020-05-03 10:23:59Z

If I understand right, you're looking for values that are the same as their 2 adjacent values on either side, and in this case you're happy to ignore the 'missing' adjacent values for the 2 first & 2 last values.

Using base R:

sameasadj=function(v,n=2,include_ends=T) {
    if(include_ends){vv=c(rep(head(v,1),n),v,rep(tail(v,1),n))} 
    else {vv=c(rep(NA,n),v,rep(NA,n))}
    sapply(seq_along(v),function(i) diff(range(vv[i:(i+2*n)]))==0)
}

df$comp = sameasadj(df$a)
df$comp

Output:

[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

Explanation:

sameasadj=function(v,n=2,include_ends=T) = define the function sameasadj to test whether each value is the same as its adjacent neighbours on each side. The arguments let us choose the number n of adjacent neighbours (in your case 2), and whether or not to include the ends (or to return 'NA' for these, since they lack enough neighbours on one side).

if(include_ends){vv=c(rep(head(v,1),n),v,rep(tail(v,1),n))} = if we want to include the ends, then we just add the 'missing' neighbours so that they match

else {vv=c(rep(NA,n),v,rep(NA,n))} = otherwise we add 'NA' values

sapply(seq_along(v),function(i) = go along each position i in the vector...

diff(range(vv[i:(i+2*n)]))==0) = ...and check whether the elements from i to i+2*n are all the same (diff(range(x))==0 will return TRUE if all elements of x are the same)

Putting it all into a function makes it easy to change your mind later about the number of adjacent neighbours, or what to do with the ends...
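Since the question mentions about 3 million observations, calling a function once per element with sapply() may be slow. The sketch below is an alternative, fully vectorised formulation based on rle(); it is not the answer's method. The function name same_as_adj_rle is hypothetical. It relies on the observation that a value matches its n neighbours on both sides exactly when it sits in a run of identical values with at least n members of that run before it and n after it. Unlike include_ends=T above, positions near the ends are simply FALSE.

```r
same_as_adj_rle <- function(v, n = 2) {
  runs <- rle(v)
  # length of the run that each element belongs to
  run_len <- rep(runs$lengths, runs$lengths)
  # index in v at which each element's run starts
  run_start <- rep(cumsum(runs$lengths) - runs$lengths + 1, runs$lengths)
  # 1-based position of each element within its run
  pos_in_run <- seq_along(v) - run_start + 1
  # need n equal values before and n equal values after within the same run
  pos_in_run > n & (run_len - pos_in_run) >= n
}

df <- data.frame(time = 1:9, a = c(3, 8, 2, 2, 2, 2, 2, 4, 5))
df$comp <- same_as_adj_rle(df$a)
df$comp
```

For the example data only the middle element of the run of five 2s qualifies, which agrees with the output shown above.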
