I have a dataset like this:

       x time 
1   TRUE    9       
2   TRUE    8       
3   TRUE   10      
4   TRUE    5       
5   TRUE   16       
6  FALSE    2       
7  FALSE   17       
8  FALSE    6   
9   TRUE   11       
10  TRUE    7       
11  TRUE   20       
12  TRUE    3       
13  TRUE    9
14 FALSE    4
15 FALSE    2
16 FALSE   10
17  TRUE    3
18  TRUE    6
19  TRUE   15

Using R, I would like to generate a new variable that numbers the rows by condition, where a new number starts whenever the condition changes based on x and time. Specifically, I would like to scan the data from the beginning and assign a number (e.g., 1) to the first row. When x changes between TRUE and FALSE, the number increases by 1. While x is FALSE, the number stays the same. While x is TRUE, the number stays the same as long as time < 10, but it also increases by 1 on any row where x is TRUE and time >= 10, and then stays the same until the next change.

Put differently, a change between TRUE and FALSE counts as a condition change. In addition, while x is TRUE, every row with time >= 10 also counts as the start of a new condition.

The output I would like to get is like this.

       x time   count
1   TRUE   9       1
2   TRUE   8       1
3   TRUE   10      2
4   TRUE   5       2
5   TRUE   16      3
6  FALSE    2      4 
7  FALSE   17      4 
8  FALSE    6      4
9   TRUE   11      5 
10  TRUE    7      5 
11  TRUE   20      6 
12  TRUE    3      6 
13  TRUE    9      6 
14 FALSE    4      7 
15 FALSE    2      7 
16 FALSE   10      7
17  TRUE    3      8 
18  TRUE    6      8
19  TRUE    15     9

I tried rleid(x), but it does not take the time variable into account. I would appreciate any advice on how to solve this in R!

2 Answers

1

Here is one option with rleid: apply rleid to the column 'x' together with a numeric index built from the 'time' column (a running count of the TRUE rows where time >= 10)

library(data.table)
setDT(df1)[, count := rleid(x, replace(x, x, cumsum(time[x] >= 10)))]

-output

        x  time count
    <lgcl> <int> <int>
 1:   TRUE     9     1
 2:   TRUE     8     1
 3:   TRUE    10     2
 4:   TRUE     5     2
 5:   TRUE    16     3
 6:  FALSE     2     4
 7:  FALSE    17     4
 8:  FALSE     6     4
 9:   TRUE    11     5
10:   TRUE     7     5
11:   TRUE    20     6
12:   TRUE     3     6
13:   TRUE     9     6
14:  FALSE     4     7
15:  FALSE     2     7
16:  FALSE    10     7
17:   TRUE     3     8
18:   TRUE     6     8
19:   TRUE    15     9
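
To see why the one-liner works, it can help to build its pieces step by step. The following is a minimal base R sketch that needs no packages; `run_id()` is a hypothetical helper written here only to mimic what `rleid()` does for two vectors, and the names `hits` and `idx` are likewise just for illustration.

```r
# Same x and time values as df1 in the data section
x <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
time <- c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
          20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)

# Step 1: among the TRUE rows only, keep a running count of rows with time >= 10
hits <- cumsum(time[x] >= 10)

# Step 2: put that running count back at the TRUE positions;
# the FALSE positions keep FALSE, which is coerced to 0
idx <- replace(x, x, hits)

# Step 3: hypothetical stand-in for rleid(a, b): start a new id whenever
# either vector differs from its previous element
run_id <- function(a, b) {
  n <- length(a)
  if (n == 0L) return(integer(0))
  changed <- a[-1] != a[-n] | b[-1] != b[-n]
  cumsum(c(TRUE, changed))
}

count <- run_id(x, idx)
```

Passing `x` alongside `idx` matters: a FALSE row and a TRUE row that comes before any `time >= 10` hit both carry 0 in `idx`, so `idx` alone would not always separate them.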

Or with dplyr (rleid still comes from data.table, so it is called with its namespace here)

library(dplyr)
df1 %>% 
   mutate(count = data.table::rleid(x, replace(x, x, cumsum(time[x] >= 10))))

-output

       x time count
1   TRUE    9     1
2   TRUE    8     1
3   TRUE   10     2
4   TRUE    5     2
5   TRUE   16     3
6  FALSE    2     4
7  FALSE   17     4
8  FALSE    6     4
9   TRUE   11     5
10  TRUE    7     5
11  TRUE   20     6
12  TRUE    3     6
13  TRUE    9     6
14 FALSE    4     7
15 FALSE    2     7
16 FALSE   10     7
17  TRUE    3     8
18  TRUE    6     8
19  TRUE   15     9

data

df1 <- structure(list(x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, 
TRUE, TRUE), time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 
7L, 20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19"), class = "data.frame")

0

You can use a for loop in base R.

# Your data, copied from @akrun
df1 <- structure(list(
  x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
        TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
           20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
                 "11", "12", "13", "14", "15", "16", "17", "18", "19"),
class = "data.frame")

# Create an empty `count` column 

df1$count <- 0

# Assign 1 to the first row 

df1$count[1] <- 1

# From the 2nd row to the last row, increase the count if either of the
# two conditions is satisfied. Otherwise, the count stays unchanged.

# seq_len(...)[-1] is empty for a one-row data frame, unlike 2:nrow(df1)
for (k in seq_len(nrow(df1))[-1]) {
  # The two conditions that increase the count:
  # (1) x changed from the previous row, OR (2) x is TRUE and time >= 10
  if (df1$x[k] != df1$x[k - 1] || (df1$x[k] && df1$time[k] >= 10)) {
    df1$count[k] <- df1$count[k - 1] + 1
  } else {
    df1$count[k] <- df1$count[k - 1]
  }
}

df1
       x time count
1   TRUE    9     1
2   TRUE    8     1
3   TRUE   10     2
4   TRUE    5     2
5   TRUE   16     3
6  FALSE    2     4
7  FALSE   17     4
8  FALSE    6     4
9   TRUE   11     5
10  TRUE    7     5
11  TRUE   20     6
12  TRUE    3     6
13  TRUE    9     6
14 FALSE    4     7
15 FALSE    2     7
16 FALSE   10     7
17  TRUE    3     8
18  TRUE    6     8
19  TRUE   15     9
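
The same two conditions can also be written without a loop. This is only a sketch of a vectorised base R variant, assuming x contains no NA values; `count_changes()` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: returns the group number for each row
count_changes <- function(x, time) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  # From the 2nd row on, a row opens a new group if x flipped relative to
  # the previous row, or if x is TRUE and time >= 10
  new_group <- (x[-1] != x[-n]) | (x[-1] & time[-1] >= 10)
  # The first row always opens group 1; cumsum then numbers the groups
  cumsum(c(TRUE, new_group))
}

# Same x and time values as df1 above
x <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
time <- c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
          20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)

count <- count_changes(x, time)
```

With a data frame this could be used as `df1$count <- count_changes(df1$x, df1$time)`. Like the loop, it gives the first row the number 1 even when that row has x TRUE and time >= 10.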
