Filling data frame with previous row value

Question

I have a data frame that has 2 columns.

column1 has random numbers, and column2 is a placeholder column for what I want column3 to look like.

random      temp
0.502423373 1
0.687594055 0
0.741883739 0
0.445364032 0
0.50626137  0.5
0.516364981 0
...

I want to fill column3 so that it takes the last non-zero number (1 or 0.5 in this example) and keeps filling the following rows with that value until it hits a row with a different non-zero number. It then repeats the process for the entire column.

random     temp state
0.502423373 1   1
0.687594055 0   1
0.741883739 0   1
0.445364032 0   1
0.50626137  0.5 0.5
0.516364981 0   0.5
0.807804708 0   0.5
0.247948445 0   0.5
0.46573337  0   0.5
0.103705154 0   0.5
0.079625868 1   1
0.938928944 0   1
0.677713019 0   1
0.112231619 0   1
0.165907178 0   1
0.836195267 0   1
0.387712998 1   1
0.147737077 0   1
0.439281543 0.5 0.5
0.089013503 0   0.5
0.84174743  0   0.5
0.931738707 0   0.5
0.807955172 1   1

Thanks for any and all help.

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2013-12-06 07:52:45Z

12

Perhaps you can make use of na.locf from the "zoo" package after setting values of "0" to NA. Assuming your data.frame is called "mydf":

mydf$state <- mydf$temp
mydf$state[mydf$state == 0] <- NA

library(zoo)
mydf$state <- na.locf(mydf$state)
#      random temp state
# 1 0.5024234  1.0   1.0
# 2 0.6875941  0.0   1.0
# 3 0.7418837  0.0   1.0
# 4 0.4453640  0.0   1.0
# 5 0.5062614  0.5   0.5
# 6 0.5163650  0.0   0.5

If there were NA values in your original data.frame in the "temp" column, and you wanted to keep them as NA in the newly generated "state" column too, that's easy to take care of. Just add one more line to reintroduce the NA values:

mydf$state[is.na(mydf$temp)] <- NA
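
One more case that may be worth guarding against is a "temp" column that starts with zeros. By default, na.locf drops leading NAs (na.rm = TRUE), so the result can come back shorter than the column it is assigned to. A minimal sketch, assuming a made-up vector with leading zeros (not the data from the question):

```r
library(zoo)

# Hypothetical example vector with leading zeros (not from the question).
temp <- c(0, 0, 1, 0, 0.5, 0)

state <- temp
state[state == 0] <- NA

# na.rm = FALSE keeps the leading NAs, so the result has the same length as temp.
state <- na.locf(state, na.rm = FALSE)
state
# [1]  NA  NA 1.0 1.0 0.5 0.5
```

Whether the leading entries should stay NA or go back to 0 depends on what you want "state" to mean before the first non-zero value.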

edited Dec 6, 2013 at 7:52

answered Dec 6, 2013 at 4:40

A5C1D2H2I1M1N2O1R2T1

5 Comments

Neal Fultz Over a year ago

I think this would be bad if there are already NAs in the data. But if it works that's good too.

A5C1D2H2I1M1N2O1R2T1 Over a year ago

@NealFultz, and that comment warrants a down-vote? It's pretty easy to address your concern about the comment. (I'm presuming that you would want the value in the generated "state" variable to be NA if it was NA in the "temp" variable. Notice that I don't touch the "temp" variable, so I still have easy access to that information.)

Neal Fultz Over a year ago

And if you have NAs next to 0s?

A5C1D2H2I1M1N2O1R2T1 Over a year ago

@NealFultz, ??? How should I know. It's not my data and these conditions are not specified in the question. I would still guess that a NA next to a zero should be replaced with the last known value, and with the current data set, I don't see that this would be a problem. Or do you want to continue filling the data with NA when an NA is encountered? Please feel free to share the condition you perceive and how you propose dealing with it. I don't see that your present solution handles NA values, so I am eager to learn.

user2813055 Over a year ago

Just to clarify, there are no NAs, so this solution did the trick!

shadow · Accepted Answer · 2013-12-06 13:40:34Z

5

Inspired by the solution of @Ananda Mahto, this is an adaptation of the internal code of na.locf that works directly with 0s instead of NAs. You then don't need the zoo package, and you don't need the preprocessing step of changing the values to NA. Benchmark tests show that this is about 10 times faster than the original version.

locf.0 <- function(x) {
  L <- x!=0
  idx <- c(0, which(L))[cumsum(L) + 1]
  return(x[idx])
} 
mydf$state <- locf.0(mydf$temp)
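
Note that if the vector starts with a zero, the lookup index for that position is 0, and R silently drops zero indices, so the result comes back shorter than the input. A possible variant, sketched here under the assumption that leading zeros should become NA and that the input itself contains no NAs (the name locf_zero is made up for this illustration):

```r
# Hypothetical variant of locf.0 that tolerates leading zeros.
locf_zero <- function(x) {
  nonzero <- x != 0
  # For each position, the index of the most recent non-zero element;
  # NA for positions before the first non-zero element.
  idx <- c(NA_integer_, which(nonzero))[cumsum(nonzero) + 1L]
  x[idx]
}

temp <- c(0, 1, 0, 0, 0.5, 0)
locf_zero(temp)
# [1]  NA 1.0 1.0 1.0 0.5 0.5
```

For vectors that start with a non-zero value, such as the one in the question, this should give the same result as locf.0.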

answered Dec 6, 2013 at 13:40

shadow

1 Comment

A5C1D2H2I1M1N2O1R2T1 Over a year ago

Clever thought to modify na.locf. +1

kdauria · Accepted Answer · 2013-12-06 06:51:26Z

3

Here is an interesting way with the Reduce function.

temp = c(1,0,0,0,.5,0,0,0,0,0,1,0,0,0,0,0,1,0,0.5,0,0,0,1)
fill_zero = function(x,y) if(y==0) x else y
state = Reduce(fill_zero, temp, accumulate=TRUE)

If you're worried about speed, you can try Rcpp.

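library(Rcpp)
cppFunction('
  NumericVector fill_zeros( NumericVector x ) {
    // Work on a copy: an Rcpp vector shares memory with the R object,
    // so writing to x directly would also change the vector passed in from R.
    NumericVector out = clone(x);
    for( int i=1; i<out.size(); i++ )
      if( out[i]==0 ) out[i] = out[i-1];
    return out;
  }
')
state = fill_zeros(temp)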

edited Dec 6, 2013 at 6:51

answered Dec 6, 2013 at 6:29

kdauria

alexis_laz · Accepted Answer · 2013-12-06 11:38:56Z

3

Also, unless I'm overlooking something, this seems to work:

DF$state2 <- ave(DF$temp, cumsum(DF$temp), FUN = function(x) x[x != 0])
DF
#       random temp state state2
#1  0.50242337  1.0   1.0    1.0
#2  0.68759406  0.0   1.0    1.0
#3  0.74188374  0.0   1.0    1.0
#4  0.44536403  0.0   1.0    1.0
#5  0.50626137  0.5   0.5    0.5
#6  0.51636498  0.0   0.5    0.5
#7  0.80780471  0.0   0.5    0.5
#8  0.24794844  0.0   0.5    0.5
#9  0.46573337  0.0   0.5    0.5
#10 0.10370515  0.0   0.5    0.5
#11 0.07962587  1.0   1.0    1.0
#12 0.93892894  0.0   1.0    1.0
#13 0.67771302  0.0   1.0    1.0
#14 0.11223162  0.0   1.0    1.0
#15 0.16590718  0.0   1.0    1.0
#16 0.83619527  0.0   1.0    1.0
#17 0.38771300  1.0   1.0    1.0
#18 0.14773708  0.0   1.0    1.0
#19 0.43928154  0.5   0.5    0.5
#20 0.08901350  0.0   0.5    0.5
#21 0.84174743  0.0   0.5    0.5
#22 0.93173871  0.0   0.5    0.5
#23 0.80795517  1.0   1.0    1.0
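
To see why the grouping works: every non-zero value starts a new group, and the zeros after it stay in that group. Grouping on cumsum(DF$temp) relies on the running sum changing at every non-zero value, which holds here because the values are positive. A sketch of a slightly more explicit grouping that counts the non-zero entries instead, using a made-up short vector:

```r
# Hypothetical short vector, just to show the grouping.
temp <- c(1, 0, 0, 0.5, 0, 1, 0)

# Run id: increases by one at every non-zero value.
groups <- cumsum(temp != 0)
groups
# [1] 1 1 1 2 2 3 3

# Within each run, the first element is the non-zero value to carry forward.
state <- ave(temp, groups, FUN = function(x) x[1])
state
# [1] 1.0 1.0 1.0 0.5 0.5 1.0 1.0
```

With this grouping, leading zeros form their own group 0 and simply stay 0.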

answered Dec 6, 2013 at 11:38

alexis_laz

2 Comments

kdauria Over a year ago

I think ave(DF$temp, cumsum(DF$temp), FUN = sum) should work as well.

alexis_laz Over a year ago

@Kevin: Yeah, you're right! In this case, summing the values works, too. And perhaps it is faster as well, because it avoids converting to logical before indexing? Although I still might prefer x[x != 0], because it declares exactly what the averaging function is.

TheComeOnMan · Accepted Answer · 2013-12-06 05:04:56Z

0

A loop along the following lines should do the trick for you. It starts at the second row, since the first row has no previous value to copy:

for(i in seq_len(nrow(df))[-1])
{
  if (df[i,"v1"] == 0) df[i,"v1"] <- df[i-1,"v1"]
}

Output -

> df
   v1 somedata
1   1       33
2   2       24
3   1       36
4   0       49
5   2       89
6   2       48
7   0        4
8   1       98
9   1       60
10  2       76
> 
> for(i in seq(nrow(df)))
+ {
+   if (df[i,"v1"] == 0) df[i,"v1"] <- df[i-1,"v1"]
+ }
> df
   v1 somedata
1   1       33
2   2       24
3   1       36
4   1       49
5   2       89
6   2       48
7   2        4
8   1       98
9   1       60
10  2       76

answered Dec 6, 2013 at 5:04

TheComeOnMan

Neal Fultz · Accepted Answer · 2013-12-06 07:08:20Z

0

I suggest using the run length encoding functions, it's a natural way for dealing with steaks in a data set. Using @Kevin's example vector:

temp = c(1,0,0,0,.5,0,0,0,0,0,1,0,0,0,0,0,1,0,0.5,0,0,0,1)
y <- rle(temp)
#str(y)
#List of 2
# $ lengths: int [1:11] 1 3 1 5 1 5 1 1 1 3 ...
# $ values : num [1:11] 1 0 0.5 0 1 0 1 0 0.5 0 ...
# - attr(*, "class")= chr "rle"


for( i in seq(y$values)[-1] ) {
   if(y$values[i] == 0) {
      y$lengths[i-1] = y$lengths[i] + y$lengths[i-1]
      y$lengths[i] = 0
   }
}

#str(y)
#List of 2
# $ lengths: num [1:11] 4 0 6 0 6 0 2 0 4 0 ...
# $ values : num [1:11] 1 0 0.5 0 1 0 1 0 0.5 0 ...
# - attr(*, "class")= chr "rle"

inverse.rle(y)
#  [1] 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.5
# [20] 0.5 0.5 0.5 1.0

answered Dec 6, 2013 at 7:08

Neal Fultz

1 Comment

Carl Witthoft Over a year ago

You need some A-1 data sauce to go with those steaks? :-0

wibeasley · Accepted Answer · 2019-02-28 04:25:56Z

-1

Simply use a loop with a global variable.

The global variable used here is m; r is a data frame with two columns, A and B.

r$B = c(1,NA, NA, NA, 3, NA,6)


m=1

for( i in 1:nrow(r) ){

  if(is.na(r$B[i])==FALSE ){

    m <<- i # please note the assign sign ,  " <<- "
    next()

  } else {

    r$B[i] = r$B[m]

  }

}

After Execution : r$B = 1 1 1 1 3 3 6

edited Feb 28, 2019 at 4:25

wibeasley

answered Feb 28, 2019 at 2:43

tinu maria jose

2 Comments

Maurits Evers Over a year ago

First off, this is a really bad and un-R-like way to achieve what OP is after. There are much much better (and vectorised) alternatives, see the other answers to this post. Secondly, the code you give is actually not reproducible. r is not defined anywhere, you mention R as a data.frame but R is case-sensitive. Using <<- in this context is precisely one of the examples for how not to use <<-: The Evil and Wrong use is to modify variables in the global environment.

Maurits Evers Over a year ago

[continued] Lastly, next is a control flow statement; next doesn't return a value, and it should be next instead of next(). I think this answer contributes little (if anything) to this post and therefore should be deleted as it promotes bad R coding practice.
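
For reference, the points raised in these two comments could be addressed along the following lines. This is only a sketch: the data frame r is defined explicitly (the values in column A are made up), a plain local variable replaces the <<- assignment, and the if/else makes next unnecessary.

```r
# Hypothetical data frame; column A is only a placeholder.
r <- data.frame(A = 1:7, B = c(1, NA, NA, NA, 3, NA, 6))

# Most recent non-NA value seen so far (NA until one is found).
last_seen <- NA_real_

for (i in seq_len(nrow(r))) {
  if (is.na(r$B[i])) {
    r$B[i] <- last_seen    # fill the gap with the previous value
  } else {
    last_seen <- r$B[i]    # remember the latest observed value
  }
}

r$B
# [1] 1 1 1 1 3 3 6
```

The vectorised approaches in the other answers are still likely to be the better choice for anything but small data.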
