I have a loop that recodes values of a column and breaks when a condition is met. I would like to use this loop, or its basic concept, on a list of data frames with the same format.

sample data:

Id <- as.factor(c(rep("01001", 11), rep("01043", 11), rep("01065", 11), rep("01069", 11)))
YearCode <- as.numeric(rep(1:11, 4))
Type <- c(NA,NA,NA,NA,NA,NA,NA,2,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2,NA)
test <- NA
sample_df <- data.frame(Id, YearCode, Type, test)

# A part of sample_df
one_df <- subset(sample_df, sample_df$Id=="01069")

This for loop works fine for one data frame:

# example for loop using example data frame "one_df"
for (i in seq_along(one_df$Id)) {
  if (is.na(one_df$Type[i])) {  # if Type is NA, recode to 0
    one_df$test[i] <- 0
  } else {  # stop when Type is not NA, and leave remaining NAs that come after
    break
  }
}

However, I have many data frames with this same format in a list. I would like to keep them in the list and apply this loop over the whole list.

# example list : split data frame into list by Id
sample_list <- split(sample_df, sample_df$Id, drop = TRUE)

I've looked around other posts such as this one, but I get stuck when trying to loop over each data frame in the list or write a similar function using lapply. How can I modify this loop to work on the list (sample_list), using either a for loop, lapply, or something else?

Any tips would be greatly appreciated, let me know if I need to clarify anything. Thanks!

2 Answers

2

I think the following does what you described. First, I create a new column called test with if_else(): if complete.cases(Type) is TRUE, use the value from Type; otherwise use 0.

The next step replaces some of those 0s with NA, since you do not want 0s in rows that come after the row with the first numeric value in Type. For instance, you do not want 0s after the 10th row for Id == 01069. So I use the condition row_number() > which(complete.cases(Type))[1], which you can read as "is this row number larger than the row number of the first numeric value in its group?" Wherever that condition is TRUE, test is set to NA. Part of the result for sample_df is shown after the code. I hope this helps.

library(dplyr)

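sample_df %>%
  group_by(Id) %>%
  # step 1: keep Type where it is present, otherwise use 0
  mutate(test = if_else(complete.cases(Type), Type, 0),
         # step 2: blank out rows after the first non-NA Type in each Id
         test = if_else(row_number() > which(complete.cases(Type))[1],
                        NA_real_, test)) -> out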

#       Id YearCode  Type  test
#   <fctr>    <dbl> <dbl> <dbl>
#1   01001        1    NA     0
#2   01001        2    NA     0
#3   01001        3    NA     0
#4   01001        4    NA     0
#5   01001        5    NA     0
#6   01001        6    NA     0
#7   01001        7    NA     0
#8   01001        8     2     2
#9   01001        9    NA    NA
#10  01001       10    NA    NA
#11  01001       11    NA    NA
#------------------------------
#34  01069        1    NA     0
#35  01069        2    NA     0
#36  01069        3    NA     0
#37  01069        4    NA     0
#38  01069        5    NA     0
#39  01069        6    NA     0
#40  01069        7    NA     0
#41  01069        8    NA     0
#42  01069        9    NA     0
#43  01069       10     2     2
#44  01069       11    NA    NA
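
To see the condition from the explanation above in isolation, here is a minimal base R sketch on a toy vector. The vector and the names first_hit and after_hit are made up for illustration, and seq_along() stands in for row_number(), which is only available inside dplyr verbs.

```r
# Hypothetical toy vector: the first non-NA value sits at position 3
Type <- c(NA, NA, 2, NA, NA)

# Position of the first non-NA value
first_hit <- which(complete.cases(Type))[1]

# TRUE only for positions that come after that first non-NA value
after_hit <- seq_along(Type) > first_hit

first_hit   # 3
after_hit   # FALSE FALSE FALSE  TRUE  TRUE

# When every value is NA, which() returns an empty vector, so [1] gives NA
# and the comparison is NA for every position
all_na <- c(NA, NA, NA)
seq_along(all_na) > which(complete.cases(all_na))[1]   # NA NA NA
```

The last line shows why an Id whose Type is entirely NA ends up with NA in test: if_else() returns NA wherever its condition is NA.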

EDIT

According to the OP's comment, test should be 0 in every row when Type contains only NAs for an Id. The following will do the job.

sample_df %>%
  group_by(Id) %>%
  mutate(test = if_else(complete.cases(Type), Type, 0),
         test = if_else(row_number() > which(complete.cases(Type))[1],
                        NA_real_, test),
         # foo is 0 when the Id has no non-NA Type values to add up
         foo = sum(Type, na.rm = TRUE),
         # for those Ids, set test to 0 in every row
         test = replace(test, which(foo == 0), 0)) %>%
  select(-foo) -> out

# A part of the result
#       Id YearCode  Type  test
#   <fctr>    <dbl> <dbl> <dbl>
#1   01001        1    NA     0
#2   01001        2    NA     0
#3   01001        3    NA     0
#4   01001        4    NA     0
#5   01001        5    NA     0
#6   01001        6    NA     0
#7   01001        7    NA     0
#8   01001        8     2     2
#9   01001        9    NA    NA
#10  01001       10    NA    NA
#11  01001       11    NA    NA
#12  01043        1    NA     0
#13  01043        2    NA     0
#14  01043        3    NA     0
#15  01043        4    NA     0
#16  01043        5    NA     0
#17  01043        6    NA     0
#18  01043        7    NA     0
#19  01043        8    NA     0
#20  01043        9    NA     0
#21  01043       10    NA     0
#22  01043       11    NA     0
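
One caveat with the sum-based check: foo is also 0 when the non-NA Type values of an Id happen to add up to 0. That does not occur in the sample data, but as a hedged alternative the all-NA case can be tested directly with all(is.na(Type)). The sketch below assumes dplyr is installed, rebuilds the sample data compactly so it runs on its own, and uses out_alt as a made-up result name.

```r
library(dplyr)

# Same values as the sample data in the question, built compactly:
# Type is NA everywhere except rows 8 and 43, which hold 2
sample_df <- data.frame(
  Id       = factor(rep(c("01001", "01043", "01065", "01069"), each = 11)),
  YearCode = as.numeric(rep(1:11, 4)),
  Type     = replace(rep(NA_real_, 44), c(8, 43), 2)
)

out_alt <- sample_df %>%
  group_by(Id) %>%
  mutate(
    # keep Type where present, otherwise 0
    test = if_else(!is.na(Type), Type, 0),
    # rows after the first non-NA Type become NA
    test = if_else(row_number() > which(!is.na(Type))[1], NA_real_, test),
    # Ids whose Type is entirely NA get 0 in every row
    test = if (all(is.na(Type))) 0 else test
  ) %>%
  ungroup()
```

On the sample data this should give the same test column as the pipeline above; the only difference is how the all-NA Ids are detected.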

2 Comments

Thank you for your quick response! And also thanks, this is an interesting solution. Correct me if I'm wrong, but I get NAs instead of 0 for the IDs without anything in the Type column. Ideally when all values for Type are NA within that ID all values for test would be 0.
@mhd Yeah, that is the expected outcome. Since there was no information about how to handle such cases, I decided to write the code above. I'll add another way to do the additional task.
0

Is there an issue with creating a function and using lapply? It seems to work:

#rm(list=ls())
Id <- as.factor(c(rep("01001", 11), rep("01043", 11), rep("01065", 11), rep("01069", 11)))
YearCode <- as.numeric(rep(1:11, 4))
Type <- c(NA,NA,NA,NA,NA,NA,NA,2,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
          NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,2,NA)
test <- NA
sample_df <- data.frame(Id, YearCode, Type, test)

# A part of sample_df
one_df <- subset(sample_df, sample_df$Id=="01069")

sample_list <- split(sample_df, sample_df$Id, drop = TRUE)

####################################

# for loop as function
fnX <- function(myDF) {
  for (i in seq_along(myDF$Id)) {
    if (is.na(myDF$Type[i])) {  # if Type is NA, recode to 0
      myDF$test[i] <- 0
    } else {  # stop and leave remaining NAs that come after
      break
    }
  }
  myDF  # return the modified data frame
}

# apply the function to a single data frame
fnX(sample_list$`01069`)

# apply the function to every data frame in the list
lapply(sample_list, fnX)
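
If the explicit loop is not needed, the same "fill until the first non-NA" rule can probably be written without it. The sketch below is one possible vectorized variant, not the code from this answer: recode_leading_na and recoded_list are made-up names, and the sample data is rebuilt compactly so the block runs on its own. It relies on cumsum(!is.na(Type)) staying at 0 until the first non-NA value appears.

```r
# Same values as the sample data in the question, built compactly:
# Type is NA everywhere except rows 8 and 43, which hold 2
sample_df <- data.frame(
  Id       = factor(rep(c("01001", "01043", "01065", "01069"), each = 11)),
  YearCode = as.numeric(rep(1:11, 4)),
  Type     = replace(rep(NA_real_, 44), c(8, 43), 2),
  test     = NA
)
sample_list <- split(sample_df, sample_df$Id, drop = TRUE)

# Hypothetical helper: set test to 0 for every row before the first non-NA Type
recode_leading_na <- function(df) {
  # the running count of non-NA values is 0 only for the leading NA rows
  leading <- cumsum(!is.na(df$Type)) == 0
  df$test[leading] <- 0
  df
}

# lapply() returns a new list, so assign the result to keep it
recoded_list <- lapply(sample_list, recode_leading_na)

recoded_list$`01069`$test   # 0 0 0 0 0 0 0 0 0 NA NA
```

As with the loop, the row holding the first non-NA Type and every row after it keep NA in test, and an Id with no non-NA Type gets 0 in every row.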

2 Comments

Yup that's it. I missed the last segment of the function with the myDF before the last bracket, THANK YOU for helping to pull it together!
The return bit.. The convenience of R makes it easier to commit a mistake, alas! Pleased it helped.
