I am an R beginner. The following is my code:

complete <- function(directory, id = 1:332) {


# Read through all the csv data file
for (i in id) {
    i <- sprintf("%03d", as.numeric(i))
    data <- read.csv(paste(directory, "/", i, ".csv", sep =""))
    good <- complete.cases(data)   # Eliminating the NA rows
    cases <- sum(good == TRUE)  # add complete value    
} 


data.frame(id = id, nobs = cases )
}

When I print the output, I get:

 id nobs
1  1  402
2  2  402
3  3  402
4  4  402
5  5  402          (incorrect)

If I just print cases inside the loop, I get:

[1] 117
[1] 1041
[1] 243
[1] 474
[1] 402

So the correct output should be:

  id nobs
1  1  117
2  2 1041
3  3  243
4  4  474
5  5  402

I realize it only takes the last value of cases, because each pass through the loop overwrites the previous one.

My question is: how can I store each value of cases in a vector, so that the data.frame call returns the correct output?

thanks

3 Answers

This should do the job if id is a numeric vector (untested, since you provided no reproducible example).

Otherwise, you should use for (i in seq_along(id)) and id[i] inside the loop.

complete <- function(directory, id = 1:332) {

cases <- NULL
# Read through all the csv data file
for (i in id) {
    i <- sprintf("%03d", as.numeric(i))
    data <- read.csv(paste(directory, "/", i, ".csv", sep =""))
    good <- complete.cases(data)   # Eliminating the NA rows
    cases[i] <- sum(good == TRUE)  # add complete value    
} 


data.frame(id = id, nobs = cases )
}

2 Comments

Thanks. Can you explain what "NULL" does?
It creates an object cases inside the function, which is initially empty. In the for loop this object 'grows' into a vector. I must agree with @Sven Hohenstein that this is not a very efficient solution; however, I wanted to keep the code similar to the one in your question.
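
To illustrate the point about 'growing' a vector, here is a minimal sketch comparing a vector grown from NULL with one that is preallocated before the loop. The name counts is a hypothetical stand-in for the per-file results, using the numbers shown in the question; no files are read.

```r
# Hypothetical per-file counts, copied from the output in the question
counts <- c(117, 1041, 243, 474, 402)

# Growing from NULL: each assignment extends the vector by one element
grown <- NULL
for (i in seq_along(counts)) {
  grown[i] <- counts[i]
}

# Preallocating: the vector already has its final length before the loop
prealloc <- numeric(length(counts))
for (i in seq_along(counts)) {
  prealloc[i] <- counts[i]
}

# Both approaches end up holding the same values
print(identical(grown, prealloc))
```

Both loops keep every value instead of only the last one. Preallocating avoids repeatedly resizing the vector, which tends to matter more as the number of files grows.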

This is a more efficient function for the task:

complete <- function(directory, id = 1:332) {
  filenames <- file.path(directory, paste0(sprintf("%03d", id), ".csv"))
  data.frame(id = id, 
             nobs = sapply(filenames, function(x) 
                                        sum(complete.cases(read.csv(x)))))
}
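
As a rough way to try this approach without the original data, the sketch below writes two small made-up CSV files to a temporary directory and counts the complete rows in each. The function name complete_demo, the directory name and the file contents are assumptions for illustration only; vapply is swapped in for sapply so that each result is checked to be a single integer.

```r
# Build a throwaway directory with two small CSV files (made-up data)
tmp <- file.path(tempdir(), "complete_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c(4, 5, NA)),
          file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(1, 2, 3), b = c(4, 5, 6)),
          file.path(tmp, "002.csv"), row.names = FALSE)

# Hypothetical variant of the function: one count of complete rows per file
complete_demo <- function(directory, id) {
  filenames <- file.path(directory, sprintf("%03d.csv", id))
  nobs <- vapply(filenames,
                 function(f) sum(complete.cases(read.csv(f))),
                 integer(1), USE.NAMES = FALSE)
  data.frame(id = id, nobs = nobs)
}

result <- complete_demo(tmp, 1:2)
print(result)
```

Setting USE.NAMES = FALSE keeps the file paths from becoming the row names of the resulting data frame.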

complete <- function(directory, id = 1:332) {
  # Alternative: build the result data frame one row per file
  df_total <- data.frame()
  for (x in id) {
    # file.path() builds the path portably instead of hard-coding "\\"
    filename <- file.path(directory, sprintf("%03d.csv", x))
    df <- read.csv(filename, header = TRUE)
    # Count rows with no NA in any column; pass a subset of df's columns
    # instead to restrict the check to the columns you care about
    nobs <- sum(complete.cases(df))
    df_total <- rbind(df_total, data.frame(id = x, nobs = nobs))
  }
  df_total
}
