
I'm quite new to R and a bit stuck on what I feel is likely a common operation. I have a number of files (57, with ~1.5 billion rows cumulatively by 6 columns) that I need to perform basic functions on. I'm able to read these files in and perform the calculations I need no problem, but I'm tripping up on the final output. I envision the function working on 1 file at a time, outputting the worked file and moving on to the next.

After the calculations I would like to output 57 new .txt files, each named after the file the input data came from. So far I'm able to perform the calculations on smaller test datasets and spit out 1 appended .txt file, but this isn't what I want as the final output.

#list filenames 
files <- list.files(path=, pattern="*.txt", full.names=TRUE, recursive=FALSE)

#begin looping process
loop_output = lapply(files, 
function(x) {

#Load 'x' file in
DF<- read.table(x, header = FALSE, sep= "\t")

#Call calculated height average a name
R_ref= 1647.038203

#Add column names to .las data
colnames(DF) <- c("X","Y","Z","I","A","FC")

#Calculate return
DF$R_calc <- (R_ref - DF$Z)/cos(DF$A*pi/180)

#Calculate intensity
DF$Ir_calc <- DF$I * (DF$R_calc^2/R_ref^2)

#Output new .txt with calculated columns
write.table(DF, file=, row.names = FALSE, col.names = FALSE, append = TRUE,fileEncoding = "UTF-8")

})

My latest code endeavors have been to mess around with the initial lapply/sapply call, like so:

#begin looping process
loop_output = sapply(names(files), 
function(x) {

As well as the output line:

#Output new .txt with calculated columns
write.table(DF, file=paste0(names(DF), "txt", sep="."),
row.names = FALSE, col.names = FALSE, append = TRUE,fileEncoding = "UTF-8")

From what I've been reading, the file-naming step during the write.table output may be one of the pieces I don't have fully aligned yet with the rest of the script. I've been looking at a lot of other questions that I felt were applicable:

Using lapply to apply a function over list of data frames and saving output to files with different names

Write list of data.frames to separate CSV files with lapply

but with no luck. I deeply appreciate any insights or pointers in the right direction on inputting x number of files, performing the same function on each, then outputting the same x number of files. Thank you.

  • map() from the purrr package works well for this. You can read in a folder of files, keeping them separate, and perform the same set of operations over each one. I would define a function to perform the requisite operations, and then read in, transform, then write with map() Commented Jul 19, 2017 at 16:46
  • So the issue with your lapply code is just the one appended text file? Commented Jul 19, 2017 at 17:02
  • @Parfait No, it arrives at a similar conclusion as I would like: i.e., it calculates what I need to calculate and provides a correct output. However, I want to output 57 individual new files instead of the 1 appended file, for data size management and for what I want to do with the files in the next step of my work process. Commented Jul 19, 2017 at 17:08
  • Then simply adjust the file= argument in your write.table, as @Damian shows, and add a return(DF) so your lapply returns a list of data frames and not the results of write.table() Commented Jul 19, 2017 at 20:06

2 Answers


The reason the output is directed to the same file is probably that file = paste0(names(DF), "txt", sep=".") returns the same value for every iteration. DF has the same column names in every iteration, so names(DF) is the same, and the paste0() result is the same. (Note also that paste0() has no sep argument, so sep="." is simply pasted onto the end of each name rather than used as a separator.) Combined with the append = TRUE option, the result is that all output is written to the same file.
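To see this concretely, here is a minimal sketch (the column names below are just stand-ins for the ones assigned in the loop):

```r
# Stand-in for the data frame read on each iteration
DF <- data.frame(X = 1, Y = 2, Z = 3)

# names(DF) is the same every time, and paste0() has no sep argument,
# so "." is pasted on as an extra string instead of acting as a separator
paste0(names(DF), "txt", sep = ".")
# "Xtxt." "Ytxt." "Ztxt."
```

Because write.table() only uses the first element of a file= vector, every iteration ends up writing to the same name.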

Inside the anonymous function, x is the name of the input file. Instead of using names(DF) as a basis for the output file name you could do some transformation of this character string.

For example, given

x <- "/foo/raw_data.csv"

Inside the function you could do something like this:

infile <- x
outfile <- file.path(dirname(infile), gsub('raw', 'clean', basename(infile)))

outfile
[1] "/foo/clean_data.csv"

Then use the new name for the output, with append = FALSE (unless you need it to be TRUE):

write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE, append = FALSE, fileEncoding = "UTF-8")
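Plugging this idea back into the question's loop gives something like the sketch below. It is self-contained with two tiny demo files in a temp directory; the `_calc` suffix is just an assumed naming choice to avoid overwriting the inputs:

```r
# Demo setup: two small tab-separated files standing in for the real data
td <- file.path(tempdir(), "lasdemo")
dir.create(td, showWarnings = FALSE)
for (f in c("a.txt", "b.txt"))
  write.table(matrix(rnorm(12), 2, 6), file.path(td, f), sep = "\t",
              row.names = FALSE, col.names = FALSE)

files <- list.files(path = td, pattern = "\\.txt$", full.names = TRUE)
R_ref <- 1647.038203

loop_output <- lapply(files, function(x) {
  DF <- read.table(x, header = FALSE, sep = "\t")
  colnames(DF) <- c("X", "Y", "Z", "I", "A", "FC")
  DF$R_calc  <- (R_ref - DF$Z) / cos(DF$A * pi / 180)
  DF$Ir_calc <- DF$I * (DF$R_calc^2 / R_ref^2)

  # One output file per input file, named after x (not names(DF))
  outfile <- sub("\\.txt$", "_calc.txt", x)
  write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE,
              append = FALSE, fileEncoding = "UTF-8")
  DF   # return the data frame so lapply collects results, not write.table()'s NULL
})
```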

3 Comments

thank you for your input and suggestions. This proved to be the trick! Inside the function I put 'infile' as the first line and then 'outfile' just before my output line, as you had written (with append = FALSE in the write.table line). With sapply at the function line my code wasn't working; with lapply it did. Again, thank you.
Glad to help, and if the issue is resolved please accept the answer to let others know. (ref: meta.stackexchange.com/questions/5234/…)
I like the gsub('raw', 'clean', ...) function too. It helps with an overwrite issue I was trying to prevent. I'll accept your answer.

Using your code, this is the general idea:

require(purrr)

#list filenames 
files <- list.files(path=, pattern="*.txt", full.names=TRUE, recursive=FALSE)


#Call calculated height average a name
R_ref= 1647.038203

dfTransform <- function(file){
  colnames(file) <- c("X","Y","Z","I","A","FC")

  #Calculate return
  file$R_calc <- (R_ref - file$Z)/cos(file$A*pi/180)

  #Calculate intensity
  file$Ir_calc <- file$I * (file$R_calc^2/R_ref^2)
  return(file)
}

output <- files %>%
  map(read.table, header = FALSE, sep = "\t") %>%
  map(dfTransform)

#Write one output file per input file; walk2() pairs each data frame
#with its output name (derived from the input name, not names(DF))
walk2(output,
      sub("\\.txt$", "_calc.txt", files),
      ~ write.table(.x, file = .y, row.names = FALSE, col.names = FALSE,
                    fileEncoding = "UTF-8"))

7 Comments

thank you very much for your answer and for introducing me to the purrr package. I've tried your walkthrough and encountered an error at the map(dfTransform) step: Error in names(x) <- value : 'names' attribute [6] must be the same length as the vector [1]
@forest_codes check that a) all your files have the same number of columns, b) that you're specifying a name for every column in the names vector colnames(file) <- c("X","Y","Z","I","A","FC") (you must provide a value for every column). If that still doesn't work, try passing col.names as an argument in read.table instead.
cont. This has me a bit perplexed: the data is in .txt format and "\t"-separated, so at that point it should be separated into 6 columns. I don't think it has anything to do with your supplied code, but I don't quickly see the error in my data either (I have 2 five-row files for my small test set)
@forest_codes You can try read.delim, which defaults to "\t" as the separator and has default arguments for reading tab-delimited files
I've updated the data files to correct them and now receive a "In if (file == "") file <- stdout() else if (is.character(file)) { : the condition has length > 1 and only the first element will be used" error
