combining ldply to combine multiple csv files AND add column with file names via mutate/basename

Question

I'm trying to apply the code here, which uses ldply to combine multiple csv files into one dataframe.

I'm trying to figure out the appropriate tidyverse syntax for adding a column that lists the name of the file each row of data comes from.

Here's what I have

test <- ldply( .data = list.files(pattern="*.csv"),
              .fun = read.csv,
               header = TRUE) %>%
  mutate(filename=gsub(".csv","",basename(x)))

I get

"Error in basename(x) : object 'x' not found"

My understanding is that basename() takes a path, but when I set the path to the folder that contains the files, the filename column that gets added just has the folder name.

Any help is much appreciated!

Ronak Shah · Accepted Answer · 2019-06-12 07:41:50Z

2

You could use purrr::map_dfr

library(dplyr)  # provides %>% and mutate()
purrr::map_dfr(list.files(pattern = "\\.csv$", full.names = TRUE),
    ~read.csv(.x) %>% mutate(file = sub("\\.csv$", "", basename(.x))))

4 Comments

dnmc Over a year ago

Thanks for this. I got an error for one of the columns saying it can't be converted from factor to numeric (but as far as I know that specific column is numeric to start with). Is there a parameter that needs to be set differently? Thanks again.

Ronak Shah Over a year ago

@RookieSnowbodah I just tried it on a few csv files on my system and it worked for me, but maybe you can try using stringsAsFactors = FALSE in read.csv?

dnmc Over a year ago

I just realized that the column that's giving me issues has some numbers and some "less than" values, e.g. "<10". When I tried abc <- purrr::map_dfr(list.files(pattern="*.csv", full.names = TRUE), ~read.csv(.x, stringsAsFactors = FALSE) %>% mutate(file = sub(".csv$", "", basename(.x)))) I get an error saying that the column can't be converted from character to numeric. Is there a workaround for this?

Ronak Shah Over a year ago

@RookieSnowbodah Strange, it should still work. Try reading one file first: run files <- list.files(pattern="*.csv", full.names = TRUE) and then check whether you can read it with read.csv(files[1]). Also try creating a copy of the file and reading that, to see if there is any difference.
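
On the conversion error discussed in these comments: one possible cause is that the column is read as numeric in some files and as character (because of entries like "<10") in others, so the per-file data frames cannot be row-bound. A minimal sketch of one workaround, assuming that is the cause, is to read every column as character and convert afterwards. The temporary files and the column names `id`, `value`, and `value_num` are hypothetical, made up only for this illustration.

```r
library(purrr)
library(dplyr)

# Hypothetical demo files: `value` is numeric in one file
# and contains a "<10" entry in the other.
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(id = 1:2, value = c(5, 12)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("<10", "20")),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every column as character so all files share the same column types,
# then tag each row with its source file name (extension removed).
combined <- map_dfr(files, function(f) {
  read.csv(f, colClasses = "character") %>%
    mutate(file = sub("\\.csv$", "", basename(f)))
})

# Convert after combining. Entries such as "<10" become NA in `value_num`,
# while the original text stays available in `value`.
combined <- combined %>%
  mutate(value_num = suppressWarnings(as.numeric(value)))
```

How to treat the "<10" entries (NA, the limit itself, half the limit, and so on) is a data decision that this sketch does not try to make.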

akrun · Accepted Answer · 2019-06-12 14:14:04Z

0

We can use imap

library(purrr)
library(dplyr)
library(stringr)
library(readr)
files <- list.files(pattern="*.csv", full.names = TRUE)
fileSub <- str_remove(basename(files), "\\.csv$")
imap_dfr(setNames(files, fileSub), ~ read_csv(.x) %>%
          mutate(file = .y))

Benjamin Simpson · Accepted Answer · 2021-02-12 11:40:37Z

I don't know if this helps anyone, but I stumbled across this solution, which is very simple.

Context: the .id column created by ldply lists the names of each item in your input vector. So, to combine multiple csv files and create a new column with the file name, you can do:

library(plyr) # provides ldply

# get csv files in current working directory as a character vector
file_names <- list.files(pattern="*.csv") #for the example above it is .data=list.files(pattern="*.csv")

# Name these items (in this case equal to the items themselves, but can be subbed out for sample.Ids)
names(file_names) <- paste(file_names) # or for the example above names(.data) <- paste(.data)

# then use ldply to do the hard work
combined_csv <- ldply(file_names, read.csv)

# Names are stored under .id
combined_csv$.id
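
A variation on the same idea, as a sketch rather than a definitive recipe: name the input vector with the file names minus their extension, and pass `.id` to `ldply` to choose the column name. The temporary directory, the file names `first.csv` and `second.csv`, and the column name `filename` are assumptions made up for this illustration. According to the plyr documentation, supplying a column name through `.id` makes the index column a factor, hence the `as.character()` step.

```r
library(plyr)

# Hypothetical demo files in a temporary directory.
demo_dir <- tempfile("ldply_demo")
dir.create(demo_dir)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(demo_dir, "second.csv"), row.names = FALSE)

# Full paths, so the files can be read from any working directory.
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path with its file name minus the extension;
# ldply copies these names into the index column.
names(paths) <- tools::file_path_sans_ext(basename(paths))

# .id sets the name of the index column (it would be ".id" if omitted).
combined_csv <- ldply(paths, read.csv, .id = "filename")

# Convert the index column from factor to plain character.
combined_csv$filename <- as.character(combined_csv$filename)
```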
