
I'm trying to apply some existing code that uses ldply to combine multiple CSV files into one data frame.

I'm trying to figure out the appropriate tidyverse syntax for adding a column that lists the name of the file each row comes from.

Here's what I have:

library(plyr)
library(dplyr)

test <- ldply(.data = list.files(pattern = "*.csv"),
              .fun = read.csv,
              header = TRUE) %>%
  mutate(filename = gsub(".csv", "", basename(x)))

I get this error message:

Error in basename(x) : object 'x' not found

My understanding is that basename() needs a path, but when I set the path to the folder that contains the files, the filename column that gets added just holds the folder name.
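basename() returns only the last component of the path it is given, so calling it on a folder path returns the folder name. It has to be applied to each file path individually, which is also why the question's mutate() fails: once ldply() has returned, the combined data frame no longer knows which file each row came from, and no object x exists. A small base-R illustration, with hypothetical paths:

```r
# Hypothetical paths, used only to illustrate what basename() returns
folder <- "data/csv_exports"
files  <- file.path(folder, c("site_a.csv", "site_b.csv"))

basename(folder)   # "csv_exports": for a folder path, the last component is the folder
basename(files)    # "site_a.csv" "site_b.csv": one result per file path

# Strip the extension to get a clean label for each file
labels <- sub("\\.csv$", "", basename(files))
labels             # "site_a" "site_b"
```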

Any help is much appreciated!

3 Answers


You could use purrr::map_dfr:

library(dplyr)  # for %>% and mutate()

purrr::map_dfr(list.files(pattern = "*.csv", full.names = TRUE),
               ~ read.csv(.x) %>% mutate(file = sub("\\.csv$", "", basename(.x))))
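The same idea can be sketched in base R for readers without purrr: read each file with lapply(), add the file name as a column while the path is still in scope, then stack the pieces with do.call(rbind, ...). This is a swapped-in base-R technique, not the answer's purrr code; the temporary folder name and the data are made up for the example.

```r
# Create two small example CSV files in a temporary folder (hypothetical data)
dir <- file.path(tempdir(), "csv_demo_basic")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(3.5, 4.1)),
          file.path(dir, "site_a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3L, value = 7.2),
          file.path(dir, "site_b.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and tag its rows with the file name (extension removed)
# while the path is still available, then stack everything into one data frame
combined <- do.call(rbind, lapply(files, function(path) {
  df <- read.csv(path)
  df$file <- sub("\\.csv$", "", basename(path))
  df
}))
combined
```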

4 Comments

Thanks for this. I got an error for one of the columns saying it can't be converted from factor to numeric (but as far as I know, that column is numeric to begin with). Is there a parameter that needs to be set differently? Thanks again.
@RookieSnowbodah I just tried it on a few CSV files on my system and it worked for me. Maybe try using stringsAsFactors = FALSE in read.csv?
I just realized that the column giving me trouble has a mix of plain numbers and "less than" values, e.g. "<10". When I tried abc <- purrr::map_dfr(list.files(pattern="*.csv", full.names = TRUE), ~read.csv(.x, stringsAsFactors = FALSE) %>% mutate(file = sub(".csv$", "", basename(.x)))) I get an error saying that column can't be converted from character to numeric. Is there a workaround for this?
@RookieSnowbodah Strange, it should still work. Try reading one file first: run files <- list.files(pattern="*.csv", full.names = TRUE) and then check whether you can read it with read.csv(files[1]). Also try making a copy of the file and reading that, to see if there is any difference.
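A likely cause of the error in these comments: read.csv() guesses column types file by file, so a column holding only numbers in one file is read as numeric, while in a file containing values like "<10" the same column becomes character, and the row-binding step then refuses to combine the two types. One workaround, sketched here in base R under that assumption, is to read every column as character with colClasses = "character", combine, and convert afterwards; the same argument can be passed inside the map_dfr lambda as read.csv(.x, colClasses = "character"). The column name conc and the data are hypothetical.

```r
dir <- file.path(tempdir(), "csv_demo_mixed")
dir.create(dir, showWarnings = FALSE)
# Hypothetical data: in site_a the column is purely numeric, in site_b it contains "<10"
writeLines(c("id,conc", "1,12", "2,15"), file.path(dir, "site_a.csv"))
writeLines(c("id,conc", "3,<10", "4,22"), file.path(dir, "site_b.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Read every column as character so all files share the same column types
combined <- do.call(rbind, lapply(files, function(path) {
  df <- read.csv(path, colClasses = "character")
  df$file <- sub("\\.csv$", "", basename(path))
  df
}))

# Convert afterwards, keeping a flag for "less than" values.
# Here "<10" becomes 10; whether that is appropriate depends on the analysis.
combined$below_limit <- startsWith(combined$conc, "<")
combined$conc_num <- as.numeric(sub("^<", "", combined$conc))
combined
```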

We can use imap_dfr from purrr:

library(purrr)
library(dplyr)
library(stringr)
library(readr)
files <- list.files(pattern="*.csv", full.names = TRUE)
fileSub <- str_remove(basename(files), "\\.csv$")
imap_dfr(setNames(files, fileSub), ~ read_csv(.x) %>%
          mutate(file = .y))
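imap_dfr() passes each element as .x and its name as .y (for a named vector it is shorthand for map2_dfr(x, names(x), ...)), which is why the answer first names files with the cleaned file names via setNames(). A rough base-R analogue of that pairing, using Map() on hypothetical paths without reading any files:

```r
# Hypothetical file paths; the names are what imap would pass as .y
files <- c("data/site_a.csv", "data/site_b.csv")
named <- setNames(files, sub("\\.csv$", "", basename(files)))

# Map(f, x, names(x)) calls f(element, name) for each pair, like imap(x, f)
pairs <- Map(function(path, label) paste0(label, " <- ", path),
             named, names(named))
unlist(pairs, use.names = FALSE)
# "site_a <- data/site_a.csv" "site_b <- data/site_b.csv"
```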



I don't know if this helps anyone, but I stumbled across a very simple solution.

Context: when the input vector is named, ldply adds an .id column holding the name of the element each row came from. So, to combine multiple CSV files and create a new column with the file name, you can do:

library(plyr)

# Get the CSV files in the current working directory as a character vector
# (in the question this is .data = list.files(pattern = "*.csv"))
file_names <- list.files(pattern = "*.csv")

# Name each element; here the names equal the file names themselves,
# but they could be replaced with e.g. sample IDs
names(file_names) <- file_names

# ldply reads each file and row-binds the results
combined_csv <- ldply(file_names, read.csv)

# The names are stored in the .id column
combined_csv$.id
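The .id column is the name of the list element each row came from, repeated once per row of that element's data frame. A base-R sketch of the same bookkeeping, using small in-memory data frames (hypothetical) instead of files; unlike ldply, which puts .id first, this appends the column at the end:

```r
# Hypothetical per-file results, named by file
pieces <- list(
  "site_a.csv" = data.frame(value = c(3.5, 4.1)),
  "site_b.csv" = data.frame(value = 7.2)
)

# unname() keeps rbind from building row names out of the list names
combined <- do.call(rbind, unname(pieces))

# Repeat each element's name once per row it contributed, like ldply's .id
combined$.id <- rep(names(pieces), vapply(pieces, nrow, integer(1)))
combined
```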
