0

I am trying to read 1000's of csv file & append them and save them as one rds file.

Issue: I am trying to add the filename as a column in each csv so that I know what data has come from which csv file (All files have same columns) but not able to do so.

Example csv's to work with:

# Setting up some example csv files to work with

mtcars_slim <- select(mtcars, 1:3)

write_csv(slice(mtcars_slim, 1:4), "Input data/sub folder/AA_1.csv")
write_csv(slice(mtcars_slim, 5:10), "Input data/sub folder/AAA_2.csv")
write_csv(slice(mtcars_slim, 11:1), "Input data/sub folder/BBB_3.csv")

Code I have tried below:

# this code worked but it doesn't have filename within the dataset

list.files(path = "Input data/sub folder/",
              pattern="*.csv", 
              full.names = T)  %>% 
    map_df(~read_csv(.)) %>% 
  
  saveRDS("output_compiled_data.rds")

So I have tried to modify above code to include filename as column to each csv file in below code chunk but it didn't work.

file_names <- list.files(path = "Input data/sub folder/",
              pattern="*.csv", 
              full.names = T)  %>% 
    map_df(file_names, ~read_csv(.) %>% 
                    mutate(symbol = file_names)) %>% 
  
  saveRDS("output_compiled_data.rds")


data_tbl <- read_rds("output_compiled_data.rds")
data_tbl
0

1 Answer 1

1

One option would be to use a named list of filenames. Afterwards you could add a column with the filename via the .id argument of map_df:

library(dplyr)
library(purrr)
library(readr)

mtcars_slim <- select(mtcars, 1:3)

write_csv(slice(mtcars_slim, 1:4), "AA_1.csv")
write_csv(slice(mtcars_slim, 5:10), "AAA_2.csv")
write_csv(slice(mtcars_slim, 11:1), "BBB_3.csv")

fn <- list.files(
  path = ".",
  pattern = "\\.csv",
  full.names = T
)
names(fn) <- basename(fn)

map_df(fn, ~ read_csv(., show_col_types = FALSE), .id = "file")
#> # A tibble: 21 × 4
#>    file        mpg   cyl  disp
#>    <chr>     <dbl> <dbl> <dbl>
#>  1 AA_1.csv   21       6  160 
#>  2 AA_1.csv   21       6  160 
#>  3 AA_1.csv   22.8     4  108 
#>  4 AA_1.csv   21.4     6  258 
#>  5 AAA_2.csv  18.7     8  360 
#>  6 AAA_2.csv  18.1     6  225 
#>  7 AAA_2.csv  14.3     8  360 
#>  8 AAA_2.csv  24.4     4  147.
#>  9 AAA_2.csv  22.8     4  141.
#> 10 AAA_2.csv  19.2     6  168.
#> # … with 11 more rows
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.