
I'm relatively new to R and have spent the last three hours trying to find a working answer here, but I just cannot seem to find a combination that works.

I have a folder that contains 841 CSV files, none of which have column names. The format is the same for every file (although some files might have blank columns simply because no data was available for that column in that file).

I want to be able to read in all 841 csv files, add the column names and then cbind them into a single data frame.

Bringing in a single file and adding the column names is easy enough:

col.names <- c("ID", "NAMES_URI", "NAME1", "NAME1_LANG", "NAME2", "NAME2_LANG", "TYPE", "LOCAL_TYPE",
               "GEOMETRY_X", "GEOMETRY_Y", "MOST_DETAIL_VIEW_RES", "LEAST_DETAIL_VIEW_RES", "MBR_XMIN",
               "MBR_YMIN", "MBR_XMAX", "MBR_YMAX", "POSTCODE_DISTRICT", "POSTCODE_DISTRICT_URI",
               "POPULATED_PLACE", "POPULATED_PLACE_URI", "POPULATED_PLACE_TYPE", "DISTRICT_BOROUGH",
               "DISTRICT_BOROUGH_URI", "DISTRICT_BOROUGH_TYPE", "COUNTY_UNITARY", "COUNTY_UNITARY_URI",
               "COUNTY_UNITARY_TYPE", "REGION", "REGION_URI", "COUNTRY", "COUNTRY_URI", "RELATED_SPATIAL_OBJECT",
               "SAME_AS_DBPEDIA", "SAME_AS_GEONAMES")

library(data.table)

Single_File <- fread(file = "C:/Users/djr/Desktop/PostCodes/Data/HP40.csv", header = FALSE)

setnames(Single_File, col.names)
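As an aside, fread() can also name the columns while reading, through its col.names argument, which makes the separate setnames() call unnecessary. A minimal sketch, assuming data.table is installed; the temporary file and the three-column demo_names vector are hypothetical stand-ins for HP40.csv and the full 34-name col.names vector.

```r
library(data.table)

# Hypothetical three-column sample standing in for one real headerless file
demo_names <- c("ID", "NAME1", "POSTCODE_DISTRICT")
demo_file <- tempfile(fileext = ".csv")
writeLines(c("1,Foo,HP4", "2,Bar,HP5"), demo_file)

# col.names assigns the names as the file is read, so no setnames() call is needed
single_file <- fread(demo_file, header = FALSE, col.names = demo_names)
print(single_file)
```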

My issue comes when I try to read the files in as a list and bind them. I've tried examples using lapply and map_dfr, but they always produce error messages saying that the vector sizes differ, that the data cannot be filled, or that the column specifications don't match.
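One likely cause of the "column specification" errors, though this is an assumption about these particular files, is that column types are guessed separately for each file: a column that parses as numeric in one file can parse as text in another, and the bind then fails on the mismatched types. The sketch below sidesteps that by reading every column as character and letting read_csv() combine the files itself (readr 2.0 or later accepts a vector of paths and an id argument). The demo folder, files and demo_names are hypothetical.

```r
library(readr)

# Hypothetical demo data: two headerless files in a temporary folder
demo_names <- c("ID", "NAME1", "POSTCODE_DISTRICT")
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
# In a.csv the third column holds text; in b.csv it is empty
writeLines(c("1,Foo,HP4", "2,Bar,HP5"), file.path(demo_dir, "a.csv"))
writeLines(c("3,Baz,", "4,Qux,"), file.path(demo_dir, "b.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read all files in one call; every column is character, so no file can
# disagree with another about a column's type. "source" stores each row's file path.
combined <- read_csv(
  files,
  col_names = demo_names,
  col_types = cols(.default = col_character()),
  id = "source"
)
print(combined)
```

Columns can then be converted afterwards (for example with as.numeric()) once all rows are in one table.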

The code I am currently trying is:

library(readr)
library(purrr)

dir(pattern = "\\.csv$") %>%
  map_dfr(read_csv, col_names = col.names)

But this just prints a lot of output to the console that is meaningless to me; it seems to be giving a summary of each file.
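The per-file summary is most likely readr's column-specification message, which read_csv() prints whenever it has to guess column types; it is informational, not an error. A minimal sketch, assuming readr 2.0 or later, where show_col_types = FALSE silences it (supplying col_types explicitly has the same effect). The demo file and demo_names are hypothetical.

```r
library(readr)

# Hypothetical headerless demo file
demo_names <- c("ID", "NAME1", "POSTCODE_DISTRICT")
demo_file <- tempfile(fileext = ".csv")
writeLines(c("1,Foo,HP4", "2,Bar,HP5"), demo_file)

# Types are still guessed, but the column-specification message is not printed
quiet <- read_csv(demo_file, col_names = demo_names, show_col_types = FALSE)
print(quiet)
```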

Does anyone have some simple code to read in CSVs, add the column names to each and then cbind them all together?

  • Are you sure you want to cbind(), or do you want to add them row-wise (rbind())? Commented Dec 6, 2022 at 13:32
  • My head said cbind would be the correct option because there will be differing numbers of rows, but the column names (if I can get them applied) will be identical across the set. You'll have to excuse my ignorance, but I always thought rbind was to bind by row and cbind would bind by column name. Commented Dec 6, 2022 at 13:36
  • I think that code should work. Is it producing an error? Please share the error message. No need to share the other messages about column types. By the way, your title and text are incorrect - you are trying to rbind. Commented Dec 6, 2022 at 13:37
  • Thank you @D.J, I knew the solution must be simple, but I could not get it to work. I was possibly trying to make it too complex. Your method brings in the data, and I can then add the names to the variables. Commented Dec 6, 2022 at 13:47
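The rbind()/cbind() distinction raised in the comments can be seen with two small data frames: rbind() stacks rows (for data frames it matches columns by name), while cbind() places columns side by side and needs compatible row counts. The data frames below are hypothetical.

```r
# Two hypothetical data frames with the same column names but different row counts
a <- data.frame(ID = 1:2, NAME1 = c("Foo", "Bar"))
b <- data.frame(ID = 3:5, NAME1 = c("Baz", "Qux", "Quux"))

# rbind() appends the rows of b below the rows of a
stacked <- rbind(a, b)
print(dim(stacked))   # 5 rows, 2 columns

# cbind() puts the columns side by side, so 2 rows next to 3 rows fails
cbind_error <- tryCatch(cbind(a, b), error = function(e) conditionMessage(e))
print(cbind_error)
```

With files that share a column layout but differ in length, row-binding is the operation that fits.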

1 Answer

I am not 100% sure what exactly it is you need but my best guess would be something like this:

library(data.table)

y_path   <- 'C:/your_path/your_folder'
all_csv  <- list.files(path = y_path, pattern = '\\.csv$', full.names = TRUE)
# read each headerless file, naming its columns with col.names (the vector from the question)
open_csv <- lapply(all_csv, \(x) fread(x, header = FALSE, col.names = col.names))

one_df <- data.table::rbindlist(open_csv)
# or: do.call(rbind, open_csv)
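For comparison, a base-R sketch of the same approach that needs no packages, assuming every file has the same number of columns. It writes two hypothetical demo files to a temporary folder; with the real data, list.files() would point at the PostCodes folder and col.names would be the 34 names from the question. Reading every column as character avoids per-file type guessing, and the hypothetical source column records which file each row came from.

```r
# Hypothetical demo data: two headerless files in a temporary folder
demo_names <- c("ID", "NAME1", "POSTCODE_DISTRICT")
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
writeLines(c("1,Foo,HP4", "2,Bar,HP5"), file.path(demo_dir, "a.csv"))
writeLines("3,Baz,", file.path(demo_dir, "b.csv"))

all_csv <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each headerless file, naming the columns as it is read
open_csv <- lapply(all_csv, read.csv, header = FALSE,
                   col.names = demo_names, colClasses = "character")

# Tag each data frame with its file name, then stack all rows
open_csv <- Map(function(df, path) {
  df$source <- basename(path)
  df
}, open_csv, all_csv)
one_df <- do.call(rbind, open_csv)
print(one_df)
```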