
I have been trying to write code that reads multiple Excel files in a folder whose names match a pattern. I have been able to achieve that part using the code shown below. The columns in the resulting data frame are id and Date.

My issue is that I want to add another column called Code that holds a code extracted from the file names, so that each row can be traced back to its file.

Initial data frame after reading the files and combining the datasets:

id                          Date
ExcelFile/CP1213_.xlsx      2013-05-09
ExcelFile/CP1213_.xlsx      2013-01-30
ExcelFile/CP1314_.xlsx      2013-02-14
ExcelFile/CP1314_.xlsx      2013-03-19
ExcelFile/CP1415_.xlsx      2013-02-22
ExcelFile/CP1415_.xlsx      2013-02-22

The table below shows what I want to achieve:

id                          Date            Code   
ExcelFile/CP1213_.xlsx      2013-05-09      CP1213  
ExcelFile/CP1213_.xlsx      2013-01-30      CP1213
ExcelFile/CP1314_.xlsx      2013-02-14      CP1314  
ExcelFile/CP1314_.xlsx      2013-03-19      CP1314  
ExcelFile/CP1415_.xlsx      2013-02-22      CP1415 
ExcelFile/CP1415_.xlsx      2013-02-22      CP1415

The value of files is a character vector: "ExcelFile/CP1213_.xlsx" "ExcelFile/CP1314_.xlsx" "ExcelFile/CP1415_.xlsx"

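library(readxl)
library(dplyr)

# pattern is a regular expression, so match names ending in ".xlsx"
files <- list.files(path = "ExcelFile/", pattern = "\\.xlsx$", full.names = TRUE)

# Read every file into a named list and stack them; .id stores each file path in "id"
tbl <- sapply(files, read_excel, simplify = FALSE) %>% bind_rows(.id = "id")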
  • Do you want to extract Code from files? sub("_.*", "", basename(files))? Commented Jul 9, 2019 at 11:46
  • Yes, so that next time a new file is added to the folder it does that automatically. Commented Jul 9, 2019 at 12:16
  • Yes, it will work for all the files with the same pattern. Did you try it? Commented Jul 9, 2019 at 12:18
  • Yes, I have tried it and it worked, but I am thinking about how I can put the code beside each row. Commented Jul 9, 2019 at 12:19
  • I added an answer explaining that. Commented Jul 9, 2019 at 12:23
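The suggestion in the first comment can be checked on the file paths from the question without any Excel files. A minimal base R sketch; the paths are copied from the question:

```r
# File paths as listed in the question (no Excel files are needed for this step)
files <- c("ExcelFile/CP1213_.xlsx", "ExcelFile/CP1314_.xlsx", "ExcelFile/CP1415_.xlsx")

# basename() strips the folder; sub() then deletes the first "_" and everything after it
codes <- sub("_.*", "", basename(files))
print(codes)
```

This yields one code per file, which is why the remaining step is attaching each code to the rows read from that file.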

2 Answers


Based on Ronak Shah's idea, you can use mutate() from the dplyr package: take the basename() of id and then extract the code from it with sub().

library(readxl)
library(dplyr)

files <- list.files(path = "ExcelFile/", pattern = "\\.xlsx$", full.names = TRUE)

tbl <- sapply(files, read_excel, simplify = FALSE) %>% bind_rows(.id = "id")

# Inside mutate() the column is referenced as id, not tbl$id
tbl <- tbl %>% mutate(Code = sub("_.*", "", basename(id)))
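As an alternative to adding Code after binding, each file can be tagged as it is read. The sketch below uses only base R; read_stub() is a hypothetical stand-in for readxl::read_excel() so that it runs without any Excel files, and the dates are sample values from the question. With real files you would call read_excel(f) in its place.

```r
# Hypothetical stand-in for readxl::read_excel() so the sketch runs without Excel files
read_stub <- function(path) {
  data.frame(Date = as.Date(c("2013-05-09", "2013-01-30")))
}

files <- c("ExcelFile/CP1213_.xlsx", "ExcelFile/CP1314_.xlsx")

# Read each file and attach its path and code before stacking,
# so every row carries the code of the file it came from
pieces <- lapply(files, function(f) {
  d <- read_stub(f)
  d$id <- f
  d$Code <- sub("_.*", "", basename(f))
  d
})
tbl <- do.call(rbind, pieces)
print(tbl)
```

Tagging inside the loop means the code never has to be re-derived from id afterwards, at the cost of touching the reading step.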

You can use basename() and then extract part of the file name using sub():

df$Code <- sub("_.*", "", basename(as.character(df$id)))

df
#                      id       Date   Code
#1 ExcelFile/CP1213_.xlsx 2013-05-09 CP1213
#2 ExcelFile/CP1213_.xlsx 2013-01-30 CP1213
#3 ExcelFile/CP1314_.xlsx 2013-02-14 CP1314
#4 ExcelFile/CP1314_.xlsx 2013-03-19 CP1314
#5 ExcelFile/CP1415_.xlsx 2013-02-22 CP1415
#6 ExcelFile/CP1415_.xlsx 2013-02-22 CP1415
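One caveat with sub("_.*", "", ...): if a file name has no underscore, nothing is removed and the .xlsx extension stays in Code. A minimal sketch that drops the extension first with tools::file_path_sans_ext(); CP1516.xlsx is a hypothetical file name used only to show the difference:

```r
ids <- c("ExcelFile/CP1213_.xlsx", "ExcelFile/CP1516.xlsx")

# sub("_.*", "", ...) alone leaves the extension when there is no "_"
naive <- sub("_.*", "", basename(ids))

# Dropping the extension first handles both naming styles
codes <- sub("_.*", "", tools::file_path_sans_ext(basename(ids)))
print(naive)
print(codes)
```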

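Or, if you want to extract the codes from files (this gives one code per file, not one per row, so it cannot be assigned to df$Code directly):

codes <- sub("_.*", "", basename(files))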
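Because files has one entry per file while df has one row per record, a per-file vector of codes does not line up with the rows on its own (R either recycles it or refuses the assignment). A base R sketch that maps each row back to its file with match(), using the sample data from the question:

```r
files <- c("ExcelFile/CP1213_.xlsx", "ExcelFile/CP1314_.xlsx", "ExcelFile/CP1415_.xlsx")
df <- data.frame(
  id = rep(files, each = 2),
  Date = as.Date(c("2013-05-09", "2013-01-30", "2013-02-14",
                   "2013-03-19", "2013-02-22", "2013-02-22"))
)

# One code per file
codes <- sub("_.*", "", basename(files))

# match() gives, for each row, the position of its id in files,
# so each row receives the code of its own file
df$Code <- codes[match(df$id, files)]
print(df)
```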

4 Comments

Thank you for the help, but that's not what I want to achieve. I have edited the question again. Sorry for any misunderstanding.
@NartRazak I used df as the data frame name; you have to change it to tbl.
@NartRazak Did it give you an error message or the wrong output? I also see that you posted the same answer. How is it different from what I suggested?
No, but don't worry, I have been able to make it work using your idea. Thank you.
