
I need to read multiple files that serve as input to my Python script, but the file names vary depending on when each file was generated.

File1: RM_Sales_Japan_2011201920191124194200.xlsx
File2: RM_Volume_Australia_201120192019154321194200.xlsx

How can I accommodate these changes when reading the files, instead of specifying the exact filename every time the script runs?

Things I tried: in my previous scripts I used the method below, because there was only one file with a known extension:

xlsxfile = "*.xlsx"
filelocation = "/user/script/" + xlsxfile

But with multiple files sharing the same extension, I am not sure how to set this up.

EDIT1:

I was trying to get more clarity on using glob with read_excel. Please see my example code below:

import os
import glob
import pandas as pd
os.chdir ('D:\\Users\\RMoharir\\Downloads\\Smart Spend\\Input')

fls=glob.glob("Medical*.*")

df1 = pd.read_excel(fls, parse_cols = 'A:H', skiprows = 10, header = None)

But this gives me an error:

ValueError: Invalid file path or buffer object type: <class 'list'>

Any help is appreciated.

  • Why not pass the filename as a parameter to your script? Commented Nov 25, 2019 at 6:07
  • The filename changes depending on when the input file was generated, so I just want to use a partial filename. Commented Nov 25, 2019 at 6:22

1 Answer

If you simply need to find all the files in a directory that match a given pattern, the os and re modules have you covered.

import os
import re

files = os.listdir()

for file in files:
    if re.match(r".*\.xlsx$", file):
        print(file)

This short program prints every file in the current directory whose name ends with .xlsx. If you need to match a more complicated pattern, you may need to read up on regular expressions.

Note that os.listdir takes an optional string argument giving the path to look in. If it is not given, os.listdir looks in the current working directory, which is normally the directory the program was run from.
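As a rough sketch of the path argument mentioned above, the same filter can be wrapped in a small helper that takes the directory explicitly and returns the matching paths instead of printing them. The function name find_matching_files and its parameters are made up for illustration; they are not part of os or re.

```python
import os
import re


def find_matching_files(directory, pattern):
    """Return the sorted paths of entries in `directory` whose names match `pattern`.

    `find_matching_files` is a hypothetical helper, not a standard-library function.
    """
    regex = re.compile(pattern)
    return sorted(
        os.path.join(directory, name)   # keep the directory so the path works from anywhere
        for name in os.listdir(directory)
        if regex.match(name)            # re.match anchors at the start of the name
    )
```

For the files in the question, a call such as find_matching_files("/user/script", r"RM_.*\.xlsx$") would be one way to pick up every RM_ workbook regardless of its timestamp.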


4 Comments

Thanks for your answer. I modified your solution to fit my needs: for fl in os.listdir(): if re.match(r"RM_.*\.xlsx$", fl): print(fl); df = pd.read_excel(fl, parse_cols = 'A:E', skiprows = 13, header = None); df.head(). However, I am repeating this for every file, which makes my code longer. I am sure there is another way, where each filename is assigned to its own variable and that variable is used later instead of running the loop again and again.
Formatting in comments is not working. If there is any other link than stackoverflow.com/editing-help#comment-formatting, it would be great.
@RahulMoharir You can't format multi-line code in comments. I'm not sure I understand your problem, do you want to first store a list of filtered filenames then re-use that list instead of re-iterating through the directory multiple times? You can use a site like pastebin.com to link to your current code, or just add it to your question
Yes, that's what I am trying to do: run the loop once and save each file in its own variable to use later in my code.
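One possible way to do what this last comment describes is to scan the directory once and keep the results in a dictionary, so each file can be looked up by name later. This is only a sketch: the helper index_input_files is hypothetical, and the naming scheme RM_<Report>_<Country>_<digits>.xlsx is an assumption inferred from the two example filenames in the question.

```python
import glob
import os
import re

# Assumed naming scheme (inferred from the question's examples, not confirmed):
# RM_<Report>_<Country>_<digits>.xlsx
NAME_PATTERN = re.compile(r"^RM_(?P<key>.+)_\d+\.xlsx$")


def index_input_files(directory):
    """Scan `directory` once and map 'Report_Country' to the full file path.

    `index_input_files` is a hypothetical helper written for this example.
    """
    found = {}
    # glob.escape protects against special characters in the directory name
    for path in glob.glob(os.path.join(glob.escape(directory), "RM_*.xlsx")):
        match = NAME_PATTERN.match(os.path.basename(path))
        if match:
            found[match.group("key")] = path  # a later file with the same key overwrites
    return found
```

Later code could then refer to files["Sales_Japan"] or files["Volume_Australia"] and pass that single path to pd.read_excel. Passing one path at a time also avoids the ValueError from the question, which was raised because read_excel was given a list.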
