
I'm trying to convert multiple XLSB files to CSV, and I'm not sure what the problem is here.

import os

import pandas as pd

path = r'C://Users//greencolor//Autoreport//Load_attachments//'
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        print(filename)                         #until here its working
        month = pd.read_excel(filename, sheet_name="Raw data ", engine="pyxlsb")
        print(month)                            # I get the error here
        month = month[month['Sales Manager'] == 'DEVON, JOHN'] #filtering by manager
        month.to_csv (path + filename + ".csv", index = None, header=True)

Error

FileNotFoundError: [Errno 2] No such file or directory: 'PB orders Dec.xlsb'

Why do I get this error? print(filename) prints all the XLSB files whose names start with PB orders.

That means the file path is wrong. filename is just the file name, not the full path. You need to combine it with the root path to get the actual full path. Commented Feb 4, 2022 at 9:38

2 Answers


filename is just the file's name, not the full path, so pd.read_excel looks for it in the current working directory instead of in path. You need to combine it with path to get the full path to the file. You can do that safely with os.path.join:

import os
...
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        full_path = os.path.join(path, filename)
        print(full_path)
        month = pd.read_excel(full_path, sheet_name="Raw data ", engine="pyxlsb")
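
To see why the bare name fails, here is a small self-contained sketch. The temporary folder and the empty placeholder file are made up for illustration and stand in for your real folder and workbook. os.listdir returns names only, and a bare name is resolved against the current working directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical empty file standing in for a real workbook.
    open(os.path.join(folder, "PB orders Dec.xlsb"), "w").close()

    bare_name = os.listdir(folder)[0]             # name only, no directory part
    full_path = os.path.join(folder, bare_name)   # directory + name

    # A bare name is looked up in the current working directory,
    # which is normally not the folder that was listed.
    bare_exists = os.path.exists(bare_name)
    full_exists = os.path.exists(full_path)

print(bare_name, bare_exists, full_exists)
```

Unless the script happens to run from inside that folder, only the joined path points at the file.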

Searching with a pattern

An alternative is to use glob to search for files that match a pattern. With root_dir (available from Python 3.10) the returned names are relative to that directory, so you still need to generate the full path:

import glob

...

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    full_path = os.path.join(path, filename)
    print(full_path)
    month = pd.read_excel(full_path, sheet_name="Raw data ", engine="pyxlsb")

Avoiding temp files

It is also worth checking the file name to skip the temporary files that Excel creates while a workbook is open (their names start with ~):

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    if not os.path.basename(filename).startswith("~"):
        full_path = os.path.join(path, filename)
        print(full_path)
        month = pd.read_excel(full_path, sheet_name="Raw data ", engine="pyxlsb")
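
If you prefer pathlib, the search and the temp-file check can be sketched as one small helper. This is only an illustration: find_workbooks is a hypothetical name, not something from pandas or the standard library, and the default pattern is taken from the question. Path.glob already yields paths that include the folder, so no separate join is needed:

```python
from pathlib import Path

def find_workbooks(folder, pattern="PB orders*.xlsb"):
    """Return the paths in `folder` that match `pattern`, skipping temp files.

    Hypothetical helper for illustration only.
    """
    folder = Path(folder)
    return sorted(
        p for p in folder.glob(pattern)
        # Excel's temporary lock files have names that start with "~".
        if p.is_file() and not p.name.startswith("~")
    )
```

Each returned path can then be passed straight to pd.read_excel in place of full_path.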

4 Comments

Thanks, it works perfectly, but how should I change the last part where it converts to CSV?
Basically, I want to keep the original name (PB orders December), but the extension should be csv.
@Greencolor there are several answers for this, e.g. in this question. Most of the answers work if the filename contains only one dot. pathlib's with_suffix can be used to change the extension no matter what the rest of the filename contains.
Isn't it possible to do this with df.to_csv?
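to_csv simply writes to whatever path it is given, so the renaming has to happen when that path is built. A minimal sketch of the with_suffix approach mentioned above, where csv_path_for is a hypothetical helper name used only for illustration:

```python
from pathlib import Path

def csv_path_for(workbook_path):
    """Return the same path with its last extension replaced by .csv.

    Hypothetical helper for illustration only.
    """
    return Path(workbook_path).with_suffix(".csv")

# Only the final extension changes, even if the name has other dots.
print(csv_path_for("PB orders Dec.xlsb").name)
print(csv_path_for("PB orders v1.2 Dec.xlsb").name)
```

In the loop this would be used as month.to_csv(csv_path_for(full_path), index=False), since to_csv accepts path objects as well as strings.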

Try replacing the line month = pd.read_excel(filename, sheet_name="Raw data ", engine="pyxlsb") with this:

month = pd.read_excel(path + filename, sheet_name="Raw data ", engine="pyxlsb")

This prepends the directory in path to each file name. It works here because path already ends with a separator.
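
A short sketch of what happens when the trailing separator is missing, and how os.path.join avoids the problem. The folder and file names below are shortened placeholders, not your real paths:

```python
import os

folder = "Load_attachments"        # placeholder folder, no trailing separator
name = "PB orders Dec.xlsb"        # placeholder file name

concatenated = folder + name             # the two parts run together
joined = os.path.join(folder, name)      # a separator is inserted for you

print(concatenated)
print(joined)
```

With plain concatenation the folder and file names merge into one wrong name, while os.path.join gives the same result whether or not the folder ends with a separator.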
