XLSB to CSV with pandas, python

Question

I am trying to convert multiple XLSB files to CSV, but I am not sure what the problem is here.

import os

import pandas as pd

path = r'C://Users//greencolor//Autoreport//Load_attachments//'
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        print(filename)                         #until here its working
        month = pd.read_excel(filename, sheet_name="Raw data ", engine="pyxlsb")
        print(month)                            # I get the error here
        month = month[month['Sales Manager'] == 'DEVON, JOHN'] #filtering by manager
        month.to_csv (path + filename + ".csv", index = None, header=True)

Error

FileNotFoundError: [Errno 2] No such file or directory: 'PB orders Dec.xlsb'

Why do I get this error? print(filename) prints all the XLSB files whose names start with PB orders.

That means the file path is wrong. filename is just the file name, not the full path. You need to combine it with the root path to get the actual full path.
– Panagiotis Kanavos, Commented Feb 4, 2022 at 9:38

Panagiotis Kanavos · Accepted Answer · 2022-02-04 09:46:15Z

3

filename is just the file's name, not the full path. You need to combine it with path to get the full path to the file. You can do that safely with os.path.join:

import os
...
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        full_path = os.path.join(path, filename)
        print(full_path)
        month = pd.read_excel(full_path, sheet_name="Raw data ", engine="pyxlsb")

Searching with a pattern

An alternative is to use glob to search for files that match a pattern. With root_dir (available from Python 3.10), the results are relative to that directory, so you still need to generate the full path:

import glob

...

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    full_path = os.path.join(path, filename)
    print(full_path)
    month = pd.read_excel(full_path, sheet_name="Raw data ", engine="pyxlsb")
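
If root_dir is not available (it needs Python 3.10 or later), pathlib offers a similar search. The sketch below is only an illustration: find_workbooks is a hypothetical helper name, not something from the question, and it assumes the same "PB orders*.xlsb" pattern. Path.glob returns paths that already include the folder, so the os.path.join step is not needed.

```python
from pathlib import Path


def find_workbooks(folder, pattern="PB orders*.xlsb"):
    """Hypothetical helper: full paths of files in `folder` matching `pattern`."""
    # Path.glob yields paths that already include the folder,
    # so no separate os.path.join step is needed afterwards.
    return sorted(Path(folder).glob(pattern))
```

Each returned path can be passed straight to pd.read_excel(full_path, sheet_name="Raw data ", engine="pyxlsb").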

Avoiding temp files

You still need to check the file name to avoid the temporary files generated when someone opens an Excel file (the files that start with ~):

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    # Skip the temporary files Excel creates while a workbook is open
    if not os.path.basename(filename).startswith("~"):
        full_path = os.path.join(path, filename)
        print(full_path)
        month = pd.read_excel(full_path, sheet_name="Raw data ", engine="pyxlsb")

edited Feb 4, 2022 at 9:46

answered Feb 4, 2022 at 9:40

Panagiotis Kanavos


4 Comments

Greencolor Over a year ago

Thanks, it works perfectly, but how should I change the last part where it converts to CSV?

Greencolor Over a year ago

Basically, I want to keep the original name (PB orders December), but the extension should be csv.

Panagiotis Kanavos Over a year ago

@Greencolor there are several answers for this, e.g. in this question. Most of the answers work if the filename contains only one dot. pathlib's with_suffix can be used to change the extension no matter what the rest of the filename contains.

Greencolor Over a year ago

Isn't it possible to do this with df.to_csv?
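
The with_suffix suggestion from the comments could look roughly like this. It is a sketch, not the commenter's own code: csv_path_for is a hypothetical helper name, and it assumes the CSV should sit next to the workbook under the same base name.

```python
from pathlib import Path


def csv_path_for(workbook_path):
    """Hypothetical helper: same path with the final extension replaced by .csv."""
    # with_suffix swaps only the last suffix, so any other dots
    # in the file name are left untouched.
    return Path(workbook_path).with_suffix(".csv")
```

df.to_csv does not rename anything itself, but it accepts the resulting path, for example month.to_csv(csv_path_for(full_path), index=False).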

neisor · Accepted Answer · 2022-02-04 09:40:15Z

1

When you do month = pd.read_excel(filename, sheet_name="Raw data ", engine="pyxlsb") try to replace it with this:

month = pd.read_excel(path + filename, sheet_name="Raw data ", engine="pyxlsb")

This prepends the directory path to each filename. It works here because path already ends with a separator.

answered Feb 4, 2022 at 9:40

neisor
