How to convert my xlsx files into CSV in bulk via python

Question

I am trying to convert a list of xlsx files into csv format. At the moment I have been able to do this using xlrd and csv, but only file by file, using the code below:

import xlrd
import csv

def csv_from_excel():
    wb = xlrd.open_workbook(r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\JUL 2018\RM 2018-07-30.xlsx")
    sh = wb.sheet_by_name('RM')
    csv_file = open('RM 2018-07-30.csv', 'w', newline='')
    wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)

    for rownum in range(sh.nrows):
        wr.writerow(sh.row_values(rownum))

    csv_file.close()

csv_from_excel()


import pandas as pd 
import numpy as np 

df = pd.read_csv('RM 2018-07-30.csv', index_col= 0, encoding = 'iso-8859-1')

df2 = df[['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No']]

df2.to_csv(r"C:\Users\jonathon.kindred\Desktop\RM\2018\JUL 2018\RM 2018-07-30.csv", index = False)

I need to be able to do this folder by folder rather than file by file. I've managed to list the files in the next folder using glob, see below:

import glob

path = r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\AUG 2018"

files = [f for f in glob.glob(path + "**/*.xlsx", recursive=True)]

for f in files:
    print(f)

The issue is that I'm finding it hard to combine both scripts so that the result follows these steps:

converts the .xlsx into csv
selects only these columns 'Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No'
places it into my other folder.

The two folders are the xlsx location, RM - USE, and the destination location, RM.

You should pass the file name as an argument to your csv_from_excel function. If you are listing the file names correctly, then the value of f in your last for loop should be a valid file name. Pass it to your function.
– daniboy000, Commented Dec 4, 2019 at 13:18
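
A minimal sketch of what the comment suggests, assuming the worksheet is always named 'RM' as in the question. The helper names find_workbooks, csv_name_for and convert_folder are hypothetical, introduced only for illustration. Note that xlrd 2.0 and later reads only .xls files, so this assumes an older xlrd that still opens .xlsx.

```python
import csv
import glob
import os


def csv_from_excel(xlsx_path, csv_path, sheet_name="RM"):
    """Write every row of one worksheet to csv_path."""
    # Imported here so the path helpers below work without xlrd installed.
    # Assumption: an xlrd version below 2.0, which can still open .xlsx files.
    import xlrd

    wb = xlrd.open_workbook(xlsx_path)
    sh = wb.sheet_by_name(sheet_name)
    with open(csv_path, "w", newline="") as csv_file:
        wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))


def find_workbooks(folder):
    """Return every .xlsx file under folder, including subfolders."""
    # os.path.join supplies the separator that path + "**/*.xlsx" leaves out.
    pattern = os.path.join(folder, "**", "*.xlsx")
    return sorted(glob.glob(pattern, recursive=True))


def csv_name_for(xlsx_path):
    """Turn '.../RM 2018-07-30.xlsx' into 'RM 2018-07-30.csv'."""
    return os.path.splitext(os.path.basename(xlsx_path))[0] + ".csv"


def convert_folder(folder):
    """Convert each workbook found under folder into a CSV in the working directory."""
    for xlsx_path in find_workbooks(folder):
        csv_from_excel(xlsx_path, csv_name_for(xlsx_path))
```

Calling convert_folder(path) with the path from the question would then replace the print(f) loop.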

marc_s · Accepted Answer · 2019-12-10 21:50:41Z

2

Use os.listdir() to get a list of all the files in a specific folder.

Then call csv_from_excel() inside a for loop so that it runs once for each file in the list, passing the file name in as an argument.

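import csv
import os

import xlrd  # note: xlrd 2.0+ reads only .xls, so .xlsx needs an older xlrd

path = "PATH/TO/FOLDER"

def csv_from_excel(xlsx_path, csv_path):
    # Copy every row of the 'RM' sheet into a CSV file
    wb = xlrd.open_workbook(xlsx_path)
    sh = wb.sheet_by_name('RM')
    with open(csv_path, 'w', newline='') as csv_file:
        wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))

for file_name in os.listdir(path):
    # os.listdir() returns bare names, so join each one with the folder path
    if file_name.lower().endswith('.xlsx'):
        csv_name = os.path.splitext(file_name)[0] + '.csv'
        csv_from_excel(os.path.join(path, file_name), csv_name)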

Update: To select specific columns from a CSV file, use pandas to read only those columns into a data frame, then save the data frame as a CSV in the new folder.

import pandas

# Path of a CSV produced by the conversion step
csv_path = 'PATH/TO/CONVERTED/CSV'
# Columns to keep
colNames = ['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No']
# usecols selects columns by header name; names= would relabel columns by position instead
data = pandas.read_csv(csv_path, usecols=colNames)
# Keep the columns in the listed order and write them to a new CSV
df = data[colNames]
df.to_csv('PATH/TO/NEW/CSV', index=False)
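
To apply the same column selection to every CSV in a folder, the read and write steps can be wrapped in a loop. The following is a sketch under stated assumptions: the function name select_columns and its parameters are hypothetical, and every CSV in the source folder is assumed to contain all of the requested columns in its header row.

```python
from pathlib import Path

import pandas as pd

# Column list taken from the question.
COLUMNS = ['Purchase Order', 'SKU', 'Markdown', 'Landed Cost',
           'Original Price', 'Current Sale Price', 'Free Stock',
           'OPO', 'ID Style', 'Supplier Style No']


def select_columns(src_dir, dst_dir, columns):
    """Copy each CSV in src_dir to dst_dir, keeping only `columns`."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # to_csv does not create folders
    written = []
    for csv_path in sorted(Path(src_dir).glob("*.csv")):
        # usecols picks columns by header name; [columns] fixes their order.
        df = pd.read_csv(csv_path, usecols=columns)[columns]
        out_path = dst / csv_path.name
        df.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

For the question's layout this might be called as select_columns('PATH/TO/FOLDER', 'PATH/TO/NEW/FOLDER', COLUMNS), with both paths being placeholders.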

edited Dec 10, 2019 at 21:50

marc_s


answered Dec 4, 2019 at 13:23

O Yahya


JK1993 Over a year ago

That sounds good. How would I apply the column selection in bulk as well?

Pablo de Luis · Accepted Answer · 2019-12-04 13:35:24Z

You can read your .xlsx files, select the columns you want and export to csv using only pandas:

import pandas as pd

df = pd.read_excel(r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\JUL 2018\RM 2018-07-30.xlsx", sheet_name='RM', usecols=['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No'])
df.to_csv('RM 2018-07-30.csv', index=False)

If you are iterating over files and directories, glob is perfect for it. You can also look at the pathlib library, which can help you pick out only directories or files, filter by extension, or combine path parts to build your export paths.

For example, if you build your list of files with glob:

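from glob import glob
from pathlib import Path

import pandas as pd

# files_path is a glob pattern, here the AUG 2018 folder from the question
files_path = r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\AUG 2018\*.xlsx"
files = glob(files_path)
output_dir = Path('destination')
output_dir.mkdir(parents=True, exist_ok=True)  # to_csv does not create missing folders
for file in files:
    df = pd.read_excel(file, sheet_name='RM', usecols=['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No'])

    # Reuse the workbook's name, without its extension, for the CSV
    output_file = output_dir / '{}.csv'.format(Path(file).stem)

    df.to_csv(output_file, index=False)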
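
If the destination should mirror the source's year and month subfolders (RM - USE\2018\JUL 2018 to RM\2018\JUL 2018, as the question describes), pathlib can derive each export path from the workbook's path. This is only a sketch: destination_for and plan_exports are hypothetical helper names, and it covers the path handling alone, not the Excel reading.

```python
from pathlib import Path


def destination_for(xlsx_path, src_root, dst_root):
    """Map a workbook under src_root to the matching .csv path under dst_root."""
    # relative_to keeps the subfolders, e.g. 2018/JUL 2018/RM 2018-07-30.xlsx
    relative = xlsx_path.relative_to(src_root)
    return (dst_root / relative).with_suffix(".csv")


def plan_exports(src_root, dst_root):
    """Yield (workbook, csv_path) pairs for every .xlsx under src_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    # rglob searches src_root and all of its subfolders
    for xlsx_path in sorted(src_root.rglob("*.xlsx")):
        yield xlsx_path, destination_for(xlsx_path, src_root, dst_root)
```

Inside a loop over plan_exports(...), each csv_path.parent.mkdir(parents=True, exist_ok=True) call would create the month folder before df.to_csv(csv_path, index=False) writes into it.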

rajvi · Accepted Answer · 2019-12-04 13:20:04Z

0

You need to install rows and openpyxl first: $ pip install rows openpyxl

import rows
data = rows.import_from_xlsx("my_file.xlsx")
rows.export_to_csv(data, open("my_file.csv", "wb"))

answered Dec 4, 2019 at 13:20

rajvi

