I am trying to convert a list of xlsx files into CSV format. At the moment I can do this using xlrd and csv, but only file by file, using the code below:
import xlrd
import csv
def csv_from_excel():
    wb = xlrd.open_workbook(r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\JUL 2018\RM 2018-07-30.xlsx")
    sh = wb.sheet_by_name('RM')
    csv_file = open('RM 2018-07-30.csv', 'w', newline='')
    wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
    for rownum in range(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    csv_file.close()
csv_from_excel()
import pandas as pd
import numpy as np
df = pd.read_csv('RM 2018-07-30.csv', index_col= 0, encoding = 'iso-8859-1')
df2 = df[['Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No']]
df2.to_csv(r"C:\Users\jonathon.kindred\Desktop\RM\2018\JUL 2018\RM 2018-07-30.csv", index = False)
I need to be able to do this folder by folder rather than file by file. I've managed to list the files in the next folder using glob, see below:
import glob
import os
path = r"C:\Users\jonathon.kindred\Desktop\RM - USE\2018\AUG 2018"
# "**" must be its own path component for recursive=True to search subfolders
files = glob.glob(os.path.join(path, "**", "*.xlsx"), recursive=True)
for f in files:
    print(f)
The issue is that I'm finding it hard to combine the two scripts so that the result follows these steps:
- converts the .xlsx into csv
- selects only these columns: 'Purchase Order','SKU','Markdown','Landed Cost','Original Price','Current Sale Price','Free Stock','OPO','ID Style','Supplier Style No'
- places it into my other folder.
The two folders are the xlsx source location, RM - USE, and the destination location, RM.
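For reference, the three steps above could be combined roughly as in the sketch below. It rests on some assumptions: it swaps the xlrd/csv round trip for `pandas.read_excel` (which needs an Excel engine such as openpyxl installed), it assumes every workbook has an `RM` sheet containing all ten columns, and it assumes the destination should mirror the source's subfolder layout. The names `COLUMNS`, `destination_for` and `convert_folder` are made up for illustration.

```python
from pathlib import Path

# The ten columns to keep, as listed in the question.
COLUMNS = ['Purchase Order', 'SKU', 'Markdown', 'Landed Cost', 'Original Price',
           'Current Sale Price', 'Free Stock', 'OPO', 'ID Style', 'Supplier Style No']


def destination_for(xlsx_path, src_root, dst_root):
    """Mirror xlsx_path (located under src_root) into dst_root with a .csv suffix."""
    relative = Path(xlsx_path).relative_to(src_root)
    return Path(dst_root) / relative.with_suffix('.csv')


def convert_folder(src_root, dst_root, folder, sheet_name='RM', columns=COLUMNS):
    """Convert every .xlsx under src_root/folder into a trimmed CSV under dst_root."""
    # Imported here so the path helper above stays usable without pandas.
    import pandas as pd

    written = []
    for xlsx_path in sorted((Path(src_root) / folder).rglob('*.xlsx')):
        # Step 1: read the sheet straight into a DataFrame (no intermediate CSV).
        df = pd.read_excel(xlsx_path, sheet_name=sheet_name)
        # Step 2: keep only the wanted columns (raises KeyError if one is missing).
        trimmed = df[list(columns)]
        # Step 3: write the result to the matching place in the destination tree.
        out_path = destination_for(xlsx_path, src_root, dst_root)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        trimmed.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

With the folders from the question, a call might look like `convert_folder(base / 'RM - USE', base / 'RM', Path('2018') / 'AUG 2018')`, where `base = Path(r"C:\Users\jonathon.kindred\Desktop")`. Passing the month folder as an argument keeps the folder-by-folder workflow while the source and destination roots stay fixed.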