I have a very large CSV data set (several million records). I have already filtered, massaged, and split this list to a client's specification. This was all done in Python 3.3.
The last requirement is that these split lists be saved in Excel format. They have a utility that imports an Excel spreadsheet (in a specific format) into their database after doing some calculations and checking for existing duplicates in the DB. My problem is that their utility only works on Excel 2003 .xls files... I didn't know this ahead of time.
I can already write the data in the correct format for Excel 2007 using openpyxl, but these files won't work. I can write CSV files, but those don't work either, because their importer needs .xls files. Maybe there is a way to batch convert all the files from Excel 2007 .xlsx format to .xls format, or from CSV format to .xls format? There are thousands of files, so it can't be done by hand.
The best thing to do would be to output them in the correct format directly, but I can't seem to find a Python 3-compatible way that will work with Excel 2003 format. xlwt is Python 2.x only.
Does anyone have suggestions for how I can finish this?
EDIT: This is what the solution looked like.
EDIT2: Added the workbook close as suggested by stenci.
import os
import errno
import glob
import time
import win32com.client


def xlsx_to_xls(path):
    # collect every .xlsx file in the given folder
    xlsx_files = glob.glob(path+'\\*.xlsx')
    if len(xlsx_files) == 0:
        raise RuntimeError('No XLSX files to convert.')
    xlApp = win32com.client.Dispatch('Excel.Application')
    for file in xlsx_files:
        # open each workbook in Excel and save a copy next to it as .xls
        xlWb = xlApp.Workbooks.Open(os.path.join(os.getcwd(), file))
        xlWb.SaveAs(os.path.join(os.getcwd(), file.split('.xlsx')[0] + '.xls'), FileFormat=1)
        xlWb.Close()
    xlApp.Quit()
    time.sleep(2) # give Excel time to quit, otherwise files may be locked
    # remove the original .xlsx files once everything has been converted
    for file in xlsx_files:
        os.unlink(file)
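
One fragile spot in the code above is building the output name with file.split('.xlsx')[0]: it cuts at the first '.xlsx' found anywhere in the path, so a folder whose name contains '.xlsx' would produce the wrong target. A minimal sketch of a safer way to derive the name follows. The helper name xls_target is hypothetical and was not part of the original solution; it only computes a path and does not touch Excel.

```python
import os


def xls_target(xlsx_path):
    """Return the absolute .xls path for an .xlsx path (hypothetical helper)."""
    # splitext only strips the final extension, so '.xlsx' elsewhere in the path is left alone
    root, ext = os.path.splitext(os.path.abspath(xlsx_path))
    if ext.lower() != '.xlsx':
        raise ValueError('not an .xlsx file: %r' % xlsx_path)
    return root + '.xls'
```

Inside the loop, the SaveAs call could then presumably take xls_target(file) as its first argument instead of the os.path.join(...) expression, since the helper already returns an absolute path.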