1

I have a folder with a large number of Excel workbooks. Is there a way to convert every file in this folder into a CSV file using Python's xlrd, xlutiles, and xlsxWriter?

I would like the newly converted CSV files to have the extension '_convert.csv'.

OTHERWISE...

Is there a way to merge all the Excel workbooks in the folder to create one large file?

I've been searching for ways to do both, but nothing has worked...

5 Answers 5

1

Using pywin32, this will find all the .xlsx files in the indicated directory and open and resave them as .csv. It is relatively easy to figure out the right commands with pywin32...just record an Excel macro and perform the open/save manually, then look at the resulting macro.

import os
import glob
import win32com.client

xl = win32com.client.gencache.EnsureDispatch('Excel.Application')

for f in glob.glob('tmp/*.xlsx'):
    fullname = os.path.abspath(f)
    xl.Workbooks.Open(fullname)
    xl.ActiveWorkbook.SaveAs(Filename=fullname.replace('.xlsx','.csv'),
                             FileFormat=win32com.client.constants.xlCSVMSDOS,
                             CreateBackup=False)
    xl.ActiveWorkbook.Close(SaveChanges=False)
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for writing this out, but I am unable to install the win32com extensions. Can this be done with another extension?
I'm sure it can. I haven't used the ones you mention so just threw this one out there as I am familiar with pywin32 and it lets Excel do all the work.
1

I will give a try with my library pyexcel:

from pyexcel import Book, BookWriter
import glob
import os


for f in glob.glob("your_directory/*.xlsx"):
    fullname = os.path.abspath(f)
    converted_filename = fullname.replace(".xlsx", "_converted.csv")
    book = Book(f)
    converted_csvs = BookWriter(converted_filename)
    converted_csvs.write_book_reader(book)
    converted_csvs.close()

If you have a xlsx that has more than 2 sheets, I imagine you will have more than 2 csv files generated. The naming convention is: "file_converted_%s.csv" % your_sheet_name. The script will save all converted csv files in the same directory where you had xlsx files.

In addition, if you want to merge all in one, it is super easy as well.

from pyexcel.cookbook import merge_all_to_a_book
import glob


merge_all_to_a_book(glob.glob("your_directory/*.xlsx"), "output.xlsx")

If you want to do more, please read the tutorial

Comments

0

Look at openoffice's python library. Although, I suspect openoffice would support MS document files.

Python has no native support for Excel file.

1 Comment

There are a bunch of ways to work with Excel files in Python: python-excel.org. They might not be as native as you would like - I don't think they're in the main distribution or PyPI, but they work for basic tasks.
0

Sure. Iterate over your files using something like glob and feed them into one of the modules you mention. With xlrd, you'd use open_workbook to open each file by name. That will give you back a Book object. You'll then want to have nested loops that iterate over each Sheet object in the Book, each row in the Sheet, and each Cell in the Row. If your rows aren't too wide, you can append each Cell in a Row into a Python list and then feed that list to the writerow method of a csv.writer object.

Since it's a high-level question, this answer glosses over some specifics like how to call xlrd.open_workbook and how to create a csv.writer. Hopefully googling for examples on those specific points will get you where you need to go.

Comments

0

You can use this function to read the data from each file

import xlrd

def getXLData(Filename, min_row_len=1, get_datemode=False, sheetnum=0):
  Data = []
  book = xlrd.open_workbook(Filename)
  sheet = book.sheets()[sheetnum] 
  rowcount = 0
  while rowcount < sheet.nrows:
    row = sheet.row_values(rowcount)
    if len(row)>=min_row_len: Data.append(row)
    rowcount+=1
  if get_datemode: return Data, book.datemode
  else: return Data

and this function to write the data after you combine the lists together

import csv

def writeCSVFile(filename, data, headers = []):
  import csv
  if headers:
    temp = [headers]
    temp.extend(data)
    data = temp
  f = open(filename,"wb")
  writer = csv.writer(f)
  writer.writerows(data)
  f.close()

Keep in mind you may have to re-format the data, especially if there are dates or integers in the Excel files since they're stored as floating point numbers.

Edited to add code calling the above functions:

import glob

filelist = glob.glob("*.xls*")
alldata = []
headers = []
for filename in filelist:
  data = getXLData(filename)
  headers = data.pop(0) # omit this line if files do not have a header row
  alldata.extend(data)

writeCSVFile("Output.csv", alldata, headers)

2 Comments

Can you explain this code a bit more? I tried running it, but I'm not getting any results...Specifically, I need the output (with the combined data) to be placed in the folder directory.
These are functions which you'll have to call using other code.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.