Converting all worksheets in an Excel workbook to csv format

Question

My Excel document my.xlsx has two sheets named Sheet1 and Sheet2. I want to convert all worksheets to CSV format using xlsx2csv. I ran the following commands in the Python interpreter:

from xlsx2csv import *
xlsx2csv my.xlsx convert.csv
  File "<stdin>", line 1
    xlsx2csv my.xlsx convert.csv
              ^
SyntaxError: invalid syntax

x2c -a my.xlsx my1.csv
  File "<stdin>", line 1
    x2c -a my.xlsx my1.csv
            ^
SyntaxError: invalid syntax

Any help would be appreciated.

knl · Accepted Answer · 2019-06-12 01:45:42Z

2

I have not used xlsx2csv before, but you could try pandas instead.

Your requirement can be solved like this:

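import pandas as pd

# Read each named sheet and write it to its own CSV file
for sheet in ['Sheet1', 'Sheet2']:
    # pandas renamed the keyword sheetname to sheet_name; recent versions reject the old name
    df = pd.read_excel('my.xlsx', sheet_name=sheet)
    df.to_csv(sheet + '_output.csv', index=False)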

answered Jun 12, 2019 at 1:45

knl


4 Comments

MYaseen208 Over a year ago

Thanks @Kelvin for your answer. I would really appreciate a solution that does not rely on the sheet names, because my xlsx file is very large and hard to open in Excel, so I don't know the sheet names in advance. Thanks

knl Over a year ago

@MYaseen208 yes, absolutely. You can do something like this: my_file = pd.ExcelFile('my.xlsx') and after that just loop through all the sheets: for sheet in my_file.sheet_names: df = pd.read_excel('my.xlsx', sheetname=sheet)

xjtan Over a year ago

Because pd.read_excel is extremely slow.

jgran Over a year ago

Since 2018 the keyword seems to be sheet_name, as in pd.read_excel('my.xlsx', sheet_name=sheet). See typeerror-with-pandas-read-excel.

caot · Accepted Answer · 2019-06-12 02:28:49Z

2

You can do something like the following:

import pandas as pd

xls_file = pd.ExcelFile('<path_to_your_excel_file>')
sheet_names = xls_file.sheet_names

# Parse each sheet and save it as <sheet name>.csv
for sheet in sheet_names:
    df = xls_file.parse(sheet)
    df.to_csv(sheet + '.csv', index=False)
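As a sketch of a related pandas feature (not part of the original answer): passing sheet_name=None to pd.read_excel loads every sheet at once and returns a dict that maps each sheet name to a DataFrame, so the sheet names never have to be known in advance. The helper name write_sheets_to_csv and the one-file-per-sheet output folder are illustrative assumptions.

```python
from pathlib import Path

import pandas as pd


def write_sheets_to_csv(sheets, out_dir):
    """Write each DataFrame in a {sheet_name: DataFrame} mapping to out_dir/<sheet_name>.csv.

    Hypothetical helper; returns the list of CSV paths it wrote, in mapping order.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    written = []
    for name, df in sheets.items():
        csv_path = out_path / f"{name}.csv"
        df.to_csv(csv_path, index=False)  # drop the DataFrame index column
        written.append(csv_path)
    return written
```

Usage, assuming my.xlsx is in the working directory: write_sheets_to_csv(pd.read_excel('my.xlsx', sheet_name=None), 'csv'). Note that this keeps every sheet in memory at the same time, which may matter for a very large workbook.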

edited Jun 12, 2019 at 2:28

answered Jun 12, 2019 at 2:18

caot


GERMAN RODRIGUEZ · Accepted Answer · 2020-08-22 23:55:59Z

Xlsx2csv Python implementation:
I could only get Xlsx2csv to run with the sheetid parameter. To get the sheet names and ids, I used get_sheet_details.
csvfrmxlsx creates one CSV file per sheet in a csv folder inside the workbook's directory.

import zipfile
from pathlib import Path

import pandas as pd
import xmltodict
from xlsx2csv import Xlsx2csv


def get_sheet_details(filename):
    """Return a list of {'id': ..., 'name': ...} dicts, one per sheet, in workbook order."""
    # An xlsx file is just a zip archive; read only xl/workbook.xml, which is very
    # light and only has metadata, instead of extracting the whole archive to disk
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        xml = zip_ref.read('xl/workbook.xml')
    # force_list keeps 'sheet' a list even when the workbook has a single sheet
    dictionary = xmltodict.parse(xml, force_list=('sheet',))
    sheets = []
    for sheet in dictionary['workbook']['sheets']['sheet']:
        sheets.append({
            'id': sheet['@sheetId'],
            'name': sheet['@name'],
        })
    return sheets


def csvfrmxlsx(xlsxfl, df):
    """Create one csv file per sheet in a 'csv' folder next to the xlsx file."""
    outdir = xlsxfl.parent / 'csv'
    outdir.mkdir(exist_ok=True)  # the output folder must exist before writing into it
    for _, row in df.iterrows():
        shnph = outdir / (row['name'] + '.csv')  # path for converted csv file
        Xlsx2csv(str(xlsxfl), outputencoding="utf-8").convert(str(shnph), sheetid=int(row['id']))


pthfnc = 'c:/xlsx/'
wrkfl = 'my.xlsx'
xls_file = Path(pthfnc) / wrkfl
sheetsdic = get_sheet_details(xls_file)  # sheet names and ids, read without opening the xlsx in Excel
df = pd.DataFrame.from_dict(sheetsdic)
csvfrmxlsx(xls_file, df)  # df with sheets to be converted
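If all you need is the list of sheet names and ids (for example, to find out what a large workbook contains without opening it in Excel), a minimal sketch using only the Python standard library can read xl/workbook.xml straight out of the archive, without the third-party xmltodict dependency. The function name list_sheets is hypothetical, and the sketch assumes the usual xlsx layout where the workbook metadata lives at xl/workbook.xml.

```python
import zipfile
import xml.etree.ElementTree as ET


def list_sheets(xlsx):
    """Return [{'id': sheetId, 'name': name}, ...] in workbook order (hypothetical helper).

    xlsx may be a file path or a binary file object. Only xl/workbook.xml is
    read, so the worksheet data is never extracted or parsed.
    """
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    sheets = []
    for elem in root.iter():
        # Compare only the local tag name so the SpreadsheetML namespace
        # prefix, e.g. "{http://...}sheet", does not have to be spelled out.
        if elem.tag.rsplit("}", 1)[-1] == "sheet":
            sheets.append({"id": int(elem.get("sheetId")), "name": elem.get("name")})
    return sheets
```

Usage: for s in list_sheets('my.xlsx'): print(s['id'], s['name']). The ids returned are the sheetId attributes stored in workbook.xml, exactly as get_sheet_details reads them.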
