I am working on a task to automate the conversion of an Excel file to a PDF using Python. So far I have achieved this by:

  1. Converting the raw Excel data (based on a defined range) into a pandas data frame.
  2. Converting the data frame to an HTML file (intermediate step).
  3. Converting the HTML file to a PDF.

I am doing this on macOS and it works as expected. Code for reference:

from openpyxl import load_workbook
import pandas as pd
import pdfkit as pdf

# Load cached cell values (not formulas) from the report workbook.
wb = load_workbook(filename='4-Grain Advertising Report.xlsx', read_only=True, data_only=True)
ws = wb['Advertising 4 Grain P&L']

# Collect the values of the K9:S31 range row by row.
data_rows = []
for row in ws['K9':'S31']:
    data_rows.append([cell.value for cell in row])

# Build the data frame and show empty cells as blanks.
df = pd.DataFrame(data_rows)
df = df.fillna('')

# Write the intermediate HTML file, then convert it to a PDF.
df.to_html('test.html', header=False, index=False)
output = 'test_output.pdf'
pdf.from_file('test.html', output)

However, with this method I am unable to retain the formatting (for example, cell coloring and bold headers) that was set in the original Excel spreadsheet. The result is a plain table of values in PDF format.

Any recommendations on how I can retain the source formatting and automate the conversion to PDF via Python would be really helpful. Thanks!

1 Answer

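You won't be able to retain the formatting using pandas. A data frame holds only the cell values, so Excel styles such as fill colors and bold fonts are lost as soon as the data is loaded into it. You can try using other tools like this.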
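
One possible direction, as a rough sketch rather than a complete solution: skip the data frame and build the HTML directly from the openpyxl cells, copying a couple of style attributes into inline CSS. The helper names below (`cell_to_td`, `cell_style`, `range_to_html`, `convert_range_to_pdf`) are made up for illustration. The sketch assumes the workbook is opened without `read_only=True`, and it only handles bold text and solid fills with an explicit RGB color; theme colors, borders, merged cells, and number formats are ignored.

```python
from html import escape


def cell_to_td(value, bold=False, fill_rgb=None):
    """Render one cell as a <td>, using inline CSS for bold text and background color."""
    styles = []
    if bold:
        styles.append("font-weight:bold")
    if fill_rgb:
        # openpyxl reports colors as ARGB (e.g. 'FFFFFF00'); CSS wants the last six digits.
        styles.append("background-color:#" + fill_rgb[-6:])
    style_attr = ' style="' + ";".join(styles) + '"' if styles else ""
    text = "" if value is None else escape(str(value))
    return "<td" + style_attr + ">" + text + "</td>"


def cell_style(cell):
    """Return (bold, fill_rgb) for an openpyxl cell; fill_rgb is None if not a solid RGB fill."""
    bold = bool(cell.font is not None and cell.font.bold)
    fill_rgb = None
    fill = cell.fill
    if fill is not None and getattr(fill, "fill_type", None) == "solid":
        color = fill.fgColor
        # Theme and indexed colors carry no usable RGB string, so they are skipped here.
        if color is not None and color.type == "rgb":
            fill_rgb = color.rgb
    return bold, fill_rgb


def range_to_html(ws, cell_range):
    """Build an HTML table for a worksheet range such as 'K9:S31'."""
    rows = []
    for row in ws[cell_range]:
        tds = [cell_to_td(cell.value, *cell_style(cell)) for cell in row]
        rows.append("<tr>" + "".join(tds) + "</tr>")
    return '<table border="1" style="border-collapse:collapse">' + "".join(rows) + "</table>"


def convert_range_to_pdf(xlsx_path, sheet_name, cell_range, pdf_path):
    """Load the workbook, render the range as styled HTML, and hand it to pdfkit."""
    # Imported here so the helpers above can be used without these packages installed.
    from openpyxl import load_workbook
    import pdfkit

    wb = load_workbook(filename=xlsx_path, data_only=True)
    html = "<html><body>" + range_to_html(wb[sheet_name], cell_range) + "</body></html>"
    pdfkit.from_string(html, pdf_path)
```

With the file from the question, the call would look like `convert_range_to_pdf('4-Grain Advertising Report.xlsx', 'Advertising 4 Grain P&L', 'K9:S31', 'test_output.pdf')`. Other attributes (font color, alignment, borders) could be mapped to CSS in `cell_to_td` the same way if they matter for the report.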
