I am working on a task to automate the conversion of an Excel file to a PDF using Python. I was able to achieve this by:
- Reading the raw Excel data (from a defined range) into a pandas DataFrame.
- Converting the DataFrame to an HTML file (intermediate step).
- Converting the HTML file to a PDF.
I am doing this on macOS and it works as expected. Code for reference:
from openpyxl import load_workbook
import pandas as pd
import pdfkit as pdf

# Load the workbook; data_only=True returns cached formula results instead of formulas
wb = load_workbook(filename='4-Grain Advertising Report.xlsx', read_only=True, data_only=True)
ws = wb['Advertising 4 Grain P&L']

# Collect the cell values from the defined range, row by row
data_rows = []
for row in ws['K9':'S31']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

# Build the DataFrame and replace missing values with empty strings
df = pd.DataFrame(data_rows)
df = df.fillna('')

# Write the intermediate HTML file, then convert it to a PDF
df.to_html('test.html', header=False, index=False)
output = 'test_output.pdf'
pdf.from_file('test.html', output)
However, this method does not retain the formatting that was set in the original Excel spreadsheet (for example, cell coloring and bold headers). The resulting PDF contains only a raw table of the values.
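To make the goal concrete, here is a minimal sketch of carrying two kinds of formatting (bold text and solid fill color) into the HTML as inline CSS. It is an illustration under stated assumptions, not a working solution: `styled_cell`, `styled_table`, and `cell_to_tuple` are hypothetical helper names, and `cell_to_tuple` assumes each openpyxl cell exposes `font.bold` and `fill.fgColor.rgb` as an ARGB string, which may not hold for theme or indexed colors.

```python
from html import escape


def styled_cell(value, bold=False, fill_hex=None):
    """Return one <td> element with inline CSS for bold text and background color."""
    styles = []
    if bold:
        styles.append("font-weight: bold")
    if fill_hex:
        styles.append(f"background-color: #{fill_hex}")
    css = "; ".join(styles)
    style_attr = f' style="{css}"' if css else ""
    text = "" if value is None else escape(str(value))
    return f"<td{style_attr}>{text}</td>"


def styled_table(rows):
    """Build an HTML table from rows of (value, bold, fill_hex) tuples."""
    body = "\n".join(
        "<tr>" + "".join(styled_cell(*cell) for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>\n{body}\n</table>"


def cell_to_tuple(cell):
    """Extract (value, bold, fill_hex) from a cell (hypothetical helper).

    Assumption: `cell` behaves like an openpyxl cell. Cells without style
    information (font or fill is None) fall back to no formatting.
    """
    font = getattr(cell, "font", None)
    fill = getattr(cell, "fill", None)
    bold = font is not None and bool(font.bold)
    rgb = None
    if fill is not None and fill.fill_type == "solid":
        rgb = fill.fgColor.rgb
    # Assumed ARGB form such as 'FFFFFF00'; keep the last six hex digits.
    fill_hex = rgb[-6:] if isinstance(rgb, str) else None
    return (cell.value, bold, fill_hex)
```

With the workbook from the code above, the rows could presumably be built as `[[cell_to_tuple(c) for c in row] for row in ws['K9':'S31']]`, and the string returned by `styled_table` passed to `pdf.from_string(html, 'test_output.pdf')`. Borders, fonts, number formats, column widths, and merged cells are not covered by this sketch.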
Any recommendations on how I can retain the source formatting and automate the conversion to PDF via Python would be really helpful. Thanks!