I am quite new to Python/pandas. Every week I have to update an existing sheet with new data. The new data is processed from raw CSV files that are generated weekly, and I have already written Python code that produces it as a pandas DataFrame. Now I want to append this DataFrame to an existing sheet in my Excel workbook. I am currently using the code below to write the DataFrame to a specific sheet of the workbook.

import openpyxl
import pandas

# Raw string so the backslashes in the Windows path are taken literally
path = r'C:\Claro\Pre-Sales\E2E Optimization\Transport\Transport Network Dashboard.xlsx'

workbook_master = openpyxl.load_workbook(path)

writer = pandas.ExcelWriter(path, engine='openpyxl', mode='a')

df_latency.to_excel(writer, sheet_name='Latency', startrow=workbook_master['Latency'].max_row, startcol=0, header=False, index=False)

writer.close()  # close() also saves the workbook

The problem is that when I run the code and open the Excel file, instead of writing the DataFrame to the existing sheet 'Latency', the code creates a new sheet 'Latency1' and writes the DataFrame to it. The contents and positioning of the DataFrame are correct, but I do not understand why the code creates a new sheet 'Latency1' instead of writing into the existing sheet 'Latency'.

I will greatly appreciate any help here.

Thanks, Faheem

2 Answers

1

By default, when ExcelWriter is instantiated, it starts with no known worksheets: its sheets mapping is empty, even when the file is opened with mode='a' (at least in the pandas version used here).

So when you try to write data into 'Latency', the writer does not find that sheet in its mapping and creates a new blank worksheet instead. In addition, the openpyxl library performs a check before naming a sheet to "avoid duplicate names" (see the openpyxl docs, line 18), which appends a number to the sheet name, so the data ends up in 'Latency1' instead.
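The renaming can be reproduced with openpyxl alone. A minimal sketch, assuming a fresh in-memory workbook (no file involved); only the sheet name is taken from the question:

```python
from openpyxl import Workbook

wb = Workbook()
wb.active.title = "Latency"   # stands in for the sheet that already exists

new_ws = wb.create_sheet()    # a new blank sheet, as created when 'Latency' is not found
new_ws.title = "Latency"      # openpyxl avoids the duplicate by appending a number

sheet_names = wb.sheetnames
print(sheet_names)            # ['Latency', 'Latency1']
```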

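To work around this problem, copy the existing worksheets into the ExcelWriter.sheets attribute after writer is created. Take them from writer.book, the workbook that the writer loaded in append mode and will save, so that the rows are written to the same workbook that gets saved. Like this:

writer.sheets = {ws.title: ws for ws in writer.book.worksheets}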
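For what it's worth, newer pandas releases cover this case directly: ExcelWriter accepts an if_sheet_exists argument (added in pandas 1.3), and its 'overlay' value (added in 1.4) writes into the existing sheet without removing its contents. A minimal sketch, assuming pandas 1.4 or later with openpyxl installed; the temporary file and the sample columns are made up for illustration and are not from the question:

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical stand-in for the dashboard workbook: one 'Latency' sheet
# holding a header row and two data rows.
path = os.path.join(tempfile.mkdtemp(), "dashboard.xlsx")
pd.DataFrame({"site": ["A", "B"], "latency_ms": [10, 12]}).to_excel(
    path, sheet_name="Latency", index=False
)

# The weekly data to append (made-up values).
df_latency = pd.DataFrame({"site": ["C"], "latency_ms": [15]})

with pd.ExcelWriter(path, engine="openpyxl", mode="a", if_sheet_exists="overlay") as writer:
    # max_row is 1-based and startrow is 0-based, so this is the first free row.
    start = writer.sheets["Latency"].max_row
    df_latency.to_excel(
        writer, sheet_name="Latency", startrow=start, header=False, index=False
    )

# Read the file back to check the result.
wb = load_workbook(path)
sheet_names = wb.sheetnames
row_count = wb["Latency"].max_row
```

With if_sheet_exists='overlay', no manual assignment to writer.sheets is needed, and the context manager saves and closes the file on exit.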

2 Comments

Thanks a lot Gary. This solved the issue. But I don't understand why ExcelWriter doesn't instantiate the book and worksheets when the object is created, especially when mode='a'. The ExcelWriter documentation also doesn't mention this issue anywhere. So unless you analyze the ExcelWriter code and crack it, you wouldn't be able to figure it out. Anyway, thanks a zillion for this. Really solved my problem!!
Yes, I agree that it is confusing. Perhaps we should create an issue on GitHub for pandas for this use case so as to improve usability.
0

You can append a pandas DataFrame to an existing sheet with openpyxl, or overwrite the sheet.

However, you should decide what to do in each of these cases:

  1. The file (.xlsx) does not exist
  2. The sheet does not exist
  3. Whether or not to include the header of the df in each append

That is why it is better to create a custom function that handles those cases.

The custom function handles:

  • creating the file if it does not exist
  • creating the sheet if it does not exist
  • not repeating the header of the df on each append
  • loading the existing Excel file into memory
  • selecting the target sheet to modify
  • appending the df content
  • saving the Excel file after the changes

The custom function "df2xlsx" is defined below. One drawback of this function is its long execution time, likely because every call loads the whole workbook and writes it back.

# stdlib imports ------------
import os

# Third-party imports --------
import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows

df2xlsx function

def df2xlsx(df: pd.DataFrame, file: str, sheet_name: str, append: bool = True) -> None:
    '''
    Write a dataframe to a sheet of an .xlsx file, creating the file or sheet if needed.

    Parameters
    ----------
    df : pd.DataFrame
        Target dataframe.
    file : str
        File path.
    sheet_name : str
        Target sheet.
    append : bool, optional
        [True] Append data, [False] overwrite the existing sheet with the df data. The default is True.

    Returns
    -------
    None

    '''

    if os.path.isfile(file):
        # The file exists: read it into memory
        wb = openpyxl.load_workbook(file)

        if sheet_name not in wb.sheetnames:
            # The sheet does not exist: create it and write the header
            wb.create_sheet(sheet_name)
            header = True
        elif append:
            # The sheet exists and already has a header: append rows only
            header = False
        else:
            # Overwrite: remove the existing sheet, recreate it (as the last sheet)
            # and write the header
            header = True
            wb.remove(wb[sheet_name])
            wb.create_sheet(sheet_name)
    else:
        # The file does not exist: create a new workbook in memory
        wb = openpyxl.Workbook()
        header = True
        # Rename the default active sheet to sheet_name
        wb.active.title = sheet_name

    # Select the target sheet
    ws = wb[sheet_name]

    # Write the df to the sheet, row by row
    for r in dataframe_to_rows(df, index=False, header=header):
        ws.append(r)

    # Save the xlsx file
    wb.save(file)
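The header handling in the function comes down to the header flag of dataframe_to_rows combined with ws.append, which always writes below the last used row. A minimal in-memory sketch of just that part, using a made-up sample dataframe rather than the real df_latency:

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Made-up sample data, for illustration only.
df = pd.DataFrame({"site": ["A", "B"], "latency_ms": [10, 12]})

wb = Workbook()
ws = wb.active
ws.title = "Latency"

# First write: emit the column names as the first row.
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Later append: data rows only, so the header is not repeated.
for row in dataframe_to_rows(df, index=False, header=False):
    ws.append(row)

rows = [list(r) for r in ws.iter_rows(values_only=True)]
print(len(rows))   # 5: one header row plus 2 + 2 data rows
```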

code case #1 [appending data to the existing Sheet]

# ... some modifications or adding data from other sources ...

# append df_latency to the "Latency" sheet of the existing xlsx file
df2xlsx(df=df_latency, file=r"C:\Claro\Pre-Sales\E2E Optimization\Transport\Transport Network Dashboard.xlsx", sheet_name="Latency", append=True)

code case #2 [overwriting the existing Sheet]

# ... some modifications or adding data from other sources ...

# overwrite the "Latency" sheet of the existing xlsx file with df_latency
df2xlsx(df=df_latency, file=r"C:\Claro\Pre-Sales\E2E Optimization\Transport\Transport Network Dashboard.xlsx", sheet_name="Latency", append=False)
