2

What I need the code to do:

  • Copy all rows under Columns A - D from 'NewData.xlsx' and update all rows under columns A - D in a copy of 'template.xlsx' called 'updated.xlsx'.

What the code actually does:

  • It (successfully!) creates a new template sheet called 'NEW_DATA' in an updated template file called 'updated.xlsx' and writes the data.... ALL IN ONE CELL.

(I need to transfer a large quantity of GPS data into an existing table for work - in case anyone is curious why I am doing this.)

Unlike previous questions I've read, I do not want to modify the number of columns or rows when updating the data, I do not want to paste the data into a new tab, I do not want to copy the whole worksheet or xlsx file and I don't want to append the data below existing data.

import openpyxl
import pandas as pd

# create variable df containing updated data in excel
DataAsXlsx = r'C:\Users\...\NewData.xlsx'
xl_workbook = pd.ExcelFile(DataAsXlsx)  # Load the excel workbook
df = xl_workbook.parse("Sheet")  # Parse the sheet into a dataframe

#Reads template xlsx, creates template sheet 'NEW_DATA'
template = openpyxl.load_workbook(r'C:\Users\...\template.xlsx')
template.sheetnames
sheet1 = template.worksheets[0]
sheet1.title = 'NEW_DATA'
sheet1 = template['NEW_DATA']

#^^^everything above this line works^^^


#Code below attempts to copy rows AND columns from NewData.xlsx and paste to sheet 'NEW_DATA' in updated.xlsx

for row in range(1, sheet1.max_row+1): 
   cell = sheet1.cell(row=row, column=1)
   if cell.value is not None:
        cell.value = str(df)

#This pastes ALL DATA into ColA of sheet 'NEW_DATA' in updated.xlsx

template.save('updated.xlsx')

Here is what NewData.xlsx looks like in Excel:

what NewData.xlsx looks like in excel

For the purposes of debugging, template.xlsx can be any existing excel file.

I've already read : Update rows and column using openpyxl from python and it helped with iterating over the template file BUT it uses hard coded data "(c)" and this logic does not transfer to what I need.

I've read almost every question on here regarding pandas and openpyxl and also read the docs. I am at a loss for what to do next.

UPDATE

per Charlie's feedback, I did the following:

from openpyxl import load_workbook

wb1 = load_workbook(r'C:\Users\...\NewData.xlsx')
wb2 = load_workbook(r'C:\Users\...\template.xlsx')
ws1 = wb1['Sheet']
ws2 = wb2.get_active_sheet() 

for row in ws1.iter_rows(max_col=4):
        values = (c.value for c in row)
        ws2.append(values)
ws2.save('updated.xlsx')

And this appends the data at the bottom of the existing data set (it should replace the data in COL A - D) any Suggestions would help - I am so close!!

FINAL UPDATE

HOORAY - this works!!!

import pandas as pd

#use pandas to access the new data 
DataAsXlsx = pd.read_excel(r'C:\Users\...\NewData.xlsx', sheet_name='Sheet1')

#this reads the template file
template = r'C:\Users\...\template.xlsx'
df = pd.read_excel(template)

#this creates a new document named FinalAutomatedDataSheet.xlsx
writer = pd.ExcelWriter(r'C:\Users\....\FinalAutomatedDataSheet.xlsx') 

#this line overlays the template file data onto FinalAutomatedDataSheet.xlsx
df.to_excel(writer, startcol=0,startrow=0, index=False)

#This line writes the new data to FinalAutomatedDataSheet.xlsx
#NOTE: you can SPECIFY COLUMN and ROW indices below!!:
DataAsXlsx.to_excel(writer, startcol=0,startrow=0, index=False)

writer.save()
2
  • Here cell.value = str(df) you should specify the column and row of the df too Commented Aug 8, 2018 at 5:25
  • Can you provide an example on how to specify the column and row of the 'df'? This is what I cannot find an example for in the docs or anywhere on StackOverflow. I am just trying to copy data from columns A-D from NewData.xlsx to columns A - D in Updated.xlsx. There has to be a simple way to do this. I assume it'll be something like: df.loc[row_indexer,column_indexer] source: pandas.pydata.org/pandas-docs/stable/indexing.html Commented Aug 8, 2018 at 16:39

1 Answer 1

2

Your current code attempts to stick and entire dataframe into a cell.

If you're just copying between worksheets then I would suggest you use openpyxl's read-only mode to read the data.

from openpyxl import load_workbook
wb1 = load_workbook(read_only=True)
wb2 = load_workbook(template)
ws1 = wb1['Sheet']
ws2 = wb2.create_sheet("NEW_DATA") # it's not quite clear what you want
for row in ws1.iter_rows(max_col=4):
    values = (c.value for c in row)
    ws2.append(values)
Sign up to request clarification or add additional context in comments.

10 Comments

I am copying data from one excel document to another, not between worksheets of the same document. My apologies if this was not clear. The new excel document 'updated.xlsx' is a copy of 'template.xlsx'. The line ws2 = wb2.create_sheet("NEW_DATA") creates a sheet in updated.xlsx. This part of the code works great. I agree that the for loop is not adding data for each cell, but for the whole dataframe. When I tried to implement your solution to see what it does, I got a syntax error for the 'for loop' line. Is it possible to provide another example that iterates over each row and column?
Maybe these answers can help stackoverflow.com/questions/51739488/…
@joe I tried the following from the link you provided: for row in df['A1':'Z100']: for cell in row: sheet1[cell.coordinate].value = cell.value template.save('updated.xlsx') ...and got the following error: TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [A1] of <type 'str'>. I want to iterate over the range(len(df)) until there are no more cells. I would rate my Python skills as a solid beginner so I apologize if the solution is obvious. Let me know if I need to add this to the question so its easier to read.
Also note that the data is mixed-type containing: strings, symbols and numbers. I think this also might be contributing to the issue?
Thanks! It is still appending data at the bottom - not replacing it. Any suggestion on fixing it to replace, not append, the data? I tried other commands for .append() and nothing seems to work (something like .update() or .replace() but both throw an error)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.