21

Is there a way to have pandas read in only the values from excel and not the formulas? It reads the formulas in as NaN unless I go in and manually save the excel file before running the code. I am just working with the basic read excel function of pandas,

import pandas as pd

df = pd.read_excel(filename, sheetname="Sheet1")

This will read the values if I have gone in and saved the file prior to running the code. But after running the code to update a new sheet, if I don't go in and save the file after doing that and try to run this again, it will read the formulas as NaN instead of just the values. Is there a work around that anyone knows of that will just read values from excel with pandas?

6
  • Is your excel spreadsheet in auto calculation mode? Commented Jan 18, 2017 at 15:37
  • Yes formulas are set to auto calculate. Commented Jan 18, 2017 at 16:48
  • 1
    Weird, is your original file saved by a human or coming from a program before you resale it? Commented Jan 18, 2017 at 17:09
  • Saved by a person, we pull in data and then create formulas based on that. But the data we pull in we just copy and pasted into a file and then saved. Commented Jan 18, 2017 at 18:51
  • I also faced similar issue, however it occurs only for first few rows, for rest it picks up the values processed by the formula. Commented Oct 29, 2018 at 15:32

4 Answers 4

13

That is strange. The normal behaviour of pandas is read values, not formulas. Likely, the problem is in your excel files. Probably your formulas point to other files, or they return a value that pandas sees as nan.

In the first case, the sheet needs to be updated and there is nothing pandas can do about that (but read on).

In the second case, you could solve by setting explicit nan values in read_excel:

pd.read_excel(path, sheetname="Sheet1", na_values = [your na identifiers])

As for the first case, and as a workaround solution to make your work easier, you can automate what you are doing by hand using xlwings:

import pandas as pd
import xlwings as xl

def df_from_excel(path):
    app = xl.App(visible=False)
    book = app.books.open(path)
    book.save()
    app.kill()
    return pd.read_excel(path)

df = df_from_excel(path to your file)

If you want to keep those formulas in your excel file just save the file in a different location (book.save(different location)). Then you can get rid of the temporary files with shutil.

Sign up to request clarification or add additional context in comments.

3 Comments

Yeah I knew I could automate the saving, I was just trying to figure out if there was a way around that. The formulas don't point to other files, they are just taking the value from one column and subtracting the sum of values of the other columns, so they just return numbers.
Writing this in Sept, 2020 - pd.read_excel() now reads Values even when the formulas point to other files.
Writing this in Jan 2021 - pd.read_excel() only reads in values if the notebook has been "opened" hence the need to simulate opening and saving the file with xlwings. As far as I can tell, formulas aren't calculated until the workbook is opened, which is why cells with formulas have no value associated with them when the workbook is only updated by the script
0

I had this problem and I resolve it by moving a graph below the first row I was reading. Looks like the position of the graphs may cause problems.

Comments

0

you can use xlrd to read the values. first you should refresh your excel sheet you are also updating the values automatically with python. you can use the function below file = myxl.xls

import xlrd
import win32com.client
import os

def refresh_file(file):
    xlapp = win32com.client.DispatchEx("Excel.Application")
    path = os.path.abspath(file)
    wb =  xlapp.Wordbooks.Open(path)
    wb.RefreshAll()
    xlapp.CalculateUntilAsyncqueriesDone()
    wb.Save()
    xlapp.Quit()

after the file refresh, you can start reading the content.

workbook = xlrd.open_workbook(file)
worksheet = workbook.sheet_by_index(0)
for rowid in range(worksheet.nrows):
    row = worksheet.row(rowid)
    for colid, cell in enumerate(row):
        print(cell.value)

you can loop through however you need the data. and put conditions while you are reading the data. lot more flexibility

3 Comments

What is happening on this code ? I can't make the refresh function work at all
In Rajat's post, wb.save() did not work. But wb.Save() did get the refresh function working.
Hi Your code works for me but how I can save the file silently with same file name. Thank you
-1

I had the exact same problem, but i found out that (in my case) there was no problem.

I use the EXCEL VIEWER extension. Looking in the excel file with that extension it looks like the values are not there, instead the formula is there, if you click on the cell that says "#NAME?.

Anyway ... after doing pd.read_excel the dataframe works correctly.

1 Comment

This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.