0

I use pandas to read this excel files, which is downloaded from a website via an automation script. Here's my code:

import pandas as pd
df = pd.read_excel('CallHistory.xlsx')

But it shows errors below:

ValueError                                Traceback (most recent call last)
c:\Users\minhviet\Box\Telio\vietpm\python\crawler\test_crawl_3.ipynb Cell 7' in <module>
      1 import pandas as pd
----> 2 df = pd.read_excel('CallHistory.xlsx')
      3 df

File c:\Users\minhviet\Anaconda3\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File c:\Users\minhviet\Anaconda3\lib\site-packages\pandas\io\excel\_base.py:364, in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, storage_options)
    362 if not isinstance(io, ExcelFile):
    363     should_close = True
--> 364     io = ExcelFile(io, storage_options=storage_options, engine=engine)
    365 elif engine and engine != io.engine:
    366     raise ValueError(
    367         "Engine should not be specified when passing "
    368         "an ExcelFile - ExcelFile already has the engine set"
...
    127     if value not in self.values:
--> 128         raise ValueError(self.__doc__)
    129     super(Set, self).__set__(instance, value)

ValueError: Value must be one of {'visible', 'hidden', 'veryHidden'}

I search about this error and find some information. Seems like the state of the sheet is wrong. https://github.com/exceljs/exceljs/issues/678

I try to open this file in Excel, edit something and save, then I can read it from pandas successfully. However, opening this file is a part of an automation script, so using Excel to open and edit is impossible.

You guys can download file here, hopefully anyone can find a way to handle this file by Python: https://app.box.com/s/8vds9zmhhxhn18p0ngodqeepfcgpzevv

4
  • The file includes Vietnamese language, you might need utf-8 encoding, or other procedures that make the environment can resolve Vietnamese language. Commented Jun 22, 2022 at 9:35
  • I can read that file normally with/without utf-8 encoding yesterday, but there's some changes from service where I download data so this file has a wrong sheet state. Commented Jun 22, 2022 at 10:28
  • I can open the file and edit, or load it into data frame directly , with my Mac in Chinese language base environment, that’s why I guess you probably have to deal with encoding stuff. Commented Jun 22, 2022 at 11:05
  • have you tried this outlink Commented Jun 22, 2022 at 11:16

4 Answers 4

1

I found the solution. I try to read this file through pandas read_excel with different engine (openpyxl and xlrd). Openpyxl shows me above error (ValueError: Value must be one of {'visible', 'hidden', 'veryHidden'}), while xlrd shows another error (KeyError: 'show'). I searched and found that this error was fixed in a commit of xlrd after 1.2.0 ver but before ver 2.0 (which don't support xlsx file anymore), so this fix was not in any official release of xlrd. Here's the information about this fix: https://github.com/python-excel/xlrd/commit/6ec98fc74796a6439c6dd64ed71597a3c50d4986#diff-74efe2926535c21edf8087564ce132fe

So I installed this fixed commit of xlrd. Be sure to remove other versions of xlrd before installing this version.

pip install git+https://github.com/python-excel/xlrd.git@6ec98fc74796a6439c6dd64ed71597a3c50d4986

Then I read this file by engine xlrd. It works.

import pandas as pd
df = pd.read_excel("C://Users/minhviet/Documents/CallHistory.xlsx", engine='xlrd')
Sign up to request clarification or add additional context in comments.

Comments

0
import pandas as pd
df = pd.read_excel("C:/Users/minhviet/Documents/CallHistory.xlsx", engine='xlrd')

After C: only use one / is working for me.

Comments

0

Can you check the slashes in that file location? Windows generally uses backslashes (\) while Mac/Unix/Linux use forward slashes(/). Pandas sometimes recognises both, but it can be inconsistent. You're better off going with the OS default.

When you're using file locations in Python, you have to be careful that the backslashes aren't recognised as escape characters. There are two ways to do this.

  1. Double the backslashes - "C:\Users\path\to\file" becomes "C:\\Users\\path\\to\\file".
  2. Prefix the string with r: r"C:\Users\path\to\file". The r prefix stands for raw, and tells the parser to accept all backslashes literally, never as escape characters.

Comments

0

the correct answer should be

df = pd.read_excel(open('CallHistory.xlsx','rb'),sheet_name=0)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.