I am trying to import a rather small CSV file (217 rows, 87 columns, 15 KB) for analysis in Python using pandas. The file is rather poorly structured, but I would still like to import it as is, since it is the raw data and I do not want to manipulate it manually outside Python (e.g. with Excel). Unfortunately, the import always crashes with "The kernel appears to have died. It will restart automatically".

https://www.wakari.io/sharing/bundle/uniquely/ReadCSV

I did some research, which turned up reports of read_csv crashing, but only for really large files, so I do not understand the problem. The crash happens both with a local installation (Anaconda 64-bit, IPython (Py 2.7) Notebook) and on Wakari.

Can anybody help me? It would be really appreciated. Thanks a lot!

Code:

# I have a somewhat ugly, illustrative csv file, but it is not too big: 217 rows, 87 columns.
# File can be downloaded at http://www.win2day.at/download/lo_1986.csv

# In[1]:

file_csv = 'lo_1986.csv'
# Print every raw line with its index to inspect the file's structure.
with open(file_csv, mode="r") as f:
    for x, line in enumerate(f):
        print x, ": ", line


# Now I'd like to import this csv into Python using pandas, but this always leads to a crash:
# "The kernel appears to have died. It will restart automatically."

# In[ ]:

import pandas as pd
pd.read_csv(file_csv, delimiter=';')

# What am I doing wrong?
  • Apart from the file being very messed up for a csv, I can see some special characters. Try some different encodings, or clean up the file before reading it with pandas. Commented Aug 19, 2014 at 1:01

2 Answers

It is because the file contains bytes that are not valid UTF-8 (e.g. 0xe0).

If you add the encoding parameter to the read_csv() call, you will see this stack trace instead of a segfault:

>>> df = pandas.read_csv("/tmp/lo_1986.csv", delimiter=";", encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 205, in _read
    return parser.read()
  File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 608, in read
    ret = self._engine.read(nrows)
  File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1028, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas/parser.c:6745)
  File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:6964)
  File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas/parser.c:7780)
  File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:8793)
  File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:9484)
  File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:10642)
  File "parser.pyx", line 1051, in pandas.parser.TextReader._string_convert (pandas/parser.c:10905)
  File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas/parser.c:15657)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: unexpected end of data

You can do some preprocessing to remove these characters before asking pandas to read in the file.
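One possible form of such preprocessing, as a minimal sketch: instead of deleting the offending bytes, rewrite the file as UTF-8 first. This assumes the file was saved as Latin-1 (ISO-8859-1), which is a guess and not something stated in the question; the function name `transcode_to_utf8` and its parameters are hypothetical.

```python
import io


def transcode_to_utf8(src_path, dst_path, src_encoding="latin-1"):
    """Rewrite src_path as UTF-8, assuming it is encoded in src_encoding."""
    # Assumption: the source really is Latin-1. Every byte is decodable
    # under that codec, so this never raises UnicodeDecodeError, but a
    # wrong guess would silently produce the wrong characters.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # newline="" on both sides keeps the original line endings untouched.
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return dst_path
```

If the Latin-1 assumption holds, reading the rewritten file with encoding="utf-8" should at least no longer raise the UnicodeDecodeError shown above.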

I attached a picture to highlight the invalid characters in the file:

[image]
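The offending lines can also be located programmatically. The following is only a sketch with a hypothetical helper name, `find_invalid_utf8_lines`; it reads the file in binary mode and reports every line that fails to decode as UTF-8.

```python
def find_invalid_utf8_lines(path):
    """Return (line_number, raw_bytes) for each line that is not valid UTF-8."""
    bad = []
    with open(path, "rb") as f:  # binary mode: look at raw bytes, no decoding
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((lineno, raw))
    return bad
```

Calling it on the downloaded file, e.g. `find_invalid_utf8_lines("lo_1986.csv")`, would list the lines that need attention before pandas sees them.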


Thanks a lot for your remarks. I could not agree more with the comment that this is indeed a very messed up csv. But unfortunately that is the way the Austrian State Lottery shares its information on drawn numbers and payout rates.

I continued playing around, also looking at the special characters. In the end, the solution (at least for me) was surprisingly simple:

pd.read_csv(file_csv, delimiter=';', encoding='latin-1', engine='python')

The added encoding helps to display the special characters correctly, but the game changer was the engine parameter. To be honest, I do not understand why, but now it works.
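What the encoding parameter changes can be illustrated on the single byte reported in the traceback from the other answer. This sketch only shows the decoding difference between the two codecs; it does not explain why switching the engine avoids the crash.

```python
raw = b"\xe0"  # the byte reported in the UnicodeDecodeError

# Latin-1 maps every byte value to a code point, so decoding cannot fail.
as_latin1 = raw.decode("latin-1")  # U+00E0, LATIN SMALL LETTER A WITH GRAVE

# In UTF-8, 0xe0 starts a three-byte sequence, so on its own it is incomplete.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```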

Thanks again!
