
I have tried to fetch data from the web (a CSV file) using pandas in Jupyter Notebook:

import pandas as pd
df1 = pd.read_csv("https://www.crowdflower.com/wp-content/uploads/2016/03/gender-classifier-DFE-791531.csv")

The first time, I get the following error:

IncompleteRead: IncompleteRead(5738795 bytes read, 2437944 more expected)

I try it again in a different cell in Jupyter Notebook and get another error:

URLError:

I try a third time, and Jupyter Notebook hangs for ages.

Any idea what these two errors mean (what is pandas trying to tell me, and what happened), and how to fix them?

  • URLError: <urlopen error [Errno 11004] getaddrinfo failed> Commented May 27, 2017 at 18:03
  • Other than the encoding issue, it works fine for me. (I used df1 = pd.read_csv("https://www.crowdflower.com/wp-content/uploads/2016/03/gender-classifier-DFE-791531.csv", encoding='latin')). Commented May 27, 2017 at 18:07
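The `encoding='latin'` in the comment above and the `encoding="iso-8859-1"` in the answer below should end up selecting the same codec, because pandas passes the name to Python's codec machinery and Python registers `latin` as an alias of Latin-1 (ISO-8859-1). A quick standard-library check, with no pandas involved:

```python
import codecs

# codecs.lookup normalizes an encoding name or alias to its canonical CodecInfo.
names = ("latin", "latin-1", "latin1", "iso-8859-1")
canonical = {codecs.lookup(name).name for name in names}

for name in names:
    print(name, "->", codecs.lookup(name).name)

# All four spellings resolve to a single codec.
print(len(canonical))
```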

1 Answer


If you use curl to download the file, or open it in a web browser that shows the raw text, you'll see that the file is not UTF-8 encoded, whereas pandas assumes UTF-8 by default. I cannot tell you what the encoding should be for this dataset, but you can cheat and use ISO-8859-1 to at least get it loaded. ISO-8859-1 maps every one of the 256 possible byte values to a character, so decoding with it never fails; it simulates the naive (and totally untrue) assumption that 1 byte == 1 char until you can get a handle on what the encoding should be.

import pandas as pd
url = "https://www.crowdflower.com/wp-content/uploads/2016/03/gender-classifier-DFE-791531.csv"
# ISO-8859-1 decodes any byte sequence, so the parser will not raise UnicodeDecodeError
df1 = pd.read_csv(url, encoding="iso-8859-1")
print(df1)
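To see why ISO-8859-1 always "works" and why the 1 byte == 1 char assumption is still untrue, here is a small standard-library sketch (no pandas, no network). The byte strings are illustrative examples, not taken from the CrowdFlower file. It shows that ISO-8859-1 decodes every byte value 0x00 to 0xFF one-to-one, that a strict UTF-8 decoder rejects a lone byte such as 0xE9, and that genuinely UTF-8 text read as ISO-8859-1 does not raise an error but silently produces garbled characters (mojibake).

```python
# Illustrative bytes only; these are not taken from the CrowdFlower file.

# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# ISO-8859-1 maps each byte to exactly one character, so this never fails
# and the text has exactly as many characters as there were bytes.
latin1_text = all_bytes.decode("iso-8859-1")
print(len(all_bytes), len(latin1_text))  # 256 256

# A lone 0xE9 ("é" in ISO-8859-1) is not valid UTF-8, which is the kind of
# byte that makes a strict UTF-8 decoder (pandas' default) give up.
legacy_bytes = b"caf\xe9"
utf8_failed = False
try:
    legacy_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_failed = True
    print("UTF-8 decode failed:", exc)
print(legacy_bytes.decode("iso-8859-1"))  # café

# The catch: real UTF-8 text decoded as ISO-8859-1 does not raise; each
# multi-byte character silently becomes several wrong characters.
utf8_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9'
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake)  # cafÃ©
```

So a clean load with ISO-8859-1 only proves the bytes were decoded, not that the text is correct; spot-check non-ASCII columns before trusting them.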

Then, read up on this. It's an oldie, but a goodie: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). As Joel Spolsky says, "No excuses!"
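The encoding fix covers the parsing problem. The `IncompleteRead` and `URLError` from the question look like network-level failures rather than pandas bugs: `http.client.IncompleteRead` is raised when the connection closes before all the bytes the server announced have arrived (here 5738795 read, 2437944 more expected), and `getaddrinfo failed` means the host name could not be resolved, typically a DNS or connectivity problem. A minimal sketch, assuming these failures are transient: download the bytes once with the standard library, retry a few times, then hand the bytes to pandas. `fetch_with_retries` and its `opener` parameter are hypothetical names introduced here; `opener` exists only so the function can be exercised without a network.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_with_retries(url: str, attempts: int = 3, delay: float = 2.0,
                       opener: Callable = urllib.request.urlopen) -> bytes:
    """Download url, retrying on truncated reads and URL/DNS errors.

    Hypothetical helper: assumes the failures are transient.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # A timeout keeps a stalled connection from hanging forever.
            with opener(url, timeout=30) as resp:
                return resp.read()
        except (http.client.IncompleteRead, urllib.error.URLError) as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # brief pause before the next attempt
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Usage, under the same assumptions: `raw = fetch_with_retries(url)` followed by `df1 = pd.read_csv(io.BytesIO(raw), encoding="iso-8859-1")` (with `import io`). Keeping the downloaded bytes (or writing them to a local file) also means re-running the notebook cell does not hit the network again.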
