Accessing GitHub data in Jupyter Books

Question

I get a tokenising error when I try to read a CSV file from GitHub with pandas in Jupyter Books. I have looked at some existing answers, but none seem to help. Any help would be appreciated. Thanks.

url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
insert_df = pd.read_csv(url, header=0, sep=',', quotechar='"')
insert_df.head()

ERROR:

---------------------------------------------------------------------------

ParserError                               Traceback (most recent call last)

<ipython-input-21-21c294baaa45> in <module>()
      1 url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
----> 2 insert_df = pd.read_csv(url, header=0, sep=',', quotechar='"')
      3 insert_df.head()

3 frames

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in read(self, nrows)
   2155     def read(self, nrows=None):
   2156         try:
-> 2157             data = self._reader.read(nrows)
   2158         except StopIteration:
   2159             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 79, saw 2

Try insert_df = pd.read_html(url). The result will be a list, so your dataset will be insert_df[0]; preview it with insert_df[0].head().
– simpleApp, Commented May 13, 2021 at 0:59

simpleApp · Accepted Answer · 2021-05-13 01:04:11Z

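The github.com/.../blob/... URL points to GitHub's HTML page for the file, not to the CSV itself, so pd.read_csv fails while tokenizing it. Two options:

1st: read the page as HTML.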

import pandas as pd

url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
# read_html returns a list of DataFrames, one per table found on the page
insert_df = pd.read_html(url)
insert_df[0].head(2)

2nd: read the raw file. Note that the URL uses the raw.githubusercontent.com host and no longer contains "blob".

url = "https://raw.githubusercontent.com/Kallikrates/bde_at2/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
insert_df_raw = pd.read_csv(url, header=0, sep=',', quotechar='"')
insert_df_raw.head(2)
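
If you have several links of this kind, the rewrite from the blob URL to the raw URL shown above can be scripted. The following is only a minimal sketch: to_raw_url is a hypothetical helper (it is not part of pandas or of any GitHub library), and it assumes the URL has the /<owner>/<repo>/blob/<ref>/<path> layout seen in the question.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(blob_url: str) -> str:
    """Rewrite a github.com 'blob' page URL to its raw.githubusercontent.com form.

    Hypothetical helper for illustration; assumes the path layout
    /<owner>/<repo>/blob/<ref>/<path to file>.
    """
    parts = urlsplit(blob_url)
    segments = parts.path.lstrip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    # Drop the "blob" segment and keep owner, repo, ref and file path.
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


blob_url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
raw_url = to_raw_url(blob_url)
print(raw_url)
```

The resulting raw_url can then be passed to pd.read_csv exactly as in the second option.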



1 Comment

Kallikrates Over a year ago

Amazing. Thanks @simpleApp!
