Using pandas to read HTML

Question

This should be easy but I've got errors that I can't work out. I've got some air pollution stats for the UK that I want to parse.

https://uk-air.defra.gov.uk/data/DAQI-regional-data?regionIds%5B%5D=999&aggRegionId%5B%5D=999&datePreset=6&startDay=01&startMonth=01&startYear=2022&endDay=01&endMonth=01&endYear=2023&queryId=&action=step2&go=Next+

But using read_html results in the error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 7, saw 2

df = pd.read_html("https://uk-air.defra.gov.uk/data/DAQI-regional-data?regionIds%5B%5D=999&aggRegionId%5B%5D=999&datePreset=6&startDay=01&startMonth=01&startYear=2022&endDay=01&endMonth=01&endYear=2023&queryId=&action=step2&go=Next+")
df

This returns the data as a list. But I want to turn that list into a dataframe.

Which is the best way to solve the problem?

If I click the link it downloads as CSV. But thank you for taking the time to point that out. At least the community solved the problem.
– elksie5000, Commented Apr 19, 2023 at 13:34
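
A note on the error itself: "Error tokenizing data. C error" is the kind of message pandas's CSV tokenizer raises, typically when a later line has more fields than the lines before it. The sketch below reproduces that situation under an assumption: that the downloaded file starts with a few single-field preamble lines before the real header. The file layout, the six-line preamble, the `Region A` column, and the values are all made up for illustration, so check the actual download before relying on `skiprows=6`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: six single-field
# preamble lines, then the real comma-separated header and two rows.
raw = "\n".join([
    "Report title",
    "Source line",
    "Region line",
    "Period line",
    "Notes line",
    "Units line",
    "Date,Region A",
    "01/01/2022,2",
    "02/01/2022,3",
])

# With default settings the first line is taken as a one-column header,
# so the two-field line further down cannot be tokenized.
try:
    pd.read_csv(io.StringIO(raw))
    parse_failed = False
except pd.errors.ParserError as exc:
    parse_failed = True
    print(exc)

# Skipping the preamble lets the real header line define the columns.
df_csv = pd.read_csv(io.StringIO(raw), skiprows=6)
print(df_csv)
```

If the real file has a different number of leading lines, only the `skiprows` value needs to change.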

Timeless · Accepted Answer · 2023-04-19 13:23:41Z

5

read_html always returns a list of DataFrames even if there is only one. You need to index it.

pandas.read_html
Read HTML tables into a list of DataFrame objects.

Returns: dfs, a list of DataFrames.

df = pd.read_html("https://uk-air.defra.gov.uk/...")[0] # <-- add [0] at the end

Output :

print(df)

           Date  ...  West Yorkshire Urban Area
0    01/01/2022  ...                          2
1    02/01/2022  ...                          3
2    03/01/2022  ...                          3
..          ...  ...                        ...
362  29/12/2022  ...                          3
363  30/12/2022  ...                          3
364  31/12/2022  ...                          3

[365 rows x 33 columns]

answered Apr 19, 2023 at 13:23

Timeless


2 Comments

elksie5000 Over a year ago

Thank you sir. What is the purpose of the [0] by the way?

Timeless Over a year ago

You're welcome. It is used to access the first item (which is a DataFrame) of the list.
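
To make the role of `[0]` concrete, here is a minimal sketch that does not touch the network. The `tables` list is a hand-built stand-in for what `read_html` returns, and the `Region A` column and its values are invented for illustration.

```python
import pandas as pd

# Stand-in for the return value of read_html: a plain Python list
# that happens to contain a single DataFrame (values are made up).
tables = [
    pd.DataFrame({"Date": ["01/01/2022", "02/01/2022"], "Region A": [2, 3]})
]

print(type(tables).__name__)        # list
print(hasattr(tables, "columns"))   # False: a list has no DataFrame attributes

# Indexing with [0] pulls the first element out of the list.
df = tables[0]
print(type(df).__name__)            # DataFrame
print(list(df.columns))             # ['Date', 'Region A']
```

The list itself cannot be used like a DataFrame; only the element taken out with `[0]` can.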

Gal Dreiman · Answer · 2023-04-19 13:27:24Z

1

pandas' read_html actually handles such cases:

import pandas as pd

# Specify the URL of the HTML page containing the table
url = "..."

# Use the pandas read_html() method to read the table data into a list of dataframes
tables = pd.read_html(url)

# If there are multiple tables on the page, you can select the one you want by index
table = tables[0]
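
When a page holds several tables, position `0` is not always the one you want. As a rough sketch, one option is to pick the table by a column you expect it to have. The helper `pick_table` is hypothetical (not part of pandas), and the two hand-built frames below only stand in for a `read_html` result.

```python
import pandas as pd

# Stand-in for a read_html result with two tables (made-up contents).
tables = [
    pd.DataFrame({"Menu": ["Home", "Data"]}),
    pd.DataFrame({"Date": ["01/01/2022"], "Region A": [2]}),
]


def pick_table(tables, required_column):
    """Return the first DataFrame whose columns include required_column."""
    for candidate in tables:
        if required_column in candidate.columns:
            return candidate
    raise ValueError(f"no table has a column named {required_column!r}")


table = pick_table(tables, "Date")
print(list(table.columns))
```

Selecting by column name keeps working if the page later gains or reorders tables, which a fixed index would not.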

answered Apr 19, 2023 at 13:27

Gal Dreiman


Hector Chocobar-Torrejon · Answer · 2023-04-19 14:32:42Z

My code

import pandas as pd
url = "https://uk-air.defra.gov.uk/data/DAQI-regional-data?regionIds%5B%5D=999&aggRegionId%5B%5D=999&datePreset=6&startDay=01&startMonth=01&startYear=2022&endDay=01&endMonth=01&endYear=2023&queryId=&action=step2&go=Next+"
dfs = pd.read_html(url)
type(dfs)  # Output: list
len(dfs)  # Output: 1
df = dfs[0]  # take the single DataFrame out of the list (pd.DataFrame(dfs) would not unwrap it)
type(df)  # Output: pandas.core.frame.DataFrame

df.columns
""" Output:
Index(['Date', 'Central Scotland', 'East Midlands', 'Eastern',
   'Greater London', 'Highland', 'North East', 'North East Scotland',
   'North Wales', 'North West & Merseyside', 'Northern Ireland',
   'Scottish Borders', 'South East', 'South Wales', 'South West',
   'West Midlands', 'Yorkshire & Humberside',
   'Belfast Metropolitan Urban Area', 'Brighton/Worthing/Littlehampton',
   'Bristol Urban Area', 'Cardiff Urban Area', 'Edinburgh Urban Area',
   'Glasgow Urban Area', 'Greater Manchester Urban Area',
   'Leicester Urban Area', 'Liverpool Urban Area', 'Nottingham Urban Area',
   'Portsmouth Urban Area', 'Sheffield Urban Area', 'Swansea Urban Area',
   'Tyneside', 'West Midlands Urban Area', 'West Yorkshire Urban Area'],
  dtype='object')
"""
