0

I am having trouble trying to get the data from this web table. I was wondering if anyone could shed some light on my situation, thanks!

HTML:

enter image description here

1
  • it would be better if you share your codes instead of the picture. Commented Mar 10, 2020 at 14:23

1 Answer 1

0

You want to use pandas.read_html() in order to collect an HTML table into a pandas DataFrame.

Usage:

>>> import pandas as pd
>>> pd.read_html(io)

Where from the docs:

io : str or file-like
A URL, a file-like object, or a raw string containing HTML.

For example, see with this Wikipedia URL:

In [0]: url = "https://en.wikipedia.org/wiki/List_of_Olympic_records_in_swimming"

In [1]: pd.read_html(url)[0]
Out[1]:
                      Event       Time                                               Name               Nation                Games            Date      Ref
0            50 m freestyle      21.30                                        César Cielo         Brazil (BRA)         2008 Beijing  16 August 2008      [4]
1           100 m freestyle      47.05                                     Eamon Sullivan      Australia (AUS)         2008 Beijing  13 August 2008   [5][A]
2           200 m freestyle    1:42.96                                     Michael Phelps  United States (USA)         2008 Beijing  12 August 2008      [6]
3           400 m freestyle    3:40.14                                           Sun Yang          China (CHN)          2012 London    28 July 2012      [7]
4          1500 m freestyle  ♦14:31.02                                           Sun Yang          China (CHN)          2012 London   4 August 2012      [8]
5          100 m backstroke     ♦51.85                                        Ryan Murphy  United States (USA)  2016 Rio de Janeiro  13 August 2016   [9][B]
6          200 m backstroke    1:53.41                                        Tyler Clary  United States (USA)          2012 London   2 August 2012     [10]
7        100 m breaststroke      57.13                                         Adam Peaty  Great Britain (GBR)  2016 Rio de Janeiro   7 August 2016     [11]
8        200 m breaststroke    2:07.22                                     Ippei Watanabe          Japan (JPN)  2016 Rio de Janeiro   9 August 2016  [12][C]
9           100 m butterfly      50.39                                   Joseph Schooling      Singapore (SGP)  2016 Rio de Janeiro  12 August 2016     [13]
10          200 m butterfly    1:52.03                                     Michael Phelps  United States (USA)         2008 Beijing  13 August 2008     [14]
11  200 m individual medley    1:54.23                                     Michael Phelps  United States (USA)         2008 Beijing  15 August 2008     [15]
12  400 m individual medley   ♦4:03.84                                     Michael Phelps  United States (USA)         2008 Beijing  10 August 2008     [16]
13  4×100 m freestyle relay   ♦3:08.24  Michael Phelps (47.51)Garrett Weber-Gale (47.0...  United States (USA)         2008 Beijing  11 August 2008     [17]
14  4×200 m freestyle relay    6:58.56  Michael Phelps (1:43.31)Ryan Lochte (1:44.28)R...  United States (USA)         2008 Beijing  13 August 2008     [18]
15     4×100 m medley relay    3:27.95  Ryan Murphy (51.85)Cody Miller (59.03) Michael...  United States (USA)  2016 Rio de Jainero  13 August 2016     [19]
Sign up to request clarification or add additional context in comments.

3 Comments

does it matter if the site requires a login first?
If the website requires a login first, then better collect its raw HTML beforehands using traditional requests library (passing it the necessary credentials) and then use read_html to just parse the table from raw HTML. (and not from URL).
You can accept the answer if you're satisfied with it!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.