How do I use Python and BeautifulSoup to scrape data from an html table?

Question

If you look at this page, https://metals-api.com/currencies, there is an HTML table with two columns. I would like to extract all the rows from the first column into a list/array. How do I go about this?

import requests
from bs4 import BeautifulSoup

URL = "https://metals-api.com/currencies"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

with open('outpu2t.txt', 'w', encoding='utf-8') as f:
    f.write(soup.text)

To clarify, I am not looking to run fetch-price commands against these tickers. I'm trying to compile a list of tickers so I can add them to a dropdown menu in my app.

The site you're trying to access goes out of its way to block most requests like this, because it has a paid API for this information. Bypassing this would require something far more complicated than just the basic requests library.
– BeRT2me, Commented Apr 14, 2022 at 19:09
The site blocks easy scraping attempts against the HTML; you would need to pretend to be a full browser. Consider using selenium to that end.
– ifly6, Commented Apr 14, 2022 at 19:11
I can get the HTML by just viewing the source in the browser. I'm not actually trying to see the prices for those tickers; I want to make a dropdown menu in my app with those tickers as select options.
– stopbanningme, Commented Apr 14, 2022 at 19:18
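
The comments above mention that a site may reject requests that do not look like they come from a browser. One common first step, short of driving a real browser with selenium, is to send a browser-like User-Agent header and check the response status. The sketch below is only an illustration: the header value and the helper names make_session and fetch_html are assumptions, and nothing here guarantees that a given site will accept the request.

```python
import requests

# Hypothetical User-Agent string; any real browser's value could be substituted.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/100.0 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Return a requests.Session that sends a browser-like User-Agent header."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(url, session=None, timeout=10):
    """Fetch a page and raise requests.HTTPError on a 4xx/5xx response."""
    session = session or make_session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # makes a blocked request fail loudly
    return response.text
```

Calling fetch_html("https://metals-api.com/currencies") would then either return the page markup or raise an error that shows the request was refused.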

Md. Fazlul Hoque · Accepted Answer · 2022-04-14 19:17:47Z


If I understand the question correctly, you can try the following example:

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://metals-api.com/currencies"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

# Collect the text of the first cell in every body row of the table
data = []
for cell in soup.select('.table tbody tr td:nth-child(1)'):
    data.append(cell.text)

df = pd.DataFrame(data, columns=['code'])
# df.to_csv('code.csv', index=False)  # uncomment to save the codes to a CSV file
print(df)

Output:

     code
0     XAU
1     XAG
2     XPT
3     XPD
4     XCU
..    ...
209  LINK
210   XLM
211   ADA
212   BCH
213   LTC

[214 rows x 1 columns]

edited Apr 14, 2022 at 19:17

answered Apr 14, 2022 at 19:12

Md. Fazlul Hoque


3 Comments

stopbanningme Over a year ago

You are in fact understanding the question correctly, I will now attempt to figure out what you did lol, much love!

stopbanningme Over a year ago

For sure it's accepted. Could you please add some comments explaining what .table tbody tr td:nth-child(1) and df.to_csv('code.csv',index=False) do? I'm very new to Python and would appreciate it greatly <3

Md. Fazlul Hoque Over a year ago

.table tbody tr td:nth-child(1) is a CSS selector, which bs4 supports; see the docs: crummy.com/software/BeautifulSoup/bs4/doc. df.to_csv('code.csv',index=False) saves the data to a CSV file on your PC. Just uncomment that line and run the script, and you will find the CSV file. Thanks.
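
To make the selector easier to follow, here is a minimal sketch that runs it against a small inline table instead of the live page. The sample markup is made up for illustration (the real page's structure may differ), and a list comprehension replaces the DataFrame, since the question asks for a plain list.

```python
from bs4 import BeautifulSoup

# Hypothetical two-column table standing in for the real page.
SAMPLE_HTML = """
<table class="table">
  <thead><tr><th>Code</th><th>Name</th></tr></thead>
  <tbody>
    <tr><td>XAU</td><td>Gold</td></tr>
    <tr><td>XAG</td><td>Silver</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# .table           -> any element with class="table"
# tbody tr         -> every row inside the table body (the header row in <thead> is skipped)
# td:nth-child(1)  -> the <td> that is the first child of its row, i.e. column 1
codes = [
    cell.get_text(strip=True)
    for cell in soup.select(".table tbody tr td:nth-child(1)")
]
print(codes)  # ['XAU', 'XAG']
```

A list like codes can be passed straight to whatever builds the dropdown options, with no pandas dependency.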

BeRT2me · Accepted Answer · 2022-04-14 19:43:31Z

I sit corrected. I initially just tried pd.read_html("https://metals-api.com/currencies"), which normally works but didn't here. With a very slight workaround, fetching the page with requests first and passing its content to read_html, it works just fine.

import pandas as pd
import requests
URL = "https://metals-api.com/currencies"
page = requests.get(URL)
df = pd.read_html(page.content)[0]
print(df)

Output:

     Code                                               Name
0     XAU  1 Ounce of 24K Gold. Use Carat endpoint to dis...
1     XAG                                             Silver
2     XPT                                           Platinum
3     XPD                                          Palladium
4     XCU                                             Copper
..    ...                                                ...
209  LINK                                          Chainlink
210   XLM                                            Stellar
211   ADA                                            Cardano
212   BCH                                       Bitcoin Cash
213   LTC                                           Litecoin

[214 rows x 2 columns]
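
Since the question asks for the first column as a list, the DataFrame can be reduced to one with tolist(). The sketch below uses a made-up inline table rather than the live page, so the sample markup is an assumption. It wraps the HTML in StringIO because recent pandas versions deprecate passing a literal HTML string to read_html, and it needs an HTML parser backend such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-column table standing in for the real page.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Code</th><th>Name</th></tr></thead>
  <tbody>
    <tr><td>XAU</td><td>Gold</td></tr>
    <tr><td>XAG</td><td>Silver</td></tr>
  </tbody>
</table>
"""

# read_html returns one DataFrame per <table> found; take the first.
df = pd.read_html(StringIO(SAMPLE_HTML))[0]

# Pull the "Code" column out as a plain Python list.
codes = df["Code"].tolist()
print(codes)  # ['XAU', 'XAG']
```

With the real page, the same df["Code"].tolist() call would apply to the DataFrame printed above, assuming the column header is still "Code".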
