Fetching a list from a website using beautifulsoup in a dataframe column

Question

I am trying to fetch the keywords from an article page. This is the link:

`https://www.horizont.net/marketing/nachrichten/bgh-haendler-haftet-nicht-fuer-kundenbewertungen-auf-amazon-180980`

I am using this to fetch the keywords:

   Article_Keyword = bs.find('div', {'class':'ListTags'}).get_text()

and this is what I am getting:

Themen Bundesgerichtshof Amazon Verband Sozialer Wettbewerb Kundenbewertung Tape dpa

I need the keywords separated by commas. I could do this with a regular expression, but some keywords consist of more than one word, and I need each of those kept as a single keyword.

Is there any way to get the keywords separated by commas?

You should look for the a elements under your Article_Keyword. Not sure this works: Article_Keyword.find_all("a")
– Wonka, Commented Feb 20, 2020 at 9:36
I guess that will work too, but I need a separator between them, like a comma.
– s_khan92, Commented Feb 20, 2020 at 9:38
It will be a list; you can use ",".join() to get the list elements as a single string separated by commas.
– Wonka, Commented Feb 20, 2020 at 9:41
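
The difference between the two approaches discussed in these comments can be tried offline. The sketch below is only an illustration: the HTML snippet is hypothetical, modelled on the class names mentioned in this thread, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names mentioned in this thread;
# the real page may be structured differently.
html = """
<div class="ListTags">
  <span>Themen</span>
  <a class="PageArticle_keyword" href="#">Bundesgerichtshof</a>
  <a class="PageArticle_keyword" href="#">Verband Sozialer Wettbewerb</a>
  <a class="PageArticle_keyword" href="#">dpa</a>
</div>
"""

bs = BeautifulSoup(html, "html.parser")
tags_div = bs.find("div", {"class": "ListTags"})

# get_text() on the container merges every text node, so the
# boundaries between keywords are lost.
flat_text = " ".join(tags_div.get_text().split())

# Reading one <a> element at a time keeps multi-word keywords intact.
keywords = [a.get_text(strip=True) for a in tags_div.find_all("a")]
joined = ", ".join(keywords)

print(flat_text)
print(keywords)
print(joined)
```

Because each list entry comes from exactly one link, no regular expression is needed to guess where one keyword ends and the next begins.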

Sundeep · Accepted Answer · 2020-02-20 09:50:27Z

1

I used the class of the child elements to identify each keyword separately. I hope the code below helps.

from bs4 import BeautifulSoup as soup
from requests import get

url = "https://www.horizont.net/marketing/nachrichten/bgh-haendler-haftet-nicht-fuer-kundenbewertungen-auf-amazon-180980"
clnt = get(url)
page = soup(clnt.text, "html.parser")
# Container that holds all keyword links
data = page.find('div', attrs={'class': 'ListTags'})
# One list entry per <a> element, so multi-word keywords stay intact
data1 = [ele.text for ele in data.find_all('a', attrs={'class': 'PageArticle_keyword'})]
print(data1)
print(",".join(data1))

Output:

>> ['Bundesgerichtshof', 'Amazon', 'Verband Sozialer Wettbewerb', 'Kundenbewertung', 'Tape', 'dpa']
>> Bundesgerichtshof,Amazon,Verband Sozialer Wettbewerb,Kundenbewertung,Tape,dpa

Please accept the answer if it was useful.
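
One thing to watch when running this over many articles: `find()` returns `None` when no matching element exists, so calling `find_all` on the result would raise an `AttributeError` on a page without a tag list. A minimal sketch of a guard follows; the helper name `extract_keywords` and the sample HTML strings are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup


def extract_keywords(html, container_class="ListTags"):
    """Return the keyword strings in the tag container, or [] if it is absent.

    Hypothetical helper for illustration; the class name is taken from
    this thread and may not match other pages.
    """
    page = BeautifulSoup(html, "html.parser")
    container = page.find("div", attrs={"class": container_class})
    if container is None:  # find() returns None when nothing matches
        return []
    return [a.get_text(strip=True) for a in container.find_all("a")]


# Small made-up inputs: one page with the container, one without.
with_tags = '<div class="ListTags"><a>Amazon</a><a>Kundenbewertung</a></div>'
without_tags = '<div class="Other"><a>Home</a></div>'

print(extract_keywords(with_tags))
print(extract_keywords(without_tags))
```

Returning an empty list rather than raising keeps the result type consistent, which is convenient if the function is later applied to every row of a dataframe column.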

answered Feb 20, 2020 at 9:50

Sundeep


Wonka · Accepted Answer · 2020-02-20 09:50:06Z

1

Try this:

Article_Keyword = bs.find('div', {'class':'ListTags'})
aes_Article_Keyword = Article_Keyword.find_all("a")

s_Article_Keyword = ", ".join([x.text for x in aes_Article_Keyword])
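
As a possible alternative to chaining `find` and `find_all`, BeautifulSoup's `select` method accepts a CSS selector and does both steps in one call. This is only a sketch on a made-up HTML string, assuming the links sit inside a `div` with class `ListTags` as in the question.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the last link sits outside the
# container and should not be picked up.
html = '<div class="ListTags"><a>Tape</a> <a>dpa</a></div><a>Impressum</a>'
bs = BeautifulSoup(html, "html.parser")

# "div.ListTags a" matches every <a> nested inside <div class="ListTags">.
links = bs.select("div.ListTags a")
s_Article_Keyword = ", ".join(a.get_text(strip=True) for a in links)

print(s_Article_Keyword)
```

`select` returns an empty list when nothing matches, so the join yields an empty string instead of raising an error on pages without the container.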

answered Feb 20, 2020 at 9:50

Wonka


Manali Kagathara · Accepted Answer · 2020-02-20 10:10:43Z

1

Try this:

import requests
from bs4 import BeautifulSoup

url = 'https://www.horizont.net/marketing/nachrichten/bgh-haendler-haftet-nicht-fuer-kundenbewertungen-auf-amazon-180980'
page = requests.get(url)
soup1 = BeautifulSoup(page.content, "lxml")

Article_Keyword = soup1.find('div',{'class':'ListTags'}).find_all("a")
Article_Keyword = ", ".join([keyword.text.strip() for keyword in Article_Keyword])

print(Article_Keyword)

edited Feb 20, 2020 at 10:10

answered Feb 20, 2020 at 9:48

Manali Kagathara


3 Comments

Wonka Over a year ago

This will fail if a tag contains a space.

Manali Kagathara Over a year ago

No, it will not fail; that case does not occur here. A blank tag will not appear in the string.

Wonka Over a year ago

I mean that a tag like "Hello World" will be split into "Hello" and "World", whereas the desired result is "one tag", "Hello World", "more tag".
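
The point debated in these comments can be checked with a small experiment. The sketch below uses made-up HTML with the tag names from the last comment; it suggests that `.text.strip()` only trims whitespace at the ends of each link's text, while calling `split()` on the flattened text is what would break a multi-word tag apart.

```python
from bs4 import BeautifulSoup

# Made-up markup: link texts padded with spaces and newlines,
# one of them containing an inner space.
html = (
    '<div class="ListTags">'
    "<a>  one tag </a><a>\n Hello World\n</a><a>more tag</a>"
    "</div>"
)
soup1 = BeautifulSoup(html, "html.parser")
anchors = soup1.find("div", {"class": "ListTags"}).find_all("a")

# strip() removes leading/trailing whitespace only; inner spaces survive.
keywords = [keyword.text.strip() for keyword in anchors]
joined = ", ".join(keywords)

# Splitting the flattened text on whitespace is the case that breaks tags.
naive = soup1.get_text().split()

print(keywords)
print(joined)
print(naive)
```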
