I made a list of elements from an HTML page and counted the frequency of each element. However, I only need a few specific elements, such as "bb" and "nw". I don't know what position they will have in the list, and I'm not sure how to separate them from the other elements.

This is my code so far:

from bs4 import BeautifulSoup
import urllib2
import re
import operator
from collections import Counter
from string import punctuation

source_code = urllib2.urlopen('https://de.wikipedia.org/wiki/Liste_von_Angriffen_auf_Fl%C3%BCchtlinge_und_Fl%C3%BCchtlingsunterk%C3%BCnfte_in_Deutschland/bis_2014')
html = source_code.read()
soup = BeautifulSoup(html, "html.parser")

text = (''.join(s.findAll(text=True))for s in soup.findAll('a'))

c = Counter((x.rstrip(punctuation).lower() for y in text for x in y.split()))

bb,nw=operator.itemgetter(1,2)(c.most_common())
print(bb,nw)

Thank you for your help and any hints.

    What do you mean by you need only specific elements? Do you mean that you need their frequency? Commented Mar 28, 2016 at 20:11

1 Answer

You could use a filter:

relevant_items = ('bb', 'nw')
items = filter(lambda x: x[0] in relevant_items, c.most_common())

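Alternatively, you can filter directly in the generator expression. Note that the check has to be applied to the normalized word, otherwise entries such as "BB" or "nw." would be skipped:

c = Counter(w for w in (x.rstrip(punctuation).lower() for y in text for x in y.split()) if w in relevant_items)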
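
Since a Counter behaves like a dictionary, another option is to skip the filtering and look the keys up directly. The following is only a minimal sketch: `sample_texts` is a made-up stand-in for the link texts scraped from the page, and `normalize` is a hypothetical helper name, not something from the question.

```python
from collections import Counter
from string import punctuation

# Hypothetical sample data standing in for the scraped link texts.
sample_texts = ["BB, NW", "bb", "Berlin", "nw.", "BB"]

relevant_items = ('bb', 'nw')


def normalize(word):
    # Same normalization as in the question: strip trailing punctuation, lowercase.
    return word.rstrip(punctuation).lower()


# Count every normalized word, as in the original code.
c = Counter(normalize(x) for y in sample_texts for x in y.split())

# Look up only the keys of interest; their position in most_common() is irrelevant.
counts = {key: c[key] for key in relevant_items}
print(counts)
```

A Counter returns 0 for keys it has never seen, so this lookup does not raise a KeyError if one of the items is missing from the page.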

1 Comment

Thanks a lot. This was exactly what I was looking for.
