I'm trying to get abbreviations of US states but this code:

from bs4 import BeautifulSoup
from urllib.request import urlopen
url='https://simple.wikipedia.org/wiki/List_of_U.S._states'
web=urlopen(url)
source=BeautifulSoup(web, 'html.parser')
table=source.find('table', {'class': 'wikitable sortable jquery-tablesorter'})
abbs=table.find_all('b')
print(abbs.get_text())

returns AttributeError: 'NoneType' object has no attribute 'find_all'. What's wrong with my code?

  • source.find is returning None, which has no attribute find_all. Commented Nov 19, 2017 at 17:58
  • It can't find the element 'wikitable sortable jquery-tablesorter'. Commented Nov 19, 2017 at 18:09
  • It's called 'wikitable sortable' in the HTML. Commented Nov 19, 2017 at 18:09
  • @Roy I think my answer will give you what you are looking for. Commented Nov 19, 2017 at 18:15

3 Answers

As Patrick suggested,

source.find() returns only the first matching element, or None when nothing matches.

Source code of the find() method, for reference:

def find(self, name=None, attrs={}, recursive=True, text=None, **kwargs):
    """Return only the first child of this Tag matching the given criteria."""
    r = None
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
    if l:
        r = l[0]
    return r
findChild = find

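In the HTML that urlopen downloads, the table's class is wikitable sortable. The extra jquery-tablesorter class is most likely added by JavaScript in the browser, so it never appears in the fetched source.
No tag matches the three-class string, so per the code above find() returns None.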
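To see how the class lookup behaves without hitting the network, here is a minimal sketch. SAMPLE_HTML is a hypothetical, trimmed-down stand-in for the real page, not the actual Wikipedia markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the table markup the server sends.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><td><b>AL</b></td><td>Alabama</td></tr>
  <tr><td><b>AK</b></td><td>Alaska</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word class string is compared against the whole class attribute,
# so the extra "jquery-tablesorter" token makes the lookup fail.
missing = soup.find("table", {"class": "wikitable sortable jquery-tablesorter"})

# The exact attribute string matches.
exact = soup.find("table", {"class": "wikitable sortable"})

# A single class name matches any tag that carries that class.
single = soup.find("table", class_="wikitable")

print("three-class lookup found nothing:", missing is None)
print("both working lookups return the same tag:", exact is single)
```

Searching for a single class is the more forgiving choice here, because it keeps working if the page adds or reorders other classes on the table.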

So you may want to change your code as follows:

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = 'https://simple.wikipedia.org/wiki/List_of_U.S._states'
web = urlopen(url)
source = BeautifulSoup(web, 'html.parser')

table = source.find('table', class_='wikitable')
abbs = table.find_all('b')

abbs_list = [i.get_text().strip() for i in abbs]
print(abbs_list)

I hope it'll answer your question. :)

1 Comment

Solved the problem. Thanks a lot!

Here you go.

I changed the class in source.find to 'wikitable sortable'. Also, abbs.get_text() gave me an error because find_all returns a list of tags rather than a single tag, so I used a list comprehension to get the text you wanted.

from bs4 import BeautifulSoup
from urllib.request import urlopen

web = urlopen('https://simple.wikipedia.org/wiki/List_of_U.S._states')
source = BeautifulSoup(web, 'lxml')
table = source.find(class_='wikitable sortable').find_all('b')
b_arr = '\n'.join([x.text for x in table])
print(b_arr)

Partial Output:

AL
AK
AZ
AR
CA
CO

1 Comment

There is no need to use string replace, beautiful soup gives you methods to extract text using ele.text.strip()

As suggested in the comments, the HTML at that URL doesn't have a table with the class

'wikitable sortable jquery-tablesorter'

The class is actually

'wikitable sortable'

Also, find_all returns a list of all matching tags, so you can't call get_text() on it directly. You can use a list comprehension to extract the stripped text of each element in the list. Here's code that will work for your problem:

from bs4 import BeautifulSoup
from urllib.request import urlopen
url='https://simple.wikipedia.org/wiki/List_of_U.S._states'
web=urlopen(url)
source=BeautifulSoup(web, 'html.parser')
table=source.find('table', {'class': 'wikitable sortable'})
abbs=table.find_all('b')
values = [ele.text.strip() for ele in abbs]
print(values)
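Both points above (find can return None, and find_all returns a list) can be folded into one small helper. This is a sketch only: bold_texts, its table_class parameter, and SAMPLE_HTML are hypothetical names introduced for illustration, and the HTML is a stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the table markup on the real page.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><td><b> AL </b></td><td>Alabama</td></tr>
  <tr><td><b>AK</b></td><td>Alaska</td></tr>
</table>
"""


def bold_texts(html, table_class="wikitable"):
    """Return the stripped text of every <b> tag in the first matching table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        # Fail with a clear message instead of an AttributeError on None.
        raise ValueError(f"no <table> with class {table_class!r} found")
    # find_all returns a list of tags, so get_text is called per element.
    return [tag.get_text(strip=True) for tag in table.find_all("b")]


print(bold_texts(SAMPLE_HTML))
```

With the live page, the same function could be called on the bytes returned by urlopen(url).read().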
