0

I am new to Python. I want to scrape the ISO codes together with the list of states for a country from Wikipedia. Here's the link: https://en.wikipedia.org/wiki/ISO_3166-2:US

Required Output:

mapState = {'Alabama': 'US-AL', 'Alaska': 'US-AK', ..., 'Wyoming': 'US-WY'}

Here's the code I tried:

import requests
from bs4 import BeautifulSoup
def crawl_wiki():
    url = 'https://en.wikipedia.org/wiki/ISO_3166-2:US'
    source_code = requests.get(url)
    plain_text = source_code.text
    print(plain_text)

crawl_wiki()

I can get the page text from the site, but I don't know how to build the dict that maps each state to its code. Any suggestions would be appreciated.

4 Answers

1
import pandas as pd

# read_html returns a list of DataFrames; [0] is the first table on the page.
df = pd.read_html(
    "https://en.wikipedia.org/wiki/ISO_3166-2:US")[0]
# Index by state name, then turn the 'Code' column into a dict.
newd = df.set_index('Subdivision name (en)')['Code'].to_dict()
print(newd)

Output:

{'Alabama': 'US-AL', 'Alaska': 'US-AK', 'Arizona': 'US-AZ', 'Arkansas': 'US-AR', 'California': 'US-CA', 'Colorado': 'US-CO', 'Connecticut': 'US-CT', 'Delaware': 'US-DE', 'Florida': 'US-FL', 'Georgia': 'US-GA', 'Hawaii': 'US-HI', 'Idaho': 'US-ID', 'Illinois': 'US-IL', 'Indiana': 'US-IN', 'Iowa': 'US-IA', 'Kansas': 'US-KS', 'Kentucky': 'US-KY', 'Louisiana': 'US-LA', 'Maine': 'US-ME', 'Maryland': 'US-MD', 'Massachusetts': 'US-MA', 'Michigan': 'US-MI', 'Minnesota': 'US-MN', 'Mississippi': 'US-MS', 'Missouri': 'US-MO', 'Montana': 'US-MT', 'Nebraska': 'US-NE', 'Nevada': 'US-NV', 'New Hampshire': 'US-NH', 'New Jersey': 'US-NJ', 'New Mexico': 'US-NM', 'New York': 'US-NY', 'North Carolina': 'US-NC', 'North Dakota': 'US-ND', 'Ohio': 'US-OH', 'Oklahoma': 'US-OK', 'Oregon': 'US-OR', 'Pennsylvania': 'US-PA', 'Rhode Island': 'US-RI', 'South Carolina': 'US-SC', 'South Dakota': 'US-SD', 'Tennessee': 'US-TN', 'Texas': 'US-TX', 'Utah': 'US-UT', 'Vermont': 'US-VT', 'Virginia': 'US-VA', 'Washington': 'US-WA', 'West Virginia': 'US-WV', 'Wisconsin': 'US-WI', 'Wyoming': 'US-WY', 'District of Columbia': 'US-DC', 'American Samoa': 'US-AS', 'Guam': 'US-GU', 'Northern Mariana Islands': 'US-MP', 'Puerto Rico': 'US-PR', 'United States Minor Outlying Islands': 'US-UM', 'Virgin Islands, U.S.': 'US-VI'}

0

Try this:

import bs4
import requests

response = requests.get('https://en.wikipedia.org/wiki/ISO_3166-2:US')
html = response.content.decode('utf-8')

soup = bs4.BeautifulSoup(html, "lxml")
code_list = soup.select("#mw-content-text > div > table:nth-child(11) > tbody > tr > td:nth-child(1) > span")
name_list = soup.select("#mw-content-text > div > table:nth-child(11) > tbody > tr > td:nth-child(2) > a")


# Target shape: {'Alabama': 'US-AL', 'Alaska': 'US-AK', ..., 'Wyoming': 'US-WY'}
mapState = {}

for code, name in zip(code_list, name_list):
    # Key by state name so the result matches the required output.
    mapState[name.string] = code.string


print(mapState)
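
The positional selectors above (`table:nth-child(11)`) depend on the page layout, so they may stop matching if the article changes. As a rough sketch of a row-based alternative, the snippet below walks the table rows and pairs the first two cells of each row. It swaps BeautifulSoup for the standard library's `html.parser`, and it runs on a small made-up HTML fragment rather than the live page. `SAMPLE_HTML`, `RowCollector` and `build_map` are hypothetical names introduced for illustration only.

```python
from html.parser import HTMLParser

# Hypothetical fragment shaped like the Wikipedia table (code cell, then name cell).
SAMPLE_HTML = """
<table>
  <tr><th>Code</th><th>Subdivision name (en)</th></tr>
  <tr><td><span>US-AL</span></td><td><a href="#">Alabama</a></td></tr>
  <tr><td><span>US-AK</span></td><td><a href="#">Alaska</a></td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


def build_map(html):
    """Return {name: code} from rows laid out as [code, name, ...]."""
    parser = RowCollector()
    parser.feed(html)
    return {row[1]: row[0] for row in parser.rows if len(row) >= 2}


map_state = build_map(SAMPLE_HTML)
print(map_state)
```

On the real page you would pass `response.text` to `build_map`, but the page contains more than one table, so you would probably need to restrict the input to the table you want first.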

0

Here is a solution using SimplifiedDoc, which is similar to BeautifulSoup:

import requests
from simplified_scrapy.simplified_doc import SimplifiedDoc 
url = 'https://en.wikipedia.org/wiki/ISO_3166-2:US'
response = requests.get(url)
doc = SimplifiedDoc(response.text,start='Subdivision category',end='</table>')
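# Collect the cells of every row in the selected table section.
datas = [tr.tds for tr in doc.trs]
mapState = {}
for tds in datas:
    # First cell holds the code, second cell links to the state name.
    mapState[tds[1].a.text] = tds[0].text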

0

Try pandas read_html:

https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.read_html.html

Then convert the resulting DataFrame to a dict.

Example:

import pandas as pd

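# read_html returns a list of DataFrames; take the first and convert it.
# The default to_dict() result is shaped {column: {row index: value}}.
table = pd.read_html("https://en.wikipedia.org/wiki/ISO_3166-2:US")[0].to_dict()
print(table)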
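
Note that a plain `to_dict()` call returns a nested dict keyed by column name, with each column mapping row index to value, so it is not yet the `{name: code}` mapping the question asks for. A minimal sketch of the remaining step follows, using a hand-written two-row stand-in for that nested dict so it runs without network access. The column names are taken from the first answer, and `name_to_code` is a hypothetical helper, not part of pandas.

```python
# Hypothetical two-row stand-in for the output of DataFrame.to_dict(),
# which by default is shaped {column name: {row index: value}}.
columns = {
    "Code": {0: "US-AL", 1: "US-AK"},
    "Subdivision name (en)": {0: "Alabama", 1: "Alaska"},
}


def name_to_code(columns, name_col="Subdivision name (en)", code_col="Code"):
    """Pair up values that share a row index and return {name: code}."""
    names = columns[name_col]
    codes = columns[code_col]
    return {names[i]: codes[i] for i in names}


map_state = name_to_code(columns)
print(map_state)
```

With pandas itself, indexing the DataFrame by the name column and calling `to_dict()` on the `Code` column should give the same mapping without the intermediate step.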
