This is my code, which raises an IndexError.

# importing the required libraries
import pandas as pd

# Visualisation libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import folium 
from folium import plugins

# Manipulating the default plot size
plt.rcParams['figure.figsize'] = 10, 12

# Disable warnings 
import warnings
warnings.filterwarnings('ignore')
# for date and time operations
from datetime import datetime
# for file and folder operations
import os
# for regular expression operations
import re
# for listing files in a folder
import glob
# for getting web contents
import requests 
# for scraping web contents
from bs4 import BeautifulSoup
# get data

# link at which web data resides
link = 'https://www.mohfw.gov.in/'
# get web data
req = requests.get(link)
# parse web data
soup = BeautifulSoup(req.content, "html.parser")
# find the table
# ==============
# our target table is the last table in the page

# get the table head
# table head may contain the column names, titles, subtitles
thead = soup.find_all('thead')[-1]
# print(thead)

# get all the rows in table head
# it usually has only one row, which has the column names
head = thead.find_all('tr')
# print(head)

# get the table tbody
# it contains the contents
tbody = soup.find_all('tbody')[-1]
# print(tbody)

# get all the rows in table body
# each row is each state's entry
body = tbody.find_all('tr')
# print(body)
IndexError                                Traceback (most recent call last)
<ipython-input-7-eda41c6e195c> in <module>
     15 # get the table tbody
     16 # it contains the contents
---> 17 tbody = soup.find_all('tbody')[-1]
     18 # print(tbody)
     19 

IndexError: list index out of range
  • [-1] This is getting the element of index -1, which doesn't make sense for a list, hence the error. It could be that this is supposed to be [::-1], which is slice notation to reverse the list order. Commented Jul 28, 2020 at 6:02
  • You want to extract table information from site? Commented Jul 28, 2020 at 6:03
  • @HymnsForDisco The [-1] index does work in Python - it gets the last element in the list :) It's very useful, actually. Where the OP may be going wrong is the case in which there are no elements in the list. In this case, there is no "last element", so Python will throw an error. Commented Jul 28, 2020 at 6:08
  • @GrantSchulte Ah good point. Seems some time away in more strict languages has made me forget some of the tricks of Python Commented Jul 28, 2020 at 6:13
  • After I changed the index to [:-1], I got another error on this line: body = tbody.find_all('tr') raises AttributeError: 'list' object has no attribute 'find_all' Commented Jul 31, 2020 at 3:08

2 Answers

This error occurs because the list is empty: soup.find_all('tbody') found no matches, so there is no last element for [-1] to return. When a list might be empty, check before indexing it. For a list l:

if len(l) != 0:
    k = l[-1]
else:
    k = None
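
As a rough sketch of the same check, here is how [-1] and [:-1] behave on empty and non-empty lists. The helper name last_or_none and the sample lists are made up for illustration; they are not part of the question's code. Note that [:-1] is a slice and always returns a list, which would be consistent with the AttributeError mentioned in the comments.

```python
def last_or_none(items):
    """Return the last element of items, or None if items is empty."""
    return items[-1] if items else None


rows = ["a", "b", "c"]  # stand-in for a non-empty find_all() result
empty = []              # stand-in for find_all() finding nothing

print(rows[-1])             # 'c': indexing returns a single element
print(rows[:-1])            # ['a', 'b']: slicing returns a list, not an element
print(empty[:-1])           # []: slicing an empty list does not raise
print(last_or_none(empty))  # None, where empty[-1] would raise IndexError
```

With a guard like this, the calling code can test for None and report that the tag was not found, rather than failing on the index.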

The table in the HTML you download has no tbody tag, so soup.find_all('tbody') returns an empty list.

If you inspect the site's network requests, you can see that the page makes an AJAX call to fetch the table data. The following script saves that JSON data to a file. The request needs no parameters, and it always returns the latest data.

import requests, json

url = 'https://www.mohfw.gov.in/data/datanew.json'
res = requests.get(url)

with open("data.json", "w") as f:
    json.dump(res.json(), f)
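
A slightly more defensive variant might look like the sketch below. The function names fetch_json and save_json and the 10-second timeout are my own choices for illustration; the only thing taken from the script above is the URL. It assumes the endpoint still responds with valid JSON.

```python
import json

import requests

# Endpoint taken from the script above.
URL = "https://www.mohfw.gov.in/data/datanew.json"


def fetch_json(url, timeout=10):
    """Download url and return the decoded JSON, raising on HTTP errors."""
    res = requests.get(url, timeout=timeout)
    res.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return res.json()


def save_json(data, path):
    """Write data to path as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


# Usage (makes a network request, so it is left commented out):
# data = fetch_json(URL)
# save_json(data, "data.json")
```

If the payload turns out to be a list of flat records, passing it to pd.DataFrame should give a table to work with, but that shape is an assumption you would need to check against the actual response.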
