I want to delete all rows after the row containing the string "End of the 4th Quarter". Currently, this is row 474 but it will change depending on the game.

from bs4 import BeautifulSoup
import requests
import pandas as pd
import re

url = "http://www.espn.com/nba/playbyplay?gameId=400900395"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data,"html.parser")

# Skip the first four <tr> elements on the page
data_rows = soup.find_all("tr")[4:]

play_data = []
for row in data_rows:
    # Collect the text of every cell in this row
    play_row = [td.get_text() for td in row.find_all("td")]
    play_data.append(play_row)

df = pd.DataFrame(play_data)

df.to_html("pbp_data")

3 Answers

7

Here is how I would tackle it:

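ur_row = your_df.index[your_df['Column_Name_Here'] == 'End of the 4th Quarter'].tolist()

ur_row is a list of the index labels of the rows that meet the condition. We then use slicing to keep everything up to the first such row. (The +1 keeps the row containing "End of the 4th Quarter" itself.) Note that iloc slices by position, so this relies on the DataFrame having the default integer index, as it does in the question.

your_df.iloc[:ur_row[0]+1]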
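
A minimal sketch of the two steps above on a tiny stand-in DataFrame. The data and the column names "time" and "play" are assumptions made for illustration; the real scraped table has unnamed integer columns.

```python
import pandas as pd

# Hypothetical play-by-play data; column names are made up for this sketch.
df = pd.DataFrame({
    "time": ["0:05", "0:00", "0:00", "0:00"],
    "play": ["Jump shot", "End of the 4th Quarter", "End of Game", ""],
})

# Index labels of every row whose 'play' cell equals the marker string.
matches = df.index[df["play"] == "End of the 4th Quarter"].tolist()

# With the default RangeIndex the label equals the position,
# so it can be passed to iloc; +1 keeps the marker row itself.
trimmed = df.iloc[: matches[0] + 1]
```

If the marker string is absent, matches is empty and matches[0] raises an IndexError, so a real script may want to check for that first.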

Hope this is simple to follow. I will gladly explain more if need be!

4

If you are sure there is always such a string somewhere in your data frame, you can use idxmax() to find the index of the first row that contains it, and then take all rows up to and including that index with loc (label-based slices are inclusive):

df.loc[:(df == 'End of the 4th Quarter').any(axis=1).idxmax()]

Here are the last few rows of the result:

df.loc[:(df == 'End of the 4th Quarter').any(axis=1).idxmax()].tail()
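
The caveat about being sure the string exists matters because idxmax() on an all-False boolean Series returns the first index label, which would silently cut the frame down to one row. A possible guarded version is sketched below; the helper name trim_after_marker and the sample data are hypothetical, not part of the answer.

```python
import pandas as pd

def trim_after_marker(df, marker):
    """Return rows up to and including the first row containing marker.

    Hypothetical helper: returns df unchanged when the marker is absent.
    """
    # True for each row in which any cell equals the marker.
    hit = (df == marker).any(axis=1)
    if not hit.any():
        return df
    # idxmax() gives the label of the first True; loc slicing is inclusive.
    return df.loc[: hit.idxmax()]

# Made-up sample with unnamed integer columns, like the scraped table.
sample = pd.DataFrame([
    ["0:05", "Jump shot"],
    ["0:00", "End of the 4th Quarter"],
    ["0:00", "End of Game"],
])

kept = trim_after_marker(sample, "End of the 4th Quarter")
untouched = trim_after_marker(sample, "No such play")
```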

1 Comment

If you know which column to look for the string in, you can drop any: df.loc[:(df['myColumn'] == 'End of the 4th Quarter').idxmax()]. And if you also want to remove the row with the first occurrence of the string, use df.loc[:(df['myColumn'] == 'End of the 4th Quarter').idxmax()-1] (this assumes an integer index). Added this since it took me some time to figure out the command :)
1

Identify the index of the row with:

row = df[df['Column Name'] == 'End of the 4th Quarter'].index.tolist()[0]

And then keep only the rows up to and including that row with:

df = df.iloc[:row+1]
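
Passing an index label to iloc only works when labels and positions coincide. A sketch of a purely positional variant, assuming a frame whose index is not the default 0..n-1 (the index values and the column name "play" here are invented for illustration):

```python
import pandas as pd

# Hypothetical frame with a non-default index, e.g. after filtering rows.
df = pd.DataFrame(
    {"play": ["Jump shot", "End of the 4th Quarter", "End of Game", ""]},
    index=[10, 20, 30, 40],
)

# Boolean NumPy array marking the rows that hold the marker string.
mask = (df["play"] == "End of the 4th Quarter").to_numpy()

if mask.any():
    # argmax() returns the position of the first True value.
    pos = int(mask.argmax())
    trimmed = df.iloc[: pos + 1]
else:
    # Marker not found: keep everything.
    trimmed = df
```

Because pos is a position rather than a label, the slice stays correct regardless of what the index contains.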
