0

I scraped some table data from a website. The actual table on the website looks like this -

Actual Table

I used beautifulsoup to get this data which in this format -

[<td class="TableHeadingLeft" width="175">
                        Team
                    </td>,
 <td class="TableHeadingRight" width="35">
                        Mat
                    </td>,
 <td class="TableHeadingRight" width="35">
                        Won
                    </td>,
 <td class="TableHeadingRight" width="35">
                        Lost
                    </td>,
 <td class="TableHeadingRight" width="35">
                        Tied
                    </td>,
 <td class="TableHeadingRight" width="35">
                        N/R
                    </td>,
 <td class="TableHeadingRight" width="45">
                        Points
                    </td>,
 <td class="TableHeadingRight" width="55">
                        Net R/R
                    </td>,
 <td class="TableHeadingRight" width="75">
                        For
                    </td>,
 <td class="TableHeadingRight" width="75">
                        Against
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=MIN">Mumbai Indians</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        9       
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        18      
                    </td>,
 <td align="right">
                        +0.421      
                    </td>,
 <td align="right">
                        2380/275.1      
                    </td>,
 <td align="right">
                        2282/277.2      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=CSK">Chennai Super Kings</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        9       
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        18      
                    </td>,
 <td align="right">
                        +0.131      
                    </td>,
 <td align="right">
                        2043/274.1      
                    </td>,
 <td align="right">
                        2012/274.5      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=DDV">Delhi Capitals</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        9       
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        18      
                    </td>,
 <td align="right">
                        +0.044      
                    </td>,
 <td align="right">
                        2207/272.5      
                    </td>,
 <td align="right">
                        2238/278.1      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=SUN">Sunrisers Hyderabad</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        6       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        12      
                    </td>,
 <td align="right">
                        +0.577      
                    </td>,
 <td align="right">
                        2288/269.2      
                    </td>,
 <td align="right">
                        2200/277.5      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=KKR">Kolkata Knight Riders</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        6       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        12      
                    </td>,
 <td align="right">
                        +0.028      
                    </td>,
 <td align="right">
                        2466/270.4      
                    </td>,
 <td align="right">
                        2419/266.2      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=KXI">Kings XI Punjab</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        6       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        12      
                    </td>,
 <td align="right">
                        -0.251      
                    </td>,
 <td align="right">
                        2429/276.3      
                    </td>,
 <td align="right">
                        2503/277.0      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=RRO">Rajasthan Royals</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        1       
                    </td>,
 <td align="right">
                        11      
                    </td>,
 <td align="right">
                        -0.449      
                    </td>,
 <td align="right">
                        2153/257.0      
                    </td>,
 <td align="right">
                        2192/248.2      
                    </td>,
 <td align="left">
 <a class="LinkOff" href="MatchList.asp?s=2019&amp;t=RCB">Royal Challengers Bangalore</a>
 </td>,
 <td align="right">     
                        14      
                    </td>,
 <td align="right">
                        5       
                    </td>,
 <td align="right">
                        8       
                    </td>,
 <td align="right">
                        0       
                    </td>,
 <td align="right">
                        1       
                    </td>,
 <td align="right">
                        11      
                    </td>,
 <td align="right">
                        -0.607      
                    </td>,
 <td align="right">
                        2146/258.4      
                    </td>,
 <td align="right">
                        2266/254.3      
                    </td>]

Now, I used a loop and some code to extract the data that I need.

for data in table_data.find_all('td'):
    print(''.join(data.text.split()))

The output of this -

Team
Mat
Won
Lost
Tied
N/R
Points
NetR/R
For
Against
MumbaiIndians
14
9
5
0
0
18
+0.421
2380/275.1
2282/277.2
ChennaiSuperKings
14
9
5
0
0
18
+0.131
2043/274.1
2012/274.5
DelhiCapitals
14
9
5
0
0
18
+0.044
2207/272.5
2238/278.1
SunrisersHyderabad
14
6
8
0
0
12
+0.577
2288/269.2
2200/277.5
KolkataKnightRiders
14
6
8
0
0
12
+0.028
2466/270.4
2419/266.2
KingsXIPunjab
14
6
8
0
0
12
-0.251
2429/276.3
2503/277.0
RajasthanRoyals
14
5
8
0
1
11
-0.449
2153/257.0
2192/248.2
RoyalChallengersBangalore
14
5
8
0
1
11
-0.607
2146/258.4
2266/254.3

Created an empty dataframe of the desired shape -

col_name = ['Team','Mat','Won','Lost','Tied','N/R','Points','NetR/R','For','Against']

import pandas as pd 
df = pd.DataFrame(data=None, columns=col_name)

Now, I couldn't able to understand, how to add these data to this dataframe.

3 Answers 3

1

There is a simple way to create a dataframe from list of list.

Code :

ll = []
for data in table_data.find_all('td'):
    ll.append(''.join(data.text.split()))  

n = 10
final = [ll[i:i + n] for i in range(0, len(ll), n)]
col = final [0]
df = pd.DataFrame(final) 
df.columns = df.iloc[0]
df = df.drop(df.index[0])
df

ll is list of list from your table data.

final is split the ll with 10 columns as you mentioned in columns

The last two lines are to make the first row as column name because we just created a dataframe without giving any column names.

Sign up to request clarification or add additional context in comments.

Comments

0

Use pd.concat

import pandas as pd 

col_name = ['Team','Mat','Won','Lost','Tied','N/R','Points','NetR/R','For','Against']

# Main dataframe
df = pd.DataFrame(data=None, columns=col_name)

# df1 is the data frame that you want to concat in your original data frame

df = pd.concat([df, df1])

Comments

0

Once you have the table data as 1D list

table_data_list = [''.join(i.text.split() for i in table_data.find_all('td')]

Convert the data into 2D using numpy reshape

df = pd.DataFrame(np.array(table_data_list).reshape(-1,10))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.