I scraped some table data from a website using BeautifulSoup. The result is a list of td elements in this format:
[<td class="TableHeadingLeft" width="175">
Team
</td>,
<td class="TableHeadingRight" width="35">
Mat
</td>,
<td class="TableHeadingRight" width="35">
Won
</td>,
<td class="TableHeadingRight" width="35">
Lost
</td>,
<td class="TableHeadingRight" width="35">
Tied
</td>,
<td class="TableHeadingRight" width="35">
N/R
</td>,
<td class="TableHeadingRight" width="45">
Points
</td>,
<td class="TableHeadingRight" width="55">
Net R/R
</td>,
<td class="TableHeadingRight" width="75">
For
</td>,
<td class="TableHeadingRight" width="75">
Against
</td>,
<td align="left">
<a class="LinkOff" href="MatchList.asp?s=2019&t=MIN">Mumbai Indians</a>
</td>,
<td align="right">
14
</td>,
<td align="right">
9
</td>,
<td align="right">
5
</td>,
<td align="right">
0
</td>,
<td align="right">
0
</td>,
<td align="right">
18
</td>,
<td align="right">
+0.421
</td>,
<td align="right">
2380/275.1
</td>,
<td align="right">
2282/277.2
</td>,
<td align="left">
<a class="LinkOff" href="MatchList.asp?s=2019&t=CSK">Chennai Super Kings</a>
</td>,
<td align="right">
14
</td>,
<td align="right">
9
</td>,
<td align="right">
5
</td>,
<td align="right">
0
</td>,
<td align="right">
0
</td>,
<td align="right">
18
</td>,
<td align="right">
+0.131
</td>,
<td align="right">
2043/274.1
</td>,
<td align="right">
2012/274.5
</td>,
<td align="left">
<a class="LinkOff" href="MatchList.asp?s=2019&t=DDV">Delhi Capitals</a>
</td>,
<td align="right">
14
</td>,
<td align="right">
9
</td>,
<td align="right">
5
</td>,
<td align="right">
0
</td>,
<td align="right">
0
</td>,
<td align="right">
18
</td>,
<td align="right">
+0.044
</td>,
<td align="right">
2207/272.5
</td>,
<td align="right">
2238/278.1
</td>,
<td align="left">
<a class="LinkOff" href="MatchList.asp?s=2019&t=SUN">Sunrisers Hyderabad</a>
</td>,
<td align="right">
14
</td>,
<td align="right">
6
</td>,
<td align="right">
8
</td>,
<td align="right">
0
</td>,
<td align="right">
0
</td>,
<td align="right">
12
</td>,
<td align="right">
+0.577
</td>,
<td align="right">
2288/269.2
</td>,
<td align="right">
2200/277.5
</td>,
<td align="left">
<a class="LinkOff" href="MatchList.asp?s=2019&t=KKR">Kolkata Knight Riders</a>
</td>,
<td align="right">
14
</td>,
<td align="right">
6
</td>,
<td align="right">
8
</td>,
<td align="right">
0
</td>,
<td align="right">
0
</td>,
<td align="right">
12
</td>,
<td align="right">
+0.028
</td>,
<td align="right">
2466/270.4
</td>,
<td align="right">
2419/266.2
</td>,
<td align="left">
<a class="LinkOff" href="MatchList.asp?s=2019&t=KXI">Kings XI Punjab</a>
</td>,
<td align="right">
14
</td>,
<td align="right">
6
</td>,
<td align="right">
8
</td>,
<td align="right">
0
</td>,
<td align="right">
0
</td>,
<td align="right">
12
</td>,
<td align="right">
-0.251
</td>,
<td align="right">
2429/276.3
</td>,
<td align="right">
2503/277.0
</td>,
<td align="left">
<a class="LinkOff" href="MatchList.asp?s=2019&t=RRO">Rajasthan Royals</a>
</td>,
<td align="right">
14
</td>,
<td align="right">
5
</td>,
<td align="right">
8
</td>,
<td align="right">
0
</td>,
<td align="right">
1
</td>,
<td align="right">
11
</td>,
<td align="right">
-0.449
</td>,
<td align="right">
2153/257.0
</td>,
<td align="right">
2192/248.2
</td>,
<td align="left">
<a class="LinkOff" href="MatchList.asp?s=2019&t=RCB">Royal Challengers Bangalore</a>
</td>,
<td align="right">
14
</td>,
<td align="right">
5
</td>,
<td align="right">
8
</td>,
<td align="right">
0
</td>,
<td align="right">
1
</td>,
<td align="right">
11
</td>,
<td align="right">
-0.607
</td>,
<td align="right">
2146/258.4
</td>,
<td align="right">
2266/254.3
</td>]
Next, I used a loop to extract the text of each cell:
for data in table_data.find_all('td'):
    # split() + join() removes all whitespace, including the spaces inside team names
    print(''.join(data.text.split()))
The output is:
Team
Mat
Won
Lost
Tied
N/R
Points
NetR/R
For
Against
MumbaiIndians
14
9
5
0
0
18
+0.421
2380/275.1
2282/277.2
ChennaiSuperKings
14
9
5
0
0
18
+0.131
2043/274.1
2012/274.5
DelhiCapitals
14
9
5
0
0
18
+0.044
2207/272.5
2238/278.1
SunrisersHyderabad
14
6
8
0
0
12
+0.577
2288/269.2
2200/277.5
KolkataKnightRiders
14
6
8
0
0
12
+0.028
2466/270.4
2419/266.2
KingsXIPunjab
14
6
8
0
0
12
-0.251
2429/276.3
2503/277.0
RajasthanRoyals
14
5
8
0
1
11
-0.449
2153/257.0
2192/248.2
RoyalChallengersBangalore
14
5
8
0
1
11
-0.607
2146/258.4
2266/254.3
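The flat output above has a regular shape: ten header cells followed by ten values per team. As a rough sketch of how that could be grouped into rows, assuming the values have been collected into a list instead of printed (the `cells` list and the `chunk` helper below are hypothetical names, and only the first two teams are included to keep it short):

```python
# Hypothetical sample: the header plus the first two teams from the output above.
cells = [
    'Team', 'Mat', 'Won', 'Lost', 'Tied', 'N/R', 'Points', 'NetR/R', 'For', 'Against',
    'MumbaiIndians', '14', '9', '5', '0', '0', '18', '+0.421', '2380/275.1', '2282/277.2',
    'ChennaiSuperKings', '14', '9', '5', '0', '0', '18', '+0.131', '2043/274.1', '2012/274.5',
]

n_cols = 10  # the table has ten columns


def chunk(values, size):
    """Split a flat list into consecutive lists of `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


rows = chunk(cells, n_cols)
header, body = rows[0], rows[1:]  # first chunk is the header row

print(header)
for row in body:
    print(row)
```

Each inner list then corresponds to one row of the original table.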
I then created an empty DataFrame with the desired columns:
import pandas as pd

col_name = ['Team','Mat','Won','Lost','Tied','N/R','Points','NetR/R','For','Against']
df = pd.DataFrame(data=None, columns=col_name)
Now I can't work out how to add this data to the DataFrame.
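One possible direction, as a minimal sketch rather than a definitive answer: instead of filling an empty DataFrame, build a list of rows first and pass it to `pd.DataFrame` in one go. The `values` list below is a hand-copied stand-in for the scraped cell text with the ten header cells already dropped (two teams only); with the real page it would presumably come from something like `[''.join(td.text.split()) for td in table_data.find_all('td')][10:]`, which is an assumption about how `table_data` is used above.

```python
import pandas as pd

col_name = ['Team', 'Mat', 'Won', 'Lost', 'Tied', 'N/R', 'Points', 'NetR/R', 'For', 'Against']

# Stand-in for the scraped cell text, header cells already removed (assumption).
values = [
    'MumbaiIndians', '14', '9', '5', '0', '0', '18', '+0.421', '2380/275.1', '2282/277.2',
    'ChennaiSuperKings', '14', '9', '5', '0', '0', '18', '+0.131', '2043/274.1', '2012/274.5',
]

# Group the flat list into rows of len(col_name) items each.
n = len(col_name)
rows = [values[i:i + n] for i in range(0, len(values), n)]

df = pd.DataFrame(rows, columns=col_name)

# Scraped values are all strings; convert the numeric columns.
num_cols = ['Mat', 'Won', 'Lost', 'Tied', 'N/R', 'Points', 'NetR/R']
df[num_cols] = df[num_cols].apply(pd.to_numeric)

print(df)
```

The For and Against columns are left as strings here because values like 2380/275.1 are not plain numbers.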
