I managed to scrape a table from a website that requires credentials, but the resulting DataFrame looks odd. This is the relevant part of the code:

soup = BeautifulSoup(res_count.content, 'lxml')
df = pd.DataFrame(soup.select('#ctl00_ContentPlaceHolder1_GridView2 tr'))

print(df)

The output is wrapped in a lot of brackets:

    0                                                  1          2  \
0  \n                                            [[[ ]]]  [[[كود]]]   
1  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[1]]]   
2  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[2]]]   
3  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[3]]]   
4  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[4]]]   
5  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[5]]]   
6  \n  [[[<input onclick="javascript:__doPostBack('ct...    [[[6]]]   

                    3                    4   5  
0  [[[الصف الدراسى]]]  [[[العــــــــدد]]]  \n  
1    [[[الصف الأول]]]             [[[66]]]  \n  
2   [[[الصف الثانى]]]             [[[69]]]  \n  
3   [[[الصف الثالث]]]             [[[67]]]  \n  
4   [[[الصف الرابع]]]             [[[59]]]  \n  
5   [[[الصف الخامس]]]             [[[51]]]  \n  
6   [[[الصف السادس]]]             [[[52]]]  \n  

How can I drop the first two columns and, for the remaining columns, get the text without all those brackets [[[....]]]?

I also tried the following lines:

rows = soup.select('#ctl00_ContentPlaceHolder1_GridView2 tr')
print(rows)

and got this result:

[<tr bgcolor="#B9C989">
<th scope="col"><font color="Navy" face="Arial"><b> </b></font></th><th scope="col"><font color="Navy" face="Arial"><b>كود</b></font></th><th scope="col"><font color="Navy" face="Arial"><b>الصف الدراسى</b></font></th><th scope="col"><font color="Navy" face="Arial"><b>العــــــــدد</b></font></th>
</tr>, <tr bgcolor="#DCE0D0">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$0')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>1</b></font></td><td><font color="#333333"><b>الصف الأول</b></font></td><td><font color="#333333"><b>66</b></font></td>
</tr>, <tr bgcolor="White">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$1')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>2</b></font></td><td><font color="#333333"><b>الصف الثانى</b></font></td><td><font color="#333333"><b>69</b></font></td>
</tr>, <tr bgcolor="#DCE0D0">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$2')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>3</b></font></td><td><font color="#333333"><b>الصف الثالث</b></font></td><td><font color="#333333"><b>67</b></font></td>
</tr>, <tr bgcolor="White">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$3')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>4</b></font></td><td><font color="#333333"><b>الصف الرابع</b></font></td><td><font color="#333333"><b>59</b></font></td>
</tr>, <tr bgcolor="#DCE0D0">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$4')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>5</b></font></td><td><font color="#333333"><b>الصف الخامس</b></font></td><td><font color="#333333"><b>51</b></font></td>
</tr>, <tr bgcolor="White">
<td><font color="#333333"><b><input onclick="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView2','Select$5')" type="button" value="اختر"/></b></font></td><td><font color="#333333"><b>6</b></font></td><td><font color="#333333"><b>الصف السادس</b></font></td><td><font color="#333333"><b>52</b></font></td>
</tr>]

2 Answers

I need to get the text without all those brackets [[[....]]]

You can use str.strip for that. Consider the following example:

import pandas as pd
df = pd.DataFrame({'A':['[[[1]]]','[[[2]]]','[[[3]]]']})
df['A'] = df['A'].str.strip('[]')
print(df)

Output:

   A
0  1
1  2
2  3
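
The same idea can be stretched over several columns, as long as the cells really are plain strings. The following is only a sketch on a made-up frame (the column names A, B and C and their values are hypothetical, not taken from the scraped page). It strips every column in one call and then drops a column by position. Note that apply returns a new DataFrame, so the result is assigned back.

```python
import pandas as pd

# Hypothetical frame: every cell is a plain string wrapped in brackets.
df = pd.DataFrame({
    "A": ["[[[1]]]", "[[[2]]]"],
    "B": ["[[[66]]]", "[[[69]]]"],
    "C": ["[[[x]]]", "[[[y]]]"],
})

# Strip the bracket characters column by column.
# apply() does not modify df in place, so keep the returned frame.
df = df.apply(lambda col: col.str.strip("[]"))

# Drop the first column by position rather than by name.
df = df.drop(columns=df.columns[0])

print(df)
```

If the cells are not strings (for example, parsed HTML elements), the .str accessor has nothing to strip and this approach will not help.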

8 Comments

Great. How can I drop the unneeded columns? Also, your example handles only one column. How can I do the same for all the columns in the DataFrame?
@YasserKhalil Use pandas.DataFrame.drop with axis=1 and labels set to the list of column names to remove.
Thanks a lot. I managed the drop with df = df.drop([df.columns[0], df.columns[1], df.columns[5]], axis='columns'). How can I strip all the columns in one shot?
@YasserKhalil You can use pandas.DataFrame.apply like this: df.apply(lambda x: x.str.strip('[]'), axis=1). The lambda keyword creates the anonymous function that apply needs.
I tried df.apply(lambda x: x.str.strip('[]'), axis=1) as you suggested, but I get the same result.
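
A likely reason the strip attempt changes nothing: the DataFrame in the question was built from BeautifulSoup Tag objects, not strings, so the brackets are probably just how pandas prints the nested elements rather than characters stored in the cells. A minimal sketch of one way around this is to take the text of each cell before building the frame. The HTML below is a shortened, hypothetical stand-in for the real page, and html.parser is used here only so the example needs no extra parser (the question uses lxml).

```python
import pandas as pd
from bs4 import BeautifulSoup

# Shortened stand-in for the page's table (hypothetical sample data).
html = """
<table id="ctl00_ContentPlaceHolder1_GridView2">
<tr><th><b> </b></th><th><b>كود</b></th><th><b>الصف الدراسى</b></th><th><b>العــــــــدد</b></th></tr>
<tr><td><b><input type="button" value="اختر"/></b></td><td><b>1</b></td><td><b>الصف الأول</b></td><td><b>66</b></td></tr>
<tr><td><b><input type="button" value="اختر"/></b></td><td><b>2</b></td><td><b>الصف الثانى</b></td><td><b>69</b></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.select("#ctl00_ContentPlaceHolder1_GridView2 tr")

# Keep the text of each <th>/<td> cell instead of the Tag object itself.
data = [
    [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    for row in rows
]

# The first row is the header; the first cell of every row is the button column, so skip it.
df = pd.DataFrame([r[1:] for r in data[1:]], columns=data[0][1:])

print(df)
```

All values come out as strings here, so numeric columns would still need a conversion such as pd.to_numeric if arithmetic is required.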

Thanks a lot. After a lot of searching, I found a solution that I could adapt to get what I need:

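from io import StringIO

soup = BeautifulSoup(res_count.content, 'lxml')
# find() returns the single matching <table> element (find_all() returns a list)
table = soup.find('table', id='ctl00_ContentPlaceHolder1_GridView2')
# wrap the markup in StringIO, since newer pandas versions deprecate passing literal HTML
df = pd.read_html(StringIO(str(table)))[0]
# drop the first column, which only holds the select buttons
df = df.drop(columns=df.columns[0])
print(df)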
