I am trying to access my school's intranet so I can scrape it and retrieve the table with the homework I have to complete. I searched the web for solutions but couldn't find any. I will not provide the login credentials for obvious reasons, but I will provide the HTML data. Any help would be great, thanks.
My code so far:
import requests
while True:
    Post_Login_URL = 'http://parents.netherhall.org/'
    Request_URL = 'https://parents.netherhall.org/parents/students/?admissionno=011161&page=homework'
    username = input('What is your username? ')
    password = input('What is your password? ')
    payload = {
        'username': username,
        'password': password
    }
    with requests.Session() as session:
        post = session.post(Post_Login_URL, data=payload)
        r = session.get(Request_URL)
        print(r.text)
The response I get:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML dir=ltr><HEAD><TITLE>The page cannot be displayed</TITLE>
<STYLE id=L_defaultr_1>A:link {
FONT: 8pt/11pt verdana; COLOR: #ff0000
}
A:visited {
FONT: 8pt/11pt verdana; COLOR: #4e4e4e
}
</STYLE>
<META content=NOINDEX name=ROBOTS>
<META http-equiv=Content-Type content="text-html; charset=UTF-8">
<META content="MSHTML 5.50.4522.1800" name=GENERATOR></HEAD>
<BODY bgColor=#ffffff>
<TABLE cellSpacing=5 cellPadding=3 width=410>
<TBODY>
<TR>
<TD id=L_defaultr_0 valign=middle align=left width=360>
<H1 id=L_defaultr_2 style="FONT: 13pt/15pt verdana; COLOR: #000000"><ID id=L_defaultr_3><!--Problem-->The page cannot be displayed
</ID></H1></TD></TR>
<TR>
<TD width=400 colSpan=2><FONT id=L_defaultr_4
style="FONT: 8pt/11pt verdana; COLOR: #000000"><ID id=L_defaultr_5><B>Explanation: </B>There is a problem with the page you are trying to reach and it cannot be displayed.</ID></FONT></TD></TR>
<TR>
<TD width=400 colSpan=2><FONT id=L_defaultr_6
style="FONT: 8pt/11pt verdana; COLOR: #000000">
<HR color=#c0c0c0 noShade>
<P id=L_defaultr_7><B>Try the following:</B></P>
<UL>
<LI id=L_defaultr_8><B>Refresh page:</B> Search for the page again by clicking the Refresh button. The timeout may have occurred due to Internet congestion.
<LI id=L_defaultr_9><B>Check spelling:</B> Check that you typed the Web page address correctly. The address may have been mistyped.
<LI id=L_defaultr_10><B>Access from a link:</B> If there is a link to the page you are looking for, try accessing the page from that link.
</UL>
<HR color=#c0c0c0 noShade>
<P id=L_defaultr_11>Technical Information (for support personnel)</P>
<UL>
<LI id=L_defaultr_12>Error Code: 401 Unauthorized. The server requires authorization to fulfill the request. Access to the Web server is denied. Contact the server administrator. (12209)
</UL></FONT></TD></TR></TBODY></TABLE></BODY></HTML>
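The "Technical Information" entry in the page above reports 401 Unauthorized, which suggests the POST did not leave the session logged in. Rather than reading the HTML by eye, it may help to inspect each response object directly. The following is only a sketch: summarize_response is a hypothetical helper name, and the "401 Unauthorized" text check is an assumption based on the error page shown above.

```python
def summarize_response(resp):
    """Describe a requests-style response so a failed login is easier to spot.

    `resp` only needs the attributes requests.Response provides:
    status_code, url, history and text.
    """
    body = resp.text or ""
    return {
        # HTTP status of the final response
        "status": resp.status_code,
        # URL we ended up at after any redirects
        "final_url": resp.url,
        # Status codes of the redirects that were followed, in order
        "redirects": [r.status_code for r in resp.history],
        # Assumption: the error page contains this exact phrase, as in the
        # output shown above
        "looks_unauthorized": resp.status_code == 401 or "401 Unauthorized" in body,
    }

# Possible use with the session from the question:
#   post = session.post(Post_Login_URL, data=payload)
#   print(summarize_response(post))
#   r = session.get(Request_URL)
#   print(summarize_response(r))
```

If the summary for the POST already looks unauthorized, the problem is in the login step itself (for example the URL or the form field names) rather than in the later GET.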
I also tried this URL, but it didn't work: https://parents.netherhall.org/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=39
Rather than requests, you could use selenium, which is very easy to use when logging into websites because it automates a browser. You can always run it in headless mode as well.
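To illustrate the selenium suggestion, here is a rough sketch of a headless login followed by pulling the rows out of the homework table. Several things here are assumptions, not facts about the school's site: the form field names "username" and "password" are taken from the payload in the question and may differ on the real login form, Chrome with a matching driver is assumed to be installed, and fetch_page_html and extract_table_rows are hypothetical helper names. The table parsing uses only the standard library.

```python
from html.parser import HTMLParser


def fetch_page_html(login_url, page_url, username, password):
    """Log in through a headless browser and return the HTML of page_url."""
    # Imported here so the parsing helper below works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(login_url)
        # Assumption: the inputs are named "username" and "password".
        driver.find_element(By.NAME, "username").send_keys(username)
        password_box = driver.find_element(By.NAME, "password")
        password_box.send_keys(password)
        password_box.submit()  # submit the form that contains the field
        driver.get(page_url)
        return driver.page_source
    finally:
        driver.quit()


class _TableRows(HTMLParser):
    """Collect the text of every <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = _TableRows()
    parser.feed(html)
    return parser.rows
```

You would call fetch_page_html with the login page URL and the homework page URL from the question, then pass the result to extract_table_rows. Reading the password with getpass.getpass instead of input keeps it from being echoed to the terminal.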