I am struggling to find a way in Python to read data from a web browser session that is already open. Effectively, I am trying to download a massive table of data from a locally controlled company webpage and load it into a pandas DataFrame. The issue is that the website has a fairly complex authentication-token process that I have not been able to get past with Selenium (using a slew of webdrivers), Requests, urllib, or cookielib, despite trying a variety of user parameters. I have given up on this front entirely, as I am almost positive that there is more to the authentication process than can easily be handled with these libraries.

However, I did manage to bypass the required tokenization process when I quickly tested opening a new tab with the webbrowser module in a browser that was already logged in. Unfortunately, webbrowser does not offer a read function, so even though the page can be opened, the data on it cannot be read into a pandas DataFrame. This got me thinking that I could use win32com to open a browser, log in, and then run the rest of the script, but again, the dispatch object for Internet Explorer has no general read ability, so I can't send the information I want to pandas. I'm stumped. Any ideas?

I could acquire the necessary authentication token scripts, but I am sure it would take a week or two before anything happened on that front. I would obviously prefer to have something working in the meantime while I wait for the actual auth scripts from the company.

Update: I received authentication tokens from the company; however, using them requires a Python package on another server that I do not have access to, mostly because it's an oddity that I am using Python in my department. Thus the above still applies: I need a method for reading and manipulating an open browser.

6
  • Selenium could work with the existing browser window on your desktop; you could give it another try. Commented Nov 6, 2017 at 22:45
  • github.com/seleniumhq/selenium-google-code-issue-archive/issues/… Commented Nov 7, 2017 at 14:33
  • If I understand the linked issue correctly, it was decided that the desired functionality would never be incorporated. There are some solutions to the issue in the comments, but they are all in other programming languages. Commented Nov 7, 2017 at 14:49
  • 2
    Is it really necessary to attach to the running browser? You could start the browser using selenium, then authenticate manually (your script can be waiting for the page that appears after login) and once login is finished, the script will open and read the page you need. Commented Nov 9, 2017 at 23:44
  • 1
    I am not sure how experienced you are with network traffic, but in general you could do a "man in the middle" attack between your browser and the server, e.g. with burp (portswigger.net/burp). You halt the last request before you obtain the data package and just copy the request into your python request. If you need to automate things, there is also a burp python api. But as I am writing this, I think it is also rather complicated. :P Commented Nov 10, 2017 at 18:59

1 Answer

10
+50

Step-by-step

1) Start the browser with Selenium.

2) Have the script start waiting for a certain element that tells you the required page has loaded and you are logged in.

3) Use this new browser window to log in to the page manually.

4) The script detects that you are on the required page and logged in.

5) The script processes the page however you like.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# start webdriver (opens Chrome in new window)
chrome = webdriver.Chrome()

# Initialize a waiter with a maximum wait of 300 seconds.
waiter = WebDriverWait(chrome, 300)

# Wait for the #logout element to appear.
# I assume its presence shows that you are logged in.
waiter.until(EC.presence_of_element_located((By.ID, "logout")))

# Extract data etc.
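
For the "extract data" step, a minimal sketch follows, assuming the data sits in an ordinary HTML table. It uses only the standard library to turn page source into rows. The names `TableRows`, `rows_from_html`, and the sample markup are hypothetical and not taken from the actual page. In the script above, the HTML would come from `chrome.page_source`.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_html(html):
    """Return all table rows found in an HTML string as lists of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Stand-in for chrome.page_source; the real page layout is unknown.
sample_html = """
<table id="report">
  <tr><th>id</th><th>name</th></tr>
  <tr><td>1</td><td>alpha</td></tr>
  <tr><td>2</td><td>beta</td></tr>
</table>
"""

header, *records = rows_from_html(sample_html)
print(header)
print(records)
```

With pandas installed, `pandas.DataFrame(records, columns=header)` should give the DataFrame. `pandas.read_html` may be a shorter route if one of its parser backends is available.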

It might be easier to use your Chrome user profile. That way a previous session may be continued, so you will not need to perform any login actions.

options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=FULL_PATH__TO_PROFILE")
# `options=` replaces the deprecated `chrome_options=` keyword.
chrome = webdriver.Chrome(options=options)
chrome.get("https://your_page_here")
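
Once the browser is logged in, it may also be possible to reuse its session outside Selenium by copying its cookies into an ordinary HTTP request. The sketch below assumes that cookies alone are enough for the site, which may not hold for a token-based login. The helper names, cookie values, and the `/export` path are hypothetical. In a live session the list would come from `chrome.get_cookies()`.

```python
from urllib.request import Request


def cookie_header(selenium_cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_request(url, selenium_cookies):
    """Create a urllib Request that carries the browser session's cookies."""
    return Request(url, headers={"Cookie": cookie_header(selenium_cookies)})


# Made-up placeholder cookies; real ones would come from chrome.get_cookies().
example_cookies = [
    {"name": "sessionid", "value": "abc123"},
    {"name": "csrftoken", "value": "xyz789"},
]

request = build_request("https://your_page_here/export", example_cookies)
print(request.get_header("Cookie"))
```

The request can then be passed to `urllib.request.urlopen`. The same header value could be supplied to Requests instead if that library is preferred.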

1 Comment

Please fix my English if possible.
