
I'd like to know the number of trackers blocked by uBlock Origin, using Python (running on a Linux server, so no GUI) and Selenium (with the Firefox driver). I don't necessarily need to actually block them, but I need to know how many there are.

uBlock Origin has a logger (https://github.com/gorhill/uBlock/wiki/The-logger#settings-dialog) which I'd like to scrape.

This logger is available through a URL like this: moz-extension://fc469b55-3182-4104-a95c-6b0b4f87cf0f/logger-ui.html#_ where fc469b55-3182-4104-a95c-6b0b4f87cf0f is the internal UUID of the uBlock Origin add-on.

In this logger, each entry is a div with its class set to "logEntry" (the yellow oblong in the screenshot), and I'd like to get the data in the green oblong.

So far, I've got this:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options as FirefoxOptions

browser_options = FirefoxOptions()
browser_options.headless = True

# Activate the add-on
str_ublock_extension_path = "/usr/local/bin/uBlock0_1.45.3b10.firefox.signed.xpi"
browser = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver', options=browser_options)
str_id = browser.install_addon(str_ublock_extension_path)

# Get the internal UUID, which is new each time the script is launched
profile_path = browser.capabilities['moz:profile']
id_extension_firefox = "uBlock0@raymondhill.net"
with open('{}/prefs.js'.format(profile_path), 'r') as file_prefs:
    lines = file_prefs.readlines()
    for line in lines:
        if 'extensions.webextensions.uuids' in line:
            sublines = line.split(',')
            for subline in sublines:
                if id_extension_firefox in subline:
                    internal_uuid = subline.split(':')[1][2:38]

# Open the logger page and try to read its entries
str_uoo_panel_url = "moz-extension://" + internal_uuid + "/logger-ui.html#_"
ubo_logger = browser.get(str_uoo_panel_url)
ubo_logger_log_entries = ubo_logger.find_element(By.CLASS_NAME, "logEntry")

for log_entrie in ubo_logger_log_entries:
    print(log_entrie.text)
    

Using this "weird" moz-extension:// URL seems to work, since print(browser.page_source) displays some relevant HTML.
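The UUID lookup in the script above relies on fixed slice positions, which may break if the layout of the line changes. Below is a minimal sketch of a more defensive version. It assumes that the relevant prefs.js line has the form user_pref("extensions.webextensions.uuids", "{...}"); with a backslash-escaped JSON object inside the string. The names extract_extension_uuid and logger_url are hypothetical helpers, not part of Selenium or uBlock Origin.

```python
import json
import re

# Assumed shape of the prefs.js line (escaped JSON inside a JS string):
# user_pref("extensions.webextensions.uuids", "{\"addon-id\":\"uuid\", ...}");
_UUIDS_PREF = re.compile(
    r'user_pref\("extensions\.webextensions\.uuids",\s*"(.*)"\);'
)


def extract_extension_uuid(prefs_text, extension_id):
    """Return the internal UUID mapped to extension_id, or None if absent."""
    match = _UUIDS_PREF.search(prefs_text)
    if match is None:
        return None
    # Decode the quoted string first, then the JSON object it contains.
    inner_json = json.loads('"' + match.group(1) + '"')
    return json.loads(inner_json).get(extension_id)


def logger_url(internal_uuid):
    """Build the logger address in the form shown in the question."""
    return "moz-extension://{}/logger-ui.html#_".format(internal_uuid)
```

With the script's variables, this would be called as extract_extension_uuid(file_prefs.read(), id_extension_firefox). Returning None makes a missing entry explicit, where the original would leave internal_uuid undefined.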

Problem: ubo_logger.find_element(By.CLASS_NAME, "logEntry") returns nothing. What did I do wrong?
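Two details in the script stand out as possible causes, though this is not a confirmed diagnosis. First, browser.get() navigates and returns None, so ubo_logger is not a page object. Second, find_element (singular) returns a single element, not a list that can be looped over. The sketch below queries the driver itself and uses find_elements. read_log_entry_texts is a hypothetical helper that accepts any driver-like object. It has not been checked against the real logger page, which may fill in its entries dynamically and so need a wait before querying.

```python
def read_log_entry_texts(driver, logger_url):
    """Open the logger page and return the text of every .logEntry element."""
    # get() navigates and returns None, so its result is not kept.
    driver.get(logger_url)
    # "class name" is the string value behind selenium's By.CLASS_NAME.
    # find_elements (plural) returns a list, which is empty when nothing matches.
    entries = driver.find_elements("class name", "logEntry")
    return [entry.text for entry in entries]
```

With the script's variables, the call would be read_log_entry_texts(browser, str_uoo_panel_url). An empty list then means that no element matched at the moment of the query.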

1 Answer

I found this to work:

parent = driver.find_element(by=By.XPATH, value='//*[@id="vwContent"]')
children = parent.find_elements(by=By.XPATH, value='./child::*')

for child in children:
    attributes = (child.find_element(by=By.XPATH, value='./child::*')).find_elements(by=By.XPATH, value='./child::*')
    print(attributes[4].text)

You could then also do:

if attributes[4].text.isdigit():
    result = int(attributes[4].text)

This converts the resulting text into an int.
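One caveat with isdigit(): it returns True for some characters that int() rejects, such as the superscript "²". A small sketch of an alternative is to attempt the conversion and catch the failure. to_int_or_none and numeric_cells are hypothetical helper names. Unlike isdigit(), this version also accepts a leading sign and surrounding whitespace.

```python
def to_int_or_none(text):
    """Return int(text) if the text parses as an integer, else None."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def numeric_cells(cell_texts):
    """Keep only the texts that parse as integers, converted to int."""
    values = (to_int_or_none(text) for text in cell_texts)
    return [value for value in values if value is not None]
```

With the loop above, to_int_or_none(attributes[4].text) would replace the isdigit() check.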


5 Comments

Thanks Teddy. After running some tests, it appears that find_element(by=By.XPATH, value=...) is incorrect. It should be parent.find_elements_by_xpath('./child::*').
Teddy's solution didn't work at all; there were lots of errors here and there. First, you can't loop through children (see stackoverflow.com/questions/39356818/…).
Well, that depends on your version: find_elements(by=By.XPATH, value='./child::*') for newer versions and find_elements_by_xpath('./child::*') for older versions.
@8oris what kind of errors are you talking about?
children = parent[0].find_elements_by_xpath('./child::*') instead of children = parent.find_elements(by=By.XPATH, value='./child::*')
