
I need to scrape information from a page (Edit: removed NSFW link). Before I can get to the page itself, there's a button I have to click. I'm using Python 2.7.10 and Selenium, with PhantomJS 1.9.8.

Here's my code:

#!/usr/bin/env python
# -*- coding: cp1250 -*-
import urllib
import urllib2
import time
from bs4 import BeautifulSoup
import sys, os
from selenium.webdriver.support.wait import WebDriverWait
from selenium import webdriver

reload(sys)
sys.setdefaultencoding("cp1250")

base_url = "https://www.24dolores.pl/"
waiting_time = 20

def get_browser():
    return webdriver.PhantomJS("phantomjs.exe")

def download_page_src(url):
    try:
        browser = get_browser()
        wait = WebDriverWait(browser, 30)
        browser.get(url)
        time.sleep(5)
        close = browser.find_element_by_class_name('.enter_pl')
        close.click()
        html = browser.page_source
        browser.close()
        return html
    except urllib2.HTTPError, error:
        return error
    except urllib2.URLError, error:
        return error
    except Exception, error:
        return error

page = download_page_src(base_url)
print page
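A note on the structure of the code above: browser.close() only runs when nothing fails, so when the find_element_by_class_name call raises (as it does below), the PhantomJS process is never shut down explicitly and can be left running. Selenium also reports its failures through its own exception classes (subclasses of selenium.common.exceptions.WebDriverException), not urllib2.HTTPError or urllib2.URLError, so only the final except Exception clause can fire here. A minimal sketch of the usual try/finally shape, assuming the same Selenium 2-era driver API; fetch_after_click, make_browser and click are hypothetical names, not part of Selenium:

```python
def fetch_after_click(make_browser, url, click):
    """Open url in a fresh browser, let click(browser) do the interaction,
    and return the resulting page source.

    make_browser and click are hypothetical parameters for this sketch.
    """
    browser = make_browser()
    try:
        browser.get(url)
        click(browser)
        return browser.page_source
    finally:
        # quit() ends the WebDriver session and stops the PhantomJS process;
        # because it sits in finally, it also runs when click() raises.
        browser.quit()
```

For example: fetch_after_click(get_browser, base_url, lambda b: b.find_element_by_class_name("enter_pl").click()). Any exception still reaches the caller, but the browser is shut down first.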

And the error it gives:

C:\Documents and Settings\student>cd C:\Documents and Settings\student\Pulpit

C:\Documents and Settings\student\Pulpit>python test.py
Message: {"errorMessage":"Unable to find element with class name '.enter_pl'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"98","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:1708","User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"class name\", \"sessionId\": \"1d0a4d60-add2-11e5-84a6-b5f372943a74\", \"value\": \".enter_pl\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/1d0a4d60-add2-11e5-84a6-b5f372943a74/element"}}
Screenshot: available via screen
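The last line, "Screenshot: available via screen", is how Selenium's WebDriverException prints itself when the driver attached a screenshot: the exception object has a screen attribute holding a base64-encoded PNG (as GhostDriver, PhantomJS's WebDriver implementation, sends it) of what the browser had rendered when the lookup failed. Since download_page_src returns the exception object instead of raising it, that image can be written to disk and inspected, which is useful when a page behaves differently in Firefox and PhantomJS. A small sketch; save_error_screenshot and the file name below are hypothetical:

```python
import base64


def save_error_screenshot(error, path):
    """Write the screenshot carried by a WebDriverException-like error to path.

    Returns True if a screenshot was saved, False if the object has none
    (for example when download_page_src returned HTML instead of an error).
    """
    screen = getattr(error, "screen", None)
    if not screen:
        return False
    with open(path, "wb") as fh:
        # screen is base64 text; decode it back to the raw PNG bytes.
        fh.write(base64.b64decode(screen))
    return True
```

Usage with the question's code: page = download_page_src(base_url), then save_error_screenshot(page, "phantomjs_error.png"). On a live driver, browser.save_screenshot("phantomjs.png") gives the same kind of image without waiting for an error.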
Srsly? NSFW link? It's just the page I need the data from... Why the downvote? (commented Dec 29, 2015 at 19:59)

1 Answer


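Just remove the dot from the class name: find_element_by_class_name expects the bare class name (enter_pl), and the leading dot is CSS selector syntax. If you want to keep the dot, use a CSS selector instead: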

driver.find_element_by_css_selector(".enter_pl")
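To spell out the difference: a class-name locator matches elements whose class attribute contains that one class, so '.enter_pl' looks for a class literally named ".enter_pl", which is exactly what the error message reports. The dot only means "class" inside a CSS selector. A tiny hypothetical helper (not part of Selenium) that accepts either spelling and returns the value each locator strategy expects:

```python
def class_locator_values(name):
    """Return (class_name_value, css_selector_value) for a single CSS class.

    Accepts "enter_pl" or ".enter_pl". This sketch rejects compound names
    such as "a b", since a class-name locator is meant for one class only.
    """
    bare = name.lstrip(".")
    if not bare or " " in bare:
        raise ValueError("expected a single class name, got %r" % name)
    return bare, "." + bare
```

With the question's browser: by_class, by_css = class_locator_values(".enter_pl"), then either browser.find_element_by_class_name(by_class) or browser.find_element_by_css_selector(by_css).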

4 Comments

The error is the same. By the way, there's no such method as get_element_by_css_selector; it's find_element_by_css_selector.
Find instead of get, my bad.
What I usually do in these cases is try to reproduce the problem in my own browser: open the inspector and first run $('.enter_pl') in the console.
I know that. It works with Firefox (pastie.org/private/b9opevxrhjvpiihidikha), but the problem is that I need it to work with PhantomJS.
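If the selector matches in Firefox but not in PhantomJS, one way to narrow it down is to run the console check from the comments inside the PhantomJS session itself, and to poll instead of relying on the fixed time.sleep(5). The sketch below does both through execute_script (a real WebDriver method) and document.querySelectorAll; wait_for_selector, the timeout and the poll interval are assumptions for illustration. Selenium's own explicit wait covers the polling part as well: WebDriverWait(browser, 30).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".enter_pl"))), with EC imported from selenium.webdriver.support.expected_conditions and By from selenium.webdriver.common.by.

```python
import time

# Same check as typing $('.enter_pl') into a browser console, but executed
# inside whatever browser the driver controls (PhantomJS in the question).
COUNT_SCRIPT = "return document.querySelectorAll(arguments[0]).length;"


def wait_for_selector(driver, css_selector, timeout=30.0, poll=0.5):
    """Poll until css_selector matches at least one element in the page.

    Returns the number of matches; raises RuntimeError after timeout seconds.
    Written to run on both Python 2.7 and 3.
    """
    deadline = time.time() + timeout
    while True:
        count = driver.execute_script(COUNT_SCRIPT, css_selector)
        if count:
            return count
        if time.time() >= deadline:
            raise RuntimeError("%r matched nothing within %.1f s" % (css_selector, timeout))
        time.sleep(poll)
```

Usage: wait_for_selector(browser, ".enter_pl"), then browser.find_element_by_css_selector(".enter_pl").click(). If it times out in PhantomJS while the same check succeeds in Firefox, PhantomJS is most likely being served or rendering a different page, and a screenshot of that session is the next thing to look at.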
