
I want to use Selenium to go to the URL below, click the first link in the list, and get its text data.

병역법위반 [대법원 2018. 11. 1., 선고, 2016도10912, 전원합의체 판결] (Violation of the Military Service Act [Supreme Court, decided Nov. 1, 2018, case 2016do10912, en banc decision])

This is the link text on that web page. I have tried pretty much every method I can find online. Is it possible that this web page is somehow protected?

from selenium import webdriver
from bs4 import BeautifulSoup
# selenium webdriver chrome


driver = webdriver.Chrome("chromedriver.exe")

# get url
driver.get("http://law.go.kr/precSc.do?tabMenuId=tab103&query=")


# find_elements_... returns a list, so click the first match if there is one
elems = driver.find_elements_by_css_selector(
    "#viewHeightDiv > table > tbody > tr:nth-child(1) > td.s_tit > a")
if elems:
    elems[0].click()

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
notices = soup.find('div', id='bodyContent')

for n in notices:
    print(n)

So with my code, Selenium opens up and goes to the URL, but it does not click what I want it to, so the printed data is not what I was looking for.

I want to know how to crawl http://law.go.kr/precSc.do?tabMenuId=tab103&query=

Maybe there is a way without Selenium? I picked Selenium since this site's URLs are not fixed. The last fixed URL is http://law.go.kr/precSc.do?tabMenuId=tab103&query=

  • You don't need to click on a link to open a new page. Just find the link and make a new request. Commented Feb 19, 2019 at 11:48
  • link does not work Commented Feb 20, 2019 at 7:16
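The first comment's suggestion, finding the link and requesting its href directly instead of clicking, can be sketched with BeautifulSoup, which the question already imports. The markup below is a hand-made stand-in for the real result list, so the ids, classes, and href are assumptions, not the live page:

```python
from bs4 import BeautifulSoup

# Stand-in for the real result list; the live page builds #viewHeightDiv
# with JavaScript, so this structure is assumed from the question's selector.
html = """
<div id="viewHeightDiv">
  <table><tbody>
    <tr><td class="s_tit"><a href="/precInfoP.do?precSeq=204188">병역법위반</a></td></tr>
  </tbody></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
link = soup.select_one("#viewHeightDiv td.s_tit > a")
href = link["href"]
print(href)  # request this URL directly instead of clicking the element
```

With Selenium you would do the same via `elem.get_attribute("href")` followed by `driver.get(href)`.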

1 Answer


Here is code with the necessary waits to click the link and get the text:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

driver.get("http://law.go.kr/precSc.do?tabMenuId=tab103&query=")

# Wait for visibility of the first link in viewHeightDiv; necessary to get its text.
elem = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#viewHeightDiv a")))
# Get the first word of the link text. It will be used to check that the
# detail page has loaded, by matching against its title.
title = elem.text.strip().split(" ")[0]

elem.click()
# Wait for the h2 to contain the title we saved earlier.
wait.until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, "#viewwrapCenter h2"), title))

content = driver.find_element_by_css_selector("#viewwrapCenter").text
print(content)

5 Comments

Is it possible to save the data, go back to the original URL, get the 2nd link's data, and keep going? Commented Feb 20, 2019 at 7:16
You can try to get the href from all the a elements, then go through them all. You can find an example here.
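The loop suggested above can be sketched like this. The markup and hrefs are hypothetical; in the real crawler each collected href would be fetched with `driver.get()` (or `requests`) one at a time:

```python
from bs4 import BeautifulSoup

# Hypothetical list page with two case links; real ids come from the site.
html = """
<div id="viewHeightDiv"><table><tbody>
  <tr><td class="s_tit"><a href="/precInfoP.do?precSeq=1">case 1</a></td></tr>
  <tr><td class="s_tit"><a href="/precInfoP.do?precSeq=2">case 2</a></td></tr>
</tbody></table></div>
"""

soup = BeautifulSoup(html, "html.parser")
# Collect every href up front, so navigating away from the list page
# does not invalidate the elements you still need to visit.
hrefs = [a["href"] for a in soup.select("#viewHeightDiv td.s_tit > a")]
for href in hrefs:
    # in the real crawler: driver.get("http://law.go.kr" + href), then scrape
    print(href)
```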
Thanks. I have another question: when I'm trying to save that text into a txt file, how should I turn the text crawled from the web into a str?
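Selenium's `.text` property already returns a plain `str`, so no conversion is needed before writing it out. A minimal sketch, with a sample string standing in for the scraped content and an arbitrary filename:

```python
from pathlib import Path

# Stand-in for content = driver.find_element_by_css_selector("#viewwrapCenter").text,
# which is already a str.
content = "병역법위반 [대법원 2018. 11. 1., 선고, 2016도10912, 전원합의체 판결]"

# utf-8 keeps the Korean text intact; "case.txt" is an example name.
Path("case.txt").write_text(content, encoding="utf-8")
print(Path("case.txt").read_text(encoding="utf-8"))
```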
Also, the web addresses are not as simple as page 1, 2, 3, ...; each one has an id like licPrec199906, and when I try to go to such a URL it only opens the first page. For example, law.go.kr/precSc.do?tabMenuId=tab67 is the URL I started with; after clicking the 1st link the URL is law.go.kr/precSc.do?tabMenuId=tab67#licPrec204188, but when I type that URL it goes back to the URL I started with.
@KwanheeHwang yes, you're right, but it's possible. You need to create a new question for that.
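A likely reason the typed URL falls back to the first page: everything after `#` is a URL fragment, which the browser never sends to the server; the site's JavaScript reads it client-side to decide which case to show. That is why plain navigation to such a URL cannot work and clicks (or the site's real AJAX endpoint) are needed. The standard library makes the split visible:

```python
from urllib.parse import urldefrag

# Split the URL the comment above mentions into the part the server sees
# and the client-side fragment.
url = "http://law.go.kr/precSc.do?tabMenuId=tab67#licPrec204188"
base, frag = urldefrag(url)
print(base)  # what the server receives
print(frag)  # handled only by the page's JavaScript
```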
