
I want to use Selenium to go to the URL below, click the first link in the list, and get its text data.

병역법위반 [대법원 2018. 11. 1., 선고, 2016도10912, 전원합의체 판결] (Violation of the Military Service Act [Supreme Court, decided Nov. 1, 2018, case 2016do10912, en banc decision])

This is the link text on that web page. I have tried pretty much every method I can find online. Is it possible that this web page is somehow protected?

from selenium import webdriver
from bs4 import BeautifulSoup
# selenium webdriver chrome


driver = webdriver.Chrome("chromedriver.exe")

# get url
driver.get("http://law.go.kr/precSc.do?tabMenuId=tab103&query=")


# find_elements_... returns a list, so click the first match if there is one
elems = driver.find_elements_by_css_selector(
    "#viewHeightDiv > table > tbody > tr:nth-child(1) > td.s_tit > a")
if elems:
    elems[0].click()

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
notices = soup.find('div', id='bodyContent')

for n in notices:
    print(n)

So with my code, Selenium opens up and goes to the URL, but it does not click what I want it to, so the printed data is not what I was looking for.

I want to know how to crawl http://law.go.kr/precSc.do?tabMenuId=tab103&query=

Maybe there is a way without Selenium? I picked Selenium since this site's URLs are not fixed. The last fixed URL is http://law.go.kr/precSc.do?tabMenuId=tab103&query=

  • You don't need to click on a link to open a new page. Just find the link and make a new request. Commented Feb 19, 2019 at 11:48
  • link does not work Commented Feb 20, 2019 at 7:16
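The first comment's suggestion, finding the link and requesting its href directly instead of clicking, can be sketched with BeautifulSoup, which the question already imports. The markup below is a hand-made stand-in for the real result list, so the ids, classes, and href are assumptions, not the live page:

```python
from bs4 import BeautifulSoup

# Stand-in for the real result list; the live page builds #viewHeightDiv
# with JavaScript, so this structure is assumed from the question's selector.
html = """
<div id="viewHeightDiv">
  <table><tbody>
    <tr><td class="s_tit"><a href="/precInfoP.do?precSeq=204188">병역법위반</a></td></tr>
  </tbody></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
link = soup.select_one("#viewHeightDiv td.s_tit > a")
href = link["href"]
print(href)  # request this URL directly instead of clicking the element
```

With Selenium you would do the same via `elem.get_attribute("href")` followed by `driver.get(href)`.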

1 Answer


Here is code with the necessary waits to click the link and get the text:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

driver.get("http://law.go.kr/precSc.do?tabMenuId=tab103&query=")

# Wait for visibility of the first link in viewHeightDiv; necessary to get its text.
elem = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#viewHeightDiv a")))
# Get the first word of the link text. It will be used to check that the
# detail page has loaded, by matching against its title.
title = elem.text.strip().split(" ")[0]

elem.click()
# Wait for the h2 to contain the title we saved earlier.
wait.until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, "#viewwrapCenter h2"), title))

content = driver.find_element_by_css_selector("#viewwrapCenter").text
print(content)

5 Comments

Is it possible to save the data, go back to the original URL, get the 2nd link's data, and keep going? Commented Feb 20, 2019 at 7:16
You can try to get the href from all the a elements, then go through them all. You can find an example here.
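The loop suggested above can be sketched like this. The markup and hrefs are hypothetical; in the real crawler each collected href would be fetched with `driver.get()` (or `requests`) one at a time:

```python
from bs4 import BeautifulSoup

# Hypothetical list page with two case links; real ids come from the site.
html = """
<div id="viewHeightDiv"><table><tbody>
  <tr><td class="s_tit"><a href="/precInfoP.do?precSeq=1">case 1</a></td></tr>
  <tr><td class="s_tit"><a href="/precInfoP.do?precSeq=2">case 2</a></td></tr>
</tbody></table></div>
"""

soup = BeautifulSoup(html, "html.parser")
# Collect every href up front, so navigating away from the list page
# does not invalidate the elements you still need to visit.
hrefs = [a["href"] for a in soup.select("#viewHeightDiv td.s_tit > a")]
for href in hrefs:
    # in the real crawler: driver.get("http://law.go.kr" + href), then scrape
    print(href)
```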
Thanks. I have another question: when I'm trying to save that text into a txt file, how should I turn the text crawled from the web into a str?
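Selenium's `.text` property already returns a plain `str`, so no conversion is needed before writing it out. A minimal sketch, with a sample string standing in for the scraped content and an arbitrary filename:

```python
from pathlib import Path

# Stand-in for content = driver.find_element_by_css_selector("#viewwrapCenter").text,
# which is already a str.
content = "병역법위반 [대법원 2018. 11. 1., 선고, 2016도10912, 전원합의체 판결]"

# utf-8 keeps the Korean text intact; "case.txt" is an example name.
Path("case.txt").write_text(content, encoding="utf-8")
print(Path("case.txt").read_text(encoding="utf-8"))
```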
Also, the web addresses are not as simple as page 1, 2, 3, ...; each one has an id like licPrec199906, and when I try to go to such a URL it only opens the first page. For example, law.go.kr/precSc.do?tabMenuId=tab67 is the URL I started with; after clicking the 1st link the URL is law.go.kr/precSc.do?tabMenuId=tab67#licPrec204188, but when I type that URL it goes back to the URL I started with.
@KwanheeHwang yes, you're right, but it's possible. You need to create a new question for that.
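A likely reason the typed URL falls back to the first page: everything after `#` is a URL fragment, which the browser never sends to the server; the site's JavaScript reads it client-side to decide which case to show. That is why plain navigation to such a URL cannot work and clicks (or the site's real AJAX endpoint) are needed. The standard library makes the split visible:

```python
from urllib.parse import urldefrag

# Split the URL the comment above mentions into the part the server sees
# and the client-side fragment.
url = "http://law.go.kr/precSc.do?tabMenuId=tab67#licPrec204188"
base, frag = urldefrag(url)
print(base)  # what the server receives
print(frag)  # handled only by the page's JavaScript
```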
