0

With reference to this post, I got a solution from @DebanjanB, but I'm unable to use it for all of my PRODUCT TYPE values; it seems to work only for Acrylics and Coal Tar. How can I use it for every PRODUCT TYPE?

This is the solution:

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Acrylics']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

But when I use it for Alkyds:

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Alkyds']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

This doesn't work.

Any suggestions on how to make this work?

Thanks

5
  • I know I mentioned this in my answer on the other post, but all the data is loaded when the page first loads. If you just use requests, you will easily be able to scrape all the data within seconds. I would at least try my code from the last post; you will see that all the data shows up. Commented Jun 4, 2019 at 16:57
  • @antfuentes87 I get your point about using requests to extract all the product lines without Selenium, but in the end I want to map each product to its PRODUCT TYPE. That is why I first click an element in the PRODUCT TYPE dropdown, which acts as a filter. Commented Jun 5, 2019 at 4:10
  • Oh OK, I did not get that part. Well, that is super easy. The li with the class topLevel has a data-types attribute which tells you the type. You can add that right into the dictionary (look at my answer on the other question). It can still be done without Selenium, using requests only. Like I said, ALL the data is in the HTML of the response. Commented Jun 5, 2019 at 4:22
  • Yes, now I'm able to extract that as well from the link. Thanks for getting me to use requests rather than Selenium. Commented Jun 5, 2019 at 4:34
  • No problem, glad I could make another person see the light :) Commented Jun 5, 2019 at 4:36
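
The idea described in the comments above (reading the type from the data-types attribute of each li.topLevel) can be sketched roughly as follows. This is a minimal illustration, not the code from the other post: SAMPLE_HTML is hypothetical markup modelled on the selectors quoted in this thread, "Example Product" is a made-up entry, and the real page may be structured differently. To run it against the live site, you would replace SAMPLE_HTML with the text of the requests response.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the selectors in this thread;
# the real page may differ. "Example Product" is invented for illustration.
SAMPLE_HTML = """
<ul id="productList">
  <li class="topLevel" data-types="Alkyds">
    <h5 class="name"><a href="/products/product-details/?prod=0801">Carbocoat 115</a></h5>
  </li>
  <li class="topLevel" data-types="Acrylics">
    <h5 class="name"><a href="/products/product-details/?prod=XXXX">Example Product</a></h5>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Map each product name to the value of its parent li's data-types attribute.
product_to_type = {}
for li in soup.select("ul#productList li.topLevel"):
    link = li.select_one('a[href^="/products/product-details/?prod="]')
    if link is not None:
        product_to_type[link.get_text(strip=True)] = li.get("data-types")

print(product_to_type)
```

Because the type is read straight from the markup, no click on the PRODUCT TYPE dropdown is needed to build the mapping.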

3 Answers 3

2

I tried the following code, and it returns the products for the product type you are after.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver=webdriver.Chrome()
driver.get("http://www.carboline.com/products/")
driver.maximize_window()
# find_element(By..., ...) works in Selenium 3 and 4; find_element_by_* was removed in Selenium 4
driver.find_element(By.CSS_SELECTOR, 'a.close-privacy-cookie.acceptButton').click()
element=WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"h5#Typeh5 span")))
element.click()
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//div[@aria-labelledby='Typeh5']//ul[@id='Type']//li//label[contains(.,'Alkyds')]"))).click()
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.XPATH, "//ul[@id='productList']//li[@class='topLevel' and @data-types='Alkyds']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

Output:

['Carbocoat 115', 'Carbocoat 115 VOC', 'Carbocoat 116', 'Carbocoat 140', 'Carbocoat 150 Universal Primer', 'Carbocoat 153', 'Carbocoat 2600', 'Carbocoat 2900', 'Carbocoat 2901', 'Carbocoat 30', 'Carbocoat 45 Industrial Enamel', 'Carbocoat 56', 'Carbocoat 70', 'Carbocoat 8215', 'Carbocoat 8215 Non-Skid', 'Carbocoat 8215 VOC', 'Carbocoat 8216 Non-Skid', 'Carbocoat 8225', 'Carbocoat 8229 Non-Lift Primer', 'Carbocoat 8239', 'Carbocoat 8245', 'Carbocoat 8259 WR', 'Carbocoat 8287 WR', 'Carbocoat OEM Universal Primer']
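
To reuse the same locator for other product types, one option is to build the XPath from the type name. The sketch below is only an illustration: build_product_xpath is a hypothetical helper, and the product_types list is an assumption (the real labels should be read from the PRODUCT TYPE dropdown). It only constructs the strings; each one would then be passed to the same WebDriverWait call as above, after clicking the matching label.

```python
def build_product_xpath(product_type: str) -> str:
    """Return the XPath used above with the given product type substituted in.

    Hypothetical helper; assumes the type name contains no single quotes.
    """
    return (
        "//ul[@id='productList']"
        f"//li[@class='topLevel' and @data-types='{product_type}']"
        "//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]"
    )


# Assumed list of types, taken from the names mentioned in the question.
product_types = ["Acrylics", "Alkyds", "Coal Tar"]

# One locator per product type.
xpaths = {name: build_product_xpath(name) for name in product_types}

for name, xpath in xpaths.items():
    print(name, "->", xpath)
```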

2 Comments

The PRODUCT TYPE is not required. Kindly refer to the link to my previous post: stackoverflow.com/questions/56439472/…
@deepesh: Updated the code with the relevant click (Alkyds).
1

Does this get what you need?

import pandas as pd
from bs4 import BeautifulSoup
import requests

response = requests.get('http://www.carboline.com/products/')
soup = BeautifulSoup(response.text, 'html.parser')

products = soup.find('ul', {'id':'productList'})
lists = products.find_all('li',{'class':'topLevel'})

# Collect one [text, href] row per product, then build the frame once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for each in lists:
    a = each.find('a')
    rows.append([a.text, a['href']])

results = pd.DataFrame(rows, columns=['product_type', 'href'])

Output:

print(results)
                       product_type                                              href
0                  A/D Firefilm III              /products/product-details/?prod=35AD
1                A/D Firefilm III C              /products/product-details/?prod=48AD
2                  A/D TC-55 SEALER              /products/product-details/?prod=30AD
3                  Accelerator A-20              /products/product-details/?prod=50AD
4                    Acrilast Caulk              /products/product-details/?prod=0177
5         Add-2 Mildewcide Additive              /products/product-details/?prod=0658
6                      Additive 101              /products/product-details/?prod=P262
7                       Additive 47              /products/product-details/?prod=0547
8                     Additive 8504              /products/product-details/?prod=8504
9                     Additive 8505              /products/product-details/?prod=8505
10                    Additive 8506              /products/product-details/?prod=8506
11                    Additive 8509              /products/product-details/?prod=8509
12                Bitumastic 300 LH              /products/product-details/?prod=0168
13                 Bitumastic 300 M  /products/product-details/?prod=0165&global=true
14             Bitumastic 300 M COE              /products/product-details/?prod=0391
15                    Bitumastic 50              /products/product-details/?prod=0025
16                    Carbocoat 115              /products/product-details/?prod=0801
17                Carbocoat 115 VOC              /products/product-details/?prod=206F
18                    Carbocoat 116              /products/product-details/?prod=0295
19                    Carbocoat 140              /products/product-details/?prod=228F
20   Carbocoat 150 Universal Primer  /products/product-details/?prod=0808&global=true
21                    Carbocoat 153              /products/product-details/?prod=0632
22                   Carbocoat 2600              /products/product-details/?prod=0005
23                   Carbocoat 2900              /products/product-details/?prod=0010
24                   Carbocoat 2901              /products/product-details/?prod=0012
25                     Carbocoat 30              /products/product-details/?prod=P483
26   Carbocoat 45 Industrial Enamel              /products/product-details/?prod=0171
27                     Carbocoat 56              /products/product-details/?prod=DM56
28                     Carbocoat 70              /products/product-details/?prod=1519
29                   Carbocoat 8215              /products/product-details/?prod=8215
..                              ...                                               ...
470                       Thinner 2              /products/product-details/?prod=0522
471                      Thinner 21              /products/product-details/?prod=0521
472                     Thinner 213              /products/product-details/?prod=0555
473                     Thinner 214              /products/product-details/?prod=0556
474                     Thinner 215              /products/product-details/?prod=0557
475                     Thinner 221              /products/product-details/?prod=0546
476                     Thinner 224              /products/product-details/?prod=0574
477                   Thinner 225 E              /products/product-details/?prod=0591
478                     Thinner 228              /products/product-details/?prod=0570
479                     Thinner 230              /products/product-details/?prod=0551
480                     Thinner 231              /products/product-details/?prod=0516
481                     Thinner 234              /products/product-details/?prod=0562
482                     Thinner 235              /products/product-details/?prod=0563
483                   Thinner 236 E              /products/product-details/?prod=0564
484                     Thinner 238              /products/product-details/?prod=0566
485                     Thinner 241              /products/product-details/?prod=0374
486                   Thinner 242 E              /products/product-details/?prod=T242
487                   Thinner 243 E              /products/product-details/?prod=T243
488                     Thinner 246              /products/product-details/?prod=T246
489                     Thinner 248              /products/product-details/?prod=215F
490                      Thinner 25              /products/product-details/?prod=0525
491                     Thinner 254              /products/product-details/?prod=0631
492                      Thinner 26              /products/product-details/?prod=0526
493                      Thinner 33              /products/product-details/?prod=0533
494                      Thinner 38              /products/product-details/?prod=TH39
495                      Thinner 45              /products/product-details/?prod=0545
496                      Thinner 72              /products/product-details/?prod=0572
497                      Thinner 76              /products/product-details/?prod=0576
498             Zinc Filler Type II              /products/product-details/?prod=0229
499            Zinc Filler Type III              /products/product-details/?prod=0224

[500 rows x 2 columns]

2 Comments

I already tried giving him this answer on the other post, but I think he is really keen on using Selenium only, which is a shame because all the data loads when the site first loads. There is no need for Selenium when you can just parse the HTML that comes back from requesting the page.
Totally agree. Selenium is nice, but really only ideal as a last resort.
1

I would shorten this as follows, using the CSS starts-with operator (^=) to do a substring match on the href attribute value:

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

r = requests.get('http://www.carboline.com/products/')
soup = bs(r.content, 'lxml')
df = pd.DataFrame([(item.text, 'http://www.carboline.com' + item['href']) for item in soup.select('[href^="/products/product-details/?prod="]')], columns = ['product', 'link'])
print(df)
