
I'm trying to do some scraping for educational purposes. I just started and am fairly new to Python.

In Selenium, I am trying to scrape a product page: take each product's name, price, shipping price, and sale count, and put them all into a dictionary to be written to a text file for further use.

My problem is that this website shows 60 items per page, and each price is split into 4 elements: "$", "56", ".", and "32" (the cents). So when I loop over them, I either get a single number such as "16" for the price, or I get each product name paired with one fragment of a price, like:

Name: productname, Price: 15 Name: productname2, Price: "."

So each price is being split into separate values.

import sys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

def InitializeSearch():

    productnnn = []
    productppp = []
    
    driver = webdriver.Firefox()
    driver.get("https://www.aliexpress.us/w/wholesale-legos.html?spm=a2g0o.detail.search.0")
    driver.maximize_window()
    driver.execute_script("window.scrollTo(0, 1000)")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, 2000)")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, 3000)")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, 4000)")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, 4500)")
    time.sleep(3)
    
    
    productname = driver.find_elements(By.XPATH, "//h3[@class='kc_j0']") ##text is questionable
    productprice = driver.find_elements(By.XPATH, "//span[@style='font-size:20px;decimal_point:.;comma_style:,;currency-symbol:$;show-decimal:true;symbol_position:left']")
    productsalecount = driver.find_elements(By.XPATH, "//span[@class='kc_jv']")
    productshippingfee = driver.find_elements(By.XPATH, "//span[@class='ml_a1 ml_mn']")

#####This is where the code needs to go#####

InitializeSearch()

The line marked with #### above is where I was putting the code. I have tried quite a few different methods, including:

for n in productprice:
    pricedict = {}
    pricedict["Price"] = (n.text) ###the .text is required as driver returns a web elem
    print(pricedict)

I also tried nesting this inside an identical loop over the product names, and nesting them all inside another loop that counts through productname.

So basically, how do I take the driver elements I have here, cycle through all 60 products, and put everything into a dictionary that I can later append to a text file, even though each price is split into 4 elements?

Side note: the driver returns a web element. I can call encode(errors="ignore") and get a bytes object (the string starting with b'). When I decode it, it turns back into a web element unless I wrap it as ascii(encodedstring.decode(errors="ignore")).

How do I convert this in Selenium to a regular string object and not a web element?

TL;DR: I want to combine everything my find_elements calls return into one clean dictionary per item.

for n[0:3] in productprice: dictionary["price"] = (n.text)

I'm expecting the values from the HTML/JavaScript to be cleanly laid out in a dictionary for each individual item.

  • Scrolling and delaying like that is unlikely to be a reliable approach. Commented Apr 9 at 6:57
  • The driver always gives you a web element, so you can call find on it again to search for details inside that element, pass it to execute_script, or send keyboard/mouse events to it. This lets you first find the rows in a table and then search for the values in the cells/columns of each row. Commented Apr 9 at 11:22
  • If you use find_elements (with the s at the end), you get a list of all matching elements (possibly a list with one element, or an empty list), and you may need a for loop to work with each element separately and get its text or another value. If you need only the first element, you can use find_element (without the s). Commented Apr 9 at 11:25
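The "find the row first, then search inside it" idea from the comments could be sketched roughly as below. This is only an illustration: cards_to_dicts and first_text are hypothetical helper names, and the selectors you pass in are assumptions that have to match the live page. The helpers only rely on each element offering find_elements(by, selector) and a .text attribute, which Selenium web elements do.

```python
CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def first_text(parent, selector):
    """Return the stripped text of the first match inside parent, or None."""
    found = parent.find_elements(CSS, selector)
    return found[0].text.strip() if found else None


def cards_to_dicts(cards, selectors):
    """Build one dict per product card, searching inside that card only.

    selectors maps a field name to a CSS selector (assumed, page-specific).
    """
    return [
        {field: first_text(card, sel) for field, sel in selectors.items()}
        for card in cards
    ]
```

Because every lookup is scoped to one card, a product with a missing field gets None for that field instead of shifting the values of the following products, which is what tends to happen when separate page-wide lists are paired up by index. With a real driver, the call might look like cards_to_dicts(driver.find_elements(By.CSS_SELECTOR, "a.search-card-item"), {"Name": "h3"}), assuming that card selector matches.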

1 Answer

You can select the parent div element; its .text property gives you the full price as one string.

Also, your selectors are going to fail because the class prefixes are dynamically generated, so they will frequently not find the elements.

In the following, I've used chromedriver instead of Firefox and updated the selectors so that you can find the elements every time.

If you want to exclude the $ sign from the price, you can format it later.
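As a rough sketch of that later formatting step, a small helper along these lines could turn the displayed price text into a number. parse_price is a hypothetical name, and the code assumes "." is the decimal separator and "," a thousands separator, which is what the style attribute quoted in the question suggests.

```python
import re
from decimal import Decimal

# Matches the first run of digits with an optional decimal part.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text):
    """Convert text such as '$56.32' to Decimal('56.32'); None if no number."""
    # Drop all whitespace (including newlines between fragments) and
    # thousands separators. Assumption: ',' is never the decimal separator.
    compact = "".join(text.split()).replace(",", "")
    match = _NUMBER.search(compact)
    return Decimal(match.group()) if match else None
```

Decimal is used here rather than float so that cents are kept exactly; float(match.group()) would also work if exactness does not matter for your use.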

import sys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time

def InitializeSearch():

    productnnn = []
    productppp = []
    chrome_options = Options()
    chrome_options.add_argument('--start-maximized')
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36")  # Set user agent
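    driver = webdriver.Chrome(options=chrome_options)  # pass options by keyword; older Selenium 4 releases treat the first positional argument as the driver path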
    driver.get("https://www.aliexpress.us/w/wholesale-legos.html?spm=a2g0o.detail.search.0")

    driver.execute_script("arguments[0].scrollIntoView();", driver.find_element(By.CSS_SELECTOR,'div.footer-copywrite'))
    time.sleep(1)
    
    
    card_items = driver.find_elements(By.CSS_SELECTOR, "a.search-card-item")
    productname = driver.find_elements(By.CSS_SELECTOR, "a.search-card-item h3") 
    productprice = driver.find_elements(By.CSS_SELECTOR, "a.search-card-item div[class$=_k1]")
    productsalecount = driver.find_elements(By.CSS_SELECTOR, "a.search-card-item span[class$=_jv]")
    productshippingfee = driver.find_elements(By.XPATH, "//span[@class='ml_a1 ml_mn']")

    for n in productprice:
        pricedict = {}
        pricedict["Price"] = (n.text) ###the .text is required as driver returns a web elem
        print(pricedict)

    driver.quit()

InitializeSearch()
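The question also asks about saving the dictionaries to a text file for later use. One possible way, sketched here with hypothetical helper names (append_records, read_records) and an arbitrary file name, is to write one JSON object per line so the file stays plain text but can be read back into dictionaries.

```python
import json


def append_records(path, records):
    """Append each dict to the file at path as one JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps non-ASCII product names readable.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

For example, append_records("products.txt", [pricedict]) inside or after the loop would add each item to the file. Note that the values must be plain strings or numbers (for instance n.text, which is already a regular str), not web elements.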