I started playing with Python and came across something that should be very simple, but I cannot make it work. I have the HTML below:

<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-defaultpageno="1" data-descfilter="" class="columns4 columnsmobile2" data-noproductstext="No Products Found" data-defaultsortorder="rank" data-fltrselectedcurrency="GBP" data-category="Category1" data-productidstodisableshortcutbuttons="976516" data-defaultpagelength="100" data-searchtermcategory="" data-noofitemsingtmpost="25">
    <ul id="navlist" class="s-productscontainer2">

What I need is to use parser.xpath to get the value of the data-category attribute.

I'm trying, for example:

cgy = xpath('//div["data-category"]')

What am I doing wrong?

  • What is parser? And what are you expecting it to return? Commented Jun 3, 2019 at 14:43
  • Ignore "parser."; I need to understand how to get the string "Category1", which is assigned to data-category in the HTML above. Commented Jun 3, 2019 at 14:55
  • @kunduK has given the answer I would have given (+), since you need to extract the attribute value. Use select and an index if it is not the first match, or select_one if it is the first. Commented Jun 3, 2019 at 15:04
  • Just fyi, your <div> and <ul> tags aren't closed; don't know if that's relevant to anything, though Commented Jun 3, 2019 at 15:51

2 Answers


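Try Selenium WebDriver with Python.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("url here")
# The find_element_by_* helpers were removed in later Selenium 4 releases;
# find_element with By.XPATH is the current form.
element = driver.find_element(By.XPATH, "//div[@id='productlistcontainer']")
print(element.get_attribute('data-category'))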

Or you can use BeautifulSoup, which is a Python library.

from bs4 import BeautifulSoup

doc = """
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-defaultpageno="1" data-descfilter="" class="columns4 columnsmobile2" data-noproductstext="No Products Found" data-defaultsortorder="rank" data-fltrselectedcurrency="GBP" data-category="Category1" data-productidstodisableshortcutbuttons="976516" data-defaultpagelength="100" data-searchtermcategory="" data-noofitemsingtmpost="25">
    <ul id="navlist" class="s-productscontainer2">
"""

soup = BeautifulSoup(doc,'html.parser')
print(soup.select_one('div#productlistcontainer')['data-category'])
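As a comment on the question suggests, select returns every match (so it can be indexed), while select_one returns only the first. A minimal sketch of both follows, assuming the bs4 package is installed; the second div and its "Category2" value are hypothetical, added only to show indexing, and the markup is shortened for readability.

```python
from bs4 import BeautifulSoup

# Shortened markup; the second div is a hypothetical addition for illustration.
doc = """
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-category="Category1">
    <ul id="navlist" class="s-productscontainer2"></ul>
</div>
<div id="otherlist" data-category="Category2"></div>
"""

soup = BeautifulSoup(doc, 'html.parser')

# The CSS attribute selector matches every div that carries data-category.
matches = soup.select('div[data-category]')
categories = [div['data-category'] for div in matches]
print(categories)

# select_one returns the first match or None; .get() avoids a KeyError
# when the attribute is absent.
first = soup.select_one('div#productlistcontainer')
category = first.get('data-category') if first is not None else None
print(category)
```

Indexing the list (for example categories[1]) picks a specific match when the element you want is not the first one.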

Personally, I use lxml's html module for parsing because, in my opinion, it is fast and easy to work with. I could have shortened how the category is extracted, but I wanted to show as much detail as possible so you can understand what is going on.

from lxml import html

def extract_data_category(tree):
    elements = [
        e
        for e in tree.cssselect('div#productlistcontainer')
        if e.get('data-category') is not None
    ]
    element = elements[0]
    content = element.get('data-category')
    return content

response = """
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-defaultpageno="1" data-descfilter="" class="columns4 columnsmobile2" data-noproductstext="No Products Found" data-defaultsortorder="rank" data-fltrselectedcurrency="GBP" data-category="Category1" data-productidstodisableshortcutbuttons="976516" data-defaultpagelength="100" data-searchtermcategory="" data-noofitemsingtmpost="25">
<ul id="navlist" class="s-productscontainer2">
"""

tree = html.fromstring(response)
data_category = extract_data_category(tree)
print(data_category)
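Since the question asked about XPath specifically, the same lookup can also be sketched with lxml's xpath method instead of cssselect. In XPath, @name selects an attribute, whereas the predicate ["data-category"] in the question is just a non-empty string literal, which is always true and so matches every div. The markup below is shortened and the variable names are illustrative.

```python
from lxml import html

# Shortened version of the markup from the question, with the tags closed.
page = """
<h2 class="sr-only">Available Products</h2>
<div id="productlistcontainer" data-category="Category1" data-fltrselectedcurrency="GBP">
    <ul id="navlist" class="s-productscontainer2"></ul>
</div>
"""

tree = html.fromstring(page)

# '@data-category' selects the attribute itself; xpath() returns a list
# containing one string per match.
values = tree.xpath('//div[@id="productlistcontainer"]/@data-category')
print(values)

# Alternative: match any div that has the attribute, regardless of its id.
any_div = tree.xpath('//div[@data-category]/@data-category')
print(any_div)
```

Because xpath() always returns a list, take values[0] only after checking that the list is not empty.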

4 Comments

Thank you. Do you know what the equivalent of the above would be using import requests and requests.get(xxx)?
What website are you trying to request? I think that should be a separate question, though, as you did not ask anything about requests in this question.
Your solution works fine if data-category exists in the HTML. If it hits a page where data-category does not exist, I get the error "Index out of range", which means the list is empty. I tried adding an if after elements, but this doesn't change anything: element = elements.append('abc') if len(elements) == 0 else elements[0]
Well, of course; error handling can easily be added. You could even save the response to a file and look it over to see why data-category is not there. This was just a simple example.
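One way the error handling mentioned above might look is sketched below, using an XPath lookup and a fallback value. Note that list.append returns None, so the one-liner from the comment would assign None rather than 'abc' when the list is empty. The default parameter and the two test fragments are assumptions made for illustration.

```python
from lxml import html

def extract_data_category(tree, default=None):
    """Return the first data-category value, or `default` if none is found."""
    values = tree.xpath('//div[@id="productlistcontainer"]/@data-category')
    # An empty list is falsy, so this avoids the IndexError from values[0].
    return values[0] if values else default

# Hypothetical fragments: one with the attribute, one without.
with_attr = html.fromstring(
    '<div id="productlistcontainer" data-category="Category1"></div>'
)
without_attr = html.fromstring('<div id="productlistcontainer"></div>')

found = extract_data_category(with_attr)
fallback = extract_data_category(without_attr, default='abc')
print(found)
print(fallback)
```

Returning a default keeps the calling code simple; raising a custom exception instead would be a reasonable choice if a missing attribute should stop processing.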
