
I have an XML file, test1.xml, and my code reads only the first attribute of each element, not the remaining attributes.

test1.xml

<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Men's">
         <item_number sep = "help" dep = "paraug" note = "zempu">QWZ5671</item_number>
         <line cap = "delp" des = "laraug" fote = "cat">QWZ5671</line>
         <cool_number>QWZ5671</cool_number>
         <price>39.5</price>
      </catalog_item>
   </product>
</catalog>

code:

from lxml import etree
from collections import defaultdict

root_1 = etree.parse('test1.xml').getroot()
d1= []
for node in root_1.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():
        if x.attrib:
            item[x.attrib.keys()[0]].append(x.attrib.values()[0])
        if x.text.strip():
            item[x.tag].append(x.text.strip())
    d1.append(dict(item))

d1 = sorted(d1, key = lambda x: x['item_number'])
print(d1)

Current output: only the value of the first attribute of each element is fetched into the dictionary, i.e. sep from <item_number> and cap from <line>. The attributes dep and note from <item_number> and des and fote from <line> are missing.

[{'gender': ["Men's"], 'sep': ['help'], 'item_number': ['QWZ5671'], 'cap': ['delp'], 'line': ['QWZ5671'], 'cool_number': ['QWZ5671'], 'price': ['39.5']}]

Expected output: all the other attributes should be fetched as well

[{'gender': ["Men's"], 'sep': ['help'], 'dep': ['paraug'], 'note': ['zempu'],'item_number': ['QWZ5671'], 'cap': ['delp'], 'des': ['laraug'], 'fote': ['cat'], 'line': ['QWZ5671'], 'cool_number': ['QWZ5671'], 'price': ['39.5']}]
  • I have not looked at your code in any detail, but if you want all of something and are only getting the first value, the immediate thing I notice is a couple of places where you index something with [0]. That is the first place I would look, because you probably want to loop over that thing rather than take its first value. Commented Jun 25, 2020 at 8:19
  • I see that Sushanth has now posted an answer for you. Hopefully you can see why this works, and how it relates to what I wrote above. The approach for how you would debug this is just as important as the answer itself. Commented Jun 25, 2020 at 8:31

1 Answer

Try this:

for node in root_1.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():
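        # iterate over every attribute, not just the first one
        for k, v in x.attrib.items():
            item[k].append(v)

        # x.text is None for elements without text, so check before stripping
        if x.text and x.text.strip():
            item[x.tag].append(x.text.strip())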

    d1.append(dict(item))
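
For reference, here is a self-contained sketch of the same loop that can be run without the file on disk. It makes a few assumptions that are not in the original code: the XML is inlined as a string (without the stylesheet and DOCTYPE lines), the standard-library xml.etree.ElementTree stands in for lxml so that no extra package is needed (findall, iter and attrib.items behave the same way for this purpose), and the helper name collect_items is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Inline copy of the sample data from the question (prolog lines omitted).
XML = """<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Men's">
         <item_number sep="help" dep="paraug" note="zempu">QWZ5671</item_number>
         <line cap="delp" des="laraug" fote="cat">QWZ5671</line>
         <cool_number>QWZ5671</cool_number>
         <price>39.5</price>
      </catalog_item>
   </product>
</catalog>"""


def collect_items(root):
    """Hypothetical helper: one dict per <catalog_item>, with all attributes."""
    records = []
    for node in root.findall('.//catalog_item'):
        item = defaultdict(list)
        for x in node.iter():
            # Loop over every attribute instead of indexing with [0].
            for k, v in x.attrib.items():
                item[k].append(v)
            # x.text can be None, so fall back to an empty string.
            text = (x.text or '').strip()
            if text:
                item[x.tag].append(text)
        records.append(dict(item))
    return records


d1 = collect_items(ET.fromstring(XML))
d1 = sorted(d1, key=lambda rec: rec['item_number'])
print(d1)
```

With the sample data this should give one dictionary containing gender, sep, dep, note, cap, des and fote alongside the element texts, which is the expected output from the question.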

2 Comments

The if x.attrib check is probably unnecessary here. If it is an empty dictionary, the loop will simply complete with 0 iterations.
Actually, there is a small issue here. I am using this code to read data from two XML files and compare it. My code compared data only for the item numbers that are common to both XMLs, so if that data was the same, the check passed. Your code also compares the item numbers that are not common to both XMLs. Could you please change only the given code so that it just fetches all the attributes? @Sushanth
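
The comparison code is not shown in the thread, so the following is only a guess at what restricting the comparison to common item numbers might look like. It assumes each record is a dict as built above, with 'item_number' holding a one-element list. The helper names index_by_item_number and diff_common and the second and third sample records are invented for illustration.

```python
def index_by_item_number(records):
    """Map the first item_number value of each record to the record itself."""
    return {
        rec['item_number'][0]: rec
        for rec in records
        if rec.get('item_number')  # skip records without an item number
    }


def diff_common(records_a, records_b):
    """Return differing records, looking only at item numbers present in both."""
    a = index_by_item_number(records_a)
    b = index_by_item_number(records_b)
    common = sorted(a.keys() & b.keys())  # set intersection of the keys
    return {key: (a[key], b[key]) for key in common if a[key] != b[key]}


# Made-up sample records in the same shape as d1 from the question.
records_a = [
    {'item_number': ['QWZ5671'], 'price': ['39.5']},
    {'item_number': ['AAA0001'], 'price': ['10']},
]
records_b = [
    {'item_number': ['QWZ5671'], 'price': ['42.0']},
    {'item_number': ['ZZZ9999'], 'price': ['5']},
]

diff = diff_common(records_a, records_b)
print(diff)
```

Item numbers that appear in only one of the two lists (AAA0001 and ZZZ9999 here) are ignored, so collecting more attributes per record does not by itself widen the set of items being compared.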
