Not getting all the attributes from element tags in XML using python

Question

I have an XML file, test1.xml, and my code reads only the first attribute of each element, not the other attributes.

test1.xml

<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Men's">
         <item_number sep = "help" dep = "paraug" note = "zempu">QWZ5671</item_number>
         <line cap = "delp" des = "laraug" fote = "cat">QWZ5671</line>
         <cool_number>QWZ5671</cool_number>
         <price>39.5</price>
      </catalog_item>
   </product>
</catalog>

code:

from lxml import etree
from collections import defaultdict

root_1 = etree.parse('test1.xml').getroot()
d1= []
for node in root_1.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():
        if x.attrib:
            item[x.attrib.keys()[0]].append(x.attrib.values()[0])
        if x.text.strip():
            item[x.tag].append(x.text.strip())
    d1.append(dict(item))

d1 = sorted(d1, key = lambda x: x['item_number'])
print(d1)

Current output: only the value of the first attribute of each element is fetched into the dictionary (sep from <item_number> and cap from <line>), not dep and note from <item_number> or des and fote from <line>.

[{'gender': ["Men's"], 'sep': ['help'], 'item_number': ['QWZ5671'], 'cap': ['delp'], 'line': ['QWZ5671'], 'cool_number': ['QWZ5671'], 'price': ['39.5']}]

Expected output: all the other attributes are fetched as well

[{'gender': ["Men's"], 'sep': ['help'], 'dep': ['paraug'], 'note': ['zempu'],'item_number': ['QWZ5671'], 'cap': ['delp'], 'des': ['laraug'], 'fote': ['cat'], 'line': ['QWZ5671'], 'cool_number': ['QWZ5671'], 'price': ['39.5']}]

I have not looked at your code in detail, but if you want all of something and are only getting the first value, the first thing I notice is a couple of places where you index with [0]. That is where I would look first, because you probably want to loop over that thing rather than take its first value.
– alani, Commented Jun 25, 2020 at 8:19
I see that Sushanth has now posted an answer for you. Hopefully you can see why it works and how it relates to what I wrote above. The approach to debugging this is just as important as the answer itself.
– alani, Commented Jun 25, 2020 at 8:31

sushanth · Accepted Answer · 2020-06-25 08:24:40Z

2

Try this:

for node in root_1.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():
        # iterate over every attribute, not just the first one
        for k, v in x.attrib.items():
            item[k].append(v)

        # x.text is None for elements without text, so guard before strip()
        if x.text and x.text.strip():
            item[x.tag].append(x.text.strip())

    d1.append(dict(item))
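
For anyone who wants to run the loop above without the file on disk, here is a minimal self-contained sketch. It makes a few assumptions that are not in the original post: the XML is parsed from a string (prolog and DOCTYPE omitted), the standard library's xml.etree.ElementTree stands in for lxml (findall, iter and attrib.items() are used the same way in both), and the names XML, collect_items and records are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample document from the question, without the prolog and DOCTYPE.
XML = """<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Men's">
         <item_number sep = "help" dep = "paraug" note = "zempu">QWZ5671</item_number>
         <line cap = "delp" des = "laraug" fote = "cat">QWZ5671</line>
         <cool_number>QWZ5671</cool_number>
         <price>39.5</price>
      </catalog_item>
   </product>
</catalog>"""


def collect_items(root):
    """Return one dict per <catalog_item> with every attribute and text value.

    Hypothetical helper: the name and signature are not from the original post.
    """
    records = []
    for node in root.findall('.//catalog_item'):
        item = defaultdict(list)
        for x in node.iter():
            # Loop over all attributes instead of indexing with [0].
            for k, v in x.attrib.items():
                item[k].append(v)
            # x.text can be None, so fall back to an empty string.
            text = (x.text or '').strip()
            if text:
                item[x.tag].append(text)
        records.append(dict(item))
    return records


records = collect_items(ET.fromstring(XML))
print(records)
```

With the sample data, the single resulting dictionary should contain dep, note, des and fote alongside the keys that were already being collected.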

edited Jun 25, 2020 at 8:24

answered Jun 25, 2020 at 8:20


2 Comments

alani Over a year ago

The if x.attrib check is probably unnecessary here. If it is an empty dictionary, the loop simply completes with 0 iterations.

Abhinandan Mehandiratta Over a year ago

Actually, there is a small issue here. I am using this code to read data from 2 XML files and compare them. My code compared only the item numbers that are common to both XMLs, so if their data was the same, it passed. Your code also compares item numbers that are not common to both XMLs. Could you please change the given code only as far as needed to get all the attributes? @Sushanth
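
The comparison code is not shown in the thread, so the following is only a sketch of one way to limit a comparison to item numbers present in both files. It assumes each record is a dictionary like the ones built in the answer and that item_number holds a single value per record. The helper names index_by_item_number and compare_common, and the sample records, are hypothetical.

```python
def index_by_item_number(records):
    """Map each record's first item_number value to the record itself.

    Hypothetical helper; records without an item_number are skipped.
    """
    return {rec['item_number'][0]: rec for rec in records if rec.get('item_number')}


def compare_common(records_a, records_b):
    """Return {item_number: bool} for item numbers present in both inputs."""
    a = index_by_item_number(records_a)
    b = index_by_item_number(records_b)
    common = a.keys() & b.keys()  # item numbers that appear in both files
    return {key: a[key] == b[key] for key in sorted(common)}


# Made-up sample records standing in for the two parsed XML files.
first = [{'item_number': ['QWZ5671'], 'price': ['39.5']},
         {'item_number': ['RRX9856'], 'price': ['42.5']}]
second = [{'item_number': ['QWZ5671'], 'price': ['39.5']},
          {'item_number': ['AAA0001'], 'price': ['10.0']}]

result = compare_common(first, second)
print(result)
```

Item numbers that appear in only one of the two lists are left out of the result rather than reported as mismatches.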
