
I want to extract the text from each value tag. My XML fragment and my attempt are given below:

<datas>
  <data>
    <column datatype='string' name='[Sub-Category (group)]' role='dimension' type='nominal'>
      <calculation class='categorical-bin' column='[Product Sub-Category]' new-bin='false'>
        <bin value='&quot;Envelopes&quot;'>
          <value>&quot;Envelopes&quot;</value>
          <value>&quot;Labels&quot;</value>
          <value>&quot;Pens &amp; Art Supplies&quot;</value>
          <value>&quot;Rubber Bands&quot;</value>
          <value>&quot;Scissors, Rulers and Trimmers&quot;</value>
        </bin>
      </calculation>
    </column>
  </data>
</datas>

My attempt:

root = 'myxmlfile.xml'
valuelist = []
for i in root.findall('./datas/data/column/calculation/bin')
    val  = i.find('value')
    if val:
       for j in val:
           valuelist.append(j.text)

I didn't get the proper result.

You are also not closing the column tag in your XML file. Commented Feb 28, 2018 at 9:57

3 Answers


Try this:

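import xml.etree.ElementTree as ET

# ET.parse accepts a file path directly, so there is no need to open the file yourself
doc = ET.parse('/your/path_to_file/data.xml').getroot()
valuelist = []
# './/bin' matches bin elements at any depth below the root
for i in doc.findall('.//bin'):
    for v in i.findall('value'):
        valuelist.append(v.text)
print(valuelist)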

Output:

['"Envelopes"', '"Labels"', '"Pens & Art Supplies"', '"Rubber Bands"', '"Scissors, Rulers and Trimmers"']


This might help

# -*- coding: utf-8 -*-
s = """<datas>
  <data>
<column datatype='string' name='[Sub-Category (group)]' role='dimension' type='nominal'>
              <calculation class='categorical-bin' column='[Product Sub-Category]' new-bin='false'>
                <bin value='&quot;Envelopes&quot;'>
                  <value>&quot;Envelopes&quot;</value>
                  <value>&quot;Labels&quot;</value>
                  <value>&quot;Pens &amp; Art Supplies&quot;</value>
                  <value>&quot;Rubber Bands&quot;</value>
                  <value>&quot;Scissors, Rulers and Trimmers&quot;</value>
                </bin>
              </calculation>
    </column>
 </data>
</datas>"""

import xml.etree.ElementTree as et
tree = et.fromstring(s)
for i in tree.findall('.//data/column/calculation/bin'):
    for j in i.findall('value'):
        print(j.text)

Output:

"Envelopes"
"Labels"
"Pens & Art Supplies"
"Rubber Bands"
"Scissors, Rulers and Trimmers"



Rakesh's answer is great; I just thought I'd add a bit of explanation of why your code wasn't working.

To begin with, you need to parse your XML into an ElementTree. This is basically a Python object with a tree-like structure of elements and subelements that corresponds to your XML, and it is something you can then work with in Python. In your code, root is just a filename string, not a parsed tree, so calling findall() on it cannot work.

If your XML is in a file (rather than just a string within your code), you can do:

tree = ET.parse('myxmlfile.xml')

The root is then the "outermost" element of this tree, which you need to get hold of to be able to work your way around the tree and find elements etc:

root = tree.getroot()

(If you do ET.fromstring(s), this returns the root element so you don't need the getroot step.)
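
As a minimal sketch of the difference between the two entry points: the XML string below is made up for illustration, and io.StringIO stands in for a real file so the snippet runs without anything on disk.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of myxmlfile.xml
xml_text = "<datas><data/></datas>"

# ET.parse takes a filename or file object and returns an ElementTree,
# so the root element has to be fetched with getroot()
tree = ET.parse(io.StringIO(xml_text))
root_from_tree = tree.getroot()

# ET.fromstring takes the XML text itself and returns the root element directly
root_from_string = ET.fromstring(xml_text)

print(type(tree).__name__, root_from_tree.tag, root_from_string.tag)
```

Both routes end up at the same datas root element; only the number of steps differs.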

In your example, root is the datas element, which was one of your problems: your path doesn't need to include 'datas' as that's where you're starting from already.
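
A small sketch of why the extra 'datas' step finds nothing, using a cut-down version of your XML (the shortened string is only for illustration):

```python
import xml.etree.ElementTree as ET

# Cut-down version of the XML in the question
xml_text = (
    "<datas><data><column><calculation>"
    "<bin><value>x</value></bin>"
    "</calculation></column></data></datas>"
)
root = ET.fromstring(xml_text)
root_tag = root.tag  # the root already is the datas element

# Paths are evaluated relative to root, so this looks for a datas child
# inside datas, which does not exist
wrong = root.findall('./datas/data/column/calculation/bin')

# Starting the path at data, a direct child of root, matches the bin
right = root.findall('./data/column/calculation/bin')

print(root_tag, len(wrong), len(right))
```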

val = i.find('value') will only return the first value element, not a list of all the value elements which is what you want. So when you try to do for j in val, Python is actually trying to find subelements of the value element (which don't exist) so it doesn't have anything to append to valuelist. You need to use findall() here, and if you combine this with a for loop, then you don't need to do the if val check, as the for loop simply won't run if findall() comes back empty.
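
To make the find() versus findall() difference concrete, here is a minimal sketch on a made-up two-value bin:

```python
import xml.etree.ElementTree as ET

# Made-up bin with two values, for illustration only
bin_elem = ET.fromstring("<bin><value>a</value><value>b</value></bin>")

# find() returns only the first matching element (or None if there is none)
first = bin_elem.find('value')
first_text = first.text

# Iterating over that single element walks its children, and it has none
children_of_first = list(first)

# findall() returns a list of every matching child
texts = [v.text for v in bin_elem.findall('value')]

# With no matches findall() returns an empty list, so a for loop just does nothing
missing = bin_elem.findall('no-such-tag')

print(first_text, children_of_first, texts, missing)
```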

Putting all this together:

import xml.etree.ElementTree as ET

tree = ET.parse('myxmlfile.xml')  # change to wherever your file is located
root = tree.getroot()

binlist = []
for i in root.findall('./data/column/calculation/bin'):
    valuelist = []
    for j in i.findall('value'):
        valuelist.append(j.text)
    binlist.append(valuelist)

binlist is then a list, with each item in the list being a list of values for that bin.
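
For illustration, this is roughly what binlist looks like when there are two bins. The XML here is hypothetical (your fragment only has one bin), and the nested list comprehension is just a more compact way of writing the same two loops:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-bin document, made up for illustration only
xml_text = """<datas><data><column><calculation>
  <bin value="first"><value>a</value><value>b</value></bin>
  <bin value="second"><value>c</value></bin>
</calculation></column></data></datas>"""

root = ET.fromstring(xml_text)

# One inner list of value texts per bin element
binlist = [
    [v.text for v in b.findall('value')]
    for b in root.findall('./data/column/calculation/bin')
]
print(binlist)
```

This prints [['a', 'b'], ['c']], one inner list per bin.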

If you only have one bin, then you can simplify the second half of the code:

import xml.etree.ElementTree as ET

tree = ET.parse('myxmlfile.xml')  # change to wherever your file is located
root = tree.getroot()

bin = root.find('./data/column/calculation/bin')
valuelist = []
for j in bin.findall('value'):
    valuelist.append(j.text)

Note that I've used ET rather than et for the import of ElementTree (this seems to be the convention). This also assumes that datas is the root element of your XML. If the snippet you've given is nested somewhere inside a bigger XML file, you'll need to get to that element first by doing something like:

bin = root.find('<path to bin element>')
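
If you don't know the exact path, one option is a './/bin' search, which looks at any depth below the root. A sketch, assuming a made-up wrapper structure (the workbook and datasources elements are hypothetical, not taken from your file):

```python
import xml.etree.ElementTree as ET

# Hypothetical larger document with the fragment nested inside it
xml_text = """<workbook><datasources><datas><data><column><calculation>
  <bin><value>a</value><value>b</value></bin>
</calculation></column></data></datas></datasources></workbook>"""

root = ET.fromstring(xml_text)

# './/bin' finds the first bin element anywhere below root
first_bin = root.find('.//bin')

# find() returns None when nothing matches, so guard before using the result
valuelist = [] if first_bin is None else [v.text for v in first_bin.findall('value')]
print(valuelist)
```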

