
I'd like to search an XML file by text value and return the id of the element that contains it. I've looked through the Python library's XML modules but only saw examples for searching by element/node name. I have a simplified XML sample below; for example, I'd like to search for "3x3 Eyes" and get back "2". The search should match the exact text, ignoring case. There will normally be multiple title entries under each anime, so the search can stop at the first match. Thanks

<?xml version="1.0" encoding="UTF-8"?>
<animetitles>
  <anime aid="1">
    <title type="official" xml:lang="fr">Crest of the Stars</title>
    <title type="official" xml:lang="fr">Crest of the Stars</title>
  </anime>
  <anime aid="2">
    <title type="official" xml:lang="en">3x3 Eyes</title>
  </anime>
  <anime aid="3">
    <title type="official" xml:lang="en">3x3 Eyes: Legend of the Divine Demon</title>
  </anime>
</animetitles>
  • Do you want an exact match or also a partial match? Should this example return only 2, or [2, 3]? Commented Aug 18, 2013 at 1:09
  • Well originally I was looking for exact match, but since you mentioned that I might go with starts with because of the nature of what I'm doing. Either way I have the general idea of how to search the text, so it's a trivial matter after that. Also I'm only looking for the first match, which also is easy to add now. Commented Aug 18, 2013 at 2:00

2 Answers

import xml.etree.ElementTree as et

tree = et.parse( ... )

# Exact match
results = []
for anime in tree.findall('anime'):
    for title in anime.findall('title'):
        if title.text == '3x3 Eyes':
            results.append(anime.get('aid'))
print(results)

# Everything that starts with the search text
results = []
for anime in tree.findall('anime'):
    for title in anime.findall('title'):
        # title.text is None for an empty <title/>, so fall back to ''
        if (title.text or '').startswith('3x3 Eyes'):
            results.append(anime.get('aid'))
print(results)

The first one returns ['2'], the second one ['2', '3'] (attribute values come back as strings).

Or a little bit more cryptic but, hey, why not :)

results = [anime.get('aid') for anime in tree.findall('anime')
           for title in anime.findall('title') if title.text == '3x3 Eyes']
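
The question also asks for a case-insensitive match that stops at the first hit. A minimal sketch of that variant follows, assuming Python 3; the function name `find_first_aid` and the inline `SAMPLE` string are made up for illustration, and stripping surrounding whitespace before comparing is an assumption rather than something the question requires.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """<animetitles>
  <anime aid="1">
    <title type="official" xml:lang="fr">Crest of the Stars</title>
  </anime>
  <anime aid="2">
    <title type="official" xml:lang="en">3x3 Eyes</title>
  </anime>
  <anime aid="3">
    <title type="official" xml:lang="en">3x3 Eyes: Legend of the Divine Demon</title>
  </anime>
</animetitles>"""

root = ET.fromstring(SAMPLE)


def find_first_aid(root, text):
    """Return the aid of the first anime with a title equal to text, ignoring case."""
    wanted = text.strip().casefold()
    for anime in root.iter('anime'):
        for title in anime.iter('title'):
            # title.text is None for an empty element, so guard before comparing
            if title.text is not None and title.text.strip().casefold() == wanted:
                return anime.get('aid')  # stop at the first match
    return None  # no title matched


print(find_first_aid(root, "3X3 EYES"))
```

Swapping the `==` comparison for `.startswith(wanted)` would give the "starts with" behaviour discussed in the comments.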

1 Comment

Thanks for the code. This is exactly what I'm looking for. I'll probably be changing how it matches the text depending on what I want to do with my script overall.

You can use ElementTree for your purpose.

import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')
root = tree.getroot()

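def findParentAttrib(string):
    # Visit every element and check its direct children for matching text.
    # (getiterator() was removed in Python 3.9; root.iter() already walks the whole tree.)
    for parent in root.iter():
        for child in parent:
            if child.text == string:
                return parent.get('aid')

print(findParentAttrib("3x3 Eyes"))  # prints 2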

Also refer to this page.
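
For an exact, case-sensitive match, ElementTree's limited XPath support can also do the lookup in one call, using the `[tag='text']` predicate. This is only a sketch: the helper name `find_aid_exact` and the inline `SAMPLE` string are hypothetical, and the helper assumes the search text contains no single quote, since the text is pasted directly into the path expression.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """<animetitles>
  <anime aid="1">
    <title type="official" xml:lang="fr">Crest of the Stars</title>
  </anime>
  <anime aid="2">
    <title type="official" xml:lang="en">3x3 Eyes</title>
  </anime>
  <anime aid="3">
    <title type="official" xml:lang="en">3x3 Eyes: Legend of the Divine Demon</title>
  </anime>
</animetitles>"""

root = ET.fromstring(SAMPLE)


def find_aid_exact(root, text):
    """Return the aid of the first anime having a title whose text equals text exactly."""
    if "'" in text:
        # Assumption of this sketch: we do not try to escape quotes in the path.
        raise ValueError("search text must not contain a single quote")
    # find() returns the first matching <anime> element, or None if there is none
    anime = root.find(".//anime[title='%s']" % text)
    return anime.get('aid') if anime is not None else None


print(find_aid_exact(root, "3x3 Eyes"))
```

The predicate compares the complete text of the child element, so this form cannot ignore case or match prefixes; those cases still need an explicit loop.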
