1

I have an XML file, details.xml, that looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 0.7.55.7 8b86ff77">
<meta osm_base="2019-08-02T12:21:02Z"/>

<bounds minlat="19.0983000" minlon="72.8890000" maxlat="19.1184000" maxlon="72.9206000"/>

<node id="245670274" lat="19.1000660" lon="72.8961407" version="5" timestamp="2015-10-27T04:31:16Z" changeset="34895909" uid="3339404" user="Anushka&amp;saroj">
<tag k="AND_a_nosr_p" v="10004762"/>
<tag k="name" v="Kulkarni Wadi"/>
<tag k="place" v="locality"/>
<tag k="source" v="AND"/>
</node>

<node id="245670576" lat="19.1030072" lon="72.8885419" version="4" timestamp="2017-11-22T06:20:01Z" changeset="53992152" uid="1306" user="PlaneMad">
<tag k="source" v="AND"/>
</node>

<node id="619199656" lat="19.1023916" lon="72.9200375" version="3" timestamp="2015-07-03T06:26:42Z" changeset="32379895" uid="2897305" user="Ashok09"/>

<way id="353138857" version="2" timestamp="2015-06-12T10:57:15Z" changeset="31917729" uid="2900596" user="harisha">
<nd ref="3589055782"/>
<nd ref="3589055908"/>
<nd ref="3589055924"/>
<nd ref="3589055914"/>
<nd ref="3589055921"/>
<nd ref="3589055916"/>
<nd ref="3589055922"/>
<nd ref="3589055909"/>
<nd ref="3589055913"/>
<nd ref="3589055904"/>
<nd ref="3589055782"/>
<tag k="building" v="yes"/>
</way>
</osm>

I want to fetch all the information inside the 'node' tags and ignore everything else.

For example, the XML above has 3 'node' tags, and I want all the nested information from each one (or, where a node has no nested tags, just its own attributes).

If I store that information in lists, the result should look like this:

ids=['245670274','245670576','619199656']
lat=['19.1000660','19.1030072','19.1023916']
lon=['72.8961407','72.8885419','72.9200375']
k=[['AND_a_nosr_p','name','place','source'],['source'],[]]
v=[['10004762','Kulkarni Wadi','locality','AND'],['AND'],[]]

What is the most efficient way to do this in Python?

3
  • to correlate all subsequent data to each separate node - it's better to store it in a dictionary, don't you think? Commented Aug 3, 2019 at 9:52
  • @RomanPerekhrest I could also store it in a dictionary, but I need to fetch the data first. How do I do that? Commented Aug 3, 2019 at 10:23
  • Your xml is missing a closing tag for <osm>. Commented Aug 3, 2019 at 10:25

4 Answers

1

Extended solution:

import pprint
import xml.etree.ElementTree as ET

root = ET.parse('input.xml').getroot()
nodes_data = {}
for node in root.findall('./node'):
    key = 'node_' + node.attrib['id']   # custom node key
    data = dict(node.attrib)            # copy, so the parsed tree is not modified
    tags = node.findall('tag')          # empty list when the node has no <tag> children
    data['k'] = [tag.get('k') for tag in tags]
    data['v'] = [tag.get('v') for tag in tags]
    nodes_data[key] = data

pprint.pprint(nodes_data)

Actual output:

{'node_245670274': {'changeset': '34895909',
                    'id': '245670274',
                    'k': ['AND_a_nosr_p', 'name', 'place', 'source'],
                    'lat': '19.1000660',
                    'lon': '72.8961407',
                    'timestamp': '2015-10-27T04:31:16Z',
                    'uid': '3339404',
                    'user': 'Anushka&saroj',
                    'v': ['10004762', 'Kulkarni Wadi', 'locality', 'AND'],
                    'version': '5'},
 'node_245670576': {'changeset': '53992152',
                    'id': '245670576',
                    'k': ['source'],
                    'lat': '19.1030072',
                    'lon': '72.8885419',
                    'timestamp': '2017-11-22T06:20:01Z',
                    'uid': '1306',
                    'user': 'PlaneMad',
                    'v': ['AND'],
                    'version': '4'},
 'node_619199656': {'changeset': '32379895',
                    'id': '619199656',
                    'k': [],
                    'lat': '19.1023916',
                    'lon': '72.9200375',
                    'timestamp': '2015-07-03T06:26:42Z',
                    'uid': '2897305',
                    'user': 'Ashok09',
                    'v': [],
                    'version': '3'}}

0

You can use lxml for the job.
It provides a findall() method that you can use to select the node elements.
You can then iterate over each element's attrib and collect the values into lists.
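As a rough template for the approach described above, here is a minimal sketch that collects the id, lat and lon attributes of each node. The SAMPLE bytes are a shortened, hypothetical stand-in for the question's details.xml, and the import falls back to the standard library's xml.etree.ElementTree if lxml is not installed, since the few calls used here (fromstring, findall, attrib) exist in both.

```python
try:
    from lxml import etree                  # third-party; assumed to be installed
except ImportError:
    import xml.etree.ElementTree as etree   # stdlib fallback offering the same calls

# Shortened stand-in for details.xml (bytes, so the encoding declaration is accepted)
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
<node id="245670274" lat="19.1000660" lon="72.8961407">
<tag k="name" v="Kulkarni Wadi"/>
</node>
<node id="619199656" lat="19.1023916" lon="72.9200375"/>
<way id="353138857">
<nd ref="3589055782"/>
<tag k="building" v="yes"/>
</way>
</osm>"""

root = etree.fromstring(SAMPLE)

ids, lat, lon = [], [], []
for node in root.findall('node'):   # direct <node> children only; <way> is skipped
    ids.append(node.attrib['id'])
    lat.append(node.attrib['lat'])
    lon.append(node.attrib['lon'])

print(ids)
print(lat)
print(lon)
```

For a real file, etree.parse('details.xml').getroot() would take the place of etree.fromstring(SAMPLE).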

1 Comment

I have tried it, but I ran into some problems. Could you write a template?
0

I would suggest using

xml.etree.ElementTree

I couldn't parse your copy of the XML; I think it has a tag-closing problem. In general, though, I would use something like:

import xml.etree.ElementTree as et

tree = et.parse('PATH TO XML FILE')
root = tree.getroot()

ids = []
lat = []

for element in root.findall('node'):
    # each <node> carries its id and latitude as attributes
    ids.append(element.attrib['id'])
    lat.append(element.attrib['lat'])

Note that this code has not been tested; I apologize if it contains any mistakes.
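Since the question asks about efficiency, a streaming parse may be worth considering for large files. The following is only a sketch using xml.etree.ElementTree.iterparse: it handles each node as soon as its closing tag is read and then clears it, rather than keeping every element fully populated in memory. The SAMPLE bytes and the io.BytesIO wrapper are assumptions for illustration; with a real file you would pass the file path to iterparse instead.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for details.xml (hypothetical sample data)
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
<node id="245670274" lat="19.1000660" lon="72.8961407">
<tag k="name" v="Kulkarni Wadi"/>
<tag k="place" v="locality"/>
</node>
<node id="619199656" lat="19.1023916" lon="72.9200375"/>
<way id="353138857">
<nd ref="3589055782"/>
<tag k="building" v="yes"/>
</way>
</osm>"""

ids, lat, lon, k, v = [], [], [], [], []

# 'end' events fire once an element and all of its children have been read
for _event, elem in ET.iterparse(io.BytesIO(SAMPLE), events=('end',)):
    if elem.tag != 'node':
        continue                      # ignore <tag>, <nd>, <way>, <osm>
    ids.append(elem.get('id'))
    lat.append(elem.get('lat'))
    lon.append(elem.get('lon'))
    tags = elem.findall('tag')        # empty for nodes without <tag> children
    k.append([t.get('k') for t in tags])
    v.append([t.get('v') for t in tags])
    elem.clear()                      # drop the node's children and attributes

print(ids)
print(k)
print(v)
```

Whether this is actually faster than a plain parse depends on the file size; its main benefit is lower memory use, so it is worth measuring on real data.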


0

Try this:

import lxml.html

xml  = [your xml above]

tree = lxml.html.fromstring(xml)

ids= tree.xpath('//node/@id')
lat = tree.xpath('//node/@lat')
lon = tree.xpath('//node/@lon')
k = tree.xpath('//node/tag/@k')
v = tree.xpath('//node/tag/@v')

print(ids)
print(lat)
print(lon)
print(k)
print(v)

Output:

['245670274', '245670576', '619199656']
['19.1000660', '19.1030072', '19.1023916']
['72.8961407', '72.8885419', '72.9200375']
['AND_a_nosr_p', 'name', 'place', 'source', 'source']
['10004762', 'Kulkarni Wadi', 'locality', 'AND', 'AND']

1 Comment

The k and v lists should have 3 elements as well, i.e. they should be lists of lists.
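One way to get the nested shape the comment asks for is to query the tags relative to each node instead of across the whole document. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml.html (a swapped-in parser, not the answerer's code), and SAMPLE is a trimmed copy of the question's three nodes with most attributes omitted.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the three nodes from the question (most attributes omitted)
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
<node id="245670274" lat="19.1000660" lon="72.8961407">
<tag k="AND_a_nosr_p" v="10004762"/>
<tag k="name" v="Kulkarni Wadi"/>
<tag k="place" v="locality"/>
<tag k="source" v="AND"/>
</node>
<node id="245670576" lat="19.1030072" lon="72.8885419">
<tag k="source" v="AND"/>
</node>
<node id="619199656" lat="19.1023916" lon="72.9200375"/>
</osm>"""

root = ET.fromstring(SAMPLE)
nodes = root.findall('.//node')

# One inner list per node, so nodes without tags contribute an empty list
k = [[tag.get('k') for tag in node.findall('./tag')] for node in nodes]
v = [[tag.get('v') for tag in node.findall('./tag')] for node in nodes]

print(k)
print(v)
```

Because the inner lookup runs once per node, k and v stay aligned with the ids, lat and lon lists even when a node has no tags.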
