I am dealing with an XML data file that contains the tracking data of players during a football match. Here is a snippet from the top of the XML data file:
<?xml version="1.0" encoding="utf-8"?>
<Tracking update="2017-01-23T14:41:26">
<Match id="2019285" dateMatch="2016-09-13T18:45:00" matchNumber="13">
<Competition id="20159" name="UEFA Champions League 2016/2017" />
<Stadium id="85265" name="Estádio do SL Benfica" pitchLength="10500" pitchWidth="6800" />
<Phases>
<Phase start="2016-09-13T18:45:35.245" end="2016-09-13T19:31:49.09" leftTeamID="50157" />
<Phase start="2016-09-13T19:47:39.336" end="2016-09-13T20:37:10.591" leftTeamID="50147" />
</Phases>
<Frames>
<Frame utc="2016-09-13T18:45:35.272" isBallInPlay="0">
<Objs>
<Obj type="7" id="0" x="-46" y="-2562" z="0" sampling="0" />
<Obj type="0" id="105823" x="939" y="113" sampling="0" />
<Obj type="0" id="250086090" x="1194" y="1425" sampling="0" />
<Obj type="0" id="250080473" x="37" y="2875" sampling="0" />
<Obj type="0" id="250054760" x="329" y="833" sampling="0" />
<Obj type="1" id="98593" x="-978" y="654" sampling="0" />
<Obj type="0" id="250075765" x="1724" y="392" sampling="0" />
<Obj type="1" id="53733" x="-4702" y="45" sampling="0" />
<Obj type="0" id="250101112" x="54" y="1436" sampling="0" />
<Obj type="1" id="250017920" x="-46" y="-2562" sampling="0" />
<Obj type="1" id="105588" x="-1449" y="209" sampling="0" />
<Obj type="1" id="250003757" x="-2395" y="-308" sampling="0" />
<Obj type="1" id="101473" x="-690" y="-644" sampling="0" />
<Obj type="0" id="250075775" x="2069" y="-895" sampling="0" />
<Obj type="1" id="103695" x="-1654" y="-2022" sampling="0" />
<Obj type="0" id="250073809" x="4712" y="-16" sampling="0" />
<Obj type="1" id="63733" x="-2393" y="1145" sampling="0" />
<Obj type="0" id="250015755" x="-42" y="31" sampling="0" />
<Obj type="0" id="250055905" x="1437" y="-2791" sampling="0" />
<Obj type="0" id="250042422" x="1169" y="-1250" sampling="0" />
</Objs>
</Frame>
<Frame utc="2016-09-13T18:45:35.319" isBallInPlay="0">
<Objs>
<Obj type="7" id="0" x="-46" y="-2558" z="0" sampling="0" />
<Obj type="0" id="105823" x="938" y="113" sampling="0" />
<Obj type="0" id="250086090" x="1198" y="1426" sampling="0" />
<Obj type="0" id="250080473" x="36" y="2874" sampling="0" />
<Obj type="0" id="250054760" x="330" y="833" sampling="0" />
<Obj type="1" id="98593" x="-980" y="654" sampling="0" />
<Obj type="0" id="250075765" x="1727" y="393" sampling="0" />
<Obj type="1" id="53733" x="-4712" y="44" sampling="0" />
<Obj type="0" id="250101112" x="54" y="1435" sampling="0" />
<Obj type="1" id="250017920" x="-46" y="-2558" sampling="0" />
<Obj type="1" id="105588" x="-1449" y="209" sampling="0" />
<Obj type="1" id="250003757" x="-2396" y="-310" sampling="0" />
<Obj type="1" id="101473" x="-692" y="-645" sampling="0" />
<Obj type="0" id="250075775" x="2071" y="-896" sampling="0" />
<Obj type="1" id="103695" x="-1655" y="-2016" sampling="0" />
<Obj type="0" id="250073809" x="4712" y="-17" sampling="0" />
<Obj type="1" id="63733" x="-2395" y="1145" sampling="0" />
<Obj type="0" id="250015755" x="-42" y="29" sampling="0" />
<Obj type="0" id="250055905" x="1435" y="-2793" sampling="0" />
<Obj type="0" id="250042422" x="1169" y="-1250" sampling="0" />
</Objs>
</Frame>
</Frames>
</Match>
</Tracking>
From my understanding, this is how the file is structured:
- The root element is Tracking.
- Match is the child of Tracking.
- Competition, Stadium, Phases and Frames are the children of Match.
- Phase is the child of Phases.
- Frame is the child of Frames.
- There are many Frame children within Frames; in fact, there is one Frame for roughly every 45 milliseconds of the entire football game. Each Frame contains an Objs element holding the positions of each player, the referees and the ball. The actual file continues for thousands and thousands of lines; this snippet shows only the first two frames.
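One way to check this breakdown is to walk the tree and list each distinct element path once. The sketch below assumes the standard library's `xml.etree.ElementTree` and uses a trimmed copy of the snippet (one frame with two objects); `tag_paths` is a hypothetical helper name. Running it also makes the Objs level between Frame and Obj explicit.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the snippet above (assumption: one frame is enough to show the shape).
SNIPPET = """<Tracking update="2017-01-23T14:41:26">
<Match id="2019285" dateMatch="2016-09-13T18:45:00" matchNumber="13">
<Competition id="20159" name="UEFA Champions League 2016/2017" />
<Stadium id="85265" name="Estádio do SL Benfica" pitchLength="10500" pitchWidth="6800" />
<Phases>
<Phase start="2016-09-13T18:45:35.245" end="2016-09-13T19:31:49.09" leftTeamID="50157" />
</Phases>
<Frames>
<Frame utc="2016-09-13T18:45:35.272" isBallInPlay="0">
<Objs>
<Obj type="7" id="0" x="-46" y="-2562" z="0" sampling="0" />
<Obj type="0" id="105823" x="939" y="113" sampling="0" />
</Objs>
</Frame>
</Frames>
</Match>
</Tracking>"""


def tag_paths(root):
    """Return each distinct element path (e.g. 'Tracking/Match/Frames') in document order."""
    paths = []

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}" if parent_path else elem.tag
        if path not in paths:  # record each path only the first time it is seen
            paths.append(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


for path in tag_paths(ET.fromstring(SNIPPET)):
    print(path)
```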
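I am running the following code to see all the data in the Match element:
import xml.etree.ElementTree as ET
myroot = ET.parse('tracking.xml').getroot()  # assumed parser; hypothetical filename
for x in myroot[0]:  # myroot[0] is the <Match> element
    print(x.tag, x.attrib, x.text)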
This is the output:
Competition {'id': '20159', 'name': 'UEFA Champions League 2016/2017'} None
Stadium {'id': '85265', 'name': 'Estádio do SL Benfica', 'pitchLength': '10500', 'pitchWidth': '6800'} None
Phases {}
Frames {}
As you can see, the output shows empty dictionaries for Phases and Frames. How would I get the data from their children?
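In the snippet, Phases and Frames carry no attributes of their own, so `attrib` is `{}`; the data sits on the Phase and Frame elements one level down, and the positions sit on the Obj elements inside each Frame's Objs. A minimal sketch, assuming `xml.etree.ElementTree` and a trimmed copy of the snippet; the helper names `phases` and `frames` are hypothetical:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the snippet: both phases, both frames, three objects per frame.
SNIPPET = """<Tracking update="2017-01-23T14:41:26">
<Match id="2019285" dateMatch="2016-09-13T18:45:00" matchNumber="13">
<Phases>
<Phase start="2016-09-13T18:45:35.245" end="2016-09-13T19:31:49.09" leftTeamID="50157" />
<Phase start="2016-09-13T19:47:39.336" end="2016-09-13T20:37:10.591" leftTeamID="50147" />
</Phases>
<Frames>
<Frame utc="2016-09-13T18:45:35.272" isBallInPlay="0">
<Objs>
<Obj type="7" id="0" x="-46" y="-2562" z="0" sampling="0" />
<Obj type="0" id="105823" x="939" y="113" sampling="0" />
<Obj type="1" id="98593" x="-978" y="654" sampling="0" />
</Objs>
</Frame>
<Frame utc="2016-09-13T18:45:35.319" isBallInPlay="0">
<Objs>
<Obj type="7" id="0" x="-46" y="-2558" z="0" sampling="0" />
<Obj type="0" id="105823" x="938" y="113" sampling="0" />
<Obj type="1" id="98593" x="-980" y="654" sampling="0" />
</Objs>
</Frame>
</Frames>
</Match>
</Tracking>"""


def phases(root):
    """Attributes of every <Phase>; <Phases> itself has none, hence the empty {}."""
    return [phase.attrib for phase in root.iterfind("Match/Phases/Phase")]


def frames(root):
    """(frame attributes, [Obj attributes, ...]) for every <Frame> under Match/Frames."""
    return [
        (frame.attrib, [obj.attrib for obj in frame.iterfind("Objs/Obj")])
        for frame in root.iterfind("Match/Frames/Frame")
    ]


root = ET.fromstring(SNIPPET)  # the <Tracking> element, playing the role of myroot
for attrs in phases(root):
    print("Phase", attrs)
for frame_attrs, objs in frames(root):
    print("Frame", frame_attrs)
    for obj in objs:
        print("    Obj", obj)
```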
Furthermore, my next challenge is getting this data into a pandas DataFrame. How would I go about doing this?
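For the DataFrame, one option is a long ("tidy") layout with one row per object per frame, which pandas can reshape later. Because the real file is large, this sketch streams it with `ET.iterparse` and clears each Frame after reading it, instead of building the whole tree first. It assumes integer coordinates as in the snippet and that only the ball (type 7 in the snippet) has a `z` attribute; `frames_to_rows` and `to_dataframe` are hypothetical names, and the exact columns you want may differ.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the snippet: two frames, three objects each.
SNIPPET = """<Tracking update="2017-01-23T14:41:26">
<Match id="2019285" dateMatch="2016-09-13T18:45:00" matchNumber="13">
<Frames>
<Frame utc="2016-09-13T18:45:35.272" isBallInPlay="0">
<Objs>
<Obj type="7" id="0" x="-46" y="-2562" z="0" sampling="0" />
<Obj type="0" id="105823" x="939" y="113" sampling="0" />
<Obj type="1" id="98593" x="-978" y="654" sampling="0" />
</Objs>
</Frame>
<Frame utc="2016-09-13T18:45:35.319" isBallInPlay="0">
<Objs>
<Obj type="7" id="0" x="-46" y="-2558" z="0" sampling="0" />
<Obj type="0" id="105823" x="938" y="113" sampling="0" />
<Obj type="1" id="98593" x="-980" y="654" sampling="0" />
</Objs>
</Frame>
</Frames>
</Match>
</Tracking>"""


def frames_to_rows(source):
    """Stream <Frame> elements and return one dict per Obj per frame.

    `source` is a filename or a binary file object, as accepted by ET.iterparse.
    """
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Frame":
            continue  # Obj/Objs end events arrive first; wait for the complete Frame
        utc = elem.get("utc")
        in_play = int(elem.get("isBallInPlay"))
        for obj in elem.iterfind("Objs/Obj"):
            z = obj.get("z")  # only the ball carries z in the snippet
            rows.append({
                "utc": utc,
                "isBallInPlay": in_play,
                "type": int(obj.get("type")),
                "id": obj.get("id"),
                "x": int(obj.get("x")),
                "y": int(obj.get("y")),
                "z": int(z) if z is not None else None,
            })
        elem.clear()  # drop this Frame's children so every Obj is not kept in memory
    return rows


def to_dataframe(rows):
    """Build a DataFrame (one row per object per frame) and parse the timestamps."""
    import pandas as pd  # imported here so frames_to_rows() works without pandas

    df = pd.DataFrame(rows)
    df["utc"] = pd.to_datetime(df["utc"])
    return df


rows = frames_to_rows(io.BytesIO(SNIPPET.encode("utf-8")))
print(len(rows), rows[0])
```

With the real file you would pass the filename instead, e.g. `df = to_dataframe(frames_to_rows("tracking.xml"))` (hypothetical filename). If you would rather have one row per frame, something like `df.pivot(index="utc", columns="id", values=["x", "y"])` can reshape the long table, provided each id appears only once per frame.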
I would want the pandas DataFrame to look something like this (this example shows two frames, but I would want it for every frame):