I am working with an XML file that contains tracking data for the players in a football match. Here is a snippet from the top of the file:

<?xml version="1.0" encoding="utf-8"?>
<Tracking update="2017-01-23T14:41:26">
  <Match id="2019285" dateMatch="2016-09-13T18:45:00" matchNumber="13">
    <Competition id="20159" name="UEFA Champions League 2016/2017" />
    <Stadium id="85265" name="Estádio do SL Benfica" pitchLength="10500" pitchWidth="6800" />
    <Phases>
      <Phase start="2016-09-13T18:45:35.245" end="2016-09-13T19:31:49.09" leftTeamID="50157" />
      <Phase start="2016-09-13T19:47:39.336" end="2016-09-13T20:37:10.591" leftTeamID="50147" />
    </Phases>
    <Frames>
      <Frame utc="2016-09-13T18:45:35.272" isBallInPlay="0">
        <Objs>
          <Obj type="7" id="0" x="-46" y="-2562" z="0" sampling="0" />
          <Obj type="0" id="105823" x="939" y="113" sampling="0" />
          <Obj type="0" id="250086090" x="1194" y="1425" sampling="0" />
          <Obj type="0" id="250080473" x="37" y="2875" sampling="0" />
          <Obj type="0" id="250054760" x="329" y="833" sampling="0" />
          <Obj type="1" id="98593" x="-978" y="654" sampling="0" />
          <Obj type="0" id="250075765" x="1724" y="392" sampling="0" />
          <Obj type="1" id="53733" x="-4702" y="45" sampling="0" />
          <Obj type="0" id="250101112" x="54" y="1436" sampling="0" />
          <Obj type="1" id="250017920" x="-46" y="-2562" sampling="0" />
          <Obj type="1" id="105588" x="-1449" y="209" sampling="0" />
          <Obj type="1" id="250003757" x="-2395" y="-308" sampling="0" />
          <Obj type="1" id="101473" x="-690" y="-644" sampling="0" />
          <Obj type="0" id="250075775" x="2069" y="-895" sampling="0" />
          <Obj type="1" id="103695" x="-1654" y="-2022" sampling="0" />
          <Obj type="0" id="250073809" x="4712" y="-16" sampling="0" />
          <Obj type="1" id="63733" x="-2393" y="1145" sampling="0" />
          <Obj type="0" id="250015755" x="-42" y="31" sampling="0" />
          <Obj type="0" id="250055905" x="1437" y="-2791" sampling="0" />
          <Obj type="0" id="250042422" x="1169" y="-1250" sampling="0" />
        </Objs>
      </Frame>
      <Frame utc="2016-09-13T18:45:35.319" isBallInPlay="0">
        <Objs>
          <Obj type="7" id="0" x="-46" y="-2558" z="0" sampling="0" />
          <Obj type="0" id="105823" x="938" y="113" sampling="0" />
          <Obj type="0" id="250086090" x="1198" y="1426" sampling="0" />
          <Obj type="0" id="250080473" x="36" y="2874" sampling="0" />
          <Obj type="0" id="250054760" x="330" y="833" sampling="0" />
          <Obj type="1" id="98593" x="-980" y="654" sampling="0" />
          <Obj type="0" id="250075765" x="1727" y="393" sampling="0" />
          <Obj type="1" id="53733" x="-4712" y="44" sampling="0" />
          <Obj type="0" id="250101112" x="54" y="1435" sampling="0" />
          <Obj type="1" id="250017920" x="-46" y="-2558" sampling="0" />
          <Obj type="1" id="105588" x="-1449" y="209" sampling="0" />
          <Obj type="1" id="250003757" x="-2396" y="-310" sampling="0" />
          <Obj type="1" id="101473" x="-692" y="-645" sampling="0" />
          <Obj type="0" id="250075775" x="2071" y="-896" sampling="0" />
          <Obj type="1" id="103695" x="-1655" y="-2016" sampling="0" />
          <Obj type="0" id="250073809" x="4712" y="-17" sampling="0" />
          <Obj type="1" id="63733" x="-2395" y="1145" sampling="0" />
          <Obj type="0" id="250015755" x="-42" y="29" sampling="0" />
          <Obj type="0" id="250055905" x="1435" y="-2793" sampling="0" />
          <Obj type="0" id="250042422" x="1169" y="-1250" sampling="0" />
        </Objs>
      </Frame>
    </Frames>
  </Match>
</Tracking>

From my understanding this is how I have broken down the file:

  • The root element is Tracking
  • Match is the child of Tracking
  • Competition, Stadium, Phases and Frames are the children of Match
  • Phase is the child of Phases
  • Frame is the child of Frames
  • There are many Frame children within Frames. In fact, there is a Frame child for every 45 milliseconds of the entire football game. Each Frame child holds the positions of the players, the referees and the ball. The actual file continues for thousands and thousands of lines; this snippet shows only the first two frames.
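The frame interval can be checked directly from the utc attributes. The sketch below uses only the two timestamps from the snippet above (frame_times and gaps_ms are names made up for illustration); for those two frames the gap works out to 47 ms, so the 45 ms figure is probably approximate.

```python
from datetime import datetime, timedelta

# utc values copied from the two Frame elements in the snippet above
frame_times = ["2016-09-13T18:45:35.272", "2016-09-13T18:45:35.319"]

# %f accepts one to six fractional digits, so ".272" parses as 272 ms
parsed = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%S.%f") for t in frame_times]

# gap between each pair of consecutive frames, in whole milliseconds
gaps_ms = [(later - earlier) // timedelta(milliseconds=1)
           for earlier, later in zip(parsed, parsed[1:])]
print(gaps_ms)  # [47]
```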

I am running the following code to see all the data in the Match child (myroot is the parsed root element, Tracking, so myroot[0] is Match):

for x in myroot[0]:
    print(x.tag, x.attrib, x.text)

This is the output:

Competition {'id': '20159', 'name': 'UEFA Champions League 2016/2017'} None
Stadium {'id': '85265', 'name': 'Estádio do SL Benfica', 'pitchLength': '10500', 'pitchWidth': '6800'} None
Phases {} 

Frames {} 

As you can see, the output is two empty dictionaries for phases and frames. How would I get the data from these children?
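For reference, the empty dictionaries are expected: Phases and Frames carry no attributes of their own, and their text is just the whitespace between tags. The data sits on their descendants (Phase, Frame, Obj). A minimal sketch of reaching those descendants, assuming a trimmed copy of the snippet above held in a string (xml_snippet, phases and objs are illustrative names; with the real file, ET.parse(path).getroot() would replace ET.fromstring):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the snippet above, kept short for illustration
xml_snippet = """<Tracking update="2017-01-23T14:41:26">
  <Match id="2019285" dateMatch="2016-09-13T18:45:00" matchNumber="13">
    <Phases>
      <Phase start="2016-09-13T18:45:35.245" end="2016-09-13T19:31:49.09" leftTeamID="50157" />
      <Phase start="2016-09-13T19:47:39.336" end="2016-09-13T20:37:10.591" leftTeamID="50147" />
    </Phases>
    <Frames>
      <Frame utc="2016-09-13T18:45:35.272" isBallInPlay="0">
        <Objs>
          <Obj type="7" id="0" x="-46" y="-2562" z="0" sampling="0" />
          <Obj type="0" id="105823" x="939" y="113" sampling="0" />
        </Objs>
      </Frame>
    </Frames>
  </Match>
</Tracking>"""

myroot = ET.fromstring(xml_snippet)
match = myroot[0]

# findall() with a path reaches the Phase elements nested under Phases
phases = [phase.attrib for phase in match.findall("Phases/Phase")]

# iter() walks every descendant, so it finds Obj under Frames/Frame/Objs
objs = [obj.attrib for obj in match.iter("Obj")]

for phase in phases:
    print(phase)
for obj in objs:
    print(obj)
```

Each printed dictionary holds the attributes of one Phase or Obj element; all values are strings and need converting before any arithmetic.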

My next challenge is getting this data into a pandas data frame. How would I go about doing this?

I would like the pandas data frame to look something like this (the example shows two frames, but I want it for every frame):

Expected output

  • share the xml data, not pics. also share a visual of ur expected output dataframe(a couple of rows is sufficient). this allows more users better understand what u want and give an appropriate answer Commented Apr 30, 2020 at 7:52
  • Hi Sammy, thanks for the response. I cannot share the xml data as it is 250 MB of data. I will share what I would like the output frame to be, thanks for the tip. Commented Apr 30, 2020 at 8:14
  • oh, u dont have to share the entire data. u shared a screenshot. instead of that screenshot, share that snippet of the data itself. not the entire 250mb. this is an example of how to share ur xml data : link. the answer might be of help as well Commented Apr 30, 2020 at 8:15
  • Thanks for the advice @sammywemmy! I have shared the xml data, and a visual of the expected output data frame. Thanks again Commented Apr 30, 2020 at 8:41
  • @sammywemmy I have tried the link you sent. It looks like it is a similar problem to mine, but for some reason I cannot replicate that solution for my problem. Any help would be much appreciated, thanks :) Commented Apr 30, 2020 at 8:57

1 Answer

I used the xml.etree.ElementTree module to iterate through the XML and pull out the relevant data. The comments in the code below explain the process. Have a look and play with the code; hopefully it fits your use case.

import xml.etree.ElementTree as ET
from collections import defaultdict

import pandas as pd

d = defaultdict(list)
# When reading from a file, use:
#     root = ET.parse('filename.xml').getroot()
# Here the XML is held in a string called data, hence:
root = ET.fromstring(data)
# The required data is in the Frame elements
for ent in root.findall('./Match//Frame'):
    # the utc attribute is the frame's timestamp
    Frame = ent.attrib['utc']
    for entry in ent.findall('Objs/Obj'):
        # append each object's attributes under its frame's timestamp
        d[Frame].append(entry.attrib)

df = (pd.concat((pd.DataFrame(value)  # create a dataframe from each frame's objects
                 .assign(Frame=key)  # add the timestamp as a column
                 .filter(['id', 'Frame', 'x', 'y', 'z'])  # keep only the required columns
                 for key, value in d.items()),
                axis=1)  # concatenate side by side, one block of columns per frame
      )

df.head()

          id                    Frame     x      y    z         id                    Frame     x      y    z
0          0  2016-09-13T18:45:35.272   -46  -2562    0          0  2016-09-13T18:45:35.319   -46  -2558    0
1     105823  2016-09-13T18:45:35.272   939    113  NaN     105823  2016-09-13T18:45:35.319   938    113  NaN
2  250086090  2016-09-13T18:45:35.272  1194   1425  NaN  250086090  2016-09-13T18:45:35.319  1198   1426  NaN
3  250080473  2016-09-13T18:45:35.272    37   2875  NaN  250080473  2016-09-13T18:45:35.319    36   2874  NaN
4  250054760  2016-09-13T18:45:35.272   329    833  NaN  250054760  2016-09-13T18:45:35.319   330    833  NaN
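
The frame above places each frame's columns side by side (axis=1), so it gains five columns per frame. If one row per object per frame is easier to work with, a long-format layout may suit better. The following is a minimal sketch of that alternative, not the output of the code above; xml_snippet, rows and long_df are names made up for illustration, and the sample is cut down to two objects per frame.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Cut-down sample: two frames, the ball (type 7) and one player each
xml_snippet = """<Tracking>
  <Match id="2019285">
    <Frames>
      <Frame utc="2016-09-13T18:45:35.272" isBallInPlay="0">
        <Objs>
          <Obj type="7" id="0" x="-46" y="-2562" z="0" sampling="0" />
          <Obj type="0" id="105823" x="939" y="113" sampling="0" />
        </Objs>
      </Frame>
      <Frame utc="2016-09-13T18:45:35.319" isBallInPlay="0">
        <Objs>
          <Obj type="7" id="0" x="-46" y="-2558" z="0" sampling="0" />
          <Obj type="0" id="105823" x="938" y="113" sampling="0" />
        </Objs>
      </Frame>
    </Frames>
  </Match>
</Tracking>"""

root = ET.fromstring(xml_snippet)

rows = []
for frame in root.iter("Frame"):
    for obj in frame.iter("Obj"):
        # copy the Obj attributes and tag the row with its frame timestamp
        rows.append({"Frame": frame.get("utc"), **obj.attrib})

# keep only the listed columns; objects without a z attribute get NaN
long_df = pd.DataFrame(rows, columns=["Frame", "type", "id", "x", "y", "z"])

# attributes arrive as strings, so convert coordinates and timestamps
long_df[["x", "y", "z"]] = long_df[["x", "y", "z"]].apply(pd.to_numeric)
long_df["Frame"] = pd.to_datetime(long_df["Frame"])

print(long_df)
```

With the full 250 MB file, building the whole tree in memory may be expensive; the standard library's ET.iterparse, which yields elements as they are read, is one option worth trying in that case.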

1 Comment

you are a legend, thank you! Really appreciate it. Have not decided on my use case with the data 100%, but now I can use your awesome code template to figure something out.
