I'm trying to parse JSON I've received from an API into a pandas DataFrame. The JSON is hierarchical: in this example it contains a city code, a line name, and a list of stations for that line. Unfortunately, I can't "unpack" it. I would be grateful for help and an explanation.

JSON:

{'id': '1',
 'lines': [{'hex_color': 'FFCD1C',
   'id': '8',
   'name': 'Калининская',          <------Line name
   'stations': [{'id': '8.189',
     'lat': 55.745113,
     'lng': 37.864052,
     'name': 'Новокосино',         <------Station 1   
     'order': 0},
    {'id': '8.88',
     'lat': 55.752237,
     'lng': 37.814587,
     'name': 'Новогиреево',        <------Station 2
     'order': 1},
etc.

I'm trying to retrieve everything from the lowest level and then add all the higher-level information (starting from the line name):

c = r.content
j = simplejson.loads(c)

tmp=[]
i=0
data1=pd.DataFrame(tmp)
data2=pd.DataFrame(tmp)

pd.concat
station['name']

for station in j['lines']:

    data2 = data2.append(pd.DataFrame(station['stations'], station['name']),ignore_index=True)
data2

Once more, the questions are: how do I make this work? Is this approach optimal, or are there functions I should know about?

Update: the JSON parses normally:

json_normalize(j)

id  lines                                              name
1   [{'hex_color': 'FFCD1C', 'stations': [{'lat': ...   Москва

The DataFrame I can currently get:

data2 = data2.append(pd.DataFrame(station['stations']),ignore_index=True)
    id      lat         lng         name        order
0   8.189   55.745113   37.864052   Новокосино  0
1   8.88    55.752237   37.814587   Новогиреево 1

The desired DataFrame would look like this:

   id     lat        lng        name         order  Line_Name    Id_Top  Name_Top
0  8.189  55.745113  37.864052  Новокосино   0      Калининская  1       Москва
1  8.88   55.752237  37.814587  Новогиреево  1      Калининская  1       Москва
  • please post a JSON with at least two top-level elements ('id': '1' and 'id': '2') and make sure that it can be parsed (it should be a valid JSON/dictionary) and provide your desired data set Commented Feb 4, 2018 at 10:27
  • @MaxU it's funny, but the JSON I'm working with right now has only one top-level element. I'll edit the question in a minute. Commented Feb 4, 2018 at 10:31
  • OK, can you post your desired data set / DF? Commented Feb 4, 2018 at 10:31

2 Answers

In addition to MaxU's answer, I think you still need the highest-level id. This should work:

json_normalize(data, ['lines','stations'], ['id',['lines','name']],record_prefix='station_')
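For anyone who wants to run the one-liner above end to end, here is a minimal sketch. It assumes a pandas version that exposes the function as `pd.json_normalize` (pandas 1.0 or later), and the `data` dictionary is simply the two-station sample from the question, not the full API response.

```python
import pandas as pd

# Sample copied from the question: one line with two stations.
data = {
    "id": "1",
    "lines": [
        {
            "hex_color": "FFCD1C",
            "id": "8",
            "name": "Калининская",
            "stations": [
                {"id": "8.189", "lat": 55.745113, "lng": 37.864052,
                 "name": "Новокосино", "order": 0},
                {"id": "8.88", "lat": 55.752237, "lng": 37.814587,
                 "name": "Новогиреево", "order": 1},
            ],
        }
    ],
}

df = pd.json_normalize(
    data,
    record_path=["lines", "stations"],   # one row per station
    meta=["id", ["lines", "name"]],      # top-level id and the line name
    record_prefix="station_",            # keeps station 'id'/'name' apart from the meta 'id'
)
print(df)
```

The `record_prefix` matters here: both the stations and the top level have an `id` key, and without some prefix pandas raises a `ValueError` about conflicting metadata names.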

3 Comments

Thanks! That is it!
Glad it helped!
Nice one ! +1 :-)
Assuming you have the following dictionary:

In [70]: data
Out[70]:
{'id': '1',
 'lines': [{'hex_color': 'FFCD1C',
   'id': '8',
   'name': 'Калининская',
   'stations': [{'id': '8.189',
     'lat': 55.745113,
     'lng': 37.864052,
     'name': 'Новокосино',
     'order': 0},
    {'id': '8.88',
     'lat': 55.752237,
     'lng': 37.814587,
     'name': 'Новогиреево',
     'order': 1}]}]}

Solution: use pandas.io.json.json_normalize:

In [71]: pd.io.json.json_normalize(data['lines'],
                                   ['stations'],
                                   ['name', 'id'],
                                   meta_prefix='parent_')
Out[71]:
      id        lat        lng         name  order  parent_name parent_id
0  8.189  55.745113  37.864052   Новокосино      0  Калининская         8
1   8.88  55.752237  37.814587  Новогиреево      1  Калининская         8
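To make it clearer what that call does, here is a rough hand-written equivalent using a nested comprehension. This is only a sketch for illustration: `rows` and `df_manual` are names made up here, and the `data` dictionary is the same sample as above.

```python
import pandas as pd

# Same sample dictionary as in the answer above.
data = {
    "id": "1",
    "lines": [
        {
            "hex_color": "FFCD1C",
            "id": "8",
            "name": "Калининская",
            "stations": [
                {"id": "8.189", "lat": 55.745113, "lng": 37.864052,
                 "name": "Новокосино", "order": 0},
                {"id": "8.88", "lat": 55.752237, "lng": 37.814587,
                 "name": "Новогиреево", "order": 1},
            ],
        }
    ],
}

# One dict per station, with the parent line's fields copied onto it.
rows = [
    {**station, "parent_name": line["name"], "parent_id": line["id"]}
    for line in data["lines"]
    for station in line["stations"]
]

# Build the frame once, instead of appending inside the loop.
df_manual = pd.DataFrame(rows)
print(df_manual)
```

Collecting plain dictionaries and constructing the DataFrame once tends to be simpler and faster than growing a DataFrame row by row, which is what the loop in the question attempts.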

UPDATE: reflects updated question

res = (pd.io.json.json_normalize(data,
                                 ['lines', 'stations'],
                                 ['id', ['lines', 'name']],
                                 meta_prefix='Line_')
         .assign(Name_Top='Москва'))

Result:

In [94]: res
Out[94]:
      id        lat        lng         name  order Line_id Line_lines.name Name_Top
0  8.189  55.745113  37.864052   Новокосино      0       1     Калининская   Москва
1   8.88  55.752237  37.814587  Новогиреево      1       1     Калининская   Москва
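If the top-level dictionary also carries the city name (the `json_normalize(j)` output in the question shows a top-level `name` column, so this sketch assumes a `'name': 'Москва'` key that is not in the sample dictionary above), it can be pulled in as metadata instead of being hard-coded with `.assign`. The `top_` prefix and the final column names are illustrative choices, and `pd.json_normalize` assumes pandas 1.0 or later.

```python
import pandas as pd

# Sample from the question, plus an ASSUMED top-level 'name' key.
data = {
    "id": "1",
    "name": "Москва",
    "lines": [
        {
            "hex_color": "FFCD1C",
            "id": "8",
            "name": "Калининская",
            "stations": [
                {"id": "8.189", "lat": 55.745113, "lng": 37.864052,
                 "name": "Новокосино", "order": 0},
                {"id": "8.88", "lat": 55.752237, "lng": 37.814587,
                 "name": "Новогиреево", "order": 1},
            ],
        }
    ],
}

res = pd.json_normalize(
    data,
    record_path=["lines", "stations"],
    # Each meta entry is a path from the top: two top-level keys
    # and one key that lives on every element of 'lines'.
    meta=["id", "name", ["lines", "name"]],
    meta_prefix="top_",  # avoids clashing with the station 'id' and 'name'
).rename(columns={
    "top_id": "Id_Top",
    "top_name": "Name_Top",
    "top_lines.name": "Line_Name",
})
print(res)
```

Each additional parent level is just another entry in `meta`: a plain string for a top-level key, or a list such as `['lines', 'name']` for a key nested along the record path.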

1 Comment

Thank you! That solves about 80% of the problem and is much more elegant than my solution. However, there are two levels above the name level; how can one more parent level be added?
