0

I am using the pyflightdata library to search for flight stats. It returns json inside a list of dicts.

Here is an example of the first dictionary in the list after my query:

> flightlog = {'identification': {'number': {'default': 'KE504', 'alternative': 'None'}, 'callsign': 'KAL504', 'codeshare': 'None'}
, 'status': {'live': False, 'text': 'Landed 22:29', 'estimated': 'None', 'ambiguous': False, 'generic': {'status': {'text': 'landed', 'type': 'arrival', 'color': 'green', 'diverted': 'None'}
, 'eventTime': {'utc_millis': 1604611778000, 'utc_date': '20201105', 'utc_time': '2229', 'utc': 1604611778, 'local_millis': 1604615378000, 'local_date': '20201105', 'local_time': '2329', 'local': 1604615378}}}
, 'aircraft': {'model': {'code': 'B77L', 'text': 'Boeing 777-FEZ'}, 'registration': 'HL8075', 'country': {'name': 'South Korea', 'alpha2': 'KR', 'alpha3': 'KOR'}}
, 'airline': {'name': 'Korean Air', 'code': {'iata': 'KE', 'icao': 'KAL'}}
, 'airport': {'origin': {'name': 'London Heathrow Airport', 'code': {'iata': 'LHR', 'icao': 'EGLL'}, 'position': {'latitude': 51.471626, 'longitude': -0.467081, 'country': {'name': 'United Kingdom', 'code': 'GB'}, 'region': {'city': 'London'}}
, 'timezone': {'name': 'Europe/London', 'offset': 0, 'abbr': 'GMT', 'abbrName': 'Greenwich Mean Time', 'isDst': False}}, 'destination': {'name': 'Paris Charles de Gaulle Airport', 'code': {'iata': 'CDG', 'icao': 'LFPG'}, 'position': {'latitude': 49.012516, 'longitude': 2.555752, 'country': {'name': 'France', 'code': 'FR'}, 'region': {'city': 'Paris'}}, 'timezone': {'name': 'Europe/Paris', 'offset': 3600, 'abbr': 'CET', 'abbrName': 'Central European Time', 'isDst': False}}, 'real': 'None'}
, 'time': {'scheduled': {'departure_millis': 1604607300000, 'departure_date': '20201105', 'departure_time': '2115', 'departure': 1604607300, 'arrival_millis': 1604612700000, 'arrival_date': '20201105', 'arrival_time': '2245', 'arrival': 1604612700}, 'real': {'departure_millis': 1604609079000, 'departure_date': '20201105', 'departure_time': '2144', 'departure': 1604609079, 'arrival_millis': 1604611778000, 'arrival_date': '20201105', 'arrival_time': '2229', 'arrival': 1604611778}, 'estimated': {'departure': 'None', 'arrival': 'None'}, 'other': {'eta_millis': 1604611778000, 'eta_date': '20201105', 'eta_time': '2229', 'eta': 1604611778}}}

This dictionary is a huge, multi-nested, json mess and I am struggling to find a way to make it readable. I guess something like this:

 identification     number      default                 KE504
                                alternative             None
                    callsign                            KAL504
                    codeshare                           None

 status             live                                False
                    text                                Landed 22:29
                    Estimated                           None
                    ambiguous                           False
...

I am trying to turn it into a pandas DataFrame, with mixed results.

In this post it was explained that MultiIndex values have to be tuples, not dictionaries, so I used their example to convert my dictionary:

> flightlog_tuple = {(outerKey, innerKey): values for outerKey, innerDict in flightlog.items() for innerKey, values in innerDict.items()}

Which worked, up to a certain point.

df2 = pd.Series(flightlog_tuple)

gives the following output:

identification  number                {'default': 'KE504', 'alternative': 'None'}
                callsign                                                   KAL504
                codeshare                                                    None
status          live                                                        False
                text                                                 Landed 22:29
                estimated                                                    None
                ambiguous                                                   False
                generic         {'status': {'text': 'landed', 'type': 'arrival...
aircraft        model                  {'code': 'B77L', 'text': 'Boeing 777-FEZ'}
                registration                                               HL8075
                country         {'name': 'South Korea', 'alpha2': 'KR', 'alpha...
airline         name                                                   Korean Air
                code                                {'iata': 'KE', 'icao': 'KAL'}
airport         origin          {'name': 'London Heathrow Airport', 'code': {'...
                destination     {'name': 'Paris Charles de Gaulle Airport', 'c...
                real                                                         None
time            scheduled       {'departure_millis': 1604607300000, 'departure...
                real            {'departure_millis': 1604609079000, 'departure...
                estimated                {'departure': 'None', 'arrival': 'None'}
                other           {'eta_millis': 1604611778000, 'eta_date': '202...
dtype: object

Kind of what I was going for but some of the indexes are still in the column with values because there are so many levels. So I followed this explanation and tried to add more levels:

level_up = {(level1Key, level2Key, level3Key): values for level1Key, level2Dict in flightlog.items() for level2Key, level3Dict in level2Dict.items() for level3Key, values in level3Dict.items()}
df2 = pd.Series(level_up)

This code gives me AttributeError: 'str' object has no attribute 'items'. I don't understand why the first 2 indexes worked, but the others give an error.

I've tried other methods like MultiIndex.from_tuple or DataFrame.from_dict, but I can't get it to work.

This Dictionary is too complex as a beginner. I don't know what the right approach is. Maybe I am using DataFrames in the wrong way. Maybe there is an easier way to access the data that I am overlooking.

Any help would be much appreciated!

2
  • what you have is many tables in a single json you need to logically split them out and relate them back to each other using relevant keys to match them up, something like this would be a good starting point dfs = {k : pd.json_normalize(v) for k,v in flightlog.items()} Commented Nov 6, 2020 at 15:34
  • I just realized the pyflightdata API returns data in single quotes, so contrary to their documentation, it isn't JSON. json_normalize works partially, but data is still unreadable. Commented Nov 6, 2020 at 17:57

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.