I'm trying to flatten MongoDB documents that contain an array of JSON objects by using pop. This usually works well when a field holds a single JSON object, but with an array of JSON objects it produces too many columns, so I can't easily convert the data to a non-JSON format. A more detailed explanation is below.

Here's my data

Id locations
1   [{'timestamp': 2018-05-28 15:00:00, 'lat': 0.0..
2   [{'timestamp': 2018-05-28 15:00:00, 'lat': 0.0..

What I tried:

df = df.join(pd.DataFrame(df.pop('locations').values.tolist(), index=df.index))

The output is

Id   0                              1                            ...      136
1   {'timestamp': 2018-05-28...    {'timestamp': 2018-05-28...           {'timestamp': 2018-05-28... 
2   {'timestamp': 2018-05-28...    {'timestamp': 2018-05-28...           None 

The output I expected is

Id   0                             
1   {'timestamp': 2018-05-28... 
1   {'timestamp': 2018-05-28... 
    ...
    {'timestamp': 2018-05-28... 
2   {'timestamp': 2018-05-28...
    ...
    {'timestamp': 2018-05-28...

That way, I can pop again.

1 Answer

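I think you need melt (the result has a variable column holding the original list position and a value column, with None for the shorter lists):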

df2 = df.join(pd.DataFrame(df.pop('locations').values.tolist(), index=df.index)).melt('Id')

Or stack:

s = (pd.DataFrame(df.pop('locations').values.tolist(), index=df.index)
       .stack()
       .reset_index(level=1, drop=True))

df2 = df.join(s.rename('new'))

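Or a NumPy solution that repeats the Id values and flattens the nested lists (run it on the original df, before locations has been popped):

import numpy as np
from itertools import chain

# df must still contain 'locations' here; an earlier df.pop('locations') removes it
df2 = pd.DataFrame({
        "Id": np.repeat(df["Id"].values, df["locations"].str.len()),
        "new": list(chain.from_iterable(df["locations"]))})
print (df2)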
   Id                                               new
0   1  {'timestamp': '2018-05-28 15:00:00', 'lat': 0.0}
1   1  {'timestamp': '2018-05-28 16:00:00', 'lat': 0.0}
2   2  {'timestamp': '2018-05-28 10:00:00', 'lat': 0.0}
3   2  {'timestamp': '2018-05-28 17:00:00', 'lat': 0.0}
4   2  {'timestamp': '2018-05-28 18:00:00', 'lat': 0.0}
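
If your pandas version is 0.25 or newer, DataFrame.explode may be a shorter route to the same one-dict-per-row shape. This is only a sketch on the sample data from the setup below, not something tested against the real MongoDB export; the column name new is kept just to match the other solutions.

```python
import pandas as pd

# Sample data copied from the setup section of this answer
df = pd.DataFrame({
    "Id": [1, 2],
    "locations": [
        [{"timestamp": "2018-05-28 15:00:00", "lat": 0.0},
         {"timestamp": "2018-05-28 16:00:00", "lat": 0.0}],
        [{"timestamp": "2018-05-28 10:00:00", "lat": 0.0},
         {"timestamp": "2018-05-28 17:00:00", "lat": 0.0},
         {"timestamp": "2018-05-28 18:00:00", "lat": 0.0}],
    ],
})

# explode turns each list element into its own row and repeats Id;
# the original index is duplicated, so reset it afterwards
df2 = (df.explode("locations")
         .rename(columns={"locations": "new"})
         .reset_index(drop=True))
print(df2)
```

Unlike the pop-based variants, this leaves the original df untouched, so it can be rerun without rebuilding the frame.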

Setup:

df = pd.DataFrame({'Id':[1,2], 
                   'locations':[[{'timestamp': '2018-05-28 15:00:00', 'lat': 0.0}, {'timestamp': '2018-05-28 16:00:00', 'lat': 0.0}],
                                [{'timestamp': '2018-05-28 10:00:00', 'lat': 0.0}, {'timestamp': '2018-05-28 17:00:00', 'lat': 0.0}, {'timestamp': '2018-05-28 18:00:00', 'lat': 0.0}]]})
print (df)


   Id                                          locations
0   1  [{'timestamp': '2018-05-28 15:00:00', 'lat': 0...
1   2  [{'timestamp': '2018-05-28 10:00:00', 'lat': 0...
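
Once each row holds a single dict, the "pop again" step from the question could look roughly like this. It is a minimal sketch that assumes df2 has the Id and new columns produced above (a shortened copy is rebuilt here so the snippet runs on its own); df3 and expanded are just illustrative names.

```python
import pandas as pd

# Shortened stand-in for the df2 produced by the solutions above
df2 = pd.DataFrame({
    "Id": [1, 1, 2],
    "new": [
        {"timestamp": "2018-05-28 15:00:00", "lat": 0.0},
        {"timestamp": "2018-05-28 16:00:00", "lat": 0.0},
        {"timestamp": "2018-05-28 10:00:00", "lat": 0.0},
    ],
})

# Pop the dict column and build one column per dict key,
# reusing df2's index so the join lines the rows up again
expanded = pd.DataFrame(df2.pop("new").tolist(), index=df2.index)
df3 = df2.join(expanded)
print(df3)
```

Note that pop removes the new column from df2 in place, so running the snippet a second time on the same df2 raises a KeyError.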

2 Comments

It works, but not as expected. I'm still debugging; it's probably a mistake on my side.
The first gives different output than I expected, and the third raises AttributeError: 'DataFrame' object has no attribute 'locations', but the second works very well.
