Pandas - replace all NaN values in DataFrame with empty python dict objects

Question

I have a pandas DataFrame where each cell contains a python dict.

>>> from pandas import DataFrame
>>> data = {'Q': {'X': {2: 2010}, 'Y': {2: 2011, 3: 2009}}, 'R': {'X': {1: 2013}}}
>>> frame = DataFrame(data)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        NaN

I'd like to replace the NaN with an empty dict, to get this result:

                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        {}

However, fillna interprets an empty dict not as a scalar value but as a mapping of column --> value, so it does nothing if I simply do this:

>>> frame.fillna(inplace=True, value={})
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        NaN
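
To make the column --> value reading concrete, here is a minimal sketch on a small numeric frame (the frame and its column names `a` and `b` are made up for illustration, they are not from the question). It shows a dict being applied per column, and an empty dict filling nothing because it names no columns:

```python
import numpy as np
import pandas as pd

# Illustrative frame, not the one from the question.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

# A dict passed as `value` is read as {column label: fill value}.
per_column = df.fillna({"a": 0.0, "b": -1.0})

# An empty dict names no columns, so no cell is filled.
untouched = df.fillna({})

print(per_column)
print(untouched)
```

This is why `{}` cannot be used directly as the fill value: pandas never treats it as the thing to put in the cells.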

Is there any way to use fillna to accomplish what I want? Do I have to iterate through the entire DataFrame or construct a silly dict with all my columns mapped to empty dict?

The "silly solution" won't even work, since it'll then try to use that dict to figure out which values to use within each column's Series. At which point you need to write out the index of every value. So no, it's not workable, unfortunately. Just use loc as described below.
– Mark Whitfield, Commented Sep 17, 2014 at 19:45
@MarkWhitfield Don't give up yet! You can create a dict within a dict to make it work. See my solution.
– Shashank Agarwal, Commented Sep 17, 2014 at 19:49
It's worth noting that storing nonscalar entries in cells isn't really supported, and a lot of pandas functionality will break. YMMV, of course.
– DSM, Commented Sep 18, 2014 at 1:35
@DSM, doesn't numpy support Python objects in cells? I was under the impression pandas is based on numpy and supports any data type as well?
– ValAyal, Commented Sep 18, 2014 at 2:00

ValAyal · Accepted Answer · 2014-09-18 00:45:05Z

19

I was able to use DataFrame.applymap in this way:

>>> from pandas import isnull
>>> frame = frame.applymap(lambda x: {} if isnull(x) else x)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}

This solution avoids the pitfalls of both EdChum's solution (where all NaN cells wind up pointing at the same underlying dict object in memory, so they cannot be updated independently of one another) and Shashank's (where a potentially large data structure of nested dicts has to be constructed just to specify a single empty dict value).
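
A quick way to check the independence claim is sketched below. It assumes a small one-column frame with two missing cells (made up for illustration) and picks `DataFrame.map` when the installed pandas has it, falling back to `applymap` otherwise; that fallback is my addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative frame with two missing cells in the same column.
frame = pd.DataFrame({"R": [{1: 2013}, np.nan, np.nan]}, index=["X", "Y", "Z"])

# Newer pandas offers DataFrame.map; older versions only have applymap.
elementwise = getattr(frame, "map", None) or frame.applymap

# The lambda runs once per cell, so each missing cell gets its own new dict.
filled = elementwise(lambda cell: {} if pd.isna(cell) else cell)

first = filled.loc["Y", "R"]
second = filled.loc["Z", "R"]
first["note"] = "only Y"  # mutate one of the filled cells

print(filled)
```

Because the lambda builds a new `{}` on every call, changing the dict in row Y leaves the dict in row Z empty.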

edited Sep 18, 2014 at 0:45

answered Sep 17, 2014 at 21:56

ValAyal

Josh Bode · Accepted Answer · 2019-05-26 23:48:04Z

7

DataFrame.where is a way of achieving this quite directly:

>>> data = {'Q': {'X': {2: 2010}, 'Y': {2: 2011, 3: 2009}}, 'R': {'X': {1: 2013}}}
>>> frame = DataFrame(data)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        NaN

>>> frame.where(frame.notna(), lambda x: [{}])
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}

Also, it appears to be a bit faster:

>>> %timeit frame.where(frame.notna(), lambda x: [{}])
791 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit frame.applymap(lambda x: {} if isnull(x) else x)
1.07 ms ± 7.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

(on larger datasets I've observed speedups of ~10x)

edited May 26, 2019 at 23:48

answered May 26, 2019 at 3:01

Josh Bode

Shashank Agarwal · Accepted Answer · 2014-09-17 20:16:52Z

3

The problem is that when a dict is passed to fillna, it tries to fill the values based on the columns in the frame. So the first solution I tried was -

frame.fillna({column: {} for column in frame.columns})

But if a dictionary is provided at the second level like this, it tries to match the keys against the index, so the solution that worked was -

frame.fillna({column: {ind: {} for ind in frame.index} for column in frame.columns})

Which gives -

                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}
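
The two-level matching (outer keys against columns, inner keys against the index) is easier to see with plain numbers. A minimal sketch, assuming an illustrative one-column frame that is not from the question:

```python
import numpy as np
import pandas as pd

# Illustrative frame: two missing values in column "a".
df = pd.DataFrame({"a": [np.nan, np.nan, 3.0]}, index=["x", "y", "z"])

# Outer keys are column labels; inner keys are index labels,
# so each missing cell can receive its own fill value.
filled = df.fillna({"a": {"x": 10.0, "y": 20.0}})

print(filled)
```

Row `z` already holds a value, so it is left alone; only the labelled missing cells are filled.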

EdChum's answer is probably better for your needs, but this can be used when you don't want to make changes in place.

EDIT: The solution above works well for smaller frames, but can be a problem for larger frames. Using replace can solve that.

import numpy as np
frame.replace(np.nan, {column: {} for column in frame.columns})

edited Sep 17, 2014 at 20:16

answered Sep 17, 2014 at 19:48

Shashank Agarwal

3 Comments

ValAyal Over a year ago

Yes, this works, but the reason I called it the "silly" solution in my original question is that it seems wasteful to construct a large data structure (my DataFrame isn't really 2x2, as you can imagine) full of empty dicts. Especially since python dicts are a bit memory hungry.

Shashank Agarwal Over a year ago

@ValAyal Agreed, it's a bit silly. For smaller frames it works, but for larger frames, this can be an issue. However, we can use replace! See edited answer.

ValAyal Over a year ago

Using frame.replace has a problem similar to that of EdChum's answer with frame.loc: all the NaNs in a particular column wind up pointing to the same dict object in memory, so they can't be changed independently. I was finally able to accomplish what I was looking for using frame.applymap (see my answer).

K3---rnc · Accepted Answer · 2017-10-07 00:59:19Z

2

Use the .values accessor to assign into the underlying NumPy array directly:

frame.R = frame.R.astype(object)  # make sure the column has object dtype

frame.R.values[frame.R.isnull()] = {}
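
One thing to watch with masked assignment of a single `{}`: NumPy broadcasts that one object into every selected slot. The sketch below uses a standalone object array (with `None` standing in for the missing cells, an assumption made to keep it independent of pandas) to show the sharing and one way around it:

```python
import numpy as np

# Standalone object array; None marks the "missing" slots here.
cells = np.array([{1: 2013}, None, None], dtype=object)
missing = np.array([False, True, True])

# A single {} is broadcast: both masked slots reference one dict.
cells[missing] = {}
shared = cells[1] is cells[2]

# A list of new dicts, one per masked slot, keeps them independent.
cells[missing] = [{} for _ in range(int(missing.sum()))]
independent = cells[1] is not cells[2]

print(shared, independent)
```

If the filled cells will be mutated later, the list form is the safer of the two.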

answered Oct 7, 2017 at 0:59

K3---rnc

EdChum · Accepted Answer · 2014-09-17 19:42:59Z

1

This works using loc:

In [6]:

frame.loc[frame['R'].isnull(), 'R'] = {}
frame
Out[6]:
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}

answered Sep 17, 2014 at 19:42

EdChum

8 Comments

EdChum Over a year ago

@acushner Sorry I don't understand what do you mean by all? This sets just the null 'R' values to an empty dict

acushner Over a year ago

sorry, didn't mean to delete. won't they all be the same empty dict? in other words, if you modify one later, it changes them all.

EdChum Over a year ago

@acushner how do you figure that? Are you saying that if I had multiple NaNs, updated them all to empty dicts, and then modified just one row, you'd expect all the empty dicts to update?

acushner Over a year ago

it's all the same empty dict. if i do df.ix['Y', 'R']['snth'] = 12, then all locations where i set an empty dict will look like {'snth': 12}

EdChum Over a year ago

@acushner I see what you mean, but this is a problem for the OP storing empty dicts. For me this is a weird thing to do and is just going to open up a world of pain.

JDenman6 · Accepted Answer · 2021-03-18 16:17:45Z

@Josh_Bode's answer helped me a lot. Here's a very slightly different version. I used mask() instead of where() (a pretty trivial change). I also changed the way the empty dictionary is assigned: by creating a list of dict instances as long as the frame and then assigning that, I avoided the trap of many cells all referring to the same dict.

>>> data = {'Q': {'X': {2: 2010}, 'Y': {2: 2011, 3: 2009}}, 'R': {'X': {1: 2013}}}
>>> frame = DataFrame(data)
>>> frame
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}        NaN

>>> frame.mask(frame.isna(), lambda x: [{} for _ in range(len(frame))])
                    Q          R
X           {2: 2010}  {1: 2013}
Y  {2: 2011, 3: 2009}         {}
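
The same "one new dict per missing cell" idea can also be sketched without where() or mask(), by rebuilding each column with a list comprehension. The helper name `fill_na_with_fresh_dicts` and the small frame are hypothetical, chosen only for this illustration:

```python
import numpy as np
import pandas as pd

def fill_na_with_fresh_dicts(frame):
    """Hypothetical helper: return a copy with a new {} in every missing cell."""
    columns = {
        col: [{} if pd.isna(cell) else cell for cell in frame[col]]
        for col in frame.columns
    }
    return pd.DataFrame(columns, index=frame.index)

# Illustrative frame with one missing cell in each column.
frame = pd.DataFrame(
    {"Q": [{2: 2010}, np.nan], "R": [np.nan, {1: 2013}]},
    index=["X", "Y"],
)
filled = fill_na_with_fresh_dicts(frame)

# Each filled cell is a separate object, so they can be edited independently.
distinct = filled.loc["Y", "Q"] is not filled.loc["X", "R"]
print(filled)
print(distinct)
```

This trades speed for predictability: it is plain Python iteration, so it is likely slower on large frames than the vectorised calls above.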
