Python Pandas: replace NaN in one column with the value from another column of the same row, where the target has to be a list column

Question

Input dataframe

import numpy as np
import pandas as pd

data = {
    'id': [70, 70, 1148, 557, 557, 104, 581, 69],
    'r_id': [[70, 34, 44, 23, 11, 71], [70, 53, 33, 73, 41],
             np.nan, np.nan, np.nan, np.nan, np.nan, [69, 68, 7]],
}

df = pd.DataFrame.from_dict(data)
print(df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                       NaN
3   557                       NaN
4   557                       NaN
5   104                       NaN
6   581                       NaN
7    69               [69, 68, 7]

Expected output dataframe:

data = {
    'id': [70, 70, 1148, 557, 557, 104, 581, 69],
    'r_id': [[70, 34, 44, 23, 11, 71], [70, 53, 33, 73, 41],
             [1148], [557], [557], [104], [581], [69, 68, 7]],
}

df = pd.DataFrame.from_dict(data)
print(df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]

I want the target column r_id to be a list column, but the source column id is not a list. I referred to the following Stack Overflow question: python-pandas-replace-nan-in-one-column. I also tried the following:

data_merge_rel.RELATED_DEVICE.fillna(data_merge_rel.DF0_Desc_Label_i.to_list(), inplace=True)

Erfan · Accepted Answer · 2019-12-18 18:12:18Z

2

We can use a list comprehension together with Series.fillna.

First we create a temporary column in which each id value is wrapped in a one-element list. Then we fill the NaN values in r_id from that column:

df['temp'] = [[x] for x in df['id']]
df['r_id'] = df['r_id'].fillna(df['temp'])
df = df.drop(columns='temp')

Or in one line using apply (thanks r.ook)

df['r_id'] = df['r_id'].fillna(df['id'].apply(lambda x: [x]))

     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]
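As a self-contained sketch of why this works while the attempt in the question does not: fillna rejects a plain Python list as the fill value, but it accepts a Series and matches it to rows by index label. The three-row frame and the names wrapped and list_accepted below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not the full data from the question).
df = pd.DataFrame({
    "id": [70, 1148, 69],
    "r_id": [[70, 34], np.nan, [69, 68, 7]],
})

# A bare list is rejected by fillna, which is why the attempt in the question fails.
try:
    df["r_id"].fillna(df["id"].to_list())
    list_accepted = True
except TypeError:
    list_accepted = False

# A Series of one-element lists is accepted and matched to rows by index label.
wrapped = pd.Series([[x] for x in df["id"]], index=df.index)
df["r_id"] = df["r_id"].fillna(wrapped)
print(df)
```

Only the rows where r_id was NaN take their value from wrapped; rows that already hold a list are left alone.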

edited Dec 18, 2019 at 18:12

answered Dec 18, 2019 at 17:57

Erfan


2 Comments

r.ook Over a year ago

If you are using list comprehension, why not df['r_id'].fillna(df['id'].apply(lambda x: [x]))?

Erfan Over a year ago

Yes, I was thinking about that as well, but went for the more "readable" approach. I have added it as a second option. Thanks @r.ook

anky · Accepted Answer · 2019-12-18 17:38:56Z

2

You can use explode() and groupby():

(df.explode('r_id')
   .ffill(axis=1)
   .reset_index()
   .groupby(['index', 'id'], sort=False)
   .agg(list)
   .reset_index(1))

         id                      r_id
index                                
0        70  [70, 34, 44, 23, 11, 71]
1        70      [70, 53, 33, 73, 41]
2      1148                    [1148]
3       557                     [557]
4       557                     [557]
5       104                     [104]
6       581                     [581]
7        69               [69, 68, 7]
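Because ffill(axis=1) fills along each row and agg(list) aggregates every remaining column, a frame with more columns than id and r_id may see those columns changed too. The following is a variation on this answer, not the answer's own code: it fills only r_id after exploding and then regroups. The column named other is a hypothetical extra column added to show that it stays untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [70, 1148, 69],
    "other": ["a", "b", "c"],   # hypothetical extra column that must stay untouched
    "r_id": [[70, 34], np.nan, [69, 68, 7]],
})

# One row per list element; a NaN row stays as a single NaN row.
# reset_index() keeps the original row label in a column named "index".
exploded = df.explode("r_id").reset_index()

# Fill only r_id from id, rather than forward-filling across every column.
exploded["r_id"] = exploded["r_id"].fillna(exploded["id"])

# Collapse back to one list per original row and write it over the old column.
rebuilt = exploded.groupby("index", sort=False)["r_id"].agg(list)
df["r_id"] = rebuilt
print(df)
```

The assignment back to df relies on rebuilt being indexed by the original row labels, so it assumes the frame starts with a unique index.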

answered Dec 18, 2019 at 17:38

anky


4 Comments

vinsent paramanantham Over a year ago

Thank you. Can you post both solutions? I have a bigger dataframe and need to see the performance as well.

vinsent paramanantham Over a year ago

I thought you had two solutions, one with explode and another with groupby.

vinsent paramanantham Over a year ago

It went and did the same operation on other columns as well :(

anky Over a year ago

@vinsentparamanantham you can use the column name before agg such as .groupby(['index','id'],sort=False)['r_id'].agg(list)

Ben.T · Accepted Answer · 2019-12-18 17:52:45Z

You can transform the column id to an array, add a dimension, then make a list of it and fillna with a Series like:

df['r_id'] = df['r_id'].fillna(pd.Series(df.id.to_numpy()[:,None].tolist(), index=df.index))
print (df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]

Or, if only a few rows are NaN, it may be worth selecting just those rows before doing anything else:

mask_na = df.r_id.isna()
df.loc[mask_na, 'r_id'] = pd.Series(df.loc[mask_na,'id'].to_numpy()[:,None].tolist(), 
                                    index=df[mask_na].index)
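The "add a dimension" step can be looked at in isolation. This is a minimal sketch with made-up values; ids, column and wrapped are illustrative names only.

```python
import numpy as np
import pandas as pd

ids = pd.Series([70, 1148, 557])

# [:, None] turns shape (3,) into shape (3, 1): one row per id, one column.
column = ids.to_numpy()[:, None]
print(column.shape)   # (3, 1)

# tolist() on a 2-D array yields one inner list per row.
wrapped = column.tolist()
print(wrapped)        # [[70], [1148], [557]]
```

Wrapping the result in a pd.Series with the matching index, as in the answer, is what lets fillna or .loc assignment line the lists up with the right rows.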

Dan · Accepted Answer · 2019-12-18 17:54:37Z

1

I think anky_91's answer will be faster, but you could also try this:

df['r_id'] = np.where(df['r_id'].isnull(),
                      df['id'].apply(lambda x: [x]),
                      df['r_id'])

Output:

     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]
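Since performance on a larger dataframe came up, one way to compare approaches is to time them on generated data. This is only a rough sketch: make_frame, with_fillna and with_where are hypothetical helper names, the generated data (every other row missing) is an assumption, and the results will depend on the pandas version and on how many rows are NaN.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows):
    # Hypothetical test data: even-numbered rows have a missing r_id.
    values = [[i, i + 1] if i % 2 else np.nan for i in range(n_rows)]
    return pd.DataFrame({
        "id": np.arange(n_rows),
        "r_id": pd.Series(values, dtype=object),
    })


def with_fillna(df):
    # fillna with a Series of one-element lists.
    return df["r_id"].fillna(df["id"].apply(lambda x: [x]))


def with_where(df):
    # np.where returns a plain array, so wrap it back into a Series.
    filled = np.where(df["r_id"].isnull(),
                      df["id"].apply(lambda x: [x]),
                      df["r_id"])
    return pd.Series(filled, index=df.index)


df = make_frame(10_000)
for func in (with_fillna, with_where):
    seconds = timeit.timeit(lambda: func(df), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

Both helpers return a new Series and leave df unchanged, so each timed run starts from the same input.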

answered Dec 18, 2019 at 17:54

Dan

