4

Input dataframe

data = {

'id' :[70,70,1148,557,557,104,581,69],
'r_id' : [[70,34, 44, 23, 11, 71], [70, 53, 33, 73, 41], 
          np.nan, np.nan, np.nan, np.nan,np.nan,[69, 68, 7],]
}

df = pd.DataFrame.from_dict(data)
print (df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                       NaN
3   557                       NaN
4   557                       NaN
5   104                       NaN
6   581                       NaN
7    69               [69, 68, 7]

Output dataframe,

data = {

'id' :[70,70,1148,557,557,104,581,69],
'r_id' : [[70,34, 44, 23, 11, 71], [70, 53, 33, 73, 41], 
          [1148], [557], [557], [104],[581],[69, 68, 7]]
}

df = pd.DataFrame.from_dict(data)
print (df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]

I want the target column r_id with a list column the source column id is not a list, referred the below links in stackoverflow, python-pandas-replace-nan-in-one-column Tried the following as well, data_merge_rel.RELATED_DEVICE.fillna(data_merge_rel.DF0_Desc_Label_i.to_list(), inplace=True)

4 Answers 4

2

We can use list_comprehension + Series.fillna.

First we create a list with all the id values converted to list type. Then we replace NaN here by our list values:

df['temp'] = [[x] for x in df['id']]
df['r_id'] = df['r_id'].fillna(df['temp'])
df = df.drop(columns='temp')

Or in one line using apply (thanks r.ook)

df['r_id'] = df['r_id'].fillna(df['id'].apply(lambda x: [x]))
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]
Sign up to request clarification or add additional context in comments.

2 Comments

If you are using list comprehension, why not df['r_id'].fillna(df['id'].apply(lambda x: [x]))?
Yes was thinking about it as well. But went for the more "readable" approach. But added it as second option. Thanks @r.ook
2

You can use explode() and groupby():

(df.explode('r_id').ffill(axis=1).reset_index().groupby(['index','id'],sort=False).agg(list)
                                                               .reset_index(1))

         id                      r_id
index                                
0        70  [70, 34, 44, 23, 11, 71]
1        70      [70, 53, 33, 73, 41]
2      1148                    [1148]
3       557                     [557]
4       557                     [557]
5       104                     [104]
6       581                     [581]
7        69               [69, 68, 7]

4 Comments

Thanking you, can you post both the solution, I have a bigger dataframe need to see the performance as well
I thought you had two solution one with explode and an another with group by
It went and did the same operation on other columns as well :(
@vinsentparamanantham you can use the column name before agg such as .groupby(['index','id'],sort=False)['r_id'].agg(list)
1

You can transform the column id to an array, add a dimension, then make a list of it and fillna with a Series like:

df['r_id'] = df['r_id'].fillna(pd.Series(df.id.to_numpy()[:,None].tolist(), index=df.index))
print (df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]

or if you don't have a lot of nan, it may worth to select only these rows prior to do anything:

mask_na = df.r_id.isna()
df.loc[mask_na, 'r_id'] = pd.Series(df.loc[mask_na,'id'].to_numpy()[:,None].tolist(), 
                                    index=df[mask_na].index)

Comments

1

I think anky_91's answer will be faster, but you could also try this:

df['r_id'] = np.where(df['r_id'].isnull(),
                      df['id'].apply(lambda x: [x]),
                      df['r_id'])

Output:

     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.