
Input DataFrame:

import pandas as pd

data = {
    's_id': [5, 7, 26, 70.0, 55, 71.0, 8.0, 'nan', 'nan', 4],
    'r_id': [[34, 44, 23, 11, 71], [53, 33, 73, 41], [17], [10, 31], [17], [75, 8], [7], [68], [50], []]
}

df = pd.DataFrame.from_dict(data)
df
Out[240]: 
  s_id                  r_id
0    5  [34, 44, 23, 11, 71]
1    7      [53, 33, 73, 41]
2   26                  [17]
3   70              [10, 31]
4   55                  [17]
5   71               [75, 8]
6    8                   [7]
7  nan                  [68]
8  nan                  [50]
9    4                    []

Expected dataframe

data = {

's_id' :[5,7,26,70.0,55,71.0,8.0,'nan','nan',4],
'r_id' : [[5,34, 44, 23, 11, 71], [7,53, 33, 73, 41], [26,17], [70,10, 31], [55,17], [71,75, 8],[8,7],[68],[50],[4]]

}
df = pd.DataFrame.from_dict(data)
df
Out[241]: 
  s_id                     r_id
0    5  [5, 34, 44, 23, 11, 71]
1    7      [7, 53, 33, 73, 41]
2   26                 [26, 17]
3   70             [70, 10, 31]
4   55                 [55, 17]
5   71              [71, 75, 8]
6    8                   [8, 7]
7  nan                     [68]
8  nan                     [50]
9    4                      [4]

I need to prepend each row's s_id value as the first element of the list in the r_id column. The s_id column also contains nan values, and some of its values appear as floats. Thank you.

I tried the following:

df['r_id'] = df["s_id"].apply(lambda x : x.append(df['r_id']) )

df['r_id'] = df["s_id"].apply(lambda x : [x].append(df['r_id'].values.tolist()))

1 Answer


If the nans are missing values, use apply to convert each s_id to an integer, wrap it in a one-element list, and prepend it to r_id, skipping rows where s_id is missing:

import numpy as np
import pandas as pd

data = {
    's_id': [5, 7, 26, 70.0, 55, 71.0, 8.0, np.nan, np.nan, 4],
    'r_id': [[34, 44, 23, 11, 71], [53, 33, 73, 41],
             [17], [10, 31], [17], [75, 8], [7], [68], [50], []]
}

df = pd.DataFrame.from_dict(data)
print(df)

f = lambda x : [int(x["s_id"])] + x['r_id'] if pd.notna(x["s_id"]) else x['r_id']
df['r_id'] = df.apply(f, axis=1)
print (df)
   s_id                     r_id
0   5.0  [5, 34, 44, 23, 11, 71]
1   7.0      [7, 53, 33, 73, 41]
2  26.0                 [26, 17]
3  70.0             [70, 10, 31]
4  55.0                 [55, 17]
5  71.0              [71, 75, 8]
6   8.0                   [8, 7]
7   NaN                     [68]
8   NaN                     [50]
9   4.0                      [4]
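
The s_id column itself still prints as floats (5.0, 7.0, ...) because NaN forces a float64 column. If integers are wanted there too, one option is pandas' nullable Int64 dtype. The following is a minimal sketch on a shortened, made-up sample of the data, not part of the tested answer above; it assumes every non-missing s_id is a whole number:

```python
import numpy as np
import pandas as pd

# Shortened sample data, for illustration only
df = pd.DataFrame({
    "s_id": [5, 7, 70.0, np.nan, 4],
    "r_id": [[34, 44], [53], [10, 31], [68], []],
})

# Nullable integer dtype: whole-number floats become integers, NaN becomes <NA>
df["s_id"] = df["s_id"].astype("Int64")

# pd.notna also treats <NA> as missing, so the same row function still works
f = lambda x: [int(x["s_id"])] + x["r_id"] if pd.notna(x["s_id"]) else x["r_id"]
df["r_id"] = df.apply(f, axis=1)
print(df)
```

With this dtype the missing rows show as <NA> instead of NaN, and the values placed into the lists are plain Python integers.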

Another option is to build a boolean mask and apply the function only to the rows where s_id is not NaN:

m = df["s_id"].notna()
f = lambda x : [int(x["s_id"])] + x['r_id']
df.loc[m, 'r_id'] = df[m].apply(f, axis=1)
print (df)
   s_id                     r_id
0   5.0  [5, 34, 44, 23, 11, 71]
1   7.0      [7, 53, 33, 73, 41]
2  26.0                 [26, 17]
3  70.0             [70, 10, 31]
4  55.0                 [55, 17]
5  71.0              [71, 75, 8]
6   8.0                   [8, 7]
7   NaN                     [68]
8   NaN                     [50]
9   4.0                      [4]
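
Both approaches above call a Python function once per row through apply(axis=1). A plain list comprehension over the two columns is a possible third variant that avoids that per-row overhead; it is often faster, though this was not measured here. A minimal sketch, using shortened made-up sample data and a hypothetical helper name new_lists:

```python
import numpy as np
import pandas as pd

# Shortened sample data, for illustration only
df = pd.DataFrame({
    "s_id": [5, 7, 26, np.nan, 4],
    "r_id": [[34, 44], [53, 33], [17], [68], []],
})

# Walk both columns in parallel; prepend s_id only where it is present
new_lists = [
    [int(s)] + r if pd.notna(s) else r
    for s, r in zip(df["s_id"], df["r_id"])
]

# Wrap in a Series aligned to the original index before assigning
df["r_id"] = pd.Series(new_lists, index=df.index)
print(df)
```

The int(s) call is what keeps floats such as 71.0 out of the lists; rows with a missing s_id keep their original list unchanged.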

2 Comments

I get floats inside the lists, e.g. 71 [71.0, 75, 8]. How can I convert these to integers? Should I do that after these steps?
Which one is better and faster?
