I have a Spark DataFrame as follows:
target_id other_ids
3733345 [3731634, 3729995, 3728014, 3708332, 3720...
3725312 [3711541, 3726052, 3733763, 900056057, 371...
3717114 [3701718, 3713481, 3715433, 3714825, 3731...
3408996 [3405896, 3250400, 3237054, 3242492, 3256...
3354970 [3354969, 3347893, 3348168, 3353273, 3356...
I want to first shuffle the elements of the arrays in the other_ids column, and then create a new column new_id by sampling an id from the other_ids array, such that the sampled id is not equal to target_id.
Final result:
target_id other_ids new_id
3733345 [3731634, 3729995, 3728014, 3708332, 3720... 3708332
3725312 [3711541, 3726052, 3733763, 900056057, 371... 900056057
3717114 [3701718, 3713481, 3715433, 3714825, 3731... 3250400
3408996 [3405896, 3250400, 3237054, 3242492, 3256... 3237054
3354970 [3354969, 3347893, 3348168, 3353273, 3356... 3353273
Any suggestions? Thanks.
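
One possible approach, as a minimal sketch rather than a definitive answer: put the per-row logic in a plain Python function and wrap it in a UDF. This assumes that new_id should come from the same row's shuffled other_ids and only has to differ from target_id; if the id is meant to come from another row's array, this does not apply as written. The names shuffle_and_pick and add_new_id are hypothetical, and the ids are assumed to fit in bigint.

```python
import random


def shuffle_and_pick(target_id, other_ids, rng=None):
    """Return (shuffled copy of other_ids, first shuffled id != target_id or None)."""
    rng = rng or random
    shuffled = list(other_ids or [])  # copy, so the input list is not mutated
    rng.shuffle(shuffled)
    # After a uniform shuffle, the first id that differs from target_id
    # is a uniform random pick among the ids that differ from target_id.
    new_id = next((i for i in shuffled if i != target_id), None)
    return shuffled, new_id


def add_new_id(df):
    """Hypothetical helper: applies shuffle_and_pick to every row of df.

    Assumes df has columns target_id (bigint) and other_ids (array<bigint>).
    pyspark is imported here so the function above can be used without Spark.
    """
    from pyspark.sql import functions as F

    pick = F.udf(
        shuffle_and_pick,
        "struct<other_ids:array<bigint>,new_id:bigint>",
    ).asNondeterministic()  # the result is random, so tell the optimizer

    return (
        df.withColumn("picked", pick("target_id", "other_ids"))
        .select(
            "target_id",
            F.col("picked.other_ids").alias("other_ids"),
            F.col("picked.new_id").alias("new_id"),
        )
    )
```

Calling add_new_id(df) should give the three columns target_id, other_ids (shuffled) and new_id, with new_id null when no id other than target_id is available. If a UDF is too slow, the built-in pyspark.sql.functions.shuffle could probably replace the shuffling step, with the pick done afterwards on the shuffled column.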