
I have a dataframe:

import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4], 'val1':['21','22','3','35'],
                   'val2':['99',None,'91','67'], 'val3':['21','45','76','88']})

I want to merge the values of all columns whose names start with val into a single column, skipping missing values.

Expected Output:

   id val1  val2 val3       val
0   1   21    99   21  21,99,21
1   2   22  None   45     22,45
2   3    3    91   76   3,91,76
3   4   35    67   88  35,67,88

What I Tried:

df['val'] = df['val1']+","+df['val2']+","+df['val3']

This works when there are no null values, but if a row contains None, the whole result for that row becomes NaN:

   id val1  val2 val3       val
0   1   21    99   21  21,99,21
1   2   22  None   45       NaN
2   3    3    91   76   3,91,76
3   4   35    67   88  35,67,88
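
The NaN in row 1 can be reproduced in isolation. This is a minimal sketch, assuming only that pandas is installed; the names s1 and s2 are made up for illustration. Element-wise string concatenation treats a missing operand as missing, so the whole result for that position is NaN.

```python
import pandas as pd

# Two small string columns; s2 has a missing value in its first position.
s1 = pd.Series(['22', '3'])
s2 = pd.Series([None, '91'])

# Concatenation is element-wise: a missing operand makes that element missing.
result = s1 + "," + s2

print(result.isna().tolist())
```

Only the second element survives as a joined string, which is why the missing values have to be dropped or filled before joining.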

2 Answers

3

Use apply with dropna:

df['val'] = df[['val1',  'val2', 'val3']].apply(lambda x: ';'.join(x.dropna()), axis=1)
#alternative, thanks Jon Clements
#df['val'] = df.filter(regex='^val').apply(lambda x: ';'.join(x.dropna()), axis=1)
print(df)

   id val1  val2 val3       val
0   1   21    99   21  21;99;21
1   2   22  None   45     22;45
2   3    3    91   76   3;91;76
3   4   35    67   88  35;67;88

If performance is important, an alternative is a nested list comprehension:

df['val'] = [';'.join(y for y in x if isinstance(y, str))
             for x in df.filter(regex='^val').values]
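
For reference, here is a self-contained sketch of the same idea using the comma separator from the question. The helper name join_prefixed and its parameters are hypothetical, not part of pandas. It tests with pd.notna instead of isinstance(y, str), on the assumption that non-string values (for example integers) should be kept and converted with str rather than silently dropped.

```python
import re

import pandas as pd


def join_prefixed(frame, prefix, sep=','):
    """Join the non-missing values of every column whose name starts with prefix.

    Hypothetical helper for illustration only.
    """
    cols = frame.filter(regex='^' + re.escape(prefix))
    # Iterate over rows of the underlying array; skip None/NaN, stringify the rest.
    return [sep.join(str(v) for v in row if pd.notna(v))
            for row in cols.to_numpy()]


df = pd.DataFrame({'id': [1, 2, 3, 4], 'val1': ['21', '22', '3', '35'],
                   'val2': ['99', None, '91', '67'], 'val3': ['21', '45', '76', '88']})

# Select the columns first, then assign, so the new 'val' column is not included.
df['val'] = join_prefixed(df, 'val')
print(df['val'].tolist())
```

On the sample data this gives 22,45 for the row with the missing value, matching the expected output in the question.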

3 Comments

Thanks works like a charm. Is there an alternate way to select only columns starting with prefix val?
Yes, you can use df.filter(like='val')
@yatu or df.filter(regex='^val') to only include those that start with val rather than ones that contain val...
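
The difference between the two selectors in the comments above shows up on a frame with a column that merely contains val. This is a small sketch; the column name interval is invented for illustration.

```python
import pandas as pd

# 'interval' contains the substring 'val' but does not start with it.
df = pd.DataFrame({'val1': ['a'], 'interval': ['b'], 'val2': ['c']})

contains_val = df.filter(like='val').columns.tolist()
starts_with_val = df.filter(regex='^val').columns.tolist()

print(contains_val)      # ['val1', 'interval', 'val2']
print(starts_with_val)   # ['val1', 'val2']
```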
0

You're close. You can try filling the null values:

df['val'] = df.fillna('')['val1']+","+df.fillna('')['val2']+","+df.fillna('')['val3']

   id val1  val2 val3       val
0   1   21    99   21  21,99,21
1   2   22  None   45    22,,45
2   3    3    91   76   3,91,76
3   4   35    67   88  35,67,88

4 Comments

@jezrael yes, I have
22,,45 I think
@jezrael What about it?
@MohitMotwani I don't want that extra , in that row 22,,45
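
One possible way to keep the fillna approach and still avoid the extra comma is to clean up the separators afterwards. This is only a sketch, and it assumes that no value is an empty string and that no value contains a comma itself, since both would be altered by the cleanup. The intermediate names filled and joined are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4], 'val1': ['21', '22', '3', '35'],
                   'val2': ['99', None, '91', '67'], 'val3': ['21', '45', '76', '88']})

# Fill missing values once, then concatenate as in the answer above.
filled = df.filter(regex='^val').fillna('')
joined = filled['val1'] + ',' + filled['val2'] + ',' + filled['val3']

# Collapse runs of commas left by empty values and trim any at the ends.
df['val'] = joined.str.replace(r',+', ',', regex=True).str.strip(',')
print(df['val'].tolist())
```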
