Concatenate column values in Pandas DataFrame with "NaN" values

Question

I'm trying to concatenate Pandas DataFrame columns with NaN values.

In [96]: df = pd.DataFrame({'col1': ["1","1","2","2","3","3"],
    ...:                    'col2': ["p1","p2","p1",np.nan,"p2",np.nan],
    ...:                    'col3': ["A","B","C","D","E","F"]})

In [97]: df
Out[97]: 
  col1 col2 col3
0    1   p1    A
1    1   p2    B
2    2   p1    C
3    2  NaN    D
4    3   p2    E
5    3  NaN    F

In [98]: df['concatenated'] = df['col2'] +','+ df['col3']
In [99]: df
Out[99]: 
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D          NaN
4    3   p2    E         p2,E
5    3  NaN    F          NaN

Instead of NaN values in the "concatenated" column, I want to get "D" and "F" for rows 3 and 5 in this example. How can I do that?

Kiwi · Accepted Answer · 2014-05-03 14:08:27Z

20

I don't think your problem is trivial. However, here is a workaround using np.vectorize, which applies a plain Python function element by element across the columns:

In [49]: def concat(*args):
    ...:     strs = [str(arg) for arg in args if not pd.isnull(arg)]
    ...:     return ','.join(strs) if strs else np.nan
    ...: np_concat = np.vectorize(concat)
    ...: 

In [50]: np_concat(df['col2'], df['col3'])
Out[50]: 
array(['p1,A', 'p2,B', 'p1,C', 'D', 'p2,E', 'F'], 
      dtype='|S64')

In [51]: df['concatenated'] = np_concat(df['col2'], df['col3'])

In [52]: df
Out[52]: 
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F

[6 rows x 4 columns]
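For reference, the same idea as a self-contained script. This is only a sketch: the `otypes=[object]` argument is an addition that is not in the answer above. It is there on the assumption that a row where every input is missing should keep a real NaN rather than be coerced into a string array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

def concat(*args):
    # Keep only the non-missing values and join them with a comma.
    strs = [str(arg) for arg in args if not pd.isnull(arg)]
    # If every value in the row is missing, return NaN instead of ''.
    return ','.join(strs) if strs else np.nan

# otypes=[object] (an addition to the original answer) keeps the result
# as an object array, so strings and NaN can coexist in the output.
np_concat = np.vectorize(concat, otypes=[object])

df['concatenated'] = np_concat(df['col2'], df['col3'])
print(df)
```

Note that np.vectorize is a convenience wrapper around a Python-level loop, so it should not be expected to be faster than an explicit loop on large frames.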


2 Comments

Nilani Algiriyage Over a year ago

Hey, thanks Kiwi. This seems to be the easiest way of doing it. :)

n8-da-gr8 Over a year ago

Not sure why, but I had to change it a little. Namely, I used strs = [str(arg) for arg in args if not arg == 'nan'] and return ','.join(filter(None, strs)) if strs else ''

sl1129 · Answer · 2015-12-14 21:29:18Z

12

You could first replace NaNs with empty strings, for the whole dataframe or only the column(s) you need. Note that rows where col2 was missing then keep a leading comma in the result.

In [6]: df = df.fillna('')

In [7]: df['concatenated'] = df['col2'] +','+ df['col3']

In [8]: df
Out[8]:
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2         D           ,D
4    3   p2    E         p2,E
5    3         F           ,F
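If the leading comma is unwanted, one possible follow-up is to strip the separator from both ends of the result. The sketch below is not part of the original answer. It fills only the two columns being joined, and it assumes that no real value starts or ends with a comma, because `str.strip(',')` would remove those as well.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

# Fill NaN only in the columns being joined, leaving df itself untouched.
filled = df[['col2', 'col3']].fillna('')

# Join with a comma, then strip any comma left at either end by an
# empty (originally missing) value.
df['concatenated'] = (filled['col2'] + ',' + filled['col3']).str.strip(',')
print(df)
```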


Erfan · Answer · 2020-10-10 18:29:21Z

5

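We can use stack, which drops the NaN values (its default behavior at the time of writing), then group by the original row index with groupby(level=0) and join the strings in each group with ','.join: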

df['concatenated'] = df[['col2', 'col3']].stack().groupby(level=0).agg(','.join)

  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F
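A self-contained variant of the line above, as a sketch. The explicit `.dropna()` is an addition. It is based on the assumption that newer pandas releases may keep NaN rows when stacking, so the missing values are dropped explicitly instead of relying on the default behavior of `stack`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

# stack() turns the two columns into one long Series indexed by
# (row label, column name); dropna() removes the missing entries
# regardless of how stack() itself treats NaN.
stacked = df[['col2', 'col3']].stack().dropna()

# Group by the original row label (index level 0) and join each group.
df['concatenated'] = stacked.groupby(level=0).agg(','.join)
print(df)
```

A row in which both columns are missing has no entries left after `dropna()`, so it ends up as NaN in the new column when the result is aligned back to `df`.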


Sobolenko.Evgeniy · Answer · 2024-07-19 09:04:52Z

0

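You can use str.cat() with na_rep, so that missing values are treated as empty strings instead of turning the whole result into NaN: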

df['joined'] = df['col2'].str.cat(df['col3'], na_rep='', sep=', ')
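With `na_rep=''` the separator is still inserted, so rows where col2 is missing come out as `', D'` rather than `'D'`. A possible way to get the output asked for in the question is to strip the separator characters afterwards. This is a sketch rather than part of the original answer, and it assumes that no real value starts or ends with a comma or a space.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["1", "1", "2", "2", "3", "3"],
    'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
    'col3': ["A", "B", "C", "D", "E", "F"],
})

# na_rep='' replaces missing values with an empty string before joining,
# which leaves a dangling ', ' wherever col2 was missing.
joined = df['col2'].str.cat(df['col3'], na_rep='', sep=', ')

# Strip commas and spaces from both ends to remove the dangling separator.
df['joined'] = joined.str.strip(', ')
print(df)
```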
