13

I'm trying to concatenate Pandas DataFrame columns with NaN values.

In [96]:df = pd.DataFrame({'col1' : ["1","1","2","2","3","3"],
                'col2'  : ["p1","p2","p1",np.nan,"p2",np.nan], 'col3' : ["A","B","C","D","E","F"]})

In [97]: df
Out[97]: 
  col1 col2 col3
0    1   p1    A
1    1   p2    B
2    2   p1    C
3    2  NaN    D
4    3   p2    E
5    3  NaN    F

In [98]: df['concatenated'] = df['col2'] +','+ df['col3']
In [99]: df
Out[99]: 
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D          NaN
4    3   p2    E         p2,E
5    3  NaN    F          NaN

Instead of 'NaN' values in "concatenated" column, I want to get "D" and "F" respectively for this example?

4 Answers 4

20

I don't think your problem is trivial. However, here is a workaround using numpy vectorization :

In [49]: def concat(*args):
    ...:     strs = [str(arg) for arg in args if not pd.isnull(arg)]
    ...:     return ','.join(strs) if strs else np.nan
    ...: np_concat = np.vectorize(concat)
    ...: 

In [50]: np_concat(df['col2'], df['col3'])
Out[50]: 
array(['p1,A', 'p2,B', 'p1,C', 'D', 'p2,E', 'F'], 
      dtype='|S64')

In [51]: df['concatenated'] = np_concat(df['col2'], df['col3'])

In [52]: df
Out[52]: 
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F

[6 rows x 4 columns]
Sign up to request clarification or add additional context in comments.

2 Comments

Hey Thanks Kiwi, Seems this is the easiest way of doing. :)
Not sure why but I had to change it up a little, namely strs = [str(arg) for arg in args if not arg == 'nan'] and return ','.join(filter(None, strs)) if strs else ''
12

You could first replace NaNs with empty strings, for the whole dataframe or the column(s) you desire.

In [6]: df = df.fillna('')

In [7]: df['concatenated'] = df['col2'] +','+ df['col3']

In [8]: df
Out[8]:
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2         D           ,D
4    3   p2    E         p2,E
5    3         F           ,F

Comments

5

We can use stack which will drop the NaN, then use groupby.agg and ','.join the strings:

df['concatenated'] = df[['col2', 'col3']].stack().groupby(level=0).agg(','.join)
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D            D
4    3   p2    E         p2,E
5    3  NaN    F            F

Comments

0

You can use str.cat():

df['joined'] = df['col2'].str.cat(df['col3'], na_rep='', sep=', ')

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.