46

I have two columns with strings. I would like to combine them and ignore nan values, such that:

ColA  ColB  ColA+ColB
str   str   strstr
str   nan   str
nan   str   str

I tried df['ColA+ColB'] = df['ColA'] + df['ColB'] but that creates a nan value if either column is nan. I've also thought about using concat.

I suppose I could just go with that and then patch the NaN rows with something like df.loc[df['ColB'].isna(), 'ColA+ColB'] = df['ColA'], but that seems like quite the workaround.


5 Answers

46

Call fillna with an empty string as the fill value, then sum along axis=1 (for string columns, the row-wise sum concatenates):

In [3]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': ['asd', np.nan, 'asdsa'], 'b': ['asdas', 'asdas', np.nan]})
df

Out[3]:
       a      b
0    asd  asdas
1    NaN  asdas
2  asdsa    NaN

In [7]:
df['a+b'] = df.fillna('').sum(axis=1)
df

Out[7]:
       a      b       a+b
0    asd  asdas  asdasdas
1    NaN  asdas     asdas
2  asdsa    NaN     asdsa

34

You could fill the NaN with an empty string:

df['ColA+ColB'] = df['ColA'].fillna('') + df['ColB'].fillna('')
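To make the one-liner above easy to try, here is a minimal, self-contained sketch. The sample values are an assumption that mirrors the table in the question; only the fillna-and-add line comes from the answer itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the ColA / ColB layout from the question.
df = pd.DataFrame({
    "ColA": ["str", "str", np.nan],
    "ColB": ["str", np.nan, "str"],
})

# Replace missing values with empty strings, then concatenate element-wise.
df["ColA+ColB"] = df["ColA"].fillna("") + df["ColB"].fillna("")

print(df["ColA+ColB"].tolist())  # ['strstr', 'str', 'str']
```

Because fillna returns new Series, the original ColA and ColB columns keep their NaN values.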


9

In my case, I wanted to join more than 2 columns together with a separator (a+b+c)

In [3]:
df = pd.DataFrame({'a': ['asd', np.nan, 'asdsa'], 'b': ['asdas', 'asdas', np.nan], 'c': ['as', np.nan, 'ds']})

In [4]: df
Out[4]:
       a      b    c
0    asd  asdas   as
1    NaN  asdas  NaN
2  asdsa    NaN   ds

The following syntax worked for me:

In [5]: df['d'] = df[['a', 'b', 'c']].fillna('').agg('|'.join, axis=1)

In [6]: df

Out[6]:
       a      b    c             d
0    asd  asdas   as  asd|asdas|as
1    NaN  asdas  NaN       |asdas|
2  asdsa    NaN   ds     asdsa||ds
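Note that with fillna(''), the missing fields become empty strings, so the separator still appears around them (for example |asdas|). If that is not wanted, one possible variation is to drop the missing values in each row before joining. This is a sketch under that assumption, not part of the answer above, and it reuses the same sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["asd", np.nan, "asdsa"],
    "b": ["asdas", "asdas", np.nan],
    "c": ["as", np.nan, "ds"],
})

# Drop the missing values in each row before joining, so that no empty
# fields (and no stray separators) end up in the result.
df["d"] = df[["a", "b", "c"]].apply(lambda row: "|".join(row.dropna()), axis=1)

print(df["d"].tolist())  # ['asd|asdas|as', 'asdas', 'asdsa|ds']
```

This goes through a row-wise apply, so it is likely to be slower than vectorized approaches on large frames.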

2 Comments

Why not include the code to create the df and the output of the solution?
Added input and output.

8

Using apply and str.cat, you can concatenate each row; with no na_rep given, str.cat skips the NaN values:

In [723]: df
Out[723]:
       a      b
0    asd  asdas
1    NaN  asdas
2  asdsa    NaN

In [724]: df['a+b'] = df.apply(lambda x: x.str.cat(sep=''), axis=1)

In [725]: df
Out[725]:
       a      b       a+b
0    asd  asdas  asdasdas
1    NaN  asdas     asdas
2  asdsa    NaN     asdsa
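For the two-column case, the row-wise apply may not be needed, since Series.str.cat also accepts another Series. A minimal sketch, assuming the same sample frame as above and using na_rep='' so that a missing value on either side is treated as an empty string:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["asd", np.nan, "asdsa"],
    "b": ["asdas", "asdas", np.nan],
})

# Concatenate column b onto column a element-wise; na_rep="" substitutes
# an empty string for any missing value instead of propagating NaN.
df["a+b"] = df["a"].str.cat(df["b"], na_rep="")

print(df["a+b"].tolist())  # ['asdasdas', 'asdas', 'asdsa']
```

Without na_rep, any row with a missing value in either column would come out as NaN, which is the behaviour the question is trying to avoid.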


4

Prefer adding the columns directly over using the apply method, because it is much faster in the timings below.

  • Just add the two columns (if you know they are strings)

    %timeit df.bio + df.procedure_codes  
    

    21.2 ms ± 1.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

  • Use apply

    %timeit df[eventcol].apply(lambda x: ''.join(x), axis=1)  
    

    13.6 s ± 343 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  • Use Pandas string methods and cat:

    %timeit df[eventcol[0]].str.cat(cols, sep=',')  
    

    264 ms ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  • Using sum (which concatenates strings)

    %timeit df[eventcol].sum(axis=1)  
    

    509 ms ± 6.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

see here for more tests
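The timings above depend on the author's data, which is not shown. To run a rough comparison yourself, something like the following sketch could be used. The frame, its column names (x, y) and its size are assumptions made up for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: two string columns with no missing values.
n = 10_000
df = pd.DataFrame({"x": ["abc"] * n, "y": ["def"] * n})


def add_columns():
    # Vectorized element-wise string concatenation.
    return df["x"] + df["y"]


def apply_join():
    # Row-wise Python-level join via apply.
    return df.apply(lambda row: "".join(row), axis=1)


def str_cat():
    # pandas string accessor concatenation.
    return df["x"].str.cat(df["y"])


# All three approaches should produce the same strings.
expected = add_columns().tolist()
assert apply_join().tolist() == expected
assert str_cat().tolist() == expected

for func in (add_columns, apply_join, str_cat):
    seconds = timeit.timeit(func, number=3) / 3
    print(f"{func.__name__}: {seconds * 1000:.2f} ms per call")
```

Checking that the approaches agree before timing them helps ensure the comparison is between equivalent operations.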
