Pandas: concatenate a list of columns into one column

Question

I am wondering if I could build a function like this in Pandas:

    def concatenate(df,columnlist,newcolumn):
        # df is the dataframe
        # columnlist is the list of names of the columns I want to concatenate
        # newcolumn is the name of the resulting new column

        for c in columnlist:
            ...some Pandas functions

        return df # this one has the concatenated "newcolumn"

I am asking because len(columnlist) is going to be very large and will vary. Thanks!

Does this answer your question? Combine two columns of text in dataframe in pandas/python
– jmuhlenkamp, Commented Dec 13, 2019 at 21:57

John Zwinck · Accepted Answer · 2017-11-25 01:50:49Z

10

Try this:

    import numpy as np
    np.add.reduce(df[columnlist], axis=1)

This "adds" the values in each row, which for strings means concatenating them ("abc" + "de" == "abcde").
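To make the "adding is concatenating" point concrete, here is a minimal sketch that folds `+` over the column Series with `functools.reduce`. It is a different spelling of the same idea rather than the exact call above, and the sample frame and column names are made up for illustration.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data; every column holds strings.
df = pd.DataFrame({"A": ["aaa", "bbb"], "B": ["ddd", "eee"], "C": ["x", "y"]})
columnlist = ["A", "B", "C"]

# Series + Series adds element by element, and "+" on strings concatenates,
# so folding "+" over the columns builds one combined string per row.
combined = reduce(operator.add, (df[c] for c in columnlist))
print(combined.tolist())
```

Because the fold walks over `columnlist`, it works for any number of columns, as long as each one holds strings.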

Originally I thought you wanted to concatenate them lengthwise, into a single longer series of all the values. If anyone else wants to do that, here's the code:

    pd.concat(map(df.get, columnlist)).reset_index(drop=True)
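For anyone who does want the lengthwise version, here is a small sketch of what it produces. The sample data is hypothetical, and it uses a list comprehension with `ignore_index=True` instead of `map` plus `reset_index`, which should give the same result.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": ["aaa", "bbb"], "B": ["ddd", "eee"]})
columnlist = ["A", "B"]

# Stack the selected columns end to end; ignore_index=True renumbers the
# result 0..n-1 instead of repeating the original row labels.
stacked = pd.concat([df[c] for c in columnlist], ignore_index=True)
print(stacked.tolist())
```

The result has len(df) * len(columnlist) entries, so it cannot be assigned back as a new column of the same dataframe.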

edited Nov 25, 2017 at 1:50

answered Nov 25, 2017 at 1:04


6 Comments

LarryZ Over a year ago

Thanks John! I guess you misunderstood my original request @John Zwinck: if Column A is "ABC" and Column B is "XYZ" my newcolumn should be "ABCXYZ". The newcolumn has the exact length of the dataframe.

John Zwinck Over a year ago

@LarryZ: I see. I've changed my answer.

LarryZ Over a year ago

Thanks, @John Zwinck. It worked! It seems this method requires all the columns to be str; when any column contains int or float, it gives the following error: "TypeError: must be str, not float"

John Zwinck Over a year ago

@LarryZ: You can fix that by np.add.reduce(df[columnlist].astype(str), axis=1).

LarryZ Over a year ago

Thanks, man! This is the answer, period! A shameless followup question: what if I also want to add a "separator" between columns, i.e. instead of "ABCXYZ" I want "ABC XYZ"? A dumb way is to add a new column called "Space" that contains nothing but one space " ", then insert the column name "Space" into my columnlist where necessary. It worked fine. Is there a more Pythonic way to do this?
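One possible way to handle the separator without a helper column is to join each row with `str.join`. The sketch below wraps that in the `concatenate` function from the question; the `sep` parameter and the sample data are additions made for illustration, and this is only one of several ways to do it.

```python
import pandas as pd


def concatenate(df, columnlist, newcolumn, sep=""):
    """Return a copy of df with the columns in columnlist joined into newcolumn."""
    out = df.copy()
    # Convert to str first so int/float cells do not break the join.
    as_text = out[columnlist].astype(str)
    # Join the values of each row, placing sep between them.
    out[newcolumn] = as_text.apply(sep.join, axis=1)
    return out


# Hypothetical sample data with two text columns and one integer column.
df = pd.DataFrame({"A": ["ABC", "DEF"], "B": ["XYZ", "UVW"], "n": [1, 2]})
result = concatenate(df, ["A", "B", "n"], "joined", sep=" ")
print(result["joined"].tolist())
```

Row-wise `apply` is not the fastest option for very wide or very long frames, and missing values may need separate handling before the join.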


cs95 · Accepted Answer · 2017-11-25 02:05:22Z

10

Given a dataframe like this:

    df

         A    B
    0  aaa  ddd
    1  bbb  eee
    2  ccc  fff

You can just use df.sum, given every column is a string column:

    df.sum(1)

    0    aaaddd
    1    bbbeee
    2    cccfff
    dtype: object

If you need to perform a conversion, you can do so:

    df.astype(str).sum(1)

If you need to select a subset of your data (only string columns?), you can use select_dtypes:

    df.select_dtypes(include=['str']).sum(1)

If you need to select by columns, this should do:

    df[['A', 'B']].sum(1)

In every case, the addition is not done in place, so if you want to keep the result, assign it back:

    r = df.sum(1)

answered Nov 25, 2017 at 2:05


5 Comments

LarryZ Over a year ago

Thanks, @COLDSPEED. Your solution appears promising. I tried "df.select_dtypes(include=['str']).sum(1)" but get this error below:

    File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2369, in select_dtypes
        invalidate_string_dtypes(dtypes)
    File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 497, in invalidate_string_dtypes
        raise TypeError("string dtypes are not allowed, use 'object' instead")
    TypeError: string dtypes are not allowed, use 'object' instead

LarryZ Over a year ago

Then when I change the code to df.select_dtypes(include=['object']).sum(1), it gave no error but the result is one column with all "0". Any idea why? Thanks!

cs95 Over a year ago

@LarryZ what are your column types initially?

LarryZ Over a year ago

@COLDSPEED Thanks for the followup. A number of the columns contain mixed data types, both str and int. These columns are labeled as "object".

cs95 Over a year ago

@LarryZ select_dtypes may not work but everything else should.
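For columns like the ones described here, a quick way to check what an "object" column really contains is to count the Python types of its cells. The sketch below uses made-up data with str and int mixed in the same columns, then converts everything to str before concatenating, in the spirit of the astype(str) suggestion above.

```python
import pandas as pd

# Hypothetical columns mixing str and int, which pandas stores as object dtype.
df = pd.DataFrame({"A": ["aaa", 1, "ccc"], "B": ["ddd", "eee", 2]})

# Count the Python types actually present in each column.
type_counts = {col: df[col].map(type).value_counts().to_dict() for col in df.columns}
print(type_counts)

# Converting every cell to str first makes "+" mean concatenation for all rows.
as_text = df.astype(str)
joined = as_text["A"] + as_text["B"]
print(joined.tolist())
```

Without the conversion, a row that pairs a str with an int cannot be added, which is consistent with the errors and odd results reported in this thread.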
