How to cancat one column values into another column pandas?

Question

I have a dataframe, it has two columns, one is called 'PTM_loc' and another one is called 'PTM_types'

data['PTM_loc']
0        24
1        24;30
2        22
3        19;20;66
4        16;30;50

data['PTM_typs']
0        S
1        S
2        T
3        S;T
4        S;Y;T

I would like to concat (or whatever you call this action) these columns together, and get a new column like this:

data['PTMs']
0        S24
1        S24;S30
2        T22
3        S19;T20;T66
4        S16;Y30;T50

Does anyone have any ideas?

mozway · Accepted Answer · 2023-06-26 13:14:57Z

2

You can use a list comprehension with a double zip/zip_longest:

from itertools import zip_longest

def combine(a, b):
    a = a.split(';')
    b = b.split(';')
    return ';'.join(map(''.join, zip_longest(a, b, fillvalue=a[-1])))

data['PTMs')] = [combine(a,b) for a,b in zip(data['PTM_types'], data['PTM_loc'])]

# or 
# from itertools import starmap
# data['PTMs'] = list(starmap(combine, zip(data['PTM_types'], data['PTM_loc'])))

Altentatively, for a pure pandas variant (just for fun, it's likely less efficient), use ffill to fill the missing combinations:

# either in rectangular form
df['PTMs'] = (data['PTM_types']
 .str.split(';', expand=True).ffill(axis=1)
 +data['PTM_loc'].str.split(';', expand=True)
).stack().groupby(level=0).agg(';'.join)

# or in long form
l = data['PTM_loc'].str.extractall('([^;]+)')
t = data['PTM_typs'].str.extractall('([^;]+)')
data['PTMs'] = (t.reindex_like(l).groupby(level=0).ffill().add(l)
                 .groupby(level=0).agg(';'.join)
                )

Output:

    PTM_loc PTM_types         PTMs
0        24         S          S24
1     24;30         S      S24;S30
2        22         S          S22
3  19;20;66       S;T  S19;T20;T66
4  16;30;50     S;Y;T  S16;Y30;T50

edited Jun 26, 2023 at 13:14

answered Jun 26, 2023 at 12:49

mozway

267k13 gold badges56 silver badges106 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

jezrael Over a year ago

there are different lengths

mozway Over a year ago

@jezrael good point thanks, easy to handle ;)

RRRRRRRR Over a year ago

Hi, it only works for the element before the first semi-comma.

mozway Over a year ago

@RRRRRRRR have you checked the latest version of the code (refresh your page) and provide an example if this still fail

BENY · Accepted Answer · 2023-06-26 13:36:55Z

1

Something like split

s1 = data['PTM_loc'].str.split(';',expand=True)
s2 = data['PTM_typs'].str.split(';',expand=True)

data['new'] = s1.radd(s2.ffill(axis=1)).stack().groupby(level=0).agg(';'.join)
Out[20]: 
0            S24
1        S24;S30
2            T22
3    S19;T20;T66
4    S16;Y30;T50
dtype: object

answered Jun 26, 2023 at 13:36

BENY

324k22 gold badges176 silver badges250 bronze badges

Collectives™ on Stack Overflow

How to cancat one column values into another column pandas?

2 Answers 2

4 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related