0

I have a dataframe, it has two columns, one is called 'PTM_loc' and another one is called 'PTM_types'

data['PTM_loc']
0        24
1        24;30
2        22
3        19;20;66
4        16;30;50

data['PTM_typs']
0        S
1        S
2        T
3        S;T
4        S;Y;T

I would like to concat (or whatever you call this action) these columns together, and get a new column like this:

data['PTMs']
0        S24
1        S24;S30
2        T22
3        S19;T20;T66
4        S16;Y30;T50

Does anyone have any ideas?

2 Answers 2

2

You can use a list comprehension with a double zip/zip_longest:

from itertools import zip_longest

def combine(a, b):
    a = a.split(';')
    b = b.split(';')
    return ';'.join(map(''.join, zip_longest(a, b, fillvalue=a[-1])))

data['PTMs')] = [combine(a,b) for a,b in zip(data['PTM_types'], data['PTM_loc'])]

# or 
# from itertools import starmap
# data['PTMs'] = list(starmap(combine, zip(data['PTM_types'], data['PTM_loc'])))

Altentatively, for a pure pandas variant (just for fun, it's likely less efficient), use ffill to fill the missing combinations:

# either in rectangular form
df['PTMs'] = (data['PTM_types']
 .str.split(';', expand=True).ffill(axis=1)
 +data['PTM_loc'].str.split(';', expand=True)
).stack().groupby(level=0).agg(';'.join)

# or in long form
l = data['PTM_loc'].str.extractall('([^;]+)')
t = data['PTM_typs'].str.extractall('([^;]+)')
data['PTMs'] = (t.reindex_like(l).groupby(level=0).ffill().add(l)
                 .groupby(level=0).agg(';'.join)
                )

Output:

    PTM_loc PTM_types         PTMs
0        24         S          S24
1     24;30         S      S24;S30
2        22         S          S22
3  19;20;66       S;T  S19;T20;T66
4  16;30;50     S;Y;T  S16;Y30;T50
Sign up to request clarification or add additional context in comments.

4 Comments

there are different lengths
@jezrael good point thanks, easy to handle ;)
Hi, it only works for the element before the first semi-comma.
@RRRRRRRR have you checked the latest version of the code (refresh your page) and provide an example if this still fail
1

Something like split

s1 = data['PTM_loc'].str.split(';',expand=True)
s2 = data['PTM_typs'].str.split(';',expand=True)

data['new'] = s1.radd(s2.ffill(axis=1)).stack().groupby(level=0).agg(';'.join)
Out[20]: 
0            S24
1        S24;S30
2            T22
3    S19;T20;T66
4    S16;Y30;T50
dtype: object

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.