Pandas: matching a string in series with string from another series

Question

I have a DataFrame that looks like this:

Full                          Partial
ABCDEFGHIJKLMNOPQRSTUVWXYZ    FGHIJKL
ANLHDFKNADHFBAKHFGBAKJFB      FKNADH
JABFKADFNADKHFBADHBFJDHFBADF  ABFKA

What I want to do is to put everything from Full that does NOT match Partial in lowercase, yielding the following:

Coverage
abcde_FGHIJKL_mnopqrstuvwxyz
anlhd_FKNADH_fbakhfgbakjfb
j_ABFKA_dfnadkhfbadhbfjdhfbadf

How would I do this? I looked around and it seems that series.str.extract() could be a solution, but I'm not certain as when I try to do this:

df['Full'].str.extract(df['Partial'])

... it only says that a Series can't be hashed. I assume that extract only takes a single pattern, rather than a Series? Is there any way around this? Is extract even the right way to achieve what I'm looking for, or is there another way? I'm thinking I could perhaps find some way to get the string indexes and do something like the following pseudocode:

df['Coverage'] = df['Full'][:start].lower() + '_' + df['Partial'] + \
     '_' + df['Full'][end:].lower()

... where start and end are the indexes where df['Partial'] starts and ends within df['Full'], respectively. Thoughts?

Rutger Kassies · Accepted Answer · 2014-05-08 15:01:38Z

Not the most elegant perhaps, but here is one solution:

For df:

                           Full  Partial
0    ABCDEFGHIJKLMNOPQRSTUVWXYZ  FGHIJKL
1      ANLHDFKNADHFBAKHFGBAKJFB   FKNADH
2  JABFKADFNADKHFBADHBFJDHFBADF    ABFKA

This:

df.apply(lambda r: r.Full.lower().replace(r.Partial.lower(), '_' + r.Partial + '_'), axis=1)

Returns:

0      abcde_FGHIJKL_mnopqrstuvwxyz
1        anlhd_FKNADH_fbakhfgbakjfb
2    j_ABFKA_dfnadkhfbadhbfjdhfbadf

For each row, this converts the full string to lowercase, then replaces the lowercased partial string with the original partial string, wrapped in one underscore on each side.
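As a self-contained sketch of the above: the DataFrame below is rebuilt by hand from the sample data in the question, and the `Coverage` column name is borrowed from the question's desired output rather than from the answer itself.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Full": [
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        "ANLHDFKNADHFBAKHFGBAKJFB",
        "JABFKADFNADKHFBADHBFJDHFBADF",
    ],
    "Partial": ["FGHIJKL", "FKNADH", "ABFKA"],
})

# axis=1 hands each row to the lambda as a Series (named r here)
df["Coverage"] = df.apply(
    lambda r: r.Full.lower().replace(r.Partial.lower(), "_" + r.Partial + "_"),
    axis=1,
)

print(df["Coverage"].tolist())
```

Note that `str.replace` substitutes every occurrence, so if Partial appears more than once in Full, each occurrence ends up uppercased and wrapped in underscores.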

1 Comment

erikfas Over a year ago

Awesome, that's EXACTLY what I wanted! In what way is it not elegant, and what does the r in the lambda function stand for? (I don't know much about lambda, I'm afraid)
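
On the lambda part: `r` is only the parameter name, and with `axis=1` pandas passes each row to the function as a Series. Below is a rough sketch of the same idea written as a named function, following the start/end index approach from the question's pseudocode. The function name `mark_partial` is made up for illustration and is not part of the answer above. Unlike the `replace` version, it only marks the first occurrence and the match is case-sensitive.

```python
import pandas as pd

def mark_partial(row):
    """Lowercase Full except for the first occurrence of Partial."""
    start = row["Full"].find(row["Partial"])
    if start == -1:
        # Partial does not occur in Full: lowercase the whole string
        return row["Full"].lower()
    end = start + len(row["Partial"])
    return (row["Full"][:start].lower() + "_" + row["Partial"] + "_"
            + row["Full"][end:].lower())

df = pd.DataFrame({
    "Full": ["ABCDEFGHIJKLMNOPQRSTUVWXYZ"],
    "Partial": ["FGHIJKL"],
})

# Equivalent to passing a lambda; the row arrives as the `row` argument
df["Coverage"] = df.apply(mark_partial, axis=1)
print(df["Coverage"].tolist())
```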
