3

I have a column A which is the subset of another column B in my dataframe. I want to split the data in the parent column B with A so I get the remaining of B in a new column C. I was trying the below piece using lambda. But the inner column is A which I mention as Series can't be converted into str.

df['C'] = df['B'].apply(lambda x: x.split(df['A'],1)[1])

Example DF:

B                       A
Ford F-Series pickup    Ford
Toyota Camry            Toyoto
Ford Taurus/Taurus X    Ford

Needed Output Data column:

C
F-Series pickup
Camry
Taurus/Taurus X

Here as the df['A'] is a series, we need it as a data from each row to do the same. Is there any possibility to pass the column A data as string. Any help would be appreciated. Thanks.

3
  • Please add an example of df and your desired output. Commented Mar 20, 2017 at 1:04
  • @Craig Apologies. Added example for problem definition. Commented Mar 20, 2017 at 1:14
  • No worries, the example made it clear what you were trying to do. I hope my answer helps. Commented Mar 20, 2017 at 1:32

3 Answers 3

3

I'm assuming that you aren't just trying to remove the model name, but also remove everything before the name.

The code below accomplishes what you want:

import pandas as pd

df = pd.DataFrame( {'A':['Ford', 'Toyota', 'Ford'], 
                    'B':['Ford F-Series pickup', 'Toyota Camry', 'Ford Taurus/Taurus X']} )

df['C'] = df.apply(lambda x: x['B'].split(x['A'], 1), axis=1)['B']
print(df)

Produces:

        A                     B                 C
0    Ford  Ford F-Series pickup   F-Series pickup
1  Toyota          Toyota Camry             Camry
2    Ford  Ford Taurus/Taurus X   Taurus/Taurus X
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks @Craig. Your solution is accurate. Worked. Thanks again!!
2

You can use the .replace method. Since you will create a leading whitespace after replacing the brand name with nothing, the .strip method will remove those.

import pandas as pd
df = pd.DataFrame({'A': ["Ford", "Audi"], 'B': ["Ford Pickup", "Audi R8"]})

df['C'] = df.apply(lambda x: x['B'].replace(x['A'], '').strip(), axis=1)

Comments

0

A more efficient way of doing this is by creating a list that loops through the .values of the columns you're seeking to perform string operations on.

In your example it would look like this:

df['C'] = [b.split(a)[0] for (b, a) in df[['B', 'A']].values]

To show that this is more efficient I ran the following test using one of my own datasets from google analytics:

Vector Solution
CPU times: user 30.8 ms, sys: 0 ns, total: 30.8 ms Wall time: 32.7 ms
appended_ga4_data['top_level'] = [y.split(x)[0] for (y, x) in appended_ga4_data[['Page path and screen class', 'subject']].values]

If we wanted to use the vector solution to get the second (or nth) element in the split, then we would need to include a conditional check that the list is long enough to get the index (assuming the split doesn't always apply):

Vector Solution with Conditions
CPU times: user 53.9 ms, sys: 635 µs, total: 54.5 ms Wall time: 66.4 ms
appended_ga4_data['top_level'] = [y.split(x)[1] if len(y.split(x))>1 else y.split(x)[0] for (y, x) in appended_ga4_data[['Page path and screen class', 'subject']].values]

Apply Solution
CPU times: user 245 ms, sys: 6.59 ms, total: 251 ms Wall time: 263 ms
appended_ga4_data['top_level'] = appended_ga4_data.apply(lambda x: x['Page path and screen class'].split(x['subject'])[0], axis=1)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.