Pandas iterate over rows and remove string value in one column from another

Question

I have a dataframe (carsML) that looks something like this:

+-----------------+----------+--------------+
| carManufacturer | carModel |   carType    |
+-----------------+----------+--------------+
| VW              | POLO     | 1.4 TDI      |
| VW              | POLO     | POLO 1.4 TDI |
| VW              | POLO     | 1.6 TDI      |
| VW              | POLO     | 1.4          |
| VW              | POLO     | POLO 1.6 TDI |
+-----------------+----------+--------------+

I want to iterate over the rows, check whether carModel is contained in carType, and if it is, remove it. So instead of POLO 1.4 TDI, the value should be just 1.4 TDI.

One constraint: some carModels can be a single character long (like just 1 or A). In that case, skip the replacement and do nothing. The script should only act on rows where len(carModel) > 1.

So far I have:

for row in carsML.itertuples():
    if len(row.carModel) > 1:
        carsML.iloc[row.Index].carType = row.carType.replace(row.carModel,"")

But this doesn't change anything, and I don't know why.

You should include an example of a carModel with length 1 as well.
– Erfan, Commented Feb 24, 2020 at 12:44

Jaroslav Bezděk · Accepted Answer · 2020-02-24 12:53:42Z

4

If I understand you correctly, the following one-liner could do the job:

carsML.carType = carsML.apply(lambda row: row.carType.replace(row.carModel, "").strip() if len(row.carModel) > 1 else row.carType, axis=1)
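
For reference, here is a minimal, self-contained sketch of the row-wise apply approach. It assumes the sample data from the question plus a hypothetical extra row with a single-letter carModel, and the helper name drop_model is made up for illustration. It removes the model with str.replace followed by a plain strip(). Note that str.strip(chars) treats its argument as a set of characters to trim from both ends, not as a substring, so "POLO 1.4 TDI".strip("POLO") leaves a leading space behind.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical row with a
# single-letter carModel to exercise the length check.
carsML = pd.DataFrame({
    "carManufacturer": ["VW", "VW", "VW", "VW"],
    "carModel": ["POLO", "POLO", "POLO", "P"],
    "carType": ["1.4 TDI", "POLO 1.4 TDI", "POLO 1.6 TDI", "POLO 1.6 TDI"],
})


def drop_model(row):
    # Skip models of length 1, as the question requires.
    if len(row.carModel) <= 1:
        return row.carType
    # Remove the model substring, then trim the leftover whitespace.
    return row.carType.replace(row.carModel, "").strip()


carsML["carType"] = carsML.apply(drop_model, axis=1)
print(carsML)
```

The last row keeps its "POLO 1.6 TDI" value because its carModel is only one character long.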

edited Feb 24, 2020 at 12:53

answered Feb 24, 2020 at 12:50

3 Comments

Hrvoje Over a year ago

That's nice, but it doesn't cover the condition that len(carModel)>1.

Jaroslav Bezděk Over a year ago

@Harvey, I am sorry, I have just edited my answer. Now, it should be what you are looking for.

Hrvoje Over a year ago

I'll use your answer since it's much faster. +1 for all good answers. Thank you!

Chris · Accepted Answer · 2020-02-24 12:46:20Z

2

Use pandas.Series.replace with where:

# Extra row with single letter carModel:
  carManufacturer carModel       carType
0              VW     POLO       1.4 TDI
1              VW     POLO  POLO 1.4 TDI
2              VW     POLO       1.6 TDI
3              VW     POLO           1.4
4              VW     POLO  POLO 1.6 TDI
5              VW        P  POLO 1.6 TDI

df['carType'] = df['carType'].where(~df['carModel'].str.len().gt(1), 
                                    df['carType'].replace(df['carModel'], "", regex=True)).str.strip()

Output:

  carManufacturer carModel       carType
0              VW     POLO       1.4 TDI
1              VW     POLO       1.4 TDI
2              VW     POLO       1.6 TDI
3              VW     POLO           1.4
4              VW     POLO       1.6 TDI
5              VW        P  POLO 1.6 TDI

answered Feb 24, 2020 at 12:46

1 Comment

Hrvoje Over a year ago

It's working. How come I cannot use regex=True in my expression? It gives me an error, but in your case it works.
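
A likely explanation, offered as an assumption since the failing expression isn't shown: the loop in the question calls Python's built-in str.replace on row.carType, and that method has no regex keyword. regex=True is a parameter of pandas' Series.replace (and Series.str.replace). For a per-string regex replacement in plain Python, re.sub is the usual tool. A small sketch with made-up variable names:

```python
import re

car_type = "POLO 1.4 TDI"
car_model = "POLO"

# Built-in str.replace has no `regex` keyword, so this raises TypeError.
error = None
try:
    car_type.replace(car_model, "", regex=True)
except TypeError as exc:
    error = exc

# re.sub performs the regex replacement; re.escape guards against model
# names that contain regex metacharacters such as "." or "+".
cleaned = re.sub(re.escape(car_model), "", car_type).strip()
print(cleaned)
```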

Vahram Danielyan · Accepted Answer · 2020-02-24 13:28:33Z

1

How did you declare your DataFrame? I ran this test:

>>> import pandas as pd
>>> raw_data = {
...   'carManufacturer': ['VW','VW','VW','VW','VW'],
...   'carModel': ['POLO','POLO','POLO','POLO','POLO'],
...   'carType': ['1.4 TDI', 'POLO 1.4 TDI', '1.6 TDI', '1.4', 'POLO 1.6 TDI']
... }
>>> df = pd.DataFrame(raw_data, columns=["carManufacturer", "carModel", "carType"])
>>> df
  carManufacturer carModel       carType
0              VW     POLO       1.4 TDI
1              VW     POLO  POLO 1.4 TDI
2              VW     POLO       1.6 TDI
3              VW     POLO           1.4
4              VW     POLO  POLO 1.6 TDI

Then I ran:

>>> for row in df.itertuples():
...   if len(row.carModel) > 1:
...      df.iloc[row.Index].carType = row.carType.replace(row.carModel,"")
...
>>> df

And that is the result:

>>> df
  carManufacturer carModel   carType
0              VW     POLO   1.4 TDI
1              VW     POLO   1.4 TDI
2              VW     POLO   1.6 TDI
3              VW     POLO       1.4
4              VW     POLO   1.6 TDI

It works perfectly.
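
A caveat on why the same loop can behave differently on another frame: df.iloc[row.Index].carType = ... is chained assignment. df.iloc[...] first returns a row Series, which may be a view or a copy of the underlying data, so the write may or may not reach df. Under pandas' Copy-on-Write mode it never updates the original frame. In addition, iloc is positional while row.Index is an index label, so the two only line up with a default RangeIndex. A minimal sketch that sidesteps both issues by writing through df.at, assuming sample data like the above plus a hypothetical single-letter model row:

```python
import pandas as pd

df = pd.DataFrame({
    "carManufacturer": ["VW", "VW", "VW"],
    "carModel": ["POLO", "POLO", "P"],
    "carType": ["1.4 TDI", "POLO 1.4 TDI", "POLO 1.6 TDI"],
})

for row in df.itertuples():
    if len(row.carModel) > 1:
        # df.at writes straight into the frame by row label and column name,
        # so there is no intermediate object that could be a copy.
        df.at[row.Index, "carType"] = row.carType.replace(row.carModel, "").strip()

print(df)
```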

answered Feb 24, 2020 at 13:28

1 Comment

Hrvoje Over a year ago

I guess my testing was wrong and my code is ok. Thank you!
