
I have a DataFrame (carsML) that looks something like this:

+-----------------+----------+--------------+
| carManufacturer | carModel |   carType    |
+-----------------+----------+--------------+
| VW              | POLO     | 1.4 TDI      |
| VW              | POLO     | POLO 1.4 TDI |
| VW              | POLO     | 1.6 TDI      |
| VW              | POLO     | 1.4          |
| VW              | POLO     | POLO 1.6 TDI |
+-----------------+----------+--------------+

I want to iterate over the rows, check whether carModel is contained in carType, and if it is, remove it. So instead of POLO 1.4 TDI, it should be just 1.4 TDI.

One constraint: some carModels can be a single character long (like just 1 or A). In that case, skip the replacement and do nothing. The script should only apply to rows where len(carModel) > 1.

So far I have:

for row in carsML.itertuples():
    if len(row.carModel) > 1:
        carsML.iloc[row.Index].carType = row.carType.replace(row.carModel,"")

But this doesn't change anything, and I don't know why.

  • You should include an example of a carModel with length 1 as well. (Commented Feb 24, 2020 at 12:44)

3 Answers


If I understand you correctly, the following one-liner should do the job:

carsML.carType = carsML.apply(lambda row: row.carType.replace(row.carModel, "").strip() if len(row.carModel) > 1 else row.carType, axis=1)
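A caveat if you are tempted to write `row.carType.strip(row.carModel)`: Python's `str.strip(chars)` treats its argument as a set of characters to trim from both ends, not as a substring to remove. The quick check below contrasts it with `str.replace` followed by a plain `strip()`. The `A4` / `A4 Avant 2.4` row is hypothetical, made up only to show the effect.

```python
# Hypothetical row, for illustration only: carModel "A4", carType "A4 Avant 2.4"
model = "A4"
car_type = "A4 Avant 2.4"

# strip(model) trims any 'A' or '4' characters from both ends,
# so the trailing '4' of the engine size is lost too.
stripped = car_type.strip(model)

# replace(model, "") removes the substring itself; strip() then
# trims only the leftover whitespace.
replaced = car_type.replace(model, "").strip()

print(repr(stripped))  # ' Avant 2.'
print(repr(replaced))  # 'Avant 2.4'
```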

3 Comments

That's nice, but it doesn't cover the condition len(carModel) > 1.
@Harvey, I am sorry, I have just edited my answer. Now, it should be what you are looking for.
I'll use your answer since it's much faster. +1 for all good answers. Thank you!

Use pandas.Series.replace with where:

# Extra row with single letter carModel:
  carManufacturer carModel       carType
0              VW     POLO       1.4 TDI
1              VW     POLO  POLO 1.4 TDI
2              VW     POLO       1.6 TDI
3              VW     POLO           1.4
4              VW     POLO  POLO 1.6 TDI
5              VW        P  POLO 1.6 TDI

df['carType'] = df['carType'].where(~df['carModel'].str.len().gt(1), 
                                    df['carType'].replace(df['carModel'], "", regex=True)).str.strip()

Output:

  carManufacturer carModel       carType
0              VW     POLO       1.4 TDI
1              VW     POLO       1.4 TDI
2              VW     POLO       1.6 TDI
3              VW     POLO           1.4
4              VW     POLO       1.6 TDI
5              VW        P  POLO 1.6 TDI

1 Comment

It's working. How come I can't use regex=True in my expression? It gives me an error, but in your case it works.
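A hedged note on the `regex=True` question: the call above passes the whole `df['carModel']` Series as `to_replace`. pandas treats a Series there as dict-like rather than pairing it row by row with `carType`, and recent pandas versions may raise a `ValueError` for a Series whose `replace` gets a dict-like `to_replace` together with a scalar `value`. A sketch that pairs each `carType` with its own `carModel` explicitly, using a plain list comprehension instead of `Series.replace`, could look like this (the six-row frame mirrors the example table above):

```python
import pandas as pd

# Same data as the example table, including the single-letter "P" model
df = pd.DataFrame({
    "carManufacturer": ["VW"] * 6,
    "carModel": ["POLO", "POLO", "POLO", "POLO", "POLO", "P"],
    "carType": ["1.4 TDI", "POLO 1.4 TDI", "1.6 TDI", "1.4",
                "POLO 1.6 TDI", "POLO 1.6 TDI"],
})

# Walk carType and carModel in lockstep so each row uses its own model.
# Plain str.replace is used (no regex), and one-character models are skipped.
df["carType"] = [
    car_type.replace(model, "").strip() if len(model) > 1 else car_type
    for car_type, model in zip(df["carType"], df["carModel"])
]

print(df)
```

Using plain `str.replace` also means a model name containing regex metacharacters such as `.` or `+` is matched literally.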

How did you declare your DataFrame? I ran this test:

>>> import pandas as pd
>>> raw_data = {
...   'carManufacturer': ['VW','VW','VW','VW','VW'],
...   'carModel': ['POLO','POLO','POLO','POLO','POLO'],
...   'carType': ['1.4 TDI', 'POLO 1.4 TDI', '1.6 TDI', '1.4', 'POLO 1.6 TDI']
... }
>>> df = pd.DataFrame(raw_data, columns=["carManufacturer", "carModel", "carType"])
>>> df
  carManufacturer carModel       carType
0              VW     POLO       1.4 TDI
1              VW     POLO  POLO 1.4 TDI
2              VW     POLO       1.6 TDI
3              VW     POLO           1.4
4              VW     POLO  POLO 1.6 TDI

Then I ran:

>>> for row in df.itertuples():
...   if len(row.carModel) > 1:
...      df.iloc[row.Index].carType = row.carType.replace(row.carModel,"")
...

And that is the result:

>>> df
  carManufacturer carModel   carType
0              VW     POLO   1.4 TDI
1              VW     POLO   1.4 TDI
2              VW     POLO   1.6 TDI
3              VW     POLO       1.4
4              VW     POLO   1.6 TDI

It works perfectly.
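Why the same loop can change nothing on another DataFrame: `df.iloc[row.Index].carType = ...` is chained assignment. `df.iloc[row.Index]` first builds a row Series, and setting `.carType` on it only reaches `df` if that Series happens to share memory with `df`. That tends to hold for a frame whose columns are all `object` dtype, like the test frame above, but not for one with mixed dtypes, and never under pandas' Copy-on-Write mode. A sketch that writes the cell on the DataFrame directly, assuming the same column names as in the question, uses `DataFrame.at`. Since `itertuples()` yields the index label as `row.Index`, the label-based `.at` is also the matching accessor, whereas `.iloc` is positional and only agrees with it for a default `RangeIndex`. The extra `.strip()` is an addition here, to drop the space left behind by the removed model name.

```python
import pandas as pd

carsML = pd.DataFrame({
    "carManufacturer": ["VW"] * 5,
    "carModel": ["POLO"] * 5,
    "carType": ["1.4 TDI", "POLO 1.4 TDI", "1.6 TDI", "1.4", "POLO 1.6 TDI"],
})

for row in carsML.itertuples():
    if len(row.carModel) > 1:
        # .at writes one cell on carsML itself, addressed by index label
        # and column name, so no intermediate row Series is involved.
        carsML.at[row.Index, "carType"] = row.carType.replace(row.carModel, "").strip()

print(carsML)
```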

1 Comment

I guess my testing was wrong and my code is ok. Thank you!
