4

I want to use python pandas to drop rows in a spreadsheet that do not contain "CAT" within the "Equipment" column. Here is what I have so far:

import pandas as pd
import openpyxl
import warnings
import xlrd

warnings.filterwarnings("ignore")

file_name = input("Enter File Name: ")
df = pd.read_csv(file_name)


for i in range(len(df["Equipment"])):
    if "CAT" in df["Equipment"][i]:
        print (df["Equipment"][i])
    else:
        df.drop([i])

df.to_excel("new_mp.xlsx")

The script prints out the correct values in the terminal but does not drop the rows in the new spreadsheet "new_mp.xlsx". Can someone point me in the right direction?

Example Data:

Equipment
CAT 259B3 1818 OE Skid Steer 2011 CAT
T-14 Towmaster Trailer 3124 OE Trailer 2008
CAT 938M Loader RPO RENTAL 2017 CAT 938M
Rental Water Truck 55571 Rental Water Truck international water truck

2 Answers 2

9

You don't need a loop here; you can do this with str.contains:

v = df[df["Equipment"].str.contains('CAT')]
print(v)
                                  Equipment
0     CAT 259B3 1818 OE Skid Steer 2011 CAT
2  CAT 938M Loader RPO RENTAL 2017 CAT 938M

Or, use query with engine='python':

v = df.query('Equipment.str.contains("CAT")', engine='python')
print(v)
                                  Equipment
0     CAT 259B3 1818 OE Skid Steer 2011 CAT
2  CAT 938M Loader RPO RENTAL 2017 CAT 938M

Finally, write to disk:

v.to_excel("new_mp.xlsx")
Sign up to request clarification or add additional context in comments.

Comments

4

You can use this to filter the rows.

df[df['Equipment'].apply(lambda x: 'CAT' in x)].to_excel("new_mp.xlsx")

Edit:

Alright, let me explain this inside out :

lambda x: 'CAT' in x takes as input x and returns True if 'CAT' is in x.

Now the df['Equipment'].apply method will invoke the lambda function mentioned above on every value of the series. The result will be a boolean array which is then passed on to the df as boolean mask.

Hope I didn't overdo it.

3 Comments

I don't believe this accounts for when 'CAT' is contained in the cell value, i.e., it only considers when the entire cell value equals 'CAT'; can you modify your answer?
Yea, missed the contains part. I've updated the answer.
This worked! would you mind giving me a brief explanation as I am trying to learn more and use it on some other projects

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.