I want to delete any rows containing a specific string in a DataFrame.

I want to delete rows with an abnormal email address (one containing .jpg).

Here's my code, what's wrong with it?

import pandas as pd

df = pd.DataFrame({'email':['[email protected]', '[email protected]', '[email protected]', '[email protected]']})

df

             email
0    [email protected]
1    [email protected]
2       [email protected]
3  [email protected]

for i, r in df.iterrows():
    if df.loc[i,'email'][-3:] == 'com':
        df.drop(df.index[i], inplace=True) 

Traceback (most recent call last):

  File "<ipython-input-84-4f12d22e5e4c>", line 2, in <module>
    if df.loc[i,'email'][-3:] == 'com':

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1472, in __getitem__
    return self._getitem_tuple(key)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 870, in _getitem_tuple
    return self._getitem_lowerdim(tup)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 998, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1911, in _getitem_axis
    self._validate_key(key, axis)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1798, in _validate_key
    error()

  File "C:\Anaconda\lib\site-packages\pandas\core\indexing.py", line 1785, in error
    axis=self.obj._get_axis_name(axis)))

KeyError: 'the label [2] is not in the [index]'

1 Answer

IIUC, you can filter with a boolean mask rather than iterating through your frame with iterrows:

df = df[df.email.str.endswith('.com')]

which returns:

>>> df
             email
0    [email protected]
1    [email protected]
3  [email protected]
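Since the question's stated goal is to drop the .jpg rows (rather than keep the .com ones), the same idea works with the mask inverted by ~. A minimal sketch, assuming hypothetical placeholder addresses because the real ones are obfuscated in the question; str.contains with regex=False is shown as the "contains anywhere" variant:

```python
import pandas as pd

# Hypothetical sample data: the question's real addresses are obfuscated
df = pd.DataFrame({'email': ['alice@example.com', 'bob@example.com',
                             'carol@example.jpg', 'dave@example.com']})

# Boolean mask: True where the address ends in '.jpg'
ends_jpg = df['email'].str.endswith('.jpg')

# Boolean mask: True where '.jpg' appears anywhere in the address;
# regex=False makes '.' a literal dot instead of "any character"
has_jpg = df['email'].str.contains('.jpg', regex=False)

# ~ inverts the mask, so only the non-.jpg rows are kept
cleaned = df[~ends_jpg]
print(cleaned)
```

The original index labels (0, 1, 3) are preserved; call reset_index(drop=True) afterwards if consecutive labels are wanted.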

Or, for larger DataFrames, it is sometimes faster to skip the .str methods provided by pandas and use a plain list comprehension with Python's built-in string methods:

df = df[[i.endswith('.com') for i in df.email]]
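One practical difference between the two versions, assuming the column may contain missing values: NaN is a float with no endswith method, so the plain list comprehension raises AttributeError, while the .str accessor returns NaN for those rows and na=False turns that into False. A sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing address
df = pd.DataFrame({'email': ['alice@example.com', np.nan, 'carol@example.jpg']})

# .str.endswith would give NaN for the missing entry; na=False maps it to False
mask = df['email'].str.endswith('.com', na=False)
print(df[mask])

# The list comprehension has to skip non-string values itself
mask_lc = [isinstance(e, str) and e.endswith('.com') for e in df['email']]
print(df[mask_lc])
```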

2 Comments

Thank you very much! I will try this method. Anyway, what's the problem with my code?
Besides the fact that iterrows is kind of slow and clunky, not much. It would work if you replaced == with != and df.drop(df.index[i], inplace=True) with df.drop(i, inplace=True).
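Why the original loop raises KeyError: i from iterrows is a row label, but df.index[i] looks a label up by position. iterrows keeps yielding the labels the frame had when iteration started, so after label 0 is dropped, df.index[1] is label 2; that row is dropped as well, and the next lookup, df.loc[2, 'email'], fails. A sketch of the loop with the comment's two changes applied, using hypothetical addresses:

```python
import pandas as pd

# Hypothetical addresses: the question's real ones are obfuscated
df = pd.DataFrame({'email': ['alice@example.com', 'bob@example.com',
                             'carol@example.jpg', 'dave@example.com']})

for i, r in df.iterrows():
    # i is the row label, so look the value up by label with .loc
    if df.loc[i, 'email'][-3:] != 'com':
        # Drop by label i, not by position via df.index[i]
        df.drop(i, inplace=True)

print(df)
```

Mutating a frame while iterating over it stays fragile, which is one more reason to prefer the boolean mask from the answer.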
