0

I am getting a 'ValueError: invalid literal for int() with base 10: ' when I try to filter the dataframe using multiple column conditions

Here is the code to set up the pandas dataframe. Warning: it'll download 6 mb of data. Can run in Google Colab if too concerned.

Code to import stuff and download the data

#Import stuff
import re
import os
import zipfile
from urllib.request import urlretrieve
from os.path import isfile, isdir
import requests
#Define Download Function
def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

#Download data
download_file_from_google_drive('1sZk3WWgdyHLru7q1KSWQwCT4nwwzHlpY', 'TheAnimeList.csv')

Code to set up the pandas dataframe

download_file_from_google_drive('1sZk3WWgdyHLru7q1KSWQwCT4nwwzHlpY', 'TheAnimeList.csv')

animeuser = pd.read_csv('TheAnimeList.csv' )
animeuser = animeuser[['anime_id','title_english', 'popularity', 'rank']]
animeuser.head()


anime_id    title_english   popularity  rank
0   11013   Inu X Boku Secret Service   231 1274.0
1   2104    My Bride is a Mermaid   366 727.0
2   5262    Shugo Chara!! Doki  1173    1508.0
3   721 Princess Tutu   916 307.0
4   12365   Bakuman.    426 50.0

I am trying to filter rows based on column conditionals. First I tried

animeuser = animeuser[  (animeuser.popularity >= 3000) | (animeuser.rank >= 3000)  ]

But that gave me this error

TypeError                                 Traceback (most recent call last)
<ipython-input-39-8fb6d8508f25> in <module>()
----> 1 animeuser = animeuser[  (animeuser.popularity >= 3000) | (animeuser.rank >= 3000)  ]

TypeError: '>=' not supported between instances of 'method' and 'int'

Then I tried

animeuser =  animeuser[ ( animeuser.astype(int)['popularity'] >= 3000 ) | ( animeuser.astype(int)['rank'] >= 3000 ) ] 

But that gave me this error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-a2ea65786b2a> in <module>()
----> 1 animeuser =  animeuser[ ( animeuser.astype(int)['popularity'] >= 3000 ) | ( animeuser.astype(int)['rank'] >= 3000 ) ]

/usr/local/lib/python3.6/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    116                 else:
    117                     kwargs[new_arg_name] = new_arg_value
--> 118             return func(*args, **kwargs)
    119         return wrapper
    120     return _deprecate_kwarg

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in astype(self, dtype, copy, errors, **kwargs)
   4002         # else, only a single dtype is given
   4003         new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 4004                                      **kwargs)
   4005         return self._constructor(new_data).__finalize__(self)
   4006 

/usr/local/lib/python3.6/dist-packages/pandas/core/internals.py in astype(self, dtype, **kwargs)
   3460 
   3461     def astype(self, dtype, **kwargs):
-> 3462         return self.apply('astype', dtype=dtype, **kwargs)
   3463 
   3464     def convert(self, **kwargs):

/usr/local/lib/python3.6/dist-packages/pandas/core/internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
   3327 
   3328             kwargs['mgr'] = self
-> 3329             applied = getattr(b, f)(**kwargs)
   3330             result_blocks = _extend_blocks(applied, result_blocks)
   3331 

/usr/local/lib/python3.6/dist-packages/pandas/core/internals.py in astype(self, dtype, copy, errors, values, **kwargs)
    542     def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
    543         return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 544                             **kwargs)
    545 
    546     def _astype(self, dtype, copy=False, errors='raise', values=None,

/usr/local/lib/python3.6/dist-packages/pandas/core/internals.py in _astype(self, dtype, copy, errors, values, klass, mgr, **kwargs)
    623 
    624                 # _astype_nansafe works fine with 1-d only
--> 625                 values = astype_nansafe(values.ravel(), dtype, copy=True)
    626                 values = values.reshape(self.shape)
    627 

/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy)
    690     elif arr.dtype == np.object_ and np.issubdtype(dtype.type, np.integer):
    691         # work around NumPy brokenness, #1987
--> 692         return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
    693 
    694     if dtype.name in ("datetime64", "timedelta64"):

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

pandas/_libs/src/util.pxd in util.set_value_at_unsafe()

ValueError: invalid literal for int() with base 10: 'Inu X Boku Secret Service'

The string 'Inu X Boku Secret Service' belongs to the 'title_english' column in the very first row of the dataframe. But the 'rank' and 'popularity' columns see to be float and ints.

I even tried looking at the datatypes

animeuser.dtypes

anime_id           int64
title_english     object
popularity         int64
rank             float64
dtype: object

And everything seems to be in order.

2 Answers 2

1

The first error you are facing is because rank is a method of pandas.DataFrame. Methods have precedence over column access via attribute notation. So in order to access the data you need to use bracket notation: animeuser['rank'].

The second error occurs because you try to represent the whole data frame as int which is not possible for various columns. This would only be possible for the 'rank' and 'popularity' columns.

Sign up to request clarification or add additional context in comments.

Comments

1

With statement

animeuser.astype(int)['popularity']

you trying to convert to int all animeuser columns. And got an error on string column. Try just

animeuser['popularity']

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.