Pandas error when filtering rows based on multiple column conditionals, "ValueError: invalid literal for int() with base 10: "

Question

I am getting a 'ValueError: invalid literal for int() with base 10: ' when I try to filter the dataframe using multiple column conditions

Here is the code to set up the pandas dataframe. Warning: it'll download 6 mb of data. Can run in Google Colab if too concerned.

Code to import stuff and download the data

#Import stuff
import re
import os
import zipfile
from urllib.request import urlretrieve
from os.path import isfile, isdir
import requests
#Define Download Function
def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

#Download data
download_file_from_google_drive('1sZk3WWgdyHLru7q1KSWQwCT4nwwzHlpY', 'TheAnimeList.csv')

Code to set up the pandas dataframe

download_file_from_google_drive('1sZk3WWgdyHLru7q1KSWQwCT4nwwzHlpY', 'TheAnimeList.csv')

animeuser = pd.read_csv('TheAnimeList.csv' )
animeuser = animeuser[['anime_id','title_english', 'popularity', 'rank']]
animeuser.head()


anime_id    title_english   popularity  rank
0   11013   Inu X Boku Secret Service   231 1274.0
1   2104    My Bride is a Mermaid   366 727.0
2   5262    Shugo Chara!! Doki  1173    1508.0
3   721 Princess Tutu   916 307.0
4   12365   Bakuman.    426 50.0

I am trying to filter rows based on column conditionals. First I tried

animeuser = animeuser[  (animeuser.popularity >= 3000) | (animeuser.rank >= 3000)  ]

But that gave me this error

TypeError                                 Traceback (most recent call last)
<ipython-input-39-8fb6d8508f25> in <module>()
----> 1 animeuser = animeuser[  (animeuser.popularity >= 3000) | (animeuser.rank >= 3000)  ]

TypeError: '>=' not supported between instances of 'method' and 'int'

Then I tried

animeuser =  animeuser[ ( animeuser.astype(int)['popularity'] >= 3000 ) | ( animeuser.astype(int)['rank'] >= 3000 ) ]

But that gave me this error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-a2ea65786b2a> in <module>()
----> 1 animeuser =  animeuser[ ( animeuser.astype(int)['popularity'] >= 3000 ) | ( animeuser.astype(int)['rank'] >= 3000 ) ]

/usr/local/lib/python3.6/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    116                 else:
    117                     kwargs[new_arg_name] = new_arg_value
--> 118             return func(*args, **kwargs)
    119         return wrapper
    120     return _deprecate_kwarg

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in astype(self, dtype, copy, errors, **kwargs)
   4002         # else, only a single dtype is given
   4003         new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 4004                                      **kwargs)
   4005         return self._constructor(new_data).__finalize__(self)
   4006 

/usr/local/lib/python3.6/dist-packages/pandas/core/internals.py in astype(self, dtype, **kwargs)
   3460 
   3461     def astype(self, dtype, **kwargs):
-> 3462         return self.apply('astype', dtype=dtype, **kwargs)
   3463 
   3464     def convert(self, **kwargs):

/usr/local/lib/python3.6/dist-packages/pandas/core/internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
   3327 
   3328             kwargs['mgr'] = self
-> 3329             applied = getattr(b, f)(**kwargs)
   3330             result_blocks = _extend_blocks(applied, result_blocks)
   3331 

/usr/local/lib/python3.6/dist-packages/pandas/core/internals.py in astype(self, dtype, copy, errors, values, **kwargs)
    542     def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
    543         return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 544                             **kwargs)
    545 
    546     def _astype(self, dtype, copy=False, errors='raise', values=None,

/usr/local/lib/python3.6/dist-packages/pandas/core/internals.py in _astype(self, dtype, copy, errors, values, klass, mgr, **kwargs)
    623 
    624                 # _astype_nansafe works fine with 1-d only
--> 625                 values = astype_nansafe(values.ravel(), dtype, copy=True)
    626                 values = values.reshape(self.shape)
    627 

/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy)
    690     elif arr.dtype == np.object_ and np.issubdtype(dtype.type, np.integer):
    691         # work around NumPy brokenness, #1987
--> 692         return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
    693 
    694     if dtype.name in ("datetime64", "timedelta64"):

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

pandas/_libs/src/util.pxd in util.set_value_at_unsafe()

ValueError: invalid literal for int() with base 10: 'Inu X Boku Secret Service'

The string 'Inu X Boku Secret Service' belongs to the 'title_english' column in the very first row of the dataframe. But the 'rank' and 'popularity' columns see to be float and ints.

I even tried looking at the datatypes

animeuser.dtypes

anime_id           int64
title_english     object
popularity         int64
rank             float64
dtype: object

And everything seems to be in order.

a_guest · Accepted Answer · 2018-09-26 06:17:39Z

1

The first error you are facing is because rank is a method of pandas.DataFrame. Methods have precedence over column access via attribute notation. So in order to access the data you need to use bracket notation: animeuser['rank'].

The second error occurs because you try to represent the whole data frame as int which is not possible for various columns. This would only be possible for the 'rank' and 'popularity' columns.

answered Sep 26, 2018 at 6:17

a_guest

36.7k15 gold badges75 silver badges137 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

kvorobiev · Accepted Answer · 2018-09-26 06:16:57Z

1

With statement

animeuser.astype(int)['popularity']

you trying to convert to int all animeuser columns. And got an error on string column. Try just

animeuser['popularity']

answered Sep 26, 2018 at 6:16

kvorobiev

5,0804 gold badges32 silver badges36 bronze badges

Collectives™ on Stack Overflow

Pandas error when filtering rows based on multiple column conditionals, "ValueError: invalid literal for int() with base 10: "

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related