I have a dataframe in which a few of the columns are of object dtype, and I want to convert one of them to an int column so I can do some calculations with it. But whenever I try, I get this error.

Here's the code that gives me the error:

df['Amount in USD']=df['Amount in USD'].str.replace(',', '') #this worked fine

df['Amount in USD']=df['Amount in USD'].astype(int) #but this doesn't

The error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-b9d8d4e75b08> in <module>
----> 1 df['Amount in USD']=df['Amount in USD'].astype(int)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5870         else:
   5871             # else, only a single dtype is given
-> 5872             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5873             return self._constructor(new_data).__finalize__(self, method="astype")
   5874 

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    629         self, dtype, copy: bool = False, errors: str = "raise"
    630     ) -> "BlockManager":
--> 631         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    632 
    633     def convert(

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
    425                     applied = b.apply(f, **kwargs)
    426                 else:
--> 427                     applied = getattr(b, f)(**kwargs)
    428             except (TypeError, NotImplementedError):
    429                 if not ignore_failures:

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    671             vals1d = values.ravel()
    672             try:
--> 673                 values = astype_nansafe(vals1d, dtype, copy=True)
    674             except (ValueError, TypeError):
    675                 # e.g. astype_nansafe can fail on object-dtype of strings

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
   1072         # work around NumPy brokenness, #1987
   1073         if np.issubdtype(dtype.type, np.integer):
-> 1074             return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
   1075 
   1076         # if we have a datetime/timedelta array of objects

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: 'undisclosed'

Info about the dataframe:

 0   Sr No              3044 non-null   int64 
 1   Date dd/mm/yyyy    3044 non-null   object
 2   Startup Name       3044 non-null   object
 3   Industry Vertical  2873 non-null   object
 4   SubVertical        2108 non-null   object
 5   City  Location     2864 non-null   object
 6   Investors Name     3020 non-null   object
 7   InvestmentnType    3040 non-null   object
 8   Amount in USD      2084 non-null   object
 9   Remarks            419 non-null    object

Here is a sample of my dataframe:

|   | Sr No | Date dd/mm/yyyy | Startup Name | Industry Vertical | SubVertical | City Location | Investors Name | InvestmentnType | Amount in USD | Remarks |
|---|-------|-----------------|--------------|-------------------|-------------|---------------|----------------|-----------------|---------------|---------|
| 0 | 1 | 09/01/2020 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity Round | 20,00,00,000 | NaN |
| 1 | 2 | 13/01/2020 | Shuttl | Transportation | App based shuttle service | Gurgaon | Susquehanna Growth Equity | Series C | 80,48,394 | NaN |
| 2 | 3 | 09/01/2020 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 1,83,58,860 | NaN |
| 3 | 4 | 02/01/2020 | https://www.wealthbucket.in/ | FinTech | Online Investment | New Delhi | Vinod Khatumal | Pre-series A | 30,00,000 | NaN |
  • Info on your whole dataframe is not useful here; please post info regarding your Amount in USD column. From what you show, it certainly seems that it contains non-numeric entries (such as 'undisclosed'), hence the expected error. Commented Feb 7, 2021 at 8:46

1 Answer

Your df['Amount in USD'] column contains the non-numeric string 'undisclosed', which cannot be converted to int as it stands.

You need to map the non-numeric string values to numeric ones yourself, e.g.:

df['Amount in USD'] = df['Amount in USD'].replace('undisclosed', '-1')
df['Amount in USD'] = df['Amount in USD'].astype(int)

I am assuming here that there are no '-1' values in your df['Amount in USD'] column already. You can check the unique values of that column like so:

df['Amount in USD'].unique()

Feel free to add those contents to your question so I can assist you further.
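
If the list of unique values is long, one way to narrow it down is to let pandas attempt the conversion and report only the entries that fail. This is a minimal sketch under assumptions: the small dataframe below is a made-up stand-in for the real data, and `cleaned`, `as_number` and `bad_values` are illustrative names.

```python
import pandas as pd

# Hypothetical sample standing in for the real column (assumption).
df = pd.DataFrame(
    {"Amount in USD": ["20,00,00,000", "80,48,394", "undisclosed", None]}
)

# Strip the thousands separators, as in the question.
cleaned = df["Amount in USD"].str.replace(",", "", regex=False)

# errors="coerce" turns anything unparseable into NaN instead of raising.
as_number = pd.to_numeric(cleaned, errors="coerce")

# Entries that were present in the data but could not be parsed as numbers.
bad_values = cleaned[as_number.isna() & cleaned.notna()].unique()
print(bad_values)
```

Whatever shows up in `bad_values` is what needs an explicit mapping before `astype(int)` can succeed.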


EDIT Bonus:

Depending on what calculations you want to perform on that column, you need to choose the replacement integers carefully. There are several good guides available online.

Make sure the choice also fits your domain, which looks like finance to me.
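
One thing to watch for: the info output in the question shows only 2084 non-null values out of 3044 for this column, so it has missing values as well, and a plain astype(int) cannot represent those. A placeholder such as -1 would also distort sums and means. A possible alternative, sketched here on a made-up sample and not the approach from the answer above, is to turn unparseable entries into missing values and use pandas' nullable Int64 dtype:

```python
import pandas as pd

# Hypothetical sample standing in for the real column (assumption).
df = pd.DataFrame(
    {"Amount in USD": ["20,00,00,000", "80,48,394", "undisclosed", None]}
)

# Remove the thousands separators, then coerce non-numeric text to NaN.
cleaned = df["Amount in USD"].str.replace(",", "", regex=False)
numeric = pd.to_numeric(cleaned, errors="coerce")

# "Int64" (capital I) is the nullable integer dtype; NaN becomes <NA>.
df["Amount in USD"] = numeric.astype("Int64")

# Aggregations skip missing values by default.
total = df["Amount in USD"].sum()
print(df["Amount in USD"].dtype, total)
```

With this, 'undisclosed' and empty cells are both treated as missing instead of as a number, which may or may not be what your calculation needs.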

1 Comment

Thanks a lot, looking forward to getting more help from you.
