I have a dataframe in which a few of the columns have dtype object, and I want to convert one of them to an int column so I can do some calculations with it. But whenever I try to do so, I get the error below.
Here is the code that gives me the error:
df['Amount in USD']=df['Amount in USD'].str.replace(',', '') #this worked fine
df['Amount in USD']=df['Amount in USD'].astype(int) #but this doesn't
The error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-b9d8d4e75b08> in <module>
----> 1 df['Amount in USD']=df['Amount in USD'].astype(int)
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
5870 else:
5871 # else, only a single dtype is given
-> 5872 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
5873 return self._constructor(new_data).__finalize__(self, method="astype")
5874
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
629 self, dtype, copy: bool = False, errors: str = "raise"
630 ) -> "BlockManager":
--> 631 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
632
633 def convert(
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
425 applied = b.apply(f, **kwargs)
426 else:
--> 427 applied = getattr(b, f)(**kwargs)
428 except (TypeError, NotImplementedError):
429 if not ignore_failures:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
671 vals1d = values.ravel()
672 try:
--> 673 values = astype_nansafe(vals1d, dtype, copy=True)
674 except (ValueError, TypeError):
675 # e.g. astype_nansafe can fail on object-dtype of strings
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
1072 # work around NumPy brokenness, #1987
1073 if np.issubdtype(dtype.type, np.integer):
-> 1074 return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
1075
1076 # if we have a datetime/timedelta array of objects
pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()
ValueError: invalid literal for int() with base 10: 'undisclosed'
Info about the dataframe:
0 Sr No 3044 non-null int64
1 Date dd/mm/yyyy 3044 non-null object
2 Startup Name 3044 non-null object
3 Industry Vertical 2873 non-null object
4 SubVertical 2108 non-null object
5 City Location 2864 non-null object
6 Investors Name 3020 non-null object
7 InvestmentnType 3040 non-null object
8 Amount in USD 2084 non-null object
9 Remarks 419 non-null object
Here is a sample of my dataframe:
  | Sr No | Date dd/mm/yyyy | Startup Name | Industry Vertical | SubVertical | City Location | Investors Name | InvestmentnType | Amount in USD | Remarks
0 | 1 | 09/01/2020 | BYJU’S | E-Tech | E-learning | Bengaluru | Tiger Global Management | Private Equity Round | 20,00,00,000 | NaN
1 | 2 | 13/01/2020 | Shuttl | Transportation | App based shuttle service | Gurgaon | Susquehanna Growth Equity | Series C | 80,48,394 | NaN
2 | 3 | 09/01/2020 | Mamaearth | E-commerce | Retailer of baby and toddler products | Bengaluru | Sequoia Capital India | Series B | 1,83,58,860 | NaN
3 | 4 | 02/01/2020 | https://www.wealthbucket.in/ | FinTech | Online Investment | New Delhi | Vinod Khatumal | Pre-series A | 30,00,000 | NaN
Regarding the Amount in USD column: from what you show, it would certainly seem that it contains non-numeric entries (such as 'undisclosed'), hence the expected error.
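One possible way to handle this, as a minimal sketch rather than the definitive fix: list the entries that cannot be parsed, then coerce them to missing values with pd.to_numeric(..., errors="coerce") and cast to pandas' nullable Int64 dtype. The small dataframe below is made up for illustration (only the column name and the values are taken from the question), and treating 'undisclosed' as missing is an assumption about what you want.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe; values mimic the question.
df = pd.DataFrame(
    {"Amount in USD": ["20,00,00,000", "80,48,394", "undisclosed", None]}
)

# Strip the thousands separators (plain string replacement, not a regex).
cleaned = df["Amount in USD"].str.replace(",", "", regex=False)

# Entries that are present but cannot be parsed as numbers.
parsed = pd.to_numeric(cleaned, errors="coerce")
bad = cleaned[parsed.isna() & cleaned.notna()]
print(bad.unique())

# Unparseable entries become missing; "Int64" (capital I) is the nullable
# integer dtype, which can hold missing values unlike plain int.
df["Amount in USD"] = parsed.astype("Int64")
print(df["Amount in USD"])
```

A plain astype(int) would still fail here, because the column also has genuinely missing values (2084 non-null out of 3044 rows), and the ordinary int dtype cannot represent them. If you would rather drop those rows, calling dropna on the column before casting is another option.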