I'm running pandas 1.5.3 and have also tried this on pandas 2.2.1.
I am loading data from a pipe-delimited CSV that looks like this:
888|0|TEST ACCOUNT
888|1|Sample Ship-to
802001|0|COMPANY 1
802001|1|COMPANY 1 INC
802001|2|COMPANY 1 BALL
K802001|3|COMPANY 1
The columns are CUSNO, S2, and NAME, in that order.
I have a script that loads the data, then checks that the first column has dtype int64 in the resulting DataFrame. If it doesn't, the script is supposed to convert the column to numeric and drop the rows that contain NaN.
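For reference, the load-and-check step can be reproduced roughly like this. It is only a minimal sketch: the inline `io.StringIO` buffer stands in for my real CSV file, and the names `raw` and `needs_cleanup` are made up for illustration.

```python
import io

import pandas as pd

# Inline stand-in for the real file (pipe-delimited, no header row).
raw = io.StringIO(
    "888|0|TEST ACCOUNT\n"
    "888|1|Sample Ship-to\n"
    "802001|0|COMPANY 1\n"
    "802001|1|COMPANY 1 INC\n"
    "802001|2|COMPANY 1 BALL\n"
    "K802001|3|COMPANY 1\n"
)
cl = pd.read_csv(raw, sep="|", header=None, names=["CUSNO", "S2", "NAME"])

# The stray "K802001" prevents CUSNO from being parsed as an integer column,
# so the check below flags the column for cleanup.
needs_cleanup = not pd.api.types.is_integer_dtype(cl["CUSNO"])
print(cl["CUSNO"].dtype, needs_cleanup)
```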
So, before:
CUSNO S2 NAME
0 888 0 TEST ACCOUNT
1 888 1 Sample Ship-to
2 802001 0 COMPANY 1
3 802001 1 COMPANY 1 INC
4 802001 2 COMPANY 1 BALL
5 K802001 3 COMPANY 1
Then I run:
cl['CUSNO'] = pd.to_numeric(cl.CUSNO, errors='coerce')
cl = cl.dropna(axis='index', how='any')
After:
CUSNO S2 NAME
0 888.0 0 TEST ACCOUNT
1 888.0 1 Sample Ship-to
2 802001.0 0 COMPANY 1
3 802001.0 1 COMPANY 1 INC
4 802001.0 2 COMPANY 1 BALL
I want CUSNO to be a column of int64 (or a similar integer type), but when I run cl['CUSNO'].dtype it keeps returning float64. (Really, I just want to get rid of the trailing .0 on every entry in CUSNO, and I thought casting to int or similar would be the best way to do that.)
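To make the goal concrete, here is a minimal sketch of the result I'm after. It assumes the NaN rows have to be dropped before the cast, since a NumPy integer column cannot hold NaN, and that the cast is applied to the whole frame in one step rather than assigned back into a column. The small inline frame is illustrative data, not my real file.

```python
import pandas as pd

# Illustrative frame with CUSNO read as strings, including the bad "K802001" row.
cl = pd.DataFrame(
    {
        "CUSNO": ["888", "888", "802001", "802001", "802001", "K802001"],
        "S2": [0, 1, 0, 1, 2, 3],
        "NAME": [
            "TEST ACCOUNT",
            "Sample Ship-to",
            "COMPANY 1",
            "COMPANY 1 INC",
            "COMPANY 1 BALL",
            "COMPANY 1",
        ],
    }
)

# Unparseable values become NaN, which forces the column to float64.
cl["CUSNO"] = pd.to_numeric(cl["CUSNO"], errors="coerce")

# Drop the NaN rows first, then cast the surviving values to int64.
cl = cl.dropna(subset=["CUSNO"]).astype({"CUSNO": "int64"})

print(cl["CUSNO"].dtype)
print(cl)
```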
I've tried a number of solutions, namely:
cl['CUSNO'] = pd.to_numeric(cl.CUSNO, errors='coerce').dropna().astype(int) # replacing the earlier line 1 of the script
cl['CUSNO'] = cl.astype({'CUSNO': 'int'})
cl['CUSNO'] = cl['CUSNO'].apply(pd.to_numeric, errors='coerce')
I've also tried passing inplace=True to dropna (line 2 of the script above), as well as the solutions from these questions: pandas: to_numeric for multiple columns; Change column type in pandas; and Python - pandas column type casting with "astype" is not working.
Perhaps I'm missing something here? Do I have to copy the new DataFrame to a new variable or something?