pandas: converting dataframe column to int following dataframe manipulation [duplicate]

Question

Running pandas 1.5.3. Also attempted on pandas 2.2.1.

I am loading data from a pipe-delimited CSV that looks like this:

888|0|TEST ACCOUNT
888|1|Sample Ship-to
802001|0|COMPANY 1
802001|1|COMPANY 1 INC
802001|2|COMPANY 1 BALL
K802001|3|COMPANY 1

With columns CUSNO, S2, and NAME, in that order.

I have a script that loads the data, then checks that the first column has dtype int64 in the resulting DataFrame. If not, the script is supposed to convert the column to numeric and drop the rows that contain NaN.

So, before:

     CUSNO  S2            NAME
0      888   0    TEST ACCOUNT
1      888   1  Sample Ship-to
2   802001   0       COMPANY 1
3   802001   1   COMPANY 1 INC
4   802001   2  COMPANY 1 BALL
5  K802001   3       COMPANY 1

Then run:

cl['CUSNO'] = pd.to_numeric(cl.CUSNO, errors='coerce')
cl = cl.dropna(axis='index', how='any')

After:

      CUSNO  S2            NAME
0     888.0   0    TEST ACCOUNT
1     888.0   1  Sample Ship-to
2  802001.0   0       COMPANY 1
3  802001.0   1   COMPANY 1 INC
4  802001.0   2  COMPANY 1 BALL

I want CUSNO to be a column of int64 or a similar integer type, but when I run cl['CUSNO'].dtype it keeps returning float64. (Realistically, I want to get rid of the decimal point at the end of every entry in CUSNO and thought casting to int or similar would work best.)

I've tried a number of solutions, namely:

cl['CUSNO'] = pd.to_numeric(cl.CUSNO, errors='coerce').dropna().astype(int) # replacing the earlier line 1 of the script
cl['CUSNO'] = cl.astype({'CUSNO': 'int'})
cl['CUSNO'] = cl['CUSNO'].apply(pd.to_numeric, errors='coerce')

I've tried inplace=True for line 2 in the script above. I've also tried solutions from these questions: "pandas: to_numeric for multiple columns", "Change column type in pandas", and "Python - pandas column type casting with "astype" is not working".

Perhaps I'm missing something here? Do I have to copy the new DataFrame to a new variable or something?

Andrej Kesely · Accepted Answer · 2024-03-21 23:21:28Z

1

I think it's simple. After dropping the NaNs, cast the column:

df["CUSNO"] = df["CUSNO"].astype(int)
print(df)

Prints:

    CUSNO  S2            NAME
0     888   0    TEST ACCOUNT
1     888   1  Sample Ship-to
2  802001   0       COMPANY 1
3  802001   1   COMPANY 1 INC
4  802001   2  COMPANY 1 BALL
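
For reference, the whole sequence from the question can be sketched end to end as follows. This is a minimal sketch, not the asker's actual script: it assumes the file has no header row and is read with sep="|", it loads the sample rows from an in-memory string rather than a file, and it drops rows only where CUSNO is missing (subset=["CUSNO"]) instead of using how='any' across all columns.

```python
import io

import pandas as pd

# Sample rows copied from the question; pipe-delimited, assumed to have no header.
raw = """888|0|TEST ACCOUNT
888|1|Sample Ship-to
802001|0|COMPANY 1
802001|1|COMPANY 1 INC
802001|2|COMPANY 1 BALL
K802001|3|COMPANY 1
"""

cl = pd.read_csv(
    io.StringIO(raw), sep="|", header=None, names=["CUSNO", "S2", "NAME"]
)

# "K802001" cannot be parsed, so it becomes NaN and the column is float64 here.
cl["CUSNO"] = pd.to_numeric(cl["CUSNO"], errors="coerce")

# Drop the rows where CUSNO is NaN, then cast the column back to integers.
cl = cl.dropna(subset=["CUSNO"]).astype({"CUSNO": "int64"})

print(cl.dtypes)
print(cl)
```

The order matters: the cast to an integer dtype has to come after the NaN rows are gone.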


1 Comment

gregCubed Over a year ago

Much appreciated--I think I had tried this earlier before running the dropna and it kept throwing errors. Got it working now.

e-motta · Accepted Answer · 2024-03-21 23:22:00Z

1

When you use pd.to_numeric with errors='coerce', any value it cannot parse (here K802001) becomes NaN. NaN is a float, so the entire column's dtype becomes float64.

After you remove the NaN, you can run this and the column will be converted back to int:

cl["CUSNO"] = cl["CUSNO"].astype(int)

    CUSNO  Index     Description
0     888      0    TEST ACCOUNT
1     888      1  Sample Ship-to
2  802001      0       COMPANY 1
3  802001      1   COMPANY 1 INC
4  802001      2  COMPANY 1 BALL
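
The behaviour described above can be checked on a small Series. This is a hedged sketch using made-up variable names (s, clean) and only three of the values from the question; it also shows that casting while the NaN is still present raises an error, which is likely what the asker ran into when calling astype before dropna.

```python
import pandas as pd

# One unparseable value is enough to turn the whole result into float64.
s = pd.to_numeric(pd.Series(["888", "802001", "K802001"]), errors="coerce")
print(s.dtype)

# Casting to a NumPy integer dtype fails while NaN is still in the data.
cast_failed = False
try:
    s.astype("int64")
except ValueError as exc:
    cast_failed = True
    print("cast failed:", exc)

# After dropping the NaN, the same cast succeeds.
clean = s.dropna().astype("int64")
print(clean.dtype)
```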


Panda Kim · Accepted Answer · 2024-03-21 23:23:50Z

1

Code

If you want an integer dtype that can hold missing values, use the following code:

cl['CUSNO'] = pd.to_numeric(cl['CUSNO'], errors='coerce').astype('Int64')
cl = cl.dropna()

astype('Int64') (note the capital I) uses pandas' nullable integer dtype, which can hold integers alongside missing values.
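
A small sketch of the nullable dtype in isolation, using a two-value Series invented for illustration rather than the asker's DataFrame:

```python
import pandas as pd

# Coerce first (giving float64 with NaN), then switch to the nullable integer dtype.
s = pd.to_numeric(pd.Series(["888", "K802001"]), errors="coerce").astype("Int64")

print(s.dtype)            # the nullable integer dtype, spelled with a capital I
print(s.isna().tolist())  # the unparseable entry is kept as a missing value

# Dropping the missing entry leaves an integer column with no decimal point.
print(s.dropna())
```

With this approach the column never has to pass through a state where the cast fails, so the dropna can come before or after the astype call.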
