
Running pandas 1.5.3. Also attempted on pandas 2.2.1.

I am loading data from a pipe-delimited CSV that looks like this:

888|0|TEST ACCOUNT
888|1|Sample Ship-to
802001|0|COMPANY 1
802001|1|COMPANY 1 INC
802001|2|COMPANY 1 BALL
K802001|3|COMPANY 1

With columns CUSNO, S2, and NAME, in that order.

I have a script that loads the data, then checks whether the first column of the resulting DataFrame has dtype int64. If it does not, the script is supposed to convert the column to numeric and drop the rows that contain NaN.
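A minimal sketch of that load-and-check step, under some assumptions not stated in the question: the file has no header row, `io.StringIO` stands in for the real file, and `needs_cleanup` is a hypothetical name used only for illustration.

```python
import io

import pandas as pd
from pandas.api.types import is_integer_dtype

# Stand-in for the real file (assumption: pipe-delimited, no header row).
raw = """888|0|TEST ACCOUNT
888|1|Sample Ship-to
802001|0|COMPANY 1
802001|1|COMPANY 1 INC
802001|2|COMPANY 1 BALL
K802001|3|COMPANY 1
"""

cl = pd.read_csv(
    io.StringIO(raw),
    sep="|",
    header=None,
    names=["CUSNO", "S2", "NAME"],
)

# The stray "K802001" prevents CUSNO from being parsed as integers,
# so the column is not an integer dtype and needs the cleanup step.
needs_cleanup = not is_integer_dtype(cl["CUSNO"])
print(cl["CUSNO"].dtype, needs_cleanup)
```

With the sample rows above, `S2` parses as an integer column while `CUSNO` does not, because a single non-numeric value is enough to change how the whole column is read.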

So, before:

     CUSNO  S2            NAME
0      888   0    TEST ACCOUNT
1      888   1  Sample Ship-to
2   802001   0       COMPANY 1
3   802001   1   COMPANY 1 INC
4   802001   2  COMPANY 1 BALL
5  K802001   3       COMPANY 1

Then run:

cl['CUSNO'] = pd.to_numeric(cl.CUSNO, errors='coerce')
cl = cl.dropna(axis='index', how='any')

After:

      CUSNO  S2            NAME
0     888.0   0    TEST ACCOUNT
1     888.0   1  Sample Ship-to
2  802001.0   0       COMPANY 1
3  802001.0   1   COMPANY 1 INC
4  802001.0   2  COMPANY 1 BALL

I want CUSNO to be an int64 (or similar) column, but when I run cl['CUSNO'].dtype it keeps returning float64. (Realistically, I want to get rid of the decimal point at the end of every entry in CUSNO and thought casting to int or similar would work best.)

I've tried a number of solutions, namely:

cl['CUSNO'] = pd.to_numeric(cl.CUSNO, errors='coerce').dropna().astype(int) # replacing the earlier line 1 of the script
cl['CUSNO'] = cl.astype({'CUSNO': 'int'})
cl['CUSNO'] = cl['CUSNO'].apply(pd.to_numeric, errors='coerce')

I've tried inplace=True for line 2 in the script above. I've also tried solutions from pandas: to_numeric for multiple columns, Change column type in pandas, and Python - pandas column type casting with "astype" is not working.

Perhaps I'm missing something here? Do I have to copy the new DataFrame to a new variable or something?


3 Answers


I think it's simple (after dropping the NaNs):

df["CUSNO"] = df["CUSNO"].astype(int)
print(df)

Prints:

    CUSNO  S2            NAME
0     888   0    TEST ACCOUNT
1     888   1  Sample Ship-to
2  802001   0       COMPANY 1
3  802001   1   COMPANY 1 INC
4  802001   2  COMPANY 1 BALL

1 Comment

Much appreciated--I think I had tried this earlier before running the dropna and it kept throwing errors. Got it working now.

When you use pd.to_numeric with errors='coerce', the value that cannot be parsed becomes NaN, which makes the entire column's dtype float64.

After you remove the NaN, you can run this and the column will be converted back to int:

cl["CUSNO"] = cl["CUSNO"].astype(int)

Prints:

    CUSNO  Index     Description
0     888      0    TEST ACCOUNT
1     888      1  Sample Ship-to
2  802001      0       COMPANY 1
3  802001      1   COMPANY 1 INC
4  802001      2  COMPANY 1 BALL
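A small sketch of why the order of operations matters, using a cut-down three-row frame rather than the asker's full data (the `realigned` copy and the `dtype_after_*` names are only for illustration). It also shows why assigning `...dropna().astype(int)` back to the column does not help: pandas aligns the shorter Series on the index, so the dropped row gets NaN again and the column is upcast to float64.

```python
import pandas as pd

cl = pd.DataFrame({"CUSNO": ["888", "802001", "K802001"], "S2": [0, 1, 3]})

# Coercing turns "K802001" into NaN, so the column becomes float64.
cl["CUSNO"] = pd.to_numeric(cl["CUSNO"], errors="coerce")
dtype_after_coerce = cl["CUSNO"].dtype

# Dropping NaN on the Series and assigning it back realigns on the index:
# row 2 is missing from the Series, so it is filled with NaN again.
realigned = cl.copy()
realigned["CUSNO"] = realigned["CUSNO"].dropna().astype("int64")
dtype_after_realign = realigned["CUSNO"].dtype

# Dropping the rows from the DataFrame first, then casting, gives int64.
cl = cl.dropna(subset=["CUSNO"]).astype({"CUSNO": "int64"})

print(dtype_after_coerce, dtype_after_realign, cl["CUSNO"].dtype)
```

Using `subset=["CUSNO"]` limits the drop to rows where the coerced column is missing, which may or may not be what is wanted if other columns can also contain NaN.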


If you want an integer dtype that can still hold NaN, use the following code:

cl['CUSNO'] = pd.to_numeric(cl['CUSNO'], errors='coerce').astype('Int64')
cl = cl.dropna()

astype('Int64') (capital I) uses pandas' nullable integer dtype, which can hold missing values (shown as <NA>) alongside integers.
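A short sketch of the nullable dtype on a two-value Series, assuming the coerced values are all whole numbers (as they are in the question's data). The names `s` and `as_numpy_int` are illustrative only.

```python
import pandas as pd

# Coerce, then cast to the nullable integer dtype (note the capital "I").
s = pd.to_numeric(pd.Series(["888", "K802001"]), errors="coerce").astype("Int64")
print(s.dtype)          # nullable integer dtype
print(s.isna().tolist())

# Once the missing values are dropped, the Series can be cast to the
# plain NumPy-backed int64 dtype if that is what downstream code expects.
as_numpy_int = s.dropna().astype("int64")
print(as_numpy_int.dtype)
```

This keeps the values as integers even before the rows are dropped, so the trailing `.0` never appears in the first place.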
