
I'm trying to read a CSV file that contains the literal value 'NA' as data. When I use the keep_default_na=False option to keep the NA values, it affects other columns that have empty cells.

The data:

 colA  colB  colC             
'abc' ,    ,  NA
'ljk' , 10 ,  'Paris' 
'xyz' , 25 ,  NA

Here, I want to keep the NA values in column 'colC'. I'm reading the CSV like this:

DF = pandas.read_csv(csv, keep_default_na=False)

Now the NA values are present in DF, but the values in the second column 'colB' are read as strings ('10', '25'), not as numbers.

This happens whenever a column with numeric values contains an empty cell.

How can I apply keep_default_na=False and still read the other columns with the correct dtype?

  • Do you want them as float or object? Your title is confusing. Commented May 3, 2024 at 9:30
  • I want to read them as floats or numbers, not as objects. Commented May 3, 2024 at 9:54
  • Then why is your title about conversion to object? Commented May 3, 2024 at 9:56
  • Changed the title. Commented May 3, 2024 at 10:19
  • Then why do you use keep_default_na=False? Commented May 3, 2024 at 10:30

1 Answer


Just do:

import pandas as pd

df = pd.read_csv(csv, keep_default_na=False)
df['colB'] = pd.to_numeric(df['colB'], errors='coerce')

errors='coerce' ensures that the 'NA' string values are converted to NaN for numeric processing. If you'd rather have 0 or some other value instead of NaN, you can fill them in afterwards with fillna().
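For instance, a short sketch of that fillna() step (the DataFrame is built by hand here to mirror the sample data, rather than read from your CSV):

```python
import pandas as pd

# Hand-built stand-in for the question's CSV, as read with keep_default_na=False:
# the empty cell survives as an empty string, and 'NA' survives as a string.
df = pd.DataFrame({
    "colA": ["abc", "ljk", "xyz"],
    "colB": ["", "10", "25"],
    "colC": ["NA", "Paris", "NA"],
})

# Coerce the numeric column; the unparseable empty string becomes NaN
df["colB"] = pd.to_numeric(df["colB"], errors="coerce")

# Optionally replace the NaN with a sentinel value such as 0
df["colB"] = df["colB"].fillna(0)
```

After this, colB is a float column holding 0.0, 10.0, 25.0, while the 'NA' strings in colC are untouched.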

Here is the output with your data:

  colA  colB    colC
0  abc    NaN     NA
1  ljk   10.0  Paris
2  xyz   25.0     NA

Edit: automatic detection

Just apply the to_numeric() approach to each column. If any non-NaN value remains after conversion, the column is numeric and we replace it with the converted one.

for col in df.columns:
    temp_col = pd.to_numeric(df[col], errors='coerce')

    # If the whole column coerced to NaN it is non-numeric; leave it alone
    if temp_col.notna().any():
        df[col] = temp_col
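For example, running that loop over the sample data (again built by hand here for illustration) converts only colB and leaves the text columns alone:

```python
import pandas as pd

# Stand-in for the question's CSV as read with keep_default_na=False
df = pd.DataFrame({
    "colA": ["abc", "ljk", "xyz"],
    "colB": ["", "10", "25"],
    "colC": ["NA", "Paris", "NA"],
})

for col in df.columns:
    temp_col = pd.to_numeric(df[col], errors="coerce")

    # colA and colC coerce entirely to NaN, so they are skipped;
    # colB keeps its parsed numbers and is adopted as float
    if temp_col.notna().any():
        df[col] = temp_col

print(df.dtypes)  # colB becomes float64; colA and colC stay object
```

One caveat with this heuristic: a mostly-text column that happens to contain a single numeric entry would also be converted, turning its text values into NaN.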

5 Comments

Hi, what if it's a large CSV file with multiple numeric columns, each with a few empty cells? It's impractical to apply errors='coerce' to every affected column by hand.
You can define numeric_columns = ['colB', 'colD', 'colG'], then loop over numeric_columns and use df[col] = pd.to_numeric(df[col], errors='coerce').
I'm looking for a general solution that works without knowing the column names. I can't create a new list for each CSV file.
@Suri_362 But you do know that colC should be excluded, by name?
I updated my answer with automatic detection.
