
I have a large dataframe and I want to search 144 of the columns to check if there are any negative values in them. If there is even one negative value in a column, I want to replace the whole column with np.nan. I then want to use the new version of the dataframe for later analysis.

I've tried a variety of methods but can't find one that works. I think the code below is almost there, but I can't quite get it to do what I want.

clean_data_df.loc[clean_data_df.cols < 0, cols] = np.nan   #cols is a list of the column names I want to check  
null_columns=clean_data_df.columns[clean_data_df.isnull().any(axis=1)]
clean_data_df[null_columns] = np.nan

When I run the above code I get the following error: AttributeError: 'DataFrame' object has no attribute 'cols'

Thanks in advance!

  • If you are using a list of column names, you should use clean_data_df[cols]. Commented Feb 20, 2020 at 14:58
  • I've tried that too and I get a different error: "None of [Index([ ('p', 'r', 'e', 's', 's', 'u', 'r', 'e', '', '1'),\n ('p', 'r', 'e', 's', 's', 'u', 'r', 'e', '', '2'),\n [this carries on for all the column names],\n dtype='object', length=144)] are in the [index]" Commented Feb 20, 2020 at 15:05

2 Answers

2

You could use a loop to iterate over the columns:

for i in cols:
    if df[i].isna().any():   # column contains at least one NaN
        df[i] = np.nan       # replace the whole column

Minimal reproducible example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan], 'c': [1, 2, 3]})
for i in df:
    if df[i].isna().any():
        df[i] = np.nan
print(df)

Output:

    a   b  c
0 NaN NaN  1
1 NaN NaN  2
2 NaN NaN  3
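
Note that the question asks to blank out columns that contain any negative value (rather than any NaN), so the same loop pattern would need the comparison changed. A minimal sketch under that assumption, using the question's clean_data_df and its cols list of 144 column names:

import numpy as np

for c in cols:
    # a NaN compares as False here, so only genuine negative values trigger the replacement
    if (clean_data_df[c] < 0).any():
        clean_data_df[c] = np.nan   # replace the whole column with NaN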

3 Comments

For such use cases you can also take a look at loc, i.e. df.loc[:, df.isna().any()] = np.nan
How does the df.isna().any() work inside the loc? I seem to be getting AttributeError: 'float' object has no attribute 'loc'
An AttributeError stating what? df.loc[:, df.isna().any()] returns the first two columns since they are True; loc understands a boolean list when it is passed after the comma (i.e. for columns). Test with df.loc[:, [True, True, False]]
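
For illustration, a small self-contained sketch of the boolean-mask column selection described in these comments (toy data, not from the thread):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan], 'c': [1, 2, 3]})
print(df.isna().any())                 # boolean Series: True for 'a' and 'b'
df.loc[:, df.isna().any()] = np.nan    # loc accepts the boolean Series as a column selector
print(df)                              # 'a' and 'b' are now entirely NaN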
1

The idea is to filter only the selected columns with DataFrame.lt and DataFrame.any, then extend the resulting boolean mask to all the other columns (filled with False) via Series.reindex, and finally set the values with DataFrame.loc, where the first : means all rows:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('abc'), 'b': [-2, -1, -3], 'c': [1, 2, 3]})

cols = ['b','c']
df.loc[:, df[cols].lt(0).any().reindex(df.columns, fill_value=False)] = np.nan
print(df)
   a   b  c
0  a NaN  1
1  b NaN  2
2  c NaN  3

Detail:

print(df[cols].lt(0).any())
b     True
c    False
dtype: bool

print(df[cols].lt(0).any().reindex(df.columns, fill_value=False))
a    False
b     True
c    False
dtype: bool
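
Applied to the question's own names (a sketch, assuming clean_data_df is the DataFrame and cols is the list of 144 column names to check):

import numpy as np

# True for each checked column that contains at least one negative value
mask = clean_data_df[cols].lt(0).any()
# extend the mask to every column of the DataFrame; unchecked columns stay False
mask = mask.reindex(clean_data_df.columns, fill_value=False)
clean_data_df.loc[:, mask] = np.nan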

