Drop pandas dataframe columns containing all 'nan' values

Question

I have this dataframe

       T1       T2      T3      T4     T5
0  [22.8]   [42.2]  [30.0]  [23.0]  [nan]
1  [26.4]   [56.1]  [36.7]  [25.8]  [nan]
2  [29.3]   [68.9]  [42.3]  [28.4]  [nan]
3  [32.1]   [79.7]  [47.6]  [31.3]  [nan]
4  [34.3]   [90.0]  [52.2]  [33.6]  [nan]
5  [36.1]   [99.1]  [55.8]  [35.4]  [nan]
6  [37.1]  [104.0]  [57.0]  [36.3]  [nan]
7  [37.8]  [107.0]  [58.2]  [37.2]  [nan]
8  [38.4]  [111.2]  [60.0]  [37.9]  [nan]
9   [nan]    [nan]   [nan]   [nan]  [nan]

I get this dataframe by loading user-selected Excel files in a tkinter GUI. I want to drop the columns that contain only 'nan' values. Rows are fine even if they contain all 'nan'; I just want to remove the columns.

So far I've tried these commands: DFT = DFT.dropna(axis=1, how='all') to drop the all-'nan' columns, and DFT = DFT.loc[:,DFT.notna().any(axis=0)] to keep only the columns with at least one non-'nan' value.

As you can see, both return the exact same dataframe without dropping anything. What could be the issue, and how do I fix it?

Here's a minimal working example:

import tkinter.filedialog
import tkinter as tk
from tkinter import ttk
from tkmacosx import Button
import pandas as pd
import numpy as np

root = tk.Tk()
root.geometry('400x400')

label_check = tk.StringVar()

def OOE():        
   pathATC = tk.filedialog.askopenfilename(filetypes = [('Excel files', '*.xls*')], title = "Select an ATC file")  
   excel_file = pd.ExcelFile(pathATC)
   sheet_names = excel_file.sheet_names
   combo = tk.StringVar()
   def selected(event):
       print(box.get()) 
       PI_ATC = pd.read_excel(pathATC, sheet_name = box.get(),usecols="C",skiprows=8, nrows=10).to_numpy().astype(float)
       POUT_ATC = pd.read_excel(pathATC, sheet_name=box.get(),usecols="I",skiprows=8, nrows=10).to_numpy().astype(float)     
       PI_ATC[PI_ATC == 0] = 'nan'
       # POUT_ATC[POUT_ATC == 0] = 'nan'                   
       if 'Post Burn-in' in box.get() or 'Mesure 2' in box.get():
           TH1_Bi = pd.read_excel(pathATC, sheet_name=box.get(),usecols="L",skiprows=8, nrows=10).to_numpy().astype(float)
           TH2_Bi = pd.read_excel(pathATC, sheet_name=box.get(),usecols="M",skiprows=8, nrows=10).to_numpy().astype(float)
           TH3_Bi = pd.read_excel(pathATC, sheet_name=box.get(),usecols="N",skiprows=8, nrows=10).to_numpy().astype(float)
           TH4_Bi = pd.read_excel(pathATC, sheet_name=box.get(),usecols="O",skiprows=8, nrows=10).to_numpy().astype(float)
           TH5_Bi = pd.read_excel(pathATC, sheet_name=box.get(),usecols="P",skiprows=8, nrows=10).to_numpy().astype(float)
            
           dat = list(zip(TH1_Bi,TH2_Bi,TH3_Bi,TH4_Bi,TH5_Bi))
           DFT = pd.DataFrame(data=dat, columns = ['T1', 'T2', 'T3','T4', 'T5'])
           DFT = DFT.dropna(axis=1, how='all')
           print(DFT)
        
       elif 'Mesure 1' in box.get():                     
           POUT_M1 = pd.read_excel(pathATC, sheet_name=box.get(),usecols="H",skiprows=8, nrows=11).to_numpy().astype(float)
           POUT_M1[POUT_M1 == 0] = 'nan'               
         
   box = ttk.Combobox(root, textvariable=combo, value =sheet_names, state='readonly')
   box.bind("<<ComboboxSelected>>",selected)
   box.pack()

xl_btn = Button(root,text="ATC",foreground='#161327',background="#707087",command=lambda:OOE())
xl_btn.pack()

label=ttk.Label(root,text=" " ,textvariable=label_check)
label.pack()
root.mainloop()

For clarification, are your nan values encapsulated as a list in each cell?
– Michael Cao, Commented Jan 27, 2023 at 17:43
@MichaelCao I think so. The dataframe I posted is the console output (from the print command). I'm not sure why it appears with brackets. Is it because of how I created the dataframe?
– Chat0924, Commented Jan 27, 2023 at 17:46

mozway · Accepted Answer · 2023-01-27 17:44:01Z

Assuming you have lists in your DataFrame, use str[0] per column to get each element, then boolean indexing:

out = df.loc[:, df.apply(lambda s: s.str[0]).notna().any()]

Or:

out = df.loc[:, (df[c].str[0].notna().any() for c in df)]

Output:

       T1       T2      T3      T4
0  [22.8]   [42.2]  [30.0]  [23.0]
1  [26.4]   [56.1]  [36.7]  [25.8]
2  [29.3]   [68.9]  [42.3]  [28.4]
3  [32.1]   [79.7]  [47.6]  [31.3]
4  [34.3]   [90.0]  [52.2]  [33.6]
5  [36.1]   [99.1]  [55.8]  [35.4]
6  [37.1]  [104.0]  [57.0]  [36.3]
7  [37.8]  [107.0]  [58.2]  [37.2]
8  [38.4]  [111.2]  [60.0]  [37.9]
9   [nan]    [nan]   [nan]   [nan]
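
For readers who want to try this without the Excel file, here is a minimal sketch of the same idea. The small DataFrame is a hypothetical stand-in for the question's data (one-element lists in every cell), not the original values in full.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: each cell is a one-element list.
df = pd.DataFrame({
    "T1": [[22.8], [26.4], [np.nan]],
    "T2": [[42.2], [56.1], [np.nan]],
    "T5": [[np.nan], [np.nan], [np.nan]],
})

# dropna sees list objects rather than NaN, so no column is dropped.
unchanged = df.dropna(axis=1, how="all")

# Unwrap the first element of every cell, then keep the columns
# that have at least one non-NaN value.
unwrapped = df.apply(lambda s: s.str[0])
out = df.loc[:, unwrapped.notna().any()]

print(list(unchanged.columns))
print(list(out.columns))
```

Note that out still holds the original list-wrapped cells; only the column selection uses the unwrapped values.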

Another option is to modify your previous code to avoid having encapsulation in lists, then df.dropna(how='all', axis=1) will work.
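
A sketch of that second option, under the assumption that the wrapping comes from calling .to_numpy() on a one-column DataFrame: this yields an array of shape (n, 1), and zipping such arrays puts a one-element array in every cell. The arrays below are hypothetical stand-ins for TH1_Bi, TH2_Bi and TH5_Bi; flattening them with ravel() before building the DataFrame gives plain float cells.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for TH1_Bi, TH2_Bi and TH5_Bi: arrays of shape (n, 1),
# as produced by .to_numpy() on a one-column DataFrame.
TH1_Bi = np.array([[22.8], [26.4], [np.nan]])
TH2_Bi = np.array([[42.2], [56.1], [np.nan]])
TH5_Bi = np.array([[np.nan], [np.nan], [np.nan]])

# Flatten each array to 1-D so every DataFrame cell holds a plain float.
DFT = pd.DataFrame({
    "T1": TH1_Bi.ravel(),
    "T2": TH2_Bi.ravel(),
    "T5": TH5_Bi.ravel(),
})

# With scalar cells, the call from the question drops the all-NaN column.
DFT = DFT.dropna(axis=1, how="all")
print(DFT)
```

It may also be simpler to read all five columns in one call, since pd.read_excel accepts a column range such as usecols="L:P" and returns a DataFrame with scalar cells directly.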


7 Comments

Chat0924 Over a year ago

btw how do I avoid having encapsulation in lists?

mozway Over a year ago

@Chat0924 if you don't have a good reason to use those lists, I strongly suggest getting rid of them and keeping only the elements; this will facilitate further processing. Pandas doesn't like lists.

mozway Over a year ago

Hard to say how to avoid it with your current code as it's not easily reproducible. What is the content of TH1_Bi? and dat?

Chat0924 Over a year ago

TH1_Bi to TH5_Bi are temperature values that I read from columns of an uploaded Excel file. I created the DFT dataframe just to access the values easily, and to drop a column if the entire column in the Excel file is empty. dat is the dataset I built from those values.

mozway Over a year ago

Congrats, this is better ;)
