I am looking for help simplifying my code. The DataFrame has >100k rows and could have multiple columns that contain a mix of strings and integers. Here is an example df:
import pandas as pd

data = {
    "Area_1": [0, 100, 200, 0],
    "Area_2": [0, 0, 100, 100],
    "Area_3": [0, 0, 0, 100],
    "id": ["gene_x", "gene_y", "gene_z", "gene_i"],
}
df = pd.DataFrame(data, columns=["id", "Area_1", "Area_2", "Area_3"])
The code below is my attempt to simplify a much longer version that worked but could only handle 3 columns. I now want to accept any number of Area columns and filter out rows where all of those columns contain the integer 0.
Expected output: everything in the DataFrame but the row containing gene_x.
Current code:
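# Boolean mask over the columns whose names start with "Area"
cut = r"^Area"
blade = df.columns.str.contains(cut)
# Keep rows where at least one Area column is > 0
df[(df.loc[:, blade] > 0).any(axis=1)]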
Currently, this code executes without error but returns the df unfiltered. I expected it to remove any rows that do not contain a value > 0 in at least one of the Area columns.
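For reference, here is a minimal self-contained sketch of the behavior I am after. It rests on two assumptions that are not confirmed by the example df: that some Area values in the real data may be stored as strings (hence the coercion with pd.to_numeric), and that the filtered result has to be assigned to a new name, since boolean indexing does not modify df in place. The names area_cols, areas, and filtered are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["gene_x", "gene_y", "gene_z", "gene_i"],
        "Area_1": [0, 100, 200, 0],
        "Area_2": [0, 0, 100, 100],
        "Area_3": [0, 0, 0, 100],
    }
)

# Select every column whose name starts with "Area", however many there are.
area_cols = df.columns[df.columns.str.contains(r"^Area")]

# Assumption: values may be stored as strings, so coerce them to numbers.
# Anything non-numeric becomes NaN, and NaN > 0 evaluates to False.
areas = df[area_cols].apply(pd.to_numeric, errors="coerce")

# Boolean indexing returns a new DataFrame, so the result is assigned.
filtered = df[(areas > 0).any(axis=1)]
print(filtered["id"].tolist())  # ['gene_y', 'gene_z', 'gene_i']
```

On the example df this keeps gene_y, gene_z, and gene_i and drops gene_x, which is the expected output described above.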
