Pandas DataFrame: How to remove column and perform calculations on select columns

Question

I have a DataFrame with 57 columns. Columns 1 through 21 are dimensions. 22 through 57 are metrics. Column 1 is a date column. Column 21 is a bad column that is causing me to have duplicative data.

What I am looking to do is remove column 21 and then take the min of 22 to 57 when 1 through 20 are the same.

Roim · Accepted Answer · 2020-10-01 20:36:23Z

1

There is no need for groupby here; you can just use drop and min.

To remove column 21, use drop on the relevant column, removing it by name:

df.drop(columns="column_21_name", inplace=True)

To select the minimum across several columns, use min with axis=1, which returns one value per row:

df["min_column"] = df.iloc[:, 20:56].min(axis=1)

(iloc first selects only the relevant columns, and min is then applied to them.)

iloc positions are 0-based and the end of the slice is exclusive, so once column 21 has been dropped the metrics (originally columns 22 to 57) sit at positions 20 through 55, hence 20:56. Adjust the slice if you counted differently, and check that the result is what you wanted.

Afterwards df has a new column named "min_column", and you can drop the original metric columns (positions 20 to 55).
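As a minimal sketch of the drop-then-min approach described above, here is a toy frame with made-up column names (none of them come from the question). It assumes the goal is one minimum per row, which is why min is called with axis=1.

```python
import pandas as pd

# Hypothetical toy frame: two dimension columns, one unwanted column, three metrics.
df = pd.DataFrame({
    "date": ["2020-10-01", "2020-10-02"],
    "campaign": ["a", "b"],
    "bad_column": [10, 20],
    "metric_1": [5, 9],
    "metric_2": [3, 12],
    "metric_3": [7, 8],
})

# Drop the unwanted column by name (drop returns a new frame here).
df = df.drop(columns="bad_column")

# After the drop, the metrics occupy positions 2, 3 and 4.
# axis=1 takes the minimum across columns, giving one value per row.
df["min_column"] = df.iloc[:, 2:5].min(axis=1)

print(df["min_column"].tolist())  # [3, 8]
```

With axis=0 the result would instead be one minimum per column, indexed by column name, which does not line up with the rows of df.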

P.S. Please follow the Stack Overflow guidelines when posting a question: say what you have already tried (instead of just asking) and give an example of your DataFrame (rather than talking generally about "column 20"). I decided to answer this time, but other community members may be less merciful.

answered Oct 1, 2020 at 20:36

Roim

3,076 reputation, 2 gold badges, 14 silver badges, 27 bronze badges


4 Comments

noah Over a year ago

" then take the min of 22 to 57 when 1 through 20 are the same" This doesn't check if they are the same

Roim Over a year ago

From what I understand, "when 1 through 20 are the same" means "leave them as they are", not "do the minimum only if they are equal"

noah Over a year ago

I interpreted it the other way. I guess we just have to wait for the OP to clarify.

Joe Fedorowicz Over a year ago

@noah was correct, but I figured it out. Was stupid of me. I dropped the offending column and just did a drop duplicates. Sorry all

noah · Accepted Answer · 2020-10-01 20:41:48Z

1

I think the following will do the trick for you. You can drop the column if you'd like (df.drop(df.columns[20], axis=1, inplace=True)), but it isn't necessary for this one calculation. The code groups by the first 20 columns and then takes the min of columns 22 through 57 for each combination, so column 21 is simply left out. If you do drop the column first, the positions of the metric columns shift down by one. Note that a slice a:b goes from a to b-1.

df.groupby(list(df.columns[:20]))[list(df.columns[21:57])].min()
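To illustrate the group-then-min idea on something small, here is a sketch using a made-up frame (the column names and values are hypothetical). It groups by column labels taken from df.columns, since groupby expects one-dimensional keys such as labels or a list of labels.

```python
import pandas as pd

# Hypothetical toy frame: two dimensions, one column that causes duplicates, two metrics.
df = pd.DataFrame({
    "date": ["2020-10-01", "2020-10-01", "2020-10-02"],
    "campaign": ["a", "a", "b"],
    "bad_column": [1, 2, 3],
    "metric_1": [5, 4, 9],
    "metric_2": [3, 6, 12],
})

dimension_cols = list(df.columns[:2])  # positions 0 and 1
metric_cols = list(df.columns[3:5])    # positions 3 and 4; position 2 is skipped

# One row per unique combination of dimensions, with the minimum of each metric.
result = df.groupby(dimension_cols, as_index=False)[metric_cols].min()
print(result)
```

Because the unwanted column is in neither list, it drops out of the result without an explicit drop call.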

answered Oct 1, 2020 at 20:41

noah

2,796 reputation, 15 silver badges, 29 bronze badges


Joe Fedorowicz · Accepted Answer · 2020-10-01 20:43:50Z

0

Just needed to drop the column and then drop duplicates. Sorry all.

df.drop(columns="lineItemBudget", inplace=True)

df.drop_duplicates(inplace=True)
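For reference, a small sketch of the same two steps on a made-up frame. Only the column name lineItemBudget comes from the answer; the other names and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data: the first two rows differ only in lineItemBudget.
df = pd.DataFrame({
    "date": ["2020-10-01", "2020-10-01", "2020-10-02"],
    "campaign": ["a", "a", "b"],
    "lineItemBudget": [100, 200, 300],
    "clicks": [5, 5, 9],
})

# Drop the offending column, then remove rows that are now exact duplicates.
deduped = df.drop(columns="lineItemBudget").drop_duplicates()

print(len(deduped))  # 2
```

This only collapses rows whose remaining values are all identical. If the metrics differ between otherwise matching rows, those rows stay separate and a groupby with min would be needed instead.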

answered Oct 1, 2020 at 20:43

Joe Fedorowicz

825 reputation, 2 gold badges, 8 silver badges, 16 bronze badges
