0

I have a DataFrame with 57 columns. Columns 1 through 21 are dimensions. 22 through 57 are metrics. Column 1 is a date column. Column 21 is a bad column that is causing me to have duplicative data.

What I am looking to do is remove column 21 and then take the min of 22 to 57 when 1 through 20 are the same.

3 Answers

1

There is no need for groupby; drop and min are enough.

To remove column 21, call drop with the column's name:

df.drop(columns="column_21_name", inplace=True)

To take the minimum across several columns, use min with axis=1 so it is computed row by row (axis=0 would give one value per column instead):

df["min_column"] = df.iloc[:, 20:56].min(axis=1)

(I first use iloc to select only the relevant columns and then apply the min method.)

iloc is 0-based and excludes the end of the slice, so once column 21 has been dropped the 36 metric columns sit at positions 20 through 55, hence 20:56. Adjust the slice if you counted differently, and check the result against a few rows.

Afterwards df has a new column named "min_column", and you can drop the original metric columns (positions 20 through 55).
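
As a rough illustration of the steps above, here is a minimal sketch on a made-up frame. The column names (date, campaign, bad_col, m1 to m3) and values are hypothetical stand-ins for the 57-column DataFrame in the question, so the slice positions differ from the real data.

```python
import pandas as pd

# Hypothetical frame: two dimension columns, one bad column, three metrics.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02"],
    "campaign": ["a", "b"],
    "bad_col": [1, 2],
    "m1": [5, 9],
    "m2": [3, 7],
    "m3": [4, 8],
})

# Remove the offending column by name.
df = df.drop(columns="bad_col")

# After the drop, the metrics sit at 0-based positions 2, 3 and 4.
metric_cols = df.columns[2:5]

# axis=0 returns one minimum per column (a Series indexed by column name),
# which is not what we want to store as a new column.
per_column = df[metric_cols].min(axis=0)

# axis=1 returns one minimum per row, aligned with df's index.
df["min_column"] = df[metric_cols].min(axis=1)

# Drop the original metric columns, keeping only the row-wise minimum.
df = df.drop(columns=metric_cols)
```

Note that this collapses the metrics within each row; it does not compare rows with each other.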

P.S. Please follow the Stack Overflow guidelines when posting a question: say what you have already tried (instead of just asking) and give an example of your DataFrame (rather than talking generally about "column 20"). I decided to answer this time, but other community members may be less merciful.

4 Comments

" then take the min of 22 to 57 when 1 through 20 are the same" This doesn't check if they are the same
From what I understand, "when 1 through 20 are the same" means "leave them as they are", not "do the minimum only if they are equal"
I interpreted it the other way. I guess we just have to wait for the OP to clarify.
@noah was correct, but I figured it out. Was stupid of me. I dropped the offending column and just did a drop duplicates. Sorry all
1

I think the following will do the trick for you. You can drop the column if you'd like (df.drop(df.columns[20], axis=1, inplace=True)), but it isn't necessary for this one calculation. The code groups by the first 20 columns and then takes the min of columns 22 through 57 for each combination, so column 21 is simply left out of the result. If you do decide to drop the column, the iloc positions of the metrics shift down by one. Remember that a slice a:b goes from a to b-1.

df.groupby(list(df.columns[:20]), as_index=False)[list(df.columns[21:57])].min()
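
To make the grouping idea concrete, here is a small sketch on a made-up frame. The column names and values are hypothetical, and the slice positions are scaled down from the 57-column layout in the question.

```python
import pandas as pd

# Hypothetical frame: two dimensions, one bad column, two metrics.
# The first two rows share the same dimensions but differ in bad_col.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "campaign": ["a", "a", "b"],
    "bad_col": [10, 20, 30],
    "m1": [5, 2, 9],
    "m2": [3, 6, 7],
})

dim_cols = list(df.columns[:2])      # dimensions to group by
metric_cols = list(df.columns[3:5])  # metrics; position 2 (bad_col) is skipped

# One row per unique dimension combination, with the min of each metric.
result = df.groupby(dim_cols, as_index=False)[metric_cols].min()
```

One thing to keep in mind: by default groupby discards rows whose key columns contain NaN, so passing dropna=False may be needed if the dimensions can be missing.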

0

Just needed to drop the column and then drop duplicates. Sorry all.

df.drop(columns="lineItemBudget", inplace=True)

df.drop_duplicates(inplace=True)
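
For reference, a minimal sketch of this approach on a made-up frame. Only the lineItemBudget name comes from the answer above; the other columns and all values are hypothetical.

```python
import pandas as pd

# Hypothetical frame: the first two rows differ only in lineItemBudget.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "campaign": ["a", "a", "b"],
    "lineItemBudget": [100, 200, 300],
    "clicks": [5, 5, 9],
})

# Drop the offending column, then remove rows that are now fully identical.
deduped = (
    df.drop(columns="lineItemBudget")
      .drop_duplicates()
      .reset_index(drop=True)
)
```

This works when the remaining rows are exact duplicates, metrics included. If the metrics could differ between otherwise identical rows, drop_duplicates would keep both rows, and a groupby with min would be needed instead.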
