Given this dataframe:

    HOUSEID   PERSONID  HHSTATE  TRPMILES
0   20000017    1         IN    22.000000
1   20000017    1         IN    0.222222
2   20000017    1         IN    22.000000
3   20000017    2         IN    22.000000
4   20000017    2         IN    0.222222
5   20000017    2         IN    0.222222
6   20000231    1         TX    3.000000
7   20000231    1         TX    2.000000
8   20000231    1         TX    6.000000
9   20000231    1         TX    5.000000

I want to normalize TRPMILES by dividing each value by the maximum TRPMILES within its HHSTATE group:

        HOUSEID  PERSONID  HHSTATE  TRPMILES
    0   20000017    1         IN    1.000000
    1   20000017    1         IN    0.010101
    2   20000017    1         IN    1.000000
    3   20000017    2         IN    1.000000
    4   20000017    2         IN    0.010101
    5   20000017    2         IN    0.010101
    6   20000231    1         TX    0.500000
    7   20000231    1         TX    0.333333
    8   20000231    1         TX    1.000000
    9   20000231    1         TX    0.833333

Here is what I have tried:

df=df.div(df['TRPMILES'].max(level=[2]),level=2).reset_index()

I have a million rows with 50 different values for HHSTATE. Can you give me any hints?

1 Answer

I think the following will work for you:

df["max_trpmiles"] = df.groupby("HHSTATE")["TRPMILES"].transform("max")
df["TRPMILES"] /= df["max_trpmiles"]
df = df.drop("max_trpmiles", axis=1)
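
If you would rather not create and drop a helper column, the same idea can be written in one step. Here is a minimal, self-contained sketch that rebuilds the sample frame from the question; the name `state_max` is just an illustrative variable, not something from your code:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "HOUSEID": [20000017] * 6 + [20000231] * 4,
    "PERSONID": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "HHSTATE": ["IN"] * 6 + ["TX"] * 4,
    "TRPMILES": [22.0, 0.222222, 22.0, 22.0, 0.222222,
                 0.222222, 3.0, 2.0, 6.0, 5.0],
})

# transform("max") returns a Series with the same index as df, where each
# row holds the maximum TRPMILES of that row's HHSTATE group.
state_max = df.groupby("HHSTATE")["TRPMILES"].transform("max")

# Element-wise division, aligned on the index; no temporary column needed.
df["TRPMILES"] = df["TRPMILES"] / state_max

print(df)
```

Because `transform` keeps the original row order and index, this should work the same way on a million rows with 50 states. One thing to watch for: if a state's maximum is 0, the division gives NaN or inf for that group, so you may want to handle that case separately.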