0

I have a raw dataframe that looks like

        codcet  placa_encoded   date   time_seconds velocidade
5031 490191222  431.0      2021-03-11   70079.0      51
5032 490221211  431.0      2021-03-11   72219.0      55
7991 490361213  562.0      2021-03-11   28559.0      24
7992 490361232  562.0      2021-03-11   29102.0      29              
7993 490361221  562.0      2021-03-11   30183.0      33                 
...

Where the numbers on the far left are indexes from the original dataset. My goal is to convert this into a dataframe indexed by placa_encoded and by n, a counter within each group that then looks like

placa_encoded  n    time_seconds  velocidade            codcet    
431.0          0      70079.0           51              490191222   
431.0          1      72219.0           55              490221211   
562.0          0      28559.0           24              490361213   
562.0          1      29102.0           29              490361232
562.0          2      30183.0           33              490361221   

That is, I aim to groupby('placa_encoded') then add another column n that counts the position within each group. The row should be indexed by both placa_encoded and n. I think I can use cumcount() to do this but it's unclear to me how to add it as a column since groupby doesn't product a dataframe I can assign to. I looked at this question but it seems they use .count() to convert it to a dataframe, and I want to preserve the data instead of getting any counts. I also tried to use pd.DataFrame(gbplaca) and pd.DataFrame(gbplaca.groups) to no avail.

Thank you so much!

2 Answers 2

1

I think this is what you are wanting

df['n'] = df.sort_values(['time_seconds'], ascending=True).groupby(['placa_encoded']).cumcount()
df = df.set_index(['placa_encoded', 'n'])
df

This is a multi-index with 'placa_encoded' and 'n'

Sign up to request clarification or add additional context in comments.

5 Comments

I updated my answer to make it a multi-index. Is that what you were looking for?
Yes, thanks! In general, should we always operate on the original dataframe and use groupby.agg() type methods to create new columns we can add to the original dataframe? What if I had wanted to make a dataframe where the rows were grouped by/clumped by, say, velocidade, instead?
Its difficult to say as that is kind of a general question, every situation is different so its outside the bounds of a SO comment (not trying to be evasive just a kind of big question) in this situation though if you wanted to do velocidade instead you would just replace the placa_encoded' from the code above...does that answer your question (or at least sort of)
Well, the original dataframe already has rows grouped by placa_encoded, but not by velocidad. That's what I'm asking -- how would we have done the same if the dataframe wasn't grouped by placa_encoded. We can't just run groupby('placa_encoded') then the same commands, because groupby doesn't return a dataframe we can run set_index on, but instead a DataFrameGroupBy object.
Oh so you got a dataframe doing a groupby on velocidad then from that dataframe you did another group by using placa_encoded?
0

All you have to do is to set the column in df (named 'n') with the result from df.groupby('placa_encoded').cumcount():

    df.loc[:, 'n'] = df.groupby('placa_encoded').cumcount()
    print(df)
           codcet  placa_encoded        date  time_seconds  velocidade  n
     0  490191222          431.0  2021-03-11       70079.0          51  0
     1  490221211          431.0  2021-03-11       72219.0          55  1
     2  490361213          562.0  2021-03-11       28559.0          24  0
     3  490361232          562.0  2021-03-11       29102.0          29  1
     4  490361221          562.0  2021-03-11       30183.0          33  2

Then, you can set the multi-index of df to ['placa_encoded', 'n'] if you want:

    df = df.set_index(['placa_encoded', 'index'])
    print(df)
                         codcet        date  time_seconds  velocidade
     placa_encoded n                                                 
     431.0         0  490191222  2021-03-11       70079.0          51
                   1  490221211  2021-03-11       72219.0          55
     562.0         0  490361213  2021-03-11       28559.0          24
                   1  490361232  2021-03-11       29102.0          29
                   2  490361221  2021-03-11       30183.0          33

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.