This is tricky. I have a pandas DataFrame of machine log data. The data has an index, but the DataFrame contains several jobs. I want to give each job an index of its own so that I can compare the jobs with each other. That is, I want another column holding an index that starts at zero, runs to the end of the job, and then resets to zero for the next job. Or do I have to do this line by line?

1
  • Please look at stackoverflow.com/questions/20109391/… and learn how to ask a good pandas question. You need to show your data and your expected output. We can't construct examples from paragraphs of explanation. Commented Sep 8, 2017 at 7:07

1 Answer

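I think you need set_index with cumcount, which numbers the rows within each group starting from zero. Passing append=True keeps the original index and adds the per-job counter as a second index level: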

df = df.set_index(df.groupby('Job Columns').cumcount(), append=True)

Sample:

import numpy as np
import pandas as pd

np.random.seed(456)
df = pd.DataFrame({'Jobs': np.random.choice(['a', 'b', 'c'], size=10)})

# solution with sorting: the rows of each job become contiguous
df1 = df.sort_values('Jobs').reset_index(drop=True)
df1 = df1.set_index(df1.groupby('Jobs').cumcount(), append=True)
print(df1)
    Jobs
0 0    a
1 1    a
2 2    a
3 0    b
4 1    b
5 2    b
6 3    b
7 0    c
8 1    c
9 2    c

# solution with no sorting: the original row order is kept
df2 = df.set_index(df.groupby('Jobs').cumcount(), append=True)
print(df2)
    Jobs
0 0    b
1 1    b
2 0    c
3 0    a
4 1    c
5 2    c
6 1    a
7 2    b
8 2    a
9 3    b
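
Since the question asks for another column rather than an index level, the same cumcount result can also be assigned directly to a column. The following is only a minimal sketch: the data and the names 'value', 'job_idx' and 'wide' are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical log data; 'value' stands in for any measured quantity.
df = pd.DataFrame({
    'Jobs': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c'],
    'value': [10, 11, 12, 20, 21, 30, 31, 32, 33],
})

# cumcount numbers the rows of each job from 0, restarting for every job.
df['job_idx'] = df.groupby('Jobs').cumcount()
print(df)

# To compare jobs side by side, pivot so each job becomes a column
# aligned on the per-job counter; shorter jobs are padded with NaN.
wide = df.pivot(index='job_idx', columns='Jobs', values='value')
print(wide)
```

With the counter as an ordinary column, the original index stays untouched, and the pivoted frame lines up the n-th row of every job in the same row.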

1 Comment

That solved the problem. You are a pandas genius, I think. Thanks a lot!
