I need help with an issue I have concerning the creation of a sequence.

The sequence should be based on the ID_PROJET_test field, a boolean that indicates whether or not to increment.

if ID_PROJET_test = False then increment
if ID_PROJET_test = True then do not increment

For instance, if ID_PROJET_test contains the following Series: s1 = [0,0,1,0,1,0]

ID_PROJET should be equal to : [1,2,2,3,3,4]

If ID_PROJET_test contains the following Series: s2 = [0,0,0,1,1,1,0,0]

ID_PROJET should be equal to : [1,2,3,3,3,3,4,5]

I can do it with a for loop:

compteur = 0  # start at 0 so the first False row gets ID 1
for i in range(len(df)):
    # increment only when this row's flag is False
    if not df['ID_PROJET_test'].iloc[i]:
        compteur += 1
    df.loc[df.index[i], 'ID_PROJET'] = compteur

However, I have around 1.8M records and this is much too slow. Any idea how to do it faster?

1 Answer

If you flip the 0/1 value, you can use cumsum():

s1 = pd.Series([0,0,1,0,1,0])

(~s1.astype(bool)).cumsum()
0    1
1    2
2    2
3    3
4    3
5    4
dtype: int64

s2 = pd.Series([0,0,0,1,1,1,0,0])

(~s2.astype(bool)).cumsum()
0    1
1    2
2    3
3    3
4    3
5    3
6    4
7    5
dtype: int64

Also note @Jon Clement's more compact version:

(s1 ^ 1).cumsum()
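
To apply this to the DataFrame from the question, something like the following sketch should work. The helper name sequence_ids is made up for illustration, and the sample frame only mirrors the s1 example. The cast to bool matters because ~ on an integer Series is a bitwise NOT (~0 is -1, ~1 is -2), not a logical flip.

```python
import pandas as pd

def sequence_ids(flags: pd.Series) -> pd.Series:
    """Running count of falsy entries: increments on 0/False, holds on 1/True.

    Hypothetical helper, not part of pandas.
    """
    # Cast first: on an integer Series, ~ is bitwise NOT (~0 == -1, ~1 == -2).
    return (~flags.astype(bool)).cumsum()

# Sample data mirroring s1 from the question.
df = pd.DataFrame({"ID_PROJET_test": [0, 0, 1, 0, 1, 0]})
df["ID_PROJET"] = sequence_ids(df["ID_PROJET_test"])
print(df["ID_PROJET"].tolist())  # [1, 2, 2, 3, 3, 4]
```

Because the cast comes first, the same call handles a column of 0/1 integers and a column of True/False values.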

4 Comments

Even simpler (s1 ^ 1).cumsum()
Just to note as well - you don't need the astype(int) in your version - it's just an unnecessary type conversion... True is 1 and False is 0 - so cumsuming 'em works just fine and really increases the performance... it's still not as fast as xor'ing but much better :)
Thanks a lot ! Perfect !
@Arthur_V you're welcome! Please mark this answer accepted by clicking the check to the left of the answer - that way others will know this problem has been resolved.
