3

I want to detect sign change of my data using either pandas or numpy. I want to count the number(s) of id which changes sign of y between two immediate TIMESTEP values (eg. for 2800 and 2900 TIMESTEPs, id 313 has changed sign (y becomes negative). I have tried the below code by counting negatives and then using drop duplicate but that again not efficient and correct.

df_negatives0 = df0.query('y < 0')
df_nonduplicate0=df_negatives0.drop_duplicates(subset=["id"])

My dataset:

TIMESTEP    id  mass    y
0       42  0.755047    0.489375
0       245 0.723805    0.479446
0       344 0.675664    0.463363
...
...
2800    313 0.795699    0.00492984
2800    425 0.68311     0.282356
2900    42  0.755047    0.424421
2900    245 0.723805    0.0378489
2900    344 0.675664    0.127917
2900    313 0.795699    -0.0149792
2900    425 0.68311     0.273884
...
...
upto
10000000

My desired data:

TIMESTEP    id_count    mass
2900        1           0.795699
...
...
500000      2           0.85245 + 0.54852 (i want to sum the masses if id count is more than one)
...
...
upto
10000000

2 Answers 2

2

There is a dedicated function in numpy np.sign (thanks to @Asclepius for flagging an error in previous version of this answer) and for sign change from one row to the next, it's possible to use the .diff method:

from numpy import sign
from pandas import DataFrame

df = DataFrame([-2, 0, -1, 3, -2], columns=["x"])

# this will return the sign of the float x
df["sign"] = sign(df["x"])

# this will return the difference between the sign of two consecutive rows
print(df["sign"].diff())
# 0    NaN
# 1    1.0
# 2   -1.0
# 3    2.0
# 4   -2.0
Sign up to request clarification or add additional context in comments.

Comments

0
  • For each id, find the sign-changed rows
  • Do Aggregation for each TIMESTEP
def filter_sign_changed(df_same_id):
    df_same_id = df_same_id.sort_values(by="TIMESTEP")
    sign_changed = df_same_id.y * df_same_id.y.shift(1) < 0
    return df_same_id[sign_changed]

sign_changed_df = df.groupby("id").apply(filter_sign_changed)
result = sign_changed_df.groupby("TIMESTEP", as_index=False).agg(id_count=("id", "size"), mass=("mass", "sum"))
result

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.