0

This question has two parts:

1) Is there a better way to do this?

2) If NO to #1, how can I fix my date issue?

I have a dataframe as follows

  GROUP    DATE        VALUE    DELTA
    A   12/20/2015      2.5      ??
    A   11/30/2015      25        
    A    1/31/2016      8.3       
    B       etc         etc
    B       etc         etc  
    C       etc         etc
    C       etc         etc
  • This is a representation, there are close to 100 rows for each group (each row representing a unique date).

For each letter in GROUP, I want to find the change in value between successive dates. So for example for GROUP A I want the change between 11/30/2015 and 12/20/2015, which is -22.5. Currently I am doing the following:

df['DATE'] = pd.to_datetime(df['DATE'],infer_datetime_format=True)
df.sort_values('DATE',ascending=True)
df_out = []
for GROUP in df.GROUP.unique():
    x = df[df.GROUP == GROUP]
    x['VALUESHIFT'] = x['VALUE'].shift(+1)
    x['DELTA'] = x['VALUE'].sub(x['VALUESHIFT'])
    df_out.append(x)
df_out = pd.concat(df_out)

The challenge I am running into is the dates are not sorted correctly. So when the shift takes place and I calculate the delta it is not really the delta between successive dates.

Is this the right approach to handle? If so how can I fix my date issue? I have reviewed/tried the following to no avail:

Applying datetime format in pandas for sorting

how to make a pandas dataframe column into a datetime object showing just the date to correctly sort

doing calculations in pandas dataframe based on trailing row

Pandas - Split dataframe into multiple dataframes based on dates?

1
  • One thing to add - I have not set an index on the dataframe. It just has the default numeric 0, 1, 2, 3, 4, index set when I create the dataframe. Commented Feb 9, 2017 at 15:09

1 Answer 1

0

Answering my own question. This works:

df['DATE'] = pd.to_datetime(df['DATE'],infer_datetime_format=True)
df_out = []
for ID in df.GROUP.unique():
    x = df[df.GROUP == ID]
    x.sort_values('DATE',ascending=True, inplace=True)
    x['VALUESHIFT'] = x['VALUE'].shift(+1)
    x['DELTA'] = x['VALUE'].sub(x['VALUESHIFT'])
    df_out.append(x)
df_out = pd.concat(df_out)

1) Added inplace=True to sort value.

2) Added the sort within the for loop.

3) Changed by loop from using GROUP to ID since it is also the name of a column name, which I imagine is considered sloppy?

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.