Adding rows to DataFrame conditioned on values

Question

I have the following Pandas DataFrame:

     start_timestamp_milli  end_timestamp_milli       name  rating
1            1555414708025        1555414723279    Valence       2   
2            1555414708025        1555414723279    Arousal       6   
3            1555414708025        1555414723279  Dominance       2   
4            1555414708025        1555414723279    Sadness       1
5            1555414708025        1555414723279    Happiness     0
6            1555414708025        1555414723279    Anger         0
7            1555414708025        1555414723279    Surprise      0
8            1555414708025        1555414723279    Stress        0
9            1555414813304        1555414831795    Valence       3   
10           1555414813304        1555414831795    Arousal       5   
11           1555414813304        1555414831795  Dominance       2   
12           1555414813304        1555414831795    Sadness       0
13           1555414813304        1555414831795    Happiness     0
14           1555414813304        1555414831795    Anger         0
15           1555414708025        1555414723279    Surprise      0
16           1555414708025        1555414723279    Stress        0   
17           1555414921819        1555414931382    Valence       1   
18           1555414921819        1555414931382    Arousal       7   
19           1555414921819        1555414931382  Dominance       2   
20           1555414921819        1555414931382    Sadness       1 
21           1555414921819        1555414931382   Happiness      0  
22           1555414921819        1555414931382    Anger         1
23           1555414708025        1555414723279    Surprise      0
24           1555414708025        1555414723279    Stress        1

Now, for each block with the same start_timestamp_milli and end_timestamp_milli, I would like to insert an additional row with name "Neutral". Its rating should be 1 if the ratings of Sadness, Happiness, Anger, Surprise and Stress are all 0, and 0 otherwise. The start_timestamp_milli and end_timestamp_milli of the new row should be set to the values of that block.

The resulting DataFrame should look like this:

     start_timestamp_milli  end_timestamp_milli       name  rating
1            1555414708025        1555414723279    Valence       2   
2            1555414708025        1555414723279    Arousal       6   
3            1555414708025        1555414723279  Dominance       2   
4            1555414708025        1555414723279    Sadness       1
5            1555414708025        1555414723279    Happiness     0
6            1555414708025        1555414723279    Anger         0
7            1555414708025        1555414723279    Surprise      0
8            1555414708025        1555414723279    Stress        0
9            1555414708025        1555414723279    Neutral       0
10           1555414813304        1555414831795    Valence       3   
11           1555414813304        1555414831795    Arousal       5   
12           1555414813304        1555414831795  Dominance       2   
13           1555414813304        1555414831795    Sadness       0
14           1555414813304        1555414831795    Happiness     0
15           1555414813304        1555414831795    Anger         0
16           1555414708025        1555414723279    Surprise      0
17           1555414708025        1555414723279    Stress        0
18           1555414708025        1555414723279    Neutral       1   
19           1555414921819        1555414931382    Valence       1   
20           1555414921819        1555414931382    Arousal       7   
21           1555414921819        1555414931382  Dominance       2   
22           1555414921819        1555414931382    Sadness       1 
23           1555414921819        1555414931382   Happiness      0  
24           1555414921819        1555414931382    Anger         1
25           1555414708025        1555414723279    Surprise      0
26           1555414708025        1555414723279    Stress        1 
27           1555414708025        1555414723279    Neutral       0

How can this be done?

@Valentino No, it isn't. The resulting dataframe has 3 additional rows. – machinery, Commented Jun 3, 2019 at 18:50

BENY · Accepted Answer · 2019-06-03 17:04:10Z

You can filter first, then use groupby with agg('all'), and concat the result back:

# True per block when all listed emotions are rated 0, cast to 0/1
s = (df.loc[df.name.isin(['Sadness', 'Happiness', 'Anger', 'Surprise', 'Stress']), 'rating']
       .eq(0)
       .groupby([df['start_timestamp_milli'], df['end_timestamp_milli']])
       .agg('all').astype(int)
       .reset_index().assign(name='Neutral'))
df = pd.concat([df, s], sort=False).sort_values(['start_timestamp_milli', 'end_timestamp_milli'])
df
Out[66]: 
    start_timestamp_milli  end_timestamp_milli       name  rating
1           1555414708025        1555414723279    Valence       2
2           1555414708025        1555414723279    Arousal       6
3           1555414708025        1555414723279  Dominance       2
4           1555414708025        1555414723279    Sadness       1
5           1555414708025        1555414723279  Happiness       0
6           1555414708025        1555414723279      Anger       0
7           1555414708025        1555414723279   Surprise       0
8           1555414708025        1555414723279     Stress       0
15          1555414708025        1555414723279   Surprise       0
16          1555414708025        1555414723279     Stress       0
23          1555414708025        1555414723279   Surprise       0
24          1555414708025        1555414723279     Stress       1
0           1555414708025        1555414723279    Neutral       0
9           1555414813304        1555414831795    Valence       3
10          1555414813304        1555414831795    Arousal       5
11          1555414813304        1555414831795  Dominance       2
12          1555414813304        1555414831795    Sadness       0
13          1555414813304        1555414831795  Happiness       0
14          1555414813304        1555414831795      Anger       0
1           1555414813304        1555414831795    Neutral       1
17          1555414921819        1555414931382    Valence       1
18          1555414921819        1555414931382    Arousal       7
19          1555414921819        1555414931382  Dominance       2
20          1555414921819        1555414931382    Sadness       1
21          1555414921819        1555414931382  Happiness       0
22          1555414921819        1555414931382      Anger       1
2           1555414921819        1555414931382    Neutral       0

user3483203 · Accepted Answer · 2019-06-03 17:01:11Z

I would pivot first, which makes the lookup a lot easier, and then adjust your initial DataFrame.

f = (df.pivot_table(index=['start_timestamp_milli', 'end_timestamp_milli'],
        columns='name', values='rating', aggfunc='any', fill_value=0).astype(int))

Now to find the timestamp combinations:

import numpy as np

cols = ['Sadness', 'Happiness', 'Anger', 'Surprise', 'Stress']
# 0 if any listed emotion is nonzero in the block, 1 otherwise
appd = pd.Series(np.where(f[cols].any(axis=1), 0, 1), index=f.index)

res = pd.concat([df, appd.rename('rating').reset_index().assign(name='Neutral')])

    end_timestamp_milli       name  rating  start_timestamp_milli
1         1555414723279    Valence       2          1555414708025
2         1555414723279    Arousal       6          1555414708025
3         1555414723279  Dominance       2          1555414708025
4         1555414723279    Sadness       1          1555414708025
5         1555414723279  Happiness       0          1555414708025
6         1555414723279      Anger       0          1555414708025
7         1555414723279   Surprise       0          1555414708025
8         1555414723279     Stress       0          1555414708025
9         1555414831795    Valence       3          1555414813304
10        1555414831795    Arousal       5          1555414813304
11        1555414831795  Dominance       2          1555414813304
12        1555414831795    Sadness       0          1555414813304
13        1555414831795  Happiness       0          1555414813304
14        1555414831795      Anger       0          1555414813304
15        1555414723279   Surprise       0          1555414708025
16        1555414723279     Stress       0          1555414708025
17        1555414931382    Valence       1          1555414921819
18        1555414931382    Arousal       7          1555414921819
19        1555414931382  Dominance       2          1555414921819
20        1555414931382    Sadness       1          1555414921819
21        1555414931382  Happiness       0          1555414921819
22        1555414931382      Anger       1          1555414921819
23        1555414723279   Surprise       0          1555414708025
24        1555414723279     Stress       1          1555414708025
0         1555414723279    Neutral       0          1555414708025
1         1555414831795    Neutral       1          1555414813304
2         1555414931382    Neutral       0          1555414921819

This adds the new rows at the end of the frame; sort by the timestamp columns if you want each one placed with its block.
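
Placing each Neutral row directly after its block, as the note above suggests, could look like the sketch below. It is not part of the original answer: the sample frame is hypothetical, and `_pos` is a helper column introduced here so the ordering does not depend on how the sort breaks ties.

```python
import pandas as pd

# Hypothetical two-block sample; column names follow the question.
df = pd.DataFrame({
    "start_timestamp_milli": [10, 10, 20, 20],
    "end_timestamp_milli": [15, 15, 25, 25],
    "name": ["Sadness", "Anger", "Sadness", "Anger"],
    "rating": [1, 0, 0, 0],
})
keys = ["start_timestamp_milli", "end_timestamp_milli"]
emotions = ["Sadness", "Happiness", "Anger", "Surprise", "Stress"]

# One Neutral row per block: 1 if every listed emotion is rated 0, else 0.
neutral = (
    df[df["name"].isin(emotions)]
    .groupby(keys)["rating"]
    .apply(lambda s: int((s == 0).all()))
    .reset_index()
    .assign(name="Neutral")
)

# Append the new rows, then record each row's position (hypothetical
# helper column "_pos"). The Neutral rows were appended last, so they
# have the highest positions.
res = pd.concat([df, neutral], ignore_index=True, sort=False)
res = res.assign(_pos=res.index)

# Sort by block, then by position: original rows keep their order and
# each Neutral row lands at the end of its block.
res = (
    res.sort_values(keys + ["_pos"])
    .drop(columns="_pos")
    .reset_index(drop=True)
)
print(res)
```

Sorting on an explicit position column is a design choice: it gives the same row order regardless of the sort algorithm pandas picks internally.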

Quang Hoang · Accepted Answer · 2019-06-03 17:02:07Z

You can try:

# pivot to a wide table with one column per feeling
new_df = (df.pivot_table(index=['start_timestamp_milli',  'end_timestamp_milli',], 
                           columns='name', 
                           values='rating')
            .fillna(0)
            )

new_df['Neutral'] = (new_df[['Sadness','Happiness','Anger', 'Surprise', 'Stress']]
                           .eq(0).all(axis=1).astype(int)
                    )

new_series = new_df[['Neutral']].stack()
new_series.name = 'rating'

df = pd.concat((df, new_series.reset_index()))
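
To see what the pivot and stack steps produce, here is a minimal sketch on a hypothetical three-row frame (the sample data and the `present` filter are assumptions added for illustration, not part of the answer). It also shows one property of this approach: an emotion that has no row in a block is filled with 0 and therefore counts as "rated 0".

```python
import pandas as pd

# Hypothetical sample: the second block has no Anger row at all.
df = pd.DataFrame({
    "start_timestamp_milli": [10, 10, 20],
    "end_timestamp_milli": [15, 15, 25],
    "name": ["Sadness", "Anger", "Sadness"],
    "rating": [0, 2, 0],
})
keys = ["start_timestamp_milli", "end_timestamp_milli"]

# Wide table: one row per block, one column per name; gaps become 0.
wide = df.pivot_table(index=keys, columns="name", values="rating").fillna(0)

# Only check the emotion columns that actually occur in this sample.
emotions = ["Sadness", "Happiness", "Anger", "Surprise", "Stress"]
present = [c for c in emotions if c in wide.columns]
wide["Neutral"] = wide[present].eq(0).all(axis=1).astype(int)

# Back to long form: stack yields a Series indexed by (start, end, name).
new_rows = wide[["Neutral"]].stack().rename("rating").reset_index()
print(new_rows)
```

The resulting `new_rows` frame has the same four columns as the original data, so it can be passed to `pd.concat` together with `df`.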

piRSquared · Accepted Answer · 2019-06-03 17:12:11Z

IIUC:

se = ['start_timestamp_milli', 'end_timestamp_milli']
names = 'Sadness, Happiness, Anger, Surprise, Stress'.split(', ')

Create this little guy

# all(level=...) was removed in pandas 2.0; group on the index levels instead
neutral = df.set_index(se).query('name in @names').rating.eq(0).groupby(level=[0, 1]).all().mul(1)

neutral

start_timestamp_milli  end_timestamp_milli
1555414708025          1555414723279          0
1555414813304          1555414831795          1
1555414921819          1555414931382          0
Name: rating, dtype: int64

Then add the name column to it with the value of 'Neutral' and append it to the original. Don't forget to sort and drop stuff to make it pretty again.

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([df, neutral.reset_index().assign(name='Neutral')], sort=False) \
  .sort_values(se).reset_index(drop=True)

    start_timestamp_milli  end_timestamp_milli       name  rating
0           1555414708025        1555414723279    Valence       2
1           1555414708025        1555414723279    Arousal       6
2           1555414708025        1555414723279  Dominance       2
3           1555414708025        1555414723279    Sadness       1
4           1555414708025        1555414723279  Happiness       0
5           1555414708025        1555414723279      Anger       0
6           1555414708025        1555414723279   Surprise       0
7           1555414708025        1555414723279     Stress       0
8           1555414708025        1555414723279   Surprise       0
9           1555414708025        1555414723279     Stress       0
10          1555414708025        1555414723279   Surprise       0
11          1555414708025        1555414723279     Stress       1
12          1555414708025        1555414723279    Neutral       0
13          1555414813304        1555414831795    Valence       3
14          1555414813304        1555414831795    Arousal       5
15          1555414813304        1555414831795  Dominance       2
16          1555414813304        1555414831795    Sadness       0
17          1555414813304        1555414831795  Happiness       0
18          1555414813304        1555414831795      Anger       0
19          1555414813304        1555414831795    Neutral       1
20          1555414921819        1555414931382    Valence       1
21          1555414921819        1555414931382    Arousal       7
22          1555414921819        1555414931382  Dominance       2
23          1555414921819        1555414931382    Sadness       1
24          1555414921819        1555414931382  Happiness       0
25          1555414921819        1555414931382      Anger       1
26          1555414921819        1555414931382    Neutral       0
