
I have the following pandas DataFrame:

     time      Group      blocks
0     1        A           4
1     2        A           7
2     3        A           12
3     4        A           17
4     5        A           21 
5     6        A           26
6     7        A           33
7     8        A           39
8     9        A           48
9     10       A           59
    ....        ....          ....
36     35      A           231
37     1       B           1
38     2       B           1.5
39     3       B           3
40     4       B           5
41     5       B           6
    ....        ....          ....
911    35      Z           349

This is a DataFrame containing multiple time series, with time running from min=1 to max=35. Each Group has its own time series like this.

I would like to plot each individual time series A through Z against an x-axis of 1 to 35. The y-axis would be the blocks at each time.

I was thinking of using something like an Andrews curves plot, which would plot each series against the others, with each "hue" set to a different group. (Other ideas are welcome.)


My problem: how do you format this dataframe to plot multiple series? Should the columns be GroupA, GroupB, etc.?

How do you get the dataframe to be in the format:

time GroupA blocksA GroupB blocksB GroupC blocksC ...

Is this the correct format for an Andrews plot as shown?
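A minimal sketch of the reshaping step, assuming the long-format columns `time`, `Group` and `blocks` shown above (the six rows are a made-up excerpt, not the real data). For plotting, separate `GroupA`/`blocksA` column pairs are probably not needed: one column per group, indexed by `time`, is enough, and `DataFrame.pivot` produces that layout.

```python
import pandas as pd

# Hypothetical excerpt of the long-format data from the question
df = pd.DataFrame({
    "time":   [1, 2, 3, 1, 2, 3],
    "Group":  ["A", "A", "A", "B", "B", "B"],
    "blocks": [4, 7, 12, 1, 1.5, 3],
})

# One row per time step, one column per Group; each cell holds the blocks value.
# pivot raises ValueError if any (time, Group) pair occurs more than once.
wide = df.pivot(index="time", columns="Group", values="blocks")
print(wide)
```

With this layout, `wide.plot()` draws one line per Group column against the `time` index.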

EDIT

If I try:

df.groupby('Group').plot(legend=False)

the x-axis is completely wrong. All the time series should be plotted against the same time axis (1 to 35), all on one plot.


How do I solve this?
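What most likely goes wrong: `df.groupby('Group').plot()` calls `DataFrame.plot` once per group. With no `x` argument, `DataFrame.plot` uses the row index (0 to 911 here) as the x-axis and draws every numeric column, `time` included, as its own line; without a shared `ax`, each group typically also ends up in a new figure. A sketch of one fix, assuming the question's column names (the rows are a made-up excerpt): loop over the groups and draw `blocks` against `time` on one shared Axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical excerpt of the long-format data from the question
df = pd.DataFrame({
    "time":   [1, 2, 3, 1, 2, 3],
    "Group":  ["A", "A", "A", "B", "B", "B"],
    "blocks": [4, 7, 12, 1, 1.5, 3],
})

fig, ax = plt.subplots()
# Draw every group on the same Axes, with 'time' (not the row index) on the x-axis
for name, grp in df.groupby("Group"):
    grp.plot(x="time", y="blocks", ax=ax, label=name)
ax.set_xlabel("time")
ax.set_ylabel("blocks")
```

In an interactive session, `plt.show()` at the end displays the single figure with one line per group.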

  • To call andrews_curves on a DataFrame you have to specify the column whose values you want to group by. Try andrews_curves(df, 'Group') to group by the Group column. Commented Jul 5, 2016 at 8:04
  • @Serenity This is a mess. The x-axis is not from 1 to 35, and I'm not sure what the y-axis is. How do you change this? Commented Jul 5, 2016 at 8:06
  • Andrews' curves are between [-pi; +pi]. Read this: fedc.wiwi.hu-berlin.de/xplore/tutorials/mvahtmlnode9.html Commented Jul 5, 2016 at 8:25
  • @Serenity Is it possible to change the x-axis range? Commented Jul 5, 2016 at 11:59
  • ax = plt.gca(); ax.set_xlim(1, 35) Commented Jul 5, 2016 at 12:10
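Part of the confusion in these comments comes from the data layout: `pandas.plotting.andrews_curves` draws one curve per row over t in [-π, π], using every non-class column as a coefficient. With the long-format frame, each row is just a (time, blocks) pair, so the result cannot show one series per group. A sketch, assuming the question's column names (the rows are a made-up excerpt): pivot so that each Group becomes one row with one column per time step, then pass `'Group'` as the class column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
from pandas.plotting import andrews_curves

# Hypothetical excerpt of the long-format data from the question
df = pd.DataFrame({
    "time":   [1, 2, 3, 1, 2, 3],
    "Group":  ["A", "A", "A", "B", "B", "B"],
    "blocks": [4, 7, 12, 1, 1.5, 3],
})

# Each Group becomes one row; its blocks value at each time step becomes a feature
by_group = df.pivot(index="Group", columns="time", values="blocks").reset_index()

# One Andrews curve per row (i.e. per Group), plotted over t in [-pi, pi]
ax = andrews_curves(by_group, "Group")
```

The x-axis here is the Andrews parameter t, not `time`, so this complements rather than replaces an ordinary line plot of blocks against time.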

2 Answers


You can re-structure the data as a pivot table:

df.pivot_table(index='time',columns='Group',values='blocks',aggfunc='sum').plot()
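The `aggfunc='sum'` only matters when a (time, Group) pair appears more than once: `pivot_table` then adds those blocks values together, whereas plain `pivot` raises a `ValueError`. When every pair is unique, the two give the same table. A small check, using a made-up excerpt of the question's data plus one deliberately duplicated row:

```python
import pandas as pd

# Hypothetical excerpt of the question's long-format data
df = pd.DataFrame({
    "time":   [1, 2, 3, 1, 2, 3],
    "Group":  ["A", "A", "A", "B", "B", "B"],
    "blocks": [4, 7, 12, 1, 1.5, 3],
})

# With unique (time, Group) pairs, pivot_table(aggfunc='sum') matches pivot
table = df.pivot_table(index="time", columns="Group", values="blocks", aggfunc="sum")
plain = df.pivot(index="time", columns="Group", values="blocks")

# Add a duplicate (time=1, Group='A') row: pivot_table sums it, pivot refuses
extra = pd.DataFrame({"time": [1], "Group": ["A"], "blocks": [10]})
dup = pd.concat([df, extra], ignore_index=True)
summed = dup.pivot_table(index="time", columns="Group", values="blocks", aggfunc="sum")
try:
    dup.pivot(index="time", columns="Group", values="blocks")
    pivot_failed = False
except ValueError:  # "Index contains duplicate entries, cannot reshape"
    pivot_failed = True
```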


Look at these variants. The first is an Andrews' curves plot and the second is a multi-line plot grouped by one column, Month. The DataFrame data keeps three columns: Temp, Month, and Day:

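import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from pandas.plotting import andrews_curves  # pandas.tools.plotting in old pandas versions

# R's airquality dataset (downloaded by statsmodels, so this needs internet access)
data = sm.datasets.get_rdataset('airquality').data
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1)
data = data[data.columns.tolist()[3:]]  # use only Temp, Month, Day

# Andrews' curves: one curve per row, coloured by Month
andrews_curves(data, 'Month', ax=ax1)

# multiline plot with group by: Temp against Day, one line per Month
# (group by the column name, not a one-element list, so key is a scalar)
for key, grp in data.groupby('Month'):
    ax2.plot(grp['Day'], grp['Temp'], label="Temp in {0:02d}".format(key))
ax2.legend(loc='best')
plt.show()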

When you plot Andrews' curves, each row of the data is transformed into a single function. This means that Andrews' curves lying close together suggest that the corresponding data points are also close together.
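To make the "close together" argument concrete: for an observation x = (x1, x2, x3, ...), Andrews' function is f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., for t in [-π, π]. Because f is linear in x and the sine/cosine terms are orthogonal on that interval, the integral of (f_x(t) − f_y(t))² over [-π, π] equals π times the squared Euclidean distance between x and y, so nearby points give nearby curves. The sketch below checks this numerically with NumPy; the helper name `andrews_curve` and the two sample rows are hypothetical, and this is not the pandas implementation itself.

```python
import numpy as np

def andrews_curve(x, t):
    """Hypothetical helper: evaluate Andrews' function of observation x at angles t.

    f_x(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2.0))
    for k, coef in enumerate(x[1:]):
        harmonic = k // 2 + 1                    # harmonics 1, 1, 2, 2, 3, 3, ...
        trig = np.sin if k % 2 == 0 else np.cos  # alternate sin and cos
        result += coef * trig(harmonic * t)
    return result

# Uniform grid over one period (endpoint excluded) for a simple numerical integral
t = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
x = np.array([4.0, 7.0, 12.0])
y = np.array([1.0, 1.5, 3.0])

diff = andrews_curve(x, t) - andrews_curve(y, t)
integral = (diff ** 2).mean() * 2 * np.pi  # approximates the integral over [-pi, pi]
distance_sq = np.sum((x - y) ** 2)
print(integral, np.pi * distance_sq)       # agree up to floating-point rounding
```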


2 Comments

See edit above; I have problems with the 'groupby' plot
I applied this code to my data and got the error below. Can anyone help me with this? TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
