
In Python pandas I have created a DataFrame with one value per year and two subclasses, i.e., one metric for each parameter triplet (year, Tag_1, Tag_2):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df

       Metric    Tag_1  Tag_2  year
0     5770832  FOOBAR1  name1  2008
1     7526436  FOOBAR1    xyz  2008
2    33972652  FOOBAR1  name1  2009
3    17491416  FOOBAR1    xyz  2009
...
16    6602920  baznar2  name1  2008
17       6608  baznar2    xyz  2008
...
30  142102944  baznar2  name1  2015
31          0  baznar2    xyz  2015
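Only some rows of the DataFrame are printed above (the "..." rows are not reconstructed here), but those rows are enough to build a copy-pasteable example. This sketch reproduces exactly the visible rows, keeping their original index labels:

```python
import pandas as pd

# Only the rows visible in the printout above; the elided rows are not included
df = pd.DataFrame({
    'Metric': [5770832, 7526436, 33972652, 17491416, 6602920, 6608, 142102944, 0],
    'Tag_1': ['FOOBAR1'] * 4 + ['baznar2'] * 4,
    'Tag_2': ['name1', 'xyz'] * 4,
    'year': [2008, 2008, 2009, 2009, 2008, 2008, 2015, 2015],
}, index=[0, 1, 2, 3, 16, 17, 30, 31])
print(df)
```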

I would like to produce a bar plot with the metrics as y-values over x = (year, Tag_1, Tag_2), sorted primarily by year and secondarily by Tag_1, with each bar colored according to its Tag_1. Something like:

(2008,FOOBAR1,name1)  --> 5770832  *RED*
(2008,baznar2,name1)  --> 6602920  *BLUE*
(2008,FOOBAR1,xyz)    --> 7526436  *RED*
(2008,baznar2,xyz)    --> ...      *BLUE*
(2009,FOOBAR1,name1)  --> ...      *RED*
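The text asks for sorting by year and then Tag_1, but the first four entries of the list are actually ordered by year, then Tag_2, then Tag_1 (both name1 rows come before both xyz rows). A minimal sketch that reproduces the listed order with sort_values and attaches a color per Tag_1; the color dictionary is a hypothetical choice, and the data is only the subset of rows visible in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Metric': [5770832, 7526436, 33972652, 17491416, 6602920, 6608, 142102944, 0],
    'Tag_1': ['FOOBAR1'] * 4 + ['baznar2'] * 4,
    'Tag_2': ['name1', 'xyz'] * 4,
    'year': [2008, 2008, 2009, 2009, 2008, 2008, 2015, 2015],
})

# Order of the list above: year, then Tag_2, then Tag_1.
# Plain string sorting puts 'FOOBAR1' before 'baznar2' because uppercase letters sort first.
ordered = df.sort_values(['year', 'Tag_2', 'Tag_1']).reset_index(drop=True)

# Hypothetical mapping from Tag_1 value to bar color
tag_colors = {'FOOBAR1': 'red', 'baznar2': 'blue'}
ordered['color'] = ordered['Tag_1'].map(tag_colors)
print(ordered)
```

Sorting by ['year', 'Tag_1'] instead gives the order described in the text rather than the one in the list.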

I tried starting with a grouping of columns like

df.plot.bar(x=['year', 'Tag_1', 'Tag_2'])

but I have not found a way to split the selection into two sets of bars placed next to each other.
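One way to get two sets of bars side by side (a different approach from both answers below) is to reshape the data so that each Tag_1 value becomes its own column, with (year, Tag_2) as the index; DataFrame.plot.bar then draws one bar per column within each index group and assigns the color list per column. A sketch under these assumptions: only the rows visible in the question are used, and fill_value=0 is there because that subset lacks some (year, Tag_2, Tag_1) combinations:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere; remove to display with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Metric': [5770832, 7526436, 33972652, 17491416, 6602920, 6608, 142102944, 0],
    'Tag_1': ['FOOBAR1'] * 4 + ['baznar2'] * 4,
    'Tag_2': ['name1', 'xyz'] * 4,
    'year': [2008, 2008, 2009, 2009, 2008, 2008, 2015, 2015],
})

# One row per (year, Tag_2), one column per Tag_1 value
wide = df.pivot_table(index=['year', 'Tag_2'], columns='Tag_1',
                      values='Metric', aggfunc='sum', fill_value=0)

fig, ax = plt.subplots()
# For a DataFrame, colors are assigned per column: FOOBAR1 -> red, baznar2 -> blue
wide.plot.bar(color=['r', 'b'], ax=ax)
ax.set_ylabel('Metric')
fig.tight_layout()
```

The index is sorted by year first, so each year's pair of groups (name1, xyz) sits together on the x-axis.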

  • You'll be more likely to get an answer if we can create an example dataframe by copy and pasting. Commented May 27, 2016 at 16:31

2 Answers


This should get you on your way:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('path_to_file.csv')

# Group by the desired columns
new_df = df.groupby(['year', 'Tag_1', 'Tag_2']).sum()
# Sort by Metric, ascending (with barh, the largest bar ends up at the top)
new_df.sort_values('Metric', inplace=True)


# Helper function for generating an alternating sequence of 'r' and 'b' colors
def get_color(i):
    if i % 2 == 0:
        return 'r'
    else:
        return 'b'

colors = [get_color(j) for j in range(new_df.shape[0])]

# Make the plot
fig, ax = plt.subplots()
ind = np.arange(new_df.shape[0])
width = 0.65
ax.barh(ind, new_df.Metric, width, color=colors)  # one horizontal bar per group
ax.set_yticks(ind)  # bars are centered on ind by default, so put the ticks there
ax.set_yticklabels([', '.join(map(str, key)) for key in new_df.index])  # label each bar with its (year, Tag_1, Tag_2) key
fig.tight_layout()
plt.show()
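The alternating helper colors bars by position, so after sorting by Metric the colors no longer correspond to Tag_1. A variant of the same barh plot that instead looks up each bar's color from its Tag_1 value and orders the bars by year, then Tag_1, then Tag_2 via sort_index; the tag_colors mapping is a hypothetical choice and the data is only the subset of rows shown in the question:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for the sketch; remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Metric': [5770832, 7526436, 33972652, 17491416, 6602920, 6608, 142102944, 0],
    'Tag_1': ['FOOBAR1'] * 4 + ['baznar2'] * 4,
    'Tag_2': ['name1', 'xyz'] * 4,
    'year': [2008, 2008, 2009, 2009, 2008, 2008, 2015, 2015],
})

# sort_index orders the MultiIndex by year, then Tag_1, then Tag_2
new_df = df.groupby(['year', 'Tag_1', 'Tag_2']).sum().sort_index()

# Hypothetical mapping from Tag_1 value to bar color
tag_colors = {'FOOBAR1': 'r', 'baznar2': 'b'}
colors = [tag_colors[tag] for tag in new_df.index.get_level_values('Tag_1')]
labels = [', '.join(map(str, key)) for key in new_df.index]

fig, ax = plt.subplots()
ind = np.arange(len(new_df))
ax.barh(ind, new_df['Metric'], 0.65, color=colors)
ax.set_yticks(ind)
ax.set_yticklabels(labels)
fig.tight_layout()
```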


You can also do it this way:

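import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Select the Metric column so a Series is plotted; pandas then applies the color list bar by bar
df.groupby(['year', 'Tag_1', 'Tag_2'])['Metric'].sum().plot.barh(color=['r', 'b'], ax=ax)
fig.tight_layout()
plt.show()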
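Whether the color list alternates depends on what is plotted: in recent pandas versions, a DataFrame bar plot assigns the list one color per column (so a single Metric column comes out all red), while a Series bar plot passes the whole list to matplotlib, which cycles it over the bars. A small check of the Series case, using only the rows shown in the question:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for the sketch
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Metric': [5770832, 7526436, 33972652, 17491416, 6602920, 6608, 142102944, 0],
    'Tag_1': ['FOOBAR1'] * 4 + ['baznar2'] * 4,
    'Tag_2': ['name1', 'xyz'] * 4,
    'year': [2008, 2008, 2009, 2009, 2008, 2008, 2015, 2015],
})

fig, ax = plt.subplots()
# Selecting 'Metric' before sum() yields a Series, so the colors cycle per bar
totals = df.groupby(['year', 'Tag_1', 'Tag_2'])['Metric'].sum()
totals.plot.barh(color=['r', 'b'], ax=ax)
fig.tight_layout()

# Read back the face color of every drawn bar
bar_colors = [mcolors.to_hex(p.get_facecolor()) for p in ax.patches]
print(bar_colors)
```

Note that cycling by position only matches Tag_1 when the groups happen to alternate between the two tags.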

PS: if you don't like scientific notation, you can turn it off (before calling plt.show()):

ax.get_xaxis().get_major_formatter().set_scientific(False)
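If the goal is readable plain numbers, a sketch using matplotlib's StrMethodFormatter adds thousands separators to the x tick labels; the two bar values come from the question's table and the format string is an illustrative choice. ax.ticklabel_format(axis='x', style='plain') is another way to switch off scientific notation.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for the sketch
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

fig, ax = plt.subplots()
ax.barh([0, 1], [5770832, 142102944])  # two Metric values from the question
# Plain integers with thousands separators instead of scientific notation
ax.xaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}'))
fig.tight_layout()

print(ax.xaxis.get_major_formatter()(142102944))
```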
