Creating bar chart with CSV data python

Question

I have a CSV with data like

4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 18:14:58,57,4
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 20:11:15,1884,90
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-10-04 09:44:21,1146,6
4be390eefaf9a64e7cb52937c4a5c77a,"avito.ru",2014-09-29 21:01:29,48,3

I group it like this:

print(infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.sum())

And I get this output:

address            used_at
am.ru              2014         413071
                   2015         183402
auto.ru            2014        9122342
                   2015        6923367
avito.ru           2014       84503151
                   2015       87688571
avtomarket.ru      2014         106849
                   2015          95927
cars.mail.ru/sale  2014         211456
                   2015         167278
drom.ru            2014       11014955
                   2015        9704124
e1.ru              2014       28678357
                   2015       27961857
irr.ru/cars        2014         222193
                   2015         133678

I need to create a bar chart like this example.

But instead of men and women, I need 2014 and 2015 for every website (on the x axis) and the sum of active_seconds (on the y axis). The example uses np.array, but I have a Series object.

I tried this:

width = 0.35
plt.figure()
ax = graph_by_duration['address'].plot(kind='bar', secondary_y=['active_seconds'])
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()

Should I convert it to np.array, or is there another way to do this?

jezrael · Accepted Answer · 2016-03-19 12:23:00Z

3

I think you can first add reset_index and then pivot the DataFrame to create the columns 2014 and 2015. Finally, use plot.bar:

df = (infile.groupby(['address', infile['used_at'].dt.year])
            .active_seconds.sum()
            .reset_index())
print(df)
              address  used_at  active_seconds
0               am.ru     2014          413071
1               am.ru     2015          183402
2             auto.ru     2014         9122342
3             auto.ru     2015         6923367
4            avito.ru     2014        84503151
5            avito.ru     2015        87688571
6       avtomarket.ru     2014          106849
7       avtomarket.ru     2015           95927
8   cars.mail.ru/sale     2014          211456
9   cars.mail.ru/sale     2015          167278
10            drom.ru     2014        11014955
11            drom.ru     2015         9704124
12              e1.ru     2014        28678357
13              e1.ru     2015        27961857
14        irr.ru/cars     2014          222193
15        irr.ru/cars     2015          133678

graph_by_duration = df.pivot(index='address', columns='used_at', values='active_seconds')
print(graph_by_duration)
used_at                2014      2015
address                              
am.ru                413071    183402
auto.ru             9122342   6923367
avito.ru           84503151  87688571
avtomarket.ru        106849     95927
cars.mail.ru/sale    211456    167278
drom.ru            11014955   9704124
e1.ru              28678357  27961857
irr.ru/cars          222193    133678

ax = graph_by_duration.plot.bar(figsize=(10,8))
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()
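
For reference, the whole pipeline can be sketched end to end. This is only a sketch under stated assumptions: the sample rows are made up in the question's layout, the column names "ID" and "pageviews" are hypothetical (the file has no header row), and unstack is swapped in for reset_index plus pivot, which should give the same table in one step.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows in the question's layout (ID, address, used_at, active_seconds, pageviews).
csv_text = """\
a1,"e1.ru",2014-09-30 18:14:58,57,4
a1,"e1.ru",2014-09-30 20:11:15,1884,90
a1,"e1.ru",2015-10-04 09:44:21,1146,6
a1,"avito.ru",2014-09-29 21:01:29,48,3
a1,"avito.ru",2015-01-02 10:00:00,100,5
"""

# parse_dates is needed so that the .dt accessor works on used_at.
infile = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["ID", "address", "used_at", "active_seconds", "pageviews"],
    parse_dates=["used_at"],
)

# Sum per (address, year), then move the year level into columns.
graph_by_duration = (
    infile.groupby(["address", infile["used_at"].dt.year])["active_seconds"]
    .sum()
    .unstack("used_at", fill_value=0)
)

# One group of bars per address, one bar per year.
ax = graph_by_duration.plot.bar(figsize=(10, 8))
ax.set_ylabel("Time online (seconds)")
ax.set_title("Time spent online per web site, per year")
# Call plt.show() or plt.savefig(...) here to display or save the figure.
```

With an interactive backend, the matplotlib.use("Agg") line can be dropped and plt.show() called as in the answer.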


15 Comments

NineWasps Over a year ago

Is it possible to show the mean above every bar? Some of the values are too small, so it is not clear what has changed. Maybe that is what the yerr argument is for?

jezrael Over a year ago

I tried it with yerr, but it doesn't work for that. It only works for comparing mean and std:

df = infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.mean()
df1 = infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.std()
fig, ax = plt.subplots()
df.plot.bar(yerr=df1, ax=ax)
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()
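
If the goal is a number printed above each bar rather than an error bar, one possible approach is to annotate the bar rectangles directly. The sketch below is an assumption about what was being asked, not something from the thread: it rebuilds three rows of the pivoted table from the answer and labels each bar with its height through ax.annotate.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Three rows of the pivoted table, copied from the answer's output.
graph_by_duration = pd.DataFrame(
    {2014: [413071, 9122342, 84503151], 2015: [183402, 6923367, 87688571]},
    index=pd.Index(["am.ru", "auto.ru", "avito.ru"], name="address"),
)
graph_by_duration.columns.name = "used_at"

ax = graph_by_duration.plot.bar(figsize=(10, 8))

# Each bar is a Rectangle in ax.patches; write its height just above its top edge.
for bar in ax.patches:
    height = float(bar.get_height())
    ax.annotate(
        "{:,.0f}".format(height),
        xy=(bar.get_x() + bar.get_width() / 2.0, height),
        xytext=(0, 3),  # shift the label 3 points upwards
        textcoords="offset points",
        ha="center",
        va="bottom",
        fontsize=8,
    )

ax.set_ylabel("Time online")
ax.set_title("Time spent online per web site, per year")
```

When a few sites dwarf the rest, passing logy=True to plot.bar may also make the small bars easier to compare.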

NineWasps Over a year ago

Error

SyntaxError: Non-ASCII character '\xd0' in file C:/Users/user/Desktop/project/main.py on line 8, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

jezrael Over a year ago

That sometimes happens when a character was copied incorrectly. Try retyping the code on line 8.
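
The error message points to PEP 263: Python 2 rejects non-ASCII characters in a source file unless the file declares its encoding. If line 8 is meant to contain such characters (a Cyrillic label, for example), a declaration like the one sketched below should help. The label text is hypothetical, and the file is assumed to be saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# PEP 263 declaration: it must be on the first or second line of the file.

# Hypothetical non-ASCII axis label ("Time online" in Russian), for illustration only.
ylabel = u"Время онлайн"
# It could then be passed on, e.g. ax.set_ylabel(ylabel).
```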

NineWasps Over a year ago

Can you tell me how I can convert active_seconds to hours? The values are very large and I want to divide them by 3600. I wrote

for string in time['time online']:
    hour = string / 3600.
    round_h = '%.1f' % round(hour, 1)
    graph_by_duration = time.pivot(index='address', columns='used_at', values='round_h')

but I get an error.
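
The likely cause is that round_h is a local string inside the loop, while pivot looks for a column named 'round_h' that was never added to the frame. A minimal sketch of a vectorized alternative follows. It assumes the seconds are in a column named active_seconds (the comment calls it 'time online'), uses a hypothetical new column named hours, and takes a few totals from the answer's output.

```python
import pandas as pd

# Long-format totals, as in the answer's df (two sites only, for brevity).
totals = pd.DataFrame({
    "address": ["am.ru", "am.ru", "e1.ru", "e1.ru"],
    "used_at": [2014, 2015, 2014, 2015],
    "active_seconds": [413071, 183402, 28678357, 27961857],
})

# Convert the whole column at once and keep it numeric so it can still be plotted.
totals["hours"] = (totals["active_seconds"] / 3600.0).round(1)

graph_by_duration = totals.pivot(index="address", columns="used_at", values="hours")
print(graph_by_duration)
```

Keeping the values as floats instead of formatting them with '%.1f' matters here, because plot.bar needs numeric data.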
