I have a CSV with data like this:

4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 18:14:58,57,4
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 20:11:15,1884,90
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-10-04 09:44:21,1146,6
4be390eefaf9a64e7cb52937c4a5c77a,"avito.ru",2014-09-29 21:01:29,48,3

I group it by address and year and sum active_seconds like this:

print(infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.sum())

And I get this result:

address            used_at
am.ru              2014         413071
                   2015         183402
auto.ru            2014        9122342
                   2015        6923367
avito.ru           2014       84503151
                   2015       87688571
avtomarket.ru      2014         106849
                   2015          95927
cars.mail.ru/sale  2014         211456
                   2015         167278
drom.ru            2014       11014955
                   2015        9704124
e1.ru              2014       28678357
                   2015       27961857
irr.ru/cars        2014         222193
                   2015         133678

I need to create a bar chart like this example.

But instead of men and women, I need 2014 and 2015 for every website (on the x axis) and the sum of active_seconds (on the y axis). The example uses np.array, but I have a Series object.

I tried to do this with:

width = 0.35
plt.figure()
ax = graph_by_duration['address'].plot(kind='bar', secondary_y=['active_seconds'])
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()

Should I convert it to np.array, or is there another way to do this?

1 Answer

I think you can first call reset_index and then pivot the DataFrame to create the columns 2014 and 2015. Finally, use plot.bar:

# Parentheses let the method chain span several lines
df = (infile.groupby(['address', infile['used_at'].dt.year])
            .active_seconds.sum()
            .reset_index())
print(df)
              address  used_at  active_seconds
0               am.ru     2014          413071
1               am.ru     2015          183402
2             auto.ru     2014         9122342
3             auto.ru     2015         6923367
4            avito.ru     2014        84503151
5            avito.ru     2015        87688571
6       avtomarket.ru     2014          106849
7       avtomarket.ru     2015           95927
8   cars.mail.ru/sale     2014          211456
9   cars.mail.ru/sale     2015          167278
10            drom.ru     2014        11014955
11            drom.ru     2015         9704124
12              e1.ru     2014        28678357
13              e1.ru     2015        27961857
14        irr.ru/cars     2014          222193
15        irr.ru/cars     2015          133678
graph_by_duration = df.pivot(index='address', columns='used_at', values='active_seconds')
print(graph_by_duration)
used_at                2014      2015
address                              
am.ru                413071    183402
auto.ru             9122342   6923367
avito.ru           84503151  87688571
avtomarket.ru        106849     95927
cars.mail.ru/sale    211456    167278
drom.ru            11014955   9704124
e1.ru              28678357  27961857
irr.ru/cars          222193    133678

import matplotlib.pyplot as plt

ax = graph_by_duration.plot.bar(figsize=(10, 8))
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()

(graph: the resulting bar chart)
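As a possible shortcut, the reset_index and pivot steps can probably be replaced by a single unstack on the grouped Series. This is only a sketch: the Series below is rebuilt by hand from the first three sites in the question's output, and the variable name totals is made up for illustration.

```python
import pandas as pd

# Totals copied from the question's groupby output (first three sites only).
totals = pd.Series(
    [413071, 183402, 9122342, 6923367, 84503151, 87688571],
    index=pd.MultiIndex.from_product(
        [['am.ru', 'auto.ru', 'avito.ru'], [2014, 2015]],
        names=['address', 'used_at']),
    name='active_seconds',
)

# unstack moves the inner index level (the year) into the columns,
# which gives one row per site and one column per year.
graph_by_duration = totals.unstack('used_at')
print(graph_by_duration)
```

The resulting table has the same shape as the pivoted one, so it can be passed to plot.bar in the same way.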

15 Comments

Is it possible to add the value above every bar? Some values are too small, so it's not clear what has changed. Maybe that is what the yerr argument is for?
I tried it with yerr, but it doesn't work for this; it only works for comparing mean and std:
df = infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.mean()
df1 = infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.std()
fig, ax = plt.subplots()
df.plot.bar(yerr=df1, ax=ax)
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()
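On the labels: yerr draws error bars, not text, so it will not put numbers above the bars. One way that should work is to loop over the bar rectangles in ax.patches and annotate each one. A minimal sketch, using a two-site excerpt of the pivoted table and the Agg backend so it runs without a display (both are choices made for this example only):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, chosen only so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Two rows copied from the pivoted table in the answer.
graph_by_duration = pd.DataFrame(
    {2014: [413071, 84503151], 2015: [183402, 87688571]},
    index=pd.Index(['am.ru', 'avito.ru'], name='address'),
)

ax = graph_by_duration.plot.bar(figsize=(10, 8))
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')

# Every bar is a Rectangle in ax.patches; write its height just above its top edge.
for bar in ax.patches:
    height = float(bar.get_height())
    ax.annotate(
        '{:,.0f}'.format(height),
        (bar.get_x() + bar.get_width() / 2, height),
        ha='center', va='bottom', fontsize=8,
    )
```

Newer Matplotlib versions (3.4 and later) also have Axes.bar_label for this. If the small sites are still hard to compare next to the large ones, ax.set_yscale('log') may help.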
Error SyntaxError: Non-ASCII character '\xd0' in file C:/Users/user/Desktop/project/main.py on line 8, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
That sometimes happens when a character is copied incorrectly. Try retyping the code on line 8.
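If the non-ASCII character on that line is intentional (a Cyrillic label, for example), the error message points to PEP 263: Python 2 needs an encoding declaration on the first or second line of the file. A small sketch, assuming the file is saved as UTF-8; the title variable and its text are made up for illustration:

```python
# -*- coding: utf-8 -*-
# The declaration above must be on the first or second line of the file (PEP 263).

# A non-ASCII string literal; the u prefix makes it a unicode string in Python 2 as well.
title = u'Время онлайн'
```

Python 3 assumes UTF-8 source by default, so the declaration is harmless there.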
Can you tell me how to convert active_seconds to hours? The values are very large and I want to divide them by 3600. I wrote:
for string in time['time online']:
    hour = string / 3600.
    round_h = '%.1f' % round(hour, 1)
graph_by_duration = time.pivot(index='address', columns='used_at', values='round_h')
but I get an error.
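A likely cause of that error is that the loop only assigns local variables and never stores a round_h column in the DataFrame, so pivot cannot find it. Dividing the whole column at once avoids the loop. This is a sketch on a four-row excerpt of the long table from the answer; active_hours is a column name made up here:

```python
import pandas as pd

# Long-format totals copied from the answer (two sites only).
df = pd.DataFrame({
    'address': ['am.ru', 'am.ru', 'e1.ru', 'e1.ru'],
    'used_at': [2014, 2015, 2014, 2015],
    'active_seconds': [413071, 183402, 28678357, 27961857],
})

# Vectorized division: convert seconds to hours and keep the result numeric.
df['active_hours'] = (df['active_seconds'] / 3600).round(1)

graph_by_duration = df.pivot(index='address', columns='used_at', values='active_hours')
print(graph_by_duration)
```

Keeping the values as numbers rather than formatting them with '%.1f' matters, because plot.bar needs numeric data.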