I want to represent my data as a stacked bar plot, as shown in my expected output. [image: expected output]

time,date,category
0,2002-05-01,2
1,2002-05-02,0
2,2002-05-03,0
3,2002-05-04,0
4,2002-05-05,0
5,2002-05-06,0
6,2002-05-07,0
7,2002-05-08,2
8,2002-05-09,2
9,2002-05-10,0
10,2002-05-11,2
11,2002-05-12,0
12,2002-05-13,0
13,2002-05-14,2
14,2002-05-15,2
15,2002-05-16,2
16,2002-05-17,2
17,2002-05-18,2
18,2002-05-19,0
19,2002-05-20,0
20,2002-05-21,1
21,2002-05-22,2
22,2002-05-23,0
23,2002-05-24,1
24,2002-05-25,0
25,2002-05-26,0
26,2002-05-27,0
27,2002-05-28,0
28,2002-05-29,1
29,2002-05-30,0

import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt

df = pd.read_csv('df.csv')
daily_category = df[['date','category']]
daily_category['weekday'] = pd.to_datetime(daily_category['date']).dt.day_name()
daily_category_plot = daily_category[['weekday','category']]

daily_category_plot[['category']].groupby('weekday').count().plot(kind='bar', legend=None)
plt.show()

However, I get the below error

Traceback (most recent call last):
  File "day_plot.py", line 10, in <module>
    daily_category_plot[['category']].groupby('weekday').count().plot(kind='bar', legend=None)
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/frame.py", line 6525, in groupby
    dropna=dropna,
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 533, in __init__
    dropna=self.dropna,
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/groupby/grouper.py", line 786, in get_grouper
    raise KeyError(gpr)
KeyError: 'weekday'

********** A further example, in which I extracted the counts manually, returns almost the expected output, except that the days are shown as numbers instead of weekday names. ***********

Day,category1,category2,category3
Sunday,0,0,4
Monday,0,0,4
Tuesday,1,1,2
Wednesday,1,4,0
Thursday,0,2,3
Friday,1,1,2
Saturday,0,2,2

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

df = pd.read_csv('df.csv')

ax = df.plot.bar(stacked=True, color=['green', 'red', 'blue'])
ax.set_xticklabels(labels=df.index, rotation=70, rotation_mode="anchor", ha="right")
ax.set_xlabel('')
ax.set_ylabel('Number of days')
plt.show()

Tested output

[image: tested output]

Updated code producing odd plot

import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt

df = pd.read_csv('df.csv')
daily_category = df[['time','date','category']]
daily_category['weekday'] = pd.to_datetime(daily_category['date']).dt.day_name()

ans = (daily_category.groupby(['weekday', 'category']) 
         .size()
         .reset_index(name='sum')
         .pivot(index='weekday', columns='category', values='sum')
      )

ans.plot.bar(stacked=True)
plt.show()

Updated output

[image: updated output]

  • pivot your table, then .plot.bar(stacked=True) Commented Apr 22, 2022 at 17:42
  • I edited your latest picture so you can see that the day names aren't in order. Commented Apr 23, 2022 at 11:44
  • When I take out the line ans.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], I get the correct output, but the weekdays are not in their natural order (see updated output). Instead they are sorted alphabetically. Commented Apr 23, 2022 at 11:52
  • Your second solution works perfectly. Thanks a lot. Commented Apr 23, 2022 at 12:41
  • I can change the font size of the x and y labels by setting fontsize=''. How can I change the font size of the category labels on the bars? Commented Apr 23, 2022 at 12:47

2 Answers

1

This solution uses groupby on two columns and reshapes the resulting DataFrame with pivot. The result can be plotted with plot.bar(), but the x-axis then shows the weekday numbers rather than the names. Therefore the index is replaced.

I copied and pasted your data and built a DataFrame with:

import pandas as pd
from io import StringIO
t = """time,date,category
0,2002-05-01,2
..."""
df = pd.read_csv(StringIO(t))
df['weekday'] = df.date.apply(lambda x: pd.to_datetime(x).weekday())

To check the expected output for the Wednesday bar, I filter the rows where weekday is 2 (Monday is 0).

>>> df[df['weekday'] == 2]
     time        date  category  weekday
0      0  2002-05-01         2        2
7      7  2002-05-08         2        2
14    14  2002-05-15         2        2
21    21  2002-05-22         2        2
28    28  2002-05-29         1        2

So the Wednesday bar should contain only category 1 (1 of 5 rows) and category 2 (4 of 5 rows).

ans = (df.groupby(["weekday", "category"]) 
         .size()
         .reset_index(name="sum")
         .pivot(index='weekday', columns='category', values='sum')
      )
# weekday is numeric here (0 = Monday ... 6 = Sunday), so the rows are already in calendar order
ans.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
ans.plot.bar(stacked=True)

[image: stacked bar plot]
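Note that relabeling the index by hand is only safe while the weekday column is numeric. If it holds day names instead (for example from dt.day_name()), the grouped rows come out in alphabetical order, and assigning the Monday-to-Sunday list would attach the wrong name to each bar. A minimal sketch of one way around this, using reindex to reorder the rows instead of renaming them. The helper name weekday_counts and the eight-row sample are only illustrative; the sample rows are the first eight rows of the question's data.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_counts(df):
    """Count rows per (weekday name, category), rows in calendar order.

    Illustrative helper, not part of pandas.
    """
    names = pd.to_datetime(df["date"]).dt.day_name()
    counts = (
        df.assign(weekday=names)
          .groupby(["weekday", "category"])
          .size()
          .unstack(fill_value=0)      # categories become columns, gaps become 0
    )
    # Reorder (not rename) the rows; days missing from the data get a row of 0s.
    return counts.reindex(WEEKDAYS, fill_value=0)


# First eight rows of the question's data.
df = pd.DataFrame({
    "date": ["2002-05-01", "2002-05-02", "2002-05-03", "2002-05-04",
             "2002-05-05", "2002-05-06", "2002-05-07", "2002-05-08"],
    "category": [2, 0, 0, 0, 0, 0, 0, 2],
})
counts = weekday_counts(df)
print(counts)
```

Calling counts.plot.bar(stacked=True) on the result should then draw the bars from Monday to Sunday with matching labels.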

5 Comments

Thanks, but when I run your solution on the same data I get different counts. For example, for category 0 I expected a count of 1 on Tuesday, Wednesday, and Friday. However, I get counts of 4, 2, and 3 for the respective days (please see my tested output).
I think the reason I cannot reproduce your output is that in my dataset, time is treated as an index and not a column. I will make it a column and see if I get the same output as yours.
@Gee Your example data does not match your example graph. See Saturday,0,2,2 and a stacked bar with 3 colors. This looks odd.
You are correct, it's odd, but I am running the code on exactly the same dataset. I have posted the updated code that generates this weird plot.
This is because weekday names sort alphabetically, which is not the same order as the numeric weekdays. Overwriting the index with ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'] then puts the wrong name on each bar.
1
import pandas as pd
import matplotlib.pyplot as plt

d = """0,2002-05-01,2  1,2002-05-02,0  2,2002-05-03,0  3,2002-05-04,0  4,2002-05-05,0  5,2002-05-06,0  6,2002-05-07,0  7,2002-05-08,2  8,2002-05-09,2  9,2002-05-10,0  10,2002-05-11,2  11,2002-05-12,0  12,2002-05-13,0  13,2002-05-14,2  14,2002-05-15,2  15,2002-05-16,2  16,2002-05-17,2  17,2002-05-18,2  18,2002-05-19,0  19,2002-05-20,0  20,2002-05-21,1  21,2002-05-22,2  22,2002-05-23,0  23,2002-05-24,1  24,2002-05-25,0  25,2002-05-26,0  26,2002-05-27,0  27,2002-05-28,0  28,2002-05-29,1  29,2002-05-30,0"""
df = pd.DataFrame([v.split(',') for v in d.split('  ')], columns=['time', 'date', 'category'])
df.time, df.category = df.time.astype(int), df.category.astype(int)

data = df.copy()
data['weekday'] = pd.to_datetime(data['date']).dt.day_name()
data.drop(columns=['time', 'date'], inplace=True)

weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
categories = sorted(list(set(df.category)))
counts = pd.DataFrame(0, index=weekdays, columns=categories)
for weekday, category in zip(data.weekday, data.category):
    counts.loc[weekday, category] += 1

counts.plot.bar(stacked=True)
plt.show()

[image]
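The explicit counting loop above could probably be replaced by pd.crosstab, which builds the same weekday-by-category table in one call. This is a swapped-in technique rather than what the answer itself does. A sketch under that assumption, using the question's 30 rows; the dates are regenerated with pd.date_range instead of being parsed from the CSV text.

```python
import pandas as pd

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# The question's data: 30 consecutive days starting 2002-05-01.
category = [2, 0, 0, 0, 0, 0, 0, 2, 2, 0,
            2, 0, 0, 2, 2, 2, 2, 2, 0, 0,
            1, 2, 0, 1, 0, 0, 0, 0, 1, 0]
data = pd.DataFrame({
    "date": pd.date_range("2002-05-01", periods=30, freq="D"),
    "category": category,
})
data["weekday"] = data["date"].dt.day_name()

# crosstab counts every (weekday, category) pair and fills missing pairs with 0;
# reindex then puts the alphabetically sorted rows back into calendar order.
counts = (
    pd.crosstab(data["weekday"], data["category"])
      .reindex(weekdays, fill_value=0)
)
print(counts)
```

The resulting table can be passed to counts.plot.bar(stacked=True) in the same way as the loop-based version.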

1 Comment

Thanks. How can I display category labels on the bars, instead of just the numbers 0, 1, and 2?
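If "labels" here means the legend entries, one possible approach is to rename the columns of the counts table before plotting, since pandas takes the legend text from the column labels. A minimal sketch: the counts are those of the question's data, and the names in labels are hypothetical placeholders to be replaced with real category names.

```python
import pandas as pd

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Weekday-by-category counts for the question's data.
counts = pd.DataFrame(
    {0: [4, 2, 0, 3, 2, 2, 4],
     1: [0, 1, 1, 0, 1, 0, 0],
     2: [0, 1, 4, 2, 1, 2, 0]},
    index=weekdays,
)

# Hypothetical display names; substitute the real meaning of each category.
labels = {0: "category 0", 1: "category 1", 2: "category 2"}
labelled = counts.rename(columns=labels)
print(labelled)
```

Plotting labelled.plot.bar(stacked=True) instead of counts should then show these names in the legend.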
