
I have a dataframe extracted from WhatsApp with columns: Date&Time, msg, name, msg_len. Date&Time is a datetime column holding when each message was sent, msg is the message text, name is the sender, and msg_len is the length of the message. I'm trying to build a stacked bar plot from this dataframe: the x-axis shows the month (e.g. 2019-02), the y-axis shows either the mean message length or the number of messages sent in that month, and each bar is split by person. So far my function looks like this:

def BarPlotMonth(Data):
    """
    This function plots a barplot for the number of messages sent for each month and the mean length of the messages for each month
    """

    fig,axes = plt.subplots(2,1,
            figsize=(18,10),
            sharex = True)


    GroupedByMonth = Data.groupby(Data['Date&Time'].dt.strftime('%Y-%m'))['msg_len']

    Mean = GroupedByMonth.mean()
    Count = GroupedByMonth.count()
    Std = GroupedByMonth.std()

    axes[0].bar(Count.index, Count, color = 'lightblue')
    axes[0].set_title('Number of texts per month')
    axes[0].set_ylabel('Count')

    axes[1].bar(Mean.index, Mean, color = 'lightblue', yerr = Std)
    axes[1].set_title('Mean length of a message per month')
    axes[1].set_ylabel('Mean length')
    axes[1].set_xlabel('Year-Month')

    plt.xticks(rotation=45)
    axes[1].legend()

    plt.savefig('WhatsApp_conversations.png')
    plt.show()

But I can't split each bar by person. How can I solve this?

  • You need to call axes.bar for each person if you want to separate by person. Commented Feb 24, 2019 at 0:01
  • I know, but I don't know how: grouping an already grouped dataframe doesn't seem to be a good idea in Python. Commented Feb 24, 2019 at 0:50
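
The approach from the first comment can be sketched roughly as follows: count messages per month and person, then call bar once per person while tracking a running bottom so each segment sits on top of the previous ones. This is only an illustration under stated assumptions; the sample frame and the variable names (sample, counts, bottom) are made up and are not from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data with the same columns as the question's dataframe.
sample = pd.DataFrame({
    'Date&Time': pd.to_datetime(['2019-01-05', '2019-01-20', '2019-01-21',
                                 '2019-02-03', '2019-02-10']),
    'msg': ['hi', 'hello', 'hey!', 'ok', 'bye!'],
    'name': ['alice', 'bob', 'alice', 'alice', 'bob'],
    'msg_len': [2, 5, 4, 2, 4],
})

# One row per month, one column per person; months without messages count as 0.
month = sample['Date&Time'].dt.strftime('%Y-%m')
counts = sample.groupby([month, 'name'])['msg_len'].count().unstack(fill_value=0)

fig, ax = plt.subplots()
bottom = np.zeros(len(counts))  # running height of the stack for each month
for person in counts.columns:
    # Draw this person's segment on top of everything drawn so far.
    ax.bar(counts.index, counts[person], bottom=bottom, label=person)
    bottom += counts[person].to_numpy()

ax.set_ylabel('Count')
ax.legend(title='name')
```

No second groupby on the grouped object is needed here: grouping by two keys at once and unstacking yields the month-by-person table directly.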

1 Answer


You will need to restructure your DataFrame a bit to be able to use df.plot(kind='bar', stacked=True).

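# Group by month and sender, then average only the numeric msg_len column.
# (Recent pandas versions raise a TypeError if mean() also hits the text column msg.)
group_by_month_per_user = df.groupby(
    [
        df['Date&Time'].dt.strftime('%Y-%m'),
        'name'
    ]
)[['msg_len']].mean().unstack()  # unstack() moves name from the rows into the columns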

group_by_month_per_user

This produces a table with the following structure.

             msg_len                                 
name           alice        bob   giuseppe     martin
Date&Time                                            
2019-01    48.870968  42.315789  56.391304  49.586207
2019-02    51.099174  48.777778  56.173913  51.895652
2019-03    52.336364  49.626168  47.021898  46.626263

Note that the columns form a MultiIndex with msg_len as the top level over all columns. We need to remove that level to keep the legend tidy, which can be done by simply selecting the msg_len column. The resulting DataFrame can then be passed to .plot.

# legend takes a bool (or 'reverse'), not a list of labels
group_by_month_per_user['msg_len'].plot(kind='bar', stacked=True, legend=True)

This produces the following plot.

[Figure: stacked bar chart of message length per month, split by name]
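
The question asked for two panels, message count and mean length. A minimal sketch of how the same unstack-then-plot idea might extend to that layout, by passing each subplot to .plot through its ax argument, is given here. The function name plot_month_by_person, the 'month' label and the small sample frame are made up for illustration. Selecting msg_len before aggregating keeps the columns flat, so there is no MultiIndex level to remove.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_month_by_person(data):
    """Stacked bars per month, split by sender: count (top) and mean length (bottom)."""
    month = data['Date&Time'].dt.strftime('%Y-%m').rename('month')
    grouped = data.groupby([month, 'name'])['msg_len']

    # unstack() turns the 'name' index level into columns:
    # one row per month, one column per person.
    count = grouped.count().unstack(fill_value=0)
    mean = grouped.mean().unstack()

    fig, axes = plt.subplots(2, 1, figsize=(18, 10), sharex=True)

    count.plot(kind='bar', stacked=True, ax=axes[0], legend=False)
    axes[0].set_title('Number of messages per month')
    axes[0].set_ylabel('Count')

    mean.plot(kind='bar', stacked=True, ax=axes[1], rot=45)
    axes[1].set_title('Mean length of a message per month')
    axes[1].set_ylabel('Mean length')
    axes[1].set_xlabel('Year-Month')

    return fig, axes, count, mean


# Hypothetical sample data with the same columns as the question's dataframe.
sample = pd.DataFrame({
    'Date&Time': pd.to_datetime(['2019-01-05', '2019-01-20', '2019-01-21',
                                 '2019-02-03', '2019-02-10']),
    'msg': ['hi', 'hello', 'hey!', 'ok', 'bye!'],
    'name': ['alice', 'bob', 'alice', 'alice', 'bob'],
    'msg_len': [2, 5, 4, 2, 4],
})

fig, axes, count, mean = plot_month_by_person(sample)
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure. Keep in mind that when means are stacked, only the individual segments are meaningful; the total bar height is a sum of averages.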

The following code was used to generate a random dataset.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from random import randint, choice
import string

ts = datetime.now()
data = []
names = ['bob', 'alice', 'martin', 'giuseppe']

for n in range(1000):
    msg_len = randint(0, 100)
    row = [
        ts - timedelta(days=randint(-30,30)),
        ''.join(choice(string.ascii_lowercase) for _ in range(msg_len)),  # choice is imported directly from random
        choice(names),
        msg_len
    ]

    data.append(row)

df = pd.DataFrame(data, columns = ['Date&Time', 'msg', 'name', 'msg_len'])