I have been trying to make a program that plots how often a word is used in WhatsApp chats between two people. The word "night", for example, was used a couple of times on a few days and 0 times on most days. The graph I have is as follows:

[Graph: Usage of the word night]

Here is the code

word_occurances = [0 for i in range(len(just_dates))]

for i in range(len(just_dates)):
    for j in range(len(df_word)):
        if just_dates[i].date() == word_date[j].date():
            word_occurances[i] += 1

title = person2.rstrip(':') + ' with ' + person1.rstrip(':') + ' usage of the word - ' + word

plt.plot(just_dates, word_occurances, color = 'purple')
plt.gcf().autofmt_xdate()
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title(title)
plt.savefig('Graphs/Words/' + title + '.jpg', dpi = 200)
plt.show()

word_occurances is a list of daily counts:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 2, 0, 0, 0, 1, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

What I want is for the graph to connect only the points where the word has been used, while still showing the entire timeline on the x-axis. I don't want the line to touch 0. How can I do this? I have searched and found similar answers, but none have worked the way I need them to.

2 Answers

You simply have to find the indices of word_occurances at which the value is greater than zero. With these indices you can look up the corresponding dates in just_dates.

word_counts = []    # Only word counts > 0
dates = []          # Date of > 0 word count
for i, val in enumerate(word_occurances):
    if val > 0:
        word_counts.append(val)
        dates.append(just_dates[i])

You may want to plot with an underlying bar plot in order to maintain the original scale.

plt.bar(just_dates, word_occurances)
plt.plot(dates, word_counts, 'r--')
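
The filtering step above can also be written as a small helper. This is only a minimal sketch: the function name nonzero_points and the sample data are made up for illustration and stand in for your real just_dates and word_occurances.

```python
from datetime import date, timedelta

def nonzero_points(dates, counts):
    """Return (dates, counts) restricted to entries whose count is > 0."""
    pairs = [(d, c) for d, c in zip(dates, counts) if c > 0]
    if not pairs:
        return [], []
    kept_dates, kept_counts = zip(*pairs)
    return list(kept_dates), list(kept_counts)

# Hypothetical sample data standing in for just_dates / word_occurances
just_dates = [date(2021, 1, 1) + timedelta(days=i) for i in range(6)]
word_occurances = [0, 1, 0, 3, 0, 2]

dates, word_counts = nonzero_points(just_dates, word_occurances)
```

If you plot dates against word_counts without the bar plot, calling plt.xlim(just_dates[0], just_dates[-1]) afterwards should keep the whole timeline on the x-axis, assuming just_dates is sorted.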
One way to address this is to plot only data that contain entries but label all dates where a conversation took place to indicate the zero values in your graph:

from matplotlib import pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FixedLocator

#fake data generation, this block just imitates your unknown data and can be deleted
import numpy as np
import pandas as pd
np.random.seed(12345)
n = 30
just_dates = pd.to_datetime(np.random.randint(1, 100, n)+18500, unit="D").sort_values().to_list()
word_occurances = [0]*n
for i in range(10): 
    word_occurances[np.random.randint(n)] = np.random.randint(1, 10)


fig, ax = plt.subplots(figsize=(15,5))

#generate data to plot by filtering out zero values
plot_data = [(just_dates[i], word_occurances[i]) for i, num in enumerate(word_occurances) if num > 0]

#plot these data with marker to indicate each point 
#with lines only, a run like 1-1-1-1-1 would show just its two end points
ax.plot(*zip(*plot_data), color = 'purple', marker="o")
#label all dates where conversations took place
ax.xaxis.set_major_locator(FixedLocator(mdates.date2num(just_dates)))
#start the y-axis at 0 so that the zero baseline stays visible
ax.set_ylim(0, )
ax.tick_params(axis="x", labelrotation= 90)

plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title("Conversations at night")
plt.tight_layout()
plt.show()

Sample output: [image not included]

This can quickly get busy with all these date labels (and it might or might not work with the datetime objects in your just_dates, which may differ in structure from my sample data). Another way would be to mark each conversation with vlines:

...
fig, ax = plt.subplots(figsize=(15,5))

plot_data = [(just_dates[i], word_occurances[i]) for i, num in enumerate(word_occurances) if num > 0]

ax.plot(*zip(*plot_data), color = 'purple', marker="o")
ax.vlines((just_dates), 0, max(word_occurances), color="red", ls="--")
ax.set_ylim(0, )

plt.gcf().autofmt_xdate()
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title("Conversations at night")
plt.tight_layout()
plt.show()

Sample output: [image not included]
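
If you prefer the date labels of the first variant but find them too crowded, one option is to label only a subset of the dates. The following is a rough sketch, not part of the code above: thin_ticks and its max_labels parameter are hypothetical names, and the sample dates are invented.

```python
from datetime import date, timedelta

def thin_ticks(dates, max_labels=10):
    """Return at most max_labels dates, evenly spaced by index, keeping the first one."""
    if max_labels < 1:
        raise ValueError("max_labels must be at least 1")
    # Ceiling division gives the smallest step that stays within max_labels
    step = max(1, -(-len(dates) // max_labels))
    return dates[::step]

# Hypothetical sample dates standing in for just_dates
sample_dates = [date(2020, 9, 1) + timedelta(days=i) for i in range(30)]
tick_dates = thin_ticks(sample_dates, max_labels=10)
```

The result could then replace just_dates in the locator line, for example FixedLocator(mdates.date2num(thin_ticks(just_dates))). Note that this drops labels for some conversation days, so it trades completeness for readability.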
