
My dataframe is:

      created_at            text
2017-03-01 00:00:01        power blah blah
2017-03-01 00:00:11        foo blah blah
2017-03-01 00:01:01        bar blah blah
2017-03-02 00:00:01        foobar blah blah
2017-03-02 00:10:01        hello world
2017-03-02 01:00:01        power blah blah

created_at is my index and its dtype is datetime64, so I can easily slice it day by day. What I want to plot is the total number of entries per day. Currently I split the dataframe by category and plot the pieces in one graph, but I think there is a better way that avoids multiple dataframes:

import matplotlib.pyplot as plt

a = df[df["text"].str.contains("power")]
b = df[df["text"].str.contains("foo")]
c = df[df["text"].str.contains("bar")]

fig = plt.figure()
ax = fig.add_subplot(111)

# created_at is the index, so group on the index's dates and draw on the same axes
df.groupby(df.index.date).size().plot(kind="bar", position=0, ax=ax)
a.groupby(a.index.date).size().plot(kind="bar", position=0, ax=ax)
b.groupby(b.index.date).size().plot(kind="bar", position=0, ax=ax)
c.groupby(c.index.date).size().plot(kind="bar", position=0, ax=ax)

plt.show()

I am learning Seaborn, so a Seaborn-based solution would be nice, but it is not required. Thanks in advance!

  • In case your categories are mutually exclusive, just add a "category" column and iterate over df.groupby('category'). Otherwise, the best you can do to clean up your code is use a for loop. Commented Mar 16, 2018 at 6:35

1 Answer

Since you want to group by day, consider converting df.index to a pd.DatetimeIndex so you can use df.resample(), as shown below:

import pandas as pd

# your original dataframe (the index keys are epoch timestamps in milliseconds):
df = pd.DataFrame({"text":{"1488326401000":"power blah blah","1488326411000":"foo blah blah","1488326461000":"bar blah blah","1488412801000":"foobar blah blah","1488413401000":"hello world","1488416401000":"power blah blah"}})

# convert index to DatetimeIndex
df.index = pd.to_datetime(df.index.astype("int64"), unit="ms")

# create function to do your calculations; not sure if this is exactly what you want
def func(df_):
    texts = ['power', 'foo', 'bar']
    d = dict()

    for text in texts:
        d[text] = df_['text'].str.contains(text).sum()

    return pd.Series(d)

# create your dataframe for plotting by resampling your data by each day and then applying the `func`
df_plot = df.resample('D').apply(func)

# do the plotting
df_plot.plot(kind='bar')
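
A loop-free variant of the same idea, offered only as a sketch: build one boolean column per keyword and let resample sum them per day. The names keywords, flags and daily are illustrative, and the sample frame is rebuilt from the rows in the question; the "total" column is an assumption about what "total number of entries" should mean.

```python
import pandas as pd

# Sample data mirroring the question: created_at is a DatetimeIndex.
df = pd.DataFrame(
    {
        "text": [
            "power blah blah",
            "foo blah blah",
            "bar blah blah",
            "foobar blah blah",
            "hello world",
            "power blah blah",
        ]
    },
    index=pd.to_datetime(
        [
            "2017-03-01 00:00:01",
            "2017-03-01 00:00:11",
            "2017-03-01 00:01:01",
            "2017-03-02 00:00:01",
            "2017-03-02 00:10:01",
            "2017-03-02 01:00:01",
        ]
    ),
)
df.index.name = "created_at"

keywords = ["power", "foo", "bar"]  # illustrative list of categories

# One boolean column per keyword: True where the text contains it.
flags = pd.DataFrame(
    {kw: df["text"].str.contains(kw, regex=False) for kw in keywords}
)

# Summing booleans per calendar day gives the per-keyword counts.
daily = flags.resample("D").sum()

# Total number of entries per day, regardless of keyword.
daily["total"] = df.resample("D").size()
```

With matplotlib available, daily.plot(kind="bar") should draw one group of bars per day from this single frame. Note that a row can count toward several keywords, as "foobar blah blah" does here for both foo and bar.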
