Creating Horizontal Bar Plot With Time-Series Data in Python

Question

I have the following problem:

Given a pandas dataframe with a number of unique hostnames, I would like to plot a horizontal bar graph that indicates the length of time that a particular issue occurred with this hostname.

I have the following code:

# Create a bar plot for each unique system name of all ticket entries
for sys_name in unique_sys_names:
    # Grab the df that refers to just the issues with that system name
    j_data_sys = eff_j_data[eff_j_data['System Name'] == sys_name]
    eff_j_data_sys = j_data_sys[['Created','Resolved','Summary']]
    eff_j_data_sys.plot.barh(x=eff_j_data_sys['Resolved']-eff_j_data_sys['Created'],y=range(0,len(eff_j_data_sys)))

Essentially, I have unique hostnames in a larger pandas dataframe, each with anywhere from 1 to N issues. In the for loop, I iterate through the unique hostnames (sys_name) and grab all the issues related to each hostname in j_data_sys. I then take the times at which each issue was created and resolved, along with the Summary of the issue. What I would like to produce is shown in the following image: Example Bar Plot

Of course, this could include N issues, each with its own start and finish timestamps.

An example dataframe containing this data would be:

           Created            Resolved           Summary
9  2016-04-25 10:29:00 2016-04-26 13:22:00  1 Blade Missing
10 2016-04-25 10:10:00 2016-04-25 10:23:00  Blade in Lockdown

Any other suggestions on how best to represent this data in a time-appropriate way are welcome.

Thank you,

ej_f · Accepted Answer · 2016-05-02 22:08:54Z

I don't think you need a bar plot here, since bar plots are typically used to visualize the relative distribution of categorical data. One alternative is the following approach. Let's suppose that your test data is stored in CSV format.

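In [1]: import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
        # Load the sample data and parse both timestamp columns as datetimes
        df = pd.read_csv("df.txt", parse_dates=["Created", "Resolved"], index_col="Summary")
        # Reshape to one row per timestamp, keeping the Summary it belongs to
        df = df.stack().reset_index().rename(columns={0: "date"}).set_index("date")[["Summary"]]
        # One column per issue: 1.0 where the issue applies, NaN elsewhere
        # (pd.np was removed from pandas, so NumPy is imported directly)
        df = pd.get_dummies(df, dtype=float).replace(0, np.nan)
        # Give each issue its own y position (its column number)
        for n, col in enumerate(df.columns):
            df[col] = df[col] * n
        df.plot(lw=10, legend=False)
        plt.yticks(np.arange(len(df.columns)), df.columns)
        plt.tight_layout()
        plt.show()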

Basically, the code above moves the "Created" and "Resolved" timestamps into the index of a new dataframe, then assigns each event a numerical value on the rows where it occurs and NaN where it doesn't. The resulting dataframe is:

In [2]: df
Out[2]: 
                     Summary_1 Blade Missing  Summary_Blade in Lockdown
date                                                                   
2016-04-25 10:29:00                      0.0                        NaN
2016-04-26 13:22:00                      0.0                        NaN
2016-04-25 10:10:00                      NaN                        1.0
2016-04-25 10:23:00                      NaN                        1.0
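
For readers without a df.txt file at hand, the reshaping step can be reproduced with the two sample rows from the question built inline. This is a minimal sketch rather than the answer's exact code: the names raw, events and levels are illustrative, and plotting is left out so that only the table above is rebuilt.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (built inline instead of read from CSV)
raw = pd.DataFrame(
    {
        "Created": pd.to_datetime(["2016-04-25 10:29:00", "2016-04-25 10:10:00"]),
        "Resolved": pd.to_datetime(["2016-04-26 13:22:00", "2016-04-25 10:23:00"]),
        "Summary": ["1 Blade Missing", "Blade in Lockdown"],
    },
    index=[9, 10],
)

# One row per timestamp, indexed by that timestamp, keeping only the Summary
events = (
    raw.set_index("Summary")[["Created", "Resolved"]]
    .stack()
    .rename("date")
    .reset_index()
    .set_index("date")[["Summary"]]
)

# One column per issue: 1.0 where the issue applies, NaN elsewhere
levels = pd.get_dummies(events, dtype=float).replace(0, np.nan)

# Multiply each column by its position so every issue gets its own y value
for n, col in enumerate(levels.columns):
    levels[col] = levels[col] * n

print(levels)
```

Calling levels.plot(lw=10, legend=False) afterwards should draw the same kind of chart as the code in In [1].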

The resulting plot draws each issue as a thick horizontal line between its Created and Resolved timestamps.

I hope this can help you. Regards.
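
If a bar-style chart is still preferred, as the question originally asked, one possible alternative (not part of the approach above) is a Gantt-style chart drawn directly with matplotlib's barh, using the left argument to set where each bar starts. This is a minimal sketch assuming the two sample rows from the question; the inline dataframe and the variable names are illustrative.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (built inline for illustration)
issues = pd.DataFrame(
    {
        "Created": pd.to_datetime(["2016-04-25 10:29:00", "2016-04-25 10:10:00"]),
        "Resolved": pd.to_datetime(["2016-04-26 13:22:00", "2016-04-25 10:23:00"]),
        "Summary": ["1 Blade Missing", "Blade in Lockdown"],
    }
)

# Matplotlib date numbers are floats measured in days
start = mdates.date2num(issues["Created"].to_numpy())
end = mdates.date2num(issues["Resolved"].to_numpy())

fig, ax = plt.subplots()
# One bar per issue: it starts at Created and its width is the duration in days
bars = ax.barh(issues["Summary"], end - start, left=start)
ax.xaxis_date()      # interpret the x axis values as dates
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
fig.tight_layout()
```

Calling plt.show() afterwards displays the chart. Inside the question's for loop, the same calls could be applied to each per-hostname dataframe.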


2 Comments

Luis Alvarez Over a year ago

This helps a lot! So suppose I had three issues in my dataframe: would the third issue have the numerical value 2.0 assigned to it?

ej_f Over a year ago

Correct. This happens in the for loop, where each column is multiplied by its own position number.
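
The three-issue case from the comments can be checked with a small sketch. The third issue name, "Fan Failure", is hypothetical and added only for illustration; the other two come from the question.

```python
import numpy as np
import pandas as pd

# Two summaries from the question plus one hypothetical third issue
summaries = pd.DataFrame(
    {"Summary": ["1 Blade Missing", "Blade in Lockdown", "Fan Failure"]}
)

# One column per issue: 1.0 where the issue applies, NaN elsewhere
dummies = pd.get_dummies(summaries, dtype=float).replace(0, np.nan)

# Multiply each column by its position number, as in the for loop above
for n, col in enumerate(dummies.columns):
    dummies[col] = dummies[col] * n

# The only non-NaN value in each column is that issue's y position
positions = dummies.max().to_dict()
print(positions)
```

Here the third column ends up at 2.0, so each issue gets a position that follows the order of the columns in the dummy-encoded dataframe.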
