I have the following problem:

Given a pandas dataframe with a number of unique hostnames, I would like to plot a horizontal bar graph that indicates the length of time that a particular issue occurred with this hostname.

I have the following code:

# Create a bar plot for each unique system name of all ticket entries
for sys_name in unique_sys_names:
    # Grab the df that refers to just the issues with that system name
    j_data_sys = eff_j_data[eff_j_data['System Name'] == sys_name]
    eff_j_data_sys = j_data_sys[['Created','Resolved','Summary']]
    eff_j_data_sys.plot.barh(x=eff_j_data_sys['Resolved']-eff_j_data_sys['Created'],y=range(0,len(eff_j_data_sys)))

Essentially, I have unique hostnames in a larger pandas dataframe, each with between 1 and N issues. In the for loop, I iterate through the unique hostnames (sys_name) and grab all the issues related to that hostname in j_data_sys. I then select the times at which each issue was created and resolved, along with the Summary of the issue. What I would like to produce is shown in the linked image: Example Bar Plot

Of course, this could include N issues, each with corresponding start and finish timestamps.

An example dataframe containing this data would be:

           Created            Resolved           Summary
9  2016-04-25 10:29:00 2016-04-26 13:22:00  1 Blade Missing
10 2016-04-25 10:10:00 2016-04-25 10:23:00  Blade in Lockdown

Any other suggestions for how best to represent this data over time are welcome.

Thank you,

1 Answer

I don't think you need a bar plot here, because bar plots are normally used to visualize the relative distribution of categorical data. One alternative is the following approach. Let's suppose that we have your test data in CSV format.

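In [1]: import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
        df = pd.read_csv("df.txt", parse_dates=["Created", "Resolved"], index_col="Summary")
        # Stack Created/Resolved into one "date" column and use it as the index
        df = df.stack().reset_index().rename(columns={0: "date"}).set_index("date")[["Summary"]]
        # One column per issue: 1.0 where the issue applies, NaN elsewhere
        # (NumPy is imported directly because pd.np is not available in recent pandas)
        df = pd.get_dummies(df, dtype=float).replace(0, np.nan)
        # Give each issue its own y-level by multiplying by its column position
        for n, col in enumerate(df.columns):
            df[col] = df[col] * n
        df.plot(lw=10, legend=False)
        plt.yticks(np.arange(len(df.columns)), df.columns)
        plt.tight_layout()
        plt.show()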

Basically, the code above moves the "Created" and "Resolved" timestamps into the index of a new dataframe, then assigns each issue a numerical value at the timestamps where it occurs and NaN where it doesn't. The resulting dataframe is:

In [2]: df
Out[2]: 
                     Summary_1 Blade Missing  Summary_Blade in Lockdown
date                                                                   
2016-04-25 10:29:00                      0.0                        NaN
2016-04-26 13:22:00                      0.0                        NaN
2016-04-25 10:10:00                      NaN                        1.0
2016-04-25 10:23:00                      NaN                        1.0

The resulting plot draws each issue as a thick horizontal line between its Created and Resolved timestamps, at its own y-level.
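If you would still prefer actual horizontal bars, as in the question, a different technique is to call matplotlib's barh directly and pass the start time through its left argument. The following is only a minimal sketch under some assumptions: the function name plot_issue_durations is made up for illustration, the two rows are copied from the question, and durations are converted to days because matplotlib's date numbers are measured in days.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question
issues = pd.DataFrame({
    "Created": pd.to_datetime(["2016-04-25 10:29:00", "2016-04-25 10:10:00"]),
    "Resolved": pd.to_datetime(["2016-04-26 13:22:00", "2016-04-25 10:23:00"]),
    "Summary": ["1 Blade Missing", "Blade in Lockdown"],
})


def plot_issue_durations(frame):
    """Draw one horizontal bar per issue, from Created to Resolved.

    Hypothetical helper for illustration; not part of pandas or matplotlib.
    """
    # Bar start positions as matplotlib date numbers (units of days)
    start = mdates.date2num(frame["Created"].to_numpy())
    # Bar lengths in days, to match the units of the date axis
    width_days = (frame["Resolved"] - frame["Created"]).dt.total_seconds() / 86400.0

    fig, ax = plt.subplots()
    positions = list(range(len(frame)))
    ax.barh(positions, width_days, left=start)

    # Label each bar with its issue summary and show dates on the x axis
    ax.set_yticks(positions)
    ax.set_yticklabels(frame["Summary"])
    ax.xaxis_date()
    fig.autofmt_xdate()
    fig.tight_layout()
    return ax


ax = plot_issue_durations(issues)
```

Calling plt.show() afterwards displays the figure. Inside the loop from the question, the same function could be called once per hostname with that hostname's subset of rows.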

I hope this can help you. Regards.

2 Comments

This helps a lot! So suppose I had three issues within my dataframe; would that 3rd issue have the numerical value 2.0 assigned to it?
Correct. This happens in the for loop, where each column is multiplied by its own position number.
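
The numbering discussed in these comments can be checked with a small sketch. The third summary, "Fan Failure", is a made-up example and not from the original data. Note that the position is the column position produced by get_dummies, which orders the columns by sorted summary text rather than by row order.

```python
import numpy as np
import pandas as pd

# Two summaries from the question plus a hypothetical third one
summaries = pd.DataFrame(
    {"Summary": ["1 Blade Missing", "Blade in Lockdown", "Fan Failure"]}
)

# Same steps as in the answer: indicator columns, NaN where the issue is absent
levels = pd.get_dummies(summaries, dtype=float).replace(0, np.nan)

# Multiply each column by its position so every issue gets its own y-level
for n, col in enumerate(levels.columns):
    levels[col] = levels[col] * n

# Collect the single non-NaN level of each column
y_levels = {col: levels[col].max() for col in levels.columns}
print(y_levels)
```

With these three summaries, the third column should end up at level 2.0, matching the comment above.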
