I've developed a Perl script that manipulates data and produces a final csv file. Unfortunately, the Perl packages for graphs and charts are not supported on my system, and I'm not able to install them due to work restrictions. So I want to take the csv file and put together something in Python to generate a mixed graph. I want the first column to be the labels on the x-axis, the next three columns to be bar graphs, and the last column (Target) to be a line across the x-axis.

Here is sample data:

Name      PreviousWeekProg     CurrentWeekProg     ExpectedProg     Target
Dan              94                   92                 95           94
Jarrod           34                   56                 60           94
Chris            45                   43                 50           94
Sam              89                   90                 90           94
Aaron            12                   10                 40           94
Jenna            56                   79                 80           94
Eric             90                   45                 90           94

I am looking for a graph like this: enter image description here

I did some research, but being new to Python, I wanted to ask for guidance on good modules to use for mixed charts and graphs. Sorry if my post is vague. Besides looking at other references online, I'm not sure how to go about this. Also, my version of Python is 3.8 and I DO have matplotlib installed (which is what I was previously recommended to use).

  • Hi, did my answer help with your question? Commented Dec 12, 2019 at 2:01
  • @ShaunLowis It was very helpful, but I'm still trying to figure out some basics with it :( For example, I'm getting errors when trying to read the csv to begin with. It seems to be something fundamental, but I haven't figured it out yet. Commented Dec 12, 2019 at 19:28
  • That's fair. You can mark my answer as correct, then ask another question about your errors and tag me in a comment, and I can try to help. Commented Dec 12, 2019 at 20:01

3 Answers

Since the answer by @ShaunLowis doesn't include a complete example, I thought I'd add one. As far as reading the .csv file goes, the best way to do it in this case is probably to use pandas.read_csv(), as the other answer points out. In this example I have named the file test.csv and placed it in the same directory from which I run the script.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv("./test.csv")
names = df['Name'].values
x = np.arange(len(names))
w = 0.3
plt.bar(x-w, df['PreviousWeekProg'].values, width=w, label='PreviousWeekProg')
plt.bar(x, df['CurrentWeekProg'].values, width=w, label='CurrentWeekProg')
plt.bar(x+w, df['ExpectedProg'].values, width=w, label='ExpectedProg')
plt.plot(x, df['Target'].values, lw=2, label='Target')
plt.xticks(x, names)
plt.ylim([0,100])
plt.tight_layout()
plt.xlabel('X label')
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, ncol=5)
plt.savefig("CSVBarplots.png", bbox_inches="tight")
plt.show()

enter image description here


Explanation

From the pandas docs for read_csv() (arguments extraneous to the example excluded),

pandas.read_csv(filepath_or_buffer)

Read a comma-separated values (csv) file into DataFrame.

filepath_or_buffer: str, path object or file-like object

Any valid string path is acceptable. The string could be a URL. [...] If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handler (e.g. via builtin open function) or StringIO.

In this example I am specifying the path to the file, not a file object.
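For comparison, here is a minimal sketch of the file-like route mentioned in the docs excerpt, which also makes it easy to test parsing without a file on disk. It assumes the data really is comma-separated; the csv_text string below is just the question's sample rows retyped with commas, not the asker's actual file.

```python
import io

import pandas as pd

# Sample rows from the question, retyped as comma-separated text (an assumption
# about how the real file is delimited).
csv_text = """Name,PreviousWeekProg,CurrentWeekProg,ExpectedProg,Target
Dan,94,92,95,94
Jarrod,34,56,60,94
Chris,45,43,50,94
Sam,89,90,90,94
Aaron,12,10,40,94
Jenna,56,79,80,94
Eric,90,45,90,94
"""

# io.StringIO wraps the string in an object with a read() method,
# so read_csv treats it like an open file.
df = pd.read_csv(io.StringIO(csv_text))

# If the file turns out to be whitespace-aligned (as the sample is displayed
# in the question) rather than comma-separated, a regex separator is one option:
# df = pd.read_csv("test.csv", sep=r"\s+")

print(df.dtypes)
print(df.head())
```

If read_csv raises an error or returns a single wide column, checking the delimiter this way is usually a good first step.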

names = df['Name'].values

This extracts the values in the 'Name' column and converts them to a numpy.ndarray object. In order to plot multiple bars under one name I reference this answer. However, in order to use this method, we need a numeric x array of the same length as the names array, hence

x = np.arange(len(names))

Then set a width for the bars and offset the first and third bars accordingly, as outlined in the referenced answer:

w = 0.3
plt.bar(x-w, df['PreviousWeekProg'].values, width=w, label='PreviousWeekProg')
plt.bar(x, df['CurrentWeekProg'].values, width=w, label='CurrentWeekProg')
plt.bar(x+w, df['ExpectedProg'].values, width=w, label='ExpectedProg')

From the matplotlib.pyplot.bar page (arguments not used in this example excluded),

matplotlib.pyplot.bar(x, height, width=0.8)

The bars are positioned at x [...] their dimensions are given by width and height.

Each of x, height, and width may either be a scalar applying to all bars, or it may be a sequence of length N providing a separate value for each bar.

In this case, x and height are sequences of values (different for each bar) and width is a scalar (the same for each bar).
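The x-w, x, x+w pattern generalizes to any number of bar series. The sketch below is only an illustration of that arithmetic: grouped_bar_positions and total_width are hypothetical names introduced here, not part of matplotlib.

```python
import numpy as np


def grouped_bar_positions(n_groups, n_series, total_width=0.9):
    """Return (positions, width); positions[i] holds the bar centers of series i.

    Hypothetical helper for illustration only.
    """
    width = total_width / n_series
    x = np.arange(n_groups)
    # Offsets are centered on zero, e.g. [-w, 0, +w] for three series.
    offsets = (np.arange(n_series) - (n_series - 1) / 2) * width
    # Broadcast to shape (n_series, n_groups): one row of centers per series.
    positions = x[np.newaxis, :] + offsets[:, np.newaxis]
    return positions, width


# Seven names and three bar columns, as in the question's data.
positions, w = grouped_bar_positions(n_groups=7, n_series=3)
print(w)
print(positions)
```

Each row of positions could then be passed as the x argument of one plt.bar call, with width=w.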

Next is the line for the target, which is straightforward: it plots the x values created earlier against the values from the 'Target' column

plt.plot(x, df['Target'].values, lw=2, label='Target')

where lw specifies the linewidth. Disclaimer: if the target value isn't the same for each row of the .csv this will still work but may not look exactly how you want it to as is.
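One possible way to handle both cases in the disclaimer is to check whether the target is constant and, if so, draw a single horizontal line with axhline, which spans the full width of the axes. This is a sketch on a small made-up DataFrame, not the asker's file, and the marker style for the varying case is just one option.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small stand-in for the CSV data (assumed values for illustration).
df = pd.DataFrame({
    "Name": ["Dan", "Jarrod", "Chris"],
    "Target": [94, 94, 94],
})
x = np.arange(len(df))

fig, ax = plt.subplots()
if df["Target"].nunique() == 1:
    # Constant target: one horizontal line across the whole axes.
    target_line = ax.axhline(df["Target"].iloc[0], lw=2, color="k", label="Target")
else:
    # Varying target: plot one point per name, joined by a line.
    (target_line,) = ax.plot(x, df["Target"].to_numpy(), lw=2, marker="o",
                             color="k", label="Target")

ax.set_xticks(x)
ax.set_xticklabels(df["Name"])
ax.set_ylim(0, 100)
ax.legend()
```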

The next two lines,

plt.xticks(x, names)
plt.ylim([0,100])

just add the names below the bars at the appropriate x positions and then set the y limits to span the interval [0, 100].

The final touch here is to add the legend below the plot,

plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), fancybox=True, ncol=5)

see this answer for more on how to adjust this as desired.


18 Comments

This was perfect. Your explanation was very detailed and on-point! Will take an hour for me to award the bounty.
@sfr Thanks, I’m glad it helped
I have an additional question. Instead of using plt.show is there any way to have it save to an image (jpeg or png) locally?
@sfr You want to use plt.savefig("filename.png"). Additionally, I recommend using bbox_inches='tight' to trim the generous whitespace added around the output, so in full: plt.savefig("filename.png", bbox_inches='tight').
I just gave that a shot right before reading your comment and it worked great. thanks again for all your help.

I would recommend reading in your .csv file using the read_csv() utility of the pandas library, like so:

import pandas as pd

df = pd.read_csv(filepath)

This stores the information in a DataFrame object. You can then access your columns by:

my_column = df['PreviousWeekProg']

After which you can call:

my_column.plot(kind='bar')

On whichever column you wish to plot. Configuring subplots is a different beast, for which I would recommend using matplotlib's pyplot.

I would recommend starting with these figure and axes declarations, then going from there:

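import matplotlib.pyplot as plt

# Repeated plt.subplot() calls with no arguments all refer to the same axes,
# so a single axes shared by the bars and the line is enough here.
fig, ax = plt.subplots()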

Where you can read more about adding in axes data here.
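To make that concrete, here is one way the pieces might fit together on a single axes. It is a sketch under assumptions: the DataFrame is typed in from the question's sample rather than read from the real file, and the names bar_columns and target_line are chosen here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# The question's sample data, typed in directly instead of read from a CSV.
df = pd.DataFrame({
    "Name": ["Dan", "Jarrod", "Chris", "Sam", "Aaron", "Jenna", "Eric"],
    "PreviousWeekProg": [94, 34, 45, 89, 12, 56, 90],
    "CurrentWeekProg": [92, 56, 43, 90, 10, 79, 45],
    "ExpectedProg": [95, 60, 50, 90, 40, 80, 90],
    "Target": [94] * 7,
})

fig, ax = plt.subplots(figsize=(8, 4))
bar_columns = ["PreviousWeekProg", "CurrentWeekProg", "ExpectedProg"]

# pandas draws the grouped bars centered on integer positions 0..n-1.
df.plot(x="Name", y=bar_columns, kind="bar", ax=ax, rot=0)

# The line is plotted against the same integer positions so it lines up.
(target_line,) = ax.plot(range(len(df)), df["Target"], color="k", lw=2,
                         label="Target")

ax.set_ylim(0, 100)
ax.set_xlabel("")
ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.1), ncol=4)
```

Passing ax=ax to the pandas plot call is what keeps the bars and the line on the same axes rather than creating a new figure.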

Let me know if this helps!

1 Comment

If you are struggling with the implementation, this post should help if I was unclear about anything: stackoverflow.com/questions/33631163/…

You can use the hue parameter in the seaborn package. First, you need to reshape your data set with the function melt:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("test.csv") # the CSV from the question; the file name is an assumption
df1 = df.melt(id_vars=['Name', 'Target'])
print(df1.head(10))

Output:

     Name  Target          variable  value
0     Dan      94  PreviousWeekProg     94
1  Jarrod      94  PreviousWeekProg     34
2   Chris      94  PreviousWeekProg     45
3     Sam      94  PreviousWeekProg     89
4   Aaron      94  PreviousWeekProg     12
5   Jenna      94  PreviousWeekProg     56
6    Eric      94  PreviousWeekProg     90
7     Dan      94   CurrentWeekProg     92
8  Jarrod      94   CurrentWeekProg     56
9   Chris      94   CurrentWeekProg     43

Now you can use the column 'variable' as your hue parameter in the function barplot:

sns.set_style("whitegrid") # use the whitegrid style; set before creating the axes so it takes effect
fig, ax = plt.subplots(figsize=(10, 5)) # set the size of a figure
sns.barplot(x='Name', y='value', hue='variable', data=df1, ax=ax) # plot

xmin, xmax = ax.get_xlim() # get x-axis limits
ax.hlines(y=df1['Target'].unique(), xmin=xmin, xmax=xmax, color='red') # add one line per distinct target value
# or ax.axhline(y=df1['Target'].max()) to add a single line

ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.06), ncol=4, frameon=False) # move legend to the bottom
plt.title('Student Progress', loc='center') # add title
plt.yticks(np.arange(df1['value'].min(), df1['value'].max()+1, 10.0)) # change tick frequency
plt.xlabel('') # set xlabel
plt.ylabel('') # set ylabel

plt.show() # show plot

enter image description here
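As a small variation, melt also accepts var_name and value_name, which can make the long-format columns (and therefore the legend title and axis labels) more descriptive than the default 'variable' and 'value'. The sketch below uses a two-row stand-in for the data; the names 'Series' and 'Progress' are arbitrary choices for illustration.

```python
import pandas as pd

# Two rows from the question's sample, typed in for illustration.
df = pd.DataFrame({
    "Name": ["Dan", "Jarrod"],
    "PreviousWeekProg": [94, 34],
    "CurrentWeekProg": [92, 56],
    "ExpectedProg": [95, 60],
    "Target": [94, 94],
})

# Name and Target stay as identifier columns; the three progress columns
# are stacked into (Series, Progress) pairs, one row per name and column.
long_df = df.melt(id_vars=["Name", "Target"],
                  var_name="Series", value_name="Progress")
print(long_df)
```

With this frame, the barplot call would use y='Progress' and hue='Series' in place of 'value' and 'variable'.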
