1

I'm using matplotlib to plot the distribution of a data set, and want to overlay vertical lines for the confidence interval.

The density plot looks fine, but I don't see the line. Any ideas?

# Get data
import urllib.request as request
request.urlretrieve('http://seanlahman.com/files/database/baseballdatabank-master_2016-03-02.zip', "baseballdatabank-master_2016-03-02.zip")
from zipfile import ZipFile
# Use a context manager and avoid shadowing the built-in zip()
with ZipFile('baseballdatabank-master_2016-03-02.zip') as archive:
    archive.extractall()

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Forward slashes avoid invalid escape sequences such as "\c" and work on every OS
batting_df = pd.read_csv("baseballdatabank-master/core/Batting.csv")
batting_df = batting_df[batting_df['AB'] > 20]
batting_df['batting_average'] = batting_df['H'] / batting_df['AB']

# Plot distribution
batting_averages = batting_df['batting_average'].dropna()
batting_averages.plot.kde()

# Plot confidence interval
import scipy.stats as st
stderr = st.sem(batting_averages)
interval1 = (batting_averages.mean() - stderr * 1.96, batting_averages.mean() + stderr * 1.96)
plt.plot(interval1[0], 12)
plt.show()

I'm trying to plot a vertical line at the x coordinate of the lower bound of the interval, which is centered on the mean. I passed 12 as the y coordinate because it is the highest value shown on the y axis.

  • Your y argument to plot is unitary. You'll need your kde x as the first argument, and your interval as the y... Commented Aug 18, 2016 at 2:09
  • Could you expand your example with the kind of data you have? Commented Aug 18, 2016 at 11:54
  • Found the answer, in that plot takes two points as arguments. Commented Aug 18, 2016 at 13:15

2 Answers

3

If you capture the Axes object returned by the kde plot like this:

ax = batting_averages.plot.kde()

... then you can plot vertical lines at any position you want:

stderr = st.sem(batting_averages)
ax.vlines(x=batting_averages.mean(), ymin=-1, ymax=15, color='red', label='mean')
# Overridden on purpose: the real standard error is too small to see on this scale
stderr = 0.1
ax.vlines(x=batting_averages.mean() - stderr * 1.96, ymin=-1, ymax=15, color='green', label='95% CI')
ax.vlines(x=batting_averages.mean() + stderr * 1.96, ymin=-1, ymax=15, color='green')
# Clip the view so the lines span the full visible height
ax.set_ylim([-1, 12])
ax.legend()
plt.show()

which gives you the following graph:

[Image: density plot with a red vertical line at the mean and green vertical lines at the two interval bounds]

(Note that I replaced the computed standard error with 0.1 so that the interval lines are visibly separated from the mean.)
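As a variation on the approach above, `Axes.axvline` draws a line across the full height of the axes, so no `ymin`/`ymax` data values or `set_ylim` call are needed. The following is only a sketch: it uses synthetic, normally distributed values as a stand-in for `batting_averages` (an assumption, not the real data), a histogram instead of the kde plot, and computes the standard error with NumPy, which should match the default of `scipy.stats.sem` (`ddof=1`).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for batting_averages (assumption, not the real data set)
rng = np.random.default_rng(0)
values = rng.normal(loc=0.25, scale=0.04, size=5000)

mean = values.mean()
# Standard error of the mean with ddof=1
stderr = values.std(ddof=1) / np.sqrt(values.size)
interval = (mean - 1.96 * stderr, mean + 1.96 * stderr)

fig, ax = plt.subplots()
ax.hist(values, bins=50, density=True, alpha=0.4)

# axvline spans the whole axes height regardless of the y limits
mean_line = ax.axvline(mean, color='red', label='mean')
ci_lines = [ax.axvline(x, color='green', linestyle='--') for x in interval]
ci_lines[0].set_label('95% CI')
ax.legend()
# Call plt.show() or fig.savefig(...) to view the result
```

With the real standard error the two green lines sit very close to the mean, which is why the answer above exaggerates it.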

1 Comment

Thanks, this is useful.
0

plot takes x and y as sequences of coordinates. A single (x, y) pair is one point, which the default line style does not render. To draw a line, I need to pass the x coordinates of its two endpoints, followed by the y coordinates of those endpoints:

plot((x1, x2), (y1, y2))

Substituting the variables from the example above:

plt.plot((interval1[0], interval1[0]), (0, 12))
plt.plot((interval1[1], interval1[1]), (0, 12))

See: vertical & horizontal lines in matplotlib
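To make the difference concrete, here is a minimal sketch comparing the original one-point call with the two-point call. The x position `0.26` is a hypothetical value chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = 0.26  # hypothetical x position, for illustration only

# One point with the default format: a line style and no marker,
# so there is no segment to draw and nothing visible appears
(single,) = ax.plot(x, 12)

# Two points sharing the same x: a vertical segment from y=0 to y=12
(segment,) = ax.plot((x, x), (0, 12))

# A single point becomes visible if a marker is requested
(marker,) = ax.plot(x, 12, 'o')
```

The first call is the one from the question, which explains why the density plot appeared without any line.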
