How to make a boxplot where each row in my dataframe object is a box in the plot?

I have some stock data that I want to plot with a box plot. My data is from Yahoo Finance and includes Open, High, Low, Close, Adjusted Close and Volume data for each trading day. I want a box plot where each box represents one day of OHLC price action.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.io.data import DataReader

# get daily stock price data from yahoo finance for S&P500
SP = DataReader("^GSPC", "yahoo") 

SP.head()
             Open        High        Low         Close       Volume          Adj Close
Date                        
2010-01-04   1116.56     1133.87     1116.56     1132.99     3991400000      1132.99
2010-01-05   1132.66     1136.63     1129.66     1136.52     2491020000      1136.52
2010-01-06   1135.71     1139.19     1133.95     1137.14     4972660000      1137.14
2010-01-07   1136.27     1142.46     1131.32     1141.69     5270680000      1141.69
2010-01-08   1140.52     1145.39     1136.22     1144.98     4389590000      1144.98

plt.figure()
bp = SP.boxplot()

But when I plot this data frame as a boxplot, I only get one box with the Open, High, Low, and Close values of the entire Volume column.

Likewise, I tried resampling my daily Adjusted Close prices to get weekly OHLC:

close = SP['Adj Close']
wk = close.resample('W', how='ohlc')
wk.head()

             open        high        low         close
Date                
2010-01-10   1132.99     1144.98     1132.99     1144.98
2010-01-17   1146.98     1148.46     1136.03     1136.03
2010-01-24   1150.23     1150.23     1091.76     1091.76
2010-01-31   1096.78     1097.50     1073.87     1073.87
2010-02-07   1089.19     1103.32     1063.11     1066.19
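
For readers on a newer pandas release: the how= argument to resample() was later removed, and the equivalent is to call .ohlc() on the resampler. A minimal sketch on made-up closing prices (the values below are illustrative stand-ins for SP['Adj Close'], not real S&P 500 data):

```python
import numpy as np
import pandas as pd

# Synthetic daily closes standing in for SP['Adj Close'] (made-up values)
dates = pd.date_range("2010-01-04", periods=10, freq="B")
close = pd.Series(np.linspace(1130.0, 1139.0, len(dates)), index=dates)

# Newer pandas: call .ohlc() on the resampler instead of passing how='ohlc'
wk = close.resample("W").ohlc()
print(wk)
```

The result has the same lowercase open/high/low/close columns as the output above.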

Plotting wk as a boxplot yields four boxes, one per column rather than one per row. For example, the first box, 'open', shows the spread of the entire 'open' column.

But what I actually want is 1 box for each 'Date' (index or row of my DataFrame). So the first Box will show the OHLC of the first row, '2010-01-10'. Second box will be the second row ('2010-01-17').

What I really want, though, is for each row in my original daily data (the SP DataFrame) to be its own OHLC box. Essentially I want daily candlesticks, generated with boxplot().

                 Open        High        Low         Close     
    Date                        
    2010-01-04   1116.56     1133.87     1116.56     1132.99

How do I do this using the pandas DataFrame and Matplotlib boxplot()? I just want a basic plot where each row of the DataFrame is an OHLC box. Nothing fancy at this point. Thanks!

  • I think you want something like this: github.com/pydata/pandas/issues/783 It's not implemented yet, but there may be some suggestions there that help you. Commented Dec 13, 2013 at 19:16
  • @TomAugspurger also this: github.com/matplotlib/matplotlib/pull/2643 Commented Dec 13, 2013 at 19:29
  • Oh geez, there's matplotlib.finance.candlestick: matplotlib.org/examples/pylab_examples/… Commented Dec 13, 2013 at 20:12
  • I simply want to graph a boxplot where each box is a row of my DataFrame. My first piece of code graphs a boxplot where each box is a column of my DataFrame. I want each row to be a box. Understand? I would think this should be quite easy; I'm just not able to figure out how. Do I need to transpose or unstack something? Commented Dec 13, 2013 at 21:53
  • A couple of points: 1) boxplots and candlestick graphs, despite their similar appearance, are conceptually quite different; 2) you actually are trying to make a candlestick graph; 3) even if you transposed your data with SP.T, the boxplot method would not produce what you want; and 4) the real challenge you face is figuring out how to take your DataFrame and turn it into a format that matplotlib.finance.candlestick can use. Commented Dec 13, 2013 at 23:16
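
To illustrate the third point in the last comment, here is a small sketch using one daily row copied from the question. Transposing does give one box per date, but the box edges are the quartiles of the four prices rather than the Open and Close values. The use of np.percentile to reproduce the box edges assumes Matplotlib's default linear-interpolation quartiles.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Second daily row from the question's SP.head() output
ohlc = pd.DataFrame(
    {"Open": [1132.66], "High": [1136.63], "Low": [1129.66], "Close": [1136.52]},
    index=["2010-01-05"],
)

# Transposing turns each date into a column, so boxplot() draws one box per day
fig, ax = plt.subplots()
ohlc.T.boxplot(ax=ax)

# The box edges are quartiles of the four prices, not the Open and Close values
q1, q3 = np.percentile(ohlc.iloc[0].to_numpy(), [25, 75])
print("box spans", q1, "to", q3)
print("candle body would span", ohlc["Open"].iloc[0], "to", ohlc["Close"].iloc[0])
```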

1 Answer

As I said in the comments, you don't really want boxplots. Instead you should be making a candlestick chart. Here's some code to get you started.

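import numpy as np
import pandas
import matplotlib.pyplot as plt
# Note: matplotlib.finance and pandas.io.data were removed from later releases
# (they moved to the separate mpl_finance and pandas-datareader packages).
from matplotlib.finance import candlestick
import matplotlib.dates as mdates
from pandas.io.data import DataReader

# get daily stock price data from yahoo finance for S&P500
SP = DataReader("^GSPC", "yahoo")
SP.reset_index(inplace=True)  # move the 'Date' index into a regular column
print(SP.columns)
# candlestick() needs matplotlib date numbers rather than Timestamps
SP['Date2'] = SP['Date'].apply(lambda date: mdates.date2num(date.to_pydatetime()))
fig, ax = plt.subplots()
# each quote row must be ordered (time, open, close, high, low)
csticks = candlestick(ax, SP[['Date2', 'Open', 'Close', 'High', 'Low']].values)
plt.show()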
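
If matplotlib.finance is not available in your installation, the same kind of chart can be drawn with plain Matplotlib calls. The following is only a sketch, not the library's candlestick implementation: plot_candles is a hypothetical helper name, the colors and width are arbitrary choices, and the data is the five rows shown in the question rather than a live download.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# The five daily rows shown in the question
SP = pd.DataFrame(
    {
        "Open": [1116.56, 1132.66, 1135.71, 1136.27, 1140.52],
        "High": [1133.87, 1136.63, 1139.19, 1142.46, 1145.39],
        "Low": [1116.56, 1129.66, 1133.95, 1131.32, 1136.22],
        "Close": [1132.99, 1136.52, 1137.14, 1141.69, 1144.98],
    },
    index=pd.to_datetime(
        ["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08"]
    ),
)


def plot_candles(ax, df, width=0.6):
    """Draw one candle per row of df (hypothetical helper, not a library function)."""
    x = mdates.date2num(df.index.to_pydatetime())
    up = (df["Close"] >= df["Open"]).to_numpy()
    colors = ["green" if is_up else "red" for is_up in up]
    # Wick: thin vertical line from Low to High
    ax.vlines(x, df["Low"].to_numpy(), df["High"].to_numpy(), colors="black", linewidth=1)
    # Body: bar spanning Open to Close, colored by direction
    bottom = df[["Open", "Close"]].min(axis=1).to_numpy()
    height = (df["Close"] - df["Open"]).abs().to_numpy()
    bars = ax.bar(x, height, width, bottom=bottom, color=colors, zorder=2)
    ax.xaxis_date()  # show the x positions as dates
    return bars


fig, ax = plt.subplots()
bars = plot_candles(ax, SP)
fig.autofmt_xdate()
# plt.show()  # uncomment when running interactively
```

Each row of the DataFrame becomes one candle, which is what the question asks for.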

4 Comments

Thanks Paul H. That helps. So the daily data from yahoo finance DataReader already has OHLC values for each day. I would rather not re-sample to get monthly or weekly OHLC data. How can I use SP from SP=DataReader("^GSPC", "yahoo") to plot daily candlestick charts? Thanks.
@brno792 sure can...see the edits (it's all the same code, just using a different dataframe)
Thanks Paul H. I get an error though on the candlestick() line: KeyError: "['open' 'close' 'high' 'low'] not in index".
@brno792 just make those match your column names
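
The KeyError in the comment above is what you would expect when lowercase column names are requested from a frame whose columns are capitalized, or the reverse (the resampled wk frame in the question uses lowercase names, while the DataReader frame uses 'Open', 'High', and so on). A small sketch of making the names match, using the first weekly row from the question; renaming with str.capitalize is just one possible way to do it:

```python
import pandas as pd

# First weekly row from the question's resampled output (lowercase column names)
wk = pd.DataFrame(
    {"open": [1132.99], "high": [1144.98], "low": [1132.99], "close": [1144.98]},
    index=pd.to_datetime(["2010-01-10"]),
)

wanted = ["Open", "Close", "High", "Low"]
try:
    wk[wanted]  # capitalized names are not present, so this raises KeyError
except KeyError as err:
    print("KeyError:", err)

# Either request the names the frame really has, or rename them to match
renamed = wk.rename(columns=str.capitalize)
quotes = renamed[wanted].to_numpy()
print(quotes)
```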
