matplotlib-devel Mailing List for matplotlib


On Tue, Dec 15, 2009 at 9:57 AM, Andrew Straw <str...@as...> wrote:
>
>   notch_max = med + 1.57*iq/np.sqrt(row)
>   notch_min = med - 1.57*iq/np.sqrt(row)
>
> Is this code actually calculating a meaningful value? If so, what?
>

From the statistics ignoramus in the room, so take this with a grain
of salt...  I'd write that code as

notch_max = med + (iq/2) * (np.pi/np.sqrt(row))

and it makes more sense.  The notch limits are an estimate of the
confidence interval of the median: half of the q3-q1 range (one half
each for up and down) times a normalization factor pi/sqrt(n), where
n == row == len(d).  The 1/sqrt(n) makes some sense, as it's the usual
statistical error normalization factor.  I'm not so sure about the
multiplication by pi: I can't find that exact formula in any quick
stats reference, but I'm sure someone who actually knows stats can
point out where it comes from.
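
To make the comparison concrete, here is a minimal sketch of both
forms of the notch half-width.  The function name notch_limits and the
use of np.percentile for q1/med/q3 are my assumptions for illustration,
not necessarily what the boxplot code does; the point is only that
1.57 is numerically very close to pi/2.

```python
import numpy as np

def notch_limits(d):
    """Return (notch_min, notch_max) for 1-D data d, with no clipping.

    Hypothetical helper: quartiles come from np.percentile, which may
    differ from the quartile rule used inside boxplot itself.
    """
    d = np.asarray(d, dtype=float)
    n = d.size                                  # called `row` in the quoted code
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    iq = q3 - q1                                # interquartile range
    half_width = (iq / 2) * (np.pi / np.sqrt(n))
    return med - half_width, med + half_width

d = np.arange(1, 102)                           # 101 points, median 51
lo, hi = notch_limits(d)
iq = np.percentile(d, 75) - np.percentile(d, 25)
# Half-width from the quoted code, using the literal 1.57 factor.
quoted_half_width = 1.57 * iq / np.sqrt(d.size)
print((hi - lo) / 2, quoted_half_width)
```

The two half-widths differ only because 1.57 is a rounded stand-in for
pi/2 in this reading; whether pi/2 is really where the constant comes
from is exactly the open question above.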

Note that the code below does:

                if notch_max > q3:
                    notch_max = q3
                if notch_min < q1:
                    notch_min = q1

though the MATLAB documentation explicitly states in:

http://www.mathworks.com/access/helpdesk/help/toolbox/stats/boxplot.html

that

"""
Interval endpoints are the extremes of the notches or the centers of
the triangular markers. When the sample size is small, notches may
extend beyond the end of the box.
"""

So it seems to me that the more principled thing to do would be to
leave those notch markers outside the box if they land there, because
that's a warning about the robustness of the estimate. Clipping them
to q1/q3 is effectively hiding a problem...
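
A small sketch of what the clipping does for a tiny sample.  The
helper notch_limits and its clip flag are hypothetical names of mine,
and the quartiles come from np.percentile, so treat this as an
illustration of the effect rather than the actual boxplot code.

```python
import numpy as np

def notch_limits(d, clip=False):
    """Return (q1, q3, notch_min, notch_max) for 1-D data d.

    With clip=True the notch limits are clamped to [q1, q3], mirroring
    the if-statements quoted above.
    """
    d = np.asarray(d, dtype=float)
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    iq = q3 - q1
    notch_max = med + 1.57 * iq / np.sqrt(d.size)
    notch_min = med - 1.57 * iq / np.sqrt(d.size)
    if clip:
        notch_max = min(notch_max, q3)
        notch_min = max(notch_min, q1)
    return q1, q3, notch_min, notch_max

small = [1, 2, 3, 4, 5]                         # only five points
q1, q3, lo, hi = notch_limits(small)
_, _, lo_clipped, hi_clipped = notch_limits(small, clip=True)
print("box:", (q1, q3))
print("unclipped notches:", (lo, hi))           # extend past the box
print("clipped notches:  ", (lo_clipped, hi_clipped))
```

With five points the unclipped notches land outside [q1, q3]; after
clipping they coincide with the box edges, so the plot no longer shows
that the median estimate is shaky.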

cheers,

f
