For the following code, I expect each number (1-25) to appear 4 times, under 4 different percentile values. However, at the 28th and 56th percentiles the results are not as expected: the 28th percentile should be 7 and the 56th percentile should be 14.

> quantile(1:25, seq(0,1,0.01), type=1)
  0%   1%   2%   3%   4%   5%   6%   7%   8%   9%  10%  11%  12%  13%  14%  15%  16%  17%  18%  19%  20%  21%  22%  23%  24%  25%  26%  27%  28% 
   1    1    1    1    1    2    2    2    2    3    3    3    3    4    4    4    4    5    5    5    5    6    6    6    6    7    7    7    8 
 29%  30%  31%  32%  33%  34%  35%  36%  37%  38%  39%  40%  41%  42%  43%  44%  45%  46%  47%  48%  49%  50%  51%  52%  53%  54%  55%  56%  57% 
   8    8    8    8    9    9    9    9   10   10   10   10   11   11   11   11   12   12   12   12   13   13   13   13   14   14   14   15   15 
 58%  59%  60%  61%  62%  63%  64%  65%  66%  67%  68%  69%  70%  71%  72%  73%  74%  75%  76%  77%  78%  79%  80%  81%  82%  83%  84%  85%  86% 
  15   15   15   16   16   16   16   17   17   17   17   18   18   18   18   19   19   19   19   20   20   20   20   21   21   21   21   22   22 
 87%  88%  89%  90%  91%  92%  93%  94%  95%  96%  97%  98%  99% 100% 
  22   22   23   23   23   23   24   24   24   24   25   25   25   25 

With a vector of a different size, the same percentile values (and more) come out wrong.

Another example where the 7th, 14th, 28th, 55th and 56th percentile values are wrong:

> quantile(1:100, seq(0,1,0.01), type=1)
  0%   1%   2%   3%   4%   5%   6%   7%   8%   9%  10%  11%  12%  13%  14%  15%  16%  17%  18%  19%  20%  21%  22%  23%  24%  25%  26%  27%  28% 
   1    1    2    3    4    5    6    8    8    9   10   11   12   13   15   15   16   17   18   19   20   21   22   23   24   25   26   27   29 
 29%  30%  31%  32%  33%  34%  35%  36%  37%  38%  39%  40%  41%  42%  43%  44%  45%  46%  47%  48%  49%  50%  51%  52%  53%  54%  55%  56%  57% 
  29   30   31   32   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   56   57   58 
 58%  59%  60%  61%  62%  63%  64%  65%  66%  67%  68%  69%  70%  71%  72%  73%  74%  75%  76%  77%  78%  79%  80%  81%  82%  83%  84%  85%  86% 
  58   59   60   61   62   63   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80   81   82   83   84   85   86 
 87%  88%  89%  90%  91%  92%  93%  94%  95%  96%  97%  98%  99% 100% 
  87   88   89   90   91   92   93   94   95   96   97   98   99  100 

What is the reason for this? Is this a bug?

2
  • This might be simpler to see with quantile(1:5, seq(0,1,0.05), type=1) Commented Mar 7, 2017 at 21:36
  • Isn't 1 appearing 5 times in your first result? Commented Mar 7, 2017 at 22:39

2 Answers

1

I think you are wrong in saying the answers are "not correct." Remember that this is based on the ECDF, meaning it is an empirical result, not a theoretical one. The 9 different types only differ in what to do when the empirical result can't be used directly and some rule (interpolation, for example) is needed. Using your first example, if we take the integers from 1 to 25 and calculate a conventional cumulative distribution, we get:

Values Freq Percent Cum. Percent  
 1      1    4       4           
 2      1    4       8           
 3      1    4       12          
 4      1    4       16          
 5      1    4       20          
 6      1    4       24          
 7      1    4       28          
 8      1    4       32          
 9      1    4       36          
 10     1    4       40          
 11     1    4       44          
 12     1    4       48          
 13     1    4       52          
 14     1    4       56          
 15     1    4       60          
 16     1    4       64          
 17     1    4       68          
 18     1    4       72          
 19     1    4       76          
 20     1    4       80          
 21     1    4       84          
 22     1    4       88          
 23     1    4       92          
 24     1    4       96          
 25     1    4       100 
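
The table above can be reproduced in R with a short sketch. The column names are chosen here to mirror the table, and tab is just an illustrative variable name.

```r
y <- 1:25
freq <- as.vector(table(y))          # each value occurs exactly once
percent <- 100 * freq / length(y)    # so each value carries 4 percent
tab <- data.frame(Values = sort(unique(y)),
                  Freq = freq,
                  Percent = percent,
                  Cum.Percent = cumsum(percent))
tab[c(7, 14), ]   # the rows for 7 and 14 end at 28 and 56 percent
```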

So each observed value represents 4 percent of your sample. What do 28 and 56 have in common? They are both multiples of 4 and they are the only two places where they also represent the lower boundary of the "quantile range" for a value. So they get pushed up. I think Type 2 actually best illustrates this.

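y <- 1:25  # the vector from the first example in the question
probs <- seq(0, 1, 0.01)
a <- quantile(y, probs, type = 1)
b <- quantile(y, probs, type = 2)
c <- quantile(y, probs, type = 3)
quantiles <- data.frame(a, b, c)
quantiles[1:34, ]  # rows 0% through 33%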

      a    b  c
0%    1  1.0  1
1%    1  1.0  1
2%    1  1.0  1
3%    1  1.0  1
4%    1  1.5  1
5%    2  2.0  1
6%    2  2.0  2
7%    2  2.0  2
8%    2  2.5  2
9%    3  3.0  2
10%   3  3.0  2
11%   3  3.0  3
12%   3  3.5  3
13%   4  4.0  3
14%   4  4.0  4
15%   4  4.0  4
16%   4  4.5  4
17%   5  5.0  4
18%   5  5.0  4
19%   5  5.0  5
20%   5  5.5  5
21%   6  6.0  5
22%   6  6.0  6
23%   6  6.0  6
24%   6  6.5  6
25%   7  7.0  6
26%   7  7.0  6
27%   7  7.0  7
28%   8  8.0  7
29%   8  8.0  7
30%   8  8.0  8
31%   8  8.0  8
32%   8  8.5  8
33%   9  9.0  8

Type 3 is out on a different planet with the odd/even thing.
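
A separate check that may be worth running next to the boundary argument above is plain floating-point arithmetic: 0.28 and 0.56 are not exactly representable in binary, and multiplying them by 25 lands slightly above 7 and 14. The sketch below assumes a hand-rolled inverse-ECDF rule (take the ceiling of n * p). inv_ecdf and its tol argument are hypothetical names for illustration, not R's internal quantile code, which may handle this differently depending on the R version.

```r
# Products that look like exactly 7 and 14 come out slightly larger in doubles.
print(c(0.28, 0.56) * 25, digits = 17)
c(0.28, 0.56) * 25 > c(7, 14)   # TRUE TRUE

# Hypothetical helper (not R's internal code): inverse ECDF via ceiling(n * p).
# tol is an assumed tolerance that absorbs rounding error before the ceiling.
inv_ecdf <- function(x, p, tol = 0) {
  x <- sort(x)
  n <- length(x)
  j <- ceiling(n * p - tol)  # smallest index j with j / n >= p
  x[pmax(j, 1)]              # p = 0 maps to the minimum
}

inv_ecdf(1:25, c(0.28, 0.56))              # 8 15, the values reported in the question
inv_ecdf(1:25, c(0.28, 0.56), tol = 1e-9)  # 7 14
```

If rounding is the cause, percentiles such as 24% or 32%, whose products with 25 round to exact integers, would be unaffected, which is consistent with the output in the question.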

2 Comments

I don't understand what you mean by saying that these are the only places where they represent the lower boundary of the quantile range. Doesn't every percentile that is a multiple of 4 represent the lower boundary of a quantile range?
No, sometimes they are in the middle or at the top. Look at 4, 8 and 12, for example.
0

There are different methods for calculating quantiles which can give slightly different answers. You are using type 1. Types 3 or 4 give the answers that you are expecting.

See ?quantile for the details.

> quantile(1:25, seq(0, 1, 0.01), type = 4)[29]
28% 
  7 
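
To compare the types side by side at the two percentiles in question, a sketch like the following may help. No expected output is shown for the discontinuous types, because their results may depend on the R version and platform, as the comments below suggest. by_type is an illustrative name.

```r
# Evaluate all nine quantile types at the 28th and 56th percentiles of 1:25.
x <- 1:25
probs <- c(0.28, 0.56)
by_type <- sapply(1:9, function(t) quantile(x, probs, type = t))
colnames(by_type) <- paste0("type", 1:9)
by_type   # rows are "28%" and "56%", one column per type
```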

3 Comments

They don't seem to give me the right answers. Type 3 gives the same as Type 1, and type 2 interpolates: quantile(1:5, seq(0,1,0.05), type = 3)
What I am seeing is that type 3 and 4 are providing the same results for the first example. See this: quantile(1:25, seq(0,1,0.01), type=3)
Edited answer with output from my system. Perhaps it is platform-dependent.
