I have a table with histogram-type data. It has two columns: Bucket and Count.

Bucket is the histogram bucket and Count is the number of values in that bucket.

My buckets are ordered. For example, say the bucket indicates how many minutes a task took to complete; we could have buckets such as 0-5 minutes, 5-10 minutes, 10-15 minutes, etc.

What I'm trying to compute is which bucket the XXth percentile falls in. For example, if 90% of tasks complete within 12 minutes, then I want to know that 90% of tasks are in the 10-15 bucket or lower.

As an example, say I have the following table:

Bucket | Count
--------------
  0    | 10
  1    | 15
  2    | 5
  3    | 15

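To compute the 60th percentile, I'd calculate

(10+15+5+15) * 0.60 = 27

The cumulative counts are 10, 25, 30, 45, so the result would be bucket 2: it is the first bucket whose cumulative count reaches 27, meaning 60% of all entries are in bucket 2 or lower.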

Is there a way to compute this in SQL?

Thanks!

4
  • What is a "histogram" data type? Commented Apr 20, 2012 at 18:00
  • The data in my table is formatted to optimize histogram views. As I said above, I have a Bucket column, and a Count column for the number of "tasks" that fall into that bucket. Commented Apr 20, 2012 at 18:03
  • Can you show the CREATE TABLE and sample insert statements, so that those of us not fluent in histograms can piece together what you're talking about? Commented Apr 20, 2012 at 18:06
  • Which SQL Server version? SELECT @@VERSION Commented Apr 20, 2012 at 19:10

2 Answers

0

Note: COUNT is the name of a built-in aggregate function (and a reserved word in standard SQL), so you may want to name the column valueCount instead.

The query should look something like this (assuming your table is called histogramTable):

-- Cumulative share (in percent) of all values at or below each bucket.
SELECT ot.bucket,
    (SELECT SUM(in1.valueCount)
        FROM histogramTable AS in1
        WHERE in1.bucket <= ot.bucket
    ) * 100 / (
        SELECT SUM(in2.valueCount)
        FROM histogramTable AS in2
    ) AS Percentile
FROM histogramTable AS ot

Note that I didn't convert any values to DECIMAL, so the division is integer division and the Percentile column is truncated to a whole number.
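To try the query above against the sample data from the question, a setup along these lines should work. This is only a sketch: the table name histogramTable and the column name valueCount are the assumptions made in this answer, and the INT column types are my own guess, since the question does not show a CREATE TABLE.

```sql
-- Hypothetical schema; names follow the assumptions in this answer.
CREATE TABLE histogramTable (
    bucket     INT NOT NULL PRIMARY KEY,
    valueCount INT NOT NULL
);

-- Sample rows from the question.
INSERT INTO histogramTable (bucket, valueCount) VALUES (0, 10);
INSERT INTO histogramTable (bucket, valueCount) VALUES (1, 15);
INSERT INTO histogramTable (bucket, valueCount) VALUES (2, 5);
INSERT INTO histogramTable (bucket, valueCount) VALUES (3, 15);
```

With these rows the total is 45, so the integer-truncated cumulative percentages for buckets 0 to 3 come out as 22, 55, 66 and 100.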

Then let's say you need the bucket representing Percentile 80:

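DECLARE @Percentile AS INT
SET @Percentile = 80

-- Return the lowest bucket whose cumulative percentage reaches the target.
SELECT TOP 1 h.bucket FROM (
    SELECT ot.bucket,
        (SELECT SUM(in1.valueCount)
            FROM histogramTable AS in1
            WHERE in1.bucket <= ot.bucket
        ) * 100 / (
            SELECT SUM(in2.valueCount)
            FROM histogramTable AS in2
        ) AS Percentile
    FROM histogramTable AS ot
) AS h
-- >= rather than >, so a bucket that lands exactly on the target is not skipped.
WHERE h.Percentile >= @Percentile
-- The cumulative percentage never decreases with bucket, so this picks the first match.
ORDER BY h.bucket;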
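The correlated subquery rescans the table once per bucket, which can get slow on larger tables. If you are on SQL Server 2012 or later (an assumption, since the version was never confirmed), a windowed SUM can produce the running total in a single pass. A minimal sketch, again using the assumed histogramTable and valueCount names and assuming bucket values are unique:

```sql
-- Requires SQL Server 2012+ for SUM(...) OVER (ORDER BY ...).
DECLARE @Percentile AS INT = 80;

WITH running AS (
    SELECT bucket,
           -- Running total of counts up to and including this bucket.
           SUM(valueCount) OVER (ORDER BY bucket
                                 ROWS UNBOUNDED PRECEDING) AS cumulativeCount,
           -- Grand total, repeated on every row.
           SUM(valueCount) OVER () AS totalCount
    FROM histogramTable
)
SELECT TOP 1 bucket
FROM running
-- 100.0 forces decimal arithmetic instead of integer division.
WHERE cumulativeCount * 100.0 / totalCount >= @Percentile
ORDER BY bucket;
```

I haven't benchmarked this against the subquery version, so treat the performance benefit as something to verify on your own data.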

6 Comments

I just saw your edited question, so give me a few minutes to adapt my answer to fit your needs. :)
Your suggestion led to the following, which works but runs extremely slowly: SELECT bucket, (SELECT SUM(valuecount) FROM mytable AS in1 WHERE in1.bucket <= ot.bucket) * 100 / (SELECT SUM(COUNT) FROM mytable AS in1) FROM mytable AS ot GROUP BY bucket ORDER BY bucket Is there a way to do cumulative sums in SQL?
SELECT (10+15+5+15)*.75 AS A, (10+15+5+15)*75/100 AS B
@BogdanSahlean : Were you serious with your comment? If so, it's a database and he can't predict the values the table will contain, neither the number of values.
@FrancisP: My comment refers to arithmetic calculations: 10/4 <> 10/4.0 (classicasp.aspfaq.com/general/why-does-4/5-0.html).
0

Starting with SQL Server 2012, there are now SQL standard PERCENTILE_DISC and PERCENTILE_CONT inverse distribution functions, which can be used for this purpose. Unfortunately, thus far, SQL Server implements them only as window functions, not as aggregate functions.

They are not very useful on the data set that you've shown (which seems pre-aggregated), but they would definitely help on the source data set, where you could simply calculate:

SELECT DISTINCT percentile_disc(0.6) WITHIN GROUP (ORDER BY bucket) OVER ()
FROM t
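If only the pre-aggregated table is available, one possible workaround is to expand each bucket back into one row per counted value and apply the function to that. The sketch below is an illustration, not a recommendation: it assumes a table histogramTable(bucket, valueCount) with integer counts (hypothetical names, not from the question) and it generates as many rows as the total count, so it is likely to be impractical for large histograms.

```sql
-- Requires SQL Server 2012+ for PERCENTILE_DISC.
WITH expanded AS (
    -- Anchor: one row per non-empty bucket.
    SELECT bucket, valueCount, 1 AS n
    FROM histogramTable
    WHERE valueCount > 0
    UNION ALL
    -- Recursive step: repeat each bucket until it appears valueCount times.
    SELECT bucket, valueCount, n + 1
    FROM expanded
    WHERE n < valueCount
)
SELECT DISTINCT
       PERCENTILE_DISC(0.6) WITHIN GROUP (ORDER BY bucket) OVER () AS p60
FROM expanded
-- Lift the default recursion limit of 100 for buckets with larger counts.
OPTION (MAXRECURSION 0);
```

For the sample data in the question (counts 10, 15, 5, 15), the cumulative distribution first reaches 0.6 at bucket 2, which matches the result the question expects.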

I have blogged about percentiles in SQL here, in more detail.
