ORACLE SQL select distinct not removing duplicates

Question

I have the following tables, in the format table_name[column1, column2, ...]:

VENDOR_ORDERS [ORDER_ID, ORDER_CREATION_DATETIME, REGION_ID, ZIP_CODE, AMOUNT]
CALENDAR [CALENDAR_WEEK, CALENDAR_DATE]

Basically, what I'm trying to achieve is a query that will give me:

the COUNT(ORDER_ID) and SUM(AMOUNT) per CALENDAR_WEEK for every REGION_ID and DISTINCT(ZIP_CODE)

So the results should look something like this:

ZIP_CODE    CALENDAR_WEEK    REGION_ID    COUNT(ORDER_ID)    SUM(AMOUNT)
-------------------------------------------------------------------------
XXXXX           01              1             50               987.45
YYYYY           01              1             25               568.32
ZZZZZ           01              1             30               555.63
MMMMM           01              1             10               099.93
XXXXX           15              1             05               999.34
YYYYY           15              1             32               339.67
ZZZZZ           15              1             21               457.23
MMMMM           15              1             88               459.99

I used the following code:

SELECT
    DISTINCT(vo.ZIP_CODE)
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,TRUNC(vo.ORDER_CREATION_DATETIME) -- this column is not needed, i just added it for visualization purposes
    ,vo.REGION_ID
    ,COUNT(vo.ORDER_ID)
    ,SUM(vo.AMOUNT)
FROM
    VENDOR_ORDERS vo
    ,CALENDAR ca
WHERE   
    TRUNC(vo.ORDER_CREATION_DATETIME) = ca.CALENDAR_DATE
    AND vo.REGION_ID = 1
GROUP BY
    vo.ZIP_CODE
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,vo.ORDER_CREATION_DATETIME
    ,vo.REGION_ID;

The problem is that I'm not getting a distinct ZIP_CODE per CALENDAR_WEEK: I get repeated ZIP_CODE values for the same CALENDAR_WEEK and the same REGION_ID, but with different COUNT(ORDER_ID) and SUM(AMOUNT).

I hope I made myself clear. Thanks in advance for the help.

distinct is NOT a function. It is always applied to all columns in the select list.
– user330315, Commented Mar 8, 2016 at 13:32
You cannot group by a date column and expect the results to be grouped by week! Remove it from the GROUP BY, and either remove it from the SELECT as well or select it with MIN or MAX.
– sagi, Commented Mar 8, 2016 at 13:35

Gordon Linoff · Accepted Answer · 2016-03-08 13:32:52Z

7

You misunderstand what distinct is. It is not a function. It is a modifier on select and it affects all columns being selected. So, it is behaving exactly as it should.
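To see this concretely, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle, and the `orders` table and its rows are made up for illustration. The parentheses around the first column are just an ordinary parenthesized expression, so both queries deduplicate on the whole row and return the same result.

```python
import sqlite3

# In-memory SQLite database; the table and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (zip_code TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("11111", 10.0), ("11111", 10.0), ("11111", 25.0), ("22222", 5.0)],
)

# DISTINCT written "like a function" around the first column...
with_parens = conn.execute(
    "SELECT DISTINCT (zip_code), amount FROM orders ORDER BY zip_code, amount"
).fetchall()

# ...and DISTINCT written as the plain modifier it really is.
without_parens = conn.execute(
    "SELECT DISTINCT zip_code, amount FROM orders ORDER BY zip_code, amount"
).fetchall()

print(with_parens)
print(without_parens)
```

Only the exact duplicate row is removed. Zip code 11111 still appears twice because its two remaining rows differ in `amount`.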

If you want aggregations by zip code and week, then those are the only two columns that should be in the group by:

SELECT vo.ZIP_CODE, TO_CHAR(ca.CALENDAR_WEEK),
       -- vo.REGION_ID
       COUNT(vo.ORDER_ID),
       SUM(vo.AMOUNT)
FROM VENDOR_ORDERS vo JOIN
     CALENDAR ca
     ON TRUNC(vo.ORDER_CREATION_DATETIME) = ca.CALENDAR_DATE
WHERE vo.REGION_ID = 1
GROUP BY vo.ZIP_CODE, TO_CHAR(ca.CALENDAR_WEEK)

You could probably include region_id as well, assuming that each zip code is in one region.
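As a rough check of the idea, the same shape of query can be run against a small made-up data set. This sketch uses Python's sqlite3 module rather than Oracle, so SQLite's `date()` stands in for Oracle's `TRUNC()` and `TO_CHAR` is dropped. The rows and week numbers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data mirroring the two tables from the question.
conn.executescript("""
CREATE TABLE vendor_orders (order_id INTEGER, order_creation_datetime TEXT,
                            region_id INTEGER, zip_code TEXT, amount REAL);
CREATE TABLE calendar (calendar_week INTEGER, calendar_date TEXT);
INSERT INTO calendar VALUES (1, '2016-01-04'), (1, '2016-01-05'), (2, '2016-01-11');
INSERT INTO vendor_orders VALUES
  (1, '2016-01-04 09:15:00', 1, 'XXXXX', 10.0),
  (2, '2016-01-04 17:40:00', 1, 'XXXXX', 20.0),
  (3, '2016-01-05 08:00:00', 1, 'XXXXX', 5.0),
  (4, '2016-01-05 12:30:00', 1, 'YYYYY', 7.5),
  (5, '2016-01-11 10:00:00', 1, 'XXXXX', 1.0);
""")

# Group only by zip code and week; date() plays the role of Oracle's TRUNC().
rows = conn.execute("""
    SELECT vo.zip_code, ca.calendar_week, COUNT(vo.order_id), SUM(vo.amount)
    FROM vendor_orders vo
    JOIN calendar ca ON date(vo.order_creation_datetime) = ca.calendar_date
    WHERE vo.region_id = 1
    GROUP BY vo.zip_code, ca.calendar_week
    ORDER BY vo.zip_code, ca.calendar_week
""").fetchall()

for row in rows:
    print(row)
```

Each zip code appears once per week, with all of that week's orders counted and summed in the single row.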

2 Comments

nachomasterCR Over a year ago

One last question: I followed your advice but kept the DISTINCT, and it seems I got exactly what I wanted. I see you removed it in your suggestion; is there much of a difference?

Gordon Linoff Over a year ago

There is almost never a need to use select distinct with a group by query. In the very rare circumstance that you need it, you'll understand SQL well enough to use it. So, don't include it because it is misleading.

Thomas G · Accepted Answer · 2016-03-08 13:48:40Z

Your DISTINCT serves no purpose in this query: it is applied to all columns, not to ZIP_CODE only as you think. Think about it: if several rows share a ZIP_CODE but have different values in the other columns, how would Oracle know which one to return?

Additionally, it is useless to specify DISTINCT because you are doing a GROUP BY, which already achieves the same result.

And last but not least, you're wrong when you say this in your comments:

-- this column is not needed, i just added it for visualization

You need it in your SELECT because it is an essential field of your GROUP BY.

Without seeing a data sample I can't say for certain, but your issue is probably that you apply TRUNC to the datetime field in your SELECT but not in your GROUP BY clause. The SELECT shows you a truncated date, so you assume the GROUP BY also worked on the date, but that is not the case: it grouped on the full date and time.
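The effect can be reproduced with a small sketch. It assumes SQLite (via Python's sqlite3 module) in place of Oracle, with `date()` playing the role of `TRUNC()`, and it uses two invented orders placed on the same day at different times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendor_orders "
    "(zip_code TEXT, order_creation_datetime TEXT, amount REAL)"
)
# Hypothetical data: same zip code, same day, different times.
conn.executemany(
    "INSERT INTO vendor_orders VALUES (?, ?, ?)",
    [("XXXXX", "2016-03-08 09:15:00", 10.0),
     ("XXXXX", "2016-03-08 17:40:00", 20.0)],
)

# Truncated date in the SELECT, but the raw datetime in the GROUP BY:
# the output looks like duplicate rows for the same day.
by_datetime = conn.execute("""
    SELECT zip_code, date(order_creation_datetime), COUNT(*), SUM(amount)
    FROM vendor_orders
    GROUP BY zip_code, order_creation_datetime
    ORDER BY order_creation_datetime
""").fetchall()

# Truncated date in both places: one row per zip code and day.
by_date = conn.execute("""
    SELECT zip_code, date(order_creation_datetime), COUNT(*), SUM(amount)
    FROM vendor_orders
    GROUP BY zip_code, date(order_creation_datetime)
""").fetchall()

print(by_datetime)
print(by_date)
```

The first query returns two rows that display the same day with different counts and sums, which matches the symptom described in the question. The second query collapses them into one row.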

To understand your issue, do:

SELECT
    DISTINCT(vo.ZIP_CODE)
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,vo.ORDER_CREATION_DATETIME 
    ,vo.REGION_ID
    ,COUNT(vo.ORDER_ID)
    ,SUM(vo.AMOUNT)
FROM
    VENDOR_ORDERS vo
    ,CALENDAR ca
WHERE   
    TRUNC(vo.ORDER_CREATION_DATETIME) = ca.CALENDAR_DATE
    AND vo.REGION_ID = 1
GROUP BY
    vo.ZIP_CODE
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,vo.ORDER_CREATION_DATETIME
    ,vo.REGION_ID;

To fix your issue, do:

SELECT
    DISTINCT(vo.ZIP_CODE)
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,TRUNC(vo.ORDER_CREATION_DATETIME) 
    ,vo.REGION_ID
    ,COUNT(vo.ORDER_ID)
    ,SUM(vo.AMOUNT)
FROM
    VENDOR_ORDERS vo
    ,CALENDAR ca
WHERE   
    TRUNC(vo.ORDER_CREATION_DATETIME) = ca.CALENDAR_DATE
    AND vo.REGION_ID = 1
GROUP BY
    vo.ZIP_CODE
    ,TO_CHAR(ca.CALENDAR_WEEK)
    ,TRUNC(vo.ORDER_CREATION_DATETIME)
    ,vo.REGION_ID;
