I have a table (dataset_final) that contains the number of sales (field quantity) of each good in a particular store for a particular week of the year. There are about 200 thousand unique goods, about 50 stores, and the data covers 6 years.

dataset_final

    +---------+-------------+---------+----------+----------+
    | year_id | week_number | good_id | store_id | quantity |
    +---------+-------------+---------+----------+----------+
    | 2017    | 37          | 137233  | 9        | 1        |
    +---------+-------------+---------+----------+----------+
    | 2017    | 38          | 137233  | 9        | 4        |
    +---------+-------------+---------+----------+----------+
    | 2017    | 40          | 137233  | 9        | 3        |
    +---------+-------------+---------+----------+----------+
    | 2016    | 35          | 152501  | 23       | 6        |
    +---------+-------------+---------+----------+----------+
    | 2016    | 37          | 152501  | 23       | 3        |
    +---------+-------------+---------+----------+----------+

I would like to fill in the missing values with zero, i.e. add a row with quantity 0 whenever a good-store combination had no sales in a given week of the year. For example:

+---------+-------------+---------+----------+----------+
| year_id | week_number | good_id | store_id | quantity |
+---------+-------------+---------+----------+----------+
| 2017    | 37          | 137233  | 9        | 1        |
+---------+-------------+---------+----------+----------+
| 2017    | 38          | 137233  | 9        | 4        |
+---------+-------------+---------+----------+----------+
| 2017    | 40          | 137233  | 9        | 3        |
+---------+-------------+---------+----------+----------+
| 2016    | 35          | 152501  | 23       | 6        |
+---------+-------------+---------+----------+----------+
| 2016    | 37          | 152501  | 23       | 3        |
+---------+-------------+---------+----------+----------+
| 2017    | 39          | 137233  | 9        | 0        |
+---------+-------------+---------+----------+----------+
| 2016    | 36          | 152501  | 23       | 0        |
+---------+-------------+---------+----------+----------+

I wanted to do this: find all unique combinations of year_id, week_number, good_id, store_id and add only those that are not in the dataset_final table. My query:

WITH t1 AS (SELECT  DISTINCT 
       [year_id]
      ,[week_number]
      ,[good_id]
      ,[store_id]

FROM [fs_db].[dbo].[ds_dataset_final]),

t2 AS (SELECT  DISTINCT [year_id], [week_number] FROM [fs_db].[dbo].[ds_dataset_final])

SELECT t2.[year_id], t2.[week_number],  t1.[good_id], t1. [store_id] FROM t1

full join t2 ON t2.[year_id]=t1.[year_id]  AND t2.[week_number]=t2.[week_number]

This query produces about 1.2 billion unique combinations, which seems like too many.

Also, I only need combinations from the start of a good's sales onward. For example, if the table has sales of a particular product only from 2017, then I do not need to fill in earlier data.

  • What is your MSSQL Version? Commented Jul 14, 2019 at 10:09
  • You'll need a calendar table, or similar. Do you have a table with a list of your stores, and relevant weeks and years? You have different values for goods_id too. Are you after one row for every year, month, store and good? Commented Jul 14, 2019 at 10:09
  • My version is MSSQL 2017 Commented Jul 14, 2019 at 10:10
  • I have only a table with a list of stores and goods. Yes, the values for good_id are different. Commented Jul 14, 2019 at 10:12

2 Answers

The basic idea is to generate all the rows using cross join and then use left join to bring in the values.

Assuming you have all year/week combinations in your original table and have all the goods and stores in the table, you can use:

select yw.year_id, yw.week_number,
       g.good_id, s.store_id,
       coalesce(d.quantity, 0) as quantity
from (select distinct year_id, week_number
      from fs_db..ds_dataset_final
     ) yw cross join
     (select distinct good_id
      from fs_db..ds_dataset_final
     ) g cross join
     (select distinct store_id
      from fs_db..ds_dataset_final
     ) s left join
     fs_db..ds_dataset_final d
     on d.year_id = yw.year_id and
        d.week_number = yw.week_number and
        d.good_id = g.good_id and
        d.store_id = s.store_id;

You may have other sources for each of the dimensions (such as a proper dimension table). If so, don't use select distinct but use the reference tables.

EDIT:

Just add this as the last line of the query:

where yw.year_id >= 2015 and yw.year_id < 2019

if you want the years 2015, 2016, 2017, and 2018.
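A year filter like this applies to every good-store combination at once. If the start should instead differ per combination (as asked in the comments), one possible sketch is to compute each pair's first sale week and join the weeks against it. This assumes year_id and week_number are integers, that week_number never exceeds 99, and that the table has at most one row per year, week, good, and store; first_sale and first_yw are names made up for illustration. It also generates rows only for good-store pairs that have at least one sale, which should be far fewer than the full cross join (200 thousand goods × 50 stores × roughly 312 weeks is on the order of 3 billion rows).

```sql
-- Sketch: one row per (good, store) pair that has at least one sale,
-- for every week on or after that pair's first recorded sale.
with first_sale as (
    -- Encode year and week as one sortable number, e.g. 2017 week 37 -> 201737.
    select good_id, store_id,
           min(year_id * 100 + week_number) as first_yw
    from fs_db..ds_dataset_final
    group by good_id, store_id
),
yw as (
    select distinct year_id, week_number
    from fs_db..ds_dataset_final
)
select yw.year_id, yw.week_number,
       f.good_id, f.store_id,
       coalesce(d.quantity, 0) as quantity
from first_sale f join
     yw
     on yw.year_id * 100 + yw.week_number >= f.first_yw left join
     fs_db..ds_dataset_final d
     on d.year_id = yw.year_id and
        d.week_number = yw.week_number and
        d.good_id = f.good_id and
        d.store_id = f.store_id;
```

For good 137233 in store 9 from the question, first_yw would be 201737, so week 39 of 2017 appears with quantity 0 while nothing before week 37 of 2017 is generated.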

8 Comments

Thanks, it's working! But I would like to insert into the table only the data with the beginning of sales. For example, a combination of good_id (152501) and store_id (23) has sales only since the beginning of 2016, but the table has data on sales of other products since 2013, @gordon-linoff
@Rabbit . . . Filter on the dates you want in the subquery for yw or using yw in an outer where clause.
I don't quite understand how this can be done for all good-store combinations at once...
@Rabbit . . . Any filter on the time period using yw applies to all the good-store combinations in the result set.
...Can you write an example of such a query?

This is very much pseudo-SQL, since I don't know what your actual database looks like; it should, however, get you on the right path. You'll need to replace objects like dbo.Store with your actual objects, and I suggest creating a proper calendar table:

--This should really be a full calendar table, but we're making a sample here
CREATE TABLE dbo.Weeks (Year int,
                        Week int);

INSERT INTO dbo.Weeks (Year, Week)
SELECT Y.Year,
       W.Week
FROM (VALUES(2016),(2017),(2018),(2019))Y(Year)
     CROSS APPLY (SELECT TOP 52 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Week
                  FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N1(N),
                       (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N2(N)) W

GO

WITH CTE AS(
    SELECT W.Year,
           W.Week,
           S.StoreID,
           G.GoodsID
    FROM dbo.Weeks W
         CROSS JOIN dbo.Store S
         CROSS JOIN dbo.Goods G
   WHERE EXISTS (SELECT 1
                 FROM dbo.YourTable YT
                 WHERE YT.year_id <= W.Year
                   AND YT.store_id = S.StoreID
                   AND YT.good_id = G.GoodsID)) --match the good too, so the start applies per good-store pair
SELECT C.Year,
       C.Week,
       C.StoreID,
       C.GoodsID,
       ISNULL(YT.quantity,0) AS quantity
FROM CTE C
     LEFT JOIN YourTable YT ON C.Year = YT.year_id
                           AND C.Week = YT.week_number
                           AND C.StoreID = YT.store_id
                           AND C.GoodsID = YT.good_id
--WHERE?
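
If the goal is to add the missing rows to the table itself rather than just select them, a NOT EXISTS check can restrict the insert to combinations that are not already present. The following is only a sketch using the placeholder objects above (dbo.Weeks, dbo.YourTable); FirstSale and FirstYearWeek are hypothetical names, and it assumes integer year and week columns with week numbers below 100.

```sql
-- Sketch: insert zero-quantity rows for weeks with no sale, per good-store pair,
-- starting from that pair's first sale week.
WITH FirstSale AS (
    SELECT YT.good_id,
           YT.store_id,
           -- 2017 week 37 becomes 201737, so MIN finds the earliest year/week.
           MIN(YT.year_id * 100 + YT.week_number) AS FirstYearWeek
    FROM dbo.YourTable YT
    GROUP BY YT.good_id, YT.store_id)
INSERT INTO dbo.YourTable (year_id, week_number, good_id, store_id, quantity)
SELECT W.Year,
       W.Week,
       FS.good_id,
       FS.store_id,
       0
FROM FirstSale FS
     JOIN dbo.Weeks W ON W.Year * 100 + W.Week >= FS.FirstYearWeek
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.YourTable YT
                  WHERE YT.year_id = W.Year
                    AND YT.week_number = W.Week
                    AND YT.good_id = FS.good_id
                    AND YT.store_id = FS.store_id);
```

Because the sample dbo.Weeks table holds every week of 2019, this would also insert rows for weeks that have not happened yet; an upper bound on W.Year and W.Week in the WHERE clause is probably wanted before running it for real.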

5 Comments

Will this query eventually generate all unique combinations of year_id, week_number, good_id, store_id based on my tables?
Why don't you try and find out, @Rabbit? What do you think the CTE is doing?
Had a "brain fart" on what I'd called/aliased things though. Coffee clearly not kicked in this morning.
I think this query will generate unique combinations) But I would like to insert into the table only the data with the beginning of sales. For example, a combination of good_id (152501) and store_id (23) has sales only since the beginning of 2016, but the table has data on sales of other products since 2013.
Added an EXISTS clause for you, @Rabbit
