3

How can you join between a table with a sparse number of dates and another table with an exhaustive number of dates such that the gaps between the sparse dates take the values of the previous sparse date?

Illustrative example:

PRICE table (sparse dates):
date        itemid  price
2008-12-04  1       $1
2008-12-11  1       $3
2008-12-15  1       $7


VOLUME table (exhaustive dates):
date         itemid  volume_amt
2008-12-04   1       12345
2008-12-05   1       23456
2008-12-08   1       34567
2008-12-09   1       ...
2008-12-10   1
2008-12-11   1
2008-12-12   1
2008-12-15   1
2008-12-16   1
2008-12-17   1
2008-12-18   1

Desired result:

date       price  volume_amt
2008-12-04 $1     12345
2008-12-05 $1     23456
2008-12-08 $1     34567
2008-12-09 $1     ...
2008-12-10 $1
2008-12-11 $3
2008-12-12 $3
2008-12-15 $7
2008-12-16 $7
2008-12-17 $7
2008-12-18 $7

Update:

A couple of people have suggested a correlated subquery that accomplishes the desired result. (Correlated subquery = a subquery that contains a reference to the outer query.)

This will work; however, I should have noted that the platform I'm using is MySQL, for which correlated subqueries are poorly optimized. Any way to do it without using a correlated subquery?

4
  • Can you edit the title or delete and start over? Commented Dec 18, 2008 at 17:23
  • looks like someone else did it. yeah, wasn't paying attention there. Commented Dec 18, 2008 at 17:26
  • needs column names of the input and the desired output. Right now the phrasing of the question is rather ambiguous Commented Dec 18, 2008 at 17:26
  • added column names, desired output was already present Commented Dec 18, 2008 at 17:32

5 Answers

6

This isn't as simple as a single LEFT OUTER JOIN to the sparse table, because you want the NULLs left by the outer join to be filled with the most recent price.

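-- Prefix the statement with EXPLAIN to see the query plan instead of the rows.
SELECT v.`date`, v.volume_amt, p1.item_id, p1.price
FROM Volume v
-- p1: every price row for the item dated on or before the volume row
JOIN Price p1
  ON (v.`date` >= p1.`date` AND v.item_id = p1.item_id)
-- p2: any later price row that is still on or before the volume row
LEFT OUTER JOIN Price p2
  ON (v.`date` >= p2.`date` AND v.item_id = p2.item_id
    AND p1.`date` < p2.`date`)
-- keep p1 only when no such later row exists, i.e. p1 is the most recent price
WHERE p2.item_id IS NULL;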

This query matches each Volume row to every Price row for the same item dated on or before it, and then uses a second join to keep only the most recent of those prices.

I tested this on MySQL 5.0.51. It uses neither correlated subqueries nor group by.
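As a quick sanity check, the join above can be run against the question's sample data. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, so it checks the logic and says nothing about MySQL's execution plan. Prices are stored as plain integers, and the volume amounts the question elides are left as NULL; both choices are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
# Column names follow this answer's test schema (item_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Price  (date TEXT, item_id INTEGER, price INTEGER);
    CREATE TABLE Volume (date TEXT, item_id INTEGER, volume_amt INTEGER);
""")

conn.executemany("INSERT INTO Price VALUES (?, ?, ?)", [
    ("2008-12-04", 1, 1),
    ("2008-12-11", 1, 3),
    ("2008-12-15", 1, 7),
])

# Only the first three volume amounts are given in the question;
# the remaining rows are inserted with NULL amounts.
known = [("2008-12-04", 1, 12345), ("2008-12-05", 1, 23456), ("2008-12-08", 1, 34567)]
elided_days = ["2008-12-09", "2008-12-10", "2008-12-11", "2008-12-12",
               "2008-12-15", "2008-12-16", "2008-12-17", "2008-12-18"]
conn.executemany("INSERT INTO Volume VALUES (?, ?, ?)",
                 known + [(d, 1, None) for d in elided_days])

# Same join as above: p1 is a price on or before the volume date,
# and the LEFT JOIN / IS NULL pair rejects p1 whenever a later price p2 also qualifies.
rows = conn.execute("""
    SELECT v.date, p1.price, v.volume_amt
    FROM Volume v
    JOIN Price p1
      ON v.date >= p1.date AND v.item_id = p1.item_id
    LEFT OUTER JOIN Price p2
      ON v.date >= p2.date AND v.item_id = p2.item_id
     AND p1.date < p2.date
    WHERE p2.item_id IS NULL
    ORDER BY v.date
""").fetchall()

for row in rows:
    print(row)
```

Each volume date comes back exactly once, carrying the price from the latest price date that is not after it. ISO-formatted date strings compare correctly as text, which is why TEXT columns are enough here.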

edit: Updated the query to match to item_id as well as date. This seems to work too. I created an index on (date) and an index on (date, item_id) and the EXPLAIN plan was identical. An index on (item_id, date) may be better in this case. Here's the EXPLAIN output for that:

+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref             | rows | Extra                                |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
|  1 | SIMPLE      | p1    | ALL  | item_id       | NULL    | NULL    | NULL            |    6 |                                      | 
|  1 | SIMPLE      | v     | ref  | item_id       | item_id | 22      | test.p1.item_id |    3 | Using where                          | 
|  1 | SIMPLE      | p2    | ref  | item_id       | item_id | 22      | test.v.item_id  |    1 | Using where; Using index; Not exists | 
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+

But my data set is very small, and the optimizer may choose a different plan for a larger one. You should experiment and check the EXPLAIN output against a realistically sized data set.

edit: I pasted the wrong EXPLAIN output before. The one above is corrected, and shows better use of the (item_id, date) index.
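One way to set up such an experiment is sketched below. It is not a MySQL benchmark: it uses Python's sqlite3 module with an in-memory database, generates synthetic data, creates an index on (item_id, date), and checks the join's output against a plain-Python lookup. The sizes in ITEMS and DAYS, the start date, and the index name price_item_date are all hypothetical, and SQLite's planner differs from MySQL's, so the plan it prints is only indicative.

```python
import bisect
import datetime
import random
import sqlite3

random.seed(0)           # fixed seed so the run is repeatable
ITEMS, DAYS = 20, 120    # hypothetical sizes; scale up for a real experiment
START = datetime.date(2008, 1, 1)  # arbitrary start date

def day(n):
    """ISO date string for the n-th day after START."""
    return (START + datetime.timedelta(days=n)).isoformat()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Price  (date TEXT, item_id INTEGER, price INTEGER);
    CREATE TABLE Volume (date TEXT, item_id INTEGER, volume_amt INTEGER);
    CREATE INDEX price_item_date ON Price (item_id, date);
""")

prices = {}  # item_id -> sorted list of (day offset, price)
for item in range(ITEMS):
    # Day 0 always has a price, so every volume row has an earlier price to pick up.
    offsets = sorted({0} | set(random.sample(range(1, DAYS), 10)))
    prices[item] = [(d, random.randint(1, 100)) for d in offsets]
    conn.executemany("INSERT INTO Price VALUES (?, ?, ?)",
                     [(day(d), item, p) for d, p in prices[item]])
    conn.executemany("INSERT INTO Volume VALUES (?, ?, ?)",
                     [(day(d), item, random.randint(1, 10**5)) for d in range(DAYS)])

QUERY = """
    SELECT v.item_id, v.date, p1.price
    FROM Volume v
    JOIN Price p1
      ON v.date >= p1.date AND v.item_id = p1.item_id
    LEFT OUTER JOIN Price p2
      ON v.date >= p2.date AND v.item_id = p2.item_id
     AND p1.date < p2.date
    WHERE p2.item_id IS NULL
    ORDER BY v.item_id, v.date
"""

# SQLite's counterpart to EXPLAIN; the output format differs from MySQL's.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(step)

sql_result = conn.execute(QUERY).fetchall()

# Reference answer: binary-search the most recent price on or before each day.
expected = []
for item in range(ITEMS):
    offsets = [d for d, _ in prices[item]]
    for d in range(DAYS):
        i = bisect.bisect_right(offsets, d) - 1
        expected.append((item, day(d), prices[item][i][1]))

print(sql_result == expected)
```

Raising ITEMS and DAYS and timing the query with and without the index gives a rough feel for how the join scales, though the numbers should be re-measured on MySQL itself before drawing conclusions.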

1 Comment

Looks good so far. What if we assume there could be more than one possible item_id? Just stick v.item_id = p1.item_id and v.item_id = p2.item_id in the ON clauses, right? Also, if you're really feeling up to it, is it better to add indexes on (item_id, date) or (date, item_id)?
3

Assuming there is only 1 price per date/itemid:

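select v.date, v.itemid, p.price, v.volume_amt
from volume v
join price p on p.itemid = v.itemid
-- keep only the price row dated at the latest price date on or before the volume date
where p.date = (select max(p2.date) from price p2
                where p2.itemid = v.itemid
                and p2.date <= v.date);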
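To see how the correlated subquery behaves with more than one item, here is a small sketch using Python's sqlite3 module and an in-memory database in place of MySQL. Item 2 and its rows are made up for illustration; item 1 uses a subset of the question's sample data, with NULL for the volume amounts the question leaves out.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price  (date TEXT, itemid INTEGER, price INTEGER);
    CREATE TABLE volume (date TEXT, itemid INTEGER, volume_amt INTEGER);

    INSERT INTO price VALUES
        ('2008-12-04', 1, 1),
        ('2008-12-11', 1, 3),
        ('2008-12-15', 1, 7),
        ('2008-12-05', 2, 10);   -- item 2 is hypothetical

    INSERT INTO volume VALUES
        ('2008-12-04', 1, 12345),
        ('2008-12-05', 1, 23456),
        ('2008-12-08', 1, 34567),
        ('2008-12-12', 1, NULL),
        ('2008-12-16', 1, NULL),
        ('2008-12-04', 2, 50),   -- dated before item 2 has any price
        ('2008-12-08', 2, 99);
""")

# The subquery is re-evaluated per volume row: it finds the latest
# price date for that row's item that is not after the row's date.
rows = conn.execute("""
    SELECT v.date, v.itemid, p.price, v.volume_amt
    FROM volume v
    JOIN price p ON p.itemid = v.itemid
    WHERE p.date = (SELECT MAX(p2.date) FROM price p2
                    WHERE p2.itemid = v.itemid
                      AND p2.date <= v.date)
    ORDER BY v.itemid, v.date
""").fetchall()

for row in rows:
    print(row)
```

Note that the item 2 volume row dated 2008-12-04 does not appear in the result: no price exists on or before that date, so MAX returns NULL and the inner join drops the row. If such rows should be kept with a NULL price, a LEFT JOIN with the date condition moved into the ON clause is one option.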

4 Comments

Same answer as mine, but I'll admit that you got it right faster even if mine shows that I posted sooner.
actually I believe this contains a mistake: the p.date condition belongs in the join condition, not in the where clause.
Not in this case, but it would if you were doing an outer join.
my comment about the p.date comparison belonging in the where or on clauses was not correct; it doesn't make a difference.
2
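-- [date] is SQL Server bracket quoting; on MySQL use `date` instead.
SELECT v.[date], p.price, v.volume_amt
FROM volume v
LEFT JOIN price p ON p.itemid = v.itemid
    AND p.[date] = (
                    -- No GROUP BY here: the subquery must return a single
                    -- date, the latest price date on or before v.[date].
                    SELECT MAX(p2.[date])
                    FROM price p2
                    WHERE p2.[date] <= v.[date] AND p2.itemid = v.itemid
                   )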

1 Comment

won't work -- will return volume only on the days when there is a new price. desired output is the volume should be returned for the price on that day or the nearest previous day for which there is a new price.
0
 SELECT  Volume.date, volume.itemid, price.price, volume.volume_amt
 FROM Volume 
 LEFT OUTER JOIN Price
 ON Volume.date = Price.date

Probably. My SQL-fu is weak

2 Comments

Nope, that would give him blanks for price in the rows that don't have exact matches.
Ah, gotcha. We're doing share trades or something and want last-traded-price. <thinks>
0

This method works in Oracle. Don't know about other databases, and you didn't specify. If this exact syntax doesn't work in your database, I would guess there are similar techniques.

dev> select * from price;

AS_OF               ID     AMOUNT
----------- ---------- ----------
04-Dec-2008          1          1
11-Dec-2008          1          2
15-Dec-2008          1          3

dev> select * from volume;

DAY                 ID     VOLUME
----------- ---------- ----------
05-Dec-2008          1          1
06-Dec-2008          1          2
07-Dec-2008          1          3
08-Dec-2008          1          4
09-Dec-2008          1          5
10-Dec-2008          1          6
11-Dec-2008          1          7
12-Dec-2008          1          8
13-Dec-2008          1          9
14-Dec-2008          1         10
15-Dec-2008          1         11
16-Dec-2008          1         12
17-Dec-2008          1         13
18-Dec-2008          1         14
19-Dec-2008          1         15
20-Dec-2008          1         16
21-Dec-2008          1         17
22-Dec-2008          1         18
23-Dec-2008          1         19

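dev> select day, volume, amount from (
       -- price_as_of = latest price date on or before each volume day
       select id, day, volume, (select max(as_of) from price p where p.id = v.id and as_of <= day) price_as_of
         from volume v
     ) v2
     -- match on id as well, so this still works with more than one item
     join price on price.as_of = v2.price_as_of and price.id = v2.id
     order by day;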

DAY             VOLUME     AMOUNT
----------- ---------- ----------
05-Dec-2008          1          1
06-Dec-2008          2          1
07-Dec-2008          3          1
08-Dec-2008          4          1
09-Dec-2008          5          1
10-Dec-2008          6          1
11-Dec-2008          7          2
12-Dec-2008          8          2
13-Dec-2008          9          2
14-Dec-2008         10          2
15-Dec-2008         11          3
16-Dec-2008         12          3
17-Dec-2008         13          3
18-Dec-2008         14          3
19-Dec-2008         15          3
20-Dec-2008         16          3
21-Dec-2008         17          3
22-Dec-2008         18          3
23-Dec-2008         19          3
