I created a large database of all my prices and all my competitor prices with date and location information.

I want to narrow my database to only the "true" competitors on a location and price basis, because we charge different prices in different locations. For example, I just want the count of competitors that charge within $1 of my price.

My current code stalls and does not yield results. I think it is because of my implementation of JOIN ON.

To debug, I separated it out and got results for my first two tables with no problem, exactly what I was going for. With the third table, "TrueComps", no such luck.

It's complex as a result of joining 3 tables. I am new to SQL and am thus interested in learning new solutions. I believe there is a better solution than this:

WITH 
RentDotComOnly AS
(
  SELECT 
    distinct concat(DATE_PART(mm,archived_apartments.week),clean_zip) AS "monthlyzip",
    COUNT(distinct apt_unique_id) AS "rent_count_clean_zip", 
    -- AVG((low_price+high_price)/2) AS "rent_avg_price", 
    0.85*min(low_price) AS "rent_lower_bound", 
    1.15*max(high_price) AS "rent_upper_bound"
  FROM 
    archived_apartments 
  WHERE 
    source_type in (29,36,316) 
    AND week = '2015-07-06' 
    AND is_house <> 1  
    AND archived_apartments.high_price <> 0 
  GROUP BY monthlyzip, archived_apartments.week, archived_apartments.clean_zip
),
AllRJData AS
(
  SELECT
    distinct concat(DATE_PART(mm,archived_apartments.week),clean_zip) AS "monthlyzip",
    COUNT(distinct apt_unique_id) AS "all_count_clean_zip"
    --, AVG((low_price+high_price)/2) AS "all_avg_price"
  FROM 
    archived_apartments 
  WHERE 
    week = '2015-07-06' 
    AND is_house <> 1  
  GROUP BY monthlyzip, archived_apartments.week, archived_apartments.clean_zip
),
TrueComps AS
(
  SELECT
    distinct concat(DATE_PART(mm,archived_apartments.week),clean_zip) AS "monthlyzip",
    COUNT(distinct apt_unique_id) AS "true_comps"
   FROM
    archived_apartments, RentDotComOnly
   WHERE
    week = '2015-07-06' 
    AND is_house <> 1 
    AND archived_apartments.high_price <> 0 
    AND low_price > 10000
    GROUP BY monthlyzip, archived_apartments.week, archived_apartments.clean_zip
)

SELECT 
  distinct concat(DATE_PART(mm,archived_apartments.week),clean_zip) AS "monthlyzip",
  TrueComps.true_comps AS "TrueComps"
FROM
  archived_apartments, TrueComps

GROUP BY monthlyzip, archived_apartments.week, archived_apartments.clean_zip, truecomps.true_comps
ORDER BY monthlyzip

Original code:

AND (low_price > RentDotComOnly.rent_lower_bound and low_price < RentDotComOnly.rent_upper_bound) or (high_price < RentDotComOnly.rent_upper_bound and high_price > RentDotComOnly.rent_lower_bound)

My full code:

WITH 
RentDotComOnly AS
(
  SELECT 
    distinct concat(DATE_PART(mm,archived_apartments.week),clean_zip) AS "monthlyzip",
    COUNT(distinct apt_unique_id) AS "rent_count_clean_zip", 
    -- AVG((low_price+high_price)/2) AS "rent_avg_price", 
    0.85*min(low_price) AS "rent_lower_bound", 
    1.15*max(high_price) AS "rent_upper_bound"
  FROM 
    archived_apartments 
  WHERE 
    source_type in (29,36,316) 
    AND week = '2015-07-06' 
    AND is_house <> 1  
    AND archived_apartments.high_price <> 0 
  GROUP BY monthlyzip, archived_apartments.week, archived_apartments.clean_zip
),
AllRJData AS
(
  SELECT
    distinct concat(DATE_PART(mm,archived_apartments.week),clean_zip) AS "monthlyzip",
    COUNT(distinct apt_unique_id) AS "all_count_clean_zip"
    --, AVG((low_price+high_price)/2) AS "all_avg_price"
  FROM 
    archived_apartments 
  WHERE 
    week between '2015-07-06' and '2015-10-12' 
    AND is_house <> 1  
  GROUP BY monthlyzip, archived_apartments.week, archived_apartments.clean_zip
),
TrueComps AS
(
  SELECT
    distinct concat(DATE_PART(mm,archived_apartments.week),clean_zip) AS "monthlyzip",
    COUNT(distinct apt_unique_id) AS "true_comps"
   FROM
    archived_apartments, RentDotComOnly
   WHERE
    week between '2015-07-06' and '2015-10-12'
    AND is_house <> 1 
    AND archived_apartments.high_price <> 0 
    AND (low_price > RentDotComOnly.rent_lower_bound and low_price < RentDotComOnly.rent_upper_bound) or (high_price < RentDotComOnly.rent_upper_bound and high_price > RentDotComOnly.rent_lower_bound)    
  GROUP BY monthlyzip, archived_apartments.week, archived_apartments.clean_zip
)

SELECT 
  distinct concat(DATE_PART(mm,archived_apartments.week),clean_zip) AS "monthlyzip",
  RentDotComOnly.rent_count_clean_zip AS "RentOnly",
  AllRJData.all_count_clean_zip AS "Total",
  TrueComps.true_comps AS "TrueComps"
FROM
  archived_apartments
JOIN AllRJData 
ON concat(DATE_PART(mm,archived_apartments.week),archived_apartments.clean_zip) = AllRJData.monthlyzip
JOIN RentDotComOnly
ON concat(DATE_PART(mm,archived_apartments.week),archived_apartments.clean_zip) = RentDotComOnly.monthlyzip
JOIN TrueComps
ON concat(DATE_PART(mm,archived_apartments.week),archived_apartments.clean_zip) = TrueComps.monthlyzip

GROUP BY AllRJData.monthlyzip, archived_apartments.week, archived_apartments.clean_zip, rentdotcomonly.rent_count_clean_zip, allrjdata.all_count_clean_zip, truecomps.true_comps
ORDER BY AllRJData.monthlyzip
3 Comments
  • Despite my suggestion not seeming to fix the problem I can still nearly guarantee that the parentheses in the rent bounds conditions are not correct. Should you also be matching the rows on monthlyzip or something? Commented Oct 30, 2015 at 0:01
  • Yes, my effort is to match on monthlyzip with the JOIN ON Commented Oct 30, 2015 at 0:10
  • I think you need to do that inside TrueComps as well. Commented Oct 30, 2015 at 0:13

2 Answers

1

Try adding a join condition in TrueComps:

FROM
    archived_apartments INNER JOIN RentDotComOnly
        ON concat(DATE_PART(mm,archived_apartments.week),archived_apartments.clean_zip) =
           RentDotComOnly.monthlyzip
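
To see why the join condition matters, here is a minimal sketch with made-up rows. The names apts and bounds and all values are hypothetical stand-ins for archived_apartments and RentDotComOnly. A comma-separated FROM with no matching condition pairs every apartment with every bounds row, while joining on monthlyzip keeps only the bounds row that belongs to each apartment.

```sql
-- Hypothetical sample data: 3 apartments and 2 per-zip bounds rows.
WITH apts AS (
  SELECT '710001' AS monthlyzip, 1 AS apt_unique_id
  UNION ALL SELECT '710001', 2
  UNION ALL SELECT '710002', 3
),
bounds AS (
  SELECT '710001' AS monthlyzip, 850 AS rent_lower_bound, 1150 AS rent_upper_bound
  UNION ALL SELECT '710002', 1700, 2300
)
SELECT
  -- Comma join with no condition: every pairing, 3 * 2 = 6 rows.
  (SELECT COUNT(*) FROM apts, bounds) AS cross_join_rows,
  -- Keyed join: each apartment meets only its own zip's bounds, 3 rows.
  (SELECT COUNT(*)
     FROM apts
     INNER JOIN bounds ON apts.monthlyzip = bounds.monthlyzip) AS keyed_join_rows;
```

With the roughly 26k rows reported for RentDotComOnly, the unkeyed version multiplies every archived_apartments row by about 26,000 before any filtering happens, which would be consistent with the stall and the "Disk Full" error.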

5 Comments

Still no luck. SQL is hard.
Okay, so I got results now, but something is going wrong because I'm seeing 5+ rows per monthlyzip instead of just a count of truecomps for a single monthlyzip. Any thoughts?
Well, your grouping is on three columns, and I almost suggested joining on all three (monthlyzip, week, clean_zip). I don't know what any of those mean, so I'm still just throwing out guesses.
Adding to the previous comment: Because of those groupings there will be multiple rows with the same monthlyzip in RentDotComOnly. If you were only expecting a single row and you're getting a lot more then it's either because of the join or the groups.
In case you didn't realize, the join condition can be any expression. So you can put ...AND x.week = y.week AND... in there.
0

I think you probably have the parentheses wrong on the last part of the WHERE clause. I don't know what logic you're trying to implement but my guess to fix it is:

AND (
            low_price  > RentDotComOnly.rent_lower_bound
        and low_price  < RentDotComOnly.rent_upper_bound
    or      high_price < RentDotComOnly.rent_upper_bound
        and high_price > RentDotComOnly.rent_lower_bound
)

As you've written it, AND binds more tightly than OR, so the OR'd condition is not combined with the other filters and stands on its own, which is also likely to cause the slowdown you're seeing.
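
A small sketch of the precedence issue, using one hypothetical row (the column values and the shortened bound names are made up for illustration). The row is a house, so the is_house filter should exclude it, but its high_price falls inside the bounds.

```sql
-- Hypothetical row: a house (is_house = 1) whose high_price sits inside the bounds.
WITH t AS (
  SELECT 1 AS is_house, 1000 AS low_price, 1200 AS high_price,
         900 AS lower_bound, 1300 AS upper_bound
)
SELECT
  -- No outer parentheses: parsed as (filter AND low-test) OR (high-test).
  CASE WHEN is_house <> 1
             AND low_price > lower_bound AND low_price < upper_bound
             OR high_price < upper_bound AND high_price > lower_bound
       THEN 'kept' ELSE 'filtered' END AS without_parens,
  -- Outer parentheses: parsed as filter AND (low-test OR high-test).
  CASE WHEN is_house <> 1
             AND (   low_price > lower_bound AND low_price < upper_bound
                  OR high_price < upper_bound AND high_price > lower_bound)
       THEN 'kept' ELSE 'filtered' END AS with_parens
FROM t;
```

The first column comes back as 'kept' and the second as 'filtered': without the outer parentheses, any row that passes the high_price test gets through regardless of the week, is_house, and high_price <> 0 filters.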

Another guess is that you're looking for an overlap in price ranges. Is it possible that you really wanted this?:

AND (
            low_price  <= RentDotComOnly.rent_upper_bound
        and high_price >= RentDotComOnly.rent_lower_bound
)
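
To compare the two conditions, here is a sketch over a few hypothetical listings checked against fixed bounds of 900 and 1300 (all labels and prices are invented for illustration). endpoint_test is the original "either endpoint inside the bounds" logic, and overlap_test is the range-overlap version.

```sql
-- Hypothetical listings compared against bounds of 900..1300.
WITH listings AS (
  SELECT 'below' AS label, 500 AS low_price, 800 AS high_price
  UNION ALL SELECT 'straddles_low',  800, 1000
  UNION ALL SELECT 'inside',        1000, 1200
  UNION ALL SELECT 'covers_all',     700, 1500
  UNION ALL SELECT 'above',         1400, 1600
)
SELECT
  label,
  -- 1 when either endpoint lies strictly inside the bounds.
  CASE WHEN low_price > 900 AND low_price < 1300
         OR high_price < 1300 AND high_price > 900
       THEN 1 ELSE 0 END AS endpoint_test,
  -- 1 when the listing's price range overlaps the bounds at all.
  CASE WHEN low_price <= 1300 AND high_price >= 900
       THEN 1 ELSE 0 END AS overlap_test
FROM listings;
```

The two tests agree on every row except 'covers_all': a listing whose range spans the whole band has neither endpoint inside it, so the endpoint test misses it while the overlap test counts it.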

5 Comments

It's still timing out on me: "[Amazon](500310) Invalid operation: Disk Full"
I replaced your line with 'AND ( low_price > 3000 and low_price < 4000 or high_price < 4000 and high_price > 3000' and found that maybe my logic with the JOINs is off because it still stalls.
Then it probably has something to do with the cross product of archived_apartments and RentDotComOnly. I don't really understand the relationship between those two tables/table expressions or how many rows in each. It must be pretty large.
RentDotComOnly is only 26k rows; archived_apartments is indeed very large.
Shawn, really appreciate your persistence here. I'm still messing around on my end and I am confident that does not address the issue. I will update the question now with my current version.
