Correct way to use SQL Intersect

Question

I'm trying to build a query for the following scenario:

I have two tables Table1 and Table2.

The primary key values of Table1 are T1Attr1, T1Attr2, and so on.

For each primary key in Table1, I can get a set of attributes from Table2: T2Attr1, T2Attr2, and so on.

I want to find the Table2 attributes that are common to a given set of Table1 keys. For example, if the input is T1Attr1 and T1Attr2, the result should contain the Table2 attributes common to both of them. As the number of input parameters grows, the result set shrinks, since fewer attributes are common to all of them.

My query is similar to this:

SELECT indId, indName FROM indData WHERE pId = 1
INTERSECT
SELECT indId, indName FROM indData WHERE pId = 2
INTERSECT
SELECT indId, indName FROM indData WHERE pId = 3

The query works fine, but when the pId list is large (above 100), the JDBC driver throws an error.

Can someone please provide suggestions on using this query correctly or provide a better approach for the problem?

Thanks!

What is the primary key of this table? And what error message do you get exactly?
– ypercubeᵀᴹ, Commented Jul 1, 2013 at 19:40
The primary key is an auto-incremented value and is not included in the query. The error I get is pasted here: pastebin.com/VsG92F2n
– jobinbasani, Commented Jul 1, 2013 at 20:07
Out of curiosity, was this error produced with the intersection of 100 tables?
– ypercubeᵀᴹ, Commented Jul 1, 2013 at 20:44
There are no other unique constraints, and yes, I got that error when the pId values ran from 1 to 100 (so 100 intersections).
– jobinbasani, Commented Jul 1, 2013 at 20:46

ypercubeᵀᴹ · Accepted Answer · 2013-07-01 20:41:47Z

You can use this query but it won't be as efficient as the one you have:

SELECT indId, indName 
FROM indData 
WHERE pId IN (1, 2, 3, ..., 100)
GROUP BY indId, indName
HAVING COUNT(DISTINCT pId) = 100 ;  -- the number of pId values you are searching for
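
Since the question is about a pId list that keeps growing, the query above can be generated for any number of values instead of being written out by hand. The following is only a minimal sketch: it uses Python's sqlite3 module with an in-memory database as a stand-in for the real database and JDBC driver, and the function name common_indicators and the sample rows are made up for illustration.

```python
import sqlite3

def common_indicators(conn, pids):
    """Return the (indId, indName) pairs that occur for every pId in pids."""
    # De-duplicate the input: a repeated pId would make the count unreachable.
    pids = sorted(set(pids))
    if not pids:
        return []
    placeholders = ", ".join("?" for _ in pids)
    sql = (
        "SELECT indId, indName FROM indData "
        f"WHERE pId IN ({placeholders}) "
        "GROUP BY indId, indName "
        "HAVING COUNT(DISTINCT pId) = ? "
        "ORDER BY indId"
    )
    # The last parameter is the number of distinct pId values searched for.
    return conn.execute(sql, [*pids, len(pids)]).fetchall()

# Hypothetical sample data; the duplicate (1, 10, 'a') row is deliberate,
# because the table has no unique constraint on (indId, pId).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indData (pId INTEGER, indId INTEGER, indName TEXT)")
conn.executemany(
    "INSERT INTO indData VALUES (?, ?, ?)",
    [(1, 10, "a"), (1, 11, "b"), (2, 10, "a"),
     (2, 11, "b"), (3, 10, "a"), (1, 10, "a")],
)

print(common_indicators(conn, [1, 2, 3]))  # [(10, 'a')]
print(common_indicators(conn, [1, 2]))     # [(10, 'a'), (11, 'b')]
```

The statement stays a single SELECT no matter how many pId values are passed; only the number of bound parameters grows. COUNT(DISTINCT pId) keeps duplicate rows from inflating the count.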

You can also use JOINs. Perhaps this will result in a better execution plan and not cause this error. If there is a unique constraint on (indId, pId), this will be equivalent to your query; without one, the joins can return duplicate rows, which INTERSECT would have removed:

SELECT a1.indId, a1.indName 
FROM indData AS a1
  JOIN indData AS a2
    ON a2.indId = a1.indId
  JOIN indData AS a3
    ON a3.indId = a1.indId 
  ...
  JOIN indData AS a100
    ON a100.indId = a1.indId
WHERE a1.pId = 1
  AND a2.pId = 2
  ...
  AND a100.pId = 100 ;
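
Writing 100 aliases by hand is error-prone, so the self-join above could be generated instead. This is a rough sketch under stated assumptions: build_join_query is a hypothetical helper, SQLite (via Python's sqlite3) stands in for the real database, and the demo table declares the UNIQUE (indId, pId) constraint mentioned above. SQLite itself caps the number of tables in a single join, so the demo uses only three.

```python
import sqlite3

def build_join_query(n):
    """Build the n-way self-join, one alias (a1 .. an) per pId searched for."""
    if n < 1:
        raise ValueError("need at least one pId")
    lines = ["SELECT a1.indId, a1.indName", "FROM indData AS a1"]
    for i in range(2, n + 1):
        lines.append(f"  JOIN indData AS a{i} ON a{i}.indId = a1.indId")
    conditions = [f"a{i}.pId = ?" for i in range(1, n + 1)]
    lines.append("WHERE " + "\n  AND ".join(conditions))
    lines.append("ORDER BY a1.indId")
    return "\n".join(lines)

# Hypothetical sample data with the unique constraint in place.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indData ("
    "pId INTEGER, indId INTEGER, indName TEXT, UNIQUE (indId, pId))"
)
conn.executemany(
    "INSERT INTO indData VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 10, "a"), (3, 10, "a"), (1, 11, "b"), (2, 11, "b")],
)

pids = [1, 2, 3]
rows = conn.execute(build_join_query(len(pids)), pids).fetchall()
print(rows)  # [(10, 'a')]
```

The pId values are bound as parameters in alias order, so the generated text depends only on how many values are searched for.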

An index on (pId, indId) INCLUDE (indName) would help efficiency.

edited Jul 1, 2013 at 20:41

answered Jul 1, 2013 at 19:38

ypercubeᵀᴹ

Gordon Linoff · Accepted Answer · 2013-07-01 19:35:00Z

Intersect is not the only way to do what you want. Your query is an example of a "set-within-sets" query. The "set" is the (indid, indname) pair. The "within-sets" condition is that the pair appears with all three values of pid.

I advocate using aggregation with a having clause for this type of query, because this is a very flexible approach for many types of conditions. In your case, the resulting query is:

select indid, indname
from indData
group by indid, indname
having SUM(case when pid = 1 then 1 else 0 end) > 0 and
       SUM(case when pid = 2 then 1 else 0 end) > 0 and
       SUM(case when pid = 3 then 1 else 0 end) > 0;

If you have an index on pid and the values are relatively rare, then adding a WHERE pid IN (1, 2, 3) filter could benefit the query performance-wise.
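
As an illustration of the two points above, the HAVING conditions and the optional pid filter can both be generated from the list of values. This is a sketch only: build_having_query and its prefilter flag are hypothetical names, and SQLite (via Python's sqlite3) with made-up rows stands in for the real table.

```python
import sqlite3

def build_having_query(pids, prefilter=True):
    """Return (sql, params): one SUM(CASE ...) > 0 condition per pid."""
    pids = list(pids)
    if not pids:
        raise ValueError("need at least one pid")
    having = " AND\n       ".join(
        "SUM(CASE WHEN pId = ? THEN 1 ELSE 0 END) > 0" for _ in pids
    )
    where = ""
    if prefilter:
        # Optional filter: discard rows for other pids before grouping.
        where = "WHERE pId IN (" + ", ".join("?" for _ in pids) + ")\n"
    sql = (
        "SELECT indId, indName\nFROM indData\n" + where +
        "GROUP BY indId, indName\nHAVING " + having + "\nORDER BY indId"
    )
    # WHERE placeholders come first in the text, then the HAVING ones.
    params = (pids if prefilter else []) + pids
    return sql, params

# Hypothetical sample data, including rows for a pid that is not searched for.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indData (pId INTEGER, indId INTEGER, indName TEXT)")
conn.executemany(
    "INSERT INTO indData VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 10, "a"), (3, 10, "a"),
     (1, 11, "b"), (4, 11, "b"), (4, 12, "c")],
)

rows_filtered = conn.execute(*build_having_query([1, 2, 3])).fetchall()
rows_unfiltered = conn.execute(
    *build_having_query([1, 2, 3], prefilter=False)
).fetchall()
print(rows_filtered)  # [(10, 'a')]
```

The filter does not change the result, because rows with other pid values never satisfy any of the HAVING conditions; it only reduces the number of rows that have to be grouped.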
