
Sorry for the vague title; if someone can think of a more self-explanatory one, please change it. I'm not sure how to express the problem, which is this:

I have a table like so:

POST_ID (INT)   TAG_NAME (VARCHAR)
1               'tag1'
1               'tag2'
1               'tag3'
2               'tag2'
2               'tag4'
...

What I want to do is count the number of posts that have both tag1 AND tag2.
I've messed about with GROUP BY and DISTINCT and COUNT but I can't construct a query which does the trick.
Any suggestions?

Edit: in pseudo-SQL, the query I want is:

SELECT DISTINCT(POST_ID) WHICH HAS TAG_NAME = 'tag1' AND TAG_NAME = 'tag2'; 

Thanks

  • It'd be helpful to also see what rows you want the query to return, since I'm still not totally sure what you're asking. Commented Nov 25, 2010 at 13:04
  • @Matchu: in pseudo-SQL: select post_id which has tag_name = 'tag1' and tag_name = 'tag2'; Commented Nov 25, 2010 at 13:06

3 Answers

Edit: because 'TABLE' was a poor choice of placeholder for the missing table name, I'll assume your table is called Posts.

Join the table against itself:

SELECT * FROM Posts P1
JOIN Posts P2
ON P1.POST_ID = P2.POST_ID
WHERE P1.TAG_NAME = 'tag1'
AND P2.TAG_NAME = 'tag2'
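To check the self-join against the sample rows from the question, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. SQLite is only a stand-in for whatever database you actually use, and the table name Posts follows the edit above. Since the question asks for a count, the sketch also wraps the same join in COUNT(DISTINCT ...).

```python
import sqlite3

# In-memory SQLite database; "Posts" mirrors the question's (POST_ID, TAG_NAME) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (POST_ID INTEGER, TAG_NAME TEXT)")
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?)",
    [(1, "tag1"), (1, "tag2"), (1, "tag3"), (2, "tag2"), (2, "tag4")],
)

# Self-join: P1 supplies the tag1 row and P2 the tag2 row of the same post.
rows = conn.execute(
    """
    SELECT P1.POST_ID
    FROM Posts P1
    JOIN Posts P2 ON P1.POST_ID = P2.POST_ID
    WHERE P1.TAG_NAME = 'tag1'
      AND P2.TAG_NAME = 'tag2'
    """
).fetchall()

# Counting distinct post ids turns the join into the number of matching posts.
(count,) = conn.execute(
    """
    SELECT COUNT(DISTINCT P1.POST_ID)
    FROM Posts P1
    JOIN Posts P2 ON P1.POST_ID = P2.POST_ID
    WHERE P1.TAG_NAME = 'tag1'
      AND P2.TAG_NAME = 'tag2'
    """
).fetchone()

print(rows)   # [(1,)]
print(count)  # 1
```

The DISTINCT matters if a post could carry the same tag twice, since each duplicate row would otherwise produce an extra joined row.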

4 Comments

Thanks, this looks promising. My actual problem is a little more complicated: my table is not really a table, it's the result of a SELECT + JOIN on two normalised tables.
@Richard, simply replace TABLE in the query with (<your query here>).
I think your query should actually be: SELECT * FROM T1 JOIN T1 as T2 ON T1.POST_ID = T2.POST_ID WHERE T1.TAG_NAME = 'tag1' AND T2.TAG_NAME = 'tag2'
@Richard: isn't that exactly the same query? You didn't supply a table name, so I used "TABLE" as the table name; don't confuse it with the keyword TABLE. I've edited the answer for clarity.
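Richard's two normalised tables aren't shown, so the following is only a sketch under an assumed, hypothetical schema: tags(id, name) and post_tags(post_id, tag_id). It substitutes the joined query for the table name, as suggested above, and gives each copy its own alias (most databases, MySQL included, require an alias on a derived table). It runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical normalised schema: tags(id, name) and post_tags(post_id, tag_id).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER);
    INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3'), (4, 'tag4');
    INSERT INTO post_tags VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 4);
    """
)

# The SELECT + JOIN that produces the (POST_ID, TAG_NAME) "table" from the question.
derived = """
    (SELECT pt.post_id AS POST_ID, t.name AS TAG_NAME
     FROM post_tags pt
     JOIN tags t ON t.id = pt.tag_id)
"""

# The derived query takes the place of the table name; each copy gets its own alias.
query = f"""
    SELECT COUNT(DISTINCT P1.POST_ID)
    FROM {derived} AS P1
    JOIN {derived} AS P2 ON P1.POST_ID = P2.POST_ID
    WHERE P1.TAG_NAME = 'tag1'
      AND P2.TAG_NAME = 'tag2'
"""
(count,) = conn.execute(query).fetchone()
print(count)  # 1
```

With a real schema it may be simpler to join tags twice directly, but wrapping the existing query keeps the self-join answer unchanged.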

I'm leaving this (untested) dependent, i.e. correlated, subquery solution here for reference, even though it will probably be slow once you get to large data sets. Prefer any join-based solution that does the same thing.

This assumes you also have a posts table with an id column, alongside a posts_tags table with post_id and tag columns:

SELECT COUNT(*)
FROM posts
WHERE EXISTS (SELECT NULL FROM posts_tags WHERE tag = 'tag1' AND post_id = posts.id)
  AND EXISTS (SELECT NULL FROM posts_tags WHERE tag = 'tag2' AND post_id = posts.id)
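Since the answer is marked untested, here is a quick sketch that runs it on SQLite via Python's sqlite3 module. The posts and posts_tags tables are the ones the answer assumes, filled with the question's sample rows.

```python
import sqlite3

# Schema assumed by the answer: posts(id) plus posts_tags(post_id, tag).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY);
    CREATE TABLE posts_tags (post_id INTEGER, tag TEXT);
    INSERT INTO posts VALUES (1), (2);
    INSERT INTO posts_tags VALUES
        (1, 'tag1'), (1, 'tag2'), (1, 'tag3'), (2, 'tag2'), (2, 'tag4');
    """
)

# Each EXISTS is a correlated subquery, conceptually evaluated once per row of posts.
(count,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM posts
    WHERE EXISTS (SELECT NULL FROM posts_tags
                  WHERE tag = 'tag1' AND post_id = posts.id)
      AND EXISTS (SELECT NULL FROM posts_tags
                  WHERE tag = 'tag2' AND post_id = posts.id)
    """
).fetchone()
print(count)  # 1
```

Because COUNT(*) counts rows of posts rather than rows of posts_tags, duplicate tag rows cannot inflate the result.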


Try the following query:

SELECT COUNT(*) AS nb_posts
FROM (
    SELECT post_id, COUNT(*) AS nb_tags
    FROM Posts  -- TABLE is a reserved word: use your real table name here
    WHERE tag_name IN ('tag1', 'tag2')
    GROUP BY post_id
    HAVING COUNT(*) = 2
  ) t
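A small sketch of this GROUP BY / HAVING approach on the question's sample rows, using SQLite through Python's sqlite3 module; the table name Posts is an assumption, since the original used a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (post_id INTEGER, tag_name TEXT)")
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?)",
    [(1, "tag1"), (1, "tag2"), (1, "tag3"), (2, "tag2"), (2, "tag4")],
)

# Inner query: keep only tag1/tag2 rows, then keep posts that have both (2 rows).
# Outer query: count the posts that survive the HAVING filter.
(nb_posts,) = conn.execute(
    """
    SELECT COUNT(*) AS nb_posts
    FROM (
        SELECT post_id, COUNT(*) AS nb_tags
        FROM Posts
        WHERE tag_name IN ('tag1', 'tag2')
        GROUP BY post_id
        HAVING COUNT(*) = 2
    ) t
    """
).fetchone()
print(nb_posts)  # 1
```

This form generalises easily: to require N tags, list all N in the IN clause and compare the count with N.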

Edit: based on Konerak's answer, here is a query that handles the case where a post has duplicate tag names:

SELECT DISTINCT t1.post_id
FROM Posts t1  -- TABLE is a reserved word: use your real table name here
  JOIN Posts t2
    ON t1.post_id = t2.post_id
       AND t2.tag_name = 'tag2'
WHERE t1.tag_name = 'tag1'

4 Comments

What happens when an entry appears in the table twice, e.g. 1-tag1 and 1-tag1?
@Konerak, in that case, do you want to include it in the count or exclude it?
@Konerak: good point. I'm betting it shouldn't happen, but it'd probably be best to avoid that possibility if possible. @Bruno: I think the point is that counting is not the best solution. Maybe it'd work if you used a DISTINCT somewhere in there, but I'm not sure.
@Matchu, if you want to select the posts rather than count them, I think the query suggested by Konerak is the best solution. But the query as written does not work if a given post_id has duplicate tag names.
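On Matchu's guess that a DISTINCT might rescue the counting approach: the sketch below (SQLite via Python's sqlite3, with a hypothetical data set and the table name Posts assumed) suggests it does. Post 2 carries 'tag1' twice and no 'tag2'. HAVING COUNT(*) = 2 wrongly counts it, HAVING COUNT(DISTINCT tag_name) = 2 does not, and the DISTINCT self-join from the edit returns the same single post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (post_id INTEGER, tag_name TEXT)")
# Hypothetical data: post 2 has 'tag1' twice and no 'tag2' (the case Konerak raised).
conn.executemany(
    "INSERT INTO Posts VALUES (?, ?)",
    [(1, "tag1"), (1, "tag2"), (2, "tag1"), (2, "tag1")],
)


def count_posts(having: str) -> int:
    """Run the GROUP BY/HAVING count with the given HAVING condition."""
    (n,) = conn.execute(
        f"""
        SELECT COUNT(*) FROM (
            SELECT post_id
            FROM Posts
            WHERE tag_name IN ('tag1', 'tag2')
            GROUP BY post_id
            HAVING {having}
        ) t
        """
    ).fetchone()
    return n


naive = count_posts("COUNT(*) = 2")                  # post 2 wrongly counted
fixed = count_posts("COUNT(DISTINCT tag_name) = 2")  # only post 1 counted

# The DISTINCT self-join from the edit selects the same post.
ids = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT t1.post_id
    FROM Posts t1
    JOIN Posts t2 ON t1.post_id = t2.post_id AND t2.tag_name = 'tag2'
    WHERE t1.tag_name = 'tag1'
    """
)]
print(naive, fixed, ids)  # 2 1 [1]
```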
