As pointed out by Stanislav Kikot, although my first reading of your question focused on reducing the result set of an or to a single row ("any recipe where either it has 2 eggs or water, but it would be nice to see one row in the results, not two"), a second reading suggests that you actually want an and ("find the recipe title where the recipe requires 2 eggs and some water").
So here are two answers, one for the or, one for the and (which is generic enough to handle the or case too):
2 eggs or some water
The concept (and keyword) you're looking for is exists: it allows you to decouple an outer query (that you want to return one result row per table row of your "main" table, here recipe) from an inner one (that may return multiple rows per table row of your main table).
The inner query is run as a correlated subquery, that is, it is conceptually executed once for each row of the outer query, and it sees the outer query's tables (while the inner query's tables do not leak into the outer query): so the outer query can select just from recipe while the inner selects from ingredient and can compare its columns to the recipe of the outer query.
As long as the inner query returns at least one row, the exists evaluates to true for the outer query, so what the inner query selects does not matter: you can return anything (even null).
This "one row suffices" allows the RDBMS to optimize by stopping the inner query as soon as it has found one matching row.
So your query should read:
select title from recipe
where exists
(
    select 1 from ingredient
    where recipe.recipeId = ingredient.recipeId
    and
    (
        (ingredient.ingredient = 'egg' and ingredient.amount = 2) or ingredient.ingredient = 'water'
    )
);
(as shown in this db<>fiddle)
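For a version that can be pasted into an empty database, here is a minimal sketch. The table layout is an assumption inferred from the queries in this answer (only the column names recipeId, title, ingredient, amount and unit appear in it; the types are a guess), and the Omelette and Toast rows are made-up sample data. It was written with PostgreSQL, MySQL or SQLite in mind.

```sql
-- Assumed schema: column names come from the queries above, types are a guess.
create table recipe (recipeId int primary key, title varchar(50));
create table ingredient (recipeId int, ingredient varchar(50), amount int, unit varchar(10));

-- Hypothetical sample data.
insert into recipe values (1, 'Slime cake'), (2, 'Omelette'), (3, 'Toast');
insert into ingredient values
    (1, 'egg',   2,   ''),
    (1, 'water', 100, 'ml'),
    (2, 'egg',   2,   ''),
    (3, 'bread', 2,   'slice');

-- A plain join returns 'Slime cake' twice (once for its eggs, once for its water)...
select r.title
from recipe r
join ingredient i on i.recipeId = r.recipeId
where (i.ingredient = 'egg' and i.amount = 2) or i.ingredient = 'water';

-- ...while exists returns each matching recipe once: 'Slime cake' and 'Omelette'.
select r.title
from recipe r
where exists
(
    select 1 from ingredient i
    where i.recipeId = r.recipeId
    and ((i.ingredient = 'egg' and i.amount = 2) or i.ingredient = 'water')
);
```

The aliases r and i are only there to shorten the column references; they do not change the behaviour of the query.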
2 eggs and some water
If we want an and between two criteria, we can still use exists:
where exists (select 1 from ingredient where recipe.recipeId = ingredient.recipeId and ingredient = 'egg' and amount = 2)
and exists (select 1 from ingredient where recipe.recipeId = ingredient.recipeId and ingredient = 'water')
which uses two correlated subqueries, each making its own pass over ingredient, correlated to the main recipe table.
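Written out in full, the two-exists form could look like the sketch below. As before, the schema is inferred from the column names used in this answer and the sample rows other than Slime cake are hypothetical.

```sql
-- Assumed schema and hypothetical sample data.
create table recipe (recipeId int primary key, title varchar(50));
create table ingredient (recipeId int, ingredient varchar(50), amount int, unit varchar(10));

insert into recipe values (1, 'Slime cake'), (2, 'Omelette'), (3, 'Egg soup');
insert into ingredient values
    (1, 'egg',   2,   ''),
    (1, 'water', 100, 'ml'),
    (2, 'egg',   2,   ''),    -- 2 eggs but no water: rejected by the second exists
    (3, 'egg',   3,   ''),    -- water but 3 eggs: rejected by the first exists
    (3, 'water', 500, 'ml');

-- Only 'Slime cake' satisfies both conditions.
select r.title
from recipe r
where exists (select 1 from ingredient i where i.recipeId = r.recipeId and i.ingredient = 'egg' and i.amount = 2)
and   exists (select 1 from ingredient i where i.recipeId = r.recipeId and i.ingredient = 'water');
```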
But we can do better: use SQL's aggregation (group by together with aggregate functions such as sum and count).
As you may know, aggregates return one row per group: that's a good hint towards your goal (having only one row per recipe).
Now to reach our goal, SQL requires we have a very analytical state of mind, so let's progress step by step:
- first focus on ingredient. We will walk through ingredient with a group by recipeId, counting / summing all criteria of interest for this recipeId
- what we want as output of this first step is a pseudo-table with each criterion in its own column, let's say n_eggs and some_water (so that we can then select from this intermediate result where n_eggs = 2 and some_water > 0)
- to aggregate how many eggs each recipe has, we can sum(case when ingredient = 'egg' and unit = '' then amount end) as n_eggs: the case when … then … end without an else returns null when the when condition is not matched, and sum ignores nulls, so anything that is not an egg will not be summed
- we could sum(case when … then 1 end) for water, but sum(1) is a count(*), so we can use the simpler count(case when ingredient = 'water' then 1 end) as some_water (in fact we could count anything, not necessarily 1, as long as it is not null).
  Note however that when no row matches, count(…) returns 0 while sum(…) returns null, which may lead to debugging headaches.
- once we have built this intermediate table, we can wrap it in a Common Table Expression, with agg as (…):
with agg as (select recipeId, sum(…) as n_eggs, count(…) as some_water from … group by recipeId)
-- Now the agg (Common) Table (Expression) really is usable as a table!
select … from recipe join agg …;
A CTE is similar to a very short-lived temporary table, which lasts only until the ; that ends the statement.
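To get a feel for the intermediate pseudo-table before wrapping it in a CTE, its select can be run on its own. The sketch below does so on hypothetical rows (the ingredient table layout is again inferred from the column names used in this answer) chosen to show the null versus 0 difference between sum and count.

```sql
-- Assumed schema and hypothetical sample data.
create table ingredient (recipeId int, ingredient varchar(50), amount int, unit varchar(10));

insert into ingredient values
    (1, 'egg',   2,   ''),
    (1, 'water', 100, 'ml'),
    (2, 'egg',   1,   ''),    -- two eggs entered as two rows of one egg each
    (2, 'egg',   1,   ''),
    (3, 'water', 50,  'ml'),
    (4, 'flour', 200, 'g');   -- neither egg nor water

select
    recipeId,
    sum(case when ingredient = 'egg' and unit = '' then amount end) as n_eggs,
    count(case when ingredient = 'water' then 1 end) as some_water
from ingredient
where ingredient in ('egg', 'water')
group by recipeId
order by recipeId;

-- Rows returned (recipeId, n_eggs, some_water):
--   1, 2,    1
--   2, 2,    0     <- count() over no water row gives 0
--   3, null, 1     <- sum() over no egg row gives null
-- Recipe 4 does not appear at all: its only row is removed by the where.
```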
Now let's wrap it together:
with
agg as
(
    select
        recipeId,
        sum(case when ingredient = 'egg' and unit = '' then amount end) n_eggs, -- sum() also handles eggs mistakenly split into two rows of a single egg each.
        count(case when ingredient = 'water' then 1 end) some_water
    from ingredient
    -- Pre-filtering to only the ingredients that can fill one of the selected columns gives the RDBMS an opportunity to optimize (using indexes, for example):
    where ingredient in ('egg', 'water')
    group by recipeId
)
select
    title,
    recipeId,
    n_eggs,
    some_water
from recipe join agg using (recipeId)
where n_eggs = 2 and some_water > 0; -- "> 0", not "is not null", because count() over no matching row returns 0, not null.
| title | recipeId | n_eggs | some_water |
|---|---|---|---|
| Slime cake | 1 | 2 | 1 |
I've put this query as the second-to-last query in a db<>fiddle showing all solutions.
Note that:
- I used the shortcut join … using (recipeId), which can be used when both the left and right tables have a column named recipeId: it's equivalent to join … on agg.recipeId = recipe.recipeId, and allows referring to column recipeId in the select without having to specify whether it's agg.recipeId or recipe.recipeId
- compared to the exists, the sum() allows precisely counting how many eggs we have, even if they are in separate rows (for example a recipe with 1 ostrich egg + 1 chicken egg would still match)
- as our agg pseudo-table holds each criterion in its own column, we can now apply our boolean logic as we want: we did where n_eggs = 2 and some_water > 0, but we could do where n_eggs = 2 or some_water > 0 to reproduce the results for "2 eggs or some water" instead of and.
- the basic form of CTE we used here is equivalent to a subquery-table (select title from recipe join (select recipeId, …) as agg using (recipeId) where … instead of with agg as (select recipeId, …) select title from recipe join agg using (recipeId)).
  The difference lies in more complex cases:
  - subqueries can't, CTE can:
    - recursive CTE
    - reusability (you can refer to a CTE twice, including self-joining it; hence the "Common" in Common Table Expression: define once, use twice)
    - readability IMO
  - CTE can't, subqueries can:
    - correlated subquery, where the subquery runs once per row of the previously declared / defined tables
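As an illustration of the "define once, use twice" point, here is a hypothetical sketch (not taken from the fiddle) that self-joins a CTE to list pairs of recipes using the same number of eggs. The table layout and sample rows are assumptions.

```sql
-- Assumed schema and hypothetical sample data.
create table ingredient (recipeId int, ingredient varchar(50), amount int, unit varchar(10));

insert into ingredient values
    (1, 'egg',   2,   ''),
    (2, 'egg',   1,   ''),
    (2, 'egg',   1,   ''),
    (3, 'water', 50,  'ml');

with agg as
(
    select recipeId, sum(case when ingredient = 'egg' then amount end) as n_eggs
    from ingredient
    group by recipeId
)
-- agg is defined once but referenced twice, as a and as b.
select a.recipeId, b.recipeId as other_recipeId, a.n_eggs
from agg a
join agg b on b.n_eggs = a.n_eggs and b.recipeId > a.recipeId;

-- Returns the single pair (1, 2, 2): recipe 3 has a null n_eggs, and null = null is not true.
```

With a subquery-table, the select computing agg would have to be written out twice to obtain the same result.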
Last word (and last query in the fiddle):
as the CTE was a tool to help you organize the building of your query, you could transform it to a subquery… or even do it in one pass (select title from recipe join ingredient using (recipeId) where … group by recipeId).
A small transformation is however needed: where applies before group by, which itself precedes the aggregate functions, so filtering on the aggregates must be done after the group by, with the keyword having instead of where: … group by recipeId having sum(…) = 2 and count(…) > 0.
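Spelled out, the one-pass form could look like the following sketch. The schema and the Omelette row are assumptions, and title is added to the group by only as a portability precaution, since some RDBMS refuse to select a column that is neither grouped nor aggregated.

```sql
-- Assumed schema and hypothetical sample data.
create table recipe (recipeId int primary key, title varchar(50));
create table ingredient (recipeId int, ingredient varchar(50), amount int, unit varchar(10));

insert into recipe values (1, 'Slime cake'), (2, 'Omelette');
insert into ingredient values
    (1, 'egg',   2,   ''),
    (1, 'water', 100, 'ml'),
    (2, 'egg',   2,   '');

select r.title
from recipe r
join ingredient i using (recipeId)
where i.ingredient in ('egg', 'water')          -- row-level filter, applied before grouping
group by recipeId, r.title
having sum(case when i.ingredient = 'egg' and i.unit = '' then i.amount end) = 2
   and count(case when i.ingredient = 'water' then 1 end) > 0;   -- group-level filter

-- Returns 'Slime cake' only: 'Omelette' has 2 eggs but its water count is 0.
```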
However, you'll quickly find that keeping your query in its CTE form eases maintenance.
"egg" means "column named egg", while 'egg' means "string literal egg". In standard SQL, single quotes are the only delimiter for string literals.