
I have an SQLite database; to simplify, it has two tables: recipe and ingredient.

The recipe table consists of:

recipeId (autoincrement, primary index)
title

and the ingredient table of:

id (autoincrement, primary index)
recipeId (relating to the recipeId in the recipe table)
ingredient
unit (e.g. pounds or litres)
amount (the quantity)

I am using the sqlite database browser in Ubuntu to allow me to craft and execute the scripts.

For example, the recipe table may contain the following data:

recipeId  title
1         Slime cake
2         Giraffe biscuits
3         Toad toasties

and the ingredient table:

id  recipeId  ingredient  unit   amount
1   1         water       pint   1
2   1         porridge    g      100
3   1         egg                2
4   2         egg                1
5   2         flour       g      100
6   3         flour       g      150
7   3         bread       slice  2

For example, I want to find the recipe title where the recipe requires 2 eggs and some water. I tried things like:

select title from recipe
   join ingredient on recipe.recipeId= ingredient.recipeId 
   where 
      (ingredient.ingredient = "egg" and ingredient.amount = 2) or ingredient.ingredient = "water" ;

That gives me any recipe where either it has 2 eggs or water, but it would be nice to see one row in the results, not two!
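The sketch below reproduces the duplicate with Python's built-in sqlite3 module. It assumes an in-memory copy of the sample data. The column types and the empty-string unit for eggs are guesses, since the question only shows values. The query is the one above, with single-quoted strings. Recipe 1 matches through two different ingredient rows (water, and 2 eggs). A join returns one row per matching pair, so its title comes back twice.

```python
import sqlite3

def make_db():
    # In-memory copy of the example data. Column types and '' (not NULL)
    # as the unit for eggs are assumptions: the question only shows values.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table recipe (recipeId integer primary key autoincrement, title text);
        create table ingredient (
            id integer primary key autoincrement,
            recipeId integer references recipe (recipeId),
            ingredient text, unit text, amount numeric
        );
        insert into recipe values (1, 'Slime cake'), (2, 'Giraffe biscuits'), (3, 'Toad toasties');
        insert into ingredient values
            (1, 1, 'water', 'pint', 1), (2, 1, 'porridge', 'g', 100), (3, 1, 'egg', '', 2),
            (4, 2, 'egg', '', 1), (5, 2, 'flour', 'g', 100),
            (6, 3, 'flour', 'g', 150), (7, 3, 'bread', 'slice', 2);
    """)
    return con

def asker_query(con):
    # The join yields one row per (recipe, matching ingredient) pair,
    # so a recipe that matches through two ingredient rows appears twice.
    rows = con.execute("""
        select title from recipe
        join ingredient on recipe.recipeId = ingredient.recipeId
        where (ingredient.ingredient = 'egg' and ingredient.amount = 2)
           or ingredient.ingredient = 'water'
    """).fetchall()
    return [title for (title,) in rows]

print(asker_query(make_db()))  # ['Slime cake', 'Slime cake']
```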

I cannot see how nested select statements would help, but my knowledge of SQLite is very limited, so any guidance would be appreciated. I couldn't find an answer online (which most likely means I haven't phrased my questions adequately!). Many thanks for your help.

Sadly, I was unable to leave a comment for @Guillaume Outters, but the answer worked, for which I am most grateful. There was, however, one anomaly.

One recipe was listed twice in the results. Assuming I had two copies of the recipe, I listed the recipeId as well, but both rows had the same recipeId: it was a genuine duplicate.

When I looked at that recipe, it requires eggs three times (!!), each time with a different quantity, so I expected it to be picked up only once.

It isn't a problem, in terms of what I was trying to achieve, but I remain puzzled as to why.

My thanks again!

  • Beware that in SQL, "egg" means "the column named egg", while 'egg' means "the string literal egg". Single quotes are the only standard delimiter for string literals in SQL. Commented Oct 22 at 20:34
  • Many thanks for the information; I wasn't aware of that. Maybe it is the database browser I use, but it made no difference whether I used single or double quotes! I will, however, use the correct one as standard practice. Thank you! Commented Oct 23 at 19:05
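The sketch below illustrates the difference, using Python's sqlite3 module and a hypothetical three-row ingredient table. The table has a column called ingredient, so the double-quoted "ingredient" resolves to that column. The condition then compares the column with itself, and every row matches. The single-quoted 'ingredient' is plain text that no row contains. SQLite treats a double-quoted word as a string only as a fallback, when no column of that name exists, and that fallback can be disabled at compile time. That would explain why "egg" appeared to work in the browser.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table ingredient (ingredient text, unit text)")
con.executemany(
    "insert into ingredient values (?, ?)",
    [("egg", ""), ("water", "pint"), ("flour", "g")],
)

def count_where(condition):
    # condition is a fixed SQL fragment written in this script, not user input
    sql = "select count(*) from ingredient where " + condition
    return con.execute(sql).fetchone()[0]

single = count_where("ingredient = 'ingredient'")  # string literal: no row holds that text
double = count_where('ingredient = "ingredient"')  # identifier: the column compared with itself
print(single, double)  # 0 3
```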

2 Answers


Did you try

select title from recipe
   join ingredient as i1 on recipe.recipeId = i1.recipeId
   join ingredient as i2 on recipe.recipeId = i2.recipeId
   where
      (i1.ingredient = 'egg' and i1.amount = 2) and i2.ingredient = 'water';

?
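The sketch below runs this double join against the sample data. It uses the same assumed in-memory setup (Python's sqlite3, guessed column types) and returns Slime cake once. The second half adds a hypothetical second water row to recipe 1. The join produces one row per combination of matching egg row and water row, so the title then appears twice. This is worth keeping in mind when an ingredient can be listed more than once.

```python
import sqlite3

def make_db():
    # In-memory copy of the example data; column types and '' as the egg unit are assumptions.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table recipe (recipeId integer primary key autoincrement, title text);
        create table ingredient (
            id integer primary key autoincrement,
            recipeId integer references recipe (recipeId),
            ingredient text, unit text, amount numeric
        );
        insert into recipe values (1, 'Slime cake'), (2, 'Giraffe biscuits'), (3, 'Toad toasties');
        insert into ingredient values
            (1, 1, 'water', 'pint', 1), (2, 1, 'porridge', 'g', 100), (3, 1, 'egg', '', 2),
            (4, 2, 'egg', '', 1), (5, 2, 'flour', 'g', 100),
            (6, 3, 'flour', 'g', 150), (7, 3, 'bread', 'slice', 2);
    """)
    return con

DOUBLE_JOIN = """
    select title from recipe
    join ingredient as i1 on recipe.recipeId = i1.recipeId
    join ingredient as i2 on recipe.recipeId = i2.recipeId
    where i1.ingredient = 'egg' and i1.amount = 2
      and i2.ingredient = 'water'
"""

def titles(con, sql):
    return [title for (title,) in con.execute(sql).fetchall()]

con = make_db()
print(titles(con, DOUBLE_JOIN))  # ['Slime cake']

# Hypothetical extra row: a second water line for recipe 1. Each (egg row, water row)
# pair now satisfies the joins, so the recipe is returned once per pair.
con.execute("insert into ingredient (recipeId, ingredient, unit, amount) values (1, 'water', 'cup', 1)")
print(titles(con, DOUBLE_JOIN))  # ['Slime cake', 'Slime cake']
```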


As pointed out by Stanislav Kikot, my first reading of your question focused on reducing the result set of an or to a single row ("any recipe where either it has 2 eggs or water, but it would be nice to see one row in the results, not two"). A second reading shows that you want an and ("find the recipe title where the recipe requires 2 eggs and some water").

So here are two answers, one for the or, one for the and (which is generic enough to handle the or case too):

2 eggs or some water

The concept (and keyword) you're looking for is exists. It lets you decouple an outer query from an inner one. The outer query should return one row per row of your "main" table, here recipe. The inner query may return multiple rows per row of that main table.

The inner query runs as a correlated subquery: it is executed once for each row of the outer query and can see the outer query's tables, whereas the inner query's tables do not leak out to the outer query. So the outer query can select just from recipe, while the inner one selects from ingredient and compares its columns to the current recipe row of the outer query.

As long as the inner query returns at least one row, exists evaluates to true for the outer query, so the inner query can select anything (even null).
Because one row suffices, the RDBMS can optimize by stopping the inner query as soon as it has found one matching row.

So your query should read:

select title from recipe
where exists
(
   select 1 from ingredient
   where recipe.recipeId= ingredient.recipeId 
   and
   (
      (ingredient.ingredient = 'egg' and ingredient.amount = 2) or ingredient.ingredient = 'water'
   )
);

(as shown in this db<>fiddle)
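The sketch below runs the same exists query from Python's sqlite3 module against an assumed in-memory copy of the sample data. Two ingredient rows of Slime cake satisfy the inner condition, but the title still comes back only once. Swapping select 1 for select null gives the same result, since exists only tests whether any row is returned.

```python
import sqlite3

def make_db():
    # In-memory copy of the example data; column types and '' as the egg unit are assumptions.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table recipe (recipeId integer primary key autoincrement, title text);
        create table ingredient (
            id integer primary key autoincrement,
            recipeId integer references recipe (recipeId),
            ingredient text, unit text, amount numeric
        );
        insert into recipe values (1, 'Slime cake'), (2, 'Giraffe biscuits'), (3, 'Toad toasties');
        insert into ingredient values
            (1, 1, 'water', 'pint', 1), (2, 1, 'porridge', 'g', 100), (3, 1, 'egg', '', 2),
            (4, 2, 'egg', '', 1), (5, 2, 'flour', 'g', 100),
            (6, 3, 'flour', 'g', 150), (7, 3, 'bread', 'slice', 2);
    """)
    return con

EXISTS_OR = """
    select title from recipe
    where exists (
        select 1 from ingredient
        where recipe.recipeId = ingredient.recipeId
          and ((ingredient.ingredient = 'egg' and ingredient.amount = 2)
               or ingredient.ingredient = 'water')
    )
    order by recipe.recipeId
"""

def titles(con, sql):
    return [title for (title,) in con.execute(sql).fetchall()]

con = make_db()
print(titles(con, EXISTS_OR))  # ['Slime cake']: one row, although two ingredient rows match
# exists only checks whether a row comes back, so selecting null works just as well
print(titles(con, EXISTS_OR.replace("select 1", "select null")))  # ['Slime cake']
```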

2 eggs and some water

If we want an and between two criteria, we can still use exists:

where exists (select 1 from ingredient where recipe.recipeId = ingredient.recipeId and ingredient = 'egg' and amount = 2)
  and exists (select 1 from ingredient where recipe.recipeId = ingredient.recipeId and ingredient = 'water')

which uses two correlated subqueries. Each one walks through ingredient independently while looking at the current row of the main recipe table.
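The sketch below contrasts the two forms, again with Python's sqlite3 and the assumed sample data. It adds a hypothetical fourth recipe, Water pudding, that contains water but no eggs. The single exists with an inner or returns Water pudding, but the two exists joined by and do not.

```python
import sqlite3

def make_db():
    # In-memory copy of the example data; column types and '' as the egg unit are assumptions.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table recipe (recipeId integer primary key autoincrement, title text);
        create table ingredient (
            id integer primary key autoincrement,
            recipeId integer references recipe (recipeId),
            ingredient text, unit text, amount numeric
        );
        insert into recipe values (1, 'Slime cake'), (2, 'Giraffe biscuits'), (3, 'Toad toasties');
        insert into ingredient values
            (1, 1, 'water', 'pint', 1), (2, 1, 'porridge', 'g', 100), (3, 1, 'egg', '', 2),
            (4, 2, 'egg', '', 1), (5, 2, 'flour', 'g', 100),
            (6, 3, 'flour', 'g', 150), (7, 3, 'bread', 'slice', 2);
    """)
    return con

EXISTS_OR = """
    select title from recipe
    where exists (select 1 from ingredient
                  where recipe.recipeId = ingredient.recipeId
                    and ((ingredient = 'egg' and amount = 2) or ingredient = 'water'))
    order by recipe.recipeId
"""

EXISTS_AND = """
    select title from recipe
    where exists (select 1 from ingredient
                  where recipe.recipeId = ingredient.recipeId
                    and ingredient = 'egg' and amount = 2)
      and exists (select 1 from ingredient
                  where recipe.recipeId = ingredient.recipeId
                    and ingredient = 'water')
    order by recipe.recipeId
"""

def titles(con, sql):
    return [title for (title,) in con.execute(sql).fetchall()]

con = make_db()
# Hypothetical fourth recipe that has water but no eggs.
con.executescript("""
    insert into recipe values (4, 'Water pudding');
    insert into ingredient (recipeId, ingredient, unit, amount) values (4, 'water', 'pint', 2);
""")
print(titles(con, EXISTS_OR))   # ['Slime cake', 'Water pudding']
print(titles(con, EXISTS_AND))  # ['Slime cake']
```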

But we can do better with SQL's aggregate functions (group by and so on).
As you may know, an aggregate query returns one row per group: that's a good hint towards your goal (having only one row per recipe).

Reaching that goal in SQL requires a very analytical state of mind, so let's progress step by step:

  • first, focus on ingredient: walk through it with a group by recipeId, counting or summing each criterion of interest for that recipeId
  • what we want as the output of this first step is a pseudo-table with each criterion in its own column, let's say n_eggs and some_water (so that we can then select from this intermediate result where n_eggs = 2 and some_water > 0)
  • to aggregate how many eggs each recipe has, we can use sum(case when ingredient = 'egg' and unit = '' then amount end) as n_eggs:
    a case when … then … end without an else returns null when the when condition is not met, and sum ignores nulls, so anything that is not an egg will not be summed
  • we could use sum(case when … then 1 end) for water, but summing 1 per row amounts to counting, so we can use the simpler count(case when ingredient = 'water' then 1 end) as some_water (in fact we could count any non-null value, not necessarily 1).
    Note, however, that over nothing but nulls count returns 0 while sum returns null. This may lead to debugging headaches.
  • once we have built this intermediate table, we can wrap it in a Common Table Expression, with agg as (…):
    with agg as (select recipeId, sum(…) as n_eggs, count(…) as some_water from … group by recipeId)
    -- Now the agg (Common) Table (Expression) really is usable as a table!
    select … from recipe join agg …;

    A CTE is similar to a very short-lived temporary table that lasts until the end of the statement (the ;).
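The intermediate agg result can be inspected on its own. This sketch uses Python's sqlite3 and assumed in-memory sample data with '' as the egg unit. It applies the same pre-filter as the full query below and prints one row per recipe that has eggs or water. Toad toasties has neither, so the pre-filter removes it entirely. The last line checks the count/sum difference on a lone null.

```python
import sqlite3

def make_db():
    # In-memory copy of the example data; column types and '' as the egg unit are assumptions.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table recipe (recipeId integer primary key autoincrement, title text);
        create table ingredient (
            id integer primary key autoincrement,
            recipeId integer references recipe (recipeId),
            ingredient text, unit text, amount numeric
        );
        insert into recipe values (1, 'Slime cake'), (2, 'Giraffe biscuits'), (3, 'Toad toasties');
        insert into ingredient values
            (1, 1, 'water', 'pint', 1), (2, 1, 'porridge', 'g', 100), (3, 1, 'egg', '', 2),
            (4, 2, 'egg', '', 1), (5, 2, 'flour', 'g', 100),
            (6, 3, 'flour', 'g', 150), (7, 3, 'bread', 'slice', 2);
    """)
    return con

AGG = """
    select recipeId,
           sum(case when ingredient = 'egg' and unit = '' then amount end) as n_eggs,
           count(case when ingredient = 'water' then 1 end) as some_water
    from ingredient
    where ingredient in ('egg', 'water')
    group by recipeId
    order by recipeId
"""

con = make_db()
# Recipe 2 has an egg but no water: count of only nulls gives 0.
print(con.execute(AGG).fetchall())  # [(1, 2, 1), (2, 1, 0)]
# A single null: count ignores it (0), sum of nothing but nulls is null (None).
print(con.execute("select count(null), sum(null)").fetchone())  # (0, None)
```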

Now let's wrap it together:

with
    agg as
    (
        select
            recipeId,
            sum(case when ingredient = 'egg' and unit = '' then amount end) n_eggs, -- Also handles the case where 2 eggs were mistakenly split into two rows of a single egg.
            count(case when ingredient = 'water' then 1 end) some_water
        from ingredient
        -- Pre-filtering to only ingredients we know will fill one of the selected columns gives the RDBMS an opportunity to optimize (using indexes for example):
        where ingredient in ('egg', 'water')
        group by recipeId
    )
select
    title,
    recipeId,
    n_eggs,
    some_water
from recipe join agg using (recipeId)
where n_eggs = 2 and some_water > 0; -- > 0, not is not null, because count(null) returns 0, not null.
title       recipeId  n_eggs  some_water
Slime cake  1         2       1
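The complete query can be run as-is from Python's sqlite3 module. This sketch assumes an in-memory copy of the sample data; the column types and the '' egg unit are guesses. It should reproduce the single result row shown.

```python
import sqlite3

def make_db():
    # In-memory copy of the example data; column types and '' as the egg unit are assumptions.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table recipe (recipeId integer primary key autoincrement, title text);
        create table ingredient (
            id integer primary key autoincrement,
            recipeId integer references recipe (recipeId),
            ingredient text, unit text, amount numeric
        );
        insert into recipe values (1, 'Slime cake'), (2, 'Giraffe biscuits'), (3, 'Toad toasties');
        insert into ingredient values
            (1, 1, 'water', 'pint', 1), (2, 1, 'porridge', 'g', 100), (3, 1, 'egg', '', 2),
            (4, 2, 'egg', '', 1), (5, 2, 'flour', 'g', 100),
            (6, 3, 'flour', 'g', 150), (7, 3, 'bread', 'slice', 2);
    """)
    return con

FULL = """
with
    agg as
    (
        select
            recipeId,
            sum(case when ingredient = 'egg' and unit = '' then amount end) n_eggs,
            count(case when ingredient = 'water' then 1 end) some_water
        from ingredient
        where ingredient in ('egg', 'water')
        group by recipeId
    )
select title, recipeId, n_eggs, some_water
from recipe join agg using (recipeId)
where n_eggs = 2 and some_water > 0
"""

print(make_db().execute(FULL).fetchall())  # [('Slime cake', 1, 2, 1)]
```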

I've put this query as the second-to-last query in a db<>fiddle showing all solutions.

Note that:

  • I used the shortcut join … using (recipeId), which can be used when both the left and right tables have a column named recipeId. It's equivalent to join … on agg.recipeId = recipe.recipeId, and lets you refer to recipeId in the select without specifying whether it's agg.recipeId or recipe.recipeId
  • compared to the exists, the sum() counts exactly how many eggs we have, even if they are spread over separate rows (for example, a recipe with 1 ostrich egg + 1 chicken egg, stored as two egg rows, would still match)
  • as our agg pseudo-table holds each criterion in its own column, we can now apply whatever boolean logic we want:
    we did where n_eggs = 2 and some_water > 0,
    but we could do where n_eggs = 2 or some_water > 0 to reproduce the results for "2 eggs or some water" instead of and.
  • the basic form of CTE we used here is equivalent to a subquery-table (select title from recipe join (select recipeId, …) as agg using (recipeId) where … instead of with agg as (select recipeId, …) select title from recipe join agg using (recipeId)).
    The difference lies in more complex cases:
    • CTEs can, subqueries can't:
      • recursive CTE
      • reusability (you can refer to a CTE twice, including self-joining it; hence the "Common" in Common Table Expression: define once, use twice)
      • readability IMO
    • subqueries can, CTEs can't:
      • correlated subqueries, where the subquery runs once per row of the tables declared before it
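The sketch below illustrates the second note, using Python's sqlite3 and the assumed sample data. It adds a hypothetical recipe, Puddle pie, whose two eggs are stored as two separate rows of 1 egg each. The CTE sums them to 2 and keeps the recipe. The exists version looks for a single row with amount = 2, so it does not.

```python
import sqlite3

def make_db():
    # In-memory copy of the example data; column types and '' as the egg unit are assumptions.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table recipe (recipeId integer primary key autoincrement, title text);
        create table ingredient (
            id integer primary key autoincrement,
            recipeId integer references recipe (recipeId),
            ingredient text, unit text, amount numeric
        );
        insert into recipe values (1, 'Slime cake'), (2, 'Giraffe biscuits'), (3, 'Toad toasties');
        insert into ingredient values
            (1, 1, 'water', 'pint', 1), (2, 1, 'porridge', 'g', 100), (3, 1, 'egg', '', 2),
            (4, 2, 'egg', '', 1), (5, 2, 'flour', 'g', 100),
            (6, 3, 'flour', 'g', 150), (7, 3, 'bread', 'slice', 2);
    """)
    return con

CTE_TITLES = """
    with agg as (
        select recipeId,
               sum(case when ingredient = 'egg' and unit = '' then amount end) as n_eggs,
               count(case when ingredient = 'water' then 1 end) as some_water
        from ingredient
        where ingredient in ('egg', 'water')
        group by recipeId
    )
    select title from recipe join agg using (recipeId)
    where n_eggs = 2 and some_water > 0
    order by recipeId
"""

EXISTS_AND = """
    select title from recipe
    where exists (select 1 from ingredient
                  where recipe.recipeId = ingredient.recipeId
                    and ingredient = 'egg' and amount = 2)
      and exists (select 1 from ingredient
                  where recipe.recipeId = ingredient.recipeId
                    and ingredient = 'water')
    order by recipe.recipeId
"""

def titles(con, sql):
    return [title for (title,) in con.execute(sql).fetchall()]

con = make_db()
# Hypothetical recipe whose two eggs are stored as two separate rows of 1.
con.executescript("""
    insert into recipe values (4, 'Puddle pie');
    insert into ingredient (recipeId, ingredient, unit, amount) values
        (4, 'egg', '', 1), (4, 'egg', '', 1), (4, 'water', 'cup', 1);
""")
print(titles(con, CTE_TITLES))  # ['Slime cake', 'Puddle pie']
print(titles(con, EXISTS_AND))  # ['Slime cake']
```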

Last word (and last query in the fiddle):
as the CTE was just a tool to help you organize the building of your query, you could transform it into a subquery… or even do it in one pass (select title from recipe join ingredient using (recipeId) where … group by recipeId).
A small transformation is needed, however. where is applied before group by, and the aggregate functions are computed per group, so filtering on the aggregates must happen after the group by, using the keyword having instead of where: … group by recipeId having sum(…) = 2 and count(…) > 0.
However, you'll quickly find that keeping your query in its CTE form eases maintenance.
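The sketch below shows the one-pass variant, with the aggregate criteria moved into having. It uses Python's sqlite3 and the same assumed sample data. It groups by title as well as recipeId. SQLite would accept title as a bare column anyway, but listing it keeps the query valid in databases that require every non-aggregated selected column to appear in group by.

```python
import sqlite3

def make_db():
    # In-memory copy of the example data; column types and '' as the egg unit are assumptions.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table recipe (recipeId integer primary key autoincrement, title text);
        create table ingredient (
            id integer primary key autoincrement,
            recipeId integer references recipe (recipeId),
            ingredient text, unit text, amount numeric
        );
        insert into recipe values (1, 'Slime cake'), (2, 'Giraffe biscuits'), (3, 'Toad toasties');
        insert into ingredient values
            (1, 1, 'water', 'pint', 1), (2, 1, 'porridge', 'g', 100), (3, 1, 'egg', '', 2),
            (4, 2, 'egg', '', 1), (5, 2, 'flour', 'g', 100),
            (6, 3, 'flour', 'g', 150), (7, 3, 'bread', 'slice', 2);
    """)
    return con

HAVING = """
    select title
    from recipe join ingredient using (recipeId)
    where ingredient in ('egg', 'water')      -- row filter, applied before grouping
    group by recipeId, title
    having sum(case when ingredient = 'egg' and unit = '' then amount end) = 2
       and count(case when ingredient = 'water' then 1 end) > 0  -- group filter, after aggregation
"""

def titles(con, sql):
    return [title for (title,) in con.execute(sql).fetchall()]

print(titles(make_db(), HAVING))  # ['Slime cake']
```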

3 Comments

This query will still (among other rows) return recipes with only eggs or only water because of the OR connective. You need to join recipe with ingredient twice in order to get the desired behavior.
You're totally right. I had read the question too quickly, focusing on a problem of grouping rows, not necessarily on the boolean logic. Your remark gave me the opportunity to enrich my answer with this second case, and moreover with an introduction to CTEs and aggregate functions (… which avoids the double join, by the way). Thanks!
Suppose that there is a binary relation :contains between recipes and ingredients. Then, to retrieve all recipes that contain both eggs and water in SPARQL, you need SELECT ?r WHERE {?r :contains "eggs". ?r :contains "water"}. This suggests that a join of two copies of the 'contains' table cannot be avoided.
