1

Table: test, JSONB column: content:

create table test (id bigserial primary key, content jsonb);

content contains a list of fixed-length lists:

insert into test values (1, '[["first 1", "second 3"]]');
insert into test values (2, '[["first 1", "second 2"], ["first 2", "second 3"]]');
insert into test values (3, '[["first 1", "second 2"], ["first 1", "second 3"]]');
insert into test values (4, '[["first 2", "second 3"], ["first 1", "second 2"], ["first 1", "second 2"]]');

What's the correct Postgres syntax for a query that returns all rows where at least one of the content elements satisfies (first element = "first 1") AND (second element ILIKE "%3%")?

That is, in the example above, it should select rows 1 and 3, but not 2 or 4.

Bonus question: what is the most efficient way to do such query (in case there are multiple alternatives)? Does it make sense to look into GIN over JSONB with pg_trgm? (There are millions of rows, the inner string values are typically 10-100 characters long, and each content list contains 0-1000s of lists (most usually 0).)

Thanks!

2 Answers 2

3

You should split apart the top level arrays and check the elements from there:

select distinct id, content
FROM test
JOIN lateral (
    select elems 
    FROM jsonb_array_elements(content) jae(elems)
) all_arrays ON TRUE
WHERE elems ->> 0 = 'first 1'
and elems ->> 1 ilike '%3%'
ORDER BY 1;

As for the best way to do this, that depends a lot on your actual data - how many rows, how big these jsonb structures are, etc. In general, though, a search like ilike '%3%' will benefit from indexes based off of pg_trgm because they can't use traditional btree indexes.

Edit: @Abelisto's query in the comments is better because it should be more performant, especially if content can contain 1000s of elements:

select * from test 
where exists 
  (select 1 
   from jsonb_array_elements(content) jae(elems) 
   where elems ->> 0 = 'first 1' 
   and elems ->> 1 ilike '%3%'
  );
Sign up to request clarification or add additional context in comments.

7 Comments

Using exists could be shorter and more convenient: select * from test where exists (select 1 from jsonb_array_elements(content) jae(elems) where elems ->> 0 = 'first 1' and elems ->> 1 ilike '%3%')
Thanks Abelisto, I like that as well.
@Jeremy I updated the question with expected data sizes. Can you include an example of creating the right index for the second list element (the one queried with ILIKE)?
@user124114 It could be faster because it removes two operations from the query: join and distinct. But to get really good performance you need to normalize your data structure.
@user124114 He means use single values per row instead of using jsonb. If your data were in columns of (id, first_val, second_val) you could index first_val with a btree index and pretty easily add the trgm index to second_val.
|
1

Inner select expands array elements into separate rows with jsonb_array_elements, outer select does the filtering you want. See SQL Fiddle for live example.

select * from (
select id, jsonb_array_elements(content) as item from test  
) as expandedtest
where item->>0 like 'first 1' and item->>1 like '%3%'

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.