Composite JSONB array query in Postgres?

Question

Table: test, JSONB column: content:

create table test (id bigserial primary key, content jsonb);

content contains a list of fixed-length lists:

insert into test values (1, '[["first 1", "second 3"]]');
insert into test values (2, '[["first 1", "second 2"], ["first 2", "second 3"]]');
insert into test values (3, '[["first 1", "second 2"], ["first 1", "second 3"]]');
insert into test values (4, '[["first 2", "second 3"], ["first 1", "second 2"], ["first 1", "second 2"]]');

What's the correct Postgres syntax for a query that returns all rows where at least one of the content elements satisfies (first element = "first 1") AND (second element ILIKE "%3%")?

That is, in the example above, it should select rows 1 and 3, but not 2 or 4.

Bonus question: what is the most efficient way to run such a query (in case there are multiple alternatives)? Does it make sense to look into GIN over JSONB with pg_trgm? (There are millions of rows, the inner string values are typically 10-100 characters long, and each content list contains anywhere from 0 to 1000s of inner lists, most often 0.)

Thanks!

Jeremy · Accepted Answer · 2019-06-06 19:55:50Z

3

You should split apart the top level arrays and check the elements from there:

select distinct id, content
from test
join lateral (
    select elems
    from jsonb_array_elements(content) jae(elems)
) all_arrays on true
where elems ->> 0 = 'first 1'
and elems ->> 1 ilike '%3%'
order by 1;

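As for the best way to do this, that depends a lot on your actual data: how many rows, how big these jsonb structures are, etc. In general, though, a search like ilike '%3%' cannot use a traditional btree index because of the leading wildcard, so it is the kind of search that indexes based on pg_trgm are meant for. Keep in mind that such an index is built on a text column or expression, so it cannot be applied directly to the elements produced by jsonb_array_elements.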

Edit: @Abelisto's query from the comments is better: it avoids the join and the distinct, so it should perform better, especially if content can contain 1000s of elements:

select * from test 
where exists 
  (select 1 
   from jsonb_array_elements(content) jae(elems) 
   where elems ->> 0 = 'first 1' 
   and elems ->> 1 ilike '%3%'
  );

edited Jun 6, 2019 at 19:55

answered Jun 6, 2019 at 19:26


7 Comments

Abelisto Over a year ago

Using exists could be shorter and more convenient:

select * from test where exists (select 1 from jsonb_array_elements(content) jae(elems) where elems ->> 0 = 'first 1' and elems ->> 1 ilike '%3%')

Jeremy Over a year ago

Thanks Abelisto, I like that as well.

user124114 Over a year ago

@Jeremy I updated the question with expected data sizes. Can you include an example of creating the right index for the second list element (the one queried with ILIKE)?

Abelisto Over a year ago

@user124114 It could be faster because it removes two operations from the query: join and distinct. But to get really good performance you need to normalize your data structure.

Jeremy Over a year ago

@user124114 He means use single values per row instead of using jsonb. If your data were in columns of (id, first_val, second_val) you could index first_val with a btree index and pretty easily add the trgm index to second_val.
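
To make the normalization suggestion concrete, here is a minimal sketch of what it could look like. The table name test_pairs, its column names, and the index names are hypothetical and not from the thread; it assumes every content value is an array of two-element arrays, as in the question. Also note that a one-character pattern such as '%3%' gives a trigram index little or nothing to narrow the search with, so treat this as a starting point to check with EXPLAIN ANALYZE rather than a guaranteed speedup.

```sql
-- pg_trgm provides the gin_trgm_ops operator class used below.
create extension if not exists pg_trgm;

-- Hypothetical normalized layout: one row per inner [first, second] pair.
create table test_pairs (
    test_id    bigint not null references test (id),
    first_val  text,
    second_val text
);

-- Populate it from the existing jsonb column.
insert into test_pairs (test_id, first_val, second_val)
select t.id, e.elems ->> 0, e.elems ->> 1
from test t
cross join lateral jsonb_array_elements(t.content) as e(elems);

-- btree index for the equality test on the first value.
create index test_pairs_first_idx
    on test_pairs (first_val);

-- Trigram GIN index for the ILIKE test on the second value.
create index test_pairs_second_trgm_idx
    on test_pairs using gin (second_val gin_trgm_ops);

-- Same question as before, answered from the normalized table.
select t.*
from test t
where exists (
    select 1
    from test_pairs p
    where p.test_id = t.id
      and p.first_val = 'first 1'
      and p.second_val ilike '%3%'
);
```

With longer search patterns (three or more characters) the trigram index has more to work with; the planner decides per query whether either index is worth using.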


Anton · Accepted Answer · 2019-06-06 19:35:27Z

1

The inner select expands the array elements into separate rows with jsonb_array_elements; the outer select does the filtering you want. See SQL Fiddle for a live example.

select * from (
    select id, jsonb_array_elements(content) as item from test
) as expandedtest
where item ->> 0 = 'first 1' and item ->> 1 ilike '%3%';

answered Jun 6, 2019 at 19:35

