
I have a posts table with a JSON column of an array with IDs. This column's data looks exactly like:

[1, 3, 17, 19]

These values are not quoted.

To check if a user should see a post, I simply use JSON_CONTAINS(userlist, '(the user ID)', '$'). However, after many thousands of posts, this is starting to get too slow. I'd like to avoid normalizing this to another table with proper relations for now, so I'm wondering what's the best way to see if a user ID exists in a field like what I have.

Note: this is not exactly a duplicate. My values are straight integers, which is why I seemingly can't use JSON_SEARCH()?

  • What does SELECT VERSION(); return? Are you using MySQL 8.0 so you can create multi-valued indexes? If so, then have you created such an index? If not, then why not? Commented Sep 24, 2022 at 6:27
  • @BillKarwin I'm on 8.0.27 and... I'd never heard of this type of index! However, I don't think it would be helpful for my use case? My JSON is all numeric values in an array. Simply: [1, 3, 7] for user IDs 1, 3, and 7. Commented Sep 24, 2022 at 19:41

2 Answers


You really should bite the bullet and normalise, as this operation is only going to get slower. In the meantime, there are a couple of ways you can do this with string operations using LIKE and REGEXP:

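-- Word boundaries keep 3 from matching inside 13 or 31.
select userlist regexp '\\b3\\b' AS got_3,
       userlist regexp '\\b7\\b' AS got_7
from test;

-- MySQL prints JSON arrays as '[1, 3, 17, 19]' (comma plus space), so the
-- patterns cover: first element, middle element, last element, only element.
select userlist like '[3,%' or userlist like '% 3,%' or userlist like '% 3]' or userlist like '[3]' AS got_3,
       userlist like '[7,%' or userlist like '% 7,%' or userlist like '% 7]' or userlist like '[7]' AS got_7
from test;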

In both cases for your sample data the output is:

got_3   got_7
1       0

Using LIKE will probably be faster than JSON_CONTAINS, but using REGEXP probably won't. You'd need to benchmark on your server.
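The boundary handling in both patterns matters because a bare substring match gives false positives. A small illustration on a literal string (assuming MySQL 8.0.4 or later, where REGEXP supports \b):

```sql
-- '[1, 13, 17]' does not contain the ID 3, but "13" contains the character 3.
select '[1, 13, 17]' like '%3%'        as naive_like,  -- 1: false positive
       '[1, 13, 17]' regexp '\\b3\\b'  as bounded;     -- 0: correct
```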

If you're using MySQL 8+, then you can use JSON_TABLE:

select *
from test
join json_table(userlist,
                '$[*]' columns (user int path '$')
               ) ul
where ul.user = 3

Again, performance will be dependent on your server.



2 Comments

I appreciate it! json_table shaved my response time down from an average of 450ms to 190ms. That'll do for now. I definitely will need to refactor this app later.
@VaelVictus I'm glad I could help. If you're benchmarking I'd also be interested in hearing if Bill's solution offered a significant improvement.

Demo of the MySQL 8.0 multi-valued index:

mysql> create table mytable (id serial primary key, data json);

mysql> insert into mytable set data = '[1, 3, 17, 19]';

mysql> create index i on mytable ((cast(data as unsigned array)));

mysql> explain select * from mytable where 17 member of (data)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mytable
   partitions: NULL
         type: ref
possible_keys: i
          key: i    <-- see, it is using the index `i`
      key_len: 9
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where

It's true that JSON_SEARCH() doesn't work with integer values, but the multi-valued index I defined in the example is indexing integer values, so this works.
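MEMBER OF() is not the only way in: according to the MySQL 8.0 manual, the optimizer can also consider a multi-valued index for JSON_CONTAINS() and JSON_OVERLAPS() in the WHERE clause. A sketch against the same mytable as above; whether the index is actually chosen depends on your data, so confirm with EXPLAIN:

```sql
-- Rows whose array contains 17 (same rows as: 17 member of (data)).
select * from mytable
where json_contains(data, cast('[17]' as json));

-- Rows whose array contains both 3 and 17.
select * from mytable
where json_contains(data, cast('[3, 17]' as json));

-- Rows whose array shares at least one value with [3, 99].
select * from mytable
where json_overlaps(data, cast('[3, 99]' as json));
```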

Of course the whole task would be simpler if you normalized your table instead of using a JSON array. JSON generally makes queries and optimization harder.
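For when you do get around to it, here is a minimal sketch of the normalized form. The post_user table and the posts.id column are assumed names for illustration (only posts and userlist come from the question), so adjust them to your schema:

```sql
-- Hypothetical junction table: one row per (user, post) pair.
create table post_user (
  user_id int not null,
  post_id bigint unsigned not null,
  primary key (user_id, post_id)
);

-- One-off backfill from the existing JSON arrays using JSON_TABLE.
-- DISTINCT guards against an ID appearing twice in one array.
insert into post_user (user_id, post_id)
select distinct ul.user_id, p.id
from posts p
join json_table(p.userlist,
                '$[*]' columns (user_id int path '$')
               ) ul;

-- Posts visible to user 17, resolved through the primary key.
select p.*
from posts p
join post_user pu on pu.post_id = p.id
where pu.user_id = 17;
```

Putting user_id first in the primary key means the "which posts can this user see" lookup is a plain index range scan.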

4 Comments

I hadn't noticed this addition to MySQL 8. It looks really useful for JSON (if you're stuck with it for some reason).
Since MySQL 8.0, they keep adding significant new features in point-releases. In this case, multi-valued indexes were added in 8.0.17, which was 15 months after the first GA release. You really need to read the release notes carefully these days. There are also deprecations in point-releases, which is going to make it hard to do regular upgrades.
Indeed, going to have to spend more time reading. Thanks.
@BillKarwin Thanks! This looks great, and I'm sure it sped up the individual queries, but I was still getting 0.25s average query time while separately, they were performed in 0.005s. Looks like UNION is the culprit. I'll paste here with no expectation; take a look if you'd like and have any ideas on how to speed this up, else I'll rely on application code to union the results. pastebin.com/aFvPCnw6
