I have a JSON field that stores each post's tags.

id:1, content:'...', tags: ["tag_1", "tag_2"]

id:2, content:'...', tags: ["tag_3", "tag_2"]

id:3, content:'...', tags: ["tag_1", "tag_2"]

I just want to list all tags along with their popularity (or even just the distinct tags), something like this:

tag_2: 3,

tag_1: 2,

tag_3: 1

1 Answer

Here's the setup:

create table t ( id serial primary key, content json);
insert into t set content = '{"tags": ["tag_1", "tag_2"]}';
insert into t set content = '{"tags": ["tag_3", "tag_2"]}';
insert into t set content = '{"tags": ["tag_1", "tag_2"]}';

If you know the maximum number of tags in any tag array, you can extract all the tags using UNION:

select id, json_extract(content, '$.tags[0]') AS tag from t 
union
select id, json_extract(content, '$.tags[1]') from t;

+----+---------+
| id | tag     |
+----+---------+
|  1 | "tag_1" |
|  2 | "tag_3" |
|  3 | "tag_1" |
|  1 | "tag_2" |
|  2 | "tag_2" |
|  3 | "tag_2" |
+----+---------+

You need as many unioned subqueries as the number of tags in the longest array.
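When the arrays differ in length, json_extract returns NULL for any position a post does not have, and the extracted values keep their JSON double quotes, as the output above shows. A minimal sketch of handling both, assuming MySQL 5.7 or later and a hypothetical maximum of three tags per post (the alias extracted is only illustrative):

```sql
-- One branch per array position; [2] is included only to illustrate
-- a position that some (here: all) posts do not have.
select id, tag
from (
    -- json_unquote strips the surrounding double quotes from each JSON string
    select id, json_unquote(json_extract(content, '$.tags[0]')) as tag from t
    union
    select id, json_unquote(json_extract(content, '$.tags[1]')) from t
    union
    select id, json_unquote(json_extract(content, '$.tags[2]')) from t
) as extracted
-- drop the rows produced by positions that do not exist in a given array
where tag is not null;
```

Without the filter, the missing positions would show up as rows with a NULL tag and later be counted as a group of their own.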

Then you can put this in a derived table and perform an aggregation on it:

select tag, count(*) as count
from ( 
    select id, json_extract(content, '$.tags[0]') as tag from t 
    union 
    select id, json_extract(content, '$.tags[1]') from t
) as t2
group by tag
order by count desc;

+---------+-------+
| tag     | count |
+---------+-------+
| "tag_2" |     3 |
| "tag_1" |     2 |
| "tag_3" |     1 |
+---------+-------+
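If the server is MySQL 8.0 or later (an assumption; the question does not state a version), JSON_TABLE can expand each array into one row per element, so the maximum array length does not have to be known in advance. A sketch against the same table t, where the alias jt and the varchar(20) column length are choices made for illustration:

```sql
-- Expand every element of content.tags into its own row, then count per tag.
select jt.tag, count(*) as count
from t
cross join json_table(
    t.content, '$.tags[*]'
    -- '$' refers to the current array element; varchar returns it unquoted
    columns (tag varchar(20) path '$')
) as jt
group by jt.tag
order by count desc;
```

On the sample rows this should give the same three counts as the UNION query, with the tags returned as plain strings rather than quoted JSON values.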

This would be easier if you stored tags in a second table instead of in a JSON array:

create table tags ( id bigint unsigned, tag varchar(20) not null, primary key (id, tag));
insert into tags set id = 1, tag = 'tag_1';
insert into tags set id = 1, tag = 'tag_2';
insert into tags set id = 2, tag = 'tag_3';
insert into tags set id = 2, tag = 'tag_2';
insert into tags set id = 3, tag = 'tag_1';
insert into tags set id = 3, tag = 'tag_2';

select tag, count(*) as count 
from tags
group by tag
order by count desc;

+-------+-------+
| tag   | count |
+-------+-------+
| tag_2 |     3 |
| tag_1 |     2 |
| tag_3 |     1 |
+-------+-------+

This solution works no matter how many tags each id has. You don't need to know the maximum length of the tag list per id.
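If the tags already live in the JSON column, the second table could be filled from it rather than by hand. This is only a sketch, again assuming MySQL 8.0 or later for JSON_TABLE; the alias jt is illustrative and varchar(20) simply mirrors the tags table definition:

```sql
-- Copy every tag from the JSON arrays in t into the normalized tags table.
-- insert ignore skips pairs that would violate the (id, tag) primary key,
-- for example rows inserted earlier or a tag listed twice in one post.
insert ignore into tags (id, tag)
select t.id, jt.tag
from t
cross join json_table(
    t.content, '$.tags[*]'
    columns (tag varchar(20) path '$')
) as jt;
```

After this, the aggregate query on tags needs no JSON functions at all.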

JSON is nice when you need to store a 'document' of semi-structured data, but only when you treat the document as one irreducible data value. As soon as you need to access elements of the document and apply relational operations to them, the document-oriented approach shows its weakness.

1 Comment

Thank you! Very helpful. Working on this for a week!
