3

I have a simple select query on JSON data with a few simple calculations which return different results when I run it several times.

For the life of me, I just cannot find out why and so far no luck in my search for an answer. I can see from the query stats that a temporary table is built, which I imagine fails to truncate/ is saved somewhere upon completion of the query, but I am unable to locate (let alone truncate/drop) this table. It is just a working hypothesis though - the answer might be in a setting or ...

Can anyone find any logic to this?

I am using MySQL Workbench on MySQL DB on local instance. MySQL: Version 8.0.19 MySQL Community Server - GPL.

This is a simplified version of the table:

CREATE TABLE regulation_entries (
    id INTEGER UNSIGNED AUTO_INCREMENT,
    employee_id VARCHAR(10) NOT NULL,
    regulation JSON NOT NULL,
    PRIMARY KEY (id)
    );

# Sample data to work with    

INSERT INTO regulation_entries VALUES
    (DEFAULT, 2, '{"entry_base_salary": "8000", "pension_pct": "0.08"}'),
    (DEFAULT, 3, '{"entry_base_salary": "10000", "pension_pct": "0.08"}'),
    (DEFAULT, 5, '{"entry_base_salary": "11000", "pension_pct": "0.08"}'),
    (DEFAULT, 8, '{"entry_base_salary": "11000", "pension_pct": "0.08"}'),
    (DEFAULT, 9, '{"entry_base_salary": "9000", "pension_pct": "0.08"}'),
    (DEFAULT, 1, '{"entry_base_salary": "14000", "pension_pct": "0.10"}'),
    (DEFAULT, 6, '{"entry_base_salary": "13000", "pension_pct": "0.08"}'),
    (DEFAULT, 7, '{"entry_base_salary": "14000", "pension_pct": "0.08"}'),
    (DEFAULT, 2, '{"base_salary_adjustment": "500"}'),
    (DEFAULT, 9, '{"base_salary_adjustment": "800"}'),
    (DEFAULT, 3, '{"base_salary_adjustment": "400"}'),
    (DEFAULT, 5, '{"base_salary_adjustment": "350"}'),
    (DEFAULT, 8, '{"base_salary_adjustment": "200"}'),
    (DEFAULT, 9, '{"base_salary_adjustment": "1250"}'),
    (DEFAULT, 1, '{"base_salary_adjustment": "-200"}'),
    (DEFAULT, 1, '{"base_salary_adjustment": "50"}'),
    (DEFAULT, 6, '{"base_salary_adjustment": "700"}'),
    (DEFAULT, 7, '{"base_salary_adjustment": "825"}');

# The query that is bugging me:

SELECT employee_id, 
        SUM(regulation->>'$.entry_base_salary') AS entry_base_salary,
        regulation->>'$.pension_pct' AS pension_pct,
        AVG(regulation->>'$.pension_pct') * SUM(regulation->>'$.entry_base_salary') AS entry_pension,
        SUM(regulation->>'$.base_salary_adjustment') AS salary_adjustments,
        SUM(regulation->>'$.entry_base_salary') + SUM(regulation->>'$.base_salary_adjustment') AS future_salary,
        AVG(regulation->>'$.pension_pct') * (SUM(regulation->>'$.entry_base_salary') + SUM(regulation->>'$.base_salary_adjustment')) AS future_pension 
    FROM sreg.regulation_entries 
    GROUP BY employee_id
    ORDER BY employee_id

When I run the query I expect to see this:

employee_id #entry_base_salary  #pension_pct    #entry_pension  #salary_adjustments #future_salary  #future_pension#
#1          #14000              #0.10           #1400           #1250               #15250          #1525
#2          #8000               #0.08           #640            #1100               #9100           #728
#3          #10000              #0.08           #800            #1000               #11000          #880
#5          #11000              #0.08           #880            #950                #11950          #956
#6          #13000              #0.08           #1040           #1900               #14900          #1192
#7          #14000              #0.08           #1120           #2025               #16025          #1282
#8          #11000              #0.08           #880            #800                #11800          #944
#9          #9000               #0.08           #720            #2650               #11650          #932

But if I Execute the same query again, I get new random results. e.g.:

employee_id #entry_base_salary  #pension_pct    #entry_pension  #salary_adjustments     #future_salary  #future_pension#

1           #14000              #0.10           #1400           #3050                   #17050          #1705
2           #8000               #0.08           #640            #2900                   #10900          #872
3           #10000              #0.08           #800            #2800                   #12800          #1024
5           #11000              #0.08           #880            #2750                   #13750          #1100
6           #13000              #0.08           #1040           #3700                   #16700          #1336
7           #14000              #0.08           #1120           #3825                   #17825          #1426
8           #11000              #0.08           #880            #2600                   #13600          #1088
9           #9000               #0.08           #720            #4450                   #13450          #1076

Another example

1           #14000              #0.10           #1400           #2.119191149652875e88   #15250          #1525
2           #8000               #0.08           #640            #2.119191149652875e88   #9100           #728
3           #10000              #0.08           #800            #2.119191149652875e88   #11000          #880
5           #11000              #0.08           #880            #2.119191149652875e88   #11950          #956
6           #13000              #0.08           #1040           #2.119191149652875e88   #14900          #1192
7           #14000              #0.08           #1120           #2.119191149652875e88   #16025          #1282
8           #11000              #0.08           #880            #2.119191149652875e88   #11800          #944
9           #9000               #0.08           #720            #2.119191149652875e88   #11650          #932

And a third

1           #14000              #0.10           #1400           #3650                   #17650          #1765
2           #8000               #0.08           #640            #3500                   #11500          #920
3           #10000              #0.08           #800            #3400                   #13400          #1072
5           #11000              #0.08           #880            #3350                   #14350          #1148
6           #13000              #0.08           #1040           #4300                   #17300          #1384
7           #14000              #0.08           #1120           #4425                   #18425          #1474
8           #11000              #0.08           #880            #3200                   #14200          #1136
9           #9000               #0.08           #720            #5050                   #14050          #1124

Have any one seen this before? Any explanation? Or better yet, does anyone know what I have to change to obtain consistent results?

12
  • I only checked Employee 1 but your expected values dont seem Right to me !! Commented Jun 8, 2020 at 14:15
  • Hmm... I believe it is though. Employee 1 has 10% pension as opposed to 8% to the others (just to test how it is handled), which might explain why it looks different to you. But whether it is or not, the big question mark remains why the results change. Commented Jun 8, 2020 at 14:21
  • inconsistencies are only related to base_salary_adjustment, make sure you run the query against the same rows having that field Commented Jun 8, 2020 at 14:21
  • Just in case it turns out to be relevant, what version of MySQL are you running Commented Jun 8, 2020 at 14:25
  • @RiggsFolly No, you're right. The adjustment is wrong too. The 1250 should be for employee 9. Commented Jun 8, 2020 at 14:27

1 Answer 1

2

This isn't specifically about JSON at all. This is a bug in MySQL 8.0.18, fixed in 8.0.20. The bug can occur with non-JSON columns.

https://bugs.mysql.com/bug.php?id=97920 Aggregation function [sum()] return random numbers

In fact the numbers are not wholly random, but are increasing each time one runs the query. It's like the SUM() is accumulating the sum in a temp table, and the same temp table is used for subsequent runs of the query, without zeroing out the sum.

Here's the result from the test case given in the bug above:

mysql> SELECT ym, ROUND(SUM(vb)), ROUND(SUM(vc)) FROM t_test GROUP BY ym;
+--------+----------------+----------------+
| ym     | ROUND(SUM(vb)) | ROUND(SUM(vc)) |
+--------+----------------+----------------+
| 201912 |          22675 |            227 |
+--------+----------------+----------------+

mysql> SELECT ym, ROUND(SUM(vb)), ROUND(SUM(vc)) FROM t_test GROUP BY ym;
+--------+----------------+----------------+
| ym     | ROUND(SUM(vb)) | ROUND(SUM(vc)) |
+--------+----------------+----------------+
| 201912 |          22675 |            454 |
+--------+----------------+----------------+

mysql> SELECT ym, ROUND(SUM(vb)), ROUND(SUM(vc)) FROM t_test GROUP BY ym;
+--------+----------------+----------------+
| ym     | ROUND(SUM(vb)) | ROUND(SUM(vc)) |
+--------+----------------+----------------+
| 201912 |          22675 |            682 |
+--------+----------------+----------------+

As I test, it continues to increase by 227 each time I run the query. 227 is the correct result, and was returned the first time I ran the query.

Furthermore, the problem does not occur if the underlying data to be summed does not include any NULLs.

mysql> SELECT ym, ROUND(SUM(vb)), ROUND(SUM(vc)) FROM t_test where vc is not null GROUP BY ym;
+--------+----------------+----------------+
| ym     | ROUND(SUM(vb)) | ROUND(SUM(vc)) |
+--------+----------------+----------------+
| 201912 |            252 |            227 |
+--------+----------------+----------------+

The correct value of 227 is returned every time I run this query.

The same thing happens with your query. Since your expression regulation->>'$.base_salary_adjustment' returns NULL on rows where the JSON does not contain that key, it exhibits the same bug when using SUM().

If I modified the query to convert NULL to 0, it gives the correct result and does not change no matter how many times I run the query.

SUM(COALESCE(regulation->>'$.base_salary_adjustment', 0)) AS salary_adjustments

This must be done each time you reference that JSON key (or any time you use SUM() on an expression that may be NULL.

I suggest you just upgrade to MySQL 8.0.20.

Sign up to request clarification or add additional context in comments.

6 Comments

how about entry_base_salary then? there are rows with no entry_base_salary fields set, and still it gives the correct sum in the query result. does the order matter?
Apparently it does matter. It's a bug, so all logical behavior is moot.
@BillKarwin Thanks! I knew someone genius in here could find the answer... and I take some consolation it the fact that it is a bug, so I can cancel my appointment with the shrink ;)
LOL - MySQL was gaslighting you!
@BillKarwin I took a while because the system table update kept failing, but I took your advice and updated to MySQL 8.0.20 and you were spot on. Now the calculations work like a charm. Thanks again for your help!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.