I have a table like this in MySQL. I want to calculate the average of the Value column, but not over all rows. From each group (column Group) I want to use only one value: the one with the highest rank (column Rank) when the group has multiple rows.

+----+-------+-------+------+
| ID | Value | Group | Rank |
+----+-------+-------+------+
|  1 |    10 |     1 |    1 |
|  2 |     9 |     1 |    2 |
|  3 |     7 |     2 |    2 |
|  4 |    10 |     2 |    1 |
|  5 |    11 |     3 |    1 |
|  6 |     9 |     4 |    1 |
|  7 |     8 |     5 |    1 |
|  8 |    10 |     6 |    2 |
|  9 |     9 |     7 |    1 |
+----+-------+-------+------+

So in group 1 I must use value 9 from row ID 2, because it has the highest rank. In group 2 I will use value 7 from row ID 3 for the same reason. Each of the remaining groups has only one row, so I use its value. In the end I want the average of the values from rows 2, 3, 5, 6, 7, 8, 9. How can I do that in one query?
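To make the expected result concrete, here is a minimal Python sketch that applies the rule described above to the sample rows. The helper names (`top_ranked_rows`, `expected_average`) are made up for illustration, and the sketch assumes each (Group, Rank) pair is unique, so there are no ties to break.

```python
# Sample rows from the question, as (ID, Value, Group, Rank) tuples.
ROWS = [
    (1, 10, 1, 1), (2, 9, 1, 2), (3, 7, 2, 2),
    (4, 10, 2, 1), (5, 11, 3, 1), (6, 9, 4, 1),
    (7, 8, 5, 1), (8, 10, 6, 2), (9, 9, 7, 1),
]


def top_ranked_rows(rows):
    """Keep, for each group, the row with the highest rank.

    Assumes (Group, Rank) is unique; on a tie the first row seen wins.
    """
    best = {}
    for row in rows:
        _id, _value, group, rank = row
        if group not in best or rank > best[group][3]:
            best[group] = row
    return sorted(best.values())


def expected_average(rows):
    """Average the Value column over the top-ranked row of each group."""
    chosen = top_ranked_rows(rows)
    return sum(value for _id, value, _group, _rank in chosen) / len(chosen)


print([row[0] for row in top_ranked_rows(ROWS)])  # [2, 3, 5, 6, 7, 8, 9]
print(expected_average(ROWS))  # 9.0
```

The selected values are 9, 7, 11, 9, 8, 10 and 9, which sum to 63 over 7 groups, so a correct query should return 9 for this data.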

  • what does select version(); show? Commented Jul 29, 2021 at 22:58
  • is (group,rank) unique? if not, what value is used if there are multiple ranks for the same group? Commented Jul 29, 2021 at 22:59
  • select version(); result is 5.7.35-log Commented Jul 31, 2021 at 20:28

1 Answer

You can use NOT EXISTS to keep only the rows with the highest rank in each group:

SELECT AVG(t1.Value) avg_value
FROM tablename t1
WHERE NOT EXISTS (
  SELECT 1
  FROM tablename t2
  WHERE t2.group = t1.group AND t2.rank > t1.rank
)

Or, with a correlated subquery:

SELECT AVG(t1.Value) avg_value
FROM tablename t1
WHERE t1.rank = (SELECT MAX(t2.rank) FROM tablename t2 WHERE t2.group = t1.group)

See the demo.
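As a rough way to check both queries against the question's sample data without a MySQL server, the sketch below runs them in an in-memory SQLite database through Python's standard sqlite3 module. SQLite standing in for MySQL is an assumption of this sketch, the helper names (`build_sample_db`, `run_avg`) are hypothetical, and the Group and Rank columns are backtick-quoted here because both are keywords in some versions.

```python
import sqlite3

# Sample rows from the question, as (ID, Value, Group, Rank) tuples.
ROWS = [
    (1, 10, 1, 1), (2, 9, 1, 2), (3, 7, 2, 2),
    (4, 10, 2, 1), (5, 11, 3, 1), (6, 9, 4, 1),
    (7, 8, 5, 1), (8, 10, 6, 2), (9, 9, 7, 1),
]

# Query 1: keep a row only if no other row in its group has a higher rank.
NOT_EXISTS_SQL = """
SELECT AVG(t1.Value) AS avg_value
FROM tablename t1
WHERE NOT EXISTS (
  SELECT 1
  FROM tablename t2
  WHERE t2.`Group` = t1.`Group` AND t2.`Rank` > t1.`Rank`
)
"""

# Query 2: keep a row only if its rank equals the maximum rank of its group.
MAX_RANK_SQL = """
SELECT AVG(t1.Value) AS avg_value
FROM tablename t1
WHERE t1.`Rank` = (
  SELECT MAX(t2.`Rank`) FROM tablename t2 WHERE t2.`Group` = t1.`Group`
)
"""


def build_sample_db():
    """Create an in-memory SQLite table holding the question's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename ("
        "ID INTEGER PRIMARY KEY, Value INTEGER, `Group` INTEGER, `Rank` INTEGER)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", ROWS)
    return conn


def run_avg(conn, sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]


conn = build_sample_db()
print(run_avg(conn, NOT_EXISTS_SQL))  # 9.0
print(run_avg(conn, MAX_RANK_SQL))    # 9.0
```

Both queries select rows 2, 3, 5, 6, 7, 8 and 9. If two rows in a group shared the top rank, both queries would include both rows in the average.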

5 Comments

in mysql 8 or mariadb 10.2+, you can use FIRST_VALUE for this: dbfiddle.uk/…
@ysth yes but you still need a subquery.
not all subqueries are created equal :) I would bet window functions outperform both of your queries where there are large numbers of records with a lot of ranks for each group
@ysth don't bet on that. Window functions need a full table scan for each partition. EXISTS does not need that. It returns as soon as it finds a match. Also, not all window functions are as fast as you believe. For example row_number() is (usually) faster than rank() and max() is (usually) faster than first_value().
often the most efficient way to do this kind of thing is SELECT AVG(Value) FROM (SELECT CAST(SUBSTR(MAX(CONCAT(LPAD(t.Rank,10,0),t.Value)),11) AS INTEGER) AS Value FROM tablename t GROUP BY t.Group) AS max_rank_values (assuming unsigned ranks) but that's kind of gross
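The FIRST_VALUE approach mentioned in the first comment is not spelled out in the thread, and the linked fiddle is not reproduced here. One way it could look is sketched below. This is a guess at the idea, not the commenter's actual query. It is run in SQLite (3.25 or later, which supports window functions) through Python's sqlite3 module as a stand-in for MySQL 8 or MariaDB 10.2+, and it would not run on the asker's 5.7.35 server. The helper name `build_sample_db` is hypothetical.

```python
import sqlite3

# Sample rows from the question, as (ID, Value, Group, Rank) tuples.
ROWS = [
    (1, 10, 1, 1), (2, 9, 1, 2), (3, 7, 2, 2),
    (4, 10, 2, 1), (5, 11, 3, 1), (6, 9, 4, 1),
    (7, 8, 5, 1), (8, 10, 6, 2), (9, 9, 7, 1),
]

# For every row, FIRST_VALUE picks the Value of the highest-ranked row in
# that row's group; DISTINCT then collapses each group to a single row.
FIRST_VALUE_SQL = """
SELECT AVG(Value) AS avg_value
FROM (
  SELECT DISTINCT
    `Group`,
    FIRST_VALUE(Value) OVER (PARTITION BY `Group` ORDER BY `Rank` DESC) AS Value
  FROM tablename
) AS top_values
"""


def build_sample_db():
    """Create an in-memory SQLite table holding the question's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename ("
        "ID INTEGER PRIMARY KEY, Value INTEGER, `Group` INTEGER, `Rank` INTEGER)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", ROWS)
    return conn


conn = build_sample_db()
first_value_avg = conn.execute(FIRST_VALUE_SQL).fetchone()[0]
print(first_value_avg)  # 9.0
```

As the reply in the comments points out, this form still needs a derived table, because the window function has to be evaluated before the aggregate. Which variant is faster depends on the data and the indexes, so it is worth measuring on the real table.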
