Write an algorithm for finding the duplicated grades

Question

I have an interesting real world problem. My brother is currently in school. The teachers perform the following procedure when giving him his grades. They take his list of grades grades (which are non-negative real numbers that are multiples of 0.1 - ex. 95.3, 86, 103.4), duplicate some or none of them to form the list duplicated_grades, calculate the average of duplicated_grades rounded to the tenths place dup_avg = round_tenth(average(duplicated_grades)), and then give him the list grades and the average dup_avg. My mother wants to know which of the grades in grades could have been duplicated to form dup_avg.

To provide some examples:

input:
grades = ["100", "90", "80"]
dup_avg = "90"

desired output:
no duplicates
1 90
1 100, 1 80
1 100, 1 90, 1 80

no duplicates means that duplicating none of the grades in grades and then taking the average results in dup_avg

1 90 means that duplicating one 90 in grades and then taking the average results in dup_avg

1 100, 1 80 means that duplicating one 100 and one 80 in grades and then taking the average results in dup_avg

input:
grades = ["40", "70", "90", "70", "90"]
dup_avg = "67.1"

desired output:
1 70, 1 40

I developed a recursive solution that works well for this problem, shown below:

from treelib import Node, Tree
from decimal import localcontext, Decimal, ROUND_HALF_UP

class SNA(object):
  def __init__(self, sum, num, avg):
    self.sum = sum
    self.num = num
    self.avg = avg


def find_dup_grades(grades, dup_avg):

  with localcontext() as ctx:
    ctx.rounding = ROUND_HALF_UP

    grades = [Decimal(g) for g in grades]
    dup_avg = Decimal(dup_avg)

    uniq_grades = set()
    for g in grades:
      uniq_grades.add(g)

    desc_u_grad = list(uniq_grades)
    desc_u_grad.sort(reverse=True)

    grade_count = {}
    for g in uniq_grades:
      grade_count[g] = grades.count(g)

    tree = Tree()
    s = sum(grades)
    n = len(grades)
    a = round(s / n, 1)
    tree.create_node("root", "root", data=SNA(s, n, a))
    fdg_helper(tree, grade_count, desc_u_grad, dup_avg, 0, "root")

    # print(tree.show(data_property="avg", stdout=False))


def fdg_helper(tree, grade_count, desc_u_grad, dup_avg, i, pname):

  pdata = tree.get_node(pname).data

  if i == len(desc_u_grad):
    if pdata.avg == dup_avg:
      print(solution(desc_u_grad, pname))
    return

  if pdata.avg < dup_avg and desc_u_grad[i] < dup_avg:
    return

  for j in range(grade_count[desc_u_grad[i]] + 1):
    name = "{0}{1}".format(pname, j)
    s = pdata.sum + (j * desc_u_grad[i])
    n = pdata.num + j
    a = round(s / n, 1)

    tree.create_node(name, name, parent=pname, data=SNA(s, n, a))
    fdg_helper(tree, grade_count, desc_u_grad, dup_avg, i+1, name)


def solution(desc_u_grad, pname):

  sol = ""
  for i, val in enumerate(pname[4:]):
    if val != "0":
      sol += "{0} {1}, ".format(val, desc_u_grad[i])

  if sol == "":
    return "no duplicates"

  return sol[:-2]

Even though I have a solution that works, with a love for programming, I was curious if there are any other approaches to this problem. Maybe a solution with a faster runtime or that uses parallel programming. All approaches are welcome.

Also, please let me know if there is a better place to post a question of this nature. I will gladly move my question.

In general you can't know; there might be multiple valid solutions. What do you want to happen then? Output one valid solution? — D.W.
– D.W. ♦, Commented Apr 12, 2024 at 17:58
If there are multiple valid solutions, then all valid solutions should be printed out. For example, for the first provided test case, there are 4 valid solutions: no duplicates; 1 90; 1 100, 1 80; and 1 100, 1 90, 1 80. — Pranav Jain
– Pranav Jain, Commented Apr 12, 2024 at 23:44
I updated the question to be more clear that we are looking for all valid solutions. — Pranav Jain
– Pranav Jain, Commented Apr 12, 2024 at 23:55
are grades a multiple of $0.1$? (If so, my dynamic programming / subset sum method can be adjusted accordingly.) Thank you for updating the question! — D.W.
– D.W. ♦, Commented Apr 13, 2024 at 22:02
Yes. The grades are a multiple of 0.1. I'll add that to the question. — Pranav Jain
– Pranav Jain, Commented Apr 14, 2024 at 11:44

D.W. · Accepted Answer · 2024-04-14 21:00:52Z

I will describe two different approaches to solve this problem.

Caveats

There might be exponentially many valid solutions, so any algorithm might have to take exponentially long. So there is no general-purpose solution that always has a reasonable running time.

Integer linear programming

This can be framed as an instance of integer linear programming. Let $g_1,\dots,g_n$ denote the grades, and $\mu$ the average of duplicated grades. These are known (inputs to the algorithm). Let $d_1,\dots,d_n$ be variables, where $d_i$ indicates the number of times that grade $g_i$ has been duplicated. Each $d_i$ is a non-negative integer, and the goal is to find the $d_i$'s. Then the requirement is

$$(\mu - 0.05)n \le (1+d_1) g_1 + (1+d_2) g_2 + \dots + (1+d_n)g_n < (\mu + 0.05)n.$$

Since each grade can only be duplicated at most once, we additionally have the constraint that $0 \le d_i \le 1$.

The $0.05$ arises to account for rounding to the nearest $1/10$th.

Your goal is to find all tuples $(d_1,\dots,d_n)$ that satisfy this requirement. One simple approach is to use an integer linear programming (ILP) solver and ask it to return all feasible solutions to this ILP instance.

Dynamic programming

Alternatively, you can view this as a subset sum problem. Let $\mu$ denote the average of duplicated grades. Let $T_\ell = (\mu - 0.05)n - (g_1 + \dots + g_n)$ and $T_u = (\mu + 0.05)n - (g_1 + \dots + g_n)$. The goal is to find all ways to choose a subset of the grades $g_1,\dots,g_n$ such that the subset sums to some value in the range $[T_\ell,T_u)$.

There is a standard dynamic programming algorithm to solve the subset sum problem. This algorithm can be easily adapted to output all solutions such that the subset sums to something in the range $[T_\ell,T_u)$. The running time is "pseudo-polynomial", i.e., polynomial in $g_1+\dots+g_n$. Therefore, it might be practical for real-world problems where each grade $g_i$ is an integer between 0 and 100.

Apparently, in your setting grades are a multiple of 0.1, not integers. But if you multiply everything by 10, then you everything is an integer, and you can apply the above methods.

Of course, if there are exponentially many valid solutions, it will still take exponential time to solve the problem.

Thank you for the reply. So one point to clarify is that each grade my brother receives can only be duplicated (not tripled, quadrupled, etc.). So if he received the grades 100, 100, 90 , the teacher could decide duplicate both of the 100s. So in regards to the way you framed the problem, d_i is either 0 or 1. — Pranav Jain
– Pranav Jain, Commented Apr 13, 2024 at 4:52
So the reason we can be sure that no general solution exists that has a nonexponential runtime is because we could consider a case where there are exponentially many solutions? — Pranav Jain
– Pranav Jain, Commented Apr 13, 2024 at 4:59
@PranavJain, hopefully my revised answer will address all of your comments. — D.W.
– D.W. ♦, Commented Apr 13, 2024 at 5:59
Thank you for your insight! I will try implementing some of the ideas you mentioned in python. One thing I forgot to mention is that the grades are not necessarily integers and between 0 and 100. You could have a grade of 97.3 or 103 (due to bonus points). Would that significantly change any of the solutions you provided here, or would they still apply? Sorry about omitting this important detail. — Pranav Jain
– Pranav Jain, Commented Apr 13, 2024 at 14:07

Stack Exchange Network

Write an algorithm for finding the duplicated grades

1 Answer 1

Caveats

Integer linear programming

Dynamic programming

Your Answer

Hot Network Questions

Write an algorithm for finding the duplicated grades

1 Answer 1

Caveats

Integer linear programming

Dynamic programming

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions