$\begingroup$

I am trying to evaluate several methods to compress some 2D data points. The algorithm itself is not relevant, but from the output, I can compute the MSE and the number of points (which can be used to calculate the compression ratio). Is there any metric that combines compression quality (MSE) with the resulting number of points (compression ratio)?

Given some feedback in the comments, I will explain the problem in more detail and propose one possible metric (hoping to get some comments on it).

As stated, we have a large set of (x, y) points (millions of points) that needs to be simplified before it can be analysed. The points are not produced by any smooth function, and they can be noisy. The analysis includes finding relevant peaks and valleys. I am evaluating several different methods that reduce/compress this set of points into a smaller one while minimizing a cost function (based on MSE). The reduction algorithm itself is based on RDP (Ramer-Douglas-Peucker).

There are several different configurations that need to be evaluated, so I require a single metric that tells me how much quality each retained point provides. The error is minimal if I use all the points in the set, but that does not reduce the number of points. The idea is to get the minimum number of points while keeping the ones that reduce the error the most.

Based on the comments, one idea would be to compute a rough estimate of the improvement contributed by each point. Let us define the improvement as the difference between a reference MSE and the final MSE: I = MSE_r - MSE_f. One rough estimate for the reference MSE (MSE_r) would be the MSE obtained using only the first and the last point in the set, connected by a straight line.

The final metric (AIP) could then be the improvement divided by the number of points (N): AIP = I/N = (MSE_r - MSE_f)/N

AIP stands for Average Improvement per Point.
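To make the definition concrete, here is a minimal sketch of how AIP could be computed. It rests on assumptions that are not stated above: x is strictly increasing, the simplified curve is the piecewise-linear interpolation through the kept points, the error is measured vertically (in y), and N is the number of kept points. The function names and the `keep_idx` parameter are hypothetical.

```python
import numpy as np


def mse_of_simplification(x, y, keep_idx):
    """MSE between the original y values and the piecewise-linear curve
    through the kept points (assumes x is strictly increasing)."""
    keep_idx = np.asarray(keep_idx)
    # Rebuild the simplified curve at every original x position.
    y_hat = np.interp(x, x[keep_idx], y[keep_idx])
    return float(np.mean((y - y_hat) ** 2))


def aip(x, y, keep_idx):
    """Average Improvement per Point: (MSE_r - MSE_f) / N."""
    # Reference: a straight line from the first to the last point.
    mse_r = mse_of_simplification(x, y, [0, len(x) - 1])
    # Final: the curve produced by the configuration under evaluation.
    mse_f = mse_of_simplification(x, y, keep_idx)
    return (mse_r - mse_f) / len(keep_idx)
```

With this definition, keeping only the two endpoints gives an AIP of exactly zero, and the value is only comparable between configurations run on the same data set, since MSE_r depends on the data.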

Are there any better ideas?

$\endgroup$
  • $\begingroup$ Yes, sure: any function of one positive real number and one positive integer that you can think of would fulfill that criterion. You need to define what that function should do! $\endgroup$ Commented May 23, 2022 at 19:34
  • $\begingroup$ That's an interesting problem you're working on, by the way :) My recommendation is to first make a scatter plot with compression and quality as the axes. You can then identify the points that are "best" for their region (you might find that some points are simply worse in both compression and quality than others; you don't care about these). The Pareto front is probably the concept you'll want to understand! $\endgroup$ Commented May 23, 2022 at 19:35
  • $\begingroup$ That is a good idea, I was trying to find any standard metric to evaluate this kind of problem. Any metric that could be adopted from image, video or audio compression... $\endgroup$ Commented May 23, 2022 at 20:08
  • $\begingroup$ Yeah, as said, any function that is monotonic in these two parameters works as a metric. So that's not a "useful" requirement. You need to define what the function should actually do. $\endgroup$ Commented May 23, 2022 at 20:30
  • $\begingroup$ I would like to measure the Average Improvement per Point (AIP). My idea would be something like this: the improvement would be defined as the difference between a reference MSE and the final MSE. The reference MSE would be computed using a straight line from the first to the last point. The AIP would then be the improvement divided by the number of points used. $\endgroup$ Commented May 24, 2022 at 9:26
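
The Pareto front suggested in the comments can be sketched as follows. This is only an illustration, assuming each configuration is summarised by a hypothetical `(name, n_points, mse)` tuple and that lower is better for both the number of points and the MSE; the function name is not from any library.

```python
def pareto_front(results):
    """Return the configurations that no other configuration dominates.

    results: list of (name, n_points, mse) tuples (hypothetical layout).
    A configuration is dominated when another one is at least as good in
    both n_points and mse, and strictly better in at least one of them.
    """
    front = []
    for name, n, mse in results:
        dominated = any(
            (n2 <= n and m2 <= mse) and (n2 < n or m2 < mse)
            for _, n2, m2 in results
        )
        if not dominated:
            front.append((name, n, mse))
    # Sort by number of points so the front reads left to right on a plot.
    return sorted(front, key=lambda r: r[1])
```

Only the configurations on the front are worth comparing further; a single scalar such as AIP then amounts to one particular way of picking a point on that front.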
