I have a List<Metric> collection where each Metric contains several properties such as: metricName, namespace, fleet, type, component, firstSeenTime, lastSeenTime, etc. There are duplicates in this list such that all properties are the same except for firstSeenTime and lastSeenTime. I am looking for an elegant way to filter this list and return only the metric with the most recent lastSeenTime when there are such duplicates.

Something better than this:

private List<Metric> processResults(List<Metric> metrics) {
    List<Metric> results = new ArrayList<>();

    for (Metric incomingMetric: metrics) {

        // We need to implement "contains" below so that only properties
        // other than the two dates are checked.
        if (results.contains(incomingMetric)) {
            int index = results.indexOf(incomingMetric);
            Metric existing = results.get(index); 
            if (incomingMetric.getLastSeen().after(existing.getLastSeen())) {
                results.set(index, incomingMetric);
            } else {
                // do nothing, metric in results is already the latest 
            }
        } else {
            // add incomingMetric to results for the first time
            results.add(incomingMetric);
        }
    }

    return results;
}

The results.contains check is done by iterating over all the Metrics in results and checking if each object matches the properties except for the two dates.

What would be a better approach than this, for both elegance and performance?

3 Answers

In Java, the most elegant way to compare things is the Comparator interface. You can remove the duplicates with something like:

public List<Metric> removeDuplicates(List<Metric> metrics) {

    List<Metric> copy = new ArrayList<>(metrics);
    //first sort the metrics list from most recent to older
    Collections.sort(copy, new SortComparator());

    Set<Metric> set = new TreeSet<Metric>(new Comparator<Metric>() {

        @Override
        public int compare(Metric o1, Metric o2) {
            // Placeholder: compare every property except the two dates here,
            // and return 0 only when the two metrics are duplicates.
            int result = 0;
            return result;
        }
    });

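    // TreeSet.add ignores an element that compares equal to one already present,
    // so the first (most recent) metric of each group is the one kept.
    for (Metric metric : copy) {
        set.add(metric);
    }

    // Arrays.asList(set.toArray()) would yield a List<Object>, so copy the set instead.
    return new ArrayList<>(set);
}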

class SortComparator implements Comparator<Metric> {

    @Override
    public int compare(Metric o1, Metric o2) {
        int result = 0;
        if(o2.getLastSeenTime() != null && o1.getLastSeenTime() != null) {
            result = o2.getLastSeenTime().compareTo(o1.getLastSeenTime());
        }
        return result;
    }

}

The strength of this approach is that you can write a family of comparators and provide a factory that chooses, at runtime, the best way to compare your metrics and decide which instances count as duplicates:

public List<Metric> removeDuplicates(List<Metric> metrics, Comparator<Metric> comparator) {

    List<Metric> copy = new ArrayList<>(metrics);
    // sort from most recent to oldest so the newest duplicate is added first
    Collections.sort(copy, new SortComparator());

    Set<Metric> set = new TreeSet<Metric>(comparator);
    for (Metric metric : copy) {
        set.add(metric);
    }
    return new ArrayList<>(set);
}
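
To make the placeholder comparator concrete, here is a minimal sketch of the sort-then-TreeSet idea. The trimmed-down Metric class (three identity fields plus lastSeenTime, exposed as public fields) and the names TreeSetDedup, IDENTITY and NEWEST_FIRST are assumptions made for illustration; a real Metric would list all of its non-date properties in the identity comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, trimmed-down Metric used only for this sketch.
class Metric {
    final String metricName;
    final String namespace;
    final String fleet;
    final Date lastSeenTime;

    Metric(String metricName, String namespace, String fleet, Date lastSeenTime) {
        this.metricName = metricName;
        this.namespace = namespace;
        this.fleet = fleet;
        this.lastSeenTime = lastSeenTime;
    }
}

class TreeSetDedup {
    // Two metrics are duplicates when every non-date property matches.
    static final Comparator<Metric> IDENTITY =
            Comparator.comparing((Metric m) -> m.metricName)
                    .thenComparing(m -> m.namespace)
                    .thenComparing(m -> m.fleet);

    // Orders metrics from most recent lastSeenTime to oldest.
    static final Comparator<Metric> NEWEST_FIRST =
            Comparator.comparing((Metric m) -> m.lastSeenTime).reversed();

    static List<Metric> removeDuplicates(List<Metric> metrics) {
        List<Metric> copy = new ArrayList<>(metrics);
        copy.sort(NEWEST_FIRST);

        // add() leaves the set unchanged when an equal element is already present,
        // so the newest metric of each group is the one that stays.
        TreeSet<Metric> set = new TreeSet<>(IDENTITY);
        for (Metric metric : copy) {
            set.add(metric);
        }
        return new ArrayList<>(set);
    }
}
```

Because the copy is sorted newest first and the set keeps the element it already holds, each group ends up represented by its most recent metric. The result comes back ordered by the identity comparator, and the whole thing costs one sort plus one tree insertion per element.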

2 Comments

You are maintaining a TreeSet to avoid the duplicates, right? So how do you know the set is holding the latest Metric? Basically, how does your logic tell the latest Metric apart from its duplicates?
Thank you. I will fix my answer.

I’m not sure how you are generating the List<Metric>, but if you can maintain a Map<String, Metric> instead of that list, you can try the approach below.

The key of this map is a combination of all the values you need to compare (everything except the date attributes).

Key: “{metricName}${type}$.....”

For this, you can add a getter to the Metric object that builds and returns the key.

Then check whether the key already exists before you put a metric into the map. If it exists, get the Metric stored under that key and compare the dates to find the latest one. If the new object is the latest, replace the stored object with it.

PS: Measure the execution time of both approaches to find out which one works best for you.
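
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the Metric class is a hypothetical stand-in with three identity fields, identityKey() is an invented name for the key getter, and the key is a List<String> rather than a delimited string, which avoids having to pick a delimiter that never occurs in the values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down Metric used only for this sketch.
class Metric {
    final String metricName;
    final String type;
    final String fleet;
    final Date lastSeenTime;

    Metric(String metricName, String type, String fleet, Date lastSeenTime) {
        this.metricName = metricName;
        this.type = type;
        this.fleet = fleet;
        this.lastSeenTime = lastSeenTime;
    }

    // Key built from every property except the dates.
    // List implements equals/hashCode element by element, so it works as a map key.
    List<String> identityKey() {
        return Arrays.asList(metricName, type, fleet);
    }
}

class MapDedup {
    static List<Metric> removeDuplicates(List<Metric> metrics) {
        // LinkedHashMap keeps the order in which each key was first seen.
        Map<List<String>, Metric> latest = new LinkedHashMap<>();
        for (Metric incoming : metrics) {
            List<String> key = incoming.identityKey();
            Metric existing = latest.get(key);
            // Store the metric if the key is new or the incoming one is more recent.
            if (existing == null || incoming.lastSeenTime.after(existing.lastSeenTime)) {
                latest.put(key, incoming);
            }
        }
        return new ArrayList<>(latest.values());
    }
}
```

Each metric costs one hash lookup and at most one put, so the pass is linear in the size of the list, with no sorting or copying of the input.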

1 Comment

Thanks. This looks good, I constructed the key using a static nested class (to not have to deal with field delimiters when constructing the key as string), but yes just a string would suffice.

Thanks for the answers. I went with the map approach since it does not incur additional sorts and copies.

@VisibleForTesting
Set<Metric> removeDuplicates(List<Metric> metrics) {

    Map<RawMetric, Metric> metricsMap = new HashMap<>();
    for (Metric metric : metrics) {
        // key made of every property except the two dates
        RawMetric rawMetric = RawMetric.builder()
                .metricName(metric.getName())
                .metricType(metric.getMetricType())
                ... // and more
                .build();

        // pick the latest updated metric (based on lastSeen date)
        BiFunction<RawMetric, Metric, Metric> biFunction =
            (k, v) -> Metric.builder()
                    .name(k.getMetricName())
                    .metricType(k.getMetricType())
                    ... // and more
                    .lastSeen(v.getLastSeen().after(metric.getLastSeen())
                            ? v.getLastSeen()
                            : metric.getLastSeen())
                    .firstSeen(v.getFirstSeen())
                    .build();

        metricsMap.putIfAbsent(rawMetric, metric);
        metricsMap.computeIfPresent(rawMetric, biFunction);
    }

    return ImmutableSet.copyOf(metricsMap.values());
}

@Value
@Builder
static class RawMetric {
    private String metricName;
    private String metricType;
    private String ad;
    private String project;
    private String fleet;
    private String host;
    private int granularity;
}
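
As a possible simplification, the putIfAbsent/computeIfPresent pair can be collapsed into a single Map.merge call. The sketch below is not the code above: it uses a hypothetical two-field Metric, a List<String> key in place of the key class, and it keeps the whole metric with the later lastSeen instead of rebuilding one through a builder (so firstSeen would come from whichever metric wins).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical, trimmed-down Metric used only for this sketch.
class Metric {
    final String name;
    final String metricType;
    final Date lastSeen;

    Metric(String name, String metricType, Date lastSeen) {
        this.name = name;
        this.metricType = metricType;
        this.lastSeen = lastSeen;
    }
}

class MergeDedup {
    // Of two metrics, returns the one with the later lastSeen.
    static final BinaryOperator<Metric> LATEST =
            BinaryOperator.maxBy(Comparator.comparing((Metric m) -> m.lastSeen));

    static List<Metric> removeDuplicates(List<Metric> metrics) {
        Map<List<String>, Metric> byKey = new HashMap<>();
        for (Metric metric : metrics) {
            List<String> key = Arrays.asList(metric.name, metric.metricType);
            // merge() stores the metric when the key is absent,
            // otherwise it stores whatever LATEST returns for (existing, metric).
            byKey.merge(key, metric, LATEST);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Since a HashMap backs the result, the order of the returned metrics is unspecified.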
