2

I have a series of arrays that i draw data from and ultimately lead to a final array that holds the information i want. The final array is 2-dimensional, being comprised of a large number of single dimensional arrays, each holding up to 3 entries.

int[][] realOcc = new int[result.length][3];

The way the array holds data is as follows: The first "cell" holds a name, the second a region ID and the third an occurence number - an int telling me how many times this name came up in this particular region ID.

After sorting the array according to name using a Bubble Sort algorithm i naturally see many entries that i wouldn't want to be there. As an example, imagine a name coming up 3 times in a specific region ID. The way the array entries for this name would look like would be as follows:

Name1 regionID17 1
Name1 regionID17 2
Name1 regionID17 3
...
Name156 regionID1 1
Name168 regionID99 1
...

What i want to do is get rid of all the excess entries as in, the entries that correspond to the same name and regID, and only keep the maximum occurence number for each name in a specific region. So, taking the above example, what i would like to see after manipulating the array would be:

Name1 regionID17 3
...
Name156 regionID1 1
Name168 regionID99 1
...

Any ideas would be greatly appreciated since i'm pretty much stumped.. Keep in mind that since the data i'm pulling is pretty big in number, i also need to keep my code efficient.

4 Answers 4

2

I agree with Mario, you shouldn't be using an array structure here. The fact that you're using Bubble Sort suggests that you're in an intro programming course of some kind, so you may not know about ArrayLists, HashSets, the .equals() method, or the like, but that's what you really want to do. Create a custom object with a custom .equals() method - something like:

public class Record{
  String name;
  String region;

  public boolean equals(Object o){
    Record r = (Record)o;
    return name.equals(r.name) && region.equals(r.region);
  }

  public int hashCode(){
    return name.hashCode()+region.hashCode();
  }
}

Then you can use a HashMap<Record, Integer> to check if a record already exists in the set - if it does, increment count (the map's value) by 1, else add it.

If you want everything sorted in a certain order, you can either define a custom .compareTo() method and use a TreeMap or, if you want everything in insertion order, use a LinkedHashSet<Record> to preserve that order.

Sign up to request clarification or add additional context in comments.

2 Comments

Don't forget to implement hashCode or else this method will not work since the HashSet won't correctly find the already-inserted object. Also, for something like this I would use a HashMap<Record, Integer>; the Record has the name and region only (and is otherwise as in your answer) while the Integer is the count.
Thanks a lot for the answer, it pushed me to the right direction.
2

What you really should look into is using an ArrayList class to hold these 'items.'

You should also create a specific class to hold this data.

Your custom class should look something like this:

class Entry implements Comparable<Entry> {
    private String name, region;
    private int occuranceCount;

    public Entry(String nameP, regionP, occurCountP){
        name = nameP;
        region = regionP;
        occuranceCount = occurCountP;
    }

    // Getters

    public int compareTo(Entry other){
        return name.compareTo(other.name);
    }

    // Equals and hashcode
}

Then you can put these objects into an ArrayList<Entry> and use Collections.sort() which will be much faster than a bubble sort.

After they are sorted, you can loop through and remove duplicate entries using ArrayList.remove().

5 Comments

Why use an ArrayList and have a second iteration when you can use a hash data structure of some kind?
Well, perhaps he/she wants to have some sort of order to the data.
I know that a set could solve many of the problems, but it would also have no sense of order to the elements.
A TreeMap then would allow you to define a custom ordering. By the sound of the problem though, the sorting is only being done to identify duplicates and remove them.
You are probably right. I guess I was making the assumption that he would like to keep his code as similar to what it is now as possible.
2

A question comes to mind: Why are you using an array? I would think that it is better to use a Set object to hold your results, and then create a Result object that has three fields: one for name, one for region and one for the count. If the equals and hash methods are overriden to only take into account the region and the name, then you will have no duplicates on your set and you can use it to keep track of these result objects.

Another way to achieve the same is to have a Map, where the key is the name + region and the value is the count. That would also make it simple to implement and ensure that you have no duplicates.

Comments

0

Sounds like a Hashtable or Map might be useful. You could make a single pass through the raw data, and use the Map to lookup the name, add it if not seen yet, or check to see if it has been exceeded in value by the new entry if it does. This wouldn't require sorting ahead of time; you could sort afterwards, and save a lot of time :-)

1 Comment

Though not technically deprecated, you don't want to use Hashtable if you can help it, HashMap is correct usage.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.