How to find better algorithm for my prefix matcher algorithm

Question

I was solving online problem and the task was something like this:

There are two arrays: numbers and prefixes.

Array numbers contains numbers: “+432112345”, “+9990”, “+4450505”

Array prefixes contains prefixes: “+4321”, “+43211”, “+7700”, “+4452”, “+4”

Find longest prefix for each number. If no prefix found for number, match with empty string.

For example:

“+432112345” matches with the longest prefix “+43211” (not +4321, cause 43211 is longer).

“+9990” doesn't match with anything, so empty string "".

“+4450505” matches with “+4” (“+4452” doesn’t match because of the 2).

I came up with the most straight forward solution where I loop through each number with each prefix. So each time new number, I check prefixes, if some prefix is longer than last one, I will change.

Map<String, String> numsAndPrefixes = new HashMap<>();

for (String number : A) {
    for (String prefix : B) {
        if (number.contains(prefix)) {
            // if map already contains this number, check for prefixes. 
            // if longer exists, switch longer one
            if (numsAndPrefixes.containsKey(number)) {
                int prefixLength = prefix.length();
                int currentLen = numsAndPrefixes.get(number).length();
                if (prefixLength > currentLen) {
                    numsAndPrefixes.put(number, prefix);
                }
            } else {
                numsAndPrefixes.put(number, prefix);
            }
        } else if (!number.contains(prefix) && !numsAndPrefixes.containsKey(number)){
            numsAndPrefixes.put(number, "");
        }
    }
}

So it will have two for loops. I see that each time I am doing the same job over and over, e.g checking for prefixes. It works, but it is slow. The problem is that I can’t come up with anything better.

Could someone explain how they would approach to find better algorithm?

And more general, how do you proceed if you have somewhat working solution and trying to find better one? What knowledge am I still missing?

you can use Trie or Suffix tree data structures for more optimal solution, still it depends on time limit and stuff like that, if your solution works fast enough for your purpose those data structures are overkill. — Photon
– Photon, Commented Feb 16, 2020 at 14:05
You do realize that number.contains(prefix) is O(max_prefix.length) and you do that check for numbers.length*prefix.length times, your actual complexity ends up with O(n*p*max_prefix.length) where n is number of numbers and p is number of prefixes. — SomeDude
– SomeDude, Commented Feb 16, 2020 at 14:33
No, I did not realize that. Thanks for poining that out. Will check contains() function. — AverageJoe9000
– AverageJoe9000, Commented Feb 16, 2020 at 14:37
Jetty solved the same problem for URL path mappings -> webtide.com/jetty-9-goes-fast-with-mechanical-sympathy — PranasB
– PranasB, Commented Oct 19, 2023 at 5:10

Andreas · Accepted Answer · 2020-02-16 14:06:57Z

3

I would implement this using a TreeSet and the floor(E e) method.

String[] numbers = { "+432112345", "+9990", "+4450505" };
String[] prefixes = { "+4321", "+43211", "+7700", "+4452", "+4" };

TreeSet<String> prefixSet = new TreeSet<>(Arrays.asList(prefixes));
for (String number : numbers) {
    String prefix = prefixSet.floor(number);
    while (prefix != null && ! number.startsWith(prefix))
        prefix = prefixSet.floor(prefix.substring(0, prefix.length() - 1));
    if (prefix == null)
        prefix = "";
    System.out.println(number + " -> " + prefix);
}

Output

+432112345 -> +43211
+9990 -> 
+4450505 -> +4

answered Feb 16, 2020 at 14:06

Andreas

160k13 gold badges164 silver badges262 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

AverageJoe9000 Over a year ago

This is nice, but I can't see how come is your solution faster than mine. You are still looping each number with prefixes which I see is still O(n**2)? No? And can you please explain how expensive is to put array to TreeSet? Is it ok to do when some other data structure is somehow better?

Andreas Over a year ago

@AverageJoe9000 Your code is a full Cartesian join of numbers and prefixes, so O(n*m). This solution is O(n*log m). On average, it probably only has to look up in the prefixSet about 3 times per phone number, not scan the full prefix array. Sure, there is a one-time O(m*log m) startup cost to build the Set, so if you only have a few numbers or prefixes, it may not be worth it, but if the prefix list is huge, which it will be if you're doing all the international prefixes, and you have a lot of phone numbers to check, then the performance gain is immense.

AverageJoe9000 Over a year ago

Thanks for the explanation. Can you briefly describe how do you approach this kind of solution? I'm trying to figure out where do I lack some knowledge.

Andreas Over a year ago

@AverageJoe9000 Difficult to say. Learn about the capabilities of the various Java API classes. You don't have to remember the details, just high-level capabilities, e.g. in this case the capability to find neighboring values, which is a feature of both TreeMap and TreeSet, as well as sorted arrays where you can use Arrays.binarySearch() to find the location in sorted array. So at a very high level, finding neighboring values can be done for sorted collections.

sikr_ Over a year ago

Even though this is the accepted answer, the speed/efficiency of this solution is actually quite bad (esp. for cases where the prefix for a number is not in the map, speed is even worse than what a basic for-loop through the numbers would provide). The reason is: the keys of the map are compared lexicographically, meaning the call prefixSet.floor(shortenedPrefix) can result in a longer prefix to be returned. E.g. floor(496) -> 4930545648. This is why the algorithm can take many iterations to reach the smallest key X in the map where floor(X) -> null and the loop ends.

ardenit · Accepted Answer · 2020-02-16 15:01:13Z

2

The data structure you need is trie.

Add all prefixes in trie
For each string S in numbers:
- Start from the root of trie
- For each character in S:
  - If there is a link from current node, associated with current character, go by this link to the next node
  - If there is no link, then you reached the longest prefix - prefix stored in the current node is the answer for S

This algorithm works in O(length(prefixes) + length(numbers))

answered Feb 16, 2020 at 15:01

ardenit

3,89011 silver badges17 bronze badges

1 Comment

SomeDude Over a year ago

I would do it this way.

zomega · Accepted Answer · 2020-02-16 16:44:06Z

You are using .contains(). You should use .startsWith(). It's a lot faster.

Then in your else if you are checking what you already checked in the if.

This is only one approach on how to improve the algorithm:

Sort the prefixes:

+43211 +4321 +4452 +4 +7700

What is good about this? Well, it will always find the longest prefix first. You can exit the loop and don't have to look for longer prefixes.

Arrays.sort(prefixes, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
    return o1.startsWith(o2) ? 1 : o1.compareTo(o2);
}
});

Map<String, String> numsAndPrefixes = new HashMap<>();

for (String number: numbers) {
    numsAndPrefixes.put(number, "");
    for (String prefix: prefixes) {
        if (number.startsWith(prefix, 1)) {
            numsAndPrefixes.put(number, prefix);
            break;
        }
    }
}

But if your number starts with +1 and there is no prefix it will continue checking all the prefixes with +2 +3 +4 ... which are obviously not matching. (Issue 1)

Also if your number starts with +9 the prefix will be found very late. (Issue 2)

How to fix this? Well you can save the indices where +1 starts, +2 starts, ...:

In our prefix list:

0      1     2     3  4  5   (index)
+1233  +123  +2233 +2 +3 +4

+2 starts at index [2] and +3 starts at index [4]. So when you want to know the prefix for a number starting with +2 you only have to check elements [2] and [3]. This will both fix issue 1 and 2.

It would also be possible to store the indices for more digits (for example where +13 starts).

Collectives™ on Stack Overflow

How to find better algorithm for my prefix matcher algorithm

3 Answers 3

5 Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

5 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related