
I have a list of rules in the form

L1 -> (A, B, C)
L2 -> (D, E)
L3 -> (F, G, A)
L4 -> (C, A)
.....

This list contains ~30k such rules.

I have an input in the form (X, Y, Z)

For this I have a method

List<Rule> matchRules(input)

which belongs to a class RuleMatcher.

I started with a very simple, naive solution in order to get the framework down and get something working.

public RuleMatcher(Collection<Rule> rules) {
    this.rules = rules;
}

public Collection<Rule> matchRules(List<Token> input) {
    // Linear scan: test the input against every one of the ~30k rules.
    List<Rule> matchingRules = new ArrayList<>();
    for (Rule r : this.rules) {
        if (r.matches(input)) {
            matchingRules.add(r);
        }
    }
    return matchingRules;
}

Here matches is a very simple function that checks whether the lengths are the same and then compares the tokens one by one in a for loop.

This matchRules function is called on the order of billions of times.


Obviously this is a very poor implementation. According to my profiler, at least half of the execution time is spent in this matches function.

I was thinking of a few possible solutions:

A. Some sort of trie data structure holding the chains of rules that could be matched (a sketch follows this list).

B. Some sort of hash function. Each symbol is given a unique identifier. Unfortunately, there are about 8 thousand unique symbols, so this might be difficult. (A hash-based sketch appears after the matches code below.)

C. Make a hashmap keyed on the size of the right-hand side, i.e. the number of tokens in the rule. Unfortunately, the majority of the rules are about the same size, so this may not even be worthwhile.

D. Some awesome solution one of you comes up with.
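For idea A, here is a minimal sketch of what the trie could look like, assuming Token exposes its unique number through a hypothetical getId() accessor and Rule exposes its right-hand side through a hypothetical getRightSide() accessor:

import java.util.*;

class RuleTrie {
    // One trie node per prefix of a right-hand side; rules whose RHS
    // ends at this node are stored in the node itself.
    private static class Node {
        final Map<Integer, Node> children = new HashMap<>();
        final List<Rule> rules = new ArrayList<>();
    }

    private final Node root = new Node();

    RuleTrie(Collection<Rule> rules) {
        for (Rule r : rules) {
            Node node = root;
            for (Token t : r.getRightSide()) {
                node = node.children.computeIfAbsent(t.getId(), k -> new Node());
            }
            node.rules.add(r);
        }
    }

    // A single walk down the trie replaces the scan over all ~30k rules:
    // the cost is proportional to the input length, not the rule count.
    List<Rule> matchRules(List<Token> input) {
        Node node = root;
        for (Token t : input) {
            node = node.children.get(t.getId());
            if (node == null) return Collections.emptyList();
        }
        return node.rules;
    }
}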

I hope somebody can shed some light on this problem.


Edit: A token is just an object with a unique number. For example "NN" is a token. Each instance of "NN" is exactly the same.

Matches code:

public boolean rhsMatches(List<Token> tokens) {
    if (tokens.size() != rhsSize()) return false;
    for (int i = 0; i < rhsSize(); i++) {
        if (!rightSide.get(i).equals(tokens.get(i))) {
            return false;
        }
    }
    return true;
}

It's not very pretty, but it's simple.
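Since each Token carries a unique number, idea B can be taken further than hashing individual symbols: the entire right-hand side can serve as the key of a HashMap, as one of the comments below also suggests. Here is a sketch under the assumptions that Token overrides equals() and hashCode() based on its number (List.hashCode() then combines them for free) and that Rule exposes a hypothetical getRightSide() accessor; several rules may share an RHS, so the map value is a list:

import java.util.*;

public class HashRuleMatcher {
    private final Map<List<Token>, List<Rule>> index = new HashMap<>();

    public HashRuleMatcher(Collection<Rule> rules) {
        // Group the rules by their right-hand sides once, up front.
        for (Rule r : rules) {
            index.computeIfAbsent(r.getRightSide(), k -> new ArrayList<>()).add(r);
        }
    }

    public List<Rule> matchRules(List<Token> input) {
        // One hash lookup per call instead of a scan over ~30k rules.
        return index.getOrDefault(input, Collections.emptyList());
    }
}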

  • Could you give us the definition of the tokens? Without knowing what is being matched and how the matching is done, it will be difficult to propose an optimization. Commented Jan 16, 2014 at 16:58
  • So you have 30,000 rules (L1, L2, ...) containing sets of 8,000 unique tokens (A, B, ...), correct? Have you considered creating a "reverse lookup table" (can't remember the actual name) where you index which rules the tokens are in? This may take a lot of memory, but speed should increase greatly. Commented Jan 16, 2014 at 17:00
  • You can use some other hash (like a checksum) for keys, not only the length. And, yes, the matches code would be helpful. Commented Jan 16, 2014 at 17:01
  • I'd say your idea of a trie would be your best first hit. Essentially, you need to build a grammar. Commented Jan 16, 2014 at 17:06
  • I think that HashMap<List<Token>, Rule> should work, if you override the hashCode and equals methods on the Token class. A trie would be far more efficient when it comes to memory usage. Commented Jan 16, 2014 at 18:01

2 Answers


Why not sort your rule list to begin with? Then you can binary-search for the matching rule.
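A minimal sketch of this idea, assuming Rule exposes its right-hand side through a hypothetical getRightSide() accessor and Token exposes its unique number through a hypothetical getId(): sort the rules once by their right-hand sides, and each lookup then costs O(log n) list comparisons instead of a scan over all 30k rules. Because several rules may share an RHS, the search expands to the adjacent run of equal entries:

import java.util.*;

class SortedRuleMatcher {
    // Lexicographic order on right-hand sides, comparing token ids.
    private static final Comparator<List<Token>> RHS_ORDER = (a, b) -> {
        for (int i = 0, n = Math.min(a.size(), b.size()); i < n; i++) {
            int c = Integer.compare(a.get(i).getId(), b.get(i).getId());
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    private final List<Rule> sorted;

    SortedRuleMatcher(Collection<Rule> rules) {
        sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparing(Rule::getRightSide, RHS_ORDER));
    }

    List<Rule> matchRules(List<Token> input) {
        int lo = 0, hi = sorted.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = RHS_ORDER.compare(sorted.get(mid).getRightSide(), input);
            if (c < 0) lo = mid + 1;
            else if (c > 0) hi = mid - 1;
            else {
                // Several rules may share this RHS; collect the adjacent run.
                int first = mid;
                while (first > 0 && RHS_ORDER.compare(sorted.get(first - 1).getRightSide(), input) == 0) {
                    first--;
                }
                List<Rule> out = new ArrayList<>();
                for (int i = first; i < sorted.size()
                        && RHS_ORDER.compare(sorted.get(i).getRightSide(), input) == 0; i++) {
                    out.add(sorted.get(i));
                }
                return out;
            }
        }
        return Collections.emptyList();
    }
}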


1 Comment

This would require Rule to implement Comparable. That sounds like the easiest and fastest solution to me. Depending on how often items are added to the list, the structure could instead be a SortedMap or a TreeSet.

To me this looks like a perfect scenario for engaging some worker threads. The matching tasks seem independent of each other: divide the list of rules and delegate the matching to workers, if that is possible in your situation.
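A minimal sketch of this idea using Java 8 parallel streams, assuming matches is free of side effects so the checks can safely run concurrently (rules is the same field as in the question):

import java.util.List;
import java.util.stream.Collectors;

public List<Rule> matchRules(List<Token> input) {
    // Fan the ~30k independent match checks out across the common fork-join pool.
    return rules.parallelStream()
                .filter(r -> r.matches(input))
                .collect(Collectors.toList());
}

Note that this only divides the linear scan across cores; combining it with an indexed structure (trie, hash map, or sorted list) attacks the cost itself.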

