-2

Using an open source Java automaton library, eg: org.apache.lucene.util.automaton or dk.brics.automaton, how can I build an automaton for prefix matching?

eg: an automaton created from the set of strings ["lucene", "lucid"], that will match when given "luc", or "luce", but not match when given "lucy" or "lucid dream".

2
  • This is exactly how a trie works. A similar idea can be used to construct the automaton. Usage of an "end of input" character might be useful as well - such as $. Commented Feb 12, 2017 at 21:56
  • I'm familiar with tries, although the implementations I've found in Java (eg: PatriciaTrie) are actually Maps, and will return a value associated with a prefix. I just want to check for the presence of a prefix. Commented Feb 22, 2017 at 0:13

1 Answer 1

0

Prefix matching is possible using org.apache.lucene.util.automaton by setting all states to accept, eg:

    String[] strings = new String[]{"lucene", "lucid dream"};
    final List<BytesRef> terms = new ArrayList<>();
    for(String s : strings) {
        terms.add(new BytesRef(s));
    }
    Collections.sort(terms);
    final Automaton a = DaciukMihovAutomatonBuilder.build(terms);

    for (int i = 0; i < a.getNumStates(); i++) {
        a.setAccept(i, true);
    }
Sign up to request clarification or add additional context in comments.

1 Comment

Could you complete the example with the code that tests a string against the such created Automaton? I cannot find any examples with the DaciukMihovAutomatonBuilder

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.