If you have many phrases and many keywords, it can pay off to parallelize the matching instead of running a regex in a loop. Each phrase is checked independently, so the work spreads across all available cores, which is typically much faster than looping on a single processor.
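First you need a processing class whose instances are submitted to the worker threads. It is an inner class of the Processor class shown further down, which is where the keywords field comes from: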
final class StringMatchFinder implements Runnable {
    private final String text;
    private final Collection<Match> results;

    public StringMatchFinder(final String text, final Collection<Match> results) {
        this.text = text;
        this.results = results;
    }

    @Override
    public void run() {
        // Record every keyword that occurs in this text
        for (final String keyword : keywords) {
            if (text.contains(keyword)) {
                results.add(new Match(text, keyword));
            }
        }
    }
}
Now you need an ExecutorService with one thread per available processor:
final ExecutorService es = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
And then process the phrases:
public void processText(List<String> texts) {
    // Thread-safe collection, because all worker threads add to it
    final Collection<Match> results = new ConcurrentLinkedQueue<Match>();
    final Collection<Future<?>> futures = new LinkedList<Future<?>>();
    for (final String text : texts) {
        futures.add(es.submit(new StringMatchFinder(text, results)));
    }
    es.shutdown();
    try {
        es.awaitTermination(1, TimeUnit.DAYS);
        for (final Future<?> future : futures) {
            try {
                future.get(); // rethrows anything the task threw
            } catch (ExecutionException e) {
                e.getCause().printStackTrace();
            }
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag
    }
    for (final Match match : results) {
        System.out.println(match.getOriginalText() + " ; keyword found:" + match.getKeyword());
        //or write them to a file
    }
}
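The futures are kept so that processing errors can be detected: calling get() on a future rethrows any exception thrown by its task, wrapped in an ExecutionException. The results are collected in a thread-safe ConcurrentLinkedQueue of Match objects.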
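To make the error check concrete, here is a minimal sketch of how a loop over futures surfaces failures. The class FutureErrorCheck and its methods countFailures and runDemo are made up for illustration and are not part of the code above; the second task fails on purpose to simulate a processing error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FutureErrorCheck {

    // Waits for every task and returns how many of them failed with an exception.
    static int countFailures(final List<Future<?>> futures) {
        int failures = 0;
        for (final Future<?> future : futures) {
            try {
                future.get(); // blocks until the task is done
            } catch (ExecutionException e) {
                // e.getCause() is the exception thrown inside run()
                System.err.println("Task failed: " + e.getCause());
                failures++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop waiting
                break;
            }
        }
        return failures;
    }

    // Submits one task that succeeds and one that throws, then counts the failures.
    static int runDemo() {
        final ExecutorService es = Executors.newFixedThreadPool(2);
        final List<Future<?>> futures = new ArrayList<Future<?>>();
        futures.add(es.submit(new Runnable() {
            @Override
            public void run() {
                // nothing to do, this task succeeds
            }
        }));
        futures.add(es.submit(new Runnable() {
            @Override
            public void run() {
                throw new IllegalStateException("simulated failure");
            }
        }));
        es.shutdown();
        return countFailures(futures);
    }

    public static void main(String[] args) {
        System.out.println("failed tasks: " + runDemo()); // prints "failed tasks: 1"
    }
}
```

Without a get() call the exception stays inside the Future and is never reported, which is why it is worth keeping the futures around instead of discarding the return value of submit().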
Here is a complete example.
The class Match
public class Match {
    private String originalText;
    private String keyword;

    public Match(String originalText, String keyword) {
        this.originalText = originalText;
        this.keyword = keyword;
    }

    public void setOriginalText(String originalText) {
        this.originalText = originalText;
    }

    public String getOriginalText() {
        return originalText;
    }

    public void setKeyword(String keyword) {
        this.keyword = keyword;
    }

    public String getKeyword() {
        return keyword;
    }
}
The Processor class
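import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class Processor {
    // One worker thread per available processor
    private final ExecutorService es = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    private final Collection<String> keywords;

    public Processor(Collection<String> keywords) {
        this.keywords = keywords;
    }

    // Checks a single text against all keywords; one instance per submitted task
    final class StringMatchFinder implements Runnable {
        private final String text;
        private final Collection<Match> results;

        public StringMatchFinder(final String text, final Collection<Match> results) {
            this.text = text;
            this.results = results;
        }

        @Override
        public void run() {
            for (final String keyword : keywords) {
                if (text.contains(keyword)) {
                    results.add(new Match(text, keyword));
                }
            }
        }
    }

    public void processText(List<String> texts) {
        // Thread-safe collection, because all worker threads add to it
        final Collection<Match> results = new ConcurrentLinkedQueue<Match>();
        final Collection<Future<?>> futures = new LinkedList<Future<?>>();
        for (final String text : texts) {
            futures.add(es.submit(new StringMatchFinder(text, results)));
        }
        es.shutdown();
        try {
            es.awaitTermination(1, TimeUnit.DAYS);
            for (final Future<?> future : futures) {
                try {
                    future.get(); // rethrows anything the task threw
                } catch (ExecutionException e) {
                    e.getCause().printStackTrace();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        for (final Match match : results) {
            System.out.println(match.getOriginalText() + " ; keyword found:" + match.getKeyword());
        }
    }
}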
A main class for testing
import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<String> texts = new ArrayList<String>();
        List<String> keywords = new ArrayList<String>();
        texts.add("John was killed in London");
        texts.add("No match test!");
        texts.add("Joe was murdered in New York");
        texts.add("Michael was kidnapped in York");
        //add more
        keywords.add("murdered");
        keywords.add("killed");
        keywords.add("kidnapped");
        Processor pp = new Processor(keywords);
        pp.processText(texts);
    }
}
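One thing to keep in mind: processText calls es.shutdown(), so a Processor instance is good for a single call. A second call would try to submit to an executor that has already been shut down. The sketch below shows that behaviour in isolation, assuming the default rejection policy of the pool returned by Executors.newFixedThreadPool; the class ShutdownReuseDemo and its method are hypothetical names used only for this illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class ShutdownReuseDemo {

    // Returns true if submitting a task after shutdown() is rejected.
    static boolean rejectsAfterShutdown() {
        final ExecutorService es = Executors.newFixedThreadPool(1);
        es.shutdown(); // no new tasks are accepted from here on
        try {
            es.submit(new Runnable() {
                @Override
                public void run() {
                    // never executed
                }
            });
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + rejectsAfterShutdown());
    }
}
```

If you need to call processText several times, either create a new Processor per batch or move the shutdown into a separate method that is called once at the end.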