
What is the most concise and efficient way to turn an array of strings into a regex, then use that regex multiple times on different strings to get the matches and iterate over them? Currently I'm using the following:

var myArray = ['peaches', 'bananas', 'papaya', 'supercity'];
var myString = 'I want some papaya and some peaches';

var regexFromMyArray = new RegExp(myArray.toString().replace(/,/g, '|'), 'gi');

var matches = myString.match(regexFromMyArray) || [];

if (matches.length) {
  for (var i = 0, l = matches.length; i < l; i++) {
    console.log('Found: ' + matches[i]);
  }
}

Performance is important here, so plain JavaScript please.

2 Answers


Just join with the pipe character, using Array.prototype.join:

var regexFromMyArray = new RegExp(myArray.join("|"), 'gi');

and drop the if condition, which is redundant (a for loop over an empty array simply does nothing):

for(var i = 0; i < matches.length; i++)
   console.log("Found:", matches[i]);

  1. A single method call (join) replaces the original chain of toString, which internally calls join(","), followed by replace.
  2. The unnecessary if condition has been removed.

And since you ask about regexes, two more points:

  1. Building the regex once isn't going to cost you much.
  2. If your objective really is just to find the words from the array, String.prototype.indexOf is a non-regex way of solving the same problem.
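To make the indexOf suggestion concrete, here is a minimal sketch. The helper name findAllWithIndexOf is hypothetical, and it assumes case-insensitive matching is wanted (to mirror the 'i' flag), done by lowercasing both the text and each term.

```javascript
// Hypothetical helper: find every occurrence of each term using indexOf.
// Case-insensitivity is an assumption, done by lowercasing both sides.
// Assumes lowercasing does not change string length (true for ASCII),
// so positions found in `haystack` line up with the original `text`.
function findAllWithIndexOf(terms, text) {
  var haystack = text.toLowerCase();
  var found = [];
  for (var i = 0; i < terms.length; i++) {
    var needle = terms[i].toLowerCase();
    if (needle.length === 0) continue; // an empty needle would loop forever
    var pos = haystack.indexOf(needle);
    while (pos !== -1) {
      // Record the original-case text at the match position
      found.push({ term: terms[i], index: pos, text: text.slice(pos, pos + needle.length) });
      pos = haystack.indexOf(needle, pos + needle.length);
    }
  }
  // Sort by position so the order matches a left-to-right regex scan
  found.sort(function (a, b) { return a.index - b.index; });
  return found;
}

var myArray = ['peaches', 'bananas', 'papaya', 'supercity'];
var myString = 'I want some papaya and some peaches';
var results = findAllWithIndexOf(myArray, myString);
for (var i = 0; i < results.length; i++) {
  console.log('Found: ' + results[i].text + ' at ' + results[i].index);
}
```

Unlike the regex alternation, this reports overlapping hits (for example both "app" and "apps" at the same position), and it scans the string once per term, so it is not automatically faster when the array is large.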

Comments

Well, I guess you really should mention the need to escape the input strings in case they contain special regexp characters. Also, if the OP is so worried about performance, he should be less worried about the one-time cost of building the regexp, and more worried about actual matching performance. From that perspective, could you opine on the performance of regexps compared to the approach of just checking each string individually?
@torazaburo I will, for sure. By "From that perspective, could you opine on the performance of regexps compared to the approach of just checking each string individually?", do you mean indexOf?
As usual, the right answer depends on the size of the arrays. I made a jsperf comparing both of your sources with my own indexOf implementation, and indexOf seems to be faster.
@PatrickFerreira Your jsperf is not really right, because you're reconstructing the regexp every single time, whereas in real life you'd be constructing it once then using it many times. Try moving the regexp construction into the preparation code and see how things look.
Under those conditions the regexp is dramatically faster.
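Following the point about building the regex once, here is a minimal sketch of compiling the alternation from the first answer a single time and reusing it on many strings. It assumes the terms contain no regex metacharacters; findMatches and the sample inputs are hypothetical.

```javascript
// Compile once, outside any loop. Assumes the terms contain no regex
// metacharacters (otherwise they must be escaped first).
var myArray = ['peaches', 'bananas', 'papaya', 'supercity'];
var regexFromMyArray = new RegExp(myArray.join('|'), 'gi');

// Hypothetical helper: reuse the same compiled regex on any string.
function findMatches(text) {
  // With the g flag, match() returns all matches (or null) and resets
  // lastIndex itself, so the shared regex carries no state between calls.
  return text.match(regexFromMyArray) || [];
}

// Hypothetical inputs, for illustration only.
var inputs = [
  'I want some papaya and some peaches',
  'Bananas, BANANAS everywhere',
  'Nothing to see here'
];

for (var j = 0; j < inputs.length; j++) {
  var matches = findMatches(inputs[j]);
  for (var i = 0; i < matches.length; i++) {
    console.log('Found:', matches[i]);
  }
}
```

If you iterate with regexFromMyArray.exec() instead, keep in mind that the g flag makes the regex stateful through lastIndex: a loop that stops early leaves lastIndex pointing into the previous string, so set it back to 0 before scanning a new one.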

Here is an improved version of the currently accepted answer, written as a function that accepts an array of strings and an optional string of flags. It escapes special characters in the strings, and it sorts them by descending length so that the alternation follows the maximal munch principle: when one string is a prefix of another, the longer one is tried first.

const regexFrom = (strings, flags) =>
  new RegExp(
    strings
      // Escape special characters
      .map(s => s.replace(/[()[\]{}*+?^$|#.,\/\\\s-]/g, "\\$&"))
      // Sort for maximal munch
      .sort((a, b) => b.length - a.length)
      .join("|"),
    flags
  );


// Example

const strings = [".+", "apple", "[a-z]*", "app", "apps", "orange", "banana[]"];
const pattern = regexFrom(strings, "gi");

const string = "I really like Apple phones, they have great apps! Check out this regex: /.+/";

let result;
while ((result = pattern.exec(string)) !== null) {
  console.log(result[0]);
}
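To see why the sort matters, here is a small sketch comparing the same two terms with and without sorting. The escapeTerm helper is a hypothetical name; it simply repeats the escaping from regexFrom so that the snippet runs on its own.

```javascript
// Same escaping as regexFrom, repeated so this snippet is self-contained.
const escapeTerm = s => s.replace(/[()[\]{}*+?^$|#.,\/\\\s-]/g, "\\$&");

const terms = ["app", "apps"];

// Without sorting, alternatives are tried left to right, so "app" wins
// even though "apps" would also match at the same position.
const unsorted = new RegExp(terms.map(escapeTerm).join("|"), "g");

// Sorting by descending length puts "apps" first.
const sorted = new RegExp(
  terms.map(escapeTerm).sort((a, b) => b.length - a.length).join("|"),
  "g"
);

const text = "great apps";
const unsortedMatches = text.match(unsorted); // ["app"]
const sortedMatches = text.match(sorted);     // ["apps"]
console.log(unsortedMatches, sortedMatches);
```

Sorting by escaped length is still correct for this purpose: escaping works per character, so a string that is a proper prefix of another always escapes to something shorter.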

Comment

Escaping slashes in a bracket expression is redundant. We can just use /[()[\]{}*+?^$|#.,/\\\s-]/g.
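One caveat that applies to the escape class in either form: regexFrom accepts arbitrary flags, and with the u flag only regex syntax characters and / may be escaped, so outputs such as \, or \# throw a SyntaxError. Below is a minimal sketch of a narrower escape that stays valid under u. The name escapeForRegex is hypothetical; its character set follows the escapeRegExp example in MDN's regular expressions guide, with / added.

```javascript
// Hypothetical helper: escape only regex syntax characters plus "/".
// It never emits escapes like "\," or "\#", which the u flag rejects.
const escapeForRegex = s => s.replace(/[.*+?^${}()|[\]\\/]/g, "\\$&");

const terms = ["a,b", "c#d", "x-y", ".+", "banana[]"];
const safe = new RegExp(
  terms.map(escapeForRegex).sort((a, b) => b.length - a.length).join("|"),
  "giu"
);
const found = "x-y and .+ and banana[]".match(safe);
console.log(found);

// The broader escape class turns "a,b" into "a\,b", which u rejects.
let threw = false;
try {
  new RegExp("a,b".replace(/[()[\]{}*+?^$|#.,/\\\s-]/g, "\\$&"), "u");
} catch (e) {
  threw = e instanceof SyntaxError;
}
console.log(threw);
```

Without the u flag both escape classes behave the same, so this only matters if callers may pass "u".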
