Is there a way to retrieve the (starting) character positions inside a string of the results of a regex match() in JavaScript?
13 Answers
exec returns an object with an index property:
var match = /bar/.exec("foobar");
if (match) {
console.log("match found at " + match.index);
}
And for multiple matches:
var re = /bar/g,
str = "foobarfoobar",
match;
while ((match = re.exec(str)) != null) {
console.log("match found at " + match.index);
}
19 Comments
re as a variable, and adding the g modifier are both crucial! Otherwise you will get an endless loop.
undefined. jsfiddle.net/6uwn1vof/2 which is not a search-like example like yours.
g flag and it'll work. Since match is a function of the string, not the regex it cannot be stateful like exec, so it only treats it like exec (i.e. has an index property) if you're not looking for a global match...because then statefulness doesn't matter.
Here's what I came up with:
// Finds starting and ending positions of quoted text
// in double or single quotes with escape char support like \" \'
var str = "this is a \"quoted\" string as you can 'read'";
var patt = /'((?:\\.|[^'])*)'|"((?:\\.|[^"])*)"/igm;
var match;
while ((match = patt.exec(str)) !== null) {
console.log(match.index + ' ' + patt.lastIndex);
}
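As a small sketch of how the start and end positions relate (the variable names `quoteRe`, `sample` and `spans` are made up for illustration), the end position is exclusive. That means it equals `lastIndex` right after the match and can be passed straight to `slice()`:

```javascript
const quoteRe = /'((?:\\.|[^'])*)'|"((?:\\.|[^"])*)"/g;
const sample = "this is a \"quoted\" string as you can 'read'";
const spans = [];
let m;
while ((m = quoteRe.exec(sample)) !== null) {
  const start = m.index;
  const end = m.index + m[0].length; // exclusive end, same value as quoteRe.lastIndex here
  spans.push({
    start: start,
    end: end,
    sameAsLastIndex: end === quoteRe.lastIndex,
    text: sample.slice(start, end), // slice() takes an exclusive end
  });
}
console.log(spans);
```

If an inclusive end is needed, subtract 1 from `end`, keeping in mind that this is not meaningful for empty matches.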
5 Comments
match.index + match[0].length also works for the end position.
match.index + match[0].length - 1?
.slice() and .substring(). Inclusive end would be 1 less as you say. (Be careful that inclusive usually means index of last char inside match, unless it's an empty match where it's 1 before match and might be -1 outside the string entirely for empty match at start...)
patt = /.*/ it goes infinity loop how can we restrict that?
In modern browsers, you can accomplish this with string.matchAll().
The benefit to this approach vs RegExp.exec() is that it does not rely on the regex being stateful, as in @Gumbo's answer.
let regexp = /bar/g;
let str = 'foobarfoobar';
let matches = [...str.matchAll(regexp)];
matches.forEach((match) => {
console.log("match found at " + match.index);
});
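A short sketch of two points about this approach, with illustrative variable names: `matchAll()` leaves the `lastIndex` of the regex you pass in untouched, and it throws a `TypeError` when the regex lacks the g flag.

```javascript
const barRe = /bar/g;
const text = 'foobarfoobar';

// Array.from accepts the iterator plus a mapping function.
const positions = Array.from(text.matchAll(barRe), (m) => m.index);
console.log(positions);       // [3, 9]
console.log(barRe.lastIndex); // 0, the original regex was not advanced

// A non-global regex is rejected instead of returning a single match.
let errorName = null;
try {
  text.matchAll(/bar/);
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // "TypeError"
```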
3 Comments
matchAll: let regexp = /bar/g; let str = 'foobarfoobar'; let matchIndices = Array.from(str.matchAll(regexp)).map(x => x.index); console.log(matchIndices)
g flag and get error
From developer.mozilla.org docs on the String .match() method:
The returned Array has an extra input property, which contains the original string that was parsed. In addition, it has an index property, which represents the zero-based index of the match in the string.
When dealing with a non-global regex (i.e., no g flag on your regex), the value returned by .match() has an index property...all you have to do is access it.
var index = str.match(/regex/).index;
Here is an example showing it working as well:
var str = 'my string here';
var index = str.match(/here/).index;
console.log(index); // <- 10
I have successfully tested this all the way back to IE5.
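To make the non-global condition concrete, here is a minimal sketch (the variable names are illustrative) of the three shapes `.match()` can return. Only the non-global result carries an `index`, and a failed match returns `null`, so it is worth guarding before reading `.index`:

```javascript
const haystack = 'my string here, right here';

// Non-global: a single match array with index and input properties.
const single = haystack.match(/here/);
console.log(single.index); // 10

// Global: a plain array of matched strings, with no index property.
const all = haystack.match(/here/g);
console.log(all.length, all.index); // 2 undefined

// No match: null, so check before accessing .index.
const none = haystack.match(/absent/);
const safeIndex = none === null ? -1 : none.index;
console.log(safeIndex); // -1
```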
4 Comments
index property (see the answer)
Here is a cool feature I discovered recently. I tried this in the console and it seems to work:
var text = "border-bottom-left-radius";
var newText = text.replace(/-/g,function(match, index){
return " " + index + " ";
});
Which returned: "border 6 bottom 13 left 18 radius"
So this seems to be what you are looking for.
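When the regex has capture groups, the position is no longer the second argument of the callback. A hedged sketch, assuming a pattern without named groups (named groups add one more trailing argument) and using made-up sample data:

```javascript
const dateText = 'from 2020-01 to 2021-12';
const seen = [];

dateText.replace(/(\d{4})-(\d{2})/g, (...args) => {
  // Without named groups: args = [match, group1, group2, offset, wholeString].
  const offset = args[args.length - 2];
  seen.push({ match: args[0], groups: args.slice(1, -2), offset: offset });
  return args[0]; // return the match itself so the text is left unchanged
});

console.log(seen.map((s) => s.offset)); // [5, 16]
```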
1 Comment
arguments that is the position. Not "the second argument". The function arguments are "full match, group1, group2, ...., index of match, full string matched against"
I'm afraid the previous answers (based on exec) don't seem to work when your regex can match with width 0. For instance (note: /\b/g is the regex that should find all word boundaries):
var re = /\b/g,
str = "hello world";
var guard = 10;
while ((match = re.exec(str)) != null) {
console.log("match found at " + match.index);
if (guard-- < 0) {
console.error("Infinite loop detected")
break;
}
}
One can try to fix this by having the regex match at least 1 character, but this is far from ideal (and means you have to manually add the index at the end of the string):
var re = /\b./g,
str = "hello world";
var guard = 10;
while ((match = re.exec(str)) != null) {
console.log("match found at " + match.index);
if (guard-- < 0) {
console.error("Infinite loop detected")
break;
}
}
A better solution (which only works in newer browsers and needs polyfills for older ones such as IE) is to use String.prototype.matchAll():
var re = /\b/g,
str = "hello world";
console.log(Array.from(str.matchAll(re)).map(match => match.index))
Explanation:
String.prototype.matchAll() expects a global regex (one with the g, or global, flag set). It then returns an iterator. In order to map() over the iterator, it has to be turned into an array (which is exactly what Array.from() does). Like the result of RegExp.prototype.exec(), the resulting elements have an .index field according to the specification.
See the String.prototype.matchAll() and the Array.from() MDN pages for browser support and polyfill options.
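If only a loop is needed, the iterator can also be consumed directly with `for...of`, without building an intermediate array. A minimal sketch with illustrative variable names:

```javascript
const boundaryRe = /\b/g;
const phrase = 'hello world';
const boundaries = [];

// matchAll() steps past zero-width matches on its own, so this terminates.
for (const m of phrase.matchAll(boundaryRe)) {
  boundaries.push(m.index);
}

console.log(boundaries); // [0, 5, 6, 11]
```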
Edit: digging a little deeper in search for a solution supported on all browsers
The problem with RegExp.prototype.exec() is that it updates the lastIndex pointer on the regex, and next time starts searching from the previously found lastIndex.
var re = /l/g,
str = "hello world";
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
This works great as long as the regex match actually has a width. With a 0-width regex, this pointer does not advance, so you get an infinite loop. (Note: /(?=l)/g is a lookahead for l; it matches the 0-width string before an l. So it correctly goes to index 2 on the first call of exec(), and then stays there.)
var re = /(?=l)/g,
str = "hello world";
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
The solution (that is less nice than matchAll(), but should work on all browsers) therefore is to manually increase the lastIndex if the match width is 0 (which may be checked in different ways)
var re = /\b/g,
str = "hello world";
while ((match = re.exec(str)) != null) {
console.log("match found at " + match.index);
// alternative: if (match.index == re.lastIndex) {
if (match[0].length == 0) {
// we need to increase lastIndex -- this location was already matched,
// we don't want to match it again (and get into an infinite loop)
re.lastIndex++
}
}
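The same idea can be wrapped in a reusable function. This is only a sketch: `findAllIndices` is a hypothetical name, and it assumes the `flags` property of RegExp (ES2015) is available, so it is slightly less portable than the bare loop. It works on a copy of the regex so the caller's `lastIndex` is not touched.

```javascript
// Hypothetical helper: start indices of all matches, safe for zero-width patterns.
function findAllIndices(pattern, str) {
  // Copy the regex and force the g flag so exec() walks through the string.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const indices = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    indices.push(m.index);
    if (m[0].length === 0) {
      re.lastIndex++; // step past an empty match to avoid an infinite loop
    }
  }
  return indices;
}

const wordBoundaries = findAllIndices(/\b/g, 'hello world');
const beforeEachL = findAllIndices(/(?=l)/, 'hello world');
console.log(wordBoundaries); // [0, 5, 6, 11]
console.log(beforeEachL);    // [2, 3, 9]
```

Note that with the u flag and characters outside the Basic Multilingual Plane, incrementing `lastIndex` by one may need more care than this sketch takes.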
I had luck using this single-line solution based on matchAll (my use case needs an array of string positions)
let regexp = /bar/g;
let str = 'foobarfoobar';
let matchIndices = Array.from(str.matchAll(regexp)).map(x => x.index);
console.log(matchIndices)
output: [3, 9]
This member function returns an array of 0-based positions, if any, of the input word inside the String object:
String.prototype.matching_positions = function( _word, _case_sensitive, _whole_words, _multiline )
{
/* besides '_word', the other params are flags (0|1) */
var _match_pattern = "g"+(_case_sensitive?"":"i")+(_multiline?"m":"") ;
var _bound = _whole_words ? "\\b" : "" ;
var _re = new RegExp( _bound+_word+_bound, _match_pattern );
var _pos = [], _chunk ;
while( true )
{
_chunk = _re.exec( this ) ;
if ( _chunk == null ) break ;
_pos.push( _chunk['index'] ) ;
_re.lastIndex = _chunk['index']+1 ;
}
return _pos ;
}
Now try
var _sentence = "What do doers want ? What do doers need ?" ;
var _word = "do" ;
console.log( _sentence.matching_positions( _word, 1, 0, 0 ) );
console.log( _sentence.matching_positions( _word, 1, 1, 0 ) );
You can also input regular expressions:
var _second = "z^2+2z-1" ;
console.log( _second.matching_positions( "[0-9]z+", 0, 0, 0 ) );
Here one gets the position index of the linear term.
var str = "The rain in SPAIN stays mainly in the plain";
function searchIndex(str, searchValue, isCaseSensitive) {
var modifiers = isCaseSensitive ? 'g' : 'gi';
var regExpValue = new RegExp(searchValue, modifiers);
var matches = [];
var startIndex = 0;
var arr = str.match(regExpValue) || []; // match() returns null when nothing matches
arr.forEach(function(element) {
startIndex = str.indexOf(element, startIndex);
matches.push(startIndex++);
});
return matches;
}
console.log(searchIndex(str, 'ain', true));
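One caveat with looking positions up via `indexOf` is that it searches for the matched text, which is not always where the regex actually matched (for example with a lookahead). A sketch of an alternative that reads the position from `exec()` instead; `searchIndexExec` is a hypothetical name, and here `isCaseSensitive = true` is assumed to mean "no i flag":

```javascript
// Hypothetical variant that takes positions from exec() rather than indexOf().
function searchIndexExec(str, searchValue, isCaseSensitive) {
  const re = new RegExp(searchValue, isCaseSensitive ? 'g' : 'gi');
  const matches = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    matches.push(m.index);
    if (m[0].length === 0) {
      re.lastIndex++; // avoid looping forever on empty matches
    }
  }
  return matches;
}

const lookaheadHits = searchIndexExec('foobarfoobaz', 'foo(?=baz)', true);
const rainHits = searchIndexExec('The rain in SPAIN stays mainly in the plain', 'ain', false);
console.log(lookaheadHits); // [6]
console.log(rainHits);      // [5, 14, 25, 40]
```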
2 Comments
str.indexOf here just finds the next occurrence of the text captured by the match, which is not necessarily the match. JS regex supports conditions on text outside of the capture with lookahead. For instance searchIndex("foobarfoobaz", "foo(?=baz)", true) should give [6], not [0].
function trimRegex(str, regex){
var head = str.substr(str.match(regex).index);
var reversed = head.split('').reverse().join('');
// search the reversed string so the trailing run is measured on its own
return reversed.substr(reversed.match(regex).index).split('').reverse().join('');
}
let test = '||ab||cd||';
console.log(trimRegex(test, /[^|]/)); //output: ab||cd
or
function trimChar(str, trim){
let regex = new RegExp('[^'+trim+']');
let head = str.substr(str.match(regex).index);
let reversed = head.split('').reverse().join('');
// search the reversed string so the trailing run is measured on its own
return reversed.substr(reversed.match(regex).index).split('').reverse().join('');
}
let test = '||ab||cd||';
console.log(trimChar(test, '|')); //output: ab||cd
Use Regex d flag and indices property
let str = 'ab1c de fgh23 ij klmn456';
for (let match of str.matchAll (/[a-z]+(\d+)/dg))
console.log (JSON.stringify (match),
JSON.stringify (match.indices));
/* Output (formatted for better readability):
'["ab1","1"]' "[[0,3],[2,3]]"
'["fgh23","23"]' "[[8,13],[11,13]]"
'["klmn456","456"]' "[[17,24],[21,24]]"
*/
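With named capture groups, the same flag also exposes the spans by name under `indices.groups`. A minimal sketch, assuming an engine that supports the d flag; the group names `word` and `num` are made up for illustration:

```javascript
const dRe = /(?<word>[a-z]+)(?<num>\d+)/d;
const dMatch = dRe.exec('ab1c de fgh23');

console.log(dMatch.indices[0]);          // [0, 3]  whole match "ab1"
console.log(dMatch.indices.groups.word); // [0, 2]  "ab"
console.log(dMatch.indices.groups.num);  // [2, 3]  "1"
```

Each pair is [start, end) with an exclusive end, so it can be passed directly to `slice()`.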
var str = 'my string here';
var index = str.match(/here/).index;
alert(index); // <- 10