I want to use a JavaScript regex to match the digits in the text content of an HTML document so that I can replace them.
For example, in the code below I want to match 1,2,3,4,5,6,1,2,3,1,2,3, but not the 444 inside the div tag's style attribute.

<body>
  aaaa123aaa456
  <div style="background: #444">aaaa123aaaa</div>
  aaaa123aaa
</body>

What could be the regular expression?

3 Answers

Your best bet is to use innerText or textContent to get at the text without the tags and then just use the regex /\d/g to get the numbers.

function digitsInText(rootDomNode) {
  var text = rootDomNode.textContent || rootDomNode.innerText;
  return text.match(/\d/g) || [];
}

For example,

alert(digitsInText(document.body));

If your HTML is not in the DOM, you can try to strip the tags yourself; see "JavaScript: How to strip HTML tags from string?"


Since you need to do a replacement, I would still try to walk the DOM and operate on text nodes individually, but if that is out of the question, try

var HTML_TOKEN = /(?:[^<\d]|<(?!\/?[a-z]|!--))+|<!--[\s\S]*?-->|<\/?[a-z](?:[^">']|"[^"]*"|'[^']*')*>|(\d+)/gi;

function incrementAllNumbersInHtmlTextNodes(html) {
  return html.replace(HTML_TOKEN, function (all, digits) {
    if ("string" === typeof digits) {
      return "" + (+digits + 1);
    }
    return all; 
  });
}

then

incrementAllNumbersInHtmlTextNodes(
    '<b>123</b>Hello, World!<p>I <3 Ponies</p><div id=123>245</div>')

produces

    '<b>124</b>Hello, World!<p>I <4 Ponies</p><div id=123>246</div>'

It will get confused around where special elements like <script> end and won't recognize digits that are entity encoded, but should work otherwise.
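The DOM-walking alternative mentioned above could be sketched roughly as follows. This is only an illustration, not code from the original answer: the names replaceDigitsInTextNodes and replacer are made up for this example, and it assumes the HTML is already parsed into a DOM.

```javascript
// Sketch: replace digits in every text node under `root`, leaving tags and
// attributes untouched. `replaceDigitsInTextNodes` and `replacer` are
// illustrative names, not part of any library.
function replaceDigitsInTextNodes(root, replacer) {
  var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
  for (var i = 0; i < root.childNodes.length; i++) {
    var node = root.childNodes[i];
    if (node.nodeType === TEXT_NODE) {
      // Only the text is rewritten; attribute values are never visited.
      node.nodeValue = node.nodeValue.replace(/\d/g, replacer);
    } else if (node.childNodes) {
      // Recurse into elements so nested text is handled too.
      replaceDigitsInTextNodes(node, replacer);
    }
  }
}
```

In a browser this might be called as replaceDigitsInTextNodes(document.body, function (d) { return String((+d + 1) % 10); }). Note that it also visits the text inside script and style elements, so you may want to skip those.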

2 Comments

Thank you for your answer, but I'm still looking for a regular expression that can find the inner text of elements in a string.
@Mike Samuel, I need to replace the numbers. With your approach I would have to use a function and call it over and over, but with a regular expression that can parse the whole HTML as a string I can replace the numbers at once.

You don't necessarily need a RegExp to get the text content of an element excluding that of its descendant elements. In fact I'd advise against it, as matching HTML with a RegExp is notoriously difficult. There are DOM solutions:

function getImmediateText(element){
    var text = '';

    // Text and elements are all DOM nodes, so loop over the immediate children.
    for(var i = 0, l = element.childNodes.length; i < l; ++i){
        var node = element.childNodes[i];
        // nodeType 3 is a text node
        if(node.nodeType === 3){
            text += node.nodeValue;
        }
    }

    return text;
}

var bodyText = getImmediateText(document.getElementsByTagName('body')[0]);

So here there's a function that will return only the immediate text content as a string. Of course, you could then extract the numbers from it with a RegExp, using something like this (match returns null when nothing matches, hence the fallback to an empty array):

var numberString = (bodyText.match(/\d+/g) || []).join('');

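Just to answer my old question:
It is possible to achieve this with a lookahead. The pattern matches a digit only if the next angle bracket after it is a < (or the string ends first), which means the digit is not inside a tag.

/\d(?=[^<>]*(<|$))/g

To replace the numbers:

    // map is a lookup object from each digit to its replacement
    html.replace(/\d(?=[^<>]*(<|$))/g, function($0) {
        return map[$0]
    });

Source of the answer: https://www.drupal.org/node/619198#comment-5710052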
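As a rough worked example, here is the lookahead pattern above applied to the HTML from the question. The names TEXT_DIGIT, DIGIT_MAP and replaceDigitsOutsideTags are hypothetical and chosen only for this sketch; the digit-to-letter mapping stands in for whatever replacement table you actually need.

```javascript
// Matches a digit only when the next angle bracket after it is "<"
// (or the string ends first), i.e. when the digit is not inside a tag.
var TEXT_DIGIT = /\d(?=[^<>]*(<|$))/g;

// Hypothetical lookup table: each digit maps to a letter for illustration.
var DIGIT_MAP = {
  '0': 'A', '1': 'B', '2': 'C', '3': 'D', '4': 'E',
  '5': 'F', '6': 'G', '7': 'H', '8': 'I', '9': 'J'
};

function replaceDigitsOutsideTags(html) {
  return html.replace(TEXT_DIGIT, function (digit) {
    return DIGIT_MAP[digit];
  });
}

var sample = 'aaaa123aaa456\n' +
  '<div style="background: #444">aaaa123aaaa</div>\n' +
  'aaaa123aaa';

var result = replaceDigitsOutsideTags(sample);
console.log(result);
// aaaaBCDaaaEFG
// <div style="background: #444">aaaaBCDaaaa</div>
// aaaaBCDaaa
```

The 444 in the style attribute is left alone because the next angle bracket after it is the closing > of the tag. The same rule means a literal < or > in the text or in an attribute value can mislead the pattern, so treat it as a heuristic rather than a parser.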
