
I'm trying to figure out how to do a replace with JavaScript. I'm looking at the entire body of the page and would like to replace keyword matches that are NOT within an HTML tag.

Here is an example:

<body>
  <span id="keyword">blah</span>
  <div>
    blah blah keyword blah<br />
    whatever keyword whatever
  </div>
</body>

<script type="text/javascript">
var replace_terms = {
  'keyword':{'url':'http://en.wikipedia.org/','target':'_blank'}
}

jQuery.each(replace_terms, function(i, val) {
  var re = new RegExp(i, "gi");
  $('body').html(
    $('body').html().replace(re, '<a href="'+ val['url'] +'" target="'+val['target']+'">' + i + '</a>')
  );
});

</script>

I'm looking to replace all instances of "keyword" that aren't within an HTML tag (between < and >).

I guess I also need to ignore "keyword" when it's within a script or style element.

  • Isn't the entire page by definition inside an HTML tag? Commented Sep 18, 2009 at 12:59
  • Yes. The HTML I had in my example didn't come through. I basically mean I don't want to replace any attributes of a tag. Commented Sep 18, 2009 at 13:01
  • I'm thinking he means within the brackets (like an attribute name/value). Commented Sep 18, 2009 at 13:01
  • In a tag is between < and >. To be between <> and </> would be in an element :) Commented Sep 18, 2009 at 13:11

1 Answer


Don't use regex to parse HTML. [X][HT]ML is not a regular language and cannot reliably be processed using regex. Your browser has a good HTML parser built-in; let that take the strain of working out where the tags are.
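
To make the failure concrete, here is a minimal sketch that applies the question's kind of replacement to a fragment of the example markup as a plain string. The variable names `html` and `broken` are made up for illustration; nothing here touches the DOM.

```javascript
// Fragment of the question's example markup, treated as a plain string.
var html = '<span id="keyword">blah</span><div>blah keyword blah</div>';

// The same kind of replacement the question runs on $('body').html().
var re = new RegExp('keyword', 'gi');
var broken = html.replace(re, '<a href="http://en.wikipedia.org/">keyword</a>');

// The regex cannot tell attribute values from text content, so the id
// attribute is rewritten as well and the markup is no longer valid.
console.log(broken);
```

The text occurrence is linked as intended, but the id attribute of the span now contains a whole anchor tag.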

Also you don't really want to work on html()/innerHTML on body. This will serialise and re-parse the entire page, which will be slow and will lose any information that cannot be serialised in HTML, such as event handlers, form values and other JavaScript references.

Here's a method using DOM that seems to work for me:

function replaceInElement(element, find, replace) {
    // iterate over child nodes in reverse, as replacement may increase
    // length of child node list.
    for (var i= element.childNodes.length; i-->0;) {
        var child= element.childNodes[i];
        if (child.nodeType==1) { // ELEMENT_NODE
            var tag= child.nodeName.toLowerCase();
            if (tag!='style' && tag!='script') // special case, don't touch CDATA elements
                replaceInElement(child, find, replace);
        } else if (child.nodeType==3) { // TEXT_NODE
            replaceInText(child, find, replace);
        }
    }
}
function replaceInText(text, find, replace) {
    var match;
    var matches= [];
    while (match= find.exec(text.data))
        matches.push(match);
    for (var i= matches.length; i-->0;) {
        match= matches[i];
        text.splitText(match.index);
        text.nextSibling.splitText(match[0].length);
        text.parentNode.replaceChild(replace(match), text.nextSibling);
    }
}

// keywords to match. This *must* be a 'g'lobal regexp: without the g flag,
// exec() returns the first match every time and the loop in replaceInText never ends
var find= /\b(keyword|whatever)\b/gi;

// replace matched strings with wiki links
replaceInElement(document.body, find, function(match) {
    var link= document.createElement('a');
    link.href= 'http://en.wikipedia.org/wiki/'+match[0];
    link.appendChild(document.createTextNode(match[0]));
    return link;
});
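
To tie this back to the question's `replace_terms` map, the following is a rough sketch of how the regex and the replacement callback might be built from such a map. The names `replaceTerms`, `escapeRegExp`, `buildFind` and `lookupTerm` are hypothetical helpers, not part of the code above, and the DOM call is guarded so it only runs in a browser where `replaceInElement` has been defined.

```javascript
// Hypothetical term map, shaped like the question's replace_terms.
var replaceTerms = {
    'keyword': {url: 'http://en.wikipedia.org/', target: '_blank'}
};

// Escape regex metacharacters so each term is matched literally.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one global, case-insensitive regex matching any term as a whole word.
function buildFind(terms) {
    var keys = Object.keys(terms).map(escapeRegExp);
    return new RegExp('\\b(' + keys.join('|') + ')\\b', 'gi');
}

// Look up the settings for a matched string, ignoring case.
function lookupTerm(terms, matched) {
    var lower = matched.toLowerCase();
    for (var key in terms) {
        if (Object.prototype.hasOwnProperty.call(terms, key) &&
                key.toLowerCase() === lower) {
            return terms[key];
        }
    }
    return null;
}

var findTerms = buildFind(replaceTerms);

// Only touch the DOM when one exists (a browser) and the answer's
// replaceInElement function is available.
if (typeof document !== 'undefined' && typeof replaceInElement === 'function') {
    replaceInElement(document.body, findTerms, function (match) {
        var term = lookupTerm(replaceTerms, match[0]);
        var link = document.createElement('a');
        link.href = term.url;
        link.target = term.target;
        link.appendChild(document.createTextNode(match[0]));
        return link;
    });
}
```

This assumes the map is non-empty; an empty map would produce a regex that matches the empty string, which the matching loop in `replaceInText` is not written to handle.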

Comments

i-->0 Clever. I've never seen that before.
I can't claim credit for that, it's an idiom for reverse-iteration in C-like languages! :-)
I usually use just i--, as in: for (var i=100; i--; )
Yep, that'll work too for lower bound 0. The explicit >0 is also a defensive measure for cases where i might be able to start off negative (which would loop endlessly).
What I liked about i-->0 is that I first read it as i→0, or "i approaches zero."
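
For anyone comparing the two idioms discussed above, here is a small sketch that counts iterations for each. The `countIterations` helper and its cap of 1000 are invented for the demonstration, so that the runaway case stops instead of hanging.

```javascript
// Count iterations of each reverse-loop idiom. The cap is a hypothetical
// safety limit so a runaway loop stops instead of hanging.
function countIterations(start, useExplicitBound) {
    var cap = 1000;
    var count = 0;
    if (useExplicitBound) {
        for (var i = start; i-- > 0;) {
            if (++count >= cap) break;
        }
    } else {
        for (var j = start; j--;) {
            if (++count >= cap) break;
        }
    }
    return count;
}

console.log(countIterations(3, true));   // 3
console.log(countIterations(3, false));  // 3
console.log(countIterations(-1, true));  // 0
console.log(countIterations(-1, false)); // 1000 (only the cap stops it)
```

Both forms behave the same for a non-negative start; they differ only when the counter starts below zero.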