
I want to replace some text in a webpage, only the text, but when I replace via document.body.innerHTML I get stuck, like so:

HTML:

<p>test test </p>
<p>test2 test2</p>
<p>test3 test3</p>

Js:

var param = "test test test2 test2 test3";
var text = document.body.innerHTML;
document.body.innerHTML = text.replace(param, '*' + param + '*');

I would like to get:

*test test
test2 test2
test3* test3

HTML of 'desired' outcome:

<p>*test test </p>
<p>test2 test2</p>
<p>test3* test3</p>

So if I try to do that with the parameter above ("test test test2 test2 test3"), the <p></p> tags between the words are not taken into account, so the search string is never found and nothing is replaced (the "else" case).

How can I replace the text while ignoring any HTML markup that might sit between the words?

Thanks in advance.

Edit (for @Sonesh Dabhi):

Basically I need to replace text in a webpage, but when I scan the webpage with the HTML in it, the replace won't work. I need to scan and replace based on the text only.

Edit 2:
'Raw' JavaScript please (no jQuery).

  • @SoneshDabhi, did you understand my problem? (Basically I need to replace text in a webpage, but when I scan the webpage with the HTML in it the replace won't work; I need to scan and replace based on the text only.) Commented Aug 22, 2012 at 21:17
  • Do you want to keep the <p>s? Your current output would just require you to use textContent instead of innerHTML. Commented Aug 22, 2012 at 21:19
  • @pimvdb, if you're asking whether I need to change the structure of the page while replacing, then no, I don't want to change the HTML markup of the page. Commented Aug 22, 2012 at 21:20
  • What I mean is: do you want *test ... test3* test3 as output or <p>*test ... </p><p>test3* test3</p>? Commented Aug 22, 2012 at 21:21
  • @pimvdb The output should not show the HTML markup (so I guess yes, no <p>s), but I don't want to remove it from the "inside" (innerHTML should stay intact). Commented Aug 22, 2012 at 21:24
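One way to meet these requirements (match on the text only, leave the markup untouched) is to search the concatenated text of the page's text nodes and then edit only those text nodes. This is a sketch under that assumption, not code from any of the answers below; every function name in it (escapeRegExp, toNodeOffset, locatePhrase, applyMarks, markPhraseInBody) is hypothetical. It marks only the first occurrence, lets any whitespace run in the phrase match any whitespace run in the page text, and would also search text inside <script> and <style> elements.

```javascript
// Sketch only: all helper names are hypothetical. Idea: search the text of
// the page's text nodes, then change only those text nodes' values.

// Escape regex metacharacters so each word of the phrase matches literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Map a position in the concatenated text back to {node, offset}.
// A start position belongs to the node holding the character at `pos`;
// an end position belongs to the node whose text it closes.
function toNodeOffset(chunks, pos, isEnd) {
  let base = 0;
  for (let i = 0; i < chunks.length; i++) {
    const len = chunks[i].length;
    const inside = isEnd ? pos > base && pos <= base + len : pos < base + len;
    if (inside) return { node: i, offset: pos - base };
    base += len;
  }
  return null;
}

// Find the first occurrence of `phrase` in the concatenated chunks. Any
// whitespace run in the phrase matches any whitespace run in the text, so
// the " \n" between two <p> elements still matches a single space.
function locatePhrase(chunks, phrase) {
  const words = phrase.trim().split(/\s+/).filter(Boolean);
  if (words.length === 0) return null;
  const re = new RegExp(words.map(escapeRegExp).join("\\s+"));
  const match = re.exec(chunks.join(""));
  if (!match) return null;
  return {
    start: toNodeOffset(chunks, match.index, false),
    end: toNodeOffset(chunks, match.index + match[0].length, true),
  };
}

// Insert `mark` at both ends of the match; returns new chunk strings.
function applyMarks(chunks, loc, mark) {
  const out = chunks.slice();
  const insertAt = (i, at) => {
    out[i] = out[i].slice(0, at) + mark + out[i].slice(at);
  };
  // Closing mark first: if both marks fall in the same node, the start
  // offset (which is <= the end offset) is still valid afterwards.
  insertAt(loc.end.node, loc.end.offset);
  insertAt(loc.start.node, loc.start.offset);
  return out;
}

// Browser-only wrapper: edits only text node values, so every element,
// attribute and the overall markup of the page stays as it was.
function markPhraseInBody(phrase, mark = "*") {
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);
  const chunks = nodes.map((n) => n.nodeValue);
  const loc = locatePhrase(chunks, phrase);
  if (!loc) return false; // phrase not found in the page text
  applyMarks(chunks, loc, mark).forEach((value, i) => {
    if (value !== chunks[i]) nodes[i].nodeValue = value;
  });
  return true;
}
```

In a browser, markPhraseInBody("test test test2 test2 test3") should give the desired output from the question (*test test / test2 test2 / test3* test3) with the three <p> elements still in place. The string-only helpers can be exercised without a DOM.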

3 Answers


This will do what you want: it builds a regular expression that finds the text even when tags sit between the words, and replaces it in place. Give it a shot.

http://jsfiddle.net/WZYG9/5/

The magic is

(\s*(?:<\/?\w+>\s*)*)

which matches any amount of white space and tags between two words. In the code below the backslashes are doubled to escape them within the string. \s* matches any number of white space characters. The inner group (?:<\/?\w+>\s*)* matches any number of start or end tags, each followed by optional white space. ?: tells JavaScript not to capture this group: it gets no number in the replacement string and its matches are not remembered. < is a literal less-than character, the forward slash (which begins an end tag) is escaped, and the question mark after it means 0 or 1 occurrence. \w+ is the tag name and > closes the tag. Only bare tags such as <p> and </p> are matched; a tag with attributes (<p class="x">) is not.

Every run of white space in the "text to search" gets replaced with this regular expression, allowing it to match any amount of white space and tags between the words and remember them in the numbered variables $1, $2, etc. The replacement string gets built to put those remembered variables back in. The outer group is deliberately not followed by *: a repeated capturing group only remembers its last repetition, which would drop part of the markup (for example the </p> when the closing tag and the next opening tag are on separate lines).

function wrapTextIn(text, character) {
    if (!character) character = "*"; // default to an asterisk
    // trim the text
    text = text.replace(/^\s+|\s+$/g, "");
    // return if there is nothing to search for
    if (text === "")
        return;
    // split into words on any run of white space
    var words = text.split(/\s+/);
    // build the regex: escape each word so it matches literally, and allow
    // white space and tags (captured as $1, $2, ...) between the words
    var regex = new RegExp(words.map(function (word) {
        return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }).join("(\\s*(?:<\\/?\\w+>\\s*)*)"), "g");
    // start with the wrapping character
    var replace = character;
    // for each word, put it and the matching "tags" in the replacement string
    for (var i = 0; i < words.length; i++) {
        // "$$" stands for a literal "$" in a replacement string
        replace += words[i].replace(/\$/g, "$$$$");
        if (i < words.length - 1)
            replace += "$" + (i + 1);
    }
    // end with the wrapping character
    replace += character;
    // replace the html
    document.body.innerHTML = document.body.innerHTML.replace(regex, replace);
}
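Why the capturing group between two words should not itself be repeated can be checked on a plain string, without a DOM. In this small comparison (the sample HTML and the variable names are made up for illustration), the repeated group keeps only its last repetition, while a single group around the whole gap keeps all of the markup between the words.

```javascript
// Illustrative sample: the closing and opening tags are on separate lines.
const html = "<p>test test </p>\n<p>test2 test2</p>";

// Capturing group repeated with `*`: each repetition overwrites the capture.
const repeatedGroup = /test(\s*(?:<\/?\w+>)*\s*)*test2/;

// One capturing group around the whole gap: the capture holds all of it.
const singleGroup = /test(\s*(?:<\/?\w+>\s*)*)test2/;

console.log(JSON.stringify(repeatedGroup.exec(html)[1])); // "<p>"  (" </p>\n" is lost)
console.log(JSON.stringify(singleGroup.exec(html)[1]));   // " </p>\n<p>"
```

Putting the captured text back with $1 therefore restores the markup exactly only in the single-group form. The single group also avoids nested quantifiers like (\s*...\s*)*, which can backtrack very slowly when a long run of white space fails to match.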

5 Comments

A slight modification allows this to add starting and ending tags, if you wanted to style things across the elements, for instance. http://jsfiddle.net/WZYG9/6/
Thank you, it works! Can you explain the regex part a bit more? (I am kind of new to regex in general.) And how can the wrapping tags wrap around whole nodes (e.g. so that a background color would not be cut)?
Editing the original post to detail the regex.
To answer your other question: there isn't a very nice way to wrap the whole nodes with this method. Again, unless you know specifically how your HTML is formatted, you risk not climbing high enough in the parent tree. I think any attempt to wrap the "contained" elements would result in heartache, particularly since at that point you wouldn't have picked specific words out of the paragraphs; the paragraphs would all have a solid background color (not just the selected words). In short, no, I don't think so, sorry.
Thanks anyway, I think I found something to counter that without ruining the DOM tree.
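If the goal is a background color that covers exactly the matched words, a different technique from the regex-on-innerHTML one above is to wrap each piece of the match in its own element, one per text node, instead of wrapping parent nodes. A rough sketch, assuming you already have the page's text nodes in document order and the [start, end) range of the match in their concatenated text (for example from a text-node search); segmentsToWrap and wrapSegments are hypothetical names, and <mark> is just one possible wrapper.

```javascript
// Sketch only (hypothetical names). Assumes chunks[i] is the text of
// textNodes[i], in document order, and [start, end) is the match range
// in chunks.join("").

// Pure part: list the slice of each text node that lies inside the match.
// Whitespace-only slices (e.g. the "\n" between two <p>s) are skipped.
function segmentsToWrap(chunks, start, end) {
  const segments = [];
  let base = 0;
  chunks.forEach((chunk, i) => {
    const from = Math.max(start - base, 0);
    const to = Math.min(end - base, chunk.length);
    if (from < to && chunk.slice(from, to).trim() !== "") {
      segments.push({ node: i, from, to });
    }
    base += chunk.length;
  });
  return segments;
}

// DOM part (browser only): wrap each slice in its own element, so no
// existing element is moved or removed.
function wrapSegments(textNodes, segments, tagName = "mark") {
  for (const { node, from, to } of segments) {
    let target = textNodes[node];
    if (from > 0) target = target.splitText(from); // keep the part from `from` on
    if (to - from < target.length) target.splitText(to - from); // cut off the tail
    const wrapper = document.createElement(tagName);
    target.parentNode.insertBefore(wrapper, target);
    wrapper.appendChild(target);
  }
}
```

A match spanning three paragraphs yields three <mark> elements, one inside each <p>, so the background is not cut across the matched words and the paragraph structure stays the same.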

Working demo

Use that function to get the text (no jQuery required).

4 Comments

Sorry, but I need the solution with 'raw' JavaScript - no jQuery.
…but you still use Mootools :-)
jsbin.com/ijekex/1/edit: I think you tried to get only the text, but that isn't my problem (at least I don't think it is). I need to replace within the "real" innerHTML without ruining the markup, including text that could span several nodes (like a paragraph).
No idea what you have done, but calling your getText function on the string 'body' ends up exceeding the maximum recursion depth…
  1. First remove the tags, i.e. use document.body.textContent / document.body.innerText, or a regex such as var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
  2. Find and replace (to replace every occurrence, add the "g" flag to the search).

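// Define trim only where the browser lacks it (old IE); don't override the native one
if (!String.prototype.trim) {
  String.prototype.trim = function () { return this.replace(/^\s+|\s+$/g, ''); };
}

var param = "test test test2 test2 test3";

var text = (document.body.textContent || document.body.innerText).trim();

// indexOf looks for the literal string; search() would treat param as a regex
var found = text.indexOf(param) >= 0;

if (found) {

  // escape regex metacharacters so param is matched literally
  var re = new RegExp(param.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g');

  // $& inserts the matched text; the result is plain text, so the tags are gone
  document.body.innerHTML = text.replace(re, '*$&*');

} else {

  // param was not found in the page text

  // What to do here?

}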

See here. Note: because the tags are stripped, you will lose them.

3 Comments

The problem is near here: "replace(param/g" should be "replace(/param/g", and it doesn't really work: it outputs the HTML into the page...
It should be new RegExp(param, "g"), but still, it will discard the HTML.
No, that messed up the HTML markup (structure). I need the HTML structure intact; that is the whole point. Thanks anyway for trying.
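Escaping matters with new RegExp(param, "g") whenever param can contain characters such as ( ) . * that have a meaning in regex syntax. A minimal sketch with a made-up sample string and a hypothetical escapeRegExp helper (the character class is the commonly used one for this purpose):

```javascript
// Hypothetical helper: escape RegExp metacharacters so the string is
// matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Made-up sample data
const param = "test3 (draft)";
const pageText = "test3 (draft) test3";

// Unescaped, "(draft)" is a capturing group, so the pattern looks for
// "test3 draft" and finds nothing.
console.log(new RegExp(param, "g").test(pageText)); // false

// Escaped, the parentheses are literal; "$&" re-inserts the matched text.
console.log(pageText.replace(new RegExp(escapeRegExp(param), "g"), "*$&*"));
// *test3 (draft)* test3
```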
