JavaScript Replace Text with HTML Between it

Question

I want to replace some text in a web page, and only the text, but when I replace via document.body.innerHTML I can get stuck, like so:

HTML:

<p>test test </p>
<p>test2 test2</p>
<p>test3 test3</p>

Js:

var param = "test test test2 test2 test3";
var text = document.body.innerHTML;
document.body.innerHTML = text.replace(param, '*' + param + '*');

I would like to get:

*test test
test2 test2
test3* test3

HTML of 'desired' outcome:

<p>*test test </p>
<p>test2 test2</p>
<p>test3* test3</p>

So if I want to do that with the parameter above ("test test test2 test2 test3"), the <p> tags would not be taken into account, so the text is not found and the code ends up in the else section.
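
The failure can be reproduced on plain strings, without a browser. This is a minimal sketch, assuming the body serializes to the html string below; the variable names are only illustrative:

```javascript
// The serialized markup, roughly as innerHTML would return it (assumed for illustration).
var html = "<p>test test </p>\n<p>test2 test2</p>\n<p>test3 test3</p>";
var param = "test test test2 test2 test3";

// The search string never occurs verbatim: " </p>\n<p>" sits between the words.
var found = html.indexOf(param) !== -1;

// When a string pattern is not found, replace() returns the input unchanged.
var result = html.replace(param, "*" + param + "*");

console.log(found);           // false
console.log(result === html); // true
```

So the replace is silently a no-op as soon as any tag falls inside the searched text.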

How can I replace the text with no "consideration" to the html markup that could be between it?

Thanks in advance.

Edit (for @Sonesh Dabhi):

Basically, I need to replace text in a web page, but when I scan the page with the HTML in it, the replace won't work. I need to scan and replace based on the text only.

Edit 2:
'Raw' JavaScript, please (no jQuery).

@SoneshDabhi, did you understand my problem? (Basically, I need to replace text in a web page, but when I scan the page with the HTML in it, the replace won't work. I need to scan and replace based on the text only.)
– funerr, Commented Aug 22, 2012 at 21:17
Do you want to keep the <p>s? Your current output would just require you to use textContent instead of innerHTML.
– pimvdb, Commented Aug 22, 2012 at 21:19
@pimvdb, if you're asking whether I need to change the structure of the page while replacing, then no, I don't want to change the HTML markup of the page.
– funerr, Commented Aug 22, 2012 at 21:20
What I mean is: do you want *test ... test3* test3 as output or *test ... test3* test3?
– pimvdb, Commented Aug 22, 2012 at 21:21
@pimvdb The output should not show the HTML markup (so I guess yes, no <p>s), but I don't want to remove it from the "inside" (innerHTML should stay intact).
– funerr, Commented Aug 22, 2012 at 21:24

Chris Carew · Accepted Answer · 2012-08-24 06:12:00Z

1

This will do what you want. It builds a regular expression that finds the text even when tags sit between the words, and replaces it there. Give it a shot.

http://jsfiddle.net/WZYG9/5/

The magic is

(\s*(?:<\/?\w+>)*\s*)*

In the code below, the backslashes are doubled so that they survive inside the string. The regex first looks for any number of whitespace characters (\s*). The inner group (?:<\/?\w+>)* matches any number of start or end tags. The ?: tells JavaScript not to capture the group, so it is not numbered in the replacement string and its match is not remembered. < is a literal less-than character. The forward slash (which begins an HTML end tag) is escaped, and the question mark after it means 0 or 1 occurrence. \w+> then matches the tag name and the closing bracket, so tags with attributes are not matched. This is followed by any number of whitespace characters.

Every run of whitespace within the "text to search" gets replaced with this regular expression, allowing it to match any amount of whitespace and tags between the words in the text, and to remember them in the numbered variables $1, $2, etc. The replacement string is then built to put those remembered variables back in between the words.
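
One detail is worth checking when experimenting with this pattern: in JavaScript, a capturing group that is itself repeated keeps only the text of its last repetition. The sketch below compares the group from this answer with a variant written as a single repeated alternation. The variant and the sample string are illustrations, not part of the original answer:

```javascript
// The gap group from the answer, placed between two literal letters.
var original = /a(\s*(?:<\/?\w+>)*\s*)*b/;
// Illustrative variant: one group wrapped around a repeated alternation.
var variant = /a((?:\s|<\/?\w+>)*)b/;

var sample = "a </p>\n<p>b";

var m1 = original.exec(sample);
var m2 = variant.exec(sample);

// Both patterns match the whole sample, but they remember different text.
console.log(JSON.stringify(m1[1])); // "<p>"
console.log(JSON.stringify(m2[1])); // " </p>\n<p>"
```

With the original group, $1 holds only the last repetition, so a replacement string built from $1, $2, ... can drop whitespace and closing tags that came before it.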

function wrapTextIn(text, character) {
    if (!character) character = "*"; // default to an asterisk
    // trim the text
    text = text.replace(/(^\s+)|(\s+$)/g, "");
    // return if there are no words
    if (text === "")
        return;
    // split into words on whitespace runs, the same runs the regex below replaces
    var words = text.split(/\s+/);
    // escape regex metacharacters so the words are matched literally
    var escaped = text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // build the regex: each gap between words becomes one numbered group
    var regex = new RegExp(escaped.replace(/\s+/g, "(\\s*(?:<\\/?\\w+>)*\\s*)*"), "g");
    // start with the wrapping character
    var replace = character;
    // for each word, put it and the matching "tags" in the replacement string
    for (var i = 0; i < words.length; i++) {
        replace += words[i];
        // every word except the last is followed by its gap group
        if (i < words.length - 1)
            replace += "$" + (i + 1);
    }
    // end with the wrapping character
    replace += character;
    // replace the html
    document.body.innerHTML = document.body.innerHTML.replace(regex, replace);
}
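
For experimenting outside a browser, the same idea can be written as a function from string to string. This is only a sketch, not the answer's code: wrapAcrossTags and escapeRegExp are hypothetical names, the gap is written as one repeated alternation, and the whole match is wrapped instead of being rebuilt from numbered groups. Like the original pattern, it only recognizes tags without attributes.

```javascript
// Escape characters that are special inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every occurrence of `text` in `html` with `character`,
// allowing whitespace and attribute-free tags between the words.
function wrapAcrossTags(html, text, character) {
  if (character === undefined) character = "*";
  var trimmed = text.replace(/^\s+|\s+$/g, "");
  if (trimmed === "") return html; // nothing to search for
  var words = trimmed.split(/\s+/);
  var gap = "(?:\\s|<\\/?\\w+>)*";
  var regex = new RegExp(words.map(escapeRegExp).join(gap), "g");
  // A function replacer keeps "$" in the matched text from being read as a pattern.
  return html.replace(regex, function (match) {
    return character + match + character;
  });
}

var html = "<p>test test </p>\n<p>test2 test2</p>\n<p>test3 test3</p>";
var wrapped = wrapAcrossTags(html, "test test test2 test2 test3");
console.log(wrapped);
// <p>*test test </p>
// <p>test2 test2</p>
// <p>test3* test3</p>
```

In a page, the result would be assigned back with document.body.innerHTML = wrapAcrossTags(document.body.innerHTML, text). As with the function above, that re-parses the whole body, so event listeners attached to existing elements are lost.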

edited Aug 24, 2012 at 6:12

answered Aug 24, 2012 at 0:21


5 Comments

Chris Carew Over a year ago

A slight modification allows this to add starting and ending tags, if you wanted to style things across the elements, for instance. http://jsfiddle.net/WZYG9/6/

funerr Over a year ago

Thank you, it works! Can you explain the regex part a bit more? (I am kind of new to regex in general.) And how could the wrapping tags wrap around whole nodes (e.g. so that a bgcolor would be "uncut")?

Chris Carew Over a year ago

Editing the original post to detail the regex.

Chris Carew Over a year ago

To answer your other question: there isn't a very nice way to wrap whole nodes with this method. Again, unless you know specifically how your HTML is formatted, you risk not climbing high enough in the parent tree. I think that any attempt to wrap the "contained" elements would result in heartache, particularly since at that point you wouldn't have picked specific words out of the paragraphs; the paragraphs would all have a solid background color (not just the selected words). In short, no, I don't think so, sorry.

funerr Over a year ago

Thanks anyway. I think I found something to counter that without ruining the DOM tree.

Ashirvad · Answer · 2012-08-22 21:42:06Z

0

Working demo

Use that function to get the text. No jQuery required.

edited Aug 22, 2012 at 21:42

answered Aug 22, 2012 at 21:24


4 Comments

funerr Over a year ago

Sorry, but I need the solution with 'raw' JavaScript - no jQuery.

Bergi Over a year ago

…but you still use Mootools :-)

funerr Over a year ago

jsbin.com/ijekex/1/edit - I think that you tried to get only the text, but that isn't my problem (at least I don't think it is). I need to replace within the "real" innerHTML without ruining the markup, and the replaced text could span several nodes (like paragraphs).

Bergi Over a year ago

No idea what you have done, but calling your getText function on the string 'body' ends up in exceeding max recursion depth…

Waqar Alamgir · Answer · 2012-08-22 21:51:29Z

0

First, remove the tags. You can try document.body.textContent / document.body.innerText, or use this example: var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
Then find and replace (to replace every occurrence, add the "g" flag to the search regex).

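var param = "test test test2 test2 test3";

// Plain text of the page without tags (innerText is the fallback for old IE), trimmed
var text = (document.body.textContent || document.body.innerText).replace(/^\s+|\s+$/g, '');

// indexOf compares literally; search() would treat param as a regex
var found = text.indexOf(param) >= 0;

if (found) {
  // Escape regex metacharacters so that param is matched literally
  var re = new RegExp(param.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g');
  // Note: this writes the stripped text back, so the original tags are lost
  document.body.innerHTML = text.replace(re, function (match) { return '*' + match + '*'; });
} else {
  // param was not found, so nothing was replaced
  // What to do here?
}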

See here. Note: by stripping the markup, you will lose the tags.
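
What losing the tags means can be seen on a plain string. This is a small sketch that applies the tag-stripping regex from this answer to the question's markup. The html value is assumed for illustration, and the whitespace-collapsing step is an addition that is not in the code above:

```javascript
var html = "<p>test test </p>\n<p>test2 test2</p>\n<p>test3 test3</p>";
var param = "test test test2 test2 test3";

// Remove anything that looks like a tag, then collapse whitespace runs
// (without the collapsing step, the line breaks keep param from matching).
var stripped = html.replace(/(<([^>]+)>)/ig, "").replace(/\s+/g, " ");

// split/join replaces every literal occurrence without building a regex.
var replaced = stripped.split(param).join("*" + param + "*");

console.log(replaced); // *test test test2 test2 test3* test3
```

The asterisks land in the right places, but the three paragraphs have become a single line of text, which is why this approach cannot keep the page structure intact.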

edited Aug 22, 2012 at 21:51

answered Aug 22, 2012 at 21:25


3 Comments

funerr Over a year ago

The problem is near here: "replace(param/g" it should be "replace(/param/g" and it doesn't really work. it outputs the html into the page...

pimvdb Over a year ago

It should be new RegExp(param, "g"), but still, it will discard the HTML.

funerr Over a year ago

No, that messed up the HTML markup (structure). I need the HTML structure intact; that is the whole point. Thanks anyway for trying.
