Manipulate the content of HTML strings without changing the HTML

Question

If I have a string of HTML, maybe like this...

<h2>Header</h2><p>all the <span class="bright">content</span> here</p>

And I want to manipulate the string so that all words are reversed for example...

<h2>redaeH</h2><p>lla eht <span class="bright">tnetnoc</span> ereh</p>

I know how to extract the string from the HTML and manipulate it by passing to a function and getting a modified result, but how would I do so whilst retaining the HTML?

I would prefer a non-language specific solution, but it would be useful to know php/javascript if it must be language specific.

Edit

I also want to be able to manipulate text that spans several DOM elements...

Quick<em>Draw</em>McGraw

warGcM<em>warD</em>kciuQ

Another Edit

Currently, I am thinking to somehow replace all HTML nodes with a unique token, whilst storing the originals in an array, then doing a manipulation which ignores the token, and then replacing the tokens with the values from the array.

This approach seems overly complicated, and I am not sure how to replace all the HTML without using REGEX which I have learned you can go to the stack overflow prison island for.

Yet Another Edit

I want to clarify an issue here. I want the text manipulation to happen over x number of DOM elements - so for example, if my formula randomly moves letters in the middle of a word, leaving the start and end the same, I want to be able to do this...

<em>going</em><i>home</i>

Converts to

<em>goonh</em><i>gmie</i>

So the HTML elements remain untouched, but the string content inside is manipulated (as a whole - so goinghome is passed to the manipulation formula in this example) in any way chosen by the manipulation formula.

If you want to do it after the page has loaded, you are left with nothing but JavaScript. If you are reading the page into a language like PHP, then you can pretty much do anything you like. Change it via regex (Uhhhhggg) or use the DOM to find and replace what you need.
– Fluffeh, Commented Aug 9, 2012 at 13:46
@Fluffeh Please don't recommend using regex to parse HTML. Ever. Again. I know you know better than that.
– Matt, Commented Aug 9, 2012 at 13:46

Fabrizio Calderan · Accepted Answer · 2012-08-09 13:52:42Z

1

If you want to achieve a similar visual effect without changing the text, you could cheat with CSS:

h2, p {
  direction: rtl;
  unicode-bidi: bidi-override;
}

This will reverse the text as displayed, without touching the markup.

example fiddle: http://jsfiddle.net/pn6Ga/

edited Aug 9, 2012 at 13:52

answered Aug 9, 2012 at 13:46

Fabrizio Calderan



2 Comments

Quentin Over a year ago

Won't that reverse word order as well as character order within each word?

Billy Moon Over a year ago

I did not actually want to reverse the words, I want to do other string manipulations. It is a problem that keeps coming up for me in different circumstances, so I wanted a general solution, but thanks for the tip.

WatsMyName · Accepted Answer · 2012-08-09 14:07:45Z

1

I ran into this situation a long time ago and used the following code. Here is a rough version:

<?php
function keepcase($word, $replace) {
   $replace[0] = (ctype_upper($word[0]) ? strtoupper($replace[0]) : $replace[0]);
   return $replace;
}

// regex - match the contents grouping into HTMLTAG and non-HTMLTAG chunks
$re = '%(</?\w++[^<>]*+>)                 # grab HTML open or close TAG into group 1
|                                         # or...
([^<]*+(?:(?!</?\w++[^<>]*+>)<[^<]*+)*+)  # grab non-HTMLTAG text into group 2
%x';

$contents = '<h2>Header</h2><p>the <span class="bright">content</span> here</p>';

// walk through the content, chunk by chunk, replacing words in non-HTMLTAG chunks only
$contents = preg_replace_callback($re, 'callback_func', $contents);

function callback_func($matches) { // here's the callback function
    if ($matches[1]) {             // Case 1: this is a HTMLTAG
        return $matches[1];        // return HTMLTAG unmodified
    }
    elseif (isset($matches[2])) {  // Case 2: a non-HTMLTAG chunk.
        // reverse each word in the chunk, keeping the case of its first letter;
        // the /e modifier was removed in PHP 7, so use a callback instead
        return preg_replace_callback('/\b\w+\b/', function ($m) {
            return keepcase($m[0], strrev($m[0]));
        }, $matches[2]);
    }
    exit("Error!");                // never get here
}
echo ($contents);
?>

answered Aug 9, 2012 at 14:07

WatsMyName


2 Comments

Billy Moon Over a year ago

This looks useful, but still does not appear to address the issue of a text element that needs to be manipulated as a whole, but contains some HTML within it. Please see latest update.

Quentin Over a year ago

It breaks if, for instance, an attribute value contains a >. Don't try parsing HTML with regex.
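To make that failure concrete, here is a small sketch. It assumes a JavaScript approximation of the tag-matching half of the PHP pattern above, and the input string is a made-up example, not something from the question:

```javascript
// JavaScript approximation of the tag-matching half of the PHP pattern above
// (JavaScript has no possessive quantifiers, so ++ and *+ become + and *).
var tagPattern = /<\/?\w+[^<>]*>/g;

// Hypothetical input: a valid tag whose attribute value contains ">".
var html = '<a title="1 > 0">link</a>';

var tags = html.match(tagPattern);       // what the pattern treats as tags
var text = html.replace(tagPattern, ''); // what is left over as "text"

console.log(tags); // the opening tag is cut short at the ">" inside the attribute
console.log(text); // the rest of the attribute leaks into the text
```

Here `tags[0]` ends at the `>` inside the attribute value, so the remainder of the attribute (` 0">`) would be handed to the text manipulation as if it were content.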

Quentin · Accepted Answer · 2012-08-09 13:46:45Z

0

Parse the HTML with something that will give you a DOM API to it.

Write a function that loops over the child nodes of an element.

If a node is a text node, get the data as a string, split it on words, reverse each one, then assign it back.

If a node is an element, recurse into your function.
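The steps above could be sketched roughly like this in JavaScript. This is only an illustration: the names `transformText` and `reverseWords` are made up, and the function assumes nodes that expose `nodeType`, `childNodes` and `data` the way DOM nodes do. A small hand-built tree stands in for a parsed document so the snippet runs outside a browser.

```javascript
// Node type values as defined by the DOM standard.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Reverse each run of word characters, leaving spaces and punctuation alone.
function reverseWords(s) {
  return s.replace(/\w+/g, function (word) {
    return word.split('').reverse().join('');
  });
}

// Loop over the child nodes of `node`: rewrite text nodes in place,
// recurse into elements, ignore everything else (comments etc.).
function transformText(node, transform) {
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === TEXT_NODE) {
      child.data = transform(child.data);
    } else if (child.nodeType === ELEMENT_NODE) {
      transformText(child, transform);
    }
  }
}

// Stand-in for <p>all the <span class="bright">content</span> here</p>,
// built from plain objects (an assumption made so this runs without a DOM).
var span = { nodeType: ELEMENT_NODE, childNodes: [{ nodeType: TEXT_NODE, data: 'content' }] };
var p = {
  nodeType: ELEMENT_NODE,
  childNodes: [
    { nodeType: TEXT_NODE, data: 'all the ' },
    span,
    { nodeType: TEXT_NODE, data: ' here' }
  ]
};

transformText(p, reverseWords);
console.log(p.childNodes[0].data + span.childNodes[0].data + p.childNodes[2].data);
```

In a browser the same function should work on a real document, for example one produced by `new DOMParser().parseFromString(html, 'text/html')`, after which `doc.body.innerHTML` gives the modified markup. Each text node is handled on its own, so a word split across elements is not treated as one word.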

answered Aug 9, 2012 at 13:46

Quentin


2 Comments

Quentin Over a year ago

@BillyMoon — Then call it for each one.

Billy Moon Over a year ago

I think I am getting close to what I am looking for - please see my answer, comments and improvements welcome. Thanks

Alex · Accepted Answer · 2012-08-09 13:55:09Z

0

Could you use jQuery?

$('div *').each(function(){
    var text = $(this).text(); // declare locally rather than leaking a global
    text = text.split('');
    text = text.reverse();
    text = text.join('');
    $(this).text(text);
});

See here - http://jsfiddle.net/GCAvb/

answered Aug 9, 2012 at 13:55

Alex


3 Comments

Billy Moon Over a year ago

Can this work when the text to be manipulated spans several DOM elements?

Quentin Over a year ago

The output is <div><h2>redaeH</h2><p>ereh tnetnoc eht</p></div> — you've destroyed all the elements that aren't children of the div.

Billy Moon Over a year ago

@AlexThomas I think I am getting close to what I am looking for - please see my answer, comments and improvements welcome. Thanks

Billy Moon · Accepted Answer · 2012-08-10 12:15:06Z

I implemented a version that seems to work quite well - although I still use (rather general and shoddy) regex to extract the html tags from the text. Here it is now in commented javascript:

Method

/**
* Manipulate text inside HTML according to passed function
* @param html the html string to manipulate
* @param manipulator the function to manipulate with (will be passed single word)
* @returns manipulated string including unmodified HTML
*
* Currently limited in that manipulator operates on words determined by regex
* word boundaries, and must return same length manipulated word
*
*/

var manipulate = function(html, manipulator) {

  var block, tag, words, i,
    final = '', // used to prepare return value
    tags = [], // used to store tags as they are stripped from the html string
    x = 0; // used to track the number of characters the html string is reduced by during stripping

  // remove tags from html string, and use callback to store them with their index
  // then split by word boundaries to get plain words from original html
  words = html.replace(/<.+?>/g, function(match, index) {
    tags.unshift({
      match: match,
      index: index - x
    });
    x += match.length;
    return '';
  }).split(/\b/);

  // loop through each word and build the final string
  // appending the word, or manipulated word if not a boundary
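  for (i = 0; i < words.length; i++) {
    // split(/\b/) alternates word and non-word chunks, but the first chunk is
    // not always a word, so test the chunk itself rather than the index parity
    final += /\w/.test(words[i]) ? manipulator(words[i]) : words[i];
  }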

  // loop through each stored tag, and insert into final string
  for (i = 0; i < tags.length; i++) {
    final = final.slice(0, tags[i].index) + tags[i].match + final.slice(tags[i].index);
  }

  // ready to go!
  return final;

};

The function defined above accepts a string of HTML and a manipulation function, and applies that function to the words within the string regardless of whether they are split by HTML elements.

It works by first removing all HTML tags, and storing the tag along with the index it was taken from, then manipulating the text, then adding the tags into their original position in reverse order.

Test

/**
 * Test our function with various input
 */

var reverse, rutherford, shuffle, text, titleCase;

// set our test html string
text = "<h2>Header</h2><p>all the <span class=\"bright\">content</span> here</p>\nQuick<em>Draw</em>McGraw\n<em>going</em><i>home</i>";

// function used to reverse words
reverse = function(s) {
  return s.split('').reverse().join('');
};

// function used by rutherford to return a shuffled array
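shuffle = function(a) {
  var i, j, tmp;
  // Fisher-Yates shuffle: sorting with a random comparator gives biased results
  for (i = a.length - 1; i > 0; i--) {
    j = Math.floor(Math.random() * (i + 1));
    tmp = a[i]; a[i] = a[j]; a[j] = tmp;
  }
  return a;
};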

// function used to shuffle the middle of words, leaving each end undisturbed
rutherford = function(inc) {
  var m = inc.match(/^(.?)(.*?)(.)$/);
  if (!m) { return inc; } // nothing to shuffle (e.g. an empty string)
  return m[1] + shuffle(m[2].split('')).join('') + m[3];
};

// function to make word Title Cased
titleCase = function(s) {
  return s.replace(/./, function(w) {
    return w.toUpperCase();
  });
};

console.log(manipulate(text, reverse));
console.log(manipulate(text, rutherford));
console.log(manipulate(text, titleCase));

There are still a few quirks, like the heading and paragraph text not being recognized as separate words (because they are in separate block level tags rather than inline tags) but this is basically a proof of method of what I was trying to do.

I would also like it to handle manipulation formulas that actually add and remove text, rather than just replacing or moving it (so the string length varies after manipulation), but that opens up a whole new can of worms I am not yet ready for.

Now I have added some comments to the code, and put it up as a gist in javascript, I hope that someone will improve it - especially if someone could remove the regex part and replace with something better!

Gist: https://gist.github.com/3309906

Demo: http://jsfiddle.net/gh/gist/underscore/1/3309906/

(outputs to console)

And now finally using an HTML parser

(http://ejohn.org/files/htmlparser.js)

Demo: http://jsfiddle.net/EDJyU/
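For readers who cannot open the demo, here is a rough sketch of the general idea once a parser has produced a tree. It is not the code from the demo and does not use htmlparser.js: it assumes a DOM-like tree whose nodes have `nodeType`, `childNodes` and `data`, the names `collectTextNodes` and `manipulateAcross` are made up, and it keeps the same-length limitation described above.

```javascript
var TEXT_NODE = 3; // nodeType value for text nodes in the DOM standard

// Gather every text node under `node`, in document order.
function collectTextNodes(node, out) {
  out = out || [];
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === TEXT_NODE) {
      out.push(child);
    } else if (child.childNodes) {
      collectTextNodes(child, out);
    }
  }
  return out;
}

// Join the text of all text nodes, manipulate it as one string, then deal the
// result back out so that each text node keeps its original length.
function manipulateAcross(root, manipulator) {
  var nodes = collectTextNodes(root);
  var joined = nodes.map(function (n) { return n.data; }).join('');
  var result = manipulator(joined);
  if (result.length !== joined.length) {
    throw new Error('manipulator must return a string of the same length');
  }
  var offset = 0;
  nodes.forEach(function (n) {
    var len = n.data.length;
    n.data = result.slice(offset, offset + len);
    offset += len;
  });
}

// Stand-in for Quick<em>Draw</em>McGraw, built from plain objects
// (an assumption made so this runs without a browser).
var em = { nodeType: 1, childNodes: [{ nodeType: TEXT_NODE, data: 'Draw' }] };
var root = {
  nodeType: 1,
  childNodes: [
    { nodeType: TEXT_NODE, data: 'Quick' },
    em,
    { nodeType: TEXT_NODE, data: 'McGraw' }
  ]
};

manipulateAcross(root, function (s) {
  return s.split('').reverse().join('');
});
console.log(root.childNodes[0].data, em.childNodes[0].data, root.childNodes[2].data);
```

As with the regex version, the tags stay at their original character offsets, so the whole string QuickDrawMcGraw is reversed but the `<em>` still wraps characters 6 to 9 of the result.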

Ricardo Araque · Accepted Answer · 2021-04-29 01:16:18Z

You can use setInterval to change the text periodically, for example:

 
const TITTLE = document.getElementById("Tittle") //Let's get the div
   
 setInterval(()=> { 
      let TITTLE2 = document.getElementById("rotate") //we get the element at the moment of execution
      let spanTittle = document.createElement("span"); // we create the new element "span"

      spanTittle.setAttribute("id","rotate");  // attribute to new element
      (TITTLE2.textContent == "TEXT1")       // compare which string is currently in the span
      ? spanTittle.appendChild(document.createTextNode(`TEXT2`)) 
      : spanTittle.appendChild(document.createTextNode(`TEXT1`))

      TITTLE.replaceChild(spanTittle,TITTLE2)   //finally, replace the old span for a new
    },2000)

<html>
<head></head>
<body>  
   <div id="Tittle">TEST YOUR <span id="rotate">TEXT1</span></div>
</body>
</html>
