0

I have a JavaScript variable that contains the contents of an HTML page. I would like to remove an inline <style type="text/css"> ... </style> block from it. I asked before and it was suggested that I add the content to the DOM.

Is there a simpler way to remove this using a regular expression? I need to match <style> as the start and </style> as the end. I have heard about regex but am not even sure whether it can be used with JavaScript.

4
  • JavaScript certainly has its own regex support, but why not move everything in the <style></style> block into one or more CSS classes (which also makes the rules more reusable)? Then you can remove them easily with jQuery's removeClass() function. Commented Jun 19, 2014 at 15:12
  • If all else fails, you can always use substring to remove it Commented Jun 19, 2014 at 15:12
  • Could there be more than 1 style declaration in your value? Commented Jun 19, 2014 at 15:12
  • I'll just leave this here: link Commented Jun 19, 2014 at 15:24

4 Answers

2

Ingmars has the right idea, but that pattern is missing an important question mark (to make the match lazy), does not allow for some HTML/XML possibilities (whitespace after the tag name in both tags, and attributes in the opening tag), and replaces the match with a message (I'm assuming that you just want to delete it completely).

This works unless an attribute value contains ">", which is a calculated risk. The code assumes that htmlString is the variable you have that contains the HTML document.

htmlString = htmlString.replace(/<style\b[^<>]*>[\s\S]*?<\/style\s*>/gi, '');
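
As a quick check of the pattern above, here is a minimal sketch that wraps it in a helper and runs it on a made-up sample. The function name stripStyleBlocks and the sample string are illustrative assumptions, not part of the original answer.

```javascript
// Hypothetical helper wrapping the regex from the answer above.
function stripStyleBlocks(htmlString) {
  // <style\b      : the tag name, with a word boundary so "<styles" is skipped
  // [^<>]*>       : any attributes, up to the end of the opening tag
  // [\s\S]*?      : the CSS itself, matched lazily (newlines included)
  // <\/style\s*>  : the closing tag, with optional whitespace before ">"
  return htmlString.replace(/<style\b[^<>]*>[\s\S]*?<\/style\s*>/gi, '');
}

// Made-up sample with two style blocks, one carrying attributes.
const sampleHtml =
  '<head><style type="text/css" media="screen">\n' +
  '.foo > .bar { color: red; }\n' +
  '</style></head>' +
  '<body><p>Hi</p><STYLE>p { margin: 0; }</STYLE ></body>';

const stripped = stripStyleBlocks(sampleHtml);
console.log(stripped); // <head></head><body><p>Hi</p></body>
```

Note that the ">" inside the CSS (the child combinator) is handled, because only the opening tag excludes "<" and ">".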

7 Comments

Your first * still looks a bit too greedy. And it'll match <styleasdfg....
It's OK for the first [^<>] to be greedy, because there is no chance for it to get beyond the end of the tag since both > and < are not allowed (the second one is also illegal). As far as matching your example, there is no tag name beginning with the substring style other than style, so we are safe in isolating the matching of style tags. You are right that no validation is being done of the HTML in the document, but it is well known that such a task is impossible in regular expressions as they are.
This: "such a task is impossible in regular expressions". See the comment I left on the question?
@JosephMyers - I just rechecked and there is <style type="text/css">. I missed out the type="text/css" by accident. Will your version also check for this?
@SamanthaJ Yes, my version will also match this (as well as any other attributes there might be, like media).
1

If it's just one set of <style> tags, then a JavaScript regular expression would work just fine:

var re = /(<style\b[^>]*>)[^<>]*(<\/style>)/i; // To remove ALL style tags, change the i at the end to gi.
var html = "<!DOCTYPE html>..."; // Your HTML string

html = html.replace(re, "");

This solution isn't practical if you want to target specific <style> tags, though (i.e. you can only remove the first match or all matches).
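
To make the first-match versus all-matches distinction concrete, here is a small sketch. The sample string and variable names are made up for illustration; the pattern is the one from this answer, so the CSS in the sample deliberately avoids "<" and ">".

```javascript
// Made-up input containing two style blocks.
const page =
  '<style>a { color: red; }</style><p>text</p><style>b { color: blue; }</style>';

// Without the g flag, replace() removes only the first match.
const firstOnly = page.replace(/(<style\b[^>]*>)[^<>]*(<\/style>)/i, '');

// With the g flag, replace() removes every match.
const allRemoved = page.replace(/(<style\b[^>]*>)[^<>]*(<\/style>)/gi, '');

console.log(firstOnly);  // <p>text</p><style>b { color: blue; }</style>
console.log(allRemoved); // <p>text</p>
```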

5 Comments

Can you explain what you mean by "you can only remove the first match or all matches"? In your example, would it remove the first or all?
What about something like <style>.foo > .bar { color: red; }</style>? See the > there?
Sure. A regular expression replace stops at the first match unless you specify the g flag (g or gi) at the end of the pattern. If g is specified, it continues past the first match and replaces everything in the string that matches.
@jupenur Well spotted. Didn't consider that character.
@Matt - I just rechecked and there is <style type="text/css">. I missed out the type="text/css" by accident. Will your version also check for this?
1

Simple regex which will wipe it with no regrets:

var a = 'aaaa <style type="text/css" favouriteAnimal="horse">style</StYlE> bbbbb <styLE>another style</STyle> cccc';
var b = a.replace( /<style[\s\S]*?>[\s\S]*?<\/style>/gi, '' );
console.log( b );

EDIT: updating my answer to handle current question specifics.
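
The lazy quantifier *? in the pattern above matters when a page has more than one style block. Here is a minimal sketch comparing it with a greedy variant on a made-up string; the greedy pattern is shown only for contrast and is an assumption about what a non-lazy version would look like.

```javascript
// Made-up input with two style blocks and text between them.
const twoBlocks = 'top <style>a{}</style> middle <style>b{}</style> bottom';

// Greedy: [\s\S]* runs on to the LAST </style>, so " middle " is swallowed too.
const greedyResult = twoBlocks.replace(/<style[\s\S]*>[\s\S]*<\/style>/gi, '');

// Lazy: [\s\S]*? stops at the nearest </style>, so each block is removed on its own.
const lazyResult = twoBlocks.replace(/<style[\s\S]*?>[\s\S]*?<\/style>/gi, '');

console.log(JSON.stringify(greedyResult)); // "top  bottom"
console.log(JSON.stringify(lazyResult));   // "top  middle  bottom"
```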

3 Comments

Your regexp needs to be lazy, [\s\S]*?, or you are going to gobble up everything from the first stylesheet on the page to the end of the last one. On some web pages this will devour the entire page as well, because they have stylesheets at the top and at the bottom.
@JosephMyers: good catch, I've updated my code, and learned a bit myself. Thanks!
Thanks. In fact, my code isn't perfect, either. @jupenur has a good point at his link, that there are always failure cases when trying to do anything with HTML without actually parsing it, and parsing it is impossible with regular expressions.
0

Following the advice of bobince (as recommended by jupenur), use a real parser rather than a regex; here, the browser's own DOM. Then you can find all <style> tags, remove them, and retrieve the HTML. It'll work every time. Here's an example:

var im = document.implementation;
var doc = 'createHTMLDocument' in im ?
    im.createHTMLDocument('') : new ActiveXObject("htmlfile");
if(!doc.body)
    doc.write('<body></body>');
doc.body.innerHTML = '<p><style type="text/css"></style></p><p>Hii</p>';
var temp=doc.getElementsByTagName('style');
while(temp.length)
    temp[0].parentNode.removeChild(temp[0]);
console.log(doc.body.innerHTML); // '<p></p><p>Hii</p>'

If you don't do that, you could unintentionally remove content that only looks like a style tag, such as text inside comments or essential strings inside script tags (e.g. $('body').append('<style>p { color: blue; }</style>');).
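
The script-tag failure case can be reproduced with a plain string. This is a minimal sketch on a made-up page; the regex used is the one from the first answer, chosen here only as a representative pattern.

```javascript
// Made-up page: the only "<style>" here lives inside a script string literal.
const pageWithScript =
  '<p>Hi</p>' +
  '<script>$("body").append("<style>p { color: blue; }</style>");</script>';

// The regex cannot tell markup from script text, so it edits the script.
const afterRegex = pageWithScript.replace(
  /<style\b[^<>]*>[\s\S]*?<\/style\s*>/gi,
  ''
);

console.log(afterRegex);
// <p>Hi</p><script>$("body").append("");</script>
```

An HTML parser treats the contents of a script element as raw text, so the DOM approach above finds no style element there and leaves the script untouched.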

May the <center> tag hold.
