
I have the following code:

 textResponse = textResponse.replace(/<head>(.|\n)*?<\/head\>/img, '');
 alert("Ups, Error " + jqxhr.status + ", " + textResponse);    

It is used to display an error for an AJAX request. The response text contains the HTML of the error page, and I am stripping unnecessary content from that page, so I am trying to remove the <head> from the following string:

<!DOCTYPE html>
<html>
    <head>
        <title>No hay usuario logeado</title>
        <meta name="viewport" content="width=device-width" />
        <style>
         body {font-family:"Verdana";font-weight:normal;font-size: .7em;color:black;} 
         p {font-family:"Verdana";font-weight:normal;color:black;margin-top: -5px}
         b {font-family:"Verdana";font-weight:bold;color:black;margin-top: -5px}
         H1 { font-family:"Verdana";font-weight:normal;font-size:18pt;color:red }
         H2 { font-family:"Verdana";font-weight:normal;font-size:14pt;color:maroon }
         pre {font-family:"Consolas","Lucida Console",Monospace;font-size:11pt;margin:0;padding:0.5em;line-height:14pt}
         .marker {font-weight: bold; color: black;text-decoration: none;}
         .version {color: gray;}
         .error {margin-bottom: 10px;}
         .expandable { text-decoration:underline; font-weight:bold; color:navy; cursor:hand; }
         @media screen and (max-width: 639px) {
          pre { width: 440px; overflow: auto; white-space: pre-wrap; word-wrap: break-word; }
         }
         @media screen and (max-width: 479px) {
          pre { width: 280px; }
         }
        </style>
    </head>

    <body bgcolor="white">

            <span><H1>Error de servidor en la aplicación '/HMSW'.<hr width=100% size=1 color=silver></H1>

...

But the string stays exactly the same; nothing is removed.

Any idea why?

  • Well, one possibility is best expressed here: stackoverflow.com/a/1732454/1243641 Commented Aug 20, 2015 at 15:15
  • I've checked document.documentElement.innerHTML.replace(/<head>(.|\n)*?<\/head\>/img, ''); and as far as I can see it works as expected. Probably the issue is not in the regex but in textResponse? Commented Aug 20, 2015 at 15:27

2 Answers

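  • To match line breaks as well, use [\s\S] ("whitespace + non-whitespace"). In JavaScript, . does not match line terminators, and (.|\n) still misses \r, so the original pattern fails if the response uses \r\n line endings. The multiline flag is not needed: it only changes where ^ and $ match, and this pattern uses neither. The global flag is superfluous since there can be only one <head>.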

    textResponse = textResponse.replace(/<head>[\s\S]*?<\/head>/i, '');
    
  • A better method would be to parse the response into a DOM tree and remove the head node.

    The advantage is that the parser will correctly handle a possibly commented-out duplicate <head> or </head> (e.g. <html><head>......<!-- </head> -->.....</head>).

    An example using DOMParser which works on modern browsers:

    var doc = new DOMParser().parseFromString(textResponse, "text/html");
    doc.head.remove(); // Note: .head node is always present even if empty
    

    Then the contents can be imported with document.importNode:

    var container = document.querySelector(".container");
    container.appendChild(document.importNode(doc.querySelector(".something"), true));
    

    or can be extracted as html: doc.documentElement.outerHTML

    P.S. The parsing stage may be skipped if XMLHttpRequest's responseType is set to document:

    var xhr = new XMLHttpRequest();
    xhr.open("GET", "http://someurl");
    // Set responseType after open(): some older browsers reject it earlier.
    xhr.responseType = "document";
    xhr.onload = function() {
        var doc = this.responseXML; // already parsed into a document
        doc.head.remove();
        // ...
    };
    xhr.send();
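
The regex point from the first bullet can be checked with a small sketch. It assumes the response uses Windows-style \r\n line endings, which the question does not confirm; the sample string and variable names are made up for illustration.

```javascript
// Sample input with CRLF line endings. This is an assumption about the
// server response, not something shown in the question.
var crlfHtml = '<html>\r\n<head>\r\n<title>Error</title>\r\n</head>\r\n<body>Body!</body>\r\n</html>';

// Pattern from the question: neither "." nor "\n" can consume "\r",
// so the lazy group never reaches </head> and nothing is replaced.
var withDotOrNewline = crlfHtml.replace(/<head>(.|\n)*?<\/head>/img, '');

// Pattern with [\s\S]: the class matches every character, including "\r".
var withAnyChar = crlfHtml.replace(/<head>[\s\S]*?<\/head>/i, '');

console.log(withDotOrNewline === crlfHtml); // true: the string is unchanged
console.log(JSON.stringify(withAnyChar));   // shows the result with escapes visible
```

Printing the real textResponse through JSON.stringify in the same way makes any \r characters visible, which is a quick way to see whether this is what happens in your case.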
    

Disregarding the fact that regular expressions are not suitable for parsing HTML, this case is much easier to handle if you simply find the opening <body> tag and the closing tag and take everything in between. Two indexOf() calls are enough to grab the content between them:

var fullHTMLStr = '<html><head>blablabla</head><body bgColor="white">Body!</body></html>';
var start = fullHTMLStr.indexOf('<body'); // don't look for '>', there might be attributes
start = fullHTMLStr.indexOf('>', start + 4) + 1; // advance past '>' (reuse start instead of redeclaring it)
var end = fullHTMLStr.indexOf('</body', start);

var justBody = fullHTMLStr.substring(start, end);

alert(justBody);
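
The same idea can be wrapped in a function that also copes with a missing tag or an upper-case <BODY>. This is only a sketch: extractBody is a hypothetical name, and it assumes no attribute value inside the opening tag contains a '>' character.

```javascript
// Hypothetical helper: returns the text between <body ...> and </body>,
// or null when either tag cannot be found.
function extractBody(html) {
    // Case-insensitive search for the opening tag, with or without attributes.
    var open = html.search(/<body[\s>]/i);
    if (open === -1) return null;

    // Skip to the end of the opening tag.
    var start = html.indexOf('>', open);
    if (start === -1) return null;
    start += 1;

    // Look for the closing tag only in the text after the opening tag.
    var rest = html.slice(start);
    var close = rest.search(/<\/body/i);
    if (close === -1) return null;

    return rest.slice(0, close);
}

var sample = '<html><head>blablabla</head><body bgColor="white">Body!</body></html>';
console.log(extractBody(sample));           // Body!
console.log(extractBody('<p>no body</p>')); // null
```

Returning null keeps the caller from showing a truncated or empty string when the response is not a full HTML page.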
