3

How can I get only the text from the following HTML document, without the contents of the <script> tag?

<html>
  <body>
    <script>
      a = 0;
    </script>
    <div>TEST</div>
    <p>test</p>
  </body>
</html>

I have the following code:

$('body').text()

This currently gets the result:

a = 0; TEST test

But I am trying to get the result:

TEST test
  • I have no idea what you are trying to explain here. Commented Sep 28, 2017 at 13:42
  • I edited quite a lot, but I think it clears up your question. Feel free to edit it if I got anything wrong. Commented Sep 28, 2017 at 13:44
  • You could remove all the scripts beforehand; they are all loaded into memory already. The only potential problem is if any code uses script tags for templates or similar purposes. Commented Sep 28, 2017 at 13:45

4 Answers

3

OK, since you edited your question: if you want to extract the text from the page but not the contents of the script tags, you can work on a clone of the body so the live page is left untouched:

let cloneBody = $('body').clone().find('script').remove().end();
                
console.log(cloneBody.text().trim());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script>
  var a = 1;
</script>
<p>Hello World</p>
<div>This is a test run</div>

2

You can do this using plain JavaScript, as shown in a previous answer, "Removing all script tags from html with JS Regular Expression":

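function stripScripts(s) {
    // Parse the HTML string into a detached element.
    var div = document.createElement('div');
    div.innerHTML = s;
    var scripts = div.getElementsByTagName('script');
    // Iterate backwards: the collection is live and shrinks on each removal.
    var i = scripts.length;
    while (i--) {
        scripts[i].parentNode.removeChild(scripts[i]);
    }
    // Returns the remaining HTML; read div.textContent instead for text only.
    return div.innerHTML;
}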

alert(
 stripScripts('<span><script type="text/javascript">alert(\'foo\');<\/script><\/span>')
);

1

This is probably not a perfect solution, but it should be good enough for simple HTML pages:

$('<div>').html($('body').html()).find('script').remove().end().text()

Explanation: it creates a detached div element, copies the HTML content of the body into it, removes all script tags from the div, and finally gets the text content.

1

First of all, you can get all the non-script elements with the following code:

var elements = $('body').children().not('script');

Now you could just do the following to get all the text:

var text = elements.text();

However, this will result in no spaces between text nodes, i.e. TESTtest. If this is what you want then great, stop here.

But if you want the spaces, you can loop the elements and build a string:

var text = "";
elements.each(function(){
    text += $(this).text() + " ";
});
text = text.trim();

Note that this solution does not maintain any line breaks, which is what I have assumed based on your question.
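
The same idea can also be sketched without jQuery as a recursive walk over text nodes that skips every script element, including nested ones. This is only an illustration: the name collectText is hypothetical, and the sketch assumes nothing beyond the standard DOM node properties nodeType, nodeName, childNodes, and nodeValue.

```javascript
// Hypothetical helper (not from the answers above): gathers the trimmed
// text of every text node under `node`, skipping <script> subtrees.
function collectText(node, parts = []) {
  const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
  const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

  if (node.nodeType === TEXT_NODE) {
    const text = node.nodeValue.trim();
    if (text) parts.push(text); // ignore whitespace-only nodes
  } else if (node.nodeType === ELEMENT_NODE &&
             node.nodeName.toUpperCase() !== 'SCRIPT') {
    // Array.from also accepts a live NodeList
    Array.from(node.childNodes).forEach(child => collectText(child, parts));
  }
  return parts;
}

// In a browser: collectText(document.body).join(' ')
```

Joining the collected pieces with a single space keeps a separator between adjacent elements, so for the document in the question this should give TEST test. Like the loop above, it does not preserve line breaks.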
