I am trying to build a string from the text content of a web page, with the HTML markup and punctuation removed. Each tag should probably be replaced with a space so that adjacent words do not run together.
Say you have this markup:
<body>
<h1>Content:</h1>
<p>paragraph 1</p>
<p>paragraph 2</p>
<script> alert("blah blah blah"); </script>
This is some text<br />
....and some more
</body>
I want to return the string:
var content = "Content paragraph 1 paragraph 2 This is some text and some more";
Any idea how to do this? Thanks.
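
For reference, here is a minimal sketch of one possible approach, assuming the page source is already available as a string. The function name extractText is hypothetical, and the regex-based stripping is only an illustration, not a real HTML parser: it does not decode entities such as &amp; and it treats only ASCII letters and digits as word characters.

```javascript
// Hypothetical helper: the name and the regex-based approach are illustrative assumptions.
function extractText(html) {
  return html
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Replace every remaining tag with a space so adjacent words stay separate.
    .replace(/<[^>]*>/g, " ")
    // Replace anything that is not an ASCII letter, digit, or whitespace with a space.
    .replace(/[^A-Za-z0-9\s]/g, " ")
    // Collapse whitespace runs into single spaces and trim the ends.
    .replace(/\s+/g, " ")
    .trim();
}

// The sample markup from the question, joined into one string.
const sample = [
  "<body>",
  "<h1>Content:</h1>",
  "<p>paragraph 1</p>",
  "<p>paragraph 2</p>",
  '<script> alert("blah blah blah"); </script>',
  "This is some text<br />",
  "....and some more",
  "</body>",
].join("\n");

const content = extractText(sample);
console.log(content);
// prints "Content paragraph 1 paragraph 2 This is some text and some more"
```

The script block is removed before the generic tag pass so that its contents do not end up in the result. Malformed markup or a ">" inside an attribute value could still confuse these patterns.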