I am using an API to retrieve the HTML for all webforms inside of a certain application. The trouble is that the returned HTML contains <html>, <style>, and <body> tags around the <form>, but all I need is the <form> (there is also an onsubmit attribute, but I am fairly sure I can handle that a little bit down the road).

I was able to remove the <style> tags with some regex, but I am unsure how to extract the <form> from between the <html> and <body> tags.

So far this is all happening in PHP. I am thinking it might be possible to json_encode the string and then pass it over to JS and use jQuery to getJSON maybe? I'm still not 100% clear on the best way to do this though.

Sample of the string returned to my PHP code:

<html width="100%" height="100%">
  <body class="body stuff">
    <form>
      <input type="text" name="input">
      <input type="text" name="anotherInput">
    </form>
  </body>
</html>

All I want out of this string is the <form>.

  • Have you tried using DOMDocument? Commented Oct 20, 2015 at 13:26
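
For anyone wondering what the DOMDocument route might look like, here is a minimal sketch rather than a definitive implementation. It assumes the API response is already in a variable (called $html here, a made-up name), that there is only one <form> in the markup, and that the onsubmit value shown is a placeholder, since the question does not give the real one.

```php
<?php
// Hypothetical sample input modelled on the string in the question.
// The onsubmit value is a placeholder; the real one is not shown in the question.
$html = '<html width="100%" height="100%"><body class="body stuff">'
      . '<form onsubmit="return check();"><input type="text" name="input">'
      . '<input type="text" name="anotherInput"></form></body></html>';

$doc = new DOMDocument();

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$formHtml = '';

// Take the first <form> element, if there is one.
$form = $doc->getElementsByTagName('form')->item(0);
if ($form !== null) {
    // Drop the onsubmit attribute mentioned in the question.
    $form->removeAttribute('onsubmit');

    // Passing a node to saveHTML() serializes only that node and its children.
    $formHtml = $doc->saveHTML($form);
}

echo $formHtml;
```

Because this goes through a parser instead of a pattern, it should not care whether the <form> tag carries extra attributes or how the markup is indented.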

2 Answers

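<?php
// $string holds the HTML returned by the API.
// The /s modifier lets "." match newlines, so the match can span several lines.
$regex = "/<form>(.*?)<\/form>/s";
preg_match($regex, $string, $match);
print_r($match);
?>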

This should result in something like:

Array
(
    [0] => <form>
      <input type="text" name="input">
      <input type="text" name="anotherInput">
    </form>
    [1] => 
      <input type="text" name="input">
      <input type="text" name="anotherInput">

)

What you need is then $match[1] (the contents of the form), or $match[0] if you want the <form> tags included.

1 Comment

I had to modify the regex a little because of some extra attributes on the form tags themselves: "/<form (.*?)>(.*?)<\/form>/is". This ended up working perfectly, thanks! I also had to use preg_match_all for some reason, which I don't completely understand.
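
To illustrate the two points in this comment, here is a small sketch under stated assumptions, not a diagnosis of the commenter's actual data. The input string and the variable names ($html, $pattern, $first, $all, $count) are made up. The pattern uses <form\b[^>]*> so it matches the opening tag with or without attributes, whereas "<form (.*?)>" requires a space after "form" and so would not match a bare <form>. It also shows the difference between the two functions: preg_match stops after the first match, while preg_match_all collects every match, which may or may not be why the latter was needed here.

```php
<?php
// Hypothetical input: two forms, both with attributes, one in upper case.
$html = '<body><form id="a" method="post"><input name="x"></form>'
      . '<FORM id="b"><input name="y"></FORM></body>';

// \b     : word boundary, so tags such as <formfoo> are not matched
// [^>]*  : any attributes (or none) up to the closing ">"
// /i     : case-insensitive, /s : "." also matches newlines
$pattern = '/<form\b[^>]*>(.*?)<\/form>/is';

// preg_match fills $first with the first match only.
preg_match($pattern, $html, $first);

// preg_match_all returns the number of matches; with the default ordering,
// $all[0] lists the full matches and $all[1] lists the captured contents.
$count = preg_match_all($pattern, $html, $all);

echo $first[0], "\n";
echo $count, "\n";
print_r($all[0]);
```

If the markup is not under your control, regex matching on HTML stays fragile (nested or malformed tags will trip it up), so treat this as a quick fix only.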

You could use $.parseHTML() to convert your string into an array of DOM nodes.

Then append the nodes, or use DOM traversal to find whichever elements you need. Additionally, when you use $.parseHTML() it appears to remove the <html> and <body> tags automatically, leaving only the innards of the document.

var string = '<html width="100%" height="100%"><body class="body stuff"><form><input type="text" name="input"><input type="text" name="anotherInput"></form></body></html>';

var htmlObject = $.parseHTML(string);
$('body').append( htmlObject );

Here is a quick DEMO I put together.

Edit

In the DEMO you can see an $.each() loop that goes through the returned nodes and appends their node names to an unordered list. This is how you can verify that $.parseHTML() actually removed the <html> and <body> from the result.
