
I've got a page I want to parse that has overlapping (misnested) tags, like this:

 <div>
  <p>
   <strong>
    <span>sometext</span>
     <div> <- this tag is misplaced
   </strong>
  </p>
       <- and should be here
     </div>

The problem is that there are more p tags after this one to be parsed, but the parser thinks it has reached the end.

I need it parsed in a way that lets me access each p separately:

$ar_w = $ar->find('div[itemprop=ar] p');
foreach ($ar_w as $para) {
    // something
}

Any ideas how to solve this?

Answer

Your HTML is invalid.

  • You cannot put a <div> inside a <p>. Since the end tag for <p> is optional, the <div> start tag implicitly ends the paragraph, and the </p> then has no open <p> to match. Parsers treat that stray end tag as an error (an HTML5 parser, such as a browser's, even turns it into an extra empty <p></p>).
  • You cannot put a <div> inside a <strong>
  • You cannot have a <div> start tag without a matching end tag

If you want to recover from the HTML errors in a particular, non-standard way, you'll need to write a custom parser. Pre-built parsers tend to follow the standard HTML parsing and error-recovery rules.
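If a standard error-recovering parser is acceptable, one option is to let it rebuild the tree and then iterate over the p elements it produces. The sketch below assumes PHP's built-in DOM extension (DOMDocument, backed by libxml2) instead of the library behind the question's find() call. The helper name paragraphTexts() and the sample markup are hypothetical. libxml2 repairs misnesting with its own recovery rules, which are not identical to a browser's HTML5 parser, so check which <p> elements it actually produces for your page.

```php
<?php
// Sketch (assumption: PHP's DOM extension is available).
// libxml2's HTML parser repairs misnested tags while building the tree;
// XPath then selects every <p> under <div itemprop="ar">.

/**
 * Hypothetical helper: returns the trimmed text of each <p>
 * found inside <div itemprop="ar">, in document order.
 */
function paragraphTexts(string $html): array
{
    $doc = new DOMDocument();

    // Collect parse errors internally instead of emitting PHP warnings
    // for the misnested tags; the parser still repairs the tree.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query('//div[@itemprop="ar"]//p') as $para) {
        $texts[] = trim($para->textContent);
    }
    return $texts;
}

// Usage with misnested markup similar to the question's (hypothetical sample).
// The exact list printed depends on libxml2's recovery rules.
$page = '<div itemprop="ar"><p><strong><span>sometext</span><div>x</div></strong></p><p>more</p></div>';
foreach (paragraphTexts($page) as $text) {
    echo $text, "\n";
}
```

If you are on PHP 8.4 or later, Dom\HTMLDocument::createFromString() parses with HTML5 rules, which should match browser behavior more closely than libxml2's recovery.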
