3

Today I tried to parse a page with Simple HTML DOM Parser and got no results at all. That seemed strange, so I looked at the page's HTML source and found that it contains many mistakes.

So here is my question: what should I do when the parser itself works correctly but the HTML is a mess? Can anyone suggest an approach, or another parser that can handle this kind of markup?

Thank you all for your help.

  • possible duplicate of How do I parse partial HTML? Commented Apr 6, 2011 at 9:19
  • possible duplicate of Best methods to parse HTML Commented Apr 6, 2011 at 9:22
  • Both of the above are incorrect. It is not partial HTML but broken HTML, and he is already using the "best option" from that second link. A duplicate would be something like stackoverflow.com/questions/2351526/… Commented Apr 6, 2011 at 9:35
  • It also heavily depends on why you are parsing this HTML and whether you have control over the source. The answer might be tidy or simpledom, and even regexp might be the right tool in a few cases. Commented Apr 6, 2011 at 9:38
  • @dogmatic The answer I linked is specifically about parsing HTML (which implies broken because HTML is broken by design). OP asked for alternatives. DOM can parse broken HTML fine. And SimpleHTMLDom is the worst solution for an HTML parser ever. So the options given in my answer should solve the OP's question, hence, it's a duplicate. Commented Apr 6, 2011 at 9:53

2 Answers

2

Run it through tidy before trying to load it into a DOM tree, http://php.net/manual/en/book.tidy.php
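As a rough illustration of that suggestion, here is a minimal sketch that repairs a broken fragment with tidy and then loads the result into a DOM tree. It assumes the tidy and DOM extensions are installed; `repairHtml` and the sample markup are made up for this example, and the helper simply returns the input unchanged when tidy is not available.

```php
<?php
// Hypothetical helper (not part of any library): repair markup with tidy
// when the extension is loaded, otherwise hand the input back untouched.
function repairHtml(string $html): string
{
    if (!function_exists('tidy_repair_string')) {
        return $html;
    }
    // 'wrap' => 0 disables tidy's line wrapping; 'utf8' is the input/output encoding.
    $fixed = tidy_repair_string($html, ['wrap' => 0], 'utf8');
    return $fixed === false ? $html : $fixed;
}

// Sample input with unclosed <p> and <b> tags (an assumption for the demo).
$broken = '<div><p>Unclosed paragraph <b>bold text</div>';

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML(repairHtml($broken));
libxml_clear_errors();

// The repaired tree can now be queried like any other DOM document.
$bold = $doc->getElementsByTagName('b')->item(0);
echo trim($bold->textContent), PHP_EOL;
```

Running tidy first keeps the clean-up step separate from the parsing step, so the same repaired string could also be passed to a different parser.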

0

It seems that PHP's built-in DOM extension should cope fine with HTML that is not well written. Have a read through the comments on the manual page below, as some people share details about it.

http://docs.php.net/manual/en/domdocument.loadhtml.php
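For instance, the following sketch feeds a deliberately sloppy list to `DOMDocument::loadHTML()` and reads it back with XPath. The sample markup is invented for the example, and `libxml_use_internal_errors(true)` is used only so that parser warnings about the bad markup are collected rather than printed; treat it as one possible setup, not the only one.

```php
<?php
// Sample input with missing </li> and </a> tags (an assumption for the demo).
$html = '<ul><li>One<li>Two<li><a href="/three">Three</ul>';

// Collect libxml warnings about the malformed markup instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Warnings stay available for inspection; clear the buffer once done with them.
$warnings = libxml_get_errors();
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Gather the text of every list item the parser recovered.
$items = [];
foreach ($xpath->query('//li') as $li) {
    $items[] = trim($li->textContent);
}

// Attributes survive the recovery as well.
$link = $xpath->query('//a')->item(0);
$href = $link->getAttribute('href');

print_r($items);
echo $href, PHP_EOL;
```

If the extracted text looks wrong, dumping `$warnings` is a quick way to see which parts of the page the parser had to guess at.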
