I'm trying to write a custom HTML-to-LaTeX converter that uses WordPress posts as its source.
Basically, it needs to replace HTML elements with their LaTeX equivalents, turning this:
<h2>H2 Title</h2>
<p>Text text text</p>
<img src="/image.png" alt="Image ALT tag" />
Into this:
\begin{document}
\section{H2 Title}
Text text text
\shorthandoff{=}
\begin{figure}[H]
\centering
\includegraphics[scale=0.7]{./img/image.png}
\caption{Image ALT tag}
\end{figure}
\shorthandon{=}
\end{document}
Which approach should I use? Is there an HTML DOM parser that allows replacements like this, or would you suggest something else?
Update: Is there a proper way to walk the HTML DOM tree in PHP? I tried RecursiveDOMIterator (http://stackoverflow.com/questions/4431142/loop-through-all-elements-of-body-tags-using-dom), but I couldn't get it to work.
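PHP's built-in DOM extension (`DOMDocument`) is one parser that can do this. Below is a minimal sketch, assuming only the three tags from the example matter and that images are stored under `./img/` by file name (so `/image.png` becomes `./img/image.png`). It parses the post with `DOMDocument::loadHTML()`, visits the children of `<body>`, and emits LaTeX per element. The function names `htmlToLatex`, `nodeToLatex`, `imageToLatex` and `latexEscape` are hypothetical, not part of any library. Text is escaped because characters such as `&` and `%` are special in LaTeX. The preamble (`\documentclass`, babel for `\shorthandoff`, the float package for `[H]`) is left out.

```php
<?php
// Sketch: convert the h2/p/img subset of WordPress post HTML to LaTeX
// using PHP's built-in DOM extension. Layout copied from the example output.

function latexEscape(string $text): string
{
    // strtr() with an array replaces in a single pass, so the backslashes
    // it inserts are never escaped a second time.
    return strtr($text, [
        '\\' => '\textbackslash{}',
        '&' => '\&', '%' => '\%', '$' => '\$', '#' => '\#',
        '_' => '\_', '{' => '\{', '}' => '\}',
        '~' => '\textasciitilde{}', '^' => '\textasciicircum{}',
    ]);
}

function imageToLatex(DOMElement $img): string
{
    // Assumption: images live locally under ./img/, keyed by file name.
    $file = basename($img->getAttribute('src'));
    $caption = latexEscape($img->getAttribute('alt'));
    return "\\shorthandoff{=}\n\\begin{figure}[H]\n\\centering\n"
        . "\\includegraphics[scale=0.7]{./img/{$file}}\n"
        . "\\caption{{$caption}}\n\\end{figure}\n\\shorthandon{=}\n";
}

function nodeToLatex(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Bare text between block elements; whitespace-only nodes are dropped.
        $text = trim($node->textContent);
        return $text === '' ? '' : latexEscape($text) . "\n";
    }
    if (!($node instanceof DOMElement)) {
        return ''; // comments, processing instructions, etc.
    }
    switch (strtolower($node->nodeName)) {
        case 'h2':
            return '\section{' . latexEscape(trim($node->textContent)) . "}\n";
        case 'p':
            // Limitation: inline markup and images nested inside <p> are not
            // handled here; only the text is kept. The blank line ends the paragraph.
            return latexEscape(trim($node->textContent)) . "\n\n";
        case 'img':
            return imageToLatex($node);
        default:
            // Unhandled element: recurse so nested content is not lost.
            $out = '';
            foreach ($node->childNodes as $child) {
                $out .= nodeToLatex($child);
            }
            return $out;
    }
}

function htmlToLatex(string $html): string
{
    $doc = new DOMDocument();
    // libxml warns about tags it doesn't know (e.g. HTML5); collect them quietly.
    $previous = libxml_use_internal_errors(true);
    // The charset meta tag keeps UTF-8 post text intact.
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = "\\begin{document}\n";
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $node) {
        $out .= nodeToLatex($node);
    }
    return $out . "\\end{document}\n";
}

echo htmlToLatex('<h2>H2 Title</h2><p>Text text text</p><img src="/image.png" alt="Image ALT tag" />');
```

Dispatching on `nodeName` in one `switch` keeps each HTML-to-LaTeX rule in one place, so adding `h3` → `\subsection` or `ul` → `itemize` is a new `case`.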
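`RecursiveDOMIterator` is not a built-in PHP class, so it has to be defined before it can be used. A minimal sketch of the same idea, written from scratch here (the class name `DomChildIterator` is hypothetical), implements SPL's `RecursiveIterator` over a node's `childNodes` and lets `RecursiveIteratorIterator` perform the depth-first walk. It assumes PHP 8.0 or later (for the `mixed` return types). A plain recursive function over `childNodes` works just as well; the iterator version plugs straight into `foreach` and exposes the current depth.

```php
<?php
// Hypothetical helper (not a built-in class): presents a DOMNode's children as a
// RecursiveIterator so SPL's RecursiveIteratorIterator can walk the whole subtree.
class DomChildIterator implements RecursiveIterator
{
    private DOMNodeList $nodes;
    private int $position = 0;

    public function __construct(DOMNode $parent)
    {
        // childNodes is a live list: don't add or remove nodes while walking.
        $this->nodes = $parent->childNodes;
    }

    public function current(): mixed
    {
        return $this->nodes->item($this->position);
    }

    public function key(): mixed
    {
        return $this->position;
    }

    public function next(): void
    {
        $this->position++;
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return $this->position < $this->nodes->length;
    }

    public function hasChildren(): bool
    {
        return $this->current()->hasChildNodes();
    }

    public function getChildren(): ?RecursiveIterator
    {
        return new DomChildIterator($this->current());
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><h2>H2 Title</h2><p>Text <b>text</b></p></body></html>');
$body = $doc->getElementsByTagName('body')->item(0);

// SELF_FIRST yields each parent before its children (depth-first, document order).
$walker = new RecursiveIteratorIterator(new DomChildIterator($body), RecursiveIteratorIterator::SELF_FIRST);
foreach ($walker as $node) {
    echo str_repeat('  ', $walker->getDepth()), $node->nodeName, "\n";
}
```

Text nodes show up as `#text`, so a converter can check `$node instanceof DOMText` or `$node->nodeName` inside the loop to decide what to emit.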
Thanks.