Converting HTML to another syntax (LaTeX) with PHP

Question

I'm trying to build a custom HTML to LaTeX converter that uses WordPress posts as its source.

Basically, it needs to do some "replacing", like:

<h2>H2 Title</h2>
<p>Text text text</p>
<img src="/image.png" alt="Image ALT tag" />

into this:

   \begin{document}

   \section{H2 Title}

   Text text text

   \shorthandoff{=}
   \begin{figure}[H]
   \centering
   \includegraphics[scale=0.7]{./img/image.png}
   \caption{Image ALT tag}
   \end{figure}
   \shorthandon{=}

   \end{document}

Which approach should I use? Is there an HTML DOM parser ~~that allows replacements like this~~? Or do you have other suggestions?

Update: Is there any way to properly walk the HTML DOM tree in PHP? I tried RecursiveDOMIterator (http://stackoverflow.com/questions/4431142/loop-through-all-elements-of-body-tags-using-dom) but I couldn't get a successful result.

Thanks.

Have you looked at html2latex.sourceforge.net?

– RobertPitt, Commented Feb 6, 2011 at 19:52

Michael Koval · Accepted Answer · 2011-02-06 20:55:16Z

1

Have you tried PHP Simple HTML DOM Parser? Specifically, the "How to traverse the DOM tree?" section in the manual might be what you are looking for.
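
For a feel of what walking the tree looks like, here is a minimal sketch. Note that it uses PHP's bundled DOM extension (`DOMDocument`) rather than Simple HTML DOM Parser, only so that it runs without any third-party download; `walkDom` is a hypothetical helper name, and the sample markup is taken from the question with an `<em>` added to show nesting.

```php
<?php
// Hypothetical helper: list every element under $node, indented by depth.
function walkDom(DOMNode $node, int $depth = 0): array
{
    $lines = [];
    foreach ($node->childNodes as $child) {
        // Only elements are listed; text and comment nodes are skipped.
        if ($child instanceof DOMElement) {
            $lines[] = str_repeat('  ', $depth) . $child->tagName;
            // Recurse into the child's own subtree.
            $lines = array_merge($lines, walkDom($child, $depth + 1));
        }
    }
    return $lines;
}

$html = '<h2>H2 Title</h2><p>Text <em>text</em> text</p>'
      . '<img src="/image.png" alt="Image ALT tag" />';

$doc = new DOMDocument();
// loadHTML() wraps a fragment in <html><body>, so the walk starts at <body>.
$doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
$body = $doc->getElementsByTagName('body')->item(0);

$tags = walkDom($body);
echo implode("\n", $tags), "\n";
```

The same recursion shape works for a converter: instead of collecting tag names, each visit would emit the LaTeX for that element.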


Kyle · Accepted Answer · 2011-02-06 21:08:10Z

1

Depending on how complicated the structure of the HTML in your posts is, you could use regular expression-based replacements (if the markup is fairly simple, as in your example). If you want to replicate complex structures (nested elements) into LaTeX, then regex likely wouldn't work.
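
As a rough sketch of the regex route for markup as flat as the example in the question (the function name `simpleHtmlToLatex` is made up for illustration, and it assumes PHP 7.4+ for arrow functions): it expects `src` to come before `alt`, does not escape LaTeX special characters, and leaves out the `\begin{document}` and `\shorthandoff` wrappers.

```php
<?php
// Hypothetical helper: converts flat, non-nested markup with regexes only.
function simpleHtmlToLatex(string $html): string
{
    $latex = preg_replace_callback_array([
        // <h2>Title</h2> becomes \section{Title}
        '~<h2>(.*?)</h2>\s*~s' => fn(array $m) => '\\section{' . $m[1] . "}\n\n",
        // <p>Text</p> becomes a plain paragraph
        '~<p>(.*?)</p>\s*~s' => fn(array $m) => $m[1] . "\n\n",
        // <img src=".." alt=".."> becomes a figure; attribute order is fixed
        '~<img\s+src="([^"]*)"\s+alt="([^"]*)"\s*/?>\s*~' => fn(array $m) =>
            "\\begin{figure}[H]\n\\centering\n"
            . '\\includegraphics[scale=0.7]{./img/' . basename($m[1]) . "}\n"
            . '\\caption{' . $m[2] . "}\n\\end{figure}\n\n",
    ], $html);

    return rtrim($latex) . "\n";
}

$html = "<h2>H2 Title</h2>\n<p>Text text text</p>\n"
      . '<img src="/image.png" alt="Image ALT tag" />';
$result = simpleHtmlToLatex($html);
echo $result;
```

Anything outside those three patterns (nested tags, extra attributes, a different attribute order) passes through unconverted, which is where this approach starts to break down.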


1 Comment

Michael Koval Over a year ago

Even if it is possible to parse the subset of HTML necessary for Hazar's task using regular expressions, it is still not advisable. This would quickly become unwieldy when dealing with attributes and would not give the tree-like data structure necessary to construct the LaTeX document.
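
To illustrate the tree-based alternative, here is a sketch that maps DOM nodes to LaTeX recursively. It relies on PHP's bundled `DOMDocument`; the helper names (`latexEscape`, `nodeToLatex`, `htmlToLatex`), the tag mapping, and the `<em>`/`<strong>` handling are assumptions made for illustration, not part of any library. It also assumes ASCII input, since `loadHTML()` needs an encoding hint for UTF-8 text.

```php
<?php
// Hypothetical helper: escape characters that LaTeX treats specially.
function latexEscape(string $text): string
{
    return strtr($text, [
        '\\' => '\\textbackslash{}',
        '&' => '\\&', '%' => '\\%', '$' => '\\$', '#' => '\\#',
        '_' => '\\_', '{' => '\\{', '}' => '\\}',
        '~' => '\\textasciitilde{}', '^' => '\\textasciicircum{}',
    ]);
}

// Convert one DOM node and its whole subtree to LaTeX.
function nodeToLatex(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return latexEscape($node->nodeValue);
    }
    if (!$node instanceof DOMElement) {
        return ''; // comments and other node types are dropped
    }
    // Convert children first so nesting works at any depth.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= nodeToLatex($child);
    }
    switch ($node->tagName) {
        case 'h2':     return '\\section{' . $inner . "}\n\n";
        case 'p':      return trim($inner) . "\n\n";
        case 'em':     return '\\emph{' . $inner . '}';
        case 'strong': return '\\textbf{' . $inner . '}';
        case 'img':
            return "\\begin{figure}[H]\n\\centering\n"
                . '\\includegraphics[scale=0.7]{./img/'
                . basename($node->getAttribute('src')) . "}\n"
                . '\\caption{' . latexEscape($node->getAttribute('alt'))
                . "}\n\\end{figure}\n\n";
        default:       return $inner; // unknown tags: keep only their content
    }
}

function htmlToLatex(string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
    $body = $doc->getElementsByTagName('body')->item(0);
    $latex = $body === null ? '' : nodeToLatex($body);
    // Collapse extra blank lines left by whitespace between block elements.
    return trim(preg_replace("/\n{3,}/", "\n\n", $latex)) . "\n";
}

$html = '<h2>H2 Title</h2><p>Text <em>text</em> text &amp; 100%</p>'
      . '<img src="/image.png" alt="Image ALT tag" />';
$out = htmlToLatex($html);
echo $out;
```

Because attributes are read with `getAttribute()`, their order and any extra attributes no longer matter, and supporting a new tag is one more `case` in the switch.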
