0

I have some ordered/unordered lists in HTML. As I want to export them to a txt file, I need to convert them to plain text.

E.g., original HTML:

<ol><li>Item 1</li></li>Item 2</li><li>Item 3</li></ol>

I want to change it to

1. Item 1
2. Item 2
3. Item 3

I searched on StackOverflow but only found a solution for the opposite conversion: a regex that converts text lists to HTML in PHP.

Is there any way I can handle this? Thanks!

3
  • By the way, there is an error in the original HTML, the second li tag starts with </li>, instead of <li>. Commented Oct 3, 2019 at 9:45
  • You want HTML rendered output to be saved in a text file? Commented Oct 3, 2019 at 9:48
  • Do you want to know how to convert HTML to text using PHP? Or will any way be acceptable? Perhaps XSLT could be appropriate? Commented Oct 3, 2019 at 9:51

4 Answers

0

You can simply strip the tags you do not need and then explode the string on a tag that occurs once per row (here, the opening <li>).

<?php
$html = '
<ol>
    <li>Item 1</li>
    <li>Item 2</li>
    <li>Item 3</li>
</ol>
';

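// Strip the tags we do not need, then split on the opening <li> tag.
$html = str_replace(['<ol>', '</ol>', '</li>'], '', $html);
$items = explode('<li>', $html);

// Trim whitespace and drop the empty chunk that precedes the first <li>.
$items = array_values(array_filter(array_map('trim', $items), 'strlen'));

print_r($items);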
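
To get from the exploded pieces to the numbered lines the question asks for, each piece can be trimmed and prefixed with its position. This is only a minimal sketch: it assumes a flat list whose tags carry no attributes, and the function name `listToNumberedText` is made up for illustration.

```php
<?php
// Hypothetical helper (not part of the answer above): turns a simple <ol>
// into numbered plain text. Assumes no tag attributes and no nested lists.
function listToNumberedText(string $html): string
{
    // Remove the wrapper and closing tags, then split on each opening <li>.
    $stripped = str_replace(['<ol>', '</ol>', '</li>'], '', $html);
    $parts = explode('<li>', $stripped);

    $lines = [];
    foreach ($parts as $part) {
        $text = trim($part);
        if ($text === '') {
            continue; // skip the empty chunk before the first <li>
        }
        // The next item number is one more than the lines collected so far.
        $lines[] = (count($lines) + 1) . '. ' . $text;
    }

    return implode("\n", $lines);
}

echo listToNumberedText("<ol>\n    <li>Item 1</li>\n    <li>Item 2</li>\n    <li>Item 3</li>\n</ol>"), "\n";
```

The result can then be written out with something like file_put_contents() if a txt file is the goal.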

0

I think it's a little more complicated than a single regex, especially if you want to add the numbers in front. But this little piece of code will translate <ol><li>Item 1</li><li>Item 2</li><li>Item 3</li></ol> to

* Item 1
* Item 2
* Item 3
<?php

$string = "<ol><li>Item 1</li><li>Item 2</li><li>Item 3</li></ol>";

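// Turn each opening <li> into a bullet and each closing </li> into a line break.
$string = preg_replace("/<li>/", " * ", $string);
$string = preg_replace("/<\/li>/", "\n", $string);
// Drop the surrounding <ol> and </ol> tags.
$string = preg_replace("/<\/?ol>/", "", $string);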

echo $string;
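
If numbers are wanted instead of bullets, one option is `preg_replace_callback`, which runs a function for every match and so leaves room for a counter. The following is a sketch under the same assumptions as the code above (a flat `<ol>` with attribute-free tags); `$counter` and `$text` are names chosen for this example.

```php
<?php
// Illustrative sketch: number the <li> items during the regex pass itself.
$string = "<ol><li>Item 1</li><li>Item 2</li><li>Item 3</li></ol>";

$counter = 0;
$text = preg_replace_callback(
    '/<li>(.*?)<\/li>/s',
    function (array $m) use (&$counter): string {
        // $m[1] is the text captured between <li> and </li>.
        $counter++;
        return $counter . '. ' . trim($m[1]) . "\n";
    },
    $string
);

// Drop the surrounding <ol> and </ol> tags.
$text = preg_replace('/<\/?ol>/', '', $text);

echo $text;
```

The counter is captured by reference so that it keeps its value between matches.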

4 Comments

* does not represent an ordered list though. This could work for unordered lists
As I said, you're going to have to loop through the list to add numbers; as far as I know, you can't loop through and add numbers in a regex.
Remember - the user has placed the name "Item 1" in as an example that it is item 1, it does not mean the item would have a number in it.
I know, but if he wants an ordered list, he could make a for-loop that adds a number to each line or something like that.
0

Please have a look at the html2text library. It has different methods to convert your HTML string into plain text.


0

I think we need to remember here that the text inside the LI tags can't be relied on for the numbering, since the items might just as well be "donkey", "lamb", "monkey".

My solution matches anything inside the LI tags and then loops on the matches to create the item numbers.

preg_match_all will create an array with sub-arrays. The first contains the whole matches including the LI tags, and the second contains just whatever was found inside the non-greedy (.*?) group.

I have used \n as a line break, but if this were HTML output it would obviously be a BR tag.

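// Sample list; the item text is arbitrary and contains no numbers.
$str = "<ol><li>Monkey</li><li>Lamb</li><li>Elephant</li></ol>";

// $matches[0] holds the full matches, $matches[1] the text between the LI tags.
preg_match_all("/<li>(.*?)<\/li>/i", $str, $matches);

if (count($matches[1]) > 0) {
    foreach ($matches[1] as $k => $v) {
        // $k is zero-based, so add 1 to get the item number.
        echo ($k + 1) . ". $v\n";
    }
}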
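
A different route from the regex above is to let PHP's bundled DOM extension parse the markup, which also makes it easy to treat ordered and unordered lists differently. This is a rough sketch rather than a complete converter: the function name `listToText` is an assumption for this example, nested lists are not handled, and non-ASCII text may need an explicit encoding hint before loading.

```php
<?php
// Sketch only: parse the HTML with DOMDocument instead of regular expressions.
// listToText is a hypothetical name; nested lists are not supported.
function listToText(string $html): string
{
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $lines = [];

    // Visit every <ol> and <ul> in document order.
    foreach ($xpath->query('//ol | //ul') as $list) {
        $n = 0;
        foreach ($list->childNodes as $child) {
            if ($child->nodeName !== 'li') {
                continue; // ignore whitespace and stray text between items
            }
            $n++;
            // Ordered lists get "1. ", unordered lists get "* ".
            $prefix = $list->nodeName === 'ol' ? $n . '. ' : '* ';
            $lines[] = $prefix . trim($child->textContent);
        }
    }

    return implode("\n", $lines);
}

echo listToText('<ol><li>Monkey</li><li>Lamb</li></ol><ul><li>Elephant</li></ul>'), "\n";
```

Numbering restarts for each list because the counter is reset per `<ol>` or `<ul>` element.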
