2

Let's say I have this input:

I can haz a listz0rs!
# 42
# 126
I can haz another list plox?
# Hello, world!
# Welcome!

I want to split it so that each set of hash-started lines becomes a list:

I can haz a listz0rs!
<ul>
    <li>42</li>
    <li>126</li>
</ul>
I can haz another list plox?
<ul>
    <li>Hello, world!</li>
    <li>Welcome!</li>
</ul>

If I run the input against the regex "/(?:(?:(?<=^# )(.*)$)+)/m", I get the following result:

Array
(
    [0] => Array
    (
        [0] => 42
    )
    [1] => Array
    (
        [0] => 126
    )
    [2] => Array
    (
        [0] => Hello, world!
    )
    [3] => Array
    (
        [0] => Welcome!
    )
)

This is fine and dandy, but it doesn't distinguish between the two different lists. I need a way to either make the quantifier return a concatenated string of all the occurrences, or, ideally, an array of all the occurrences.

Ideally, this should be my output:

Array
(
    [0] => Array
    (
        [0] => 42
        [1] => 126
    )
    [1] => Array
    (
        [0] => Hello, world!
        [1] => Welcome!
    )
)

Is there any way of achieving this, and if not, is there a close alternative?

6 Answers

1

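If you want to do this with regular expressions, you'll need two. Use the regex ^(?:#.*(?:\r?\n|\z))+ to match each list and add tags around it; the \z alternative lets the last line of the input match even when it has no trailing line break. Within each list (as matched by the first regex), search-and-replace ^#[ \t]*(.*?)\r?$ with <li>$1</li> to add tags around each list item while dropping the # marker. Both regexes require ^ and $ to match at line breaks (/m flag in PHP).

In PHP you can use preg_replace_callback and preg_replace to achieve this in just a few lines of code.

// Wrap every run of consecutive "#" lines in <ul>...</ul>.
$result = preg_replace_callback('/^(?:#.*(?:\r?\n|\z))+/m', 'replacelist', $subject);

function replacelist($groups) {
  // Turn each "# item" line into an indented <li>, without the "#" marker.
  $items = preg_replace('/^#[ \t]*(.*?)\r?$/m', '    <li>$1</li>', rtrim($groups[0]));
  return "<ul>\n" . $items . "\n</ul>\n";
}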

0

I'd say don't try to do it all in a single regex - instead, first use a regex to match sets of consecutive lines that all begin with # signs and wrap those lines with a <ul></ul> pair. Then use a second regex (or not even a regex at all - you could just split on line breaks) to match each individual line and convert it to <li></li> format.
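The two-pass idea can also produce the nested array the question asks for, rather than HTML. The following is only a minimal sketch: the function name extractLists is hypothetical, and it assumes list lines start with # in the first column.

```php
<?php
// Hypothetical helper (not from the original answer): a sketch of the
// two-pass approach that returns one inner array per run of "#" lines.
function extractLists(string $input): array {
    $lists = array();
    // Pass 1: each match is one run of consecutive lines starting with "#".
    // (?:\r?\n|\z) lets the final line match without a trailing newline.
    preg_match_all('/^(?:#.*(?:\r?\n|\z))+/m', $input, $blocks);
    foreach ($blocks[0] as $block) {
        // Pass 2: capture the text after "#" on every line of this run.
        preg_match_all('/^#[ \t]*(.*?)\r?$/m', $block, $items);
        $lists[] = $items[1];
    }
    return $lists;
}

$input = "I can haz a listz0rs!\n# 42\n# 126\n"
       . "I can haz another list plox?\n# Hello, world!\n# Welcome!";
print_r(extractLists($input));
```

Keeping the outer match and the inner match separate is what preserves the grouping: a single quantified capture group only remembers its last iteration.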

2 Comments

I thought of doing that as well, but the problem is that you can't quantify the lines, so when you wrap them in ul tags, you'll be wrapping each line instead of the entire set.
With a multiline regex, you could match multiple lines at once. You'd just need to match the newline characters between them too.

0

If it was me I would:

  1. explode("\n", $input) into an array where 1 key = line
  2. foreach through that array
  3. whenever you get a line that doesn't start with a #, that's when you add your closing/opening ul tags

Add a little more to deal with unexpected input (like two non hash lines in a row) and you're good.
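The steps above could be sketched roughly as follows. The function name hashLinesToHtml and the $inList flag are assumptions made for illustration; tracking whether a list is currently open is one way to cope with two non-hash lines in a row.

```php
<?php
// Hypothetical helper (not from the original answer): a sketch of the
// explode/foreach approach, emitting <ul> tags around runs of "#" lines.
function hashLinesToHtml(string $input): string {
    $out = array();
    $inList = false; // true while a <ul> is open
    foreach (explode("\n", $input) as $line) {
        $line = rtrim($line, "\r"); // tolerate Windows line endings
        $isItem = strncmp($line, '#', 1) === 0;
        if ($isItem && !$inList) {
            $out[] = '<ul>';
            $inList = true;
        } elseif (!$isItem && $inList) {
            $out[] = '</ul>';
            $inList = false;
        }
        $out[] = $isItem ? '    <li>' . trim(substr($line, 1)) . '</li>' : $line;
    }
    if ($inList) {
        $out[] = '</ul>'; // close a list that runs to the end of the input
    }
    return implode("\n", $out);
}

echo hashLinesToHtml("Intro\n# 42\n# 126\nOutro"), "\n";
```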

0

Looks like Syntax Error has already explained what I'm doing. But here goes the link to a working example.

0

With structured content like this, I would not do this as a regex. How about another approach?

$your_text = <<<END
I can haz a listz0rs!
# 42
# 126
I can haz another list plox?
# Hello, world!
# Welcome!
END;

function printUnorderedList($temp) {
    if (count($temp)>0) {
        print "<ul>\n\t<li>" .implode("</li>\n\t<li>", $temp) . "</li>\n</ul>\n";
    }
}

$lines = explode("\n", $your_text);
$temp = array();
foreach($lines as $line) {
    if (substr($line, 0, 1) == '#') {
        $temp[] = trim(substr($line,1));
    } else {
        printUnorderedList($temp);
        $temp = array();
        echo $line . "\n";
    }
}
printUnorderedList($temp);

0

You could avoid regex altogether and take a simpler approach: read the input line by line (as an array of lines), and every time you encounter a line that doesn't start with a hash, close the current list so the next hash line starts a new one. Like so:

// You can get this by using file('filename') or 
// just doing an explode("\n", $input)
$lines = array(
    'I can haz a listz0rs!',
    '# 42',
    '# 126',
    'I can haz another list plox?',
    '# Hello, world!',
    '# Welcome!'
);

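$hashline = false; // true while we are inside a run of hash lines
$lists = array();
$curlist = array();
foreach ($lines as $line) {
    if ($line !== '' && $line[0] == '#') {
        $curlist[] = $line;
        $hashline = true;
    } elseif ($hashline) {
        // A non-hash line closes the list collected so far.
        $lists[] = $curlist;
        $curlist = array();
        $hashline = false;
    }
}
// The input may end on a hash line, so store the last list as well.
if ($hashline) {
    $lists[] = $curlist;
}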

A little clean-up may be in order, but hopefully it helps.

(After reading the new answers: this is basically an in-depth explanation of Syntax Error's answer.)

EDIT: You may want it to strip off the # at the beginning of each line too.
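One possible way to do that stripping, as a small sketch: the helper name stripHashMarkers is hypothetical, and it assumes $lists is an array of arrays of lines that each begin with #.

```php
<?php
// Hypothetical helper (not from the original answer): remove the leading "#"
// and surrounding whitespace from every item of every collected list.
function stripHashMarkers(array $lists): array {
    return array_map(function (array $list) {
        return array_map(function ($line) {
            return trim(substr($line, 1));
        }, $list);
    }, $lists);
}

$lists = array(
    array('# 42', '# 126'),
    array('# Hello, world!', '# Welcome!'),
);
print_r(stripHashMarkers($lists));
```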
