
This is the string I'm trying to parse:

<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, <a href="http://store.steampowered.com/genre/Adventure/?snr=1_5_9__408">Adventure</a>, <a href="http://store.steampowered.com/genre/Casual/?snr=1_5_9__408">Casual</a>, <a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a>, <a href="http://store.steampowered.com/genre/Indie/?snr=1_5_9__408">Indie</a>, <a href="http://store.steampowered.com/genre/RPG/?snr=1_5_9__408">RPG</a><br>

What I'm trying to achieve (without all the other tags etc):

Action Adventure Casual Early Access Indie RPG

Here's what I've tried

        function getTagInfo($content,$start,$end){
            $r = explode($start, $content);
            if (isset($r[1])){
                $r = explode($end, $r[1]);
                return $r[0];
            }
            return '0';
        }


 getTagInfo($html, '/?snr=1_5_9__408">', '</a>');

That only gives me one genre. I can't think of an algorithm that parses the rest as well, so how can I extract the other genres?


5 Answers

1

You can use a regular expression here:

<a.*?>(.*?)</a>

This regular expression captures the contents of every <a></a> element.

Try this php code:

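// preg_match_all finds every link; preg_match would stop after the first one.
preg_match_all('/<a.*?>(.*?)<\/a>/', $htmlString, $matches);

// $matches[1] holds the text captured by the first group, one entry per link.
foreach ($matches[1] as $match) {
    echo $match . " <br /> ";
}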

This will output:

Action 
Adventure 
Casual 
Early Access 
Indie 
RPG


1 Comment

Updated the code: the `/` symbol has to be escaped with `\`.
1

You can use this code from another Stack Overflow thread:

PHP/regex: How to get the string value of HTML tag?

 <?php
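// Returns the text inside the first <$tagname>...</$tagname> pair, or null if there is none.
function getTextBetweenTags($string, $tagname) {
    // Lazy matching stops at the nearest closing tag instead of the last one.
    $pattern = "/<$tagname\b[^>]*>(.*?)<\/$tagname>/s";
    if (preg_match($pattern, $string, $matches)) {
        return $matches[1];
    }
    return null;
}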

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>


1

You can use preg_match_all:

$regex = '/<a.*?>(.*?)<\/a>/is';
preg_match_all($regex, $html, $matches);

$matches[1] will then be an array of the contents between the anchor tags and you could iterate over it like this:

foreach ($matches[1] as $match)
{
  echo $match .'<br>';
}

It would probably be better to use an actual HTML parser, as HTML is not a regular language and regular expressions break easily on markup they did not anticipate.
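As a rough sketch of the parser-based approach, assuming PHP's bundled DOM extension is enabled, something like the following could collect the link texts. The function name `getLinkTexts` is made up for illustration, and the sample string is a shortened version of the one in the question.

```php
<?php
// Hypothetical helper: returns the text of every <a> element in an HTML fragment.
function getLinkTexts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // fragments like this one rarely validate cleanly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // textContent is the element's text with any nested tags stripped.
        $texts[] = trim($anchor->textContent);
    }
    return $texts;
}

$html = '<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, '
      . '<a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a><br>';

print_r(getLinkTexts($html));
```

Because the parser works on the element tree rather than on the raw text, it should keep working if the attributes or whitespace inside the tags change.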


1

You may try something like this (DEMO):

function getTagInfo($html)
{
    if( preg_match_all('/<a href=\"(.*?)\">/i', $html, $matches)) {
        $result = array();
        foreach($matches[1] as $href) {
            $array = explode('/', $href);
            $arr = $array[count($array) - 2];
            $result[] = urldecode($arr);
        }
        return $result;
    }
    return false;
}

// Get an array
print_r(getTagInfo($html));

Output:

Array ( 
    [0] => Action 
    [1] => Adventure 
    [2] => Casual 
    [3] => Early Access 
    [4] => Indie 
    [5] => RPG 
)


0

I would probably do this with REGEX also, but since there are already 4 posts with REGEX answers, I'll throw another idea out there. This may be overly simple, but you can use strip_tags to remove any HTML tags.

$string = '<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, <a href="http://store.steampowered.com/genre/Adventure/?snr=1_5_9__408">Adventure</a>, <a href="http://store.steampowered.com/genre/Casual/?snr=1_5_9__408">Casual</a>, <a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a>, <a href="http://store.steampowered.com/genre/Indie/?snr=1_5_9__408">Indie</a>, <a href="http://store.steampowered.com/genre/RPG/?snr=1_5_9__408">RPG</a><br>';

print strip_tags($string);

This will return the following:

Genre: Action, Adventure, Casual, Early Access, Indie, RPG

Anyway, it's probably not how I'd go about doing it, but it's a one-liner that is really easy to implement.

I reckon you can also turn it into the array you're looking for by combining the preceding call with some REGEX like this:

$string_array = preg_split('/,\s*/', preg_replace('/Genre:\s+/i', '', strip_tags($string)));

print_r($string_array);

That will give you the following:

Array
(
    [0] => Action
    [1] => Adventure
    [2] => Casual
    [3] => Early Access
    [4] => Indie
    [5] => RPG
)

Ha, sorry ... ended up throwing REGEX into the answer anyway. But it's still a one-liner. :)
