Remove anchor tag and get inner text in an array form using regular expression

Question

I am using the regular expression below to remove the anchor tags from this HTML and keep only their inner text:

    <ul class="alpha">
        <li><h3><a href="http://www.overstock.com/Electronics/Computers-Tablets/473/dept.html?TID=TN:ELEC:Comp">Computers &amp; Tablets</a></h3></li>
        <li><a href="http://www.overstock.com/Electronics/2-in-1s/28195/subcat.html?TID=TN:ELEC:2in1">2-in-1s</a></li>
        <li><a href="http://www.overstock.com/Electronics/Laptops/133/subcat.html?TID=TN:ELEC:Lap">Laptops</a></li>
    </ul>

Expression is:

echo preg_replace('#<a.*?>([^>]*)</a>#i', '$1', $str);

Output is:

Computers & Tablets
2-in-1s
Laptops

Can we get the inner text of the anchor tags as an array using a regular expression? Please share your ideas.

As a general rule: use DOM parsing, not regexes, to parse HTML.
– Alexander, Commented Aug 17, 2015 at 9:51
Don't use regular expressions to process HTML. Please refer to this question to see how it can be done through DOM manipulation.
– npinti, Commented Aug 17, 2015 at 9:51

Narendrasingh Sisodia · Accepted Answer · 2015-08-17 11:02:13Z

2

I wouldn't recommend processing HTML with a regex; use DOMDocument instead. But if you do want to use a regex, you can use preg_match_all like this:

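$html = '<ul class="alpha">
    <li><h3><a href="http://www.overstock.com/Electronics/Computers-Tablets/473/dept.html?TID=TN:ELEC:Comp">Computers &amp; Tablets</a></h3></li>
    <li><a href="http://www.overstock.com/Electronics/2-in-1s/28195/subcat.html?TID=TN:ELEC:2in1">2-in-1s</a></li>
    <li><a href="http://www.overstock.com/Electronics/Laptops/133/subcat.html?TID=TN:ELEC:Lap">Laptops</a></li>
</ul>';

// Group 1 captures the text between each opening <a ...> tag and its closing </a>.
preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $html, $res);

// $res[0] holds the full matches (tags included); $res[1] holds only the inner text.
// Decode entities so that "&amp;" is printed as "&".
print_r(array_map('html_entity_decode', $res[1]));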

Output :

Array
(
    [0] => Computers & Tablets
    [1] => 2-in-1s
    [2] => Laptops
)
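
For comparison, here is a minimal sketch of the DOMDocument route recommended above. The function name `anchorTexts` and the variable `$html` are illustrative and not part of the original post, and the sketch assumes plain ASCII input (non-ASCII markup may need an explicit encoding declaration before `loadHTML` reads it correctly).

```php
<?php
// Sketch: collect the text of every <a> element with DOMDocument instead of a regex.
// anchorTexts() and $html are hypothetical names chosen for this example.

function anchorTexts(string $html): array
{
    $doc = new DOMDocument();

    // Buffer libxml parse warnings internally rather than emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // textContent is already entity-decoded and ignores nested tags such as <b>.
        $texts[] = trim($anchor->textContent);
    }

    return $texts;
}

$html = '<ul class="alpha">
    <li><h3><a href="http://www.overstock.com/Electronics/Computers-Tablets/473/dept.html?TID=TN:ELEC:Comp">Computers &amp; Tablets</a></h3></li>
    <li><a href="http://www.overstock.com/Electronics/2-in-1s/28195/subcat.html?TID=TN:ELEC:2in1">2-in-1s</a></li>
    <li><a href="http://www.overstock.com/Electronics/Laptops/133/subcat.html?TID=TN:ELEC:Lap">Laptops</a></li>
</ul>';

print_r(anchorTexts($html));
```

With the sample markup this should print the same three strings as the output above. Because the parser works on elements rather than on text patterns, anchors that contain nested tags or span several lines need no special handling.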

edited Aug 17, 2015 at 11:02

answered Aug 17, 2015 at 10:06


2 Comments

Wiktor Stribiżew Over a year ago

A regex-based solution that uses .*? to parse an HTML string is far from perfect. I am sure you will come back here sooner rather than later for a reliable, correct solution. Just a couple of examples: Bad 1, Bad 2. Once the input string is large enough, catastrophic backtracking is imminent.

Narendrasingh Sisodia Over a year ago

Yes, you're right @stribizhev. That's why I said it's not a good idea to use a regex on HTML; it's not the correct way to parse HTML.

Starfish · Accepted Answer · 2015-08-17 09:53:19Z

0

Since you used a jQuery tag I'd prefer to do this in jQuery:

var values = [];
$('.alpha').find('a').each(function(index){
    values.push($(this).text());
});

This code finds all the links inside the element with the class alpha and pushes the text of each one into the values array. The resulting values array is:

0: "Computers & Tablets"
1: "2-in-1s"
2: "Laptops"

answered Aug 17, 2015 at 9:53
