I am using a regular expression to remove the anchor tags from the following HTML and keep only their inner text:

    <ul class="alpha">
        <li><h3><a href="http://www.overstock.com/Electronics/Computers-Tablets/473/dept.html?TID=TN:ELEC:Comp">Computers &amp; Tablets</a></h3></li>
        <li><a href="http://www.overstock.com/Electronics/2-in-1s/28195/subcat.html?TID=TN:ELEC:2in1">2-in-1s</a></li>
        <li><a href="http://www.overstock.com/Electronics/Laptops/133/subcat.html?TID=TN:ELEC:Lap">Laptops</a></li>
    </ul>

The expression is:

echo preg_replace('#<a.*?>([^>]*)</a>#i', '$1', $str);

The output is:

Computers & Tablets
2-in-1s
Laptops

Is it possible to get the inner text of each anchor tag as an array using a regular expression? Please share your ideas.

  • Is it PHP then? You can easily do this with DOMDocument. Commented Aug 17, 2015 at 9:47
  • Instead of preg_replace, use preg_match_all. Commented Aug 17, 2015 at 9:50
  • As a general rule: use DOM parsing, not regexes, to parse HTML. Commented Aug 17, 2015 at 9:51
  • Don't use regular expressions to process HTML. Please refer to this question to see how it can be done through DOM manipulation. Commented Aug 17, 2015 at 9:51

2 Answers

I would not recommend parsing HTML with regex; use DOMDocument instead. But if you do want to use regex, you can use preg_match_all:

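$html = '<ul class="alpha">
    <li><h3><a href="http://www.overstock.com/Electronics/Computers-Tablets/473/dept.html?TID=TN:ELEC:Comp">Computers &amp; Tablets</a></h3></li>
    <li><a href="http://www.overstock.com/Electronics/2-in-1s/28195/subcat.html?TID=TN:ELEC:2in1">2-in-1s</a></li>
    <li><a href="http://www.overstock.com/Electronics/Laptops/133/subcat.html?TID=TN:ELEC:Lap">Laptops</a></li>
</ul>';
// Group 1 captures the text between each <a ...> tag and its closing </a>.
preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $html, $res);
// $res[0] holds the full matches (opening tag included); $res[1] holds only the inner text.
// Decode entities such as &amp; so the array contains plain text.
print_r(array_map('html_entity_decode', $res[1]));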

Output:

Array
(
    [0] => Computers & Tablets
    [1] => 2-in-1s
    [2] => Laptops
)
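
For comparison, here is a minimal sketch of the DOMDocument approach mentioned above, assuming the markup from the question is held in a string. The variable names $html, $dom and $texts are only illustrative, and this is one possible way to do it rather than a definitive implementation:

```php
<?php
// Sample markup from the question.
$html = '<ul class="alpha">
    <li><h3><a href="http://www.overstock.com/Electronics/Computers-Tablets/473/dept.html?TID=TN:ELEC:Comp">Computers &amp; Tablets</a></h3></li>
    <li><a href="http://www.overstock.com/Electronics/2-in-1s/28195/subcat.html?TID=TN:ELEC:2in1">2-in-1s</a></li>
    <li><a href="http://www.overstock.com/Electronics/Laptops/133/subcat.html?TID=TN:ELEC:Lap">Laptops</a></li>
</ul>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Gather the text content of every <a> element.
// The parser decodes entities, so &amp; becomes & here.
$texts = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $texts[] = trim($anchor->textContent);
}

print_r($texts);
```

This collects every link in the document. If only the links inside the alpha list are wanted, a DOMXPath query could be used to narrow the selection.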

2 Comments

A regex-based solution for parsing an HTML string that relies on .*? is far from perfect. I am sure you will come back here sooner rather than later for a reliable solution. Just a couple of examples: Bad 1, Bad 2. Once the input string is large enough, catastrophic backtracking is imminent.
Yes, you're right, @stribizhev. That's why I said it's not a good idea to use regex with HTML; it's not the correct way to parse HTML.

Since you tagged the question with jQuery, I'd do this in jQuery:

var values = [];
$('.alpha').find('a').each(function(index){
    values.push($(this).text());
});

This code finds all the links inside the element with the class alpha and pushes the text of each one onto the values array. The contents of values are:

0: "Computers & Tablets"
1: "2-in-1s"
2: "Laptops"
