PHP grabbing content between two strings

Question

// get CONTENT from united domains footer
$content = file_get_contents('http://www.uniteddomains.com/index/footer/');

// remove spaces from CONTENT
$content = preg_replace('/\s+/', '', $content);

// match all tld tags
$regex = '#target="_parent">.(.*?)</a></li><li>#';
preg_match($regex, $source, $matches);


print_r($matches);

I want to match all of the TLDs.

Each TLD is preceded by target="_parent">. and followed by </a></li><li>

I want to end up with an array like array('africa', 'amsterdam', 'bcn', ...).

What am I doing wrong here?

NOTE: The second step, removing all whitespace, is only there to simplify the pattern.

This is still HTML parsing, which should be done with an appropriate HTML parser and not regular expressions.
– Gumbo, Commented Jul 28, 2013 at 20:00
It is not HTML parsing, it is finding a particular pattern in a string that happens to be HTML.
– Daniel Gimenez, Commented Jul 28, 2013 at 20:06

Community · Accepted Answer · 2020-06-20 09:12:55Z

Here's a regular expression that will do it for that page.

\.\w+(?=</a></li>)

PHP

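// Fetch the footer markup
$content = file_get_contents('http://www.uniteddomains.com/index/footer/');
// Match a dot plus word characters, only when directly followed by </a></li>
preg_match_all('/\.\w+(?=<\/a><\/li>)/m', $content, $matches);
// $matches[0] holds the matched strings, each with its leading dot
print_r($matches);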

Here are the results:

.africa, .amsterdam, .bcn, .berlin, .boston, .brussels, .budapest, .gent, .hamburg, .koeln, .london, .madrid, .melbourne, .moscow, .miami, .nagoya, .nyc, .okinawa, .osaka, .paris, .quebec, .roma, .ryukyu, .stockholm, .sydney, .tokyo, .vegas, .wien, .yokohama, .africa, .arab, .bayern, .bzh, .cymru, .kiwi, .lat, .scot, .vlaanderen, .wales, .app, .blog, .chat, .cloud, .digital, .email, .mobile, .online, .site, .mls, .secure, .web, .wiki, .associates, .business, .car, .careers, .contractors, .clothing, .design, .equipment, .estate, .gallery, .graphics, .hotel, .immo, .investments, .law, .management, .media, .money, .solutions, .sucks, .taxi, .trade, .archi, .adult, .bio, .center, .city, .club, .cool, .date, .earth, .energy, .family, .free, .green, .live, .lol, .love, .med, .ngo, .news, .phone, .pictures, .radio, .reviews, .rip, .team, .technology, .today, .voting, .buy, .deal, .luxe, .sale, .shop, .shopping, .store, .eus, .gay, .eco, .hiv, .irish, .one, .pics, .porn, .sex, .singles, .vin, .vip, .bar, .pizza, .wine, .bike, .book, .holiday, .horse, .film, .music, .party, .email, .pets, .play, .rocks, .rugby, .ski, .sport, .surf, .tour, .video
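
The question asked for the names without the leading dot. A minimal sketch of one way to get that is to put the name in a capture group and read `$matches[1]` instead of `$matches[0]`. The sample HTML and the helper name `extractTlds` are made up for illustration and only assume markup shaped like the fragment described in the question.

```php
<?php
// Hypothetical helper: returns TLD names without the leading dot.
function extractTlds(string $html): array
{
    // \.            literal dot in front of the name
    // (\w+)         capture group 1: the name itself
    // (?=</a></li>) lookahead: the name must be followed by </a></li>
    // '#' is used as the delimiter so the slashes need no escaping.
    preg_match_all('#\.(\w+)(?=</a></li>)#', $html, $matches);

    // Group 1 of every match; an empty array when nothing matched.
    return $matches[1];
}

// Invented sample, not the real page content.
$sample = '<ul><li><a href="/africa" target="_parent">.africa</a></li>'
        . '<li><a href="/amsterdam" target="_parent">.amsterdam</a></li>'
        . '<li><a href="/bcn" target="_parent">.bcn</a></li></ul>';

print_r(extractTlds($sample));
```

With `preg_match_all`, index 0 of the result holds the full matches and index 1 holds the first capture group, so no post-processing is needed to strip the dot.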

Casimir et Hippolyte · Accepted Answer · 2013-07-28 20:22:45Z

Using the DOM is cleaner:

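$doc = new DOMDocument();
// @ suppresses the warnings libxml emits for malformed markup
@$doc->loadHTMLFile('http://www.uniteddomains.com/index/footer/');
$xpath = new DOMXPath($doc);
// Text of every target="_parent" link inside nested list items that have no class attribute
$items = $xpath->query('/html/body/div/ul/li/ul/li[not(@class)]/a[@target="_parent"]/text()');
$result = '';
foreach ($items as $item) {
    $result .= $item->nodeValue;
}
// Assuming each name starts with a dot, splitting on '.' yields the names;
// array_shift() drops the empty element before the first dot
$result = explode('.', $result);
array_shift($result);
print_r($result);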
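
If the query also returns header text such as category titles, one option is to keep only link texts that look like a TLD. The following is a sketch under assumptions, not a drop-in replacement: the inline HTML is an invented stand-in for the page, `tldsFromHtml` is a hypothetical helper, and the broader `//a[@target="_parent"]` query is swapped in for the path-specific one above.

```php
<?php
// Hypothetical helper: collects lowercase TLD names from link texts.
function tldsFromHtml(string $html): array
{
    // Collect libxml parse warnings internally instead of using @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tlds = [];
    foreach ($xpath->query('//a[@target="_parent"]') as $link) {
        $text = trim($link->textContent);
        // Keep only texts of the form ".name" with lowercase ASCII letters.
        if (preg_match('/^\.([a-z]+)$/', $text, $m)) {
            $tlds[] = $m[1];
        }
    }
    return $tlds;
}

// Invented sample: one header-like link and two TLD links.
$sample = '<html><body><ul>'
        . '<li><a href="#" target="_parent">Geographic &amp; Travel</a></li>'
        . '<li><a href="#" target="_parent">.berlin</a></li>'
        . '<li><a href="#" target="_parent">.tokyo</a></li>'
        . '</ul></body></html>';

print_r(tldsFromHtml($sample));
```

Filtering each node separately avoids concatenating everything and splitting on dots, so unrelated text never reaches the result. The `[a-z]+` class would need widening if names containing digits or hyphens should be kept.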

edited Jul 28, 2013 at 20:22

answered Jul 28, 2013 at 20:08


1 Comment

user1512405 Over a year ago

How would I make this match only lowercase text? Using that exact code, it also pulls "Geographic & Travel" and other header text.
