
I have this block of text:

$text = 'This just happened outside the store http://somedomain.com/2012/12/store there might be more text afterwards...';

It needs to be converted to:

$result['text_1'] = 'This just happened outside the store';
$result['text_2'] = 'there might be more text afterwards...';
$result['url'] = 'http://somedomain.com/2012/12/store';

This is my current code. It detects the URL, but I can only remove it from the text; I still need the URL value separately in an array:

$string = preg_replace('/https?:\/\/[^\s"<>]+/', '', $text);
//returns "This just happened outside the store  there might be more text afterwards..."

Any ideas? Thanks!

Temporary solution (can this be optimized?)

$text = 'This just happened outside the store http://somedomain.com/2012/12/store There might be more text afterwards...';
preg_match('/https?:\/\/[^\s"<>]+/',$text,$url);
$string = preg_split('/https?:\/\/[^\s"<>]+/', $text);
$text = preg_replace('/\s\s+/','. ',implode(' ',$string));
echo '<a href="'.$url[0].'">'.$text.'</a>';
  • preg_match() and preg_match_all() are for extracting. Do your replacement afterwards. (Or actually, use preg_replace_callback() to do it all at once.) Commented Nov 9, 2012 at 14:22
  • Thanks for the tip. I updated the original post with your suggestion, and it is working fine now. Do you think it could be optimized? Commented Nov 9, 2012 at 14:38
  • Between freejosh and me, the one-liner is below. Commented Nov 9, 2012 at 14:42
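Following the first comment's point that preg_match() is meant for extraction, here is a minimal sketch of one way to get the three values from the question in a single call. The helper name splitAroundUrl() and the `~` delimiters are assumptions made for illustration, not something from the original post, and the sketch only handles the first URL in the text.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original post).
// Returns the text before the first URL, the text after it, and the URL itself,
// or null when the text contains no URL.
function splitAroundUrl(string $text): ?array
{
    // Group 1: text before the URL (lazy), group 2: the URL, group 3: the rest.
    // The \s* parts keep surrounding whitespace out of the captured text.
    $pattern = '~^(.*?)\s*(https?://[^\s"<>]+)\s*(.*)$~s';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }

    return [
        'text_1' => $m[1],
        'text_2' => $m[3],
        'url'    => $m[2],
    ];
}

$text = 'This just happened outside the store http://somedomain.com/2012/12/store there might be more text afterwards...';
$result = splitAroundUrl($text);
```

Using `~` as the delimiter avoids having to escape the slashes in `https?://`.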

2 Answers

Do you need to store it in a variable, or do you just need it inside the href? How about this?

<?php
$text = 'This just happened outside the store http://somedomain.com/2012/12/store There might be more text afterwards...';
$pattern = '@(.*?)(https?://.*?) (.*)@';
$ret = preg_replace( $pattern, '<a href="$2">$3</a>', $text );
var_dump( $ret );

$1, $2, and $3 correspond to the 1st, 2nd, and 3rd parenthesized groups.

The output would be:

<a href="http://somedomain.com/2012/12/store">There might be more text afterwards...</a>

2 Comments

This is actually a pretty nice solution. I'm just wondering what would happen if the block of text contained more than one link.
Then the second link would be included inside the <a> text too, so it should still work, but do try it out with different test cases. You can change the 3rd group (.*) into (.*?) to make it non-greedy (lazy).
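For the multiple-link case raised in the comments, one possible approach is to collect every URL with preg_match_all() and split the surrounding text with preg_split(), reusing the URL pattern from the question. This is only a sketch: the sample string and the example domains are made up for illustration.

```php
<?php
// Sample input with two links; the domains are placeholders, not from the post.
$text = 'First http://a.example/one middle https://b.example/two last';

// Same URL pattern as in the question, written with ~ delimiters.
$pattern = '~https?://[^\s"<>]+~';

// $matches[0] holds every full match, i.e. every URL found in the text.
preg_match_all($pattern, $text, $matches);
$urls = $matches[0];

// Splitting on the same pattern leaves only the text between the URLs;
// trim() removes the whitespace that surrounded each URL.
$pieces = array_map('trim', preg_split($pattern, $text));
```

Here $urls ends up with the two links in order, and $pieces with the three text fragments around them.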

You could split your string on the regex using preg_split to get an array:

$result = preg_split('/(https?:\/\/[^\s"<>]+)/', $the_string, -1, PREG_SPLIT_DELIM_CAPTURE);
// $result[0] = preamble
// $result[1] = url
// $result[2] = possible afters
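As a rough illustration of the call above, the sketch below runs it on the string from the question and maps the three pieces onto the keys the question asks for. The variable names $parts, $before, $url, and $after are assumptions for this example, and it presumes the text contains exactly one URL.

```php
<?php
$the_string = 'This just happened outside the store http://somedomain.com/2012/12/store there might be more text afterwards...';

// The parentheses plus PREG_SPLIT_DELIM_CAPTURE make preg_split() return
// the matched URL as its own element between the two text pieces.
$parts = preg_split('/(https?:\/\/[^\s"<>]+)/', $the_string, -1, PREG_SPLIT_DELIM_CAPTURE);

// Trim the spaces that sat next to the URL, then unpack the three pieces.
[$before, $url, $after] = array_map('trim', $parts);

$result = [
    'text_1' => $before,
    'text_2' => $after,
    'url'    => $url,
];
```

Without both the capture group and the flag, the URL is dropped and only the two text pieces come back.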

4 Comments

it is not returning the url, only the text blocks. Any ideas?
You also need to set the PREG_SPLIT_DELIM_CAPTURE flag, and wrap the regex in () to return it in the results.
Doesn't work. Returns an array with two items, before and after the url
I updated the post with a few new tweaks, kind of works fine. Any ideas on how to improve it?
