I am trying to migrate old blog posts (based on WordPress) to a new platform. One of the steps is:
- Get the full_text of each post
- Search it for the full path/URL of old images (let's say https://stackoverflow.com/uploads/logo.png or just uploads/logo.png)
- Extract/save those images and get the guid() of the new images
- Replace the old path https://stackoverflow.com/uploads/logo.png with the new one (let's say https://quora.com/media/brand123.png)
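The search and replace steps above can be sketched in one pass with `preg_replace_callback()`, which finds each match and lets a callback return its replacement. This is a minimal sketch under assumptions, not the new platform's API: `saveImageAndGetNewUrl()` is a hypothetical placeholder for the "extract/save and get the guid()" step (here it only looks the URL up in a prepared map), and the pattern only handles absolute `http(s)://stackoverflow.com/uploads/...` URLs.

```php
<?php
declare(strict_types=1);

/**
 * HYPOTHETICAL stand-in for the "extract/save and get the guid()" step.
 * A real migration would fetch the old image, import it into the new
 * platform and build the new URL from the guid it returns; this sketch
 * only looks the old URL up in a prepared map.
 */
function saveImageAndGetNewUrl(string $oldUrl, array $newUrlByOldUrl): string
{
    // Unknown images are left unchanged rather than broken.
    return $newUrlByOldUrl[$oldUrl] ?? $oldUrl;
}

/**
 * Search + replace steps: find every old absolute image URL in $fullText
 * and swap it for the URL returned by the (hypothetical) helper above.
 */
function migrateImageUrls(string $fullText, array $newUrlByOldUrl): string
{
    // "https?" accepts both http and https; # delimiters avoid escaping "/".
    $pattern = '#https?://stackoverflow\.com/uploads/[\w\-./]+?\.(?:jpe?g|png|gif)\b#i';

    return preg_replace_callback(
        $pattern,
        // $m[0] is the whole matched URL.
        fn (array $m): string => saveImageAndGetNewUrl($m[0], $newUrlByOldUrl),
        $fullText
    );
}

// Usage with the example URLs from the steps above.
$map  = ['https://stackoverflow.com/uploads/logo.png' => 'https://quora.com/media/brand123.png'];
$post = "<p>Intro</p><img src='https://stackoverflow.com/uploads/logo.png'/>";
echo migrateImageUrls($post, $map), PHP_EOL;
// prints: <p>Intro</p><img src='https://quora.com/media/brand123.png'/>
```

Doing the lookup inside the callback means each occurrence is replaced exactly where it was found, so there is no need to keep the match list and the replacement list in sync by hand.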
I tried a regular expression to search for the old URLs:
```
/(http:\/\/stackoverflow\.com\/uploads\/)+(.*?)[a-zA-Z0-9]+(\.jpg|\.png|\.gif)/
```
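Note that this pattern requires a literal `http://`, so it cannot match the `https://` URLs in the sample text, nor the relative form `uploads/logo.png` from the search step. A sketch of a pattern that accepts both, assuming the images always sit under `uploads/` and use one of the listed extensions (`findOldImageUrls()` is an illustrative name, not an existing function):

```php
<?php
declare(strict_types=1);

/**
 * Illustrative helper: returns every old image reference in $html, whether
 * written as an absolute URL (http or https) or as a relative uploads/ path.
 */
function findOldImageUrls(string $html): array
{
    // (?:...)? makes the scheme + host optional; "s?" makes only the "s" optional.
    // The lazy [\w\-./]+? allows sub-folders and dots in file names.
    $pattern = '#(?:https?://stackoverflow\.com/)?uploads/[\w\-./]+?\.(?:jpe?g|png|gif)\b#i';
    preg_match_all($pattern, $html, $matches);

    return $matches[0]; // full matches only, no capture groups needed
}

$html = "<img src='https://stackoverflow.com/uploads/image1.png'/> <img src='uploads/logo.png'/>";
print_r(findOldImageUrls($html));
```

Because PCRE reports the leftmost match, an absolute URL is returned whole rather than as just its `uploads/...` tail.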
Then I tried:

```php
$old = array();
$pattern = "/(https:|http:\/\/stackoverflow\.com\/uploads\/)+(.*?)[a-zA-Z0-9]+(\.jpg|\.png|\.gif)/";
$text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor <img src='https://stackoverflow.com/uploads/image1.png'/> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor <img src='https://stackoverflow.com/uploads/image2.png'/>";
// search for and collect the old URLs
preg_match_all($pattern, $text, $old);
```
But it gives me something like this:
```
array(4) {
  [0]=>
  array(2) {
    [0]=>
    string(44) "https://stackoverflow.com/uploads/image1.png"
    [1]=>
    string(44) "https://stackoverflow.com/uploads/image2.png"
  }
  [1]=>
  array(2) {
    [0]=>
    string(6) "https:"
    [1]=>
    string(6) "https:"
  }
  [2]=>
  array(2) {
    [0]=>
    string(28) "//stackoverflow.com/uploads/"
    [1]=>
    string(28) "//stackoverflow.com/uploads/"
  }
  [3]=>
  array(2) {
    [0]=>
    string(4) ".png"
    [1]=>
    string(4) ".png"
  }
}
```
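The extra entries come from the first group of `$pattern`: the `|` splits `(https:|http:\/\/stackoverflow\.com\/uploads\/)` into two alternatives, `https:` on its own or the whole `http://stackoverflow.com/uploads/` prefix, instead of making the `s` optional. For these URLs the first alternative matches `https:`, and the lazy `(.*?)` then picks up `//stackoverflow.com/uploads/`, which is exactly what `[1]` and `[2]` contain. A sketch of a pattern that avoids this, assuming only the full URL and the file name are needed (the group name `file` is an arbitrary choice):

```php
<?php
$text = "Lorem ipsum <img src='https://stackoverflow.com/uploads/image1.png'/> "
      . "Lorem ipsum <img src='https://stackoverflow.com/uploads/image2.png'/>";

// "https?" makes only the "s" optional; (?:...) is non-capturing, so the only
// capture left is the named group "file" holding the file name.
$pattern = '#https?://stackoverflow\.com/uploads/(?<file>[\w\-./]+?\.(?:jpe?g|png|gif))\b#i';

preg_match_all($pattern, $text, $old);

// $old[0]      => full old URLs (the values to replace later)
// $old['file'] => just the file names, e.g. for naming the imported copies
foreach ($old[0] as $i => $url) {
    echo $url, ' -> ', $old['file'][$i], PHP_EOL;
}
```

With the default `PREG_PATTERN_ORDER`, a named group is reported both under its name and under its number, so `$old[1]` holds the same file names as `$old['file']`.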