5

I have a wordpress database which has some embeded iframes from sound cloud. I want the the iframes to be replaced with some sort of shortcode. I have even created a shortcode and it works pretty good.

The problem is that i have an old database with approx 2000 posts which has already embeded codes. What i want to do is to write a code so that it would replace the iframe with the shortcode.

Here is the code which i am using to find the url from the content but it always returns blank.

$string = 'Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson\'s retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad\'s record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you\'ve ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>\'s Jeffree\'s sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000&amp;auto_play=false&amp;show_artwork=true" frameborder="no" scrolling="no" width="100%" height="350"></iframe>';

preg_match("/url=(.*?)/", $string, $matches);

print_r($matches);

The above code doesn't work and i am not so familiar with regex so if any one can figure out what is wrong here then it would be great. And also if anyone can guide me the right process to do this then that would be great.

1
  • 1
    Don't use regular expressions to parse HTML. Use a proper HTML parsing module. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See htmlparsing.com/php for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. Commented Aug 13, 2013 at 15:28

5 Answers 5

4

Since you're working with HTML here, I would recommend using DOM functions:

$doc = new DOMDocument;
$doc->loadHTML($string);

foreach ($doc->getElementsByTagName('iframe') as $iframe) {
    $url = $iframe->getAttribute('src');
    // parse the query string
    parse_str(parse_url($url, PHP_URL_QUERY), $args);
    // save the modified attribute
    $iframe->setAttribute('src', $args['url']);
}

echo $doc->saveHTML();

This outputs the full document, so you would need to trim it down:

$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $node) {
    echo $doc->saveHTML($node);
}

Output:

<p>Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson's retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad's record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you've ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>'s Jeffree's sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="http://api.soundcloud.com/playlists/8087281" frameborder="no" scrolling="no" width="100%" height="350"></iframe></p>
Sign up to request clarification or add additional context in comments.

Comments

2
+50

For a one-time fix, you might consider an SQL solution. Some assumptions with the following SQL:

  • There is only ONE iframe per post to be replaced (the SQL can be run multiple times if there are posts with more than one iframe).
  • The iframes to be replaced ALL are in the form:

<iframe src="https://w.soundcloud.com/player/?url="..." other-stuff</iframe>

  • All you care about is what's between the quotes for the url parameter
  • The end result is [soundcloud url="..."]

If all of this is true, then the following SQL should do the trick. It can be tweaked if you want a different shortcode, etc.

Be sure to backup your wp_posts table before performing ANY mass update.

CREATE TABLE wp_posts_backup SELECT * FROM wp_posts
;

Once the backup is complete, the following SQL should fix all of your posts in a single shot:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

I would suggest you TEST a few posts before running this against all of them. An easy way to test would be to add the following to the WHERE clause above (immediately before ';') changing '?' to the Post ID(s) to be tested.

AND p.ID IN (?,?,?)

If for any reason you need to restore your posts, you can do something like:

UPDATE wp_posts p
  JOIN wp_posts_backup b
    ON b.ID = p.ID
   SET p.post_content = b.post_content
;

One other thing to consider. I wasn't sure if you wanted to pass on the parameters that are currently a part of the url, so I included them. You can easily remove them by changing:

                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'

to:

                           ,'"]'

resulting in:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

Updated to allow for no parameters in the url

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX(
                                     SUBSTRING_INDEX( SUBSTR( p.post_content
                                                            , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                            )
                                                    , '&amp;', 1
                                                    )
                                                , '"', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

Good luck.

8 Comments

This works great man. can you just let me know if we can be more specific. Currently after running the script i am getting [soundcloud url="http://api.soundcloud.com/tracks/107374286?color=000000&auto_play=false&show_artwork=true"] can't we make it [soundcloud url="http://api.soundcloud.com/tracks/107374286"] ?
Sure can. As I stated at the end of the answer, "One other thing to consider. I wasn't sure if you wanted to pass on the parameters that are currently a part of the url, so I included them. You can easily remove them by changing: ...". So, replace ,'?',SUBSTRING_INDEX ... ,']' with ,'"]' and you get a URL without the parameters.
Yes that worked fine but there is another problem. When i did that for some entries it became like this [soundcloud url="http://api.soundcloud.com/playlists/3608420" frameborder="no" scrolling="no" width="100%" height="350"></iframe> Can i again do something to make this correct. ?
So, it appears that not all of your URLs contain parameters. Updated SQL to allow for URLs without parameters.
Hello, There are much more problems like for some of the iframe the src is not at the begining like <iframe width='100%' src='...'></iframe>
|
2

This should do for what you've specified

$new_string = preg_replace('/(?:<iframe[^\>]+src="[^\"]*url=([^\"]*soundcloud\.com[^\"]*))"[^\/]*\/[^\>]*>/i', '[soundcloud url="$1"]', $string);

It's limited to iframes with the url=…soundcloud… part in the src attribute and replaces the entire iframe code with [soundcloud url="{part after url=}"]

Comments

1
<?php
    preg_match("/url\=([^\"]+)/i", $string, $matches);

so basically you would like to match any characters (1+) after url= but not after the "

3 Comments

Hello thanks for the answer but can you guide me to make it more specific like w.soundcloud.com/player/?url=
And what's the best way to do the replacement of the iframe with some thing like [soundcloud url="$matches[0]" ]
preg_replace("/w\.soundcloud\.com\/\?url\=([^\"]+)/i", "[soundcloud url=\"$1\" ]", $string);
1

I'd suggest looking into simplehtmldom. It is a DOM parser that uses a selector similar to jQuery and CSS.

http://simplehtmldom.sourceforge.net/

$html = load($html_from_database);
// Find all frames
foreach($html->find('frame') as $element){
   $source = $element->src; // extract the source from the frame.
   // This is where you do your magic like changing links. 
   $element->href = $source ; // This is where you replace the old source
}


// UPDATE $html back into the table.

Make sure you make a complete backup of all tables before you UPDATE any tables after parsing :)

http://simplehtmldom.sourceforge.net/manual.htm

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.