Regex to match YouTube URLs

Question

I am trying to validate a YouTube URL using regex:

preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]+~', $videoLink)

It mostly works, but it also matches malformed URLs. For example, this matches as expected:

http://www.youtube.com/watch?v=Zu4WXiPRek

But so will this:

http://www.youtube.com/watch?v=Zu4WX£&P!ek

And this won't:

http://www.youtube.com/watch?v=!Zu4WX£&P4ek

I think it's because of the + operator. It seems to match only the first character after v=, when it needs to match everything after v= against [a-zA-Z0-9-]. Any help is appreciated, thanks.

What you have looks fine. Are £, & and ! valid characters in the YouTube string? If so, add them to your [a-zA-Z0-9-] character class; otherwise, isn't it working as intended?
– Tim Fountain, Commented Sep 17, 2010 at 17:46
The +, by the way, means: match any of the characters in [a-zA-Z0-9-] one or more times, so it will keep going until it hits something not in that class.
– Tim Fountain, Commented Sep 17, 2010 at 17:47
The only characters allowed in a YouTube video ID like this are a-z, A-Z, 0-9 and -, which is why I used [a-zA-Z0-9-]. It's not working as intended: I can submit URLs like v=Zu4WX£&P!ek (where £, & and ! are illegal characters) and they match fine, because it's only checking the first character after v=.
– Will, Commented Sep 17, 2010 at 17:49
It would help if you provided more context. Where are you getting the URL from? Is it from a full page scrape, with the URLs in an href="..."? If so, you could do something like [a-zA-Z0-9-]+("|'). Or do you already have the list of URLs parsed and are looping through them?
– CrayonViolent, Commented Sep 17, 2010 at 17:55
The URL is being submitted through a form by the user, and I need to check that it is a valid YouTube URL before I send off requests to the page.
– Will, Commented Sep 17, 2010 at 17:57

Pekka · Accepted Answer · 2010-09-17 17:53:06Z

3

Here is an alternative that is longer and much less elegant than a regex, but it uses PHP's native URL parsing functions, so it might be a bit more reliable in the long run:

 $url = "http://www.youtube.com/watch?v=Zu4WXiPRek";

 $query_string = parse_url($url, PHP_URL_QUERY); // v=Zu4WXiPRek

 $query_string_parsed = array();
 parse_str($query_string, $query_string_parsed); // an array with all GET params

 echo($query_string_parsed["v"]); // Will output Zu4WXiPRek, which you can then
                                  // validate against [a-zA-Z0-9-] using a regex
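
To round this off, the extracted value can be checked with an anchored pattern. This is only a minimal sketch: the helper name extract_video_id is hypothetical, and the character class is simply the one from the question, which is an assumption about what a valid ID looks like.

```php
<?php
// Hypothetical helper: returns the v parameter if it consists only of
// [a-zA-Z0-9-], otherwise null. The character class comes from the
// question and is an assumption about valid video IDs.
function extract_video_id(string $url): ?string
{
    // null when there is no query string, false for badly malformed URLs
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    $params = array();
    parse_str($query, $params); // decodes the query string into an array

    if (!isset($params['v']) || !is_string($params['v'])) {
        return null; // v is missing, or was sent as an array (v[]=x)
    }

    // \A and \z anchor the whole value, so a single bad character rejects it
    return preg_match('/\A[a-zA-Z0-9-]+\z/', $params['v']) === 1 ? $params['v'] : null;
}

var_dump(extract_video_id('http://www.youtube.com/watch?v=Zu4WXiPRek'));
var_dump(extract_video_id('http://www.youtube.com/watch?v=Zu4WX£&P!ek'));
```

Because the check runs on the parsed v value rather than on the raw URL, extra parameters before or after v should not affect the result.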

edited Sep 17, 2010 at 17:53

answered Sep 17, 2010 at 17:47


3 Comments

CrayonViolent Over a year ago

Just want to point out that this is only really useful (and IMO recommended) if you already have just the URL, but not really if he's scraping a page for URLs.

Will Over a year ago

That just seems like added code going back to the original problem. The problem is with validating the string after v=, which is what this code extracts. I don't need it extracted, I just need to make sure the rest of the URL after v= is matched by [a-zA-Z0-9-].

Pekka Over a year ago

@Will yeah. This is a more standards-conformant way that can deal with changing URL structures to some extent. For example, it doesn't break when a URL has the popular &fmt=18 parameter. Anyway, it's just an alternative suggestion; as far as I can see, @lonesomeday answers your specific question.

lonesomeday · Accepted Answer · 2010-09-17 21:06:13Z

0

The problem is that you are not requiring any particular number of characters in the v= part of the URL. So, for instance, checking

http://www.youtube.com/watch?v=Zu4WX£&P!ek

will match

http://www.youtube.com/watch?v=Zu4WX

and therefore return true. You need to either specify the number of characters you need in the v= part:

preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]{10}~', $videoLink)

or specify that the group [a-zA-Z0-9-] must be the last part of the string:

preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]+$~', $videoLink)

Your other example

http://www.youtube.com/watch?v=!Zu4WX£&P4ek

does not match, because + requires at least one character from [a-zA-Z0-9-] immediately after v=, and ! is not in that class.
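
The difference between the unanchored and the anchored pattern can be seen in a short sketch. Note one assumption: the patterns below include www. and escape the dots so that they line up with the example URLs, which the patterns quoted above do not.

```php
<?php
// Character class from the question. The www. prefix and the escaped dots
// are assumptions added so the patterns fit the example URLs as written.
$unanchored = '~http://www\.youtube\.com/watch\?v=[a-zA-Z0-9-]+~';
// ^ and $ force the whole string to fit; D stops $ matching before a final newline
$anchored   = '~^http://www\.youtube\.com/watch\?v=[a-zA-Z0-9-]+$~D';

$bad = 'http://www.youtube.com/watch?v=Zu4WX£&P!ek';

// Unanchored: succeeds, but only on the prefix up to the first bad character
preg_match($unanchored, $bad, $m);
echo $m[0], "\n"; // http://www.youtube.com/watch?v=Zu4WX

// Anchored: the whole string must fit the pattern, so this fails
var_dump(preg_match($anchored, $bad)); // int(0)

// A well-formed URL still passes
var_dump(preg_match($anchored, 'http://www.youtube.com/watch?v=Zu4WXiPRek')); // int(1)
```

Inspecting $m[0] is a quick way to see what preg_match() actually matched when a pattern seems too permissive.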

edited Sep 17, 2010 at 21:06

answered Sep 17, 2010 at 17:56


2 Comments

Will Over a year ago

I'm pretty sure the v= part varies, that's why I didn't use that before... and using [a-zA-Z0-9-]$ didn't work either. It's just returning false for everything.

Jim Over a year ago

That's because it should have been [a-zA-Z0-9-]+$; just a typo.

methai · Accepted Answer · 2010-09-17 18:32:19Z

Short answer:

preg_match('%(http://www.youtube.com/watch\?v=(?:[a-zA-Z0-9-])+)(?:[&"\'\s])%', $videoLink)

There are a few assumptions made here, so let me explain:

I added a capturing group ( ... ) around the entire http://www.youtube.com/watch?v=blah part of the link, so that we can say "I want to get the whole validated link up to and including the ?v=movieHash"
I added the non-capturing group (?: ... ) around your character set [a-zA-Z0-9-] and left the + sign outside of that. This will allow us to match all allowable characters up to a certain point.
Most importantly, you need to tell it how you expect your link to terminate. I'm taking a guess for you with (?:[&"\'\s])

 - Will it be in HTML format (e.g. an anchor tag)? If so, the link in href will obviously end with a " or '.
 - Or maybe there's more to the query string, so there would be an & after the value of v.
 - Maybe there's a space or line break (\s) after the end of the link.

The important piece is that you can get much more accurate results if you know what's surrounding what you are searching for, as is the case with many regular expressions.

This non-capturing group (in which I'm making assumptions for you) will take a stab at finding and ignoring all the extra junk after what you care about (the ?v=awesomeMovieHash).

Results:

http://www.youtube.com/watch?v=Zu4WXiPRek
 - Group 1 contains http://www.youtube.com/watch?v=Zu4WXiPRek

http://www.youtube.com/watch?v=Zu4WX&a=b
 - Group 1 contains http://www.youtube.com/watch?v=Zu4WX

http://www.youtube.com/watch?v=!Zu4WX£&P4ek
 - No match

<a href="http://www.youtube.com/watch?v=Zu4WX&size=large">
 - Group 1 contains http://www.youtube.com/watch?v=Zu4WX

http://www.youtube.com/watch?v=Zu4WX£&P!ek
 - No match
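
One caveat: the terminator group has to consume a character, so a bare URL string with nothing after the ID does not match the pattern as written. The first result above presumably assumes the link is followed by whitespace or a quote. A possible variant, assuming the link may also end the string, adds $ as an alternative and escapes the dots; treat it as a sketch rather than the answer's own pattern.

```php
<?php
// Pattern from the answer, unchanged
$original = '%(http://www.youtube.com/watch\?v=(?:[a-zA-Z0-9-])+)(?:[&"\'\s])%';
// Assumed variant: also accept end of string as a terminator, dots escaped
$variant  = '%(http://www\.youtube\.com/watch\?v=(?:[a-zA-Z0-9-])+)(?:[&"\'\s]|$)%';

$bare = 'http://www.youtube.com/watch?v=Zu4WXiPRek';

var_dump(preg_match($original, $bare));        // int(0): nothing follows the ID
var_dump(preg_match($original, $bare . "\n")); // int(1): the newline terminates it
var_dump(preg_match($variant, $bare, $m));     // int(1)
echo $m[1], "\n";                              // http://www.youtube.com/watch?v=Zu4WXiPRek

// Still rejected: £ is neither an ID character nor a terminator
var_dump(preg_match($variant, 'http://www.youtube.com/watch?v=Zu4WX£&P!ek')); // int(0)
```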

Mike Hicks · Accepted Answer · 2013-07-26 23:27:29Z

The "v=..." blob is not guaranteed to be the first parameter in the query part of the URL. I'd recommend using PHP's parse_url() function to break the URL into its component parts. You can also reassemble a pristine URL if someone began the string with "https://" or simply used "youtube.com" instead of "www.youtube.com", etc.

function get_youtube_vidid ($url) {
    $vidid = false;
    $valid_schemes = array ('http', 'https');
    $valid_hosts = array ('www.youtube.com', 'youtube.com');
    $valid_paths = array ('/watch');

    $bits = parse_url ($url);
    if (! is_array ($bits)) {
        return false;
    }
    if (! (array_key_exists ('scheme', $bits)
            and array_key_exists ('host', $bits)
            and array_key_exists ('path', $bits)
            and array_key_exists ('query', $bits))) {
        return false;
    }
    if (! in_array ($bits['scheme'], $valid_schemes)) {
        return false;
    }
    if (! in_array ($bits['host'], $valid_hosts)) {
        return false;
    }
    if (! in_array ($bits['path'], $valid_paths)) {
        return false;
    }
    $querypairs = explode ('&', $bits['query']);
    foreach ($querypairs as $querypair) {
        # Split on the first '=' only; pad so a bare key without '=' is safe
        list ($key, $value) = array_pad (explode ('=', $querypair, 2), 2, '');
        if ($key == 'v') {
            if (preg_match ('/^[a-zA-Z0-9\-_]+$/', $value)) {
                # Set the return value
                $vidid = $value;
            }
        }
    }

    return $vidid;
}
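
The "reassemble a pristine URL" step is not shown above, so here is one way it might look. This is a sketch under assumptions: the helper name canonical_watch_url is hypothetical, and the canonical form (https plus www.youtube.com) is my choice. It uses parse_str() instead of splitting the query by hand.

```php
<?php
// Hypothetical helper: returns a normalized watch URL, or null if the input
// is not a recognized YouTube watch URL. The canonical form chosen here
// (https + www.youtube.com) is an assumption.
function canonical_watch_url(string $url): ?string
{
    $bits = parse_url($url);
    if (!is_array($bits) || !isset($bits['scheme'], $bits['host'], $bits['path'], $bits['query'])) {
        return null;
    }
    if (!in_array(strtolower($bits['scheme']), array('http', 'https'), true)
        || !in_array(strtolower($bits['host']), array('www.youtube.com', 'youtube.com'), true)
        || $bits['path'] !== '/watch') {
        return null;
    }

    $params = array();
    parse_str($bits['query'], $params); // finds v wherever it sits in the query

    if (!isset($params['v']) || !is_string($params['v'])
        || preg_match('/\A[a-zA-Z0-9_-]+\z/', $params['v']) !== 1) {
        return null;
    }

    // Rebuild the URL from validated parts only
    return 'https://www.youtube.com/watch?v=' . $params['v'];
}

echo canonical_watch_url('http://youtube.com/watch?feature=share&v=Zu4WXiPRek'), "\n";
```

Rebuilding the URL from the validated ID means nothing else from the user's input is passed along to later requests.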

Mohammad Anini · Accepted Answer · 2014-06-06 04:23:40Z

0

The following regex is intended to match any YouTube link:

$pattern='@(((http(s)?://(www\.)?)|(www\.)|\s)(youtu\.be|youtube\.com)/(embed/|v/|watch(\?v=|\?.+&v=|/))?([a-zA-Z0-9._\/~#&=;%+?-\!]+))@si';

answered Jun 6, 2014 at 4:23


2 Comments

Benjam Over a year ago

It doesn't work on youtube-nocookie.com URLs, nor does it work on URLs with a query string like ?v=0123456789a&q=18#t=12s.

Benjam Over a year ago

Also, your character class has an inverted range, ?-\!, which means it won't work with many regex flavors, including PHP preg.
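
The inverted range can be reproduced in PHP, along with one possible repair. The repaired class is my assumption about what was intended (?, ! and a literal - as separate characters). It is still very permissive and does not validate the video ID by itself.

```php
<?php
// Pattern from the answer, unchanged: the class contains the range ?-\!,
// which runs from a higher code point (?) to a lower one (!)
$broken = '@(((http(s)?://(www\.)?)|(www\.)|\s)(youtu\.be|youtube\.com)/(embed/|v/|watch(\?v=|\?.+&v=|/))?([a-zA-Z0-9._\/~#&=;%+?-\!]+))@si';

// Assumed repair: list ? and ! individually and move the literal - to the end
$fixed  = '@(((http(s)?://(www\.)?)|(www\.)|\s)(youtu\.be|youtube\.com)/(embed/|v/|watch(\?v=|\?.+&v=|/))?([a-zA-Z0-9._/~#&=;%+?!-]+))@si';

$url = 'http://www.youtube.com/watch?v=Zu4WXiPRek';

// @ suppresses the compilation warning; preg_match() returns false on a bad pattern
var_dump(@preg_match($broken, $url)); // bool(false)
var_dump(preg_match($fixed, $url));   // int(1)
```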

Kaligula · Accepted Answer · 2024-03-17 00:35:38Z

If you'd like to cover all YouTube URL variants try this:

^(?:(?:https?:)?\/\/)?(?:(?:(?:www|m(?:usic)?)\.)?youtu(?:\.be|be\.com)\/(?:shorts\/|live\/|v\/|e(?:mbed)?\/|watch(?:\/|\?(?:\S+=\S+&)*v=)|oembed\?url=https?%3A\/\/(?:www|m(?:usic)?)\.youtube\.com\/watch\?(?:\S+=\S+&)*v%3D|attribution_link\?(?:\S+=\S+&)*u=(?:\/|%2F)watch(?:\?|%3F)v(?:=|%3D))?|www\.youtube-nocookie\.com\/embed\/)([\w-]{11})[\?&#]?\S*$

It's a RegExp from a related question that covers every known YouTube URL form (including music.*, shorts/, live/, e/, embed/, v/, *-nocookie etc.). It does not match the following:

  (wrong ID)
youtube.com/watch?v=U$t-slLl30E
  (too short ID)
youtube.com/watch?v=U9t-slLl30&t=10
  (wrong or deprecated paths)
youtube.com/GitHub?v=U9t-slLl30E
youtube.com/?v=U9t-slLl30E
youtube.com/?vi=U9t-slLl30E
youtube.com/?feature=player_embedded&v=U9t-slLl30E
youtube.com/watch?vi=U9t-slLl30E
youtube.com/vi/U9t-slLl30E
  (www.youtube-nocookie.com/embed/ only!)
youtube-nocookie.com/embed/U9t-slLl30E
www.youtube-nocookie.com/watch?v=U9t-slLl30E
http://www.youtube-nocookie.com/v/U9t-slLl30E?version=3&hl=en_US&rel=0
  (playlist)
youtube.com/playlist?list=PLmXxqSJJq-yVWpRFGImHYZBQTuBGLjG4t

You can try it here: https://regex101.com/r/7upRfP/. It also captures the video ID.

If you want you can restrict the video ID further with Glenn's answer instead of ([\w-]{11}).
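
For use from PHP, the expression can be wrapped in preg_match() roughly as below. This is a sketch: the function name youtube_id_from_url is hypothetical, and ~ is used as the delimiter only because it does not occur in the pattern.

```php
<?php
// Hypothetical wrapper around the pattern above. Returns the captured
// 11-character ID (the only capturing group), or null when nothing matches.
function youtube_id_from_url(string $url): ?string
{
    $pattern = '~^(?:(?:https?:)?\/\/)?(?:(?:(?:www|m(?:usic)?)\.)?youtu(?:\.be|be\.com)\/(?:shorts\/|live\/|v\/|e(?:mbed)?\/|watch(?:\/|\?(?:\S+=\S+&)*v=)|oembed\?url=https?%3A\/\/(?:www|m(?:usic)?)\.youtube\.com\/watch\?(?:\S+=\S+&)*v%3D|attribution_link\?(?:\S+=\S+&)*u=(?:\/|%2F)watch(?:\?|%3F)v(?:=|%3D))?|www\.youtube-nocookie\.com\/embed\/)([\w-]{11})[\?&#]?\S*$~';

    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

var_dump(youtube_id_from_url('https://www.youtube.com/watch?v=U9t-slLl30E'));
var_dump(youtube_id_from_url('https://youtu.be/U9t-slLl30E'));
var_dump(youtube_id_from_url('youtube.com/watch?v=U$t-slLl30E')); // listed above as "wrong ID"
```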
