How to Find All YouTube Video IDs in a String Using a Regex

How do I find all YouTube video IDs in a string using a regex?

A YouTube video URL may be encountered in a variety of formats:

latest short format: http://youtu.be/NLqAF9hrVbY
iframe: http://www.youtube.com/embed/NLqAF9hrVbY
iframe (secure): https://www.youtube.com/embed/NLqAF9hrVbY
object param: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
object embed: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
watch: http://www.youtube.com/watch?v=NLqAF9hrVbY
users: http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo
ytscreeningroom: http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I
any/thing/goes!: http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4
any/subdomain/too: http://gdata.youtube.com/feeds/api/videos/NLqAF9hrVbY
more params: http://www.youtube.com/watch?v=spDj54kf-vY&feature=g-vrec
query may have dot: http://www.youtube.com/watch?v=spDj54kf-vY&feature=youtu.be
nocookie domain: http://www.youtube-nocookie.com

Here is a PHP function with a commented regex that matches each of these URL forms and converts them to links (if they are not links already):

// Linkify youtube URLs which are not already links.
function linkifyYouTubeURLs($text) {
    $text = preg_replace('~(?#!js YouTubeId Rev:20160125_1800)
        # Match non-linked youtube URL in the wild. (Rev:20130823)
        https?://          # Required scheme. Either http or https.
        (?:[0-9A-Z-]+\.)?  # Optional subdomain.
        (?:                # Group host alternatives.
          youtu\.be/       # Either youtu.be,
        | youtube          # or youtube.com or
          (?:-nocookie)?   # youtube-nocookie.com
          \.com            # followed by
          \S*?             # Allow anything up to VIDEO_ID,
          [^\w\s-]         # but char before ID is non-ID char.
        )                  # End host alternatives.
        ([\w-]{11})        # $1: VIDEO_ID is exactly 11 chars.
        (?=[^\w-]|$)       # Assert next char is non-ID or EOS.
        (?!                # Assert URL is not pre-linked.
          [?=&+%\w.-]*     # Allow URL (query) remainder.
          (?:              # Group pre-linked alternatives.
            [\'"][^<>]*>   # Either inside a start tag,
          | </a>           # or inside <a> element text contents.
          )                # End recognized pre-linked alts.
        )                  # End negative lookahead assertion.
        [?=&+%\w.-]*       # Consume any URL (query) remainder.
        ~ix', '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>',
        $text);
    return $text;
}

And here is a JavaScript version with the exact same regex (with comments removed):

// Linkify youtube URLs which are not already links.
function linkifyYouTubeURLs(text) {
    var re = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
    return text.replace(re,
        '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>');
}
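
As a quick check of the behaviour, the sketch below repeats the function and runs it on a made-up sample string (the sample text is an assumption for illustration only) that contains one bare URL and one URL already inside an href:

```javascript
// Same function as above, repeated so this snippet runs on its own.
function linkifyYouTubeURLs(text) {
    var re = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
    return text.replace(re,
        '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>');
}

// Hypothetical sample input: one bare URL, one URL already in an href.
var sample = 'Bare: http://youtu.be/NLqAF9hrVbY and linked: ' +
    '<a href="http://www.youtube.com/watch?v=spDj54kf-vY">a video</a>';

var result = linkifyYouTubeURLs(sample);
console.log(result);
```

Only the bare youtu.be URL should be wrapped in a new link; the URL inside the existing href is rejected by the negative lookahead and left untouched.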

Notes:

The VIDEO_ID portion of the URL is captured in the one and only capture group: $1.
If you know that your text does not contain any pre-linked URLs, you can safely remove the negative lookahead assertion that tests for this condition (the assertion beginning with the comment "Assert URL is not pre-linked."). This will speed up the regex somewhat.
The replace string can be modified to suit. The one provided above simply creates a link to the generic "http://www.youtube.com/watch?v=VIDEO_ID" style URL and sets the link text to: "YouTube link: VIDEO_ID".
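
If you only want the IDs rather than links, the notes above can be combined into a rough sketch like the following. It assumes the text has no pre-linked URLs, so the negative lookahead is dropped, and it collects capture group 1 from every match. The helper name extractYouTubeIds and the sample text are hypothetical, not part of the original answer:

```javascript
// Collect every VIDEO_ID (capture group 1) found in a block of text.
// Assumes the text has no pre-linked URLs, so the lookahead is omitted.
function extractYouTubeIds(text) {
    var re = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)[?=&+%\w.-]*/ig;
    var ids = [];
    var match;
    // With the g flag, exec() resumes after the previous match on each call.
    while ((match = re.exec(text)) !== null) {
        ids.push(match[1]);
    }
    return ids;
}

var ids = extractYouTubeIds(
    'See http://youtu.be/NLqAF9hrVbY and ' +
    'http://www.youtube.com/watch?v=spDj54kf-vY&feature=youtu.be today.');
console.log(ids);
```

For this sample the array should hold the two IDs NLqAF9hrVbY and spDj54kf-vY, in that order.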

Edit 2011-07-05: Added - hyphen to ID char class

Edit 2011-07-17: Fixed regex to consume any remaining part (e.g. query) of URL following YouTube ID. Added 'i' ignore-case modifier. Renamed function to camelCase. Improved pre-linked lookahead test.

Edit 2011-07-27: Added new "user" and "ytscreeningroom" formats of YouTube URLs.

Edit 2011-08-02: Simplified/generalized to handle new "any/thing/goes" YouTube URLs.

Edit 2011-08-25: Several modifications:

Added a JavaScript version of the linkifyYouTubeURLs() function.
Previous version had the scheme (HTTP protocol) part optional and thus would match invalid URLs. Made the scheme part required.
Previous version used the \b word boundary anchor around the VIDEO_ID. However, this will not work if the VIDEO_ID begins or ends with a - dash. Fixed so that it handles this condition.
Changed the VIDEO_ID expression so that it must be exactly 11 characters long.
The previous version failed to exclude pre-linked URLs if they had a query string following the VIDEO_ID. Improved the negative lookahead assertion to fix this.
Added + and % to character class matching query string.
Changed PHP version regex delimiter from: % to a: ~.
Added a "Notes" section with some handy notes.
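
The \b problem mentioned in the list above can be reproduced with a small sketch. Both patterns here are simplified stand-ins written for illustration (not the exact earlier revisions), and the ID is a hypothetical one that starts with a dash:

```javascript
// Hypothetical 11-character ID that starts with a dash.
var url = 'http://www.youtube.com/watch?v=-abcdefghij&feature=share';

// Older approach: word-boundary anchors around the ID.
// "=" and "-" are both non-word characters, so there is no \b between them.
var withWordBoundary = /\b([\w-]{11})\b/;

// Current approach: a non-ID char before the ID, and a lookahead that
// asserts a non-ID char (or end of string) after it.
var withCharClasses = /[^\w\s-]([\w-]{11})(?=[^\w-]|$)/;

var m1 = url.match(withWordBoundary);
var m2 = url.match(withCharClasses);
console.log(m1 ? m1[1] : null);
console.log(m2 ? m2[1] : null);
```

The word-boundary pattern finds nothing in this URL, while the character-class pattern captures -abcdefghij.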

Edit 2011-10-12: YouTube URL host part may now have any subdomain (not just www.).

Edit 2012-05-01: The consume URL section may now allow for '-'.

Edit 2013-08-23: Added additional format provided by @Mei. (The query part may have a . dot.)

Edit 2013-11-30: Added additional format provided by @CRONUS: youtube-nocookie.com.

Edit 2016-01-25: Fixed regex to handle error case provided by CRONUS.

Regular expression for YouTube video Id

Try this:

urls = ['http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel','http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub','http://youtu.be/6dwqZw0j_jY','http://www.youtube.com/watch?v=6dwqZw0j_jY&feature=youtu.be','http://www.youtube.com/watch?v=yZ-K7nCVnBI&playnext_from=TL&videos=osPknwzXEas&feature=sub','http://www.youtube.com/embed/nas1rJpm7wY?rel=0','http://www.youtube.com/watch?v=peFZbP64dsU','http://youtube.com/v/dQw4w9WgXcQ?feature=youtube_gdata_player','http://youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player','http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=youtube_gdata_player','http://youtu.be/afa-5HQHiAs','http://youtu.be/dQw4w9WgXcQ?feature=youtube_gdata_player','//www.youtube-nocookie.com/embed/up_lNV-yoK4?rel=0','http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo','http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I','http://www.youtube.com/user/SilkRoadTheatre#p/a/u/2/6dwqZw0j_jY','http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo?rel=0','http://www.youtube.com/watch?v=cKZDdG9FTKY&feature=channel','http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I','http://youtube.com/vi/dQw4w9WgXcQ?feature=youtube_gdata_player','http://youtube.com/?v=dQw4w9WgXcQ&feature=youtube_gdata_player','http://youtube.com/?vi=dQw4w9WgXcQ&feature=youtube_gdata_player','http://youtube.com/watch?vi=dQw4w9WgXcQ&feature=youtube_gdata_player']
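// Returns the 11-character video ID, or null if the URL is not recognised.
var _getVideoIdFromUrl = function (value) {
    // A regex literal is used so that \. really matches a literal dot
    // (inside a string literal, "\." collapses to ".", which matches any character).
    var regEx = /^(?:https?:)?\/\/[^\/]*(?:youtube(?:-nocookie)?\.com|youtu\.be).*[=\/]([-\w]{11})(?:\?|=|&|$)/;
    var matches = value.match(regEx);
    return matches ? matches[1] : null;
};

urls.forEach(function (url) {
    var id = _getVideoIdFromUrl(url);
    if (id) {
        console.log(url + "\n" + id + "\n");
    }
});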

PHP: preg_match_all Youtube video IDs from text

To expand on my comment: on every iteration you are overwriting the result text with a copy built from the original string, $sample_text, so earlier replacements are lost. The fix is simple: initialise $processed_text once at the start and work on that.

function regex($sample_text) {
    $processed_text = $sample_text;
    if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])(.*?)\b#s', $sample_text, $matches, PREG_SET_ORDER)) {
        print_r($matches);
        foreach ($matches as $match) {
            $add = ' (here)';
            $processed_text = str_replace($match[0], $match[0] . $add, $processed_text);
        }
    }
    return $processed_text;
}
echo regex($sample_text);

Your regex is also not matching to the end of the URL. For the purposes of the sample text you provided, you could match up to anything that isn't whitespace:

'#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s'

Note that \S* matches any non-whitespace character, so it will also swallow a trailing " or . that is not part of the URL; you could exclude those with a narrower character class or trim them afterwards. You seem to have a pretty good grasp of regex, so I'll assume you can work this out - if not, comment and I'll update my answer.
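
To illustrate how the \S* tail behaves around punctuation, here is a rough JavaScript sketch (the pattern is the one above rewritten as a JavaScript regex literal; the sample text and the list of punctuation to trim are assumptions). It matches with \S* and then strips trailing punctuation from the matched URL:

```javascript
// The pattern from above with the \S* tail, as a JavaScript regex literal.
var re = /(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*/;

// Hypothetical sample: the URL is wrapped in quotes and followed by a period.
var text = 'Watch "https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=10". Enjoy.';

var match = text.match(re);
var rawUrl = match[0];

// Strip punctuation that commonly trails a URL in running text
// (this list of characters is an assumption; adjust it to your input).
var cleanUrl = rawUrl.replace(/[.,;:!?"')\]]+$/, '');

console.log(rawUrl);
console.log(cleanUrl);
console.log(match[1]);
```

Here rawUrl still ends with the closing quote and period, cleanUrl does not, and match[1] holds the ID dQw4w9WgXcQ.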

For completeness' sake, I've included the completed code with my regex:

function regex($sample_text) {
    $processed_text = $sample_text;
    if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s', $sample_text, $matches, PREG_SET_ORDER)) {
        print_r($matches);
        foreach ($matches as $match) {
            $add = ' (here)';
            $processed_text = str_replace($match[0], $match[0] . $add, $processed_text);
        }
    }
    return $processed_text;
}
echo regex($sample_text);

Extract youtube video ID from url with R stringr regex

You could use something like the following, but note that it's pretty heavily hard-coded to the examples you provided.

links = c("youtube.com/v/kFF0v0FQzEI", 
          "youtube.com/vi/kFF0v0FQzEI", 
          "youtu.be/kFF0v0FQzEI", 
          "www.youtube.com/v/kFF0v0FQzEI?feature=autoshare&version=3&autohide=1&autoplay=1", 
          "www.youtube.com/watch?v=kFF0v0FQzEI&list=PLuV2ACKGzAMsG-pem75yNYhBvXZcl-mj_&index=1", 
          "youtube.com/watch?v=kFF0v0FQzEI", 
          "http://www.youtube.com/watch?argv=xyz&v=kFF0v0FQzEI")

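get_id = function(link) {
  if (stringr::str_detect(link, '/watch\\?')) {
    # Watch URLs: the ID follows "?v=" or "&v=".
    rgx = '(?<=\\?v=|&v=)[\\w\\-]+'
  } else {
    # Path URLs: the ID is the last path segment; the lookahead keeps
    # a trailing "/" or "?" out of the match.
    rgx = '(?<=/)[\\w\\-]+(?=/?(?:$|\\?))'
  }
  stringr::str_extract(link, rgx)
}

ids = unname(sapply(links, get_id))
# [1] "kFF0v0FQzEI"  "kFF0v0FQzEI"  "kFF0v0FQzEI"  "kFF0v0FQzEI"
#     "kFF0v0FQzEI"  "kFF0v0FQzEI"  "kFF0v0FQzEI"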

Regex to find Youtube Link in string

This should work for you:

\S*\bwww\.youtube\.com\S*

\S* matches zero or more non-space characters.

The code would be:

preg_match('~\S*\bwww\.youtube\.com\S*~', $str, $matches);
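
For comparison, a rough JavaScript sketch of the same pattern is shown below. Note one difference: preg_match stops at the first match, whereas the g flag here returns every match. The sample string is made up for illustration:

```javascript
// Hypothetical sample text containing two www.youtube.com links.
var str = 'First https://www.youtube.com/watch?v=dQw4w9WgXcQ then ' +
    'http://www.youtube.com/embed/NLqAF9hrVbY end';

// Same pattern as above; match() with the g flag returns all matches,
// or null when there are none, hence the "|| []" fallback.
var links = str.match(/\S*\bwww\.youtube\.com\S*/g) || [];
console.log(links);
```

Each element of links should be one whole whitespace-delimited token containing www.youtube.com, so this sample yields the two URLs.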

I also made some corrections to your original regex (the dots in the host names are escaped so that they match only a literal dot):

(?:https?://)?(?:www\.)?(?:youtube\.com|youtu\.be)/(?:watch\?v=)?([^\s]+)
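
One detail worth checking in patterns like this is the dot: an unescaped . matches any character, so a host pattern written without backslashes can accept look-alike strings. The following JavaScript sketch compares the two spellings on a made-up, non-YouTube host (both the host name and the trimmed-down patterns are assumptions for illustration):

```javascript
// Host part only, once with unescaped dots and once with escaped dots.
var unescaped = /(?:www.)?(?:youtube.com|youtu.be)\//;
var escaped = /(?:www\.)?(?:youtube\.com|youtu\.be)\//;

// Hypothetical look-alike that is not a YouTube host.
var lookalike = 'http://youtubeXcom/watch?v=abc';
var genuine = 'http://www.youtube.com/watch?v=abc';

console.log(unescaped.test(lookalike)); // true: "." matched the "X"
console.log(escaped.test(lookalike));   // false
console.log(escaped.test(genuine));     // true
```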