Extract Url from a String

Extract URL from string

John Gruber has spent a fair amount of time perfecting the "one regex to rule them all" for link detection. Used with preg_replace(), as mentioned in the other answers, the following regex should be one of the most accurate methods, if not the most accurate, for detecting a link:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

If you only want to match HTTP/HTTPS:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
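As a rough illustration, the HTTP/HTTPS variant can also be tried from Python's re module. This is only a sketch: the sample text and the names GRUBER_HTTP and find_links are made up for the example. Because the pattern contains capture groups, the sketch reads the whole match from finditer() instead of relying on findall(), which would return tuples of groups.

```python
import re

# The HTTP/HTTPS pattern from above, copied verbatim into a raw string.
GRUBER_HTTP = re.compile(
    r"""(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))"""
)


def find_links(text):
    """Return every full match of the pattern in text (hypothetical helper)."""
    return [m.group(0) for m in GRUBER_HTTP.finditer(text)]


# Sample text invented for the example; note the trailing comma and period.
sample = "See http://example.com/path, and www.example.org."
links = find_links(sample)
print(links)
```

The trailing comma and period are left out of the matches because the last character class in the pattern refuses to end a link on punctuation.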

How do you extract a url from a string using python?

There may be a few ways to do this, but the cleanest would be to use a regex:

>>> import re
>>> myString = "This is a link http://www.google.com"
>>> print(re.search(r"(?P<url>https?://[^\s]+)", myString).group("url"))
http://www.google.com

If there can be multiple links, you can use something like the following:

>>> myString = "These are the links http://www.google.com  and http://stackoverflow.com/questions/839994/extracting-a-url-in-python"
>>> print(re.findall(r'(https?://[^\s]+)', myString))
['http://www.google.com', 'http://stackoverflow.com/questions/839994/extracting-a-url-in-python']

Detect and extract url from a string?

m.group(1) gives you the first capturing group, that is, whatever the first pair of capturing parentheses matched. Here that is (https?|ftp|file), so you get only the scheme.

Check m.group(0), which holds the entire match, or wrap your whole pattern in parentheses and use m.group(1) again.

To match the next URL, call your find function again and read the groups from the new match.
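The same distinction between group 0 and group 1 exists in Python's re module, which is used here only for illustration (the question itself may concern another language). The pattern and sample text below are assumptions made up for this sketch, not the asker's actual code.

```python
import re

# Hypothetical pattern: the only capturing group is the scheme.
pattern = re.compile(r"(https?|ftp|file)://[^\s]+")
text = "Docs at https://example.com/docs and files at ftp://example.org/pub"

first = pattern.search(text)
print(first.group(1))  # only what the first capturing group matched
print(first.group(0))  # the entire match

# Each iteration yields a new match object with its own groups.
urls = [m.group(0) for m in pattern.finditer(text)]
print(urls)
```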

Extracting for URL from string using regex

Your regex is incorrect.

A correct regex for extracting the URL is /(https?:\/\/[^ ]*)/

Here is the snippet:

var urlRegex = /(https?:\/\/[^ ]*)/;
var input = "https://medium.com/aspen-ideas/there-s-no-blueprint-26f6a2fbb99c random stuff sd";
var url = input.match(urlRegex)[1];
alert(url);

Extracting a URL in Python

In response to the OP's edit I hijacked Find Hyperlinks in Text using Python (twitter related) and came up with this:

import re

myString = "This is my tweet check it out http://example.com/blah"

print(re.search(r"(?P<url>https?://[^\s]+)", myString).group("url"))
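One thing to keep in mind: re.search() returns None when the text contains no URL, so chaining .group("url") directly raises an AttributeError in that case. A minimal sketch of a guarded version, where the names URL_RE and first_url are hypothetical and chosen only for this example:

```python
import re

URL_RE = re.compile(r"(?P<url>https?://[^\s]+)")


def first_url(text):
    """Return the first http(s) URL in text, or None if there is none."""
    match = URL_RE.search(text)
    return match.group("url") if match else None


print(first_url("This is my tweet check it out http://example.com/blah"))
print(first_url("This tweet has no link"))
```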

Extract URLs from a string using PHP

A regex is the answer to your problem. Object Manipulator's answer is nearly there; all it is missing is the exclusion of commas. The code below excludes them and outputs three separate URLs:

$string = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";

preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $string, $match);

echo "<pre>";
print_r($match[0]); 
echo "</pre>";

The output is:

Array
(
    [0] => http://google.com
    [1] => https://www.youtube.com/watch?v=K_m7NEDMrV0
    [2] => https://instagram.com/hellow/
)
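For comparison, here is a rough Python sketch of the same idea. Python's re module has no POSIX classes such as [:punct:], so this swaps in a different technique: it stops each match at commas, whitespace, parentheses, and angle brackets, then trims trailing sentence punctuation with rstrip(). The set of characters trimmed is an assumption for the example, not part of the PHP answer.

```python
import re

text = (
    "The text you want to filter goes here. http://google.com, "
    "https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/"
)

# Match up to the next comma, whitespace, parenthesis, or angle bracket.
candidates = re.findall(r"\bhttps?://[^,\s()<>]+", text)

# Trim trailing sentence punctuation (assumed set), keeping a trailing slash.
urls = [u.rstrip(".;:!?") for u in candidates]
print(urls)
```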

PHP regex extract url with pattern from string

You can repeat all the allowed characters before and after /products/ using the same optional character class. Because the character class is quite long, you can shorten the notation by wrapping it in a capture group and recursing into the first subpattern with (?1).

Note that you don't have to escape the forward slashes if you use a different delimiter, such as the backtick used here.

$re = '`\b(?:(?:https?|ftp)://|www\.)([-a-z0-9+&@#/%?=~_|!:,.;]*)/products/(?1)[-a-z0-9+&@#/%=~_|]`';

$str = <<<EOF
  http://example.com/products/1/abc
  This string is valid - http://example.com/products/1
  This string is not valid - http://example.com/order/1
EOF;

preg_match_all($re, $str, $matches);
print_r($matches[0]);

Output

Array
(
    [0] => http://example.com/products/1/abc
    [1] => http://example.com/products/1
)
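Python's built-in re module does not support subpattern recursion like (?1), so a comparable sketch there has to reuse the character class another way. The version below simply stores it in a string and concatenates it twice; the names ALLOWED, LAST, and PRODUCT_URL are hypothetical.

```python
import re

# The long character class is written once and reused, standing in for (?1).
ALLOWED = r"[-a-z0-9+&@#/%?=~_|!:,.;]*"
LAST = r"[-a-z0-9+&@#/%=~_|]"

PRODUCT_URL = re.compile(
    r"\b(?:(?:https?|ftp)://|www\.)" + ALLOWED + r"/products/" + ALLOWED + LAST
)

text = """\
  http://example.com/products/1/abc
  This string is valid - http://example.com/products/1
  This string is not valid - http://example.com/order/1
"""

# The pattern has no capturing groups, so findall() returns full matches.
matches = PRODUCT_URL.findall(text)
print(matches)
```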

regex for extracting all urls from string

This should get you started:

\b(?:https?://)?(?:(?i:[a-z]+\.)+)[^\s,]+\b

Broken down, this says:

\b                   # a word boundary
(?:https?://)?       # http:// or https://, optional
(?:(?i:[a-z]+\.)+)   # one or more dot-terminated labels, e.g. www. or example. (case-insensitive)
[^\s,]+              # neither whitespace nor comma
\b                   # another word boundary

See a demo on regex101.com.
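The breakdown above maps directly onto a verbose-mode pattern. The following is a minimal Python sketch, assuming Python 3.6 or later for the scoped (?i:...) flag; the name URL_RE and the sample text are made up for the example.

```python
import re

URL_RE = re.compile(
    r"""
    \b                   # a word boundary
    (?:https?://)?       # http:// or https://, optional
    (?:(?i:[a-z]+\.)+)   # one or more dot-terminated labels
    [^\s,]+              # neither whitespace nor comma
    \b                   # another word boundary
    """,
    re.VERBOSE,
)

text = "Visit https://www.example.com/page, or docs.python.org today"
found = URL_RE.findall(text)
print(found)
```

Because the scheme is optional, this pattern also matches dotted text that is not a URL, such as file.txt, so it is best treated as a starting point.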