Detect and Extract Url from a String

Detect and extract url from a string?

m.group(1) gives you the first matching group, that is to say the first capturing parenthesis. Here it's (https?|ftp|file)

You should try to see if there is something in m.group(0), or surround all your pattern with parenthesis and use m.group(1) again.

You need to repeat your find function to match the next one and use the new group array.

How do you extract a url from a string using python?

There may be few ways to do this but the cleanest would be to use regex

>>> myString = "This is a link http://www.google.com"
>>> print re.search("(?P<url>https?://[^\s]+)", myString).group("url")
http://www.google.com

If there can be multiple links you can use something similar to below

>>> myString = "These are the links http://www.google.com  and http://stackoverflow.com/questions/839994/extracting-a-url-in-python"
>>> print re.findall(r'(https?://[^\s]+)', myString)
['http://www.google.com', 'http://stackoverflow.com/questions/839994/extracting-a-url-in-python']
>>>

Extract URL from string

John Gruber has spent a fair amount of time perfecting the "one regex to rule them all" for link detection. Using preg_replace() as mentioned in the other answers, using the following regex should be one of the most accurate, if not the most accurate, method for detecting a link:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

If you only wanted to match HTTP/HTTPS:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))

Detect URLs in text with JavaScript

First you need a good regex that matches urls. This is hard to do. See here, here and here:

...almost anything is a valid URL. There
are some punctuation rules for
splitting it up. Absent any
punctuation, you still have a valid
URL.

Check the RFC carefully and see if you
can construct an "invalid" URL. The
rules are very flexible.

For example ::::: is a valid URL.
The path is ":::::". A pretty
stupid filename, but a valid filename.

Also, ///// is a valid URL. The
netloc ("hostname") is "". The path
is "///". Again, stupid. Also
valid. This URL normalizes to "///"
which is the equivalent.

Something like "bad://///worse/////"
is perfectly valid. Dumb but valid.

Anyway, this answer is not meant to give you the best regex but rather a proof of how to do the string wrapping inside the text, with JavaScript.

OK so lets just use this one: /(https?:\/\/[^\s]+)/g

Again, this is a bad regex. It will have many false positives. However it's good enough for this example.

function urlify(text) {  var urlRegex = /(https?:\/\/[^\s]+)/g;  return text.replace(urlRegex, function(url) {    return '<a href="' + url + '">' + url + '</a>';  })  // or alternatively  // return text.replace(urlRegex, '<a href="$1">$1</a>')}
var text = 'Find me at http://www.example.com and also at http://stackoverflow.com';var html = urlify(text);
console.log(html)

How can I extract a URL from a sentence that is in a NSString?

Edit: I'm going to go out on a limb here and say you should probably use NSDataDetector as Dave mentions. Far less prone to error than regular expressions.


Take a look at regular expressions. You can construct a simple one to extract the URL using the NSRegularExpression class, or find one online that you can use. For a tutorial on using the class, see here.


The code you want essentially looks like this (using John Gruber's super URL regex):

NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))" options:NSRegularExpressionCaseInsensitive error:NULL];
NSString *someString = @"This is a sample of a http://example.com/efg.php?EFAei687e3EsA sentence with a URL within it.";
NSString *match = [someString substringWithRange:[expression rangeOfFirstMatchInString:someString options:NSMatchingCompleted range:NSMakeRange(0, [someString length])]];
NSLog(@"%@", match); // Correctly prints 'http://example.com/efg.php?EFAei687e3EsA'

That will extract the first URL in any string (of course, this does no error checking, so if the string really doesn't contain any URL's it won't work, but take a look at the NSRegularExpression class to see how to get around it.

PHP regex extract url with pattern from string

You can repeat all the allowed characters before and after matching /products/ using the same optional character class. As the character class is quite long, you could shorten the notation by wrapping it in a capture group and recurse the first subpattern as (?1)

Note that you don't have to escape the forward slash using a different separator.

$re = '`\b(?:(?:https?|ftp)://|www\.)([-a-z0-9+&@#/%?=~_|!:,.;]*)/products/(?1)[-a-z0-9+&@#/%=~_|]`';

$str = <<<EOF
http://example.com/products/1/abc
This string is valid - http://example.com/products/1
This string is not valid - http://example.com/order/1
EOF;

preg_match_all($re, $str, $matches);
print_r($matches[0]);

Output

Array
(
[0] => http://example.com/products/1/abc
[1] => http://example.com/products/1
)

Flutter Dart: RegEx to extract URLs from a String

Getting just the https? and ftp url's that are in quotes is this :

r"([\"'])\s*((?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-zA-Z0-9\u00a1-\uffff]+-?)*[a-zA-Z0-9\u00a1-\uffff]+)(?:\.(?:[a-zA-Z0-9\u00a1-\uffff]+-?)*[a-zA-Z0-9\u00a1-\uffff]+)*(?:\.(?:[a-zA-Z\u00a1-\uffff]{2,})))|localhost)(?::\d{2,5})?(?:\/(?:(?!\1|\s)[\S\s])*)?)\s*\1"

Where the Url is captured in group 2.

https://regex101.com/r/UPmLBl/1



Related Topics



Leave a reply



Submit