How can I detect if a string contains a URL in Java?
Please see: http://download.oracle.com/javase/6/docs/api/java/net/URL.html
import java.net.URL;
import java.net.MalformedURLException;

// Replaces URLs with HTML anchor tags
public class URLInString {
    public static void main(String[] args) {
        String s = args[0];
        // Separate the input by whitespace (URLs don't contain spaces)
        String[] parts = s.split("\\s");
        // Attempt to convert each item into a URL
        for (String item : parts) {
            try {
                URL url = new URL(item);
                // If it parses, wrap it in an anchor
                System.out.print("<a href=\"" + url + "\">" + url + "</a> ");
            } catch (MalformedURLException e) {
                // Not a URL: print the item unchanged
                System.out.print(item + " ");
            }
        }
        System.out.println();
    }
}
Obtained from: How to detect the presence of URL in a string
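The question asks for detection rather than replacement, so a yes/no check may be closer to what is wanted. Below is a minimal sketch along the same lines as the code above. The class `UrlPresence` and the method `containsUrl` are hypothetical names, not part of the original answer. It parses each token through `java.net.URI` and then calls `toURL()`, because the `URL(String)` constructor is deprecated in recent JDKs; that substitution is mine.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: reports whether any whitespace-separated token parses as a URL.
final class UrlPresence {
    static boolean containsUrl(String text) {
        for (String token : text.trim().split("\\s+")) {
            try {
                // toURL() rejects relative URIs and schemes the JDK has no handler for
                new URI(token).toURL();
                return true;
            } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
                // Not a URL; keep scanning the remaining tokens
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsUrl("Find me at http://www.example.com today"));
        System.out.println(containsUrl("no links in this sentence"));
    }
}
```

Like the original, this only recognizes tokens that carry a scheme the JDK knows about (such as `http://`), so a bare `example.com` is not detected.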
Detect URLs in text with JavaScript
First you need a good regex that matches URLs. This is hard to do. See here, here and here:
...almost anything is a valid URL. There are some punctuation rules for splitting it up. Absent any punctuation, you still have a valid URL.
Check the RFC carefully and see if you can construct an "invalid" URL. The rules are very flexible.
For example ::::: is a valid URL. The path is ":::::". A pretty stupid filename, but a valid filename.
Also, ///// is a valid URL. The netloc ("hostname") is "". The path is "///". Again, stupid. Also valid. This URL normalizes to "///" which is the equivalent.
Something like "bad://///worse/////" is perfectly valid. Dumb but valid.
Anyway, this answer is not meant to give you the best regex but rather a proof of how to do the string wrapping inside the text, with JavaScript.
OK, so let's just use this one: /(https?:\/\/[^\s]+)/g
Again, this is a bad regex. It will have many false positives. However it's good enough for this example.
function urlify(text) {
  var urlRegex = /(https?:\/\/[^\s]+)/g;
  return text.replace(urlRegex, function(url) {
    return '<a href="' + url + '">' + url + '</a>';
  });
  // or alternatively
  // return text.replace(urlRegex, '<a href="$1">$1</a>')
}

var text = 'Find me at http://www.example.com and also at http://stackoverflow.com';
var html = urlify(text);
console.log(html);
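Since the original question was about Java, here is a rough Java counterpart of the `urlify` function above, using the same deliberately loose pattern. The class name `UrlLinker` is hypothetical, and the sketch inherits every false positive of the regex (trailing punctuation, for instance, ends up inside the link).

```java
import java.util.regex.Pattern;

// Hypothetical Java port of urlify: wraps every http(s) match in an anchor tag.
final class UrlLinker {
    // Same loose pattern as the JavaScript version: a scheme followed by non-whitespace
    private static final Pattern URL_PATTERN = Pattern.compile("(https?://\\S+)");

    static String urlify(String text) {
        // $1 refers to the captured URL, as in the commented-out JavaScript alternative
        return URL_PATTERN.matcher(text).replaceAll("<a href=\"$1\">$1</a>");
    }

    public static void main(String[] args) {
        String text = "Find me at http://www.example.com and also at http://stackoverflow.com";
        System.out.println(urlify(text));
    }
}
```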
Detecting a (naughty or nice) URL or link in a text string
I'm concentrating my answer on trying to avoid spammers. This leads to two sub-assumptions: the people using the system will be actively trying to circumvent your check, and your goal is only to detect the presence of a URL, not to extract the complete URL. This solution would look different if your goal were something else.
I think your best bet is going to be with the TLD. There are the two-letter ccTLDs and the (currently) comparatively small list of others. These need to be prefixed by a dot and suffixed by either a slash or some word boundary. As others have noted, this isn't going to be perfect. There's no way to get "buyfunkypharmaceuticals . it" without disallowing the legitimate "I tried again. it doesn't work" or similar. All of that said, this would be my suggestion:
\w\.([a-zA-Z]{2}|aero|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|tel|travel)(\b|/)
Things this will get:
- buyfunkypharmaceuticals.it
- google.com
- http://stackoverflo**w.com/**questions/700163/
It will of course break as soon as people start obfuscating their URLs, replacing "." with " dot ". But, again assuming spammers are your goal here, if they start doing that sort of thing, their click-through rates are going to drop another couple of orders of magnitude toward zero. The set of people informed enough to deobfuscate a URL and the set of people uninformed enough to visit spam sites have, I think, a minuscule intersection. This solution should let you detect all URLs that are copy-and-pasteable to the address bar, whilst keeping collateral damage to a bare minimum.
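To make the TLD idea concrete, here is a small Java sketch. It is an interpretation, not the answer's exact pattern: it requires a word character before the dot and a word boundary after the TLD, which is how I read "prefixed by a dot and suffixed by either a slash or some word boundary". The class name `TldSniffer` and the method `looksLikeUrl` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical detector: flags text containing "<word char>.<TLD>" followed by a word boundary.
final class TldSniffer {
    private static final Pattern TLD_PATTERN = Pattern.compile(
        "\\w\\.(?:[a-zA-Z]{2}|aero|asia|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi"
        + "|museum|name|net|org|pro|tel|travel)\\b");

    static boolean looksLikeUrl(String text) {
        // find() succeeds if the pattern occurs anywhere in the text
        return TLD_PATTERN.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "buyfunkypharmaceuticals.it",
            "google.com",
            "http://stackoverflow.com/questions/700163/",
            "I tried again. it doesn't work",
            "buyfunkypharmaceuticals . it"
        };
        for (String sample : samples) {
            System.out.println(looksLikeUrl(sample) + "  " + sample);
        }
    }
}
```

The first three samples are detected and the last two are not, which matches the trade-off described above. Because any two letters count as a ccTLD, a file name such as `version.py` is also flagged, so some collateral damage remains.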