How to Check for a Valid URL in Java

How to check for a valid URL in Java?

Consider using the Apache Commons UrlValidator class:

import org.apache.commons.validator.routines.UrlValidator;

UrlValidator urlValidator = new UrlValidator();
boolean valid = urlValidator.isValid("http://my favorite site!"); // false: spaces are not allowed in a URL

There are several properties that you can set to control how this class behaves. By default, http, https, and ftp are accepted.

How to verify if a String in Java is a valid URL?

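You can try to create a java.net.URL object from the string. If it is not a well-formed URL, a MalformedURLException is thrown. Be aware that the URL constructor is lenient (it mainly checks that the protocol is known), so calling toURI() on the result gives a stricter syntax check.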
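As a minimal sketch of that approach, using only the JDK: the class and method names (UrlSyntaxCheck, isValidUrl) are made up for illustration, and this checks syntax only, not whether the URL actually resolves.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not part of any library.
class UrlSyntaxCheck {
    // Returns true if the string parses as a URL and also converts to a URI.
    static boolean isValidUrl(String candidate) {
        try {
            // new URL(...) rejects missing or unknown protocols;
            // toURI() additionally rejects illegal characters such as spaces.
            new URL(candidate).toURI();
            return true;
        } catch (MalformedURLException | URISyntaxException e) {
            return false;
        }
    }
}
```

Calling UrlSyntaxCheck.isValidUrl("http://my favorite site!") should come back false because of the spaces, while an ordinary http address passes.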

How to validate URL in java using regex?

This works:

Pattern p = Pattern.compile("(@)?(href=')?(HREF=')?(HREF=\")?(href=\")?(http://)?[a-zA-Z_0-9\\-]+(\\.\\w[a-zA-Z_0-9\\-]+)+(/[#&\\n\\-=?\\+\\%/\\.\\w]+)?");  

Matcher m = p.matcher("your url here");
boolean valid = m.matches(); // true only if the whole string matches the pattern

How to check if given URL exists or not in Java and JavaScript

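This piece of code tries to connect to the specified URL. If the connection succeeds, it prints "URL exists"; if the host name cannot be resolved, an UnknownHostException is thrown, which you can handle in the catch block as shown. Note that a successful connection only tells you the host is reachable; it does not check the HTTP status of the specific path.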

import java.net.URL;
import java.net.URLConnection;
import java.net.UnknownHostException;

class URLExists
{
    public static void main(String[] args)
    {
        try {
            URL url = new URL("http://www.google.com");
            URLConnection urlc = url.openConnection();
            urlc.connect(); // throws UnknownHostException if the host name cannot be resolved
            System.out.println("URL exists");
        }
        catch (UnknownHostException e)
        {
            System.out.println("URL either doesn't exist or unable to connect at this moment");
        }
        catch (Exception e) { e.printStackTrace(); }
    }
}
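If you also care whether the specific resource is there, one possible variation is to send an HTTP HEAD request and look at the status code. This is only a sketch: the names UrlReachability and urlExists, the 3-second timeouts, and the choice to treat 2xx/3xx as "exists" are all assumptions, and some servers reject HEAD requests.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of any library.
class UrlReachability {
    // Returns true if an HTTP(S) HEAD request gets a 2xx or 3xx response.
    static boolean urlExists(String address) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) new URL(address).openConnection();
            connection.setRequestMethod("HEAD"); // headers only, no body
            connection.setConnectTimeout(3000);  // assumed timeout in ms
            connection.setReadTimeout(3000);     // assumed timeout in ms
            int status = connection.getResponseCode();
            return status >= 200 && status < 400;
        } catch (IOException | ClassCastException e) {
            // IOException covers malformed URLs, unknown hosts and timeouts;
            // ClassCastException covers non-HTTP schemes such as file: or ftp:.
            return false;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```

A false result here means "could not confirm", not proof that the URL does not exist; the host may simply be down or blocking the request.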

Valid URL characters: how to validate in Java

Those examples are hostnames. They're not valid URLs in themselves.

Hostnames are made of .-separated ‘labels’. Each label must be up to 63 characters of letters, digits and hyphens, but a hyphen must not be the first or last character. It is optional to follow the whole hostname with another dot.

You can match this with a pattern like (assuming case-insensitive):

([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9])(\.([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9]))*\.?

However, this also matches strings like 1.2.3.4, which, although they technically could be host/domain names, will actually act as direct IP addresses. You may want to allow that. If you do, you may also want to allow IPv6 addresses, which are colon-separated hex; when embedded in a URL, they also have square brackets around them.
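For reference, the hostname rules described above could be wired up in Java roughly like this. The class and method names (HostnameCheck, isHostname) are invented for the example, and it deliberately does nothing special about IP addresses or IDNA.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class HostnameCheck {
    // One label: 1 to 63 letters, digits or hyphens, no hyphen at either end.
    private static final String LABEL =
            "(?:[a-z0-9]|[a-z0-9][a-z0-9-]{0,61}[a-z0-9])";

    // Dot-separated labels with an optional trailing dot.
    private static final Pattern HOSTNAME = Pattern.compile(
            LABEL + "(?:\\." + LABEL + ")*\\.?",
            Pattern.CASE_INSENSITIVE);

    // matches() requires the whole string to fit the pattern.
    static boolean isHostname(String candidate) {
        return HOSTNAME.matcher(candidate).matches();
    }
}
```

With this sketch, isHostname("example.com") and isHostname("1.2.3.4") are both accepted, while a label that starts or ends with a hyphen is rejected.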

And then of course there's IDNA. Nowadays, 例え.テスト is a valid IDNA domain name, corresponding to xn--r8jz45g.xn--zckzah. If you want to allow those you'll need some Unicode support.
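The JDK ships java.net.IDN for this conversion, so one option is to convert a Unicode name to its ASCII form first and validate that. A small sketch, with IdnConversion and toAsciiHostname as made-up names:

```java
import java.net.IDN;

// Hypothetical helper, not part of any library.
class IdnConversion {
    // Converts a Unicode hostname to its ASCII-compatible (xn--...) form.
    // IDN.toASCII throws IllegalArgumentException if the input cannot be converted.
    static String toAsciiHostname(String unicodeHostname) {
        return IDN.toASCII(unicodeHostname);
    }
}
```

For the 例え.テスト example, the result should be the xn-- form quoted above; a name that is already plain ASCII passes through unchanged.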

Summary: it's quite a bit more difficult than you might think. And that's just hostnames. ‘Validating’ a whole URL is even more work. A simple regex isn't going to hack it. Use a pre-existing library.

Regular expression to match URLs in Java

Try the following regex string instead. Your test was probably case-sensitive, so I have added the lowercase letters as well as a proper start-of-string anchor (^).

String regex = "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

This works too:

String regex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

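Note that ^ only matches at the start of the input, whereas \b matches at any word boundary, so only the \b form can match a URL that is preceded by another character such as <: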

String regex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // matches <http://google.com>

String regex = "<^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // does not match <http://google.com>
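To pull URLs out of a larger piece of text, the \b variant can be used with Matcher.find(). This is only a sketch (UrlExtractor and extractUrls are names invented here), and like any regex of this kind it will miss or mangle some legitimate URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class UrlExtractor {
    // The \b-anchored pattern from above, so matches may start mid-text.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Collects every substring of the text that matches the pattern.
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }
}
```

Because the final character class excludes punctuation such as . and ?, a sentence-ending full stop right after a URL is left out of the match.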

