URL Constructor Doesn't Work with Some Characters

The URL(string:) initializer doesn't percent-encode its argument; it assumes the String already contains only characters that are valid in a URL. If your String contains characters that aren't valid in a URL, you have to encode it yourself. You can do this by calling String.addingPercentEncoding(withAllowedCharacters:).

import Foundation
let unencodedUrlString = "áűáeqw"
guard let encodedUrlString = unencodedUrlString.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed),
      let url = URL(string: encodedUrlString) else { return }

You can change the CharacterSet depending on which part of your URL contains the characters that need encoding; I used urlQueryAllowed only for demonstration purposes.

Not use = in java.net.URL

You have a problem with a special character appearing in your URL: you need to replace > with %3E.

I think you might still have problems with that URL, since you are putting an = into the parameter value. If your parser can still detect it as part of the value, it will be fine; if not, you will probably have to restructure your parameters.

How to encode URL to avoid special characters in Java?

URL construction is tricky because different parts of the URL have different rules for what characters are allowed: for example, the plus sign is reserved in the query component of a URL because it represents a space, but in the path component of the URL, a plus sign has no special meaning and spaces are encoded as "%20".

RFC 2396 explains (in section 2.4.2) that a complete URL is always in its encoded form: you take the strings for the individual components (scheme, authority, path, etc.), encode each according to its own rules, and then combine them into the complete URL string. Trying to build a complete unencoded URL string and then encode it separately leads to subtle bugs, like spaces in the path being incorrectly changed to plus signs (which an RFC-compliant server will interpret as real plus signs, not encoded spaces).

In Java, the correct way to build a URL is with the URI class. Use one of the multi-argument constructors that takes the URL components as separate strings, and it'll escape each component correctly according to that component's rules. The toASCIIString() method gives you a properly-escaped and encoded string that you can send to a server. To decode a URL, construct a URI object using the single-string constructor and then use the accessor methods (such as getPath()) to retrieve the decoded components.
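As a rough illustration of the approach described above, here is a minimal sketch. The class name UriComponentsSketch, the helper build, and the host example.com are made up for this example; only java.net.URI and its methods come from the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
class UriComponentsSketch {

    // Builds an encoded URL string from separate, unencoded components.
    static String build(String host, String path, String query) {
        try {
            // URI(scheme, authority, path, query, fragment) quotes the characters
            // that are illegal in each component.
            URI uri = new URI("https", host, path, query, null);
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = build("example.com", "/my files/a+b.txt", "q=a b");
        System.out.println(encoded); // https://example.com/my%20files/a+b.txt?q=a%20b

        // Decoding: parse the complete string, then read components via the accessors.
        URI parsed = URI.create(encoded);
        System.out.println(parsed.getPath()); // /my files/a+b.txt
    }
}
```

Note that characters which are legal in a component are left alone: the plus sign in the path stays a plus sign, and a & or = inside a query value would not be escaped either, so such values still need handling by the caller.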

Don't use the URLEncoder class! Despite the name, that class actually does HTML form encoding, not URL encoding. It's not correct to concatenate unencoded strings to make an "unencoded" URL and then pass it through a URLEncoder. Doing so will result in problems (particularly the aforementioned one regarding spaces and plus signs in the path).
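To make the difference concrete, the following sketch encodes the same string both ways. The class and method names are invented for this example, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison class, for illustration only.
class FormEncodingVsUriSketch {

    // HTML form encoding: spaces become '+', and a literal '+' becomes %2B.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Path encoding via URI(scheme, host, path, fragment): spaces become %20,
    // and '+' is left as it is because it is legal in a path.
    static String pathEncode(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a b+c"));  // a+b%2Bc
        System.out.println(pathEncode("/a b+c")); // /a%20b+c
    }
}
```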

java.net.URI chokes on special characters in host part

Java 6 added the java.net.IDN class for working with internationalized domain names. The following produces a URI with an encoded hostname:

URI u = new URI("http://" + IDN.toASCII("www.christlicheparteiösterreichs.at") + "/steiermark/");
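A slightly fuller sketch of the same idea, with the checked exception handled, might look like this. The class IdnHostSketch, the helper withAsciiHost, and the host bücher.example are assumptions chosen for illustration; the host is written with a \u escape so the source file stays ASCII.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
class IdnHostSketch {

    // Converts a Unicode host name to its ASCII (Punycode) form before building the URI.
    static URI withAsciiHost(String scheme, String unicodeHost, String path) {
        try {
            // URI(scheme, host, path, fragment)
            return new URI(scheme, IDN.toASCII(unicodeHost), path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI u = withAsciiHost("http", "b\u00fccher.example", "/steiermark/");
        System.out.println(u); // http://xn--bcher-kva.example/steiermark/

        // IDN.toUnicode reverses the conversion when the name has to be displayed.
        System.out.println(IDN.toUnicode(u.getHost()).equals("b\u00fccher.example")); // true
    }
}
```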

How to find if url containing special characters or space exists in java

URLs with spaces in them are invalid. To create a properly encoded URL from a filename that may contain spaces, build a URI with one of the constructors that takes multiple arguments, so that the filename part gets encoded, then call URI.toASCIIString() and pass the result to new URL(); see the Javadoc.

However, I question the requirement. The best way to test whether any resource is available is to try to use it. In this case you are presumably going to read from the URL if it exists, so just do that and catch the FileNotFoundException.
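The "just try to read it" idea could be sketched roughly as follows. The class ResourceProbeSketch and its methods are hypothetical, and the example uses File.toURI() to get an encoded file: URL for a name with spaces; real code would normally read from the stream instead of only opening it.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, for illustration only.
class ResourceProbeSketch {

    // File.toURI() percent-encodes spaces in the name, so the resulting URL is valid.
    static URL urlFor(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Tries to open the resource; a FileNotFoundException means it is not there.
    static boolean isReadable(URL url) {
        try (InputStream in = url.openStream()) {
            return true;
        } catch (FileNotFoundException e) {
            return false;
        } catch (IOException e) {
            // Other I/O failures are not evidence that the resource is missing.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"), "my report.txt");
        URL url = urlFor(file);
        System.out.println(url + " readable: " + isReadable(url));
    }
}
```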


