C/C++ URL Decode Library

Is there a common Java library that will handle URL encoding/decoding for a collection of strings?

The JDK URLDecoder wasn't implemented efficiently. Most notably, it relies internally on StringBuffer, which introduces unnecessary synchronization because the buffer is local to each URLDecoder call. Apache Commons provides URLCodec, but it has also been reported to have similar performance issues. I haven't verified whether that is still the case in the most recent version.

Mark A. Ziesemer wrote a post a while back about the performance issues with URLDecoder. He filed some bug reports and ended up writing a complete replacement. Because this is SO, I'll quote some key excerpts here, but you should really read the entire source article: http://blogger.ziesemer.com/2009/05/improving-url-coder-performance-java.html

Selected quotes:

Java provides a default implementation of this functionality in
java.net.URLEncoder and java.net.URLDecoder. Unfortunately, it is not
the best performing, due to both how the API was written as well as
details within the implementation. A number of performance-related
bugs have been filed on sun.com in relation to URLEncoder.

There is an alternative: org.apache.commons.codec.net.URLCodec from
Apache Commons Codec. (Commons Codec also provides a useful
implementation for Base64 encoding.) Unfortunately, Commons' URLCodec
suffers some of the same issues as Java's URLEncoder/URLDecoder.

...

Recommendations for both the JDK and Commons:

When constructing any of the "buffer" classes, e.g.
ByteArrayOutputStream, CharArrayWriter, StringBuilder, or
StringBuffer, estimate and pass in an initial capacity. The JDK's
URLEncoder currently does this for its StringBuffer, but should do
this for its CharArrayWriter instance as well. Commons' URLCodec
should do this for its ByteArrayOutputStream instance. If the classes'
default buffer sizes are too small, they may have to resize by copying
into new, larger buffers - which isn't exactly a "cheap" operation. If
the classes' default buffer sizes are too large, memory may be
unnecessarily wasted.

Both implementations are dependent on Charsets, but only accept them
as their String name. Charset provides a simple and small cache for
name lookups - storing only the last 2 Charsets used. This should not
be relied upon, and both should accept Charset instances for other
interoperability reasons as well.

Both implementations only handle fixed-size inputs and outputs. The
JDK's URLEncoder only works with String instances. Commons' URLCodec
is also based on Strings, but also works with byte[] arrays. This is a
design-level constraint that essentially prevents efficient processing
of larger or variable-length inputs. Instead, the "stream-supporting"
interfaces such as CharSequence, Appendable, and java.nio's Buffer
implementations of ByteBuffer and CharBuffer should be supported.

...

Note that com.ziesemer.utils.urlCodec is over 3x as fast as the JDK
URLEncoder, and over 1.5x as fast as the JDK URLDecoder. (The JDK's
URLDecoder was faster than the URLEncoder, so there wasn't as much
room for improvement.)
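Here is a minimal Java sketch that puts the recommendations above together. The class SketchUrlDecoder and its decode method are hypothetical; this is not the com.ziesemer.utils.urlCodec API. The sketch:

- reads from any CharSequence and appends to any Appendable,
- takes a Charset instance rather than a charset name, and
- pre-sizes its byte buffer once.

Like java.net.URLDecoder, it treats '+' as a space and collects consecutive %XX bytes before decoding them with the charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SketchUrlDecoder {
    // Hypothetical decoder: CharSequence in, Appendable out, Charset instance, pre-sized buffer.
    public static void decode(CharSequence in, Appendable out, Charset charset) throws IOException {
        int len = in.length();
        // Each decoded byte consumes three input chars ("%XX"), so len / 3 bytes is an upper
        // bound for any run of escapes; sizing once avoids copy-on-grow.
        ByteArrayOutputStream pending = new ByteArrayOutputStream(len / 3 + 1);
        int i = 0;
        while (i < len) {
            char c = in.charAt(i);
            if (c == '%') {
                if (i + 2 >= len) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                int hi = Character.digit(in.charAt(i + 1), 16);
                int lo = Character.digit(in.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Bad hex digits at index " + i);
                }
                pending.write((hi << 4) | lo); // collect bytes; multi-byte characters span escapes
                i += 3;
                continue;
            }
            flush(pending, out, charset); // decode any bytes collected from %XX escapes
            out.append(c == '+' ? ' ' : c);
            i++;
        }
        flush(pending, out, charset);
    }

    // Decodes the collected bytes with the charset and appends them to out.
    private static void flush(ByteArrayOutputStream pending, Appendable out, Charset charset)
            throws IOException {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), charset));
            pending.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        decode("caf%C3%A9+au+lait", sb, StandardCharsets.UTF_8);
        System.out.println(sb); // café au lait
    }
}
```

Because the output is an Appendable, the same method can write into a StringBuilder, a Writer, or a CharBuffer without an intermediate String.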

I think your colleague is wrong to suggest URLDecoder is not thread-safe. Other answers here explain why in detail.

EDIT [2012-07-03] - Per later comment posted by OP

I'm not sure whether you were looking for more ideas. You are correct that if you need to operate on the list as an atomic collection, you would have to synchronize all access to the list, including references outside your method. However, if you are okay with the returned list potentially differing from the original list, a brute-force approach to decoding a "batch" of strings from a collection that other threads might modify could look something like this:

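import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * @param origList will be copied by this method so that origList can continue
 *     to be read and written by other threads. The copy is only consistent if
 *     origList itself is thread-safe (e.g. Collections.synchronizedList or
 *     CopyOnWriteArrayList, whose toArray() is atomic); a plain ArrayList that is
 *     modified concurrently can yield a corrupt snapshot.
 * @return list containing decoded strings for each entry that was
 *     in origList at time of copy.
 */
public List<String> decodeListOfStringSafely(List<String> origList)
        throws UnsupportedEncodingException {
    // snapshot: later changes to origList do not affect the loop below
    List<String> snapshotList = new ArrayList<String>(origList);
    List<String> newList = new ArrayList<String>(snapshotList.size());

    for (String urlStr : snapshotList) {
        String decodedUrlStr = URLDecoder.decode(urlStr, StandardCharsets.UTF_8.name());
        newList.add(decodedUrlStr);
    }

    return newList;
}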

If that does not help, then I'm still not sure what you are after and you would be better served to create a new, more concise, question. If that is what you were asking about, then be careful because this example out of context is not a good idea for many reasons.
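If the shared list can be a java.util.concurrent.CopyOnWriteArrayList, the explicit copy becomes unnecessary. Its iterator works on a snapshot of the list taken when iteration starts, and it never throws ConcurrentModificationException.

Below is a minimal sketch. It assumes Java 10+ for the Charset overload of URLDecoder.decode. SnapshotDecodeDemo and decodeAll are hypothetical names.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SnapshotDecodeDemo {
    // Iterating a CopyOnWriteArrayList walks a snapshot of its backing array,
    // so other threads may add or remove entries while this loop runs.
    public static List<String> decodeAll(CopyOnWriteArrayList<String> urls) {
        List<String> decoded = new ArrayList<>(urls.size());
        for (String url : urls) {
            decoded.add(URLDecoder.decode(url, StandardCharsets.UTF_8)); // Java 10+
        }
        return decoded;
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<String> urls = new CopyOnWriteArrayList<>(List.of("a%20b", "x+y"));
        System.out.println(decodeAll(urls)); // [a b, x y]
    }
}
```

The trade-off is that every write to a CopyOnWriteArrayList copies the whole backing array. This approach therefore suits lists that are read far more often than they are modified.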

Encode/Decode URL In C++

You can check out this article and this one.

Encode:

#include <string>

// The linked article uses a 256-entry SAFE lookup table that is not reproduced
// here. Assumption: treat the RFC 3986 unreserved characters as safe.
static bool IsSafe(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

std::string UriEncode(const std::string & sSrc)
{
    const char DEC2HEX[16 + 1] = "0123456789ABCDEF";
    std::string sResult;
    sResult.reserve(sSrc.length() * 3); // worst case: every byte is escaped

    for (unsigned char c : sSrc)
    {
        if (IsSafe(c))
            sResult += static_cast<char>(c);
        else
        {
            // escape this byte as %XX
            sResult += '%';
            sResult += DEC2HEX[c >> 4];
            sResult += DEC2HEX[c & 0x0F];
        }
    }
    return sResult;
}

Decode:

#include <string>

// Returns the value of a hex digit, or -1 if c is not one.
// (The linked article uses a 256-entry HEX2DEC table for this, not reproduced here.)
static int HexValue(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

std::string UriDecode(const std::string & sSrc)
{
    // Note from RFC1630: "Sequences which start with a percent
    // sign but are not followed by two hexadecimal characters
    // (0-9, A-F) are reserved for future extension"
    // Such sequences are copied through unchanged.
    const std::string::size_type SRC_LEN = sSrc.length();
    std::string sResult;
    sResult.reserve(SRC_LEN); // decoded output is never longer than the input

    std::string::size_type i = 0;
    while (i < SRC_LEN)
    {
        // a decodable '%' needs two more characters after it
        if (sSrc[i] == '%' && i + 2 < SRC_LEN)
        {
            const int dec1 = HexValue(sSrc[i + 1]);
            const int dec2 = HexValue(sSrc[i + 2]);
            if (dec1 != -1 && dec2 != -1)
            {
                sResult += static_cast<char>((dec1 << 4) + dec2);
                i += 3;
                continue;
            }
        }

        sResult += sSrc[i++];
    }

    return sResult;
}

How to decode a URI with UTF-8 characters in C++

There is nothing wrong with your decoding. The printing of the decoded URL is the problem. The output device that you print to is configured to accept strings encoded in ISO-8859-1, not in UTF-8.

Either configure the output device to accept strings encoded in UTF-8 or convert the decoded URL from UTF-8 to ISO-8859-1.
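The same situation can be reproduced in Java. The decoded text is identical either way; only the bytes handed to the output device differ.

Below is a minimal sketch. It assumes the decoded URL contains "café". The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OutputEncodingDemo {
    // What a device configured for ISO-8859-1 shows when it is handed UTF-8 bytes.
    public static String misreadAsLatin1(String decoded) {
        return new String(decoded.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The second fix from the answer: convert to ISO-8859-1 bytes before output.
    // Characters that ISO-8859-1 cannot represent are replaced with '?'.
    public static byte[] toLatin1(String decoded) {
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String decoded = "caf\u00e9"; // "café", e.g. decoded from "caf%C3%A9" as UTF-8
        System.out.println(Arrays.toString(decoded.getBytes(StandardCharsets.UTF_8))); // [99, 97, 102, -61, -87]
        System.out.println(Arrays.toString(toLatin1(decoded)));                          // [99, 97, 102, -23]
        System.out.println(misreadAsLatin1(decoded).equals("caf\u00c3\u00a9"));          // true
    }
}
```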

How to do URL decoding in Java?

This does not have anything to do with character encodings such as UTF-8 or ASCII. The string you have there is URL encoded. This kind of encoding is something entirely different from character encoding.

Try something like this:

try {
String result = java.net.URLDecoder.decode(url, StandardCharsets.UTF_8.name());
} catch (UnsupportedEncodingException e) {
// not going to happen - value came from JDK's own StandardCharsets
}

Java 10 added direct support for Charset to the API, meaning there's no need to catch UnsupportedEncodingException:

String result = java.net.URLDecoder.decode(url, StandardCharsets.UTF_8);

Note that a character encoding (such as UTF-8 or ASCII) is what determines the mapping of characters to raw bytes. For a good intro to character encodings, see this article.
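The sketch below shows how the two kinds of encoding interact by URL-encoding the same text under two character encodings. It assumes Java 10+ for the Charset overload of URLEncoder.encode. escapeWith is a hypothetical helper name.

- The character encoding decides which bytes represent "é".
- URL encoding then writes each of those bytes as %XX.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetVsUrlEncoding {
    // URL-encodes text after converting it to bytes with the given character encoding.
    public static String escapeWith(String text, Charset charset) {
        return URLEncoder.encode(text, charset); // Charset overload: Java 10+
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"
        System.out.println(escapeWith(text, StandardCharsets.UTF_8));      // caf%C3%A9
        System.out.println(escapeWith(text, StandardCharsets.ISO_8859_1)); // caf%E9
    }
}
```

Decoding must use the same character encoding that was used for encoding, or the bytes are mapped back to the wrong characters.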

How to URL Decode in iOS - Objective C

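// stringByReplacingPercentEscapesUsingEncoding: is deprecated since iOS 9;
// stringByRemovingPercentEncoding decodes as UTF-8 and returns nil on invalid escapes.
NSString *path = [[@"path+with+spaces"
    stringByReplacingOccurrencesOfString:@"+" withString:@" "]
    stringByRemovingPercentEncoding];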

Note that the plus-for-space substitution is only used in application/x-www-form-urlencoded data - the query string part of a URL, or the body of a POST request.
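Java's URLDecoder follows the form-encoding rule and always turns '+' into a space. Decoding a path segment, where '+' is a literal plus sign, therefore needs care.

Below is a minimal sketch. It assumes Java 10+, and the helper names are hypothetical. Rewriting '+' to %2B before decoding is one common workaround, not a JDK feature.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PlusHandling {
    // Query string or POST body (application/x-www-form-urlencoded): '+' means space.
    public static String decodeFormValue(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Path segment: '+' is a literal plus, so protect it before URLDecoder sees it.
    public static String decodePathSegment(String s) {
        return URLDecoder.decode(s.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeFormValue("path+with+spaces")); // path with spaces
        System.out.println(decodePathSegment("c++%20notes"));    // c++ notes
    }
}
```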

Java library for URL encoding if necessary (like a browser)

What every web developer must know about URL encoding

URL Encoding Explained

Why do I need URL encoding?

The URL specification RFC 1738 specifies that only a small set of characters 
can be used unencoded in a URL. Reserved characters such as ; / ? : @ = & are
also allowed, but only when used for their reserved purpose. The other allowed characters are:

A to Z (ABCDEFGHIJKLMNOPQRSTUVWXYZ)
a to z (abcdefghijklmnopqrstuvwxyz)
0 to 9 (0123456789)
$ (Dollar Sign)
- (Hyphen / Dash)
_ (Underscore)
. (Period)
+ (Plus sign)
! (Exclamation / Bang)
* (Asterisk / Star)
' (Single Quote)
( (Open Bracket)
) (Closing Bracket)
, (Comma)

How does URL encoding work?

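All offending characters are replaced by a % and a two-digit hexadecimal value
for each byte that represents the character in the chosen character encoding
(historically ISO-8859-1, today usually UTF-8, where a non-ASCII character
becomes several %XX sequences). Here are a couple of examples: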

$ (Dollar Sign) becomes %24
& (Ampersand) becomes %26
+ (Plus) becomes %2B
, (Comma) becomes %2C
: (Colon) becomes %3A
; (Semi-Colon) becomes %3B
= (Equals) becomes %3D
? (Question Mark) becomes %3F
@ (Commercial A / At) becomes %40
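The mechanism takes only a few lines of Java:

1. Convert the text to bytes with a character encoding (UTF-8 here).
2. Keep a small set of safe characters as they are.
3. Write every other byte as % plus two uppercase hex digits.

The helper percentEncode is hypothetical. Its safe set is an assumption: the RFC 3986 unreserved characters A-Z, a-z, 0-9, '-', '.', '_', '~', not the exact RFC 1738 list above.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodeDemo {
    // Hypothetical helper: keeps RFC 3986 unreserved characters, escapes every other byte as %XX.
    public static String percentEncode(String s) {
        StringBuilder out = new StringBuilder(s.length() * 3); // worst case: every byte escaped
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned 0..255
            boolean safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (safe) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("$&+,:;=?@")); // %24%26%2B%2C%3A%3B%3D%3F%40
        System.out.println(percentEncode("\u00e9"));    // %C3%A9 (two UTF-8 bytes)
    }
}
```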

Simple Example:

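import java.util.logging.Level;
import java.util.logging.Logger;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class TextHelper {
    // Nashorn was removed from the JDK in Java 15, so this may be null unless a
    // JavaScript engine is available on the classpath.
    private static final ScriptEngine engine = new ScriptEngineManager()
            .getEngineByName("JavaScript");

    /**
     * Encodes str with JavaScript's escape(), then restores the characters
     * %$&+,/:;=?@<># so that they stay unescaped.
     *
     * @param str should be encoded
     * @return encoded result, or null if script evaluation fails
     */
    public static String escapeJavascript(String str) {
        if (engine == null) {
            throw new IllegalStateException("No JavaScript ScriptEngine available");
        }
        try {
            // pass str as a variable instead of splicing it into the script,
            // so quotes or backslashes in str cannot break the JavaScript
            engine.put("input", str.replace("%20", " "));
            return engine.eval("escape(input)").toString()
                    // String.replace is literal; replaceAll would treat "$" as a group reference
                    .replace("%3A", ":")
                    .replace("%2F", "/")
                    .replace("%3B", ";")
                    .replace("%40", "@")
                    .replace("%3C", "<")
                    .replace("%3E", ">")
                    .replace("%3D", "=")
                    .replace("%26", "&")
                    .replace("%24", "$")
                    .replace("%23", "#")
                    .replace("%2B", "+")
                    .replace("%2C", ",")
                    .replace("%3F", "?")
                    // restore '%' last so an existing escape such as "%3F" in str is kept as-is
                    .replace("%25", "%");
        } catch (ScriptException ex) {
            Logger.getLogger(TextHelper.class.getName())
                    .log(Level.SEVERE, null, ex);
            return null;
        }
    }
}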
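A different technique, if no JavaScript engine is available, is java.net.URI's multi-argument constructor. It percent-encodes only the characters that are illegal in each URL component, which is close to what a browser does. toASCIIString() then encodes non-ASCII characters as UTF-8 %XX sequences.

Below is a minimal sketch. UriQuoting and encode are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriQuoting {
    // Quotes only characters that are illegal in each component; toASCIIString()
    // additionally encodes non-ASCII characters as UTF-8 escapes.
    public static String encode(String scheme, String authority, String path, String query)
            throws URISyntaxException {
        return new URI(scheme, authority, path, query, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(encode("http", "example.com", "/a b/\u00fc", "q=x y"));
        // http://example.com/a%20b/%C3%BC?q=x%20y
    }
}
```

This approach leaves characters such as '/', '=' and '&' in place because they are legal in those components. However, these constructors always escape '%' itself, so already-encoded input would be encoded twice.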

GWT: library for encoding/decoding arbitrary data in URL fragments

You may want to investigate gwt-platform; it includes features for reading and modifying parameters in the fragment, as well as a ton of other great MVP features, like EventBus, Presenters, even easier async loading of JS, etc. It looks pretty awesome.

Specifically, check out the "Using URL parameters" section of this guide.


