Best Way to Encode Text Data for XML in Java

Best way to encode text data for XML in Java?

Very simply: use an XML library. That way it will actually be right instead of requiring detailed knowledge of bits of the XML spec.

Best way to encode text data for XML

In .NET, the System.Xml classes handle the encoding for you, so you don't need a method like this.

What's a good way of encoding arbitrary text into XML in a human-readable way?

XML already supports this: you do not need to do anything special, and you certainly do not need CDATA. Just use a decent library, make sure you are using UTF-8 encoding, and add a text node. If something is "losing" newlines, that is a bug. XML already has an "encoding" (escaping) that is relatively human-readable. It is also standard, which makes it much more useful than inventing your own.

See, for example, https://stackoverflow.com/a/1140802/181772
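As a rough illustration of "add a text node and let the library escape it", here is a minimal sketch using only the JDK's built-in DOM and Transformer APIs. The class name TextNodeDemo and the element name note are made up for this example; they do not come from the answer above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: wraps arbitrary text in a <note> element.
class TextNodeDemo {

    // Builds <note>text</note>; the serializer escapes markup characters.
    static String toXml(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element note = doc.createElement("note");
            note.appendChild(doc.createTextNode(text)); // raw text, no manual escaping
            doc.appendChild(note);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }

    // Parses the XML again and returns the text content of the root element.
    static String fromXml(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String text = "if (a < b && c > d)\n  print(\"ok\");";
        String xml = toXml(text);
        System.out.println(xml);
        // The newline and the special characters survive the round trip.
        System.out.println(text.equals(fromXml(xml)));
    }
}
```

The serialized form stays readable because only the markup characters are replaced by entities; everything else, including the line break, is written as-is.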

Escaping special character when generating an XML in Java

You can use the Apache Commons Lang library to escape a string. In Commons Lang 3 and Commons Text, the equivalent methods are escapeXml10 and escapeXml11.

org.apache.commons.lang.StringEscapeUtils

String escapedXml = StringEscapeUtils.escapeXml("the data might contain & or ! or % or ' or # etc");

But what you are looking for is a way to convert any string into a valid XML tag name. For ASCII characters, an XML tag name must begin with one of _:a-zA-Z, followed by any number of characters from _:a-zA-Z0-9.-

I don't know of a library that does this for you, so you would have to implement your own function to convert any string to match this pattern, or alternatively make the string the value of an attribute:

<property name="no more need to be encoded, it should be handled by XML library">0.0</property>
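If you do go the "implement your own function" route, a minimal sketch for the ASCII rule quoted above might look like the following. The class and method names (XmlNames, toAsciiXmlName) are hypothetical, and replacing invalid characters with an underscore is just one possible policy. The sketch also deliberately treats the colon as invalid, on the assumption that you don't want accidental namespace prefixes.

```java
// Hypothetical helper: maps an arbitrary string onto the ASCII subset of
// valid XML names described above (colon excluded on purpose).
class XmlNames {

    static String toAsciiXmlName(String raw) {
        if (raw == null || raw.isEmpty()) {
            return "_"; // an XML name cannot be empty
        }
        StringBuilder name = new StringBuilder(raw.length() + 1);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            name.append(isNameChar(c) ? c : '_'); // replace anything invalid
        }
        // Digits, '.' and '-' are fine inside a name but not at the start.
        if (!isNameStartChar(name.charAt(0))) {
            name.insert(0, '_');
        }
        return name.toString();
    }

    static boolean isNameStartChar(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isNameChar(char c) {
        return isNameStartChar(c) || (c >= '0' && c <= '9') || c == '.' || c == '-';
    }

    public static void main(String[] args) {
        System.out.println(toAsciiXmlName("2nd value!")); // prints "_2nd_value_"
        System.out.println(toAsciiXmlName("price"));      // prints "price"
    }
}
```

Note that this mapping is lossy: different inputs can produce the same name, and the original string cannot be recovered. That is one reason the attribute approach shown above is usually the safer choice.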

What is the best way to create XML files in Java?

If you just want to write an XML document and have exact control over the creation of elements, attributes and other document components, you can use the XMLStreamWriter from the StAX API.
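A minimal sketch of that approach, using the StAX implementation bundled with the JDK. The class name StaxWriterDemo and the property element are chosen here purely for illustration. Both writeAttribute and writeCharacters take raw text and escape it on output.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class: writes <property name="...">value</property>.
class StaxWriterDemo {

    static String writeProperty(String name, String value) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("property");
            writer.writeAttribute("name", name); // attribute value is escaped
            writer.writeCharacters(value);       // text content is escaped
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeProperty("R&D budget", "a < b & c"));
    }
}
```

Because the writer streams directly to the output, this also works for documents that are too large to hold in memory as a DOM tree.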

How to retrieve the encoding of an XML file to parse it correctly? (Best Practice)

If you trust the creator of the XML to have set the encoding correctly in the XML declaration, you can sniff it as you're doing. However, be aware that it can be wrong; it can disagree with the actual encoding.
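If the declaration is trusted, one option is to skip manual sniffing and hand the parser the raw bytes (an InputStream rather than a Reader or a String), so that it reads the declaration itself. The sketch below assumes the JDK's built-in DOM parser; the class name DeclaredEncodingDemo and the sample document are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: lets the parser pick the encoding from the bytes.
class DeclaredEncodingDemo {

    // Parses raw bytes; the parser decodes them using the XML declaration.
    static Document parse(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<name>caf\u00e9</name>";
        // Encode the sample so the bytes really match the declaration.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(bytes);
        System.out.println(doc.getXmlEncoding()); // encoding named in the declaration
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the declaration does not match the actual bytes, this approach fails or produces garbled text.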

If you want to detect the encoding directly, independently of the (potentially wrong) XML declaration encoding setting, use a library such as ICU CharsetDetector or the older jChardet.

ICU CharsetDetector:

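import com.ibm.icu.text.CharsetDetector;
import com.ibm.icu.text.CharsetMatch;

byte[] byteData = ...;

CharsetDetector detector = new CharsetDetector();
detector.setText(byteData);

// Best guess for the charset; check for null before using it.
CharsetMatch match = detector.detect();
String encoding = match.getName();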

jChardet:

// Excerpt from the main method of a class named HtmlCharsetDetector,
// which is assumed to declare a static boolean field "found".

// Initialize the nsDetector.
int lang = (argv.length == 2) ? Integer.parseInt(argv[1])
                              : nsPSMDetector.ALL;
nsDetector det = new nsDetector(lang);

// Set an observer...
// Notify() will be called when a matching charset is found.
det.Init(new nsICharsetDetectionObserver() {
    public void Notify(String charset) {
        HtmlCharsetDetector.found = true;
        System.out.println("CHARSET = " + charset);
    }
});

URL url = new URL(argv[0]);
BufferedInputStream imp = new BufferedInputStream(url.openStream());

byte[] buf = new byte[1024];
int len;
boolean done = false;
boolean isAscii = true;

while ((len = imp.read(buf, 0, buf.length)) != -1) {

    // Check if the stream is only ASCII.
    if (isAscii)
        isAscii = det.isAscii(buf, len);

    // Call DoIt if non-ASCII and not done yet.
    if (!isAscii && !done)
        done = det.DoIt(buf, len, false);
}
det.DataEnd();

if (isAscii) {
    System.out.println("CHARSET = ASCII");
    found = true;
}

Encode/encrypt XML in the value of an HTML hidden field

I guess the default way is base64. It is not really encrypted but also not simply readable. But anybody who knows base64 can decode it.
In Java 8 it would be as simple as:

String base64encodedString = Base64.getEncoder().encodeToString("blabla".getBytes(StandardCharsets.UTF_8));

And to decode:

String base64decodedString = new String(Base64.getDecoder().decode("dGVzdA=="), StandardCharsets.UTF_8);

Then you don't even need to do the &quot; stuff.
If you want real encryption because the content is actually secret, you have to add the encryption step between getting the bytes of the string and creating the base64 string. Either way you end up with base64, because it's imho the simplest way to convert binary to a string.
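To make the "encryption in between" step concrete, here is a minimal sketch using AES-GCM from the JDK's javax.crypto package. The choice of AES-GCM, the class name HiddenFieldCrypto, and the layout (IV followed by ciphertext) are assumptions made for this example and are not part of the answer above. Key storage is left out entirely.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: string -> UTF-8 bytes -> AES-GCM -> Base64, and back.
class HiddenFieldCrypto {
    private static final int IV_BYTES = 12;  // commonly used IV size for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String encrypt(String xml, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv); // fresh IV for every message
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(xml.getBytes(StandardCharsets.UTF_8));
            // Prepend the IV so the receiver can decrypt.
            byte[] packed = ByteBuffer.allocate(iv.length + ciphertext.length)
                    .put(iv).put(ciphertext).array();
            return Base64.getEncoder().encodeToString(packed);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(String fieldValue, SecretKey key) {
        try {
            byte[] packed = Base64.getDecoder().decode(fieldValue);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, packed, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(packed, IV_BYTES, packed.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String field = encrypt("<order id=\"42\"/>", key);
        System.out.println(field);
        System.out.println(decrypt(field, key));
    }
}
```

The key has to stay on the server; if it is ever sent to the browser along with the hidden field, the encryption adds nothing over plain base64.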


