Why Can't I Use \u000D and \u000A as CR and LF in Java

Unicode escapes are translated in the very first step of compilation, before the source is even split into tokens. Therefore, if you put \u000A in a string literal like this:

String someString = "foo\u000Abar";

It will be compiled exactly as if you wrote:

String someString = "foo
bar";

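A string literal cannot contain a raw line terminator, so javac rejects this as an unclosed string literal. Stick to \r (carriage return; 0x0D) and \n (line feed; 0x0A): those are escape sequences interpreted inside the literal, after the compiler has already found where the literal ends.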
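As a quick sanity check, here is a small sketch (the class and method names are just for illustration) showing that the ordinary \r and \n escapes produce exactly the characters that \u000D and \u000A were meant to produce, without breaking the literal:

```java
public class NewlineEscapes {
    // \r and \n are ordinary escape sequences: the lexer first finds the whole
    // "..." literal on a single line, and only then turns each escape into a char.
    static String unixLine() {
        return "foo\nbar"; // LF only
    }

    static String windowsLine() {
        return "foo\r\nbar"; // CR followed by LF
    }

    public static void main(String[] args) {
        System.out.println((int) '\r');             // 13, i.e. 0x0D
        System.out.println((int) '\n');             // 10, i.e. 0x0A
        System.out.println(unixLine().length());    // 7
        System.out.println(windowsLine().length()); // 8
        // A numeric cast yields the same char without any escape at all.
        System.out.println((char) 0x0A == '\n');    // true
    }
}
```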

Bonus: you can have some fun with this, especially since most syntax highlighters don't apply this translation. Next time you've got a second, try running this code:

public class FalseIsTrue {
    public static void main(String[] args) {
        if ( false == true ) { //these characters are magic: \u000a\u007d\u007b
            System.out.println("false is true!");
        }
    }
}

A commented statement with a Unicode newline character

A little-known feature of the Java language is that Unicode escape sequences are processed anywhere in source code, before any other parsing.

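So a \u000a inside a // comment is a real newline: it ends the comment, and whatever follows it on the same line is compiled as ordinary code.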
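A minimal sketch that makes the effect observable (the class and method names are hypothetical): the statement after the escaped line feed runs, even though an editor will most likely render it as part of the comment.

```java
public class CommentNewline {
    static int counter() {
        int count = 0;
        // this looks like a comment, but the escape ends it early: \u000a count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(counter()); // prints 1, not 0
    }
}
```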

You can even write an entire Java program out of nothing but Unicode escapes.
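On a smaller scale, here is a sketch (hypothetical names) in which the keyword int and a variable name are spelled entirely with escapes; after translation the declaration is ordinary code, and the variable can be referred to normally afterwards:

```java
public class EscapedIdentifiers {
    static int answer() {
        // The declaration below spells the keyword "int" and the name "x"
        // with escapes only; after translation it is a normal declaration.
        \u0069\u006e\u0074 \u0078 = 42;
        return x; // the same variable, written without escapes
    }

    public static void main(String[] args) {
        System.out.println(answer()); // prints 42
    }
}
```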

Why is executing Java code in comments with certain Unicode characters allowed?

Unicode decoding takes place before any other lexical translation. The key benefit of this is that it makes it trivial to go back and forth between ASCII and any other encoding. You don't even need to figure out where comments begin and end!

As stated in JLS Section 3.3, this allows any ASCII-based tool to process the source files:

[...] The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. [...]

This gives a fundamental guarantee of platform independence (independence from supported character sets), which has always been a key goal of the Java platform.
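The escaping direction of that transformation can be sketched roughly as follows. This is a simplified illustration (hypothetical names), not the JLS algorithm verbatim or the code of any real tool such as native2ascii: it only converts non-ASCII characters into escapes and skips the JLS step of adding an extra u to escapes that are already present, which only matters when converting back.

```java
public class AsciiEscaper {
    // Replaces every char outside 7-bit ASCII with a six-character escape
    // (backslash, 'u', four hex digits). Supplementary characters become two
    // escapes, one per UTF-16 surrogate, which is how Java source spells them.
    static String toAsciiEscapes(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build a non-ASCII identifier without putting non-ASCII bytes in this
        // file: 0xE9 is LATIN SMALL LETTER E WITH ACUTE.
        String original = "myRisqu" + (char) 0xE9 + "Parser";
        // Prints the identifier with the e-acute replaced by its escape.
        System.out.println(toAsciiEscapes(original));
    }
}
```

Because the compiler translates escapes before anything else, feeding the escaped output back to javac yields the same program as the original, as long as no non-ASCII character directly follows a raw backslash.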

Being able to write any Unicode character anywhere in the file is a neat feature, and especially important in comments when documenting code in non-Latin languages. The fact that it can interfere with the semantics in such subtle ways is just an unfortunate side effect.

There are many gotchas on this theme, and Java Puzzlers by Joshua Bloch and Neal Gafter includes the following variant:

Is this a legal Java program? If so, what does it print?

\u0070\u0075\u0062\u006c\u0069\u0063\u0020\u0020\u0020\u0020
\u0063\u006c\u0061\u0073\u0073\u0020\u0055\u0067\u006c\u0079
\u007b\u0070\u0075\u0062\u006c\u0069\u0063\u0020\u0020\u0020
\u0020\u0020\u0020\u0020\u0073\u0074\u0061\u0074\u0069\u0063
\u0076\u006f\u0069\u0064\u0020\u006d\u0061\u0069\u006e\u0028
\u0053\u0074\u0072\u0069\u006e\u0067\u005b\u005d\u0020\u0020
\u0020\u0020\u0020\u0020\u0061\u0072\u0067\u0073\u0029\u007b
\u0053\u0079\u0073\u0074\u0065\u006d\u002e\u006f\u0075\u0074
\u002e\u0070\u0072\u0069\u006e\u0074\u006c\u006e\u0028\u0020
\u0022\u0048\u0065\u006c\u006c\u006f\u0020\u0077\u0022\u002b
\u0022\u006f\u0072\u006c\u0064\u0022\u0029\u003b\u007d\u007d

(This program turns out to be a plain "Hello World" program.)
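Because the translation is purely mechanical, you can check this yourself. The following is a simplified model of the JLS Section 3.3 translation step (hypothetical names, not javac's actual code): a backslash followed by one or more u characters and four hex digits becomes one UTF-16 code unit, unless that backslash is preceded by an odd number of raw backslashes.

```java
public class UnicodeEscapeTranslator {
    // A backslash is only eligible to start an escape if it is preceded by an
    // even number of contiguous raw backslashes, so a doubled backslash stays
    // literal. A backslash produced by an escape does not count as raw.
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int rawBackslashes = 0; // contiguous raw backslashes just before index i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            boolean eligible = c == '\\' && rawBackslashes % 2 == 0
                    && i + 1 < src.length() && src.charAt(i + 1) == 'u';
            if (eligible) {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // any number of 'u' characters is allowed
                }
                if (j + 4 > src.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(src.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("bad hex digit at index " + k);
                    }
                    value = value * 16 + digit;
                }
                out.append((char) value);
                rawBackslashes = 0; // the produced char is never a raw backslash
                i = j + 4;
            } else {
                out.append(c);
                rawBackslashes = (c == '\\') ? rawBackslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // The last line of the puzzle, with each backslash doubled for a Java literal.
        String raw = "\\u0022\\u006f\\u0072\\u006c\\u0064\\u0022\\u0029\\u003b\\u007d\\u007d";
        System.out.println(translate(raw)); // prints: "orld");}}
    }
}
```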

In the solution to the puzzler, they point out the following:

More seriously, this puzzle serves to reinforce the lessons of the previous three: Unicode escapes are essential when you need to insert characters that can’t be represented in any other way into your program. Avoid them in all other cases.


Source: Java: Executing code in comments?!
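As a hedged illustration of that advice (hypothetical names), one case where an escape inside a string literal arguably reads better than the raw character is a no-break space (U+00A0), which most editors display exactly like an ordinary space:

```java
public class VisibleEscapes {
    // A no-break space looks like a normal space in most editors, so spelling
    // it as an escape inside the literal keeps the intent visible in review.
    static final String NO_BREAK_SPACE = "\u00A0";

    static String joinWithNoBreak(String left, String right) {
        return left + NO_BREAK_SPACE + right;
    }

    public static void main(String[] args) {
        String s = joinWithNoBreak("10", "km");
        System.out.println(s.length());        // 5
        System.out.println((int) s.charAt(2)); // 160, i.e. 0xA0
    }
}
```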

Java Unicode translation

The specification states that a Java compiler must convert Unicode escapes to their corresponding characters before doing anything else, so that, for example, non-ASCII characters in identifiers can be protected (via native2ascii) when the code is stored or sent over a channel that is not 8-bit clean.

This rule applies everywhere in the source; in particular, you can even escape comment markers using Unicode escapes. For example, the following two snippets are identical:

// Deal with opening and closing comment characters /*, etc.
myRisquéParser.handle("/*", "*/");

\u002F\u002F Deal with opening and closing comment characters /*, etc.
myRisqu\u00E9Parser.handle("/*", "*/");

If the compiler were to try to remove comments before handling Unicode escapes, it would treat the /* on the escaped comment line as the start of a block comment and strip everything from there up to the */ inside handle("/*", "*/"), leaving

\u002F\u002F Deal with opening and closing comment characters ");

which would then be unescaped into a single-line comment and removed at the next stage of parsing. The result: no compiler error or warning, just a whole line of code silently dropped.
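To see that escaped comment markers really behave like the plain ones, here is a minimal sketch (hypothetical names): the assignment on the escaped line never runs because, after translation, that whole line is a // comment.

```java
public class EscapedCommentMarkers {
    static int value() {
        int x = 1;
        // The next line starts with two escaped slashes, so it is a line comment.
        \u002F\u002F x = 2;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(value()); // prints 1
    }
}
```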

How do I put a \n into a property file's value?

Simply use \n for this purpose: java.util.Properties interprets \n in a value as a line feed when it loads the file.

If you write \\n instead, you are escaping the escape character itself, so you get a literal backslash followed by n rather than a newline.
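A minimal sketch with java.util.Properties showing both cases (the class and the helper name parse are hypothetical): a single backslash before n in the file yields a line feed, while a doubled backslash yields a literal backslash followed by n.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesNewline {
    // Parses .properties text the same way Properties.load reads it from a file.
    static Properties parse(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    public static void main(String[] args) {
        // In Java source, "\\n" is the two characters backslash and n,
        // exactly as they would be typed into the .properties file.
        Properties p = parse("greeting=Hello\\nWorld\nescaped=Hello\\\\nWorld\n");
        System.out.println(p.getProperty("greeting")); // Hello, then World on the next line
        System.out.println(p.getProperty("escaped"));  // Hello\nWorld on one line
    }
}
```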

Split Java String by New Line

This should cover you:

String[] lines = string.split("\\r?\\n");

In practice there are only two line endings you usually need to worry about: \n (Unix) and \r\n (Windows). A lone \r (classic Mac OS) is not matched by this pattern.
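A small comparison sketch (hypothetical class name). For completeness it also shows the \R linebreak matcher, which java.util.regex has supported since Java 8 and which additionally treats a lone \r as a line break:

```java
public class SplitLines {
    // Handles "\n" (Unix) and "\r\n" (Windows), as in the answer above.
    static String[] splitUnixOrWindows(String text) {
        return text.split("\\r?\\n");
    }

    // Java 8+: \R matches any Unicode line break sequence, including a lone "\r".
    static String[] splitAnyLineBreak(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        String text = "one\ntwo\r\nthree\rfour";
        System.out.println(splitUnixOrWindows(text).length); // 3: "three\rfour" stays together
        System.out.println(splitAnyLineBreak(text).length);  // 4
    }
}
```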

Getting error on commented line

Unicode escapes are processed very early in Java compilation, before comments are recognized, so \u000d can't be used safely inside a // comment.

// The other style comments work.
/*('\u000d'); */

Edit

\u000d is converted into a carriage return, which Java treats as a line terminator, so it ends your // comment:

//('\u000d');

gets converted into

//('
');  // <-- bare line starting with '

The second line, ');, is not a valid character literal (the quote opens one that is never closed), so compilation fails.
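To confirm the point that the block-comment form is safe, here is a minimal sketch (hypothetical names): the escape still becomes a carriage return, but a /* */ comment does not stop at line ends, so the assignment after it stays inside the comment.

```java
public class BlockCommentCarriageReturn {
    static int value() {
        int x = 1;
        // The escape in the next comment becomes a carriage return,
        // but block comments span lines, so nothing after it is compiled.
        /* harmless in a block comment: \u000d x = 2; */
        return x;
    }

    public static void main(String[] args) {
        System.out.println(value()); // prints 1
    }
}
```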