JDK dateformatter parsing DayOfWeek in German locale, Java 8 vs Java 9

This behavior appears to be intentional in Java 9. It follows from the switch to CLDR locale data, including CLDR's date-time patterns, implemented by JEP 252, which states:

Use locale data from the Unicode Consortium's Common Locale Data Repository (CLDR) by default.

Localized patterns for the formatting and translation of display strings, such as the locale name, may be different in some locales.

To enable behavior compatible with JDK 8, set the system property java.locale.providers to a value with COMPAT ahead of CLDR.


On the data side, the ICU (International Components for Unicode) data for the German locale contain the following relevant information, which supports the view that the behavior is intentional:

[Image: relevant ICU locale data for German]

Edit/Note: As linked in the comments, the JDK 9 migration guide gives a similar warning about such cases:

If your application starts successfully, look carefully at your tests and ensure that the behavior is the same as on JDK 8. For example, a few early adopters have noticed that their dates and currencies are formatted differently. See Use CLDR Locale Data by Default.
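If the input's day abbreviations are fixed (for example always without a dot), one way to stop depending on whether CLDR or COMPAT data are in use is to give the formatter its own texts through DateTimeFormatterBuilder.appendText(TemporalField, Map). This is a sketch of an alternative, not part of the answer above; the class name, the dd.MM.uuuu layout and the day names in the map are assumptions to adapt to your actual input.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class GermanDayNamesSketch {
    // Assumed input texts: short German day names without a dot,
    // keyed by ChronoField.DAY_OF_WEEK value (1 = Monday ... 7 = Sunday).
    private static final Map<Long, String> SHORT_DAYS = new HashMap<>();
    static {
        String[] names = {"Mo", "Di", "Mi", "Do", "Fr", "Sa", "So"};
        for (int i = 0; i < names.length; i++) {
            SHORT_DAYS.put((long) i + 1, names[i]);
        }
    }

    // The day name comes from our own map, so formatting and parsing it no
    // longer depend on which locale provider (CLDR or COMPAT) the JVM uses.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendText(ChronoField.DAY_OF_WEEK, SHORT_DAYS)
            .appendPattern(", dd.MM.uuuu")
            .toFormatter(Locale.GERMAN);

    static LocalDate parse(String text) {
        // The parsed day of week is cross-checked against the date itself
        return LocalDate.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        LocalDate date = parse("Mo, 15.07.2019");
        System.out.println(date);                   // 2019-07-15
        System.out.println(date.format(FORMATTER)); // Mo, 15.07.2019
    }
}
```

Because the map is copied when the formatter is built, the same formatter gives the same result on Java 8 and on later versions regardless of java.locale.providers.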

SimpleDateFormat .format() gives different results in Java8 vs. Java11

TL;DR

Run your program on Java 9 or later (including Java 11) with the system property java.locale.providers defined like this:

java -Djava.locale.providers=COMPAT,CLDR YourApp

Now the output has no dots and is in the same format as on Java 8, for example:

Tue, 16 Jul 2019 14:24:15 AEST

CLDR

Java gets its locale data, including the abbreviations used for days of the week and for months in different languages, from up to four sources. Up to and including Java 8, Java’s own locale data were the default. From Java 8, locale data from the Unicode Common Locale Data Repository (CLDR; see links at the bottom) are included too, and from Java 9 they are the default. Java’s own data are still included and accessible by specifying COMPAT in the above system property. We need to put COMPAT first in the string because the sources are tried in turn.
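To see which data the running JVM actually uses, a small check like the following can be run once with the default settings and once with -Djava.locale.providers=COMPAT,CLDR. It is a sketch; the class name and the chosen locales are only examples. Keep in mind that COMPAT was deprecated in JDK 21 and removed in JDK 23, so that setting is available from JDK 9 through JDK 22 only (on JDK 8 the corresponding name is JRE).

```java
import java.time.DayOfWeek;
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

public class LocaleDataCheck {
    // The names come from the first provider in java.locale.providers that
    // supports the locale (CLDR by default since Java 9).
    static String shortNames(Locale locale) {
        return DayOfWeek.TUESDAY.getDisplayName(TextStyle.SHORT, locale)
                + " / " + Month.JULY.getDisplayName(TextStyle.SHORT, locale);
    }

    public static void main(String[] args) {
        // null when the property was not set on the command line
        System.out.println("java.locale.providers = "
                + System.getProperty("java.locale.providers"));
        System.out.println("en-AU: " + shortNames(Locale.forLanguageTag("en-AU")));
        System.out.println("de:    " + shortNames(Locale.GERMAN));
    }
}
```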

One might have expected that another (and perhaps even nicer) solution would be to use CLDR in all Java versions. Curiously, this doesn’t give us the same format on all Java versions in this case. Here is the output when setting the property to CLDR,JRE (JRE is the old name for COMPAT; on Java 8 we need to use JRE instead).

On Java 8:

Tue, 16 Jul 2019 14:35:02 AEST

On Java 9 and 11:

Tue., 16 Jul. 2019 14:35:52 AEST

CLDR comes in versions, and different Java versions bundle different CLDR versions.

java.time

Here’s the snippet I have used for the above outputs.

    DateTimeFormatter formatter = DateTimeFormatter.ofPattern(
            "EEE, dd MMM yyyy HH:mm:ss zzz", Locale.forLanguageTag("en-AU"));
    ZonedDateTime now = ZonedDateTime.now(ZoneId.of("Australia/Sydney"));
    System.out.println(now.format(formatter));

I am using and recommending java.time, the modern Java date and time API. The date-time classes that you used, SimpleDateFormat and Date, are long outdated and were always poorly designed, so I recommend avoiding them. On Java 8 and later there’s certainly no reason why we should use them, and java.time has been backported to Java 6 and 7 too.
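Because the snippet uses now(), its output changes on every run. To compare Java versions or locale-provider settings side by side, a fixed moment is handier. A minimal sketch; the class name and the sample time are illustrative:

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class FixedTimeDemo {
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern(
            "EEE, dd MMM yyyy HH:mm:ss zzz", Locale.forLanguageTag("en-AU"));

    static String formatSample() {
        // A fixed moment instead of now(), so runs on different Java versions
        // or with different java.locale.providers values can be compared directly
        ZonedDateTime sample = ZonedDateTime.of(
                2019, 7, 16, 14, 24, 15, 0, ZoneId.of("Australia/Sydney"));
        return sample.format(FORMATTER);
    }

    public static void main(String[] args) {
        // Dots and abbreviations depend on the locale data in use
        System.out.println(formatSample());
    }
}
```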

Links

  • CLDR - Unicode Common Locale Data Repository
  • Wikipedia article: Common Locale Data Repository
  • Use CLDR Locale Data by Default in Java Platform, Standard Edition Oracle JDK 9 Migration Guide
  • LocaleServiceProvider documentation spelling out the possible locale data sources: CLDR, COMPAT and more.
  • Oracle tutorial: Date Time explaining how to use java.time.

Parsing of weekdays not working for locale german

Although in English there is no difference between the name of the day when it is used alone and the name as it is used within the context of a date, in German, apparently, there is.

The pattern eee corresponds to TextStyle.SHORT, while the pattern ccc corresponds to TextStyle.SHORT_STANDALONE. Thus, if you try to parse a day name that was created by TextStyle.SHORT_STANDALONE with eee in the languages where it matters, the parsing will fail.

The way to go is ccc for the standalone version.

The documentation mentioning this is actually in the DateTimeFormatterBuilder API rather than DateTimeFormatter's.
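A minimal sketch to compare the two forms on your own JVM and to parse a standalone day name with ccc. The class and method names are illustrative, and the printed texts depend on the locale data in use:

```java
import java.time.DayOfWeek;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class StandaloneDayDemo {
    // "eee" = TextStyle.SHORT (format form), "ccc" = TextStyle.SHORT_STANDALONE
    static final DateTimeFormatter FORMAT_FORM =
            DateTimeFormatter.ofPattern("eee", Locale.GERMAN);
    static final DateTimeFormatter STANDALONE_FORM =
            DateTimeFormatter.ofPattern("ccc", Locale.GERMAN);

    // Parses a day name that stands on its own, e.g. a column of weekday names
    static DayOfWeek parseStandalone(String text) {
        return STANDALONE_FORM.parse(text, DayOfWeek::from);
    }

    public static void main(String[] args) {
        // Print both forms side by side for every day of the week
        for (DayOfWeek day : DayOfWeek.values()) {
            System.out.println(FORMAT_FORM.format(day) + "\t" + STANDALONE_FORM.format(day));
        }
        // Round trip through the standalone form
        String monday = STANDALONE_FORM.format(DayOfWeek.MONDAY);
        System.out.println(monday + " -> " + parseStandalone(monday));
    }
}
```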

Java Time parse Dates with short day names

The modern Date-Time API is very particular about the pattern, so it is almost impossible to create a single pattern that you can use to parse all types of strings. However, one of the greatest features of DateTimeFormatter is its flexibility to work with optional patterns, specified using square brackets. For example, the following demo uses E, d [MMMM][MMM][M] u H:m:s Z, which has three optional patterns for the month.

Demo:

    import java.time.DateTimeException;
    import java.time.Instant;
    import java.time.format.DateTimeFormatter;
    import java.util.Locale;
    import java.util.stream.Stream;

    public class Main {
        public static void main(String[] args) {
            Stream.of(
                    "So., 18 Juli 2021 15:24:00 +0200",
                    "ven., 16 avr. 2021 15:24:00 +0200",
                    "vr, 16 apr. 2021 15:24:00 +0200",
                    "vr, 16 07 2021 15:24:00 +0200"
            ).forEach(s -> {
                Stream.of(
                        Locale.GERMANY,
                        Locale.FRANCE,
                        new Locale("nl", "NL")
                ).forEach(locale -> {
                    try {
                        System.out.println("Parsed '" + s + "' using the locale, " + locale + " => " + parseToInstant(s, locale));
                    } catch (DateTimeException e) {
                        // The string does not match this locale; move on to the next one
                    }
                });
            });
        }

        static Instant parseToInstant(String strDateTime, Locale locale) {
            // [MMMM][MMM][M]: full month name, abbreviated name or month number
            return DateTimeFormatter.ofPattern("E, d [MMMM][MMM][M] u H:m:s Z").withLocale(locale)
                    .parse(strDateTime, Instant::from);
        }
    }

Output:

Parsed 'So., 18 Juli 2021 15:24:00 +0200' using the locale, de_DE => 2021-07-18T13:24:00Z
Parsed 'ven., 16 avr. 2021 15:24:00 +0200' using the locale, fr_FR => 2021-04-16T13:24:00Z
Parsed 'vr, 16 apr. 2021 15:24:00 +0200' using the locale, nl_NL => 2021-04-16T13:24:00Z
Parsed 'vr, 16 07 2021 15:24:00 +0200' using the locale, nl_NL => 2021-07-16T13:24:00Z

Learn more about the Date-Time patterns from DateTimeFormatterBuilder.

Java DateTimeFormatter and LocalDateTime.parse trouble

In your formatter builder, the Norwegian language is set by this line:

    .toFormatter(Locale.forLanguageTag("no"));

Either set the locale to English by using the language tag en, or provide Norwegian month names (with a dot at the end of abbreviated names) such as jan., feb., mar., apr., mai (no dot, since mai is the full month name), etc.


EDIT:
After additional research, I've found that you can parse Norwegian months without an additional dot at the end. To accomplish that, you need to use a standalone format for a month (LLL instead of MMM).

So your code could look like this:

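    public LocalDateTime getDateForIception() {
        // Read the timestamp text from the page (Selenium), then drop the letters
        // and dots that follow the first three letters of the month, e.g. "jan." -> "jan"
        String tid = driver.findElement(By.cssSelector("div.hendelse-tid.hb-tekst--ingenBryting"))
                .getText().replaceAll("(?<=[A-Za-z]{3})[.a-z]{1,2}", "");
        // Zero-pad a one-digit day so that it matches "dd"
        if (tid.split("\\.")[0].length() == 1) {
            tid = "0" + tid;
        }

        // LLL = standalone abbreviated month, which in Norwegian has no trailing dot
        return DatoUtils.parseDatoLocalDateTime(tid, "dd. LLL yyyy HH:mm");
    }

    public static LocalDateTime parseDatoLocalDateTime(String datoString, String pattern) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern(pattern)
                .toFormatter(Locale.forLanguageTag("no"));
        return LocalDateTime.parse(datoString, formatter);
    }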
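To try the parsing part without a browser, the formatter can be exercised on its own. In this sketch (class and constant names are illustrative, and the sample date is arbitrary) the Norwegian formatter is checked by a format/parse round trip, because the exact month texts depend on the CLDR version, while the English formatter parses a fixed string:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

public class NorwegianMonthSketch {
    // Same builder steps as parseDatoLocalDateTime, with the locale as a parameter
    static DateTimeFormatter formatter(String pattern, Locale locale) {
        return new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern(pattern)
                .toFormatter(locale);
    }

    // LLL = standalone short month name
    static final DateTimeFormatter NORWEGIAN =
            formatter("dd. LLL yyyy HH:mm", Locale.forLanguageTag("no"));

    // For input with English month abbreviations, use the language tag en instead
    static final DateTimeFormatter ENGLISH =
            formatter("dd. MMM yyyy HH:mm", Locale.forLanguageTag("en"));

    public static void main(String[] args) {
        LocalDateTime sample = LocalDateTime.of(2021, 1, 5, 14, 30);
        String norwegian = sample.format(NORWEGIAN);
        System.out.println(norwegian + " -> " + LocalDateTime.parse(norwegian, NORWEGIAN));
        System.out.println(LocalDateTime.parse("05. Jan 2021 14:30", ENGLISH)); // 2021-01-05T14:30
    }
}
```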

September's short form Sep no longer parses in Java 17 in en_GB locale

It seems that in the en_GB locale, the short form of September is now "Sept", not "Sep". All the other months have the same three-letter abbreviations as in en_US. Kind of makes sense. As a Brit, "Sep" looks wrong to me.

This is the ticket: https://bugs.openjdk.java.net/browse/JDK-8251317

It wasn't a deliberate decision by the JDK authors. The locale data used by default in Java come from the Common Locale Data Repository (CLDR), a project of the Unicode Consortium. Newer versions of Java bundle newer versions of CLDR, so locale behavior occasionally changes. In that sense, the change you encountered is a feature, not a bug.

Yours is just one of many small tweaks.

Here's the specific change in the PR which broke it for you:
https://github.com/openjdk/jdk/pull/1279/files#diff-97210acd6f77c4f4979c43445d60ba1c369f058230e41177dceca697800b1fa2R116
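If your input keeps the three-letter Sep, one workaround, assuming the rest of the input also matches US English conventions, is to parse with Locale.US, whose locale data still abbreviate September as Sep. A sketch with illustrative names:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class SeptemberSketch {
    // en-US data abbreviate September as "Sep"; newer en-GB data use "Sept"
    static final DateTimeFormatter US_PARSER =
            DateTimeFormatter.ofPattern("dd MMM uuuu", Locale.US);
    static final DateTimeFormatter GB_FORMATTER =
            DateTimeFormatter.ofPattern("dd MMM uuuu", Locale.UK);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, US_PARSER);
    }

    public static void main(String[] args) {
        LocalDate date = parse("16 Sep 2021");
        System.out.println(date); // 2021-09-16
        // Output depends on the CLDR version bundled with the running JDK
        System.out.println(date.format(GB_FORMATTER));
    }
}
```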

Java unable to parse date when using dots . instead of dashes -

java.time

I strongly agree with the comments recommending java.time, the modern Java date and time API, for your date and time work.

Use this formatter:

    private static final DateTimeFormatter FORMATTER
            = DateTimeFormatter.ofPattern("dd.MMMuuuu HH:mm:ss", Locale.GERMAN);

Do:

    LocalDateTime dateTime = LocalDateTime.parse("01.Jan.2017 00:47:13", FORMATTER);
    System.out.println(dateTime);

Output when run on Java 11 with German locale:

2017-01-01T00:47:13

You may wonder why, compared to your format pattern string, I have left out the second dot. On Java 11 (and probably nearby versions too, possibly Java 9 through 16), German month abbreviations end with a dot to signify abbreviation, so Jan. for Januar (January), etc. So in my format, MMM matches Jan. and then uuuu matches 2017.
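If the code must run on JVMs whose German data disagree about the dot (for example with a different java.locale.providers setting), an optional literal in square brackets can accept both Jan. and Jan followed by a separator dot. This is a sketch of an alternative, not the formatter used above, and it is meant for parsing only: when formatting, the optional dot is always printed.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class GermanMonthDotSketch {
    // [.] is an optional literal dot: it is consumed only when the month text
    // itself did not already swallow the dot (e.g. "Jan" from non-CLDR data)
    static final DateTimeFormatter PARSER =
            DateTimeFormatter.ofPattern("dd.MMM[.]uuuu HH:mm:ss", Locale.GERMAN);

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, PARSER);
    }

    public static void main(String[] args) {
        System.out.println(parse("01.Jan.2017 00:47:13")); // 2017-01-01T00:47:13
    }
}
```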

For the fuller story: Java gets its locale data, including the month abbreviations used in different locales, from up to four sources, and not all sources agree on what German month abbreviations look like. Since Java 9 the default is CLDR,COMPAT, meaning that locale data from CLDR, the Unicode Common Locale Data Repository, are preferred. And these include the dots I mentioned. You can get different results by setting the system property java.locale.providers to a value that does not begin with CLDR.

What went wrong in your code?

I have given you a hint already: in some Java versions German month abbreviations end with a dot. So in your example with dots as separators, your SimpleDateFormat matched dd.MMM (without the second dot) to 01.Jan. (with the second dot). According to the format a dot should come next, but since that dot had already been consumed, SimpleDateFormat looked at 2017, decided it wasn’t a dot and threw the exception that you saw.

The really surprising behaviour was in your first example where SimpleDateFormat was able to parse 01-Jan-2017 00:47:13 without any dots even though it believes that the month abbreviation should end in a dot. I have seen literally hundreds of examples of surprising behaviour of SimpleDateFormat before, but never any akin to this one.

And all of these surprises are what make me say: By all means avoid using SimpleDateFormat.

If you’re skeptical, I don’t blame you. So to demonstrate:

    SimpleDateFormat formatter = new SimpleDateFormat("MMMyyyy", Locale.GERMAN);

    System.out.println(formatter.format(0L));

    System.out.println(formatter.parse("Jan2017"));
    System.out.println(formatter.parse("Jan.2017"));

Output, still on Java 11:

Jan.1970
Sun Jan 01 00:00:00 CET 2017
Sun Jan 01 00:00:00 CET 2017

We see that SimpleDateFormat formats the month with a dot, and is able to parse strings both with and without dots.

This still isn’t the full story.

    SimpleDateFormat formatter = new SimpleDateFormat("MMM", Locale.GERMAN);

    System.out.println(formatter.format(0L));

Output:

Jan

This time the month abbreviation was formatted without the dot. I have no idea what is going on. I repeat: forget about the confusing SimpleDateFormat class. It’s a notorious troublemaker.

Links

  • Oracle tutorial: Date Time explaining how to use java.time.
  • JDK dateformatter parsing DayOfWeek in German locale, java8 vs java9, a question about German locale data from different sources.
  • How to parse month full form string using DateFormat in Java?, a question about forgetting to specify a locale for parsing.

