File: UnsafeLocaleUsage.md

The Java `Locale` API has several pitfalls that are easy to fall into and should
be avoided. Some examples of error-prone usage follow:

#### Constructors

The constructors don't validate their parameters at all; they simply trust
whatever is passed in.

For example:

```
Locale locale = new Locale("en_AU");
locale.toString()    : "en_au"
locale.getLanguage() : "en_au"
locale.getCountry()  : ""

locale = new Locale("somethingBad#!34, too long, and clearly not a locale ID");
locale.toString()    : "somethingbad#!34, too long, and clearly not a locale id"
locale.getLanguage() : "somethingbad#!34, too long, and clearly not a locale id"
locale.getCountry()  : ""
```

As you can see, the full string is interpreted as language, and the country is
empty.
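The lack of validation can be made visible by comparing the constructor with `Locale.Builder`, which does check its input. This is a minimal sketch; the class and method names (`ConstructorVsBuilder`, `unchecked`, `isWellFormedLanguage`) are hypothetical, and `Locale.Builder` is only used here as a convenient well-formedness check, not as the fix suggested by the checker.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class ConstructorVsBuilder {

    /** Builds a Locale through the non-validating single-argument constructor. */
    @SuppressWarnings("deprecation") // the Locale constructors are deprecated in recent JDKs
    static Locale unchecked(String language) {
        return new Locale(language);
    }

    /** Returns true if Locale.Builder accepts the string as a language subtag. */
    static boolean isWellFormedLanguage(String language) {
        try {
            new Locale.Builder().setLanguage(language);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Locale locale = unchecked("en_AU");
        // The constructor accepts the whole string as the language.
        System.out.println("language: " + locale.getLanguage());
        System.out.println("country empty: " + locale.getCountry().isEmpty());
        // The builder rejects the same input.
        System.out.println("well-formed \"en_AU\": " + isWellFormedLanguage("en_AU"));
        System.out.println("well-formed \"en\": " + isWellFormedLanguage("en"));
    }
}
```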

For `new Locale("zh", "tw", "#Hant")` you get:

```
toString()    : zh_TW_#Hant
getLanguage() : zh
getCountry()  : TW
getScript()   :
getVariant()  : #Hant
```

And for `Locale.forLanguageTag("zh-hant-tw")` you get a different result:

```
toString()    : zh_TW_#Hant
getLanguage() : zh
getCountry()  : TW
getScript()   : Hant
getVariant()  :
```

We can see that while the `toString()` values of both locales are identical,
the individual parts are different. More specifically, the first locale is
incorrect since `#Hant` is supposed to be the script for the locale rather than
the variant. \
There's no reliable way of getting a correct result through a `Locale`
constructor, so we should prefer using `Locale.forLanguageTag()` (and the IETF
BCP 47 format) for correctness.
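A practical consequence of the comparison above is that the two locales are not equal, even though they print the same. The following sketch illustrates this; the class and method names are hypothetical and chosen only for this example.

```java
import java.util.Locale;

final class ScriptVersusVariant {

    /** Locale built through the constructor: "#Hant" ends up in the variant. */
    @SuppressWarnings("deprecation") // the Locale constructors are deprecated in recent JDKs
    static Locale viaConstructor() {
        return new Locale("zh", "tw", "#Hant");
    }

    /** Locale built from a BCP 47 tag: "Hant" ends up in the script. */
    static Locale viaLanguageTag() {
        return Locale.forLanguageTag("zh-hant-tw");
    }

    public static void main(String[] args) {
        Locale a = viaConstructor();
        Locale b = viaLanguageTag();
        System.out.println("same toString(): " + a.toString().equals(b.toString()));
        System.out.println("equal locales:   " + a.equals(b));
        System.out.println("constructor script/variant:    [" + a.getScript() + "] [" + a.getVariant() + "]");
        System.out.println("forLanguageTag script/variant: [" + b.getScript() + "] [" + b.getVariant() + "]");
    }
}
```

Code that uses locales as map keys or compares them with `equals()` would treat these two values as different locales.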

**Note:** You might see a `.replace("_", "-")` appended to a suggested fix for
the error prone checker for this bug pattern. This is a sanitization measure to
handle the fact that `Locale.forLanguageTag()` accepts the "minus form" of a tag
(`en-US`) but not the "underscore form" (`en_US`). If the underscore form is
passed in, `Locale.forLanguageTag()` silently returns `Locale.ROOT`.
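The effect of the sanitization can be sketched as follows. The helper name `parseSanitized` and the class name are hypothetical; only `Locale.forLanguageTag()` and `String.replace()` come from the JDK.

```java
import java.util.Locale;

final class UnderscoreSanitization {

    /** Converts the underscore form to the minus form before parsing. */
    static Locale parseSanitized(String localeId) {
        return Locale.forLanguageTag(localeId.replace("_", "-"));
    }

    public static void main(String[] args) {
        // Without sanitization the tag is ill-formed and the language is empty.
        Locale raw = Locale.forLanguageTag("en_US");
        System.out.println("raw language empty: " + raw.getLanguage().isEmpty());

        // With sanitization the tag parses as expected.
        Locale sanitized = parseSanitized("en_US");
        System.out.println("sanitized: " + sanitized.getLanguage() + " / " + sanitized.getCountry());
    }
}
```

Note that the replacement only helps for simple identifiers such as `en_US`; it does not turn an arbitrary `toString()` value like `zh_TW_#Hant` into a well-formed tag.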

#### toString()

This poses the inverse of the constructor problem:

```
Locale myLocale = Locale.forLanguageTag("zh-hant-tw");
String myLocaleStr = myLocale.toString(); // zh_TW_#Hant
Locale derivedLocale = ???; // No clean way to get a correct locale from this string
```

The `toString()` implementation for `Locale` isn't necessarily incorrect in
itself. \
It is intended to be a _"concise but informative representation that is easy for a
person to read"_ (see documentation at
[Object.toString()](https://docs.oracle.com/javase/6/docs/api/java/lang/Object.html#toString\(\))).

So it is not intended to produce a value that can be turned back into a
`Locale`. It is not a serialization format. \
It often produces a value that _looks_ like a locale identifier, but it is not.
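If a string form that can be parsed back is needed, one option (not covered above, so treat it as an assumption about what fits your use case) is to pair `Locale.toLanguageTag()` with `Locale.forLanguageTag()`. A minimal sketch with hypothetical helper names:

```java
import java.util.Locale;

final class LanguageTagRoundTrip {

    /** Produces a BCP 47 language tag instead of the toString() form. */
    static String serialize(Locale locale) {
        return locale.toLanguageTag();
    }

    /** Parses a BCP 47 language tag back into a Locale. */
    static Locale deserialize(String languageTag) {
        return Locale.forLanguageTag(languageTag);
    }

    public static void main(String[] args) {
        Locale original = Locale.forLanguageTag("zh-hant-tw");

        String tag = serialize(original);
        System.out.println("tag: " + tag);
        System.out.println("round trip equal: " + deserialize(tag).equals(original));

        // Feeding the toString() value to forLanguageTag() does not work:
        // the string is not a well-formed tag, so the language comes back empty.
        Locale fromToString = Locale.forLanguageTag(original.toString());
        System.out.println("from toString() language empty: " + fromToString.getLanguage().isEmpty());
    }
}
```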