Context about locale

  • ISO 639 - defines a code for each language

  • ISO 639-1 - uses a two-letter code for each language

    Two letters (26*26=676 combinations) cannot cover all languages (6000+), so

  • ISO 639-2 - uses a three-letter code for each language
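
The counting argument can be checked with a few lines of Python. This is only a sketch: the helper name code_space is made up for this note, and it assumes codes are built from the 26 letters a-z.

```python
import string

# Codes are assumed to use only the 26 lowercase ASCII letters.
LETTERS = len(string.ascii_lowercase)


def code_space(length: int) -> int:
    """Return how many distinct codes of the given length exist."""
    return LETTERS ** length


print(code_space(2))  # 676, fewer than the 6000+ languages
print(code_space(3))  # 17576, enough room for all of them
```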

But these codes identify only the language. A language also varies with the writing system and with the region where it is spoken. For example, the Traditional Chinese used in Taiwan differs from the one used in Hong Kong and Macau, Slovene has the Resian and Nadiza dialects, etc.

This is where RFC 4646 comes in (it has since been obsoleted by RFC 5646; both belong to BCP 47). The document suggests building a tag like this:

language-script-region
  • language: the ISO 639 language code, e.g. zh (Chinese)
  • script: the writing system, e.g. zh-Hans (Simplified Chinese)
  • region: where the language is used, e.g. zh-Hans-SG (Simplified Chinese in Singapore)
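
As a rough illustration of the language-script-region shape, here is a minimal Python sketch that splits a tag into its three parts. The names LanguageTag and parse_tag are hypothetical, and the pattern covers only this simplified shape (2-3 letter language, optional 4-letter script, optional 2-letter or 3-digit region). It is not a full RFC 4646 parser, and it does not check the subtags against any registry.

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


# Simplified shape only: language[-script][-region].
_TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag such as 'zh-Hans-SG' into language, script, region."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a language-script-region tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    # Conventional casing: language lower, script Title, region UPPER.
    return LanguageTag(
        match.group("language").lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )


print(parse_tag("zh-Hans-SG"))
# LanguageTag(language='zh', script='Hans', region='SG')
```

Script and region are both optional here, so parse_tag("zh") and parse_tag("zh-TW") are accepted as well.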

A tag can carry more than language, script, and region, such as variant, extension, and private-use subtags (see "The syntax of the language tag" in RFC 4646).

Note that the subtags used to build a language tag must be registered with IANA.

The list of IANA-registered subtags can be found here:

https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

And the locales available in ICU: http://www.localeplanet.com/icu/
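
The IANA registry linked above is a plain-text file made of records separated by "%%" lines, each holding "Field: value" pairs. The Python sketch below shows one way such records might be read to check whether a subtag is registered. The SAMPLE text is a hand-written, abridged excerpt for illustration and not a copy of the real file, and parse_registry is a hypothetical helper. It keeps only the last value when a field repeats, which the real registry does for some Description fields.

```python
from typing import Dict, List

# Hand-written, abridged sample in the registry's record format.
SAMPLE = """\
%%
Type: language
Subtag: zh
Description: Chinese
%%
Type: script
Subtag: Hans
Description: Han (Simplified variant)
%%
Type: region
Subtag: SG
Description: Singapore
"""


def parse_registry(text: str) -> List[Dict[str, str]]:
    """Parse '%%'-separated records of 'Field: value' lines."""
    records: List[Dict[str, str]] = []
    record: Dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: store the finished record, if any.
            if record:
                records.append(record)
            record, last_key = {}, None
        elif not line.strip():
            continue
        elif line[0].isspace() and last_key is not None:
            # Indented line: continuation of the previous field.
            record[last_key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            record[last_key] = value.strip()
    if record:
        records.append(record)
    return records


registered = {(r["Type"], r["Subtag"]) for r in parse_registry(SAMPLE)}
print(("script", "Hans") in registered)  # True
```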