List of All Locales and Their Short Codes

The importance of locales is that your environment/OS can provide formatting functionality for all installed locales, even ones you don't know about when you write your application. My Windows 7 system has 211 locales installed (listed below), so you probably wouldn't write custom code or translations specific to that many locales.

Edit: The original list has been extended with locales that were not included before; 228 are now listed.

For the various versions of English, the most important differences are in how numbers and dates are formatted. Other differences matter to the extent that you want to, and are able to, cater to specific variations.
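To make this concrete, here is a small JavaScript sketch that formats the same number and date for a few English locales using the Intl API, which in Node.js and modern browsers draws on ICU/CLDR locale data. The locales and sample values are arbitrary, and exact output can vary with the runtime's ICU version, but you should see en-US and en-GB disagree on day/month order and en-IN group digits in lakhs (12,34,567).

```javascript
// Format one number and one date for several English locales, relying
// entirely on the runtime's built-in (ICU/CLDR) locale data.
const englishLocales = ["en-US", "en-GB", "en-IN", "en-ZA"];

const amount = 1234567.891;
// Date.UTC months are zero-based: 2 means March.
const date = new Date(Date.UTC(2021, 2, 4));

function formatSamples(locales) {
  return locales.map((locale) => ({
    locale,
    number: new Intl.NumberFormat(locale).format(amount),
    // Pin the time zone so the calendar date does not depend on the host.
    date: new Intl.DateTimeFormat(locale, { timeZone: "UTC" }).format(date),
  }));
}

for (const row of formatSamples(englishLocales)) {
  console.log(`${row.locale}: ${row.number} | ${row.date}`);
}
```

No per-locale code is written here; adding another English variant is just another string in the array.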

af-ZA
am-ET
ar-AE
ar-BH
ar-DZ
ar-EG
ar-IQ
ar-JO
ar-KW
ar-LB
ar-LY
ar-MA
arn-CL
ar-OM
ar-QA
ar-SA
ar-SD
ar-SY
ar-TN
ar-YE
as-IN
az-az
az-Cyrl-AZ
az-Latn-AZ
ba-RU
be-BY
bg-BG
bn-BD
bn-IN
bo-CN
br-FR
bs-Cyrl-BA
bs-Latn-BA
ca-ES
co-FR
cs-CZ
cy-GB
da-DK
de-AT
de-CH
de-DE
de-LI
de-LU
dsb-DE
dv-MV
el-CY
el-GR
en-029
en-AU
en-BZ
en-CA
en-cb
en-GB
en-IE
en-IN
en-JM
en-MT
en-MY
en-NZ
en-PH
en-SG
en-TT
en-US
en-ZA
en-ZW
es-AR
es-BO
es-CL
es-CO
es-CR
es-DO
es-EC
es-ES
es-GT
es-HN
es-MX
es-NI
es-PA
es-PE
es-PR
es-PY
es-SV
es-US
es-UY
es-VE
et-EE
eu-ES
fa-IR
fi-FI
fil-PH
fo-FO
fr-BE
fr-CA
fr-CH
fr-FR
fr-LU
fr-MC
fy-NL
ga-IE
gd-GB
gd-ie
gl-ES
gsw-FR
gu-IN
ha-Latn-NG
he-IL
hi-IN
hr-BA
hr-HR
hsb-DE
hu-HU
hy-AM
id-ID
ig-NG
ii-CN
in-ID
is-IS
it-CH
it-IT
iu-Cans-CA
iu-Latn-CA
iw-IL
ja-JP
ka-GE
kk-KZ
kl-GL
km-KH
kn-IN
kok-IN
ko-KR
ky-KG
lb-LU
lo-LA
lt-LT
lv-LV
mi-NZ
mk-MK
ml-IN
mn-MN
mn-Mong-CN
moh-CA
mr-IN
ms-BN
ms-MY
mt-MT
nb-NO
ne-NP
nl-BE
nl-NL
nn-NO
no-no
nso-ZA
oc-FR
or-IN
pa-IN
pl-PL
prs-AF
ps-AF
pt-BR
pt-PT
qut-GT
quz-BO
quz-EC
quz-PE
rm-CH
ro-mo
ro-RO
ru-mo
ru-RU
rw-RW
sah-RU
sa-IN
se-FI
se-NO
se-SE
si-LK
sk-SK
sl-SI
sma-NO
sma-SE
smj-NO
smj-SE
smn-FI
sms-FI
sq-AL
sr-BA
sr-CS
sr-Cyrl-BA
sr-Cyrl-CS
sr-Cyrl-ME
sr-Cyrl-RS
sr-Latn-BA
sr-Latn-CS
sr-Latn-ME
sr-Latn-RS
sr-ME
sr-RS
sr-sp
sv-FI
sv-SE
sw-KE
syr-SY
ta-IN
te-IN
tg-Cyrl-TJ
th-TH
tk-TM
tlh-QS
tn-ZA
tr-TR
tt-RU
tzm-Latn-DZ
ug-CN
uk-UA
ur-PK
uz-Cyrl-UZ
uz-Latn-UZ
uz-uz
vi-VN
wo-SN
xh-ZA
yo-NG
zh-CN
zh-HK
zh-MO
zh-SG
zh-TW
zu-ZA
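Some entries in this list are legacy or non-canonical forms: lower-case regions such as az-az and no-no, and the old ISO 639 codes iw (now he) and in (now id). One way to check a list like this is JavaScript's Intl.getCanonicalLocales(), which normalizes subtag case and, in current engines, applies CLDR alias replacements. This is a sketch; the helper name canonicalize is illustrative, and the exact canonical forms depend on the engine's CLDR version.

```javascript
// Report the canonical BCP 47 form of each tag, or the error for malformed ones.
function canonicalize(tags) {
  return tags.map((tag) => {
    try {
      // Returns an array; we pass a single tag, so take the first element.
      const [canonical] = Intl.getCanonicalLocales(tag);
      return { tag, canonical, changed: canonical !== tag };
    } catch (err) {
      // RangeError: not a structurally valid language tag (e.g. "sr_RS").
      return { tag, canonical: null, changed: true, error: err.message };
    }
  });
}

// A few entries from the list, plus one underscore-style ID for contrast.
const sample = ["az-az", "iw-IL", "in-ID", "no-no", "en-US", "sr-Latn-RS", "sr_RS"];
for (const r of canonicalize(sample)) {
  console.log(`${r.tag} -> ${r.canonical ?? "invalid: " + r.error}`);
}
```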

Map array of locales and change each locale to language name

How about using an object as a lookup table?

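// Lookup table from locale code to display name
const clAlias = {
  "en-US": "English",
  "de-DE": "Deutsch"
};

// Hypothetical input array; the snippet only referred to it as al
const al = ["en-US", "de-DE", "fr-FR"];

// Fall back to the code itself when the table has no entry
const names = al.map(x => clAlias[x] ?? x); // ["English", "Deutsch", "fr-FR"]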
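If you would rather not maintain the table by hand, modern runtimes can supply the names from CLDR data through Intl.DisplayNames (available in Node.js 14+ and current browsers). A minimal sketch, assuming the same kind of input array; the exact wording of names (for example "American English" for en-US) depends on the runtime's CLDR version.

```javascript
// Hypothetical input array, as in the snippet above.
const al = ["en-US", "de-DE", "fr-FR"];

// Names for the full tags, in one fixed UI language (English).
const inEnglish = new Intl.DisplayNames(["en"], { type: "language" });
const englishNames = al.map((x) => inEnglish.of(x));

// Each language's name in that language itself (an endonym), which matches
// the style of the hand-written table ("Deutsch" for de-DE).
const endonyms = al.map((x) => {
  const language = new Intl.Locale(x).language; // "de-DE" -> "de"
  return new Intl.DisplayNames([x], { type: "language" }).of(language);
});

console.log(englishNames);
console.log(endonyms);
```

The endonym form is usually what a language picker wants, since users look for their own language's name.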

Is there a way to get the list of available locales in PHP?

Part of the confusion here is that PHP has two concepts called "locale" that are pretty much totally separate.

The first is the older one, which basically just uses the C library's locale features. That's what's behind setlocale and the locale support in some of PHP's functions (money_format, for example, which was removed in PHP 8.0). This is what the other answers that mention running locale -a on the command line and using setlocale are talking about.

PHP's Locale class and the other related functionality from the intl extension is newer and doesn't work the same way. Instead of using the libc locale machinery, it uses a library called ICU, which ships its own locale data. PHP does provide a method to determine which locales this system supports: ResourceBundle::getLocales. The documentation is a little woolly here, but you can call it as a static method and pass an empty string to use ICU's default resources, which gives you a list of the locales supported by intl:

ResourceBundle::getLocales('');
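For comparison, JavaScript's Intl API (also ICU-backed in Node.js and most browsers) has no direct equivalent that enumerates every locale, but it can filter a candidate list down to the locales the runtime can format for. A sketch; the candidate tags are arbitrary and "xx-XX" is deliberately bogus.

```javascript
// Candidate tags; "xx-XX" is deliberately bogus to show the filtering.
const candidates = ["en-US", "de-CH", "sr-Latn-RS", "xx-XX"];

// "lookup" matching: a tag is kept if it, or a truncated parent of it
// (sr-Latn-RS -> sr-Latn -> sr), has formatting data in this runtime.
const supported = Intl.DateTimeFormat.supportedLocalesOf(candidates, {
  localeMatcher: "lookup",
});
console.log(supported);
```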

What is a good definition for language code and locale codes?

There are several systems for locale identifiers. Many of them look similar at first glance, but they differ when you look deeper:

Some examples (Serbian-Serbia with Latin Script, Japanese-Japan with radical sorting):

  • UTS-35, ICU, Mac OS X, Flash: sr-Latn-RS, ja-JP@collation=radical
  • Newer UTS-35, BCP 47 extension U: sr-Latn-RS, ja-JP-u-co-unihan
  • Win 2000, XP: 0x81a, 0x10411
  • Vista, Win 7: sr-Latn-CS, ja-JP_radical
  • Java: sr_CS, ja_JP
  • Java 7: sr_RS, ja_JP
  • Linux: sr_RS@latin, ja_JP.utf8

Think of it like the different ways of describing colors (RGB, CMYK, HSV, Pantone, etc.).

So the question of - vs. _ makes no sense unless you specify the environment you are using. Use - and Java will not understand it; use _ and Windows will not understand it.
ICU (and systems built on top of it) accepts both - and _, but produces the _ style.

There is no ISO standard that covers the combination of language and country, but there are ISO standards for the individual parts (language, country, script).
Which version of each ISO standard applies also depends on the locale-identifier system in use.


In general, you should accept both _ and - but generate only one ("be liberal in what you accept and strict in what you emit"), as ICU does.
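A minimal sketch of that rule in JavaScript, assuming BCP 47 (hyphen) is the form you choose to emit; normalizeLocaleId is a hypothetical helper name. It accepts either separator and any letter case, then lets Intl.getCanonicalLocales() produce the canonical spelling.

```javascript
// Be liberal on input: accept "_" or "-" and any letter case.
// Be strict on output: always return canonical BCP 47 ("en-US").
// POSIX-style suffixes such as ".utf8" or "@latin" are NOT handled and will throw.
function normalizeLocaleId(input) {
  const candidate = input.trim().replace(/_/g, "-");
  // Throws a RangeError if this is still not a well-formed language tag.
  return Intl.getCanonicalLocales(candidate)[0];
}

console.log(normalizeLocaleId("en_us"));
console.log(normalizeLocaleId("sr_Latn_RS"));
```

Rejecting Linux-style names outright, rather than guessing at them, keeps the output side strict; mapping those would need explicit rules per system.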

If you communicate with systems that use another type of locale identifier, you will have to map to and from your own representation, and that will force you to use _ or - as the other system expects.
Some of the mappings will be lossy (there is no way to specify alternate calendars in Windows or Linux, or alternate sorting or scripts in Java older than 7, etc.), and round-tripping might not be possible (somewhat like RGB-to-CMYK conversions).
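The lossy direction can be sketched in JavaScript: map a BCP 47 tag to an older Java-style language_REGION identifier and record what could not be carried over. toLegacyUnderscoreId is a hypothetical helper, and which parts a real target drops depends on that system (Java 7+, for instance, does support scripts). It also leaves the region unchanged, so it would not, for example, turn RS back into CS.

```javascript
// Map a BCP 47 tag to an older Java-style "language_REGION" ID and report
// the parts that the target format cannot carry (so round-tripping is lossy).
function toLegacyUnderscoreId(tag) {
  const loc = new Intl.Locale(tag);
  const id = loc.region ? `${loc.language}_${loc.region}` : loc.language;
  const dropped = [];
  if (loc.script) dropped.push(`script=${loc.script}`);
  if (loc.collation) dropped.push(`collation=${loc.collation}`);
  if (loc.calendar) dropped.push(`calendar=${loc.calendar}`);
  return { id, dropped };
}

console.log(toLegacyUnderscoreId("sr-Latn-RS"));
console.log(toLegacyUnderscoreId("ja-JP-u-co-unihan"));
```

Returning the dropped parts, instead of silently discarding them, lets the caller decide whether the loss is acceptable.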

Addition: things differ not only between systems; they also change over time. For instance, Java 7 added support for sr_RS and for scripts, Windows keeps adding support for more locales, new countries get created (Sudan split, Russia, Serbia) or disappear (East Germany, U.S.S.R., Yugoslavia), and so on.

For internal representation, you might want to choose the most expressive system, one that can represent everything, and that is UTS-35 / BCP 47 (also used by CLDR and ICU).
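As an illustration of that expressiveness, JavaScript's Intl.Locale can compose the two example identifiers from the list above (Serbian in Latin script, and Japanese with radical-stroke "unihan" collation) from their parts, producing UTS-35 / BCP 47 tags. A sketch; the variable names are arbitrary.

```javascript
// Build tags from parts; Intl.Locale handles subtag order, case and the
// "-u-" extension syntax.
const serbianLatin = new Intl.Locale("sr", { script: "Latn", region: "RS" });
const japaneseRadical = new Intl.Locale("ja", { region: "JP", collation: "unihan" });

console.log(serbianLatin.toString());
console.log(japaneseRadical.toString());
// baseName drops the extensions, leaving language-script-region.
console.log(japaneseRadical.baseName);
console.log(japaneseRadical.collation);
```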

Where can I find a list of language + region codes?

This can be found in Unicode's Common Locale Data Repository (CLDR). Specifically, a JSON version of this data is available in their cldr-json repo.

What is the list of supported languages/locales on iOS like Android?

According to the documentation and this article, iOS appears to support all ISO 639 language codes.

Is there a simple way to get the language code from a country code in PHP

As others have pointed out, there is no built-in function for this, likely because many countries have multiple languages. Unfortunately, I can't point you to a library that does this either, but I went ahead and wrote a little function that does what you want.

There are two caveats. First, if it isn't given a language, it just picks the first locale in the list for that country; to get around this, you'd have to put some logic around the function call to supply the appropriate language. Second, it requires the intl extension (the php5-intl package) to be installed.

<?php

/**
 * Returns a locale for the given country code.
 *
 * @param string $country_code  ISO 3166-1 alpha-2 country code
 * @param string $language_code ISO 639-1 alpha-2 language code (optional)
 * @return string|null a locale formatted like en_US, or null if not found
 */
function country_code_to_locale($country_code, $language_code = '')
{
// Locale list taken from:
// http://stackoverflow.com/questions/3191664/list-of-all-locales-and-their-short-codes
$locales = array('af-ZA',
'am-ET',
'ar-AE',
'ar-BH',
'ar-DZ',
'ar-EG',
'ar-IQ',
'ar-JO',
'ar-KW',
'ar-LB',
'ar-LY',
'ar-MA',
'arn-CL',
'ar-OM',
'ar-QA',
'ar-SA',
'ar-SY',
'ar-TN',
'ar-YE',
'as-IN',
'az-Cyrl-AZ',
'az-Latn-AZ',
'ba-RU',
'be-BY',
'bg-BG',
'bn-BD',
'bn-IN',
'bo-CN',
'br-FR',
'bs-Cyrl-BA',
'bs-Latn-BA',
'ca-ES',
'co-FR',
'cs-CZ',
'cy-GB',
'da-DK',
'de-AT',
'de-CH',
'de-DE',
'de-LI',
'de-LU',
'dsb-DE',
'dv-MV',
'el-GR',
'en-029',
'en-AU',
'en-BZ',
'en-CA',
'en-GB',
'en-IE',
'en-IN',
'en-JM',
'en-MY',
'en-NZ',
'en-PH',
'en-SG',
'en-TT',
'en-US',
'en-ZA',
'en-ZW',
'es-AR',
'es-BO',
'es-CL',
'es-CO',
'es-CR',
'es-DO',
'es-EC',
'es-ES',
'es-GT',
'es-HN',
'es-MX',
'es-NI',
'es-PA',
'es-PE',
'es-PR',
'es-PY',
'es-SV',
'es-US',
'es-UY',
'es-VE',
'et-EE',
'eu-ES',
'fa-IR',
'fi-FI',
'fil-PH',
'fo-FO',
'fr-BE',
'fr-CA',
'fr-CH',
'fr-FR',
'fr-LU',
'fr-MC',
'fy-NL',
'ga-IE',
'gd-GB',
'gl-ES',
'gsw-FR',
'gu-IN',
'ha-Latn-NG',
'he-IL',
'hi-IN',
'hr-BA',
'hr-HR',
'hsb-DE',
'hu-HU',
'hy-AM',
'id-ID',
'ig-NG',
'ii-CN',
'is-IS',
'it-CH',
'it-IT',
'iu-Cans-CA',
'iu-Latn-CA',
'ja-JP',
'ka-GE',
'kk-KZ',
'kl-GL',
'km-KH',
'kn-IN',
'kok-IN',
'ko-KR',
'ky-KG',
'lb-LU',
'lo-LA',
'lt-LT',
'lv-LV',
'mi-NZ',
'mk-MK',
'ml-IN',
'mn-MN',
'mn-Mong-CN',
'moh-CA',
'mr-IN',
'ms-BN',
'ms-MY',
'mt-MT',
'nb-NO',
'ne-NP',
'nl-BE',
'nl-NL',
'nn-NO',
'nso-ZA',
'oc-FR',
'or-IN',
'pa-IN',
'pl-PL',
'prs-AF',
'ps-AF',
'pt-BR',
'pt-PT',
'qut-GT',
'quz-BO',
'quz-EC',
'quz-PE',
'rm-CH',
'ro-RO',
'ru-RU',
'rw-RW',
'sah-RU',
'sa-IN',
'se-FI',
'se-NO',
'se-SE',
'si-LK',
'sk-SK',
'sl-SI',
'sma-NO',
'sma-SE',
'smj-NO',
'smj-SE',
'smn-FI',
'sms-FI',
'sq-AL',
'sr-Cyrl-BA',
'sr-Cyrl-CS',
'sr-Cyrl-ME',
'sr-Cyrl-RS',
'sr-Latn-BA',
'sr-Latn-CS',
'sr-Latn-ME',
'sr-Latn-RS',
'sv-FI',
'sv-SE',
'sw-KE',
'syr-SY',
'ta-IN',
'te-IN',
'tg-Cyrl-TJ',
'th-TH',
'tk-TM',
'tn-ZA',
'tr-TR',
'tt-RU',
'tzm-Latn-DZ',
'ug-CN',
'uk-UA',
'ur-PK',
'uz-Cyrl-UZ',
'uz-Latn-UZ',
'vi-VN',
'wo-SN',
'xh-ZA',
'yo-NG',
'zh-CN',
'zh-HK',
'zh-MO',
'zh-SG',
'zh-TW',
'zu-ZA',);

foreach ($locales as $locale)
{
    // locale_get_region() returns an upper-case region code (e.g. "US"),
    // locale_get_primary_language() a lower-case language code (e.g. "en").
    $locale_region = locale_get_region($locale);
    $locale_language = locale_get_primary_language($locale);

    if (strtoupper($country_code) != $locale_region)
    {
        continue;
    }

    // With no language requested, the first locale for the country wins;
    // otherwise the locale's language must match the requested one.
    if ($language_code == '' ||
        strtolower($language_code) == $locale_language)
    {
        return locale_compose(array('language' => $locale_language,
                                    'region'   => $locale_region));
    }
}

return null;
}
?>
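A different approach from a fixed list is CLDR's "likely subtags" data, which JavaScript exposes as Intl.Locale.prototype.maximize(): it expands an undetermined-language tag such as und-FR to its most likely full form (fr-Latn-FR). The sketch below is hypothetical; countryToLocale mirrors the PHP function's parameters and en_US-style output. Unlike the PHP version, it does not check that a requested language is actually used in that country, it throws a RangeError for malformed codes instead of returning null, and for multilingual countries it still picks only one language.

```javascript
// Guess a locale for an ISO 3166-1 alpha-2 country code using CLDR likely
// subtags. "und" is the BCP 47 code for an undetermined language.
function countryToLocale(countryCode, languageCode = "") {
  const region = countryCode.toUpperCase();
  const language = languageCode ? languageCode.toLowerCase() : "und";
  // e.g. und-FR -> fr-Latn-FR, und-JP -> ja-Jpan-JP
  const max = new Intl.Locale(language, { region }).maximize();
  // Drop the script to match the PHP function's "en_US" style.
  return `${max.language}_${max.region}`;
}

console.log(countryToLocale("fr"));
console.log(countryToLocale("ch", "fr"));
```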

Locale codes for iPhone lproj folders

You can just call them English.lproj, Spanish.lproj, etc.

The "abbreviated names" are actually IETF language tags (i.e. BCP 47), except that you use pt_PT.lproj instead of pt-PT.lproj.


The actual interpretation routine is the CFBundleGetLocalizationInfoForLocalization function in https://github.com/apple/swift-corelibs-foundation/blob/master/CoreFoundation/PlugIn.subproj/CFBundle_Locale.c. Its table is replicated here:

| lproj identifiers              | L#  | C#  | Display name               |
|:-------------------------------|:----|:----|:---------------------------|
| en_US = en = English | 0 | 0 | English (United States) |
| en_GB | 0 | 2 | English (United Kingdom) |
| en_AU | 0 | 15 | English (Australia) |
| en_CA | 0 | 82 | English (Canada) |
| en_SG | 0 | 100 | English (Singapore) |
| en_IE | 0 | 108 | English (Ireland) |
| fr_FR = fr = French | 1 | 1 | French (France) |
| fr_CA | 1 | 11 | French (Canada) |
| fr_CH | 1 | 18 | French (Switzerland) |
| fr_BE | 1 | 98 | French (Belgium) |
| de_DE = de = German | 2 | 3 | German (Germany) |
| de_CH | 2 | 19 | German (Switzerland) |
| de_AT | 2 | 92 | German (Austria) |
| it_IT = it = Italian | 3 | 4 | Italian (Italy) |
| it_CH | 3 | 36 | Italian (Switzerland) |
| nl_NL = nl = Dutch | 4 | 5 | Dutch (Netherlands) |
| nl_BE | 34 | 6 | Dutch (Belgium) |
| sv_SE = sv = Swedish | 5 | 7 | Swedish (Sweden) |
| es_ES = es = Spanish | 6 | 8 | Spanish (Spain) |
| es_XL | 6 | 86 | Spanish (Latin America) |
| da_DK = da = Danish | 7 | 9 | Danish (Denmark) |
| pt_BR = pt = Portuguese | 8 | 71 | Portuguese (Brazil) |
| pt_PT | 8 | 10 | Portuguese (Portugal) |
| nb_NO = nb = no = Norwegian | 9 | 12 | Norwegian Bokmål (Norway) |
| nn_NO = nn = Nynorsk | 151 | 101 | Norwegian Nynorsk (Norway) |
| he_IL = he = Hebrew | 10 | 13 | Hebrew (Israel) |
| ja_JP = ja = Japanese | 11 | 14 | Japanese (Japan) |
| ar = Arabic | 12 | 16 | Arabic |
| fi_FI = fi = Finnish | 13 | 17 | Finnish (Finland) |
| el_GR = el = Greek | 14 | 20 | Greek (Greece) |
| el_CY | 14 | 23 | Greek (Cyprus) |
| is_IS = is = Icelandic | 15 | 21 | Icelandic (Iceland) |
| mt_MT = mt = Maltese | 16 | 22 | Maltese (Malta) |
| tr_TR = tr = Turkish | 17 | 24 | Turkish (Turkey) |
| hr_HR = hr = Croatian | 18 | 68 | Croatian (Croatia) |
| zh_TW = zh-Hant | 19 | 53 | Chinese (Taiwan) |
| zh_CN = zh = zh-Hans = Chinese | 33 | 52 | Chinese (China) |
| ur_PK = ur = Urdu | 20 | 34 | Urdu (Pakistan) |
| ur_IN | 20 | 96 | Urdu (India) |
| hi_IN = hi = Hindi | 21 | 33 | Hindi (India) |
| th_TH = th = Thai | 22 | 54 | Thai (Thailand) |
| ko_KR = ko = Korean | 23 | 51 | Korean (South Korea) |
| lt_LT = lt = Lithuanian | 24 | 41 | Lithuanian (Lithuania) |
| pl_PL = pl = Polish | 25 | 42 | Polish (Poland) |
| hu_HU = hu = Hungarian | 26 | 43 | Hungarian (Hungary) |
| et_EE = et = Estonian | 27 | 44 | Estonian (Estonia) |
| lv_LV = lv = Latvian | 28 | 45 | Latvian (Latvia) |
| se = Sami | 29 | 46 | Northern Sami |
| fo_FO = fo = Faroese | 30 | 47 | Faroese (Faroe Islands) |
| fa_IR = fa = Farsi | 31 | 48 | Persian (Iran) |
| ru_RU = ru = Russian | 32 | 49 | Russian (Russia) |
| ga_IE = ga = Irish | 35 | 50 | Irish (Ireland) |
| sq = Albanian | 36 | -1 | Albanian |
| ro_RO = ro = Romanian | 37 | 39 | Romanian (Romania) |
| cs_CZ = cs = Czech | 38 | 56 | Czech (Czech Republic) |
| sk_SK = sk = Slovak | 39 | 57 | Slovak (Slovakia) |
| sl_SI = sl = Slovenian | 40 | 66 | Slovenian (Slovenia) |
| yi = Yiddish | 41 | -1 | Yiddish |
| sr_CS = sr = Serbian | 42 | 65 | Serbian (Serbia) |
| mk_MK = mk = Macedonian | 43 | 67 | Macedonian (Macedonia) |
| bg_BG = bg = Bulgarian | 44 | 72 | Bulgarian (Bulgaria) |
| uk_UA = uk = Ukrainian | 45 | 62 | Ukrainian (Ukraine) |
| be_BY = be = Byelorussian | 46 | 61 | Belarusian (Belarus) |
| uz_UZ = uz = Uzbek | 47 | 99 | Uzbek (Uzbekistan) |
| kk = Kazakh | 48 | -1 | Kazakh |
| hy_AM = hy = Armenian | 51 | 84 | Armenian (Armenia) |
| ka_GE = ka = Georgian | 52 | 85 | Georgian (Georgia) |
| mo = Moldavian | 53 | -1 | Moldavian |
| ky = Kirghiz | 54 | -1 | Kyrgyz |
| tg = Tajiki | 55 | -1 | Tajik |
| tk = Turkmen | 56 | -1 | Turkmen |
| mn = Mongolian | 58 | -1 | Mongolian |
| ps = Pashto | 59 | -1 | Pashto |
| ku = Kurdish | 60 | -1 | Kurdish |
| ks = Kashmiri | 61 | -1 | Kashmiri |
| sd = Sindhi | 62 | -1 | Sindhi |
| bo = Tibetan | 63 | 105 | Tibetan |
| ne_NP = ne = Nepali | 64 | 106 | Nepali (Nepal) |
| sa = Sanskrit | 65 | -1 | Sanskrit |
| mr_IN = mr = Marathi | 66 | 104 | Marathi (India) |
| bn = Bengali | 67 | 60 | Bengali |
| as = Assamese | 68 | -1 | Assamese |
| gu_IN = gu = Gujarati | 69 | 94 | Gujarati (India) |
| pa = Punjabi | 70 | 95 | Punjabi |
| or = Oriya | 71 | -1 | Oriya |
| ml = Malayalam | 72 | -1 | Malayalam |
| kn = Kannada | 73 | -1 | Kannada |
| ta = Tamil | 74 | -1 | Tamil |
| te = Telugu | 75 | -1 | Telugu |
| si = Sinhalese | 76 | -1 | Sinhala |
| my = Burmese | 77 | -1 | Burmese |
| km = Khmer | 78 | -1 | Khmer |
| lo = Lao | 79 | -1 | Lao |
| vi_VN = vi = Vietnamese | 80 | 97 | Vietnamese (Vietnam) |
| id = Indonesian | 81 | -1 | Indonesian |
| tl = Tagalog | 82 | -1 | Tagalog |
| ms = Malay | 83 | -1 | Malay |
| am = Amharic | 85 | -1 | Amharic |
| ti = Tigrinya | 86 | -1 | Tigrinya |
| om = Oromo | 87 | -1 | Oromo |
| so = Somali | 88 | -1 | Somali |
| sw = Swahili | 89 | -1 | Swahili |
| rw = Kinyarwanda | 90 | -1 | Kinyarwanda |
| rn = Rundi | 91 | -1 | Rundi |
| Nyanja | 92 | -1 | Nyanja |
| mg = Malagasy | 93 | -1 | Malagasy |
| eo = Esperanto | 94 | 103 | Esperanto |
| cy = Welsh | 128 | 79 | Welsh |
| eu = Basque | 129 | -1 | Basque |
| ca_ES = ca = Catalan | 130 | 73 | Catalan (Spain) |
| la = Latin | 131 | -1 | Latin |
| qu = Quechua | 132 | -1 | Quechua |
| gn = Guarani | 133 | -1 | Guarani |
| ay = Aymara | 134 | -1 | Aymara |
| tt = Tatar | 135 | -1 | Tatar |
| ug = Uighur | 136 | -1 | Uyghur |
| dz_BT = dz = Dzongkha | 137 | 83 | Dzongkha (Bhutan) |
| jv = Javanese | 138 | -1 | Javanese |
| su = Sundanese | 139 | -1 | Sundanese |
| gl = Galician | 140 | -1 | Galician |
| af_ZA = af = Afrikaans | 141 | 102 | Afrikaans (South Africa) |
| br = Breton | 142 | 77 | Breton |
| iu_CA = iu = Inuktitut | 143 | 78 | Inuktitut (Canada) |
| gd = Scottish | 144 | 75 | Scottish Gaelic |
| gv = Manx | 145 | 76 | Manx |
| to_TO = to = Tongan | 147 | 88 | Tongan (Tonga) |
| grc | 148 | 40 | Ancient Greek |
| kl = Greenlandic | 149 | 107 | Kalaallisut |
| az = Azerbaijani | 150 | -1 | Azerbaijani |

Here:

  • L# is the language code and C# is the country code. I consider two identifiers identical if they share the same language and country codes.
  • I have only listed strings appearing in the source file. It also recognizes identifiers such as zh_HK and Traditional Chinese (both have the same code numbers as zh_TW), probably through the more sophisticated CFLocale list.
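The equivalence rule can be sketched as a lookup on (L#, C#) pairs. The rows below are transcribed from a few lines of the table; LPROJ_CODES and sameLocalization are hypothetical names, and a real implementation would need the full table (or CFBundle itself).

```javascript
// A few rows transcribed from the table: identifier -> [L#, C#].
const LPROJ_CODES = new Map([
  ["en_US", [0, 0]], ["en", [0, 0]], ["English", [0, 0]],
  ["en_GB", [0, 2]],
  ["nb_NO", [9, 12]], ["nb", [9, 12]], ["no", [9, 12]], ["Norwegian", [9, 12]],
  ["zh_TW", [19, 53]], ["zh-Hant", [19, 53]],
  ["zh_CN", [33, 52]], ["zh", [33, 52]], ["zh-Hans", [33, 52]], ["Chinese", [33, 52]],
]);

// Two identifiers are treated as identical when both L# and C# match;
// identifiers missing from the map are never considered identical.
function sameLocalization(a, b) {
  const x = LPROJ_CODES.get(a);
  const y = LPROJ_CODES.get(b);
  return Boolean(x && y && x[0] === y[0] && x[1] === y[1]);
}

console.log(sameLocalization("en", "English"));
console.log(sameLocalization("en", "en_GB"));
```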

As of iOS 10.3.1, Apple actually uses the following lproj names:

  • Danish, Dutch, English, French, German, Italian, Japanese, Polish, Portuguese, Russian, Spanish, Swedish
  • ar, bo, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, hu, id, it, ja, ko, ms, nb, nl, no, pa, pl, pt, ro, ru, sk, sv, th, tr, uk, ur, vi, chr (Note: chr = Cherokee)
  • en_AU, en_CA, en_CN, en_GB, en_ID, en_IN, en_JP, en_MY, en_NZ, en_SG
  • es_419, es_AR, es_CL, es_CO, es_CR, es_GT, es_MX, es_PA, es_PE, es_US
  • ar_SA, da_DK, de_AT, de_CH, fi_FI, fr_BE, fr_CA, fr_CH, he_IL, it_CH, ms_MY, nb_NO, nl_BE, nl_NL, pt_BR, pt_PT, ru_RU, sv_SE, th_TH, tr_TR, yue_CN, zh_CN, zh_HK, zh_TW

