#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import collections
import re
"""
This script can generate a dictionary of language names.
The dictionary looks as follows:

language_names = {
    "C": {
        "nl": "Dutch",
        "de": "German",
        "en": "English",
    },
    "nl": {
        "nl": "Nederlands",
        "de": "Duits",
        "en": "Engels",
    },
}

and so on. The outer key is the locale the names are written in ("C" holds
the untranslated names); the inner key is the language being named.

It can be created from:
- the 'all_languages' file that is part of KDE (currently the only option).

This generate.py script writes the dictionary to a file named data.py.

The script does not need to be installed in order to use the language_names package.
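
A minimal sketch of how the generated dictionary might be queried. The helper
name language_name and its fallback order are assumptions made for
illustration and are not part of this package; the sample data is the excerpt
shown above.

```python
# Excerpt of the structure that generate.py writes to data.py.
language_names = {
    "C": {"nl": "Dutch", "de": "German", "en": "English"},
    "nl": {"nl": "Nederlands", "de": "Duits", "en": "Engels"},
}


def language_name(code, locale="C"):
    # Hypothetical helper: return the name of language `code` as written
    # in `locale`. Fall back to the untranslated "C" names, then to the
    # bare code itself when nothing is known.
    for table in (locale, "C"):
        name = language_names.get(table, {}).get(code)
        if name:
            return name
    return code


print(language_name("de", "nl"))  # Duits
print(language_name("de", "fr"))  # German (no "fr" table, falls back to "C")
```

Falling back to "C" keeps a lookup usable for locales that were not listed in
lang_names when data.py was generated.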
"""

# List the language codes to extract here.
# If the list is empty, all languages are used. "C" must be included.
# lang_names = []
lang_names = [
    "C", "en", "de", "fr", "es", "nl", "pl", "pt_BR",
    "cs", "ru", "hu", "gl", "it", "tr", "uk",
    "ja", "zh_CN", "zh_HK", "zh_TW",
]


def generate_kde(fileName="/usr/share/locale/all_languages"):
    """Uses the KDE file to extract language names.

    Returns the dictionary. All strings are in unicode form.

    """
    langs = collections.defaultdict(dict)
    group = None
    with codecs.open(fileName, "r", "utf-8") as langfile:
        for line in langfile:
            line = line.strip()
            # A "[code]" header starts the group for one language.
            m = re.match(r"\[([^]]+)\]", line)
            if m:
                group = m.group(1)
            elif group and group != 'x-test':
                # "Name=..." is the untranslated name, "Name[xx]=..." the name in locale xx.
                m = re.match(r"Name(?:\[([^]]+)\])?\s*=(.*)$", line)
                if m:
                    lang, name = m.group(1) or "C", m.group(2)
                    langs[lang][group] = name

    # correct KDE mistake
    langs["cs"]["gl"] = "Galicijský"
    langs["zh_HK"]["gl"] = "加利西亞語"
    langs["zh_HK"]["zh_HK"] = "繁體中文(香港)"
    return dict(langs)


def makestring(text):
    """Returns the text wrapped in quotes, usable as Python input (expecting unicode_literals)."""
    # Escape backslashes and double quotes so the result is a valid string literal.
    return '"' + re.sub(r'([\\"])', r'\\\1', text) + '"'


def write_dict(langs):
    """Writes the dictionary file to the 'data.py' file."""
    # Keep only the requested languages; use all of them if lang_names is empty.
    keys = sorted(filter(lambda k: k in langs, lang_names) if lang_names else langs)
    with codecs.open("data.py", "w", "utf-8") as output:
        output.write("# -*- coding: utf-8;\n\n")
        output.write("# Do not edit, this file is generated. See generate.py.\n")
        output.write("\nfrom __future__ import unicode_literals\n")
        output.write("\n\n")
        output.write("language_names = {\n")
        for key in keys:
            output.write('{0}: {{\n'.format(makestring(key)))
            for lang in sorted(langs[key]):
                output.write(' {0}:{1},\n'.format(makestring(lang), makestring(langs[key][lang])))
            output.write('},\n')
        output.write("}\n\n# End of data.py\n")


if __name__ == "__main__":
    langs = generate_kde()
    # Use the zh_CN names for plain 'zh' as well.
    langs['zh'] = langs['zh_CN']
    write_dict(langs)