10.10 Internationalization
Most programs present some information to
users as text. Such text should be understandable and acceptable to
the user. For example, in some countries and cultures, the date
"March 7" can be concisely
expressed as "3/7". Elsewhere,
"3/7" indicates
"July 3", and the string that means
"March 7" is
"7/3". In Python, such cultural
conventions are handled with the help of standard module
locale.
Similarly, a greeting can be expressed in one natural language by the
string "Benvenuti", while in
another language the string to use is
"Welcome". In Python, such
translations are handled with the help of standard module
gettext.
Both kinds of issues are commonly called
internationalization (often abbreviated
i18n, as there are 18 letters between
i and n in the full
spelling). This is actually a misnomer, as the issues also apply to
programs used within one nation by users of different languages or
cultures.
10.10.1 The locale Module
Python's support for cultural conventions is
patterned on that of C, slightly simplified. In this architecture, a
program operates in an environment of cultural conventions known as a
locale. The locale setting permeates the program
and is typically set early on in the program's
operation. The locale is not thread-specific, and module
locale is not thread-safe. In a multithreaded
program, set the program's locale before starting
secondary threads.
If a program does not call locale.setlocale, the
program operates in a neutral locale known as the C locale. The C
locale is named from this architecture's origins in
the C language, and is similar, but not identical, to the U.S.
English locale. Alternatively, a program can find out and accept the
user's default locale. In this case, module
locale interacts with the operating system (via
the environment, or in other system-dependent ways) to establish the
user's preferred locale. Finally, a program can set
a specific locale, presumably determining which locale to set on the
basis of user interaction, or via persistent configuration settings
such as a program initialization file.
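The three choices above can be combined in one start-up routine. The following is a minimal sketch rather than a prescribed recipe: function name init_locale and its argument preferred (standing for a value read from a configuration file) are hypothetical names introduced only for illustration.

```python
import locale

def init_locale(preferred=None):
    """Set the program-wide locale once, early on, and return the setting.

    preferred is a hypothetical locale name taken from configuration;
    None means "accept the user's default locale".
    """
    candidates = []
    if preferred is not None:
        candidates.append(preferred)
    candidates.append('')          # '' requests the user's default locale
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue               # not available here; try the next one
    # Nothing else worked: fall back to the neutral C locale.
    return locale.setlocale(locale.LC_ALL, 'C')
```

Call such a function before starting any secondary thread, since the locale setting is shared by the whole process.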
A locale setting is normally performed across the board, for all
relevant categories of cultural conventions. This wide-spectrum
setting is denoted by the constant attribute
LC_ALL of module locale.
However, the cultural conventions handled by module
locale are grouped into categories, and in some
cases a program can choose to mix and match categories to build up a
synthetic composite locale. The categories are identified by the
following constant attributes of module locale:
- LC_COLLATE
  String sorting: affects functions strcoll and strxfrm in locale
- LC_CTYPE
  Character types: affects aspects of module string (and string methods) that have to do with letters, lowercase, and uppercase
- LC_MESSAGES
  Messages: may affect messages displayed by the operating system, for example function os.strerror and module gettext
- LC_MONETARY
  Formatting of currency values: affects function locale.localeconv
- LC_NUMERIC
  Formatting of numbers: affects functions atoi, atof, format, localeconv, and str in locale
- LC_TIME
  Formatting of times and dates: affects function time.strftime
The settings of some categories (denoted by the constants
LC_CTYPE, LC_TIME, and
LC_MESSAGES) affect some of the behavior of other
modules (string, time,
os, and gettext, as indicated).
The settings of other categories (denoted by the constants
LC_COLLATE, LC_MONETARY, and
LC_NUMERIC) affect only some functions of
locale.
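Because each category can be set separately, it is sometimes useful to inspect the setting of every category. The sketch below relies on the fact that calling setlocale with only a category argument queries the setting without changing it; the helper name category_settings is an assumption made for this example.

```python
import locale

CATEGORY_NAMES = ('LC_COLLATE', 'LC_CTYPE', 'LC_MESSAGES',
                  'LC_MONETARY', 'LC_NUMERIC', 'LC_TIME')

def category_settings():
    """Return a dict mapping each category name to its current setting."""
    settings = {}
    for name in CATEGORY_NAMES:
        category = getattr(locale, name, None)
        if category is None:
            continue               # e.g., LC_MESSAGES is absent on Windows
        # With no locale argument, setlocale only queries the setting.
        settings[name] = locale.setlocale(category)
    return settings
```

After a program sets, say, only LC_NUMERIC to a specific locale, the dictionary shows that category differing from the others, which is what a composite locale amounts to.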
Module locale supplies functions to query, change,
and manipulate locales, as well as functions that implement the
cultural conventions of locale categories
LC_COLLATE, LC_MONETARY, and
LC_NUMERIC.
atof(str)
Converts string str to a floating-point
value according to the current LC_NUMERIC setting.
atoi(str)
Converts string str to an integer
according to the LC_NUMERIC setting.
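Functions atoi and atof raise ValueError when the string is not a number under the current conventions. A small wrapper (a sketch; the name parse_number is hypothetical) can try the integer form first and then the floating-point form:

```python
import locale

def parse_number(text):
    """Parse text per the LC_NUMERIC setting; return None if not a number."""
    try:
        return locale.atoi(text)
    except ValueError:
        pass                       # not an integer; try floating point
    try:
        return locale.atof(text)
    except ValueError:
        return None
```

What counts as a valid number depends on the current LC_NUMERIC setting, so the same string may parse differently after a call to setlocale.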
format(fmt,num,grouping=0)
Returns the string obtained by formatting number
num according to the format string
fmt and the LC_NUMERIC
setting. Except for cultural convention issues, the result is like
fmt%num.
If grouping is true,
format also groups digits in the result string
according to the LC_NUMERIC setting. For example:
>>> locale.setlocale(locale.LC_NUMERIC,'en')
'English_United States.1252'
>>> locale.format('%s',1000*1000)
'1000000'
>>> locale.format('%s',1000*1000,1)
'1,000,000'
When the numeric locale is U.S. English, and argument
grouping is true,
format supports the convention of grouping digits
by threes with commas.
getdefaultlocale(envvars=['LANGUAGE','LC_ALL',
'LC_CTYPE','LANG'])
Examines the environment variables whose names are specified by
argument envvars, in order. The first
variable found in the environment determines the default locale.
getdefaultlocale returns a pair of strings
(lang,encoding)
compliant with RFC 1766 (except for the 'C'
locale), such as ('en_US','ISO8859-1'). Each item
of the pair may be None if
getdefaultlocale is unable to discover what value
the item should have.
getlocale(category=LC_CTYPE)
Returns a pair of strings
(lang,encoding)
with the current setting for the given
category. The category cannot be
LC_ALL.
localeconv()
Returns a dictionary d containing the
cultural conventions specified by categories
LC_NUMERIC and LC_MONETARY of
the current locale. While LC_NUMERIC is best used
indirectly via other functions of module locale,
the details of LC_MONETARY are accessible only
through d. Currency formatting is
different for local and international use. The U.S. currency symbol,
for example, is '$' for local use only.
'$' would be ambiguous in international use, since
the same symbol is also used for other currencies called
"dollars" (Canadian, Australian,
Hong Kong, etc.). In international use, therefore, the U.S. currency
symbol is the unambiguous string 'USD'. The keys
into d to use for currency formatting are
the following strings:
- 'currency_symbol'
  Currency symbol to use locally
- 'frac_digits'
  Number of fractional digits to use locally
- 'int_curr_symbol'
  Currency symbol to use internationally
- 'int_frac_digits'
  Number of fractional digits to use internationally
- 'mon_decimal_point'
  String to use as the "decimal point" for monetary values
- 'mon_grouping'
  List of digit grouping numbers for monetary values
- 'mon_thousands_sep'
  String to use as digit-groups separator for monetary values
- 'negative_sign', 'positive_sign'
  String to use as the sign symbol for negative (positive) monetary values
- 'n_cs_precedes', 'p_cs_precedes'
  True if the currency symbol comes before negative (positive) monetary values
- 'n_sep_by_space', 'p_sep_by_space'
  True if a space separates the currency symbol from negative (positive) monetary values
- 'n_sign_posn', 'p_sign_posn'
  Numeric code to use to format negative (positive) monetary values:
  - 0
    The value and the currency symbol are placed inside parentheses
  - 1
    The sign is placed before the value and the currency symbol
  - 2
    The sign is placed after the value and the currency symbol
  - 3
    The sign is placed immediately before the value
  - 4
    The sign is placed immediately after the value
  - CHAR_MAX
    The current locale does not specify any convention for this formatting
d['mon_grouping'] is a
list of numbers of digits to group when formatting a monetary value.
When
d['mon_grouping'][-1]
is locale.CHAR_MAX, there is no further grouping beyond the
indicated numbers of digits. When
d['mon_grouping'][-1]
is 0, grouping continues
indefinitely, as if
d['mon_grouping'][-2]
were endlessly repeated. locale.CHAR_MAX is also the
constant used as the value for all entries in
d for which the current locale does not
specify any convention.
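To make the sign-position codes concrete, here is a minimal sketch of how the sign, the currency symbol, and an already formatted absolute value might be assembled. The function name place_sign is hypothetical, and the sketch deliberately ignores digit grouping and the sep_by_space keys; the fallback to a leading sign when the code is CHAR_MAX is a choice made for this example.

```python
import locale

def place_sign(value, symbol, sign, cs_precedes, sign_posn):
    """Combine a formatted absolute value with currency symbol and sign.

    value, symbol, and sign are strings; cs_precedes and sign_posn have
    the meanings of the corresponding localeconv entries.
    """
    if cs_precedes:
        body = symbol + value
    else:
        body = value + symbol
    if sign_posn == 0:             # parentheses around value and symbol
        return '(' + body + ')'
    if sign_posn == 1:             # sign before value and symbol
        return sign + body
    if sign_posn == 2:             # sign after value and symbol
        return body + sign
    if sign_posn == 3:             # sign immediately before the value
        if cs_precedes:
            return symbol + sign + value
        return sign + value + symbol
    if sign_posn == 4:             # sign immediately after the value
        if cs_precedes:
            return symbol + value + sign
        return value + sign + symbol
    # locale.CHAR_MAX or anything else: no convention is specified, so
    # this sketch defaults to a leading sign.
    return sign + body
```

For a negative amount, you would pass d['negative_sign'], d['n_cs_precedes'], and d['n_sign_posn'], where d is the dictionary returned by localeconv; for a positive amount, pass the corresponding positive entries.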
normalize(localename)
Returns a string, suitable as an argument to
setlocale, that is the normalized equivalent to
localename. If
normalize cannot normalize string
localename, then
normalize returns
localename unchanged.
resetlocale(category=LC_ALL)
Sets the locale for category to the
default given by getdefaultlocale.
setlocale(category,locale=None)
Sets the locale for category to the given
locale, if not None,
and returns the setting (the existing one when
locale is None;
otherwise, the new one). locale can be a
string, or a pair of strings
(lang,encoding).
When locale is the empty string
'', setlocale sets the
user's default locale.
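Since setlocale returns the existing setting when called without a locale argument, a program can save a setting, change it temporarily, and restore it afterward. The following sketch assumes an interpreter that supplies module contextlib and the with statement; the name temporary_locale is hypothetical. Because the locale is not thread-specific, such a temporary change is safe only in a single-threaded program.

```python
import contextlib
import locale

@contextlib.contextmanager
def temporary_locale(category, name):
    """Set category to locale name for the duration of a with block."""
    saved = locale.setlocale(category)       # query only: no change yet
    try:
        yield locale.setlocale(category, name)
    finally:
        locale.setlocale(category, saved)    # restore the previous setting
```

A with block such as with temporary_locale(locale.LC_NUMERIC, 'C'): then runs its body under the C numeric conventions, whatever the setting was before.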
str(num)
Like
locale.format('%f',num).
strcoll(str1,str2)
Like
cmp(str1,str2),
but according to the LC_COLLATE setting.
strxfrm(str)
Returns a string sx such that the built-in
comparison (e.g., by cmp) of strings so
transformed is equivalent to calling
locale.strcoll on the original strings.
strxfrm lets you use the decorate-sort-undecorate
(DSU) idiom for sorts that involve locale-conformant string
comparisons. However, if all you need is to sort a list of strings in
a locale-conformant way,
strcoll's simplicity can make it
faster. The following example shows two ways of performing such a
sort; in this case, the simple variant is often faster than the DSU
one:
import locale

# simpler and often faster
def locale_sort_simple(list_of_strings):
    list_of_strings.sort(locale.strcoll)

# less simple and often slower
def locale_sort_DSU(list_of_strings):
    auxiliary_list = [(locale.strxfrm(s), s) for s in list_of_strings]
    auxiliary_list.sort()
    list_of_strings[:] = [s for junk, s in auxiliary_list]
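In Python versions where sorting accepts a key argument (an assumption about your interpreter, not something the examples above require), the decoration step can be delegated to the sort itself. This sketch returns a new list instead of sorting in place; the name locale_sorted is hypothetical.

```python
import locale

def locale_sorted(strings):
    """Return a new list sorted per the LC_COLLATE setting.

    locale.strxfrm is computed once per string and used as the sort key,
    which is the decorate-sort-undecorate idiom performed by sorted.
    """
    return sorted(strings, key=locale.strxfrm)
```

As with the functions above, the resulting order depends on the LC_COLLATE setting in effect at the time of the call.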
10.10.2 The gettext Module
A
key issue in internationalization is the ability to use text in
different natural languages, a task also called
localization. Python supports localization via
module gettext, inspired by GNU
gettext. Module gettext is
optionally able to use the latter's infrastructure
and APIs, but is simpler and more general. You do not need to install
or study GNU gettext to use
Python's gettext
effectively.
10.10.2.1 Using gettext for localization
gettext does not deal with automatic translation
between natural languages. Rather, gettext helps
you extract, organize, and access the text messages that your program
uses. Use each string literal subject to translation, also known as a
message, as the argument of a function named
_ (underscore) rather than using it directly.
gettext normally installs a function named
_ in the __builtin__ module.
To ensure that your program can run with or without
gettext, conditionally define a do-nothing
function, also named _, that just returns its
argument unchanged. Then, you can safely use
_('message')
wherever you would normally use the literal
'message'.
The following example shows how to start a module for conditional use
of gettext:
try: _
except NameError:
    def _(s): return s

def greet(): print _('Hello world')
If some other module has installed gettext before
you run the previous code, function greet outputs
a properly localized greeting. Otherwise, greet
outputs the string 'Hello
world' unchanged.
Edit your sources, decorating all message literals with function
_. Then, use any of various tools to extract
messages into a text file (normally named
messages.pot), and distribute the file to the
people who translate messages into the natural languages you support.
Python supplies a script pygettext.py (in
directory Tools/i18n in the Python source
distribution) to perform message extraction on your Python sources.
Each translator edits messages.pot and produces
a text file of translated messages with extension
.po. Compile the .po files
into binary files with extension .mo, suitable
for fast searching, using any of various tools. Python supplies a
script Tools/i18n/msgfmt.py usable for this
purpose. Finally, install each .mo file with a
suitable name in an appropriate directory.
Conventions about which directories and names are suitable and
appropriate differ among platforms and applications.
gettext's default is subdirectory
share/locale/<lang>/LC_MESSAGES/ of
directory sys.prefix, where
<lang> is the language's
code (normally two letters). Each file is typically named
<name>.mo, where
<name> is the name of your application or
package.
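The default location just described can be spelled out as a path computation. This is only an illustration of the naming convention, not code that gettext requires you to write; the function name default_mo_path and its prefix argument are hypothetical.

```python
import os
import sys

def default_mo_path(lang, name, prefix=None):
    """Build the conventional path of the .mo file for lang and name."""
    if prefix is None:
        prefix = sys.prefix        # the default root directory
    return os.path.join(prefix, 'share', 'locale', lang,
                        'LC_MESSAGES', name + '.mo')
```

For example, default_mo_path('it', 'my_app') names the file where an Italian catalog for a hypothetical application my_app would conventionally be installed.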
Once you have prepared and installed your .mo
files, you normally execute, from somewhere in your application, code
such as the following:
import os, gettext
os.environ.setdefault('LANG', 'en') # application-default language
gettext.install('your_application_name')
This ensures that calls such as _('message')
henceforward return the appropriate translated strings. You can
choose different ways to access gettext
functionality in your program, for example if you also need to
localize C-coded extensions, or to switch back and forth between
different languages during a run. Another important consideration is
whether you're localizing a whole application, or
just a package that is separately distributed.
10.10.2.2 Essential gettext functions
Module gettext
supplies many functions; this section documents the ones that are
most often used.
install(domain,localedir=None,unicode=False)
Installs in Python's built-in namespace a function
named _ that performs translations specified by
file <lang>/LC_MESSAGES/<domain>.mo
in directory localedir, with language code
<lang> as per
getdefaultlocale. When
localedir is None,
install uses directory
os.path.join(sys.prefix,'share','locale'). When
unicode is true, function
_ accepts and returns Unicode strings rather than
plain strings.
translation(domain,localedir=None,languages=None)
Searches for a .mo file similarly to function
install. When languages
is None, translation looks in
the environment for the lang to use, like
install. However,
languages can also be a list of one or
more lang names, each a separate string,
in which case translation
uses the first of these names for which it finds a
.mo file. Returns an instance object that
supplies methods gettext (to translate a plain
string), ugettext (to translate a Unicode string),
and install (to install gettext
or ugettext under name _ into
Python's built-in namespace).
Function translation offers more detailed control
than install, which is like
translation(domain,localedir).install(unicode).
With translation, you can localize a single
package without affecting the built-in namespace by binding name
_ on a per-module basis, for example with:
_ = translation(domain).ugettext
translation also lets you switch globally between
several languages, since you can pass an explicit
languages argument, keep the resulting
instance, and call the install method of the
appropriate language as needed:
import gettext
translators = {}

def switch_to_language(lang, domain='my_app', use_unicode=False):
    # build each translator once, then reuse it on later switches
    if lang not in translators:
        # languages must be a list of names, not a single string
        translators[lang] = gettext.translation(domain, languages=[lang])
    translators[lang].install(use_unicode)