You use Globalization support to store and retrieve data in native language and format. Oracle supports a wide variety of languages and character sets. Globalization support lets you communicate with end users in their native language using their familiar date formats, number formats and sorting sequence. Oracle uses Unicode (a worldwide encoding standard for computing usage) to support the languages.
You define the database character set when you create the database using the CHARACTER SET clause of the CREATE DATABASE statement. The character set stores table names, column names and so on in the dictionary, stores PL/SQL in memory, etc. If you do not specify the character set Oracle uses the US7ASCII character set. US7ASCII is a 7 bit ASCII character set that uses a single byte to store a character set, and it represent 128 characters.
Other widely used single-byte character sets are WE8ISO8859P1 and UTF8. These character sets use 8 bits to represent a character and can represent 256 characters. Oracle also supports multi byte character encoding. Multi byte character encoding is used to represent such languages as Chinese, Japanese, etc. Multi byte encoding schemes can be fixed width encoding schemes or variable width encoding schemes. In a variable width encoding scheme certain characters are represented using 1 byte, and two or more bytes are used to represent other characters.
You can change the database character set only if the new character set is a superset of the current character set.
Oracle lets choose an additional character set that enhances the character processing capabilities. You can specify the second character set when you create the database using the NATIONAL CHARACTER SET clause. If you do not specify NATIONAL CHARACTER SET, Oracle uses the Unicode character set AF16UTF16. The national character set stores data in NCHAR, NVARCHAR2, and NCLOB data type columns.
The national character set can be either AF16UTF1 or UTF8. AF16UTF16 is the default. AF16UTF16 and UTF8 are Unicode character sets. You can not specify AF16UTF16 as the database character set. When choosing a multi byte character set for your database remember that by default the VARCHAR2, CHAR, NVARCHAR2 and NCHAR data types specify the maximum length in bytes, not in characters. You can change this default behavior by setting the NLS_LENGTH_SEMANTICS=CHAR or by providing the semantic information along the column definition (as in VARCHAR2(20 CHAR)).
The client machine can specify a character set different from the database character set by using local environment variables. The database character set should be a superset to the client character set. Oracle converts the character set automatically, but there is some overhead associated with the conversion.
Certain character sets support multiple languages. For example, WE8ISO8859P1 can support all European languages.
The Unicode character
set
Unicode is a universal character encoding scheme that allows you to store information from any major language using a single character set. Unicode provides a unique code value for every character, regardless of the platform, language or program. Unicode has both 16-bit and 8-bit encoding.
UTF-16 is the 16 bit encoding of Unicode. It is a fixed width multi byte encoding in which the character codes 0x00 through 0x7F have the same meaning as they do in ASCII. One Unicode character is 2 bytes in this encoding. Characters from both European and Asian scripts are presented in 2 bytes. AF16UTF16 is UTF-16 encoded character set.
UTF-8 is the 8-bit encoding of Unicode. It is a variable-width multi byte encoding in which the character set. One unicode character can be 1 byte, 2 bytes or 3 bytes in this encoding.
Using NLS parameters
You can customize the NLS parameters in the following ways:
- By specifying in the initialization file, which will be used at the instance startup (NLS_DATE_FORMAT = YYYY-MM-DD).
- By setting the parameter as an environment variable (csh: setenv NLS_DATE_FORMAT YYYY-MM-DD or using Microsft Windows Registry.
- By setting the parameter in the Oracle session using ALTER SESSION SET NLS_DATEFORMAT = YYYY-MM-DD
- By using certain SQL functions (TO_CHAR (SYSDATE, YYYY-MM-DD, NLS_DATE_LANGUAGE=AMERICAN)).
The parameter specified in SQL functions have the highest priority, the next highest is the parameter ALTER SESSION, then environment variable, init parameters, and finally the database parameters.
You can not change certain parameters using ALTER SESSION and you can not specify certain parameters as environment variables.
NLS_LENGTH_SEMANTICS specified at the session level or as init parameter. Defines the character length semantics as byte or character. The default is BYTE. NLS_LENGTH_SEMANTICS does nto apply to tables on SYS and SYTEM they are always in BYTE semantic.
NLS_LANG specified only as environment variable. Has 3 parts the language, the territory and the character set. None of the parts are mandatory. The format to specify NLS_LANG is <LANGUAGE>_<TERRITORY>.<CHARACTER SET>. The language specifies the language be used for displaying Oracle messages, day names, month names, etc. The territory specifies the default date format, numeric formats, and monetary formats. The character set specifies the character set to be used by the client machine for example AMERICAN_AMERICA.WE8ISO8859P1.
NLS_LANGUAGE specified at the session level or as init parameter. Sets the language to be used. The session value overrides the NLS_LANG setting. The default values for parameters such as NLS_CURRENCY, NLS_ISO_CURRENCY, NLS_DATE_FORMAT, and NLS_NUMERIC_CHARACTERS are derived from NLS_TERRITORY.
NLS_DATE_FORMAT specified at the session level or as environment variable, or as init parameter. Sets default format for date displays.
NLS_DATE_LANGUAGE - specified at the session level or as environment variable, or as init parameter. Sets language explicitly date and month names.
NLS_TIMESTAMP_FORMAT - specified at the session level or as environment variable, or as init parameter. Sets default format for date displays.
NLS_CALENDAR specified at the session level, or as environment variable, or as init parameter. Sets the calendar Oracle uses.
NLS_NUMERIC_CHARACTERS specified at the session level, or as environment variable, or as an init parameter. Specifies the decimal character and group separator.
NLS_CURRENCY specified at the session level, or as environment variable or as init parameter. Specifies a currency symbol.
NLS_SO_CURRENCY specifies the ISO currency symbol.
NLS_DUAL_CURRENCY specifies an alternative currency symbol.
NLS_SORT language for sorting.
Obtaining NLS data
dictionary information
NLS_DATABASE_PARAMETERS shows NLS parameters defined for the database.
NLS_INSTANCE_PARAMETERS - NLS parameters defined for the instance.
NLS_SESSION_PARAMETERS - NLS parameters defined for the current session.
V$NLS_VALID_VALUES shows the allowed values for the language territory and character set definitions.
The following examples show NLS information from the data dictionary view and examples of changing session NLS values.
SQL> SELECT * FROM NLS_DATABASE_PARAMETERS;
Restrictions of
Globalization support
- Keywords are displayed only in English
- No fixed-width multi-byte character sets can be used as database character sets
- Variable-width character sets sue space more efficiently.
Display of the
characters on the screen
Database character sets Character sets defined at the time of database creation Data columns stored with data type CHAR, VARCHAR2, CLOB and LONG Store varying width multi-byte character sets |
National character sets Character sets defined at the timke of database creation Data columns stored with data type NCHAR, NVARCHAR2 and NCLOB Store varying width multi byte character sets Store fixed-width multi byte character sets |
If you have string operations, the string operations might be faster if you select the fixed width multi byte character set to represent the national character set.
Variable width character sets can save storage space.
Globalization support
parameters
Init parameters does not affect client
Environment variables
Alter session
The init parameters that specify the default language dependent behavior are NLS_LANGUAGE and NLS_TERRITORY.
You can use the NLS_LANGUAGE parameter to specify the value for the language-dependent session characteristics. One of the language-dependent session characteristics is the display of the error messages. The server messages are stored in binary frmat files. Multiple versions of these files exist, one ofr each supported language. The files use specific naming conventions. For example, the name of the file storing server messages in French is ORAF.MSB. Another rsession characteristic is the language used for the display of day and month names and their abbreviations. Language dependent session characteristics also include symbols AM, PM, BC. Etc. The language dependent session characteristics also include default sorting mechanism.
The NLS_TERRITORY parameter specifies the conventions for the default numeric formatting characteristics such as date format, local currency symbol, ISO currency symbol. Some other default conventions that are specified by the NLS_TERRITORY parameter are decimal character and group separator, the list separator the week start day, the credit and debit symbols and the ISO week number calculation.
Both NLS_LANGUAGE and NLS_TERRITORY can be specified either as init parameters or as ALTER SESSION parameters, but they can not be specified as environment variables.
Derived globalization
support parameters
The derived parameters of NLS_LANGUAGE are NLS_DATE_LANGUAGE (language for the day and month names) and NLS_SORT (linguistic sort). Both override NLS_LANGAUGE.
The derived parameters of NLS_TERRITORY are NLS_CURRENCY (, NLS_ISO_CURRENCY, NLS_DATE_FORMAT (MM/DD/YYYY), and NLS_NUMERIC_CHARACTERS.
Overriding default
globalization support behavior
The NLS_LANG environment variable has 3 components language, territory and charset NLS_LANG=<language>_<territory>.<charset>
Language dependent behavior of a session
An implicit ALTER SESSION command is normally executed when a session connects to a database. Users can also use the ALTER SESSION command during a session to change the globalization support behavior. If NLS_LANG is not defined, the implicit ALTER SESSION is not executed.
Globalization and
sorting
A conventional sorting mechanism is the binary sort. The linguistic sorting of multi-byte character sets is not supported. Therefore, if the database character set is multi-byte the result of any sort is binary.
The NLS_COMP init parameter controls the method used by the comparison operators such as <, >, etc. When this parameter is set to NASI(, the values are compared by the binary value of the string. When the parameter is set to ANSI the values are compared by the linguistic sort sequence.
SQL functions
Some SQL functions can use globalization support features TO_CHAR, TO_DATE, TO-NUMBER, NLS_UPPER, NLS_LOWER, NLS_INITCAP, NLSSORT.
The parameters that can be used with SQL functions are NLS_DATE_LANGUAGE, NLS_CALENDAR, NLS_NUMERIC_CHARACTERS, NLS_CURRENCY, NLS_ISO_CURRENCY and NLS_SORT.
TO_DATE NLS_DATE_LANGUAGE, NLS_CALENDAR
TO_NUMBER NLS_NUMERIC_CHARACTER, NLS_CURRENCY, NLS_ISO_CURRENCY
TO_CHAR NLS_DATE_LANGUAGE, NLS_NUMERIC_CHARACTERS, NLS_CURRENCY, NLS_ISO_CURRENCY, NLS_CALENDAR
NLS_UPPER, NLS_LOWER, NLS_INITCAP, NLS_SORT NLS_SORT
The following globalization parameters are NOT accepted in SQL functions NLS_LANGUAGE, NLS_TERRITORY, NLS_DATE_FORMAT.
Character migration
When you migrate from a single byte character set to a multi byte variable width character set some characters transalate into 2 byte characters. Therefore, these characters may not fit into the existing columns the data is truncated. During the data scanning 3 tests are performed to ensure that the conversion is OK.
The maximum size of the LONG and LOB data types are 2Gb and 4GB respectively.
Scanner process invocation steps
Obtaining information
on character sets
SQL> SELECT parameter, value from nls_databse_parameters
Where parameter lile %characterset%;
The views that contain information about the globalization support are:
NLS_DATABASE_PARAMETERS, NLS_INSTANCE_PARAMETRS, NLS_SESSION_PARAMETERS, V$NLS_VALID_VALUES, V$NLS_PARAMETERS (contains real time instance specific information)..
To retrieve information about the supported languages, you can issue the select statement displayed on the screen
SQL> SELECT * FROM V$NLS_VALID_VALUES
WHERE PARAMETER=LANGUAGE;
Test Identify the
tests that are performed on data during scan
- Find out whether the post conversion data fits into the current column size
- Find out whether the character codes of the data change
- Determine whether the data cells can be successfully converted
Select the features
of various semantics
- Number of byte semantics is equal to the naber of characters
- Semantics can be define at both database and column levels
- Byte semantics define the bytes that can be stored in a column
- NLS_LENGTH_SEMANTICS parameter can be set dynamically