Internet Draft Internet-Draft E. Wada Fujitsu Laboratories Expires in six months J. Murai Keio University K. Yamamoto IIJ Research Laboratory January, 1999 Japanese Character Encoding Scheme for Internet Messages <draft-yamamoto-charset-iso-2022-jp-02.txt> Status of this Memo This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress". To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). Abstract This memo describes the character encoding scheme used in electronic mail, NetNews, and world wide web messages in Japanese networks. It was first specified by and used in JUNET then described in RFC 1468. The name of this character encoding scheme was originally known as 'JUNET code' and is now called 'ISO-2022-JP' when used in the context of MIME. In ISO-2022-JP text, both one 7-bit byte Latin script (ASCII or JIS X 0201 Latin set) and two 7-bit bytes Kanji, Hiragana, Katakana and some other symbols and characters (JIS X 0208) are employed. Switching the graphic character sets is based on the extension techniques defined in ISO 2022. This memo eliminates some ambiguities of RFC 1468. However, it NEVER introduces any essential changes against RFC 1468. Since the character encoding scheme is now widely used in Japanese IP communities, backward compatibility is most important in this revision. This memo revises RFC 1468 on the following points: Yamamoto [Page 1] Internet-Draft ISO-2022-JP CHARSET January 1999 1. It is clarified that ISO-2022-JP does NOT conform to ISO 2022. 2. The formal syntax is divided into two new yet compatible rules, namely, ISO-2022-JP decoding syntax and ISO-2022-JP encoding syntax. 3. The bit combinations permitted in JIS X 0208 are explicitly described in the syntax so that the invalid character positions will be excluded in ISO-2022-JP text. 4. Recommended graphic character sets are specified. That is, ASCII is RECOMMENDED rather than JIS X 0201 1976 Latin set. ONLY to embed YEN SIGN and OVER LINE, JIS 0201 Latin set MAY be used. Also, JIS X 0208 1983 (including 1990) is RECOMMENDED rather than JIS X 0208 1978. 1. Introduction Efforts to exchange Japanese text via electronic mail[MAIL] and NetNews[NETNEWS] messages started in the early days of JUNET[JUNET], consisted of networks arbitrarily connected by UUCP in Japan. There was strong demand for exchanging ASCII[ASCII], JIS X 0201 Latin set[JISX0201-76], and JIS X 0208 graphic character sets. JIS X 0201 Latin set is identical to ASCII except for REVERSE SOLIDUS (BACKSLASH) and TILDE. REVERSE SOLIDUS and TILDE are replaced by YEN SIGN and OVER LINE, respectively. This set is Japan's national variant of ISO 646[ISO646]. Note that section 4.1.2 of RFC 2046 discourages use of ISO 646 in electronic mail. The JIS X 0208 graphic character sets consist of Kanji, Hiragana, Katakana and some other symbols and characters. Each character takes up two bytes. At that time, some implementations of message transfer agents (MTA) had 7-bit restriction in their transport connection. Moreover, European countries used 8-bit text to convey their languages. It was thus crucial to design a 7-bit safe format to carry Japanese text both for robustness against such MTAs and for distinction from European languages. Popular character encoding schemes(CES) for Japanese graphic character sets were EUC-JP and Shift_JIS. However, they were not chosen for a transfer form of Japanese text because they are 8-bit. To encode ASCII, JIS X 0201, JIS X 0208 1978[JISX0208-78] and 1983[JISX0208-83], JUNET code was designed. This CES switches these four graphic character sets according to the extension techniques defined in ISO 2022[ISO2022], which is described later in this memo. During JUNET days, some systems erroneously used the escape sequence "ESC ( H" for JIS X 0201 1976 due to a typological error in the JIS book. This escape sequence is officially registered for a Swedish graphic character set[ISOREG] and MUST NOT have been used in JUNET code. JUNET had difficulty in eliminating this misusage. In 1993, the CES was described in RFC 1468[OLDSPEC] and the name "ISO-2022-JP" was given to identify it in the context of MIME[MIME]. Yamamoto [Page 2] Internet-Draft ISO-2022-JP CHARSET January 1999 The word "encoding" is ambiguous because it is used for both the extension techniques and for conversion of MIME's transport safe. For this reason, throughout this memo, the word "encoding" is always preceded by a prefix word to clarify its semantics and the word "charset"[CHARSET] is introduced to describe ISO-2022-JP CES. "ISO-2022-JP encoding" means creation of ISO-2022-JP text by MIME composers while "ISO-2022-JP decoding" refers to interpretation of ISO-2022-JP text by MIME viewers. At the time of writing RFC 1468, one more revision for JIS X 0208, namely 1990[JISX0208-90], was available. RFC 1468, however, does NOT suggest using the revision identification sequence "ESC & @" even if two characters added in JIS X 0208 1990 are used. This choice was practical from the empirical point of view but it was a departure from the ISO 2022 platform. Strictly speaking, ISO-2022-JP does NOT conform to ISO 2022 though the name is associated with ISO 2022. The purposes of this memo are mainly to clarify ISO-2022-JP CES and to fix a bug of ABNF[ABNF] in RFC 1468, which failed to express null lines. To avoid confusion, NO new escape sequence for JIS X 0208 1990 NOR for JIS X 0212 1990[JISX0212-90] is introduced. Such efforts should be elaborated in other rich charsets. NOTE: One more revision for JIS X 0208, namely 1997, is available [JISX0208-97]. Since it is not a new graphic character set, most parts of this memo do NOT explicitly talk about it. Likewise, JIS X 0201 1997[JISX0201-97] does NOT appear in the main body of this memo. For more information, see Historical Note. 2. Standard Keywords The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [KEYWORDS]. 3. Description For historical reasons, ISO-2022-JP can contain four graphic character sets, ASCII, JIS X 0201 1976 Latin set, JIS X 0208 1978 and JIS X 0208 1983 (including 1990). A charset should start with and should end in ASCII since many message systems assume so. Moreover, each line should be self-composite so that lines can be safely split. ISO-2022-JP was designed to satisfy these requirements. Its definition is as follows: ISO-2022-JP text MUST start with ASCII and MUST end with ASCII. Each line of ISO-2022-JP text MUST end with ASCII or JIS X 0201 1976 (i.e. before CRLF). The next line starts in the graphic character set that was switched to before the end of the previous line. ISO-2022-JP is NOT completely self-composite because a starting Yamamoto [Page 3] Internet-Draft ISO-2022-JP CHARSET January 1999 graphic character set of a line is not known if the previous line is missing. This is one of the reasons to recommend ASCII, instead of JIS X 0201 1976, in ISO-2022-JP. An escape sequence is used to specify a start boundary of a graphic character set different form the preceding graphic character set. The following table gives the escape sequences and the graphic character sets used in ISO-2022-JP text. The "ISOREG" is the registration number in ISO's registry[ISOREG]. ESC Seq Graphic Character Set ISOREG ESC ( B ASCII 6 ESC ( J JIS X 0201 1976 (Latin set) 14 ESC $ @ JIS X 0208 1978 42 ESC $ B JIS X 0208 1983 87 For example, the escape sequence "ESC $ B" indicates that the bytes following this escape sequence are some characters of JIS X 0208 1983. To switch back to ASCII, the escape sequence "ESC ( B" is used. For the control characters in ASCII, [MIME] and [MAIL2] define CR and LF only. Section 4.1.2 of RFC 2046 says that HT and FF have de facto meaning. To make ISO-2022-JP syntactically and semantically larger than US-ASCII, this memo defines the semantics of ESC only. Semantics of other control characters in ASCII is outside the scope. Note that the extension techniques defined here switches graphical character sets, not control character sets. So, ISO-2022-JP CES assumes that %d0-31 is always the control character set in ASCII. If Japanese text represented within the four graphic character sets is transferred by SMTP[SMTP] or NNTP[NNTP], it is STRONGLY RECOMMENDED to be in ISO-2022-JP form. Not all implementations distinguish JIS X 0208 1978 and 1983. Moreover, if ISO-2022-JP text including both ASCII and JIS X 0201 1976 AND/OR ISO-2022-JP text including both JIS X 0208 1978 and JIS X 0208 1983 is converted to EUC-JP or Shift_JIS, original ISO-2022-JP text could not be re-produced. So, to implement best interoperability, message composers SHOULD use ASCII and JIS X 0208 1983 ONLY. However, message composers MAY use JIS X 0201 1976 to embed YEN SIGN and OVER LINE. Message viewers MUST accept all four graphic character sets. Please note that the best strategy for interoperability is "TO BE LIBERAL IN WHAT YOU ACCEPT, AND CONSERVATIVE IN WHAT YOU SEND"[GUIDE]. Yamamoto [Page 4] Internet-Draft ISO-2022-JP CHARSET January 1999 JIS X 0201 Kana set MUST NOT be used in ISO-2022-JP text. It may erroneously appear in ISO-2022-JP text with the following sequences: <8-bit Kana set char> SO <7-bit Kana set char> SI SO <8-bit Kana set char> SI ESC ( I <7-bit Kana set char> ESC ( B ESC ) I <8-bit Kana set char> JIS X 0212 1990 MUST NOT be embedded in ISO-2022-JP text. It may erroneously appear following "ESC $ ( D" in ISO-2022-JP text. To embed JIS X 0212 1990, other charsets such as ISO-2022-JP-2 [ISO-2022-JP-2] SHOULD be used. IMPLEMENTATION NOTE: Even if a message composer does not provide any input method for graphic character sets which is not used in ISO-2022-JP, such as JIS X 0201 Kana set and JIS X 0212 1990, there are ways to include them by accident. Typical cases are citing and forwarding a message including JIS X 0201 Kana set and/or JIS X 0212 1990 (but labeled as ISO-2022-JP). IMPLEMENTATION NOTE ON CITATION: If JIS X 0201 Kana set is cited, it MUST be removed anyway. One possible deletion is to convert the characters to corresponding characters in JIS X 0208 1983. Please note that as of this writing there doesn't exist a charset to contain JIS X 0201 Kana set. When JIS X 0212 1990 is cited, MIME composers MUST take one of the followings: (1) Remove all characters in JIS X 0212 1990 and specify ISO-2022-JP for the charset parameter. (2) Specify other richer charset such as ISO-2022-JP-2. IMPLEMENTATION NOTE ON FORWARDING: Since this memo talks about ISO-2022-JP text only, forwarding an entire message is outside the scope. But MIME composers are REQUIRED to forward valid messages in the context of MIME. Mismatch between the charset parameter and its content body MUST be eliminated. The situation would be mixed up when the invalid message was digitally signed. But again, procedures to create valid messages are outside the scope of this memo. 4. Formal Syntax Two sets of formal syntax rules are defined in this memo. One is ISO-2022-JP decoding syntax for message viewers and the other is ISO-2022-JP encoding syntax for message composers. ISO-2022-JP decoding syntax is designed to be redundant to maintain backward compatibility with existing message composers while ISO-2022-JP encoding syntax is designed to be necessary and sufficient. Readers are assumed to be familiar with the notational conventions defined in [ABNF] and [MAIL2]. Yamamoto [Page 5] Internet-Draft ISO-2022-JP CHARSET January 1999 4.1 Rules for ISO-2022-JP decoding syntax ISO-2022-JP decoding syntax that embeds ASCII, JIS X 0201 1976, JIS X 0208 1978, JIS X 0208 1983 (including 1990) is defined as follows (see also Figure 1): iso-2022-jp-text-d = *( line-d CRLF ) line-d ; When text is terminated by CRLF, the last 'line-d' is empty. line-d = *single-byte-char-d *(*double-byte-segment-d single-byte-segment-d) ; line-d MUST be limited to 998 bytes according to [MAIL2]. single-byte-segment-d = single-byte-designator-d *single-byte-char-d single-byte-designator-d = %d27 %d40 ( %d66 / %d74 ) ; ESC "(" ( "B" / "J" ) single-byte-char-d = %d0-9 / %d11-12 / %d14-26 / %d28-127 ; All bit combinations of ASCII code excluding ; LF(%d10), CR(%d13), and ESC(%d27). double-byte-segment-d = double-byte-designator-d *double-byte-char-d double-byte-designator-d = %d27 %d36 ( %d64 / %d66 ) ; ESC "$" ( "@" / "B" ) double-byte-char-d = %d33-126.33-126 ; All bit combinations of 94 x 94 graphic character set NOTE: ISO-2022-JP decoding syntax above describes valid byte combinations for ISO-2022-JP. Other byte patters are invalid. Message viewers MUST handle any combinations of %d0-255 and MUST safely ignore the invalid patterns. NOTE: The old specs, RFC 822 and RFC 1468, allow bare CR and bare LF. MIME prohibits them. Though [MAIL2] allows message viewers to accept bare CR and bare LF, it prohibits message composers from generating them. This memo conforms this recent trend. Bare CR and bare LF are irrelevantly accepted according to the note above. The ISO-2022-JP encoding syntax defined below prohibits message composers from generating them. Yamamoto [Page 6] Internet-Draft ISO-2022-JP CHARSET January 1999 +----------------------> | +----------> | | +--+ | +--+ | | +----------> | +->|1S|-> +->|1C|->->-> exit | +--+ | | | +--+ | +--+ | | line-d --> +->|1C|->-> | <-------+ v | +--+ | | +----------> | <-------+ | +--+ | +--+ | | +->|2S|-> +->|2C|->-> | +--+ | +--+ | | | <-------+ | <--------------------+ Figure 1: 'line-d' for ISO-2022-JP decoding syntax NOTE: 1S, 1C, 2S, and 2C stand for 'single-byte-designator-d', 'single-byte-char-d', 'double-byte-designator-d', and 'double-byte-char-d', respectively. 4.2 Rules for ISO-2022-JP encoding syntax To create ISO-2022-JP text or string, only characters in ASCII and JIS X 0208 1983 (including 1990) SHOULD be used. But if YEN SIGN and/or OVER LINE of JIS X 0201 1976 are to be used, the following rules SHOULD be followed: (1) It is RECOMMENTED to use YEN SIGN(%d33.111) and OVER LINE(%d33.49) of JIS X 0208 1983 instead of YEN SIGN(%d92) and OVER LINE(%d126) of JIS X 0201 1976, respectively. (2) If sequences of JIS X 0201 1976 Latin set can be produced, YEN SIGN and/or OVER LINE of the set MAY be used preceded by the designator. (3) Otherwise, YEN SIGN and OVER LINE of JIS X 0201 1976 SHOULD be converted to REVERSE SOLIDUS and TILDE of ASCII, respectively. IMPLEMENTATION NOTE: Display of YEN SIGN and OVER LINE of JIS X 0201 1976 is highly dependent on the receiving system. REVERSE SOLIDUS and TILDE of ASCII may be displayed. And again, if ISO-2022-JP text including both ASCII and JIS X 0201 1976 is converted to EUC-JP or Shift_JIS, original ISO-2022-JP text could not be re-produced. This would cause verification failure of digital signature. The following is ISO-2022-JP encoding syntax to accomplish rule (1) or (3) above (see also Figure 2) iso-2022-jp-text-e = *( line-e CRLF ) line-e ; When text is terminated by CRLF, the last 'line-e' is empty. line-e = *single-byte-char-e [*i-segment-e f-segment-e] ; line-e MUST be limited to 998 bytes and SHOULD be limited to ; 78 bytes according to [MAIL2]. Yamamoto [Page 7] Internet-Draft ISO-2022-JP CHARSET January 1999 i-segment-e = double-byte-designator-e 1*double-byte-char-e single-byte-designator-e 1*single-byte-char-e f-segment-e = double-byte-designator-e 1*double-byte-char-e single-byte-designator-e *single-byte-char-e single-byte-designator-e = %d27 %d40 %d66 ; ESC "(" "B" single-byte-char-e = %d1-9 / %d11-12 / %d14-26 / %d28-127 ; All bit combinations of ASCII code excluding ; NUL(%d0), LF(%d10), CR(%d13), and ESC(%d27). double-byte-designator-e = %d27 %d36 %d66 ; ESC "$" "B" double-byte-char-e = %d33.33-126 / %d34.(33-46 / 58-65 / 74-80 / 92-106 / 114-121 / 126) / %d35.(48-57 / 65-90 / 97-122) / %d36.33-115 / %d37.33-118 / %d38.(33-56 / 65-88) / %d39.(33-65 / 81-113) / %d40.33-64 / %d48-78.33-126 / %d79.33-83 / %d80-115.33-126 / %d116.33-38 ; The valid bit combinations of JIS X 0208 1990. +--------------------------------------> | +----------> | | | +--+ | | +----------> | | +->|1C|->->-> exit | +--+ | | +--+ +--+ +--+ | | +--+ | line-e --> +->|1C|->->->->|2S|->->|2C|->->|1S|-> <-------+ | +--+ | | +--+ | +--+ | +--+ | +--+ <-------+ | <-------+ +-->->|1C|->-> | | +--+ | | | <-------+ | <------------------------------------+ Figure 2: 'line-e' for ISO-2022-JP encoding syntax NOTE: 1S, 1C, 2S, and 2C stand for 'single-byte-designator-e', 'single-byte-char-e', 'double-byte-designator-e', and 'double-byte-char-e', respectively. 5. MIME Considerations This section discusses how to use ISO-2022-JP text or string (i.e. excluding CRLF) in the context of MIME. The scope is limited to the case where messages are transferred by SMTP[SMTP] or Yamamoto [Page 8] Internet-Draft ISO-2022-JP CHARSET January 1999 NNTP[NNTP]. Other protocols are outside the scope of this memo. 5.1 The Charset Parameter To identify the charset described in this memo, "ISO-2022-JP" MUST be used for the charset parameter wherever it is allowed. Please note that the charset parameter is case-insensitive. For the charset parameter, MIME composers SHOULD choose as small charset as possible. For example, if a content body is ASCII, US-ASCII SHOULD be specified instead of ISO-2022-JP. This rule improves interoperability with other MIME viewers which do not support various charsets. To the contrary, MIME viewers SHOULD be able to handle text labeled ISO-2022-JP even if its content is ASCII only. Remember the liberal/conservative rule. 5.2 Content Transfer Encoding ISO-2022-JP text is already in 7-bit form and MIME-encoding, such as Base64 and Quoted-Printable, makes ISO-2022-JP text unreadable for non-MIME viewers. For this reason, MIME composers SHOULD choose "7bit" (case-insensitive) and SHOULD NOT MIME-encode ISO-2022-JP text with Base64 nor Quoted-Printable. Note that "7bit" means that no MIME-encoding is applied and its syntax is in 7-bit form. MIME defines that "7bit" MIME-encoding is assumed if content transfer encoding is not present. So, this field can be omitted. In certain situation, Base64 or Quoted-Printable MAY be used. Base64 is RECOMMENDED to MIME-encode ISO-2022-JP text rather than Quoted-Printable. This is because Base64 encoding is considered more suitable for ISO-2022-JP text than Quoted-Printable encoding for the following reasons: (1) Quoted-Printable encoding is designed for text which is mostly ASCII but ISO-2022-JP text is NOT. (2) Quoted-Printable encoding produces various results while Base64 encoding creates a unique output if folding is ignored. This means that it is easier to implement interoperability for Base64 encoding than for Quoted-Printable encoding. IMPLEMENTATION NOTE: The key word "=" of Quoted-Printable encoding is used in JIS X 0208 (see 'double-byte-char-e'). If only the ASCII portion of ISO-2022-JP text is MIME-encoded with Quoted-Printable, it is likely that such text cannot be MIME-decoded to the original since the "=" characters in JIS X 0208 portion are considered to be the key word of Quoted-Printable. If ISO-2022-JP text needs to be MIME-encoded with Quoted-Printable, the entire text must be MIME-encoded. Some old implementations made this mistake. MIME viewers MUST be able to handle ISO-2022-JP text even if it is encoded with Base64 or Quoted-Printable. Also, MIME viewers SHOULD be able to handle ISO-2022-JP text even if its MIME-encoding is Yamamoto [Page 9] Internet-Draft ISO-2022-JP CHARSET January 1999 specified as "8bit". 5.3 Header Extensions In a header or in MIME content headers where `encoded-word's are allowed, Japanese string (i.e. excluding CRLF) which consists of the four graphic character sets MUST be in the form defined in RFC 2047[MIME]. For "charset", "ISO-2022-JP" (case-insensitive) SHOULD be specified. ISO-2022-JP string MUST end with ASCII (see 'line-e'). Message composers SHOULD use "B" encoding for ISO-2022-JP string while message viewers MUST be able to handle both "B" and "Q" encoding. IMPLEMENTATION NOTE: It should be noted that these requirements above apply to any fields in MIME content headers. For example, RFC 2047 mechanism MUST be used to represent ISO-2022-JP string when embedded in Content-Description: content header in a part of a multipart. ISO-2022-JP string MUST NOT be embedded directly in a header NOR in MIME content headers. EUC-JP string and Shift_JIS string SHOULD NOT be inserted even if they are in the form defined in RFC 2047. The reason why ISO-2022-JP string and "B" encoding is recommended to embed Japanese string in header fields is that RFC 1468 defines so. For historical reasons, not many message viewers support EUC-JP, Shift_JIS, and "Q" encoding, The "language" extension defined in RFC 2231[PARAMETER] SHOULD NOT be used without any special purposes. 5.4 MIME Parameter Extensions To embed Japanese string (i.e. excluding CRLF) to a MIME parameter value, the string MUST be in the form defined in RFC 2231. For "charset", "ISO-2022-JP" (case-insensitive) SHOULD be specified. "language" SHOULD be omitted. The entire ISO-2022-JP string MUST end with ASCII. Both "extended-value" and "extended-other-values" need not to be self-composite since they are to be concatenated first when received. That is, they can contain any portion of ISO-2022-JP string which may or may not end with ASCII. IMPLEMENTATION NOTE: The MIME encoding defined in RFC 2231 is a variant of "Q" encoding. Since many key words of header (including "%" and "'") may appear in ISO-2022-JP string, they MUST be encoded with the variant. It is a good idea to encode all characters other than [0-9A-Za-z] (%d48-57 / %d65-90 / %d97-122). 6. Requirements for Message Transfer Agents (MTA) Several MTAs automatically convert any message to a local charset. Yamamoto [Page 10] Internet-Draft ISO-2022-JP CHARSET January 1999 This means that it is very likely that a charset other than ISO-2022-JP is converted to the local charset by a routine which assumes that incoming data is ISO-2022-JP text. MTAs SHOULD NOT convert charsets in general. If MTAs need to convert charsets, they MUST do so according to the charset parameter of MIME messages. Actually, some MTAs convert "ESC ( B" into "ESC ( J" and "ESC $ B" into "ESC $ @" interchangeably. Gateways SHOULD NOT convert in this way. Such MTAs are the primary cause of verification failure in digital signature. 7. Historical Note The JUNET code was originally described in the JUNET User's Guide [JUNET]. JIS X 0208, which was originally called JIS C 6226, was published in 1978. It was revised in 1983, 1990 and 1997. The revision in 1983 was made because the Japanese Government published the new Kanji graphic character set called "Joyo Kanji Hyo (Kanji List for Daily Use)" for use in daily life, in the public documents, in newspapers in 1981. Some of the character shapes were modified or simplified. To harmonize JIS X 0208 1978 with Joyo Kanji, pairs of characters were swapped in the code positions and several characters had their shapes amended. Moreover, since new characters were added to the Kanji set called "Jinmei You Kanji Beppyo (Additional Kanji List for Use with People's Name)", four characters were added to and a lot of special characters were included in JIS X 0208 1983. This graphic character set was treated as a new one and a new escape sequence "ESC $ B" was assigned. Since the Kanji set for People's names was extended, two more Kanji characters were added at the end of JIS X 0208 in 1990 and the result is called JIS X 0208 1990. The revised graphic character set is indeed backward compatible with the previous set. Leaving the designation sequence unchanged, only the revision identification sequence "ESC & @" was assigned. To be strict, the designation sequence of JIS X 0208 1990 is "ESC & @ ESC $ B". However, ISO-2022-JP CES uses "ESC $ B" for practical reasons even though the two additional Kanji characters are used. The purpose of revision in 1997 is to clarify references of each character. No new characters were introduced. As a graphic character set, JIS X 0208 1997 is equivalent to JIS X 0208 1990. The designation sequence is the same. Therefore, among versions of JIS X 0208, the version of 1978 is of a different graphic character set while the versions of 1983, 1990 and 1997 are regarded as a family. For more information on the differences, please refer to Appendix J, [GLOBEFISH]. It describes what characters have been added, swapped, Yamamoto [Page 11] Internet-Draft ISO-2022-JP CHARSET January 1999 simplified, etc, in the revisions. JIS X 0201, originally called JIS C 6220, was published in 1976. It was revised in 1997 but no change was introduced. They are thus the same graphic character set. 8. Security Considerations ISO-2022-JP is NOT believed to introduce any new security holes to the Internet. References [ABNF] D. Crocker and Paul Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC2234, November 1997. [ASCII] American National Standards Institute, "Coded character set -- 7-bit American national standard code for information interchange", ANSI X3.4-1986. [CHARSET] N. Freed and J. Postel, "IANA Charset Registration Procedures", RFC2278, January 1998. [GLOBEFISH] K. Lunde, "Understanding Japanese Information Processing", O'Reilly & Associates, Inc., 1993. [GUIDE] G. Scott, "Guide for Internet Standards Writers", RFC2360, June 1998. [ISO-2022-JP-2] M. Ohta and K. Handa, "ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP", RFC 1554, December 1993. [ISO2022] International Organization for Standardization (ISO), "Information technology -- Character code structure and extension techniques", International Standard, Ref. No. ISO 2022-1991(E). [ISO646] International Organization for Standardization (ISO), "Information technology -- ISO 7-bit coded character set for information interchange", International Standard, Ref. No. ISO/IEC 646:1991(E). [ISOREG] International Organization for Standardization (ISO), "International Register of Coded Character Sets To Be Used With Escape Sequences". [JISX0201-76] Japanese Industrial Standards Committee, "Code for information interchange", JIS C 6220 1976(aka JIS X 0201 1976). [JISX0201-97] Japanese Industrial Standards Committee, "7-bit and 8-bit coded character sets for information interchange", JIS X 0201 1997. [JISX0208-78] Japanese Industrial Standards Committee, "Code of the Yamamoto [Page 12] Internet-Draft ISO-2022-JP CHARSET January 1999 Japanese graphic character set for information interchange", JIS C 6226 1978(aka JIS X 0208 1978). [JISX0208-83] Japanese Industrial Standards Committee, "Code of the Japanese graphic character set for information interchange", JIS C 6226 1983(aka JIS X 0208 1983). [JISX0208-90] Japanese Industrial Standards Committee, "Code of the Japanese graphic character set for information interchange", JIS X 0208 1990. [JISX0208-97] Japanese Industrial Standards Committee, "7-bit and 8-bit double byte coded KANJI sets for information interchange", JIS X 0208 1997 [JISX0212-90] Japanese Industrial Standards Committee, "Code of the supplementary Japanese graphic character set for information interchange", JIS X0212 1990. [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET User's Guide (First Edition)"), February 1988. [KEYWORDS] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997. [MAIL] D. Crocker, "Standard for the Format of ARPA Internet Text Messages", STD 11, RFC 822, August 1982. [MAIL2] P. Resnick, "Internet Message Format Standard", draft-ietf-drums-msg-fmt-07.txt, January 1999. [MIME] The primary definition of MIME. "MIME Part 1: Format of Internet Message Bodies", RFC 2045; "MIME Part 2: Media Types", RFC 2046; "MIME Part 3: Message Header Extensions for Non-ASCII Text", RFC 2047; "MIME Part 4: Registration Procedures", RFC 2048; "MIME Part 5: Conformance Criteria and Examples", RFC 2049; November 1996. [NETNEWS] M. Horton and R. Adams, "Standard for Interchange of USENET Messages", RFC 1036, December 1987. [NNTP] B. Kantor and P. Lapsley, "Network News Transfer Protocol", RFC 977, February 1986. [OLDSPEC] J. Murai, M. Crispin, and E. van der Poel, "Japanese Character Encoding for Internet Messages", RFC 1468, June 1993. [PARAMETER] N. Freed and K. Moore, "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations", RFC 2231, November 1997. [SMTP] J. Postel, "Simple Mail Transfer Protocol", RFC 821, August 1982. Yamamoto [Page 13] Internet-Draft ISO-2022-JP CHARSET January 1999 Acknowledgements Many people contributed to RFC 1468. The original authors of RFC 1468 wished to thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi Handa. The authors of this memo would acknowledge the original work of Erik M. van der Poel and Mark Crispin. Our deep gratitude goes to Noritoshi Demizu, Kenichi Handa, Jun'ichiro Ito, Shuhei Kobayashi, Tomohiko Morioka, Makoto Murata, Erik M. van der Poel, Shigeya Suzuki, and Akira Tanaka (in alphabetical order) for their contribution to this memo. Authors' Addresses Eiiti WADA Fujitsu Laboratories, Ltd 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki 211-8588 JAPAN Phone: +81-44-754-2608 FAX: +81-44-754-2580 EMail: wada@u-tokyo.ac.jp Jun MURAI Keio University 5322 Endo, Fujisawa 252-0816 JAPAN Phone: +81-466-47-5111 FAX: +81-466-49-1101 EMail: jun@wide.ad.jp Kazuhiko YAMAMOTO Research Laboratory, Internet Initiative Japan Inc. Takebashi Yasuda Bldg., 3-13 Kanda Nishiki-cho Chiyoda-ku, Tokyo 101-0054 JAPAN Phone: +81-3-5259-6350 FAX: +81-3-5259-6351 EMail: kazu@iijlab.net Appendix An example of a MIME text object is as follows: Content-Type: text/plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit ISO-2022-JP text comes here. Yamamoto [Page 14] Internet-Draft ISO-2022-JP CHARSET January 1999 Another form of the example above is as follows: Content-Type: Text/Plain; Charset=ISO-2022-JP ISO-2022-JP text comes here. The following is an example of RFC 2047 header encoding: To: jun@wide.ad.jp Subject: =?iso-2022-jp?B?GyRCJDMkTkZ8S1w4bCQsRkkkYSRsJFAbKEI=?= OK =?iso-2022-jp?B?GyRCJEckOSEjGyhC?= From: Kazu Yamamoto (=?iso-2022-jp?B?GyRCOzNLXE9CSScbKEI=?=)The following is an example of RFC 2231 parameter encoding: Content-Disposition: attachment; filename*=iso-2022-jp''%1B%24B%25U%25%21%25%24%25k%1B%28B Change Log Differences between 01 and 02 - Clarifies that %d0-31 is always the character set of ASCII. - ISO-2022-JP is RECOMMENDED for MIME parameter extensions according to suggestions from some vendors. - The word "Character Encoding Scheme(CES)" is adopted to align to RFC2278. - Several typos are fixed. - Message viewers MUST handle any combinations of %d0-255, which was described as %d0-127. - single-byte-char-e conforms [MAIL2]. - Refers to RFC 2046 to explain one reason to discourage use of JIS X 0201. Differences between 00 and 01 - Author information was updated. - References were updated. - Refers to the globefish book to describes the differences between each revision. - Explicitly says that ISO-2022-JP is syntactically and semantically larger than US-ASCII. - line-d and line-d are limited to 998. - Clarifies that message viewers MUST handle any byte combinations. - Includes the consideration for bare CR and bare LF. Yamamoto [Page 15]