IPP> Mail encodings

Yuji Sasaki sasaki at jci.co.jp
Tue Apr 4 05:18:26 EDT 2000


I did some research on mail encodings.

(1) Mail contents
The MIME format is defined in RFC2045. It extends the Internet
mail message format (RFC822) to handle content other than plain
ASCII text. However, one restriction of RFC822 mail still remains:
the contents of the mail must be 7bit.


(2) Mail headers
While the mail contents can include all 7bit codes, the
characters that can be used in mail headers are more restricted.
RFC2047 defines a way to extend mail headers.
An encoded mail header looks like this:

Subject: =?iso-8859-1?q?this=20is=20some=20text?=

The key to header encoding is two ASCII characters, '=' and '?'.
The '=?' indicates the start point of the MIME encoding, while the
'?=' indicates the end point.
Between them are the character set ('iso-8859-1' in the previous
example), the encoding method ('?q?' in the example), and the
encoded text itself.

There are two encodings, 'Quoted-Printable' and 'BASE64'.
The Quoted-Printable encoding ('q') is somewhat like URL encoding,
using the escape character '=' instead of the '%' used in URLs (in
the example above, '=20' represents a space).
The BASE64 encoding ('b') is more efficient for text that is mostly
non-ASCII, but it is also more complex.
It is defined in RFC2045.
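
To make the two encodings concrete, here is a minimal Python sketch that
decodes the example header from above in its 'q' form and in an
equivalent 'b' form. The helper name decode_encoded_word is my own
invention for illustration; decode_header comes from Python's standard
email package, and the sketch assumes the header holds a single encoded
word.

```python
import base64
from email.header import decode_header


def decode_encoded_word(word):
    """Decode one RFC2047 encoded word (hypothetical helper, single word only)."""
    raw_bytes, charset = decode_header(word)[0]
    return raw_bytes.decode(charset)


# The 'q' form is the example from the text above.
q_word = "=?iso-8859-1?q?this=20is=20some=20text?="

# Build the equivalent 'b' form by BASE64-encoding the same bytes.
b_payload = base64.b64encode("this is some text".encode("iso-8859-1"))
b_word = "=?iso-8859-1?b?" + b_payload.decode("ascii") + "?="

print(decode_encoded_word(q_word))
print(decode_encoded_word(b_word))
```

Both calls should give back the same plain text, since only the
transfer encoding differs between the two words.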


(3) What are we doing for Japanese e-mail
The double-byte character set situation in Japan is pretty complex.
Unix users like to use EUC or JIS (iso-2022-jp), while Windows
and Macintosh users use Shift-JIS. Although Unicode is the
standard character set for Windows NT and Windows CE, they can
also handle Shift-JIS for backward compatibility.

Having said that, the character set used in e-mail systems today
is iso-2022-jp (RFC1468). The reason is simple: it is a 7bit code.
That means it can be sent in mail contents as the
"text/plain;charset=iso-2022-jp" MIME type. However, it cannot be
embedded directly in mail headers, because iso-2022-jp uses escape
sequences to "swap" the code page.
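
As a quick check of the 7bit claim, the following Python sketch encodes
a short Japanese string in the three encodings mentioned above and
tests whether every byte fits in 7 bits. The sample string and the
helper name is_seven_bit are assumptions chosen only for illustration;
the codec names are those of Python's standard library.

```python
def is_seven_bit(data):
    """Return True if every byte in data is below 0x80."""
    return all(byte < 0x80 for byte in data)


sample = "日本語"  # illustrative sample text

jis_bytes = sample.encode("iso-2022-jp")
sjis_bytes = sample.encode("shift_jis")
euc_bytes = sample.encode("euc_jp")

print("iso-2022-jp 7bit:", is_seven_bit(jis_bytes))
print("Shift-JIS   7bit:", is_seven_bit(sjis_bytes))
print("EUC         7bit:", is_seven_bit(euc_bytes))

# iso-2022-jp switches character sets with ESC (0x1B) sequences.
print("contains ESC:", b"\x1b" in jis_bytes)
```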

So, the standard Japanese e-mail is:

Header: iso-2022-jp MIME encoded with BASE64 or Quoted-Printable.
Contents: text/plain;charset=iso-2022-jp
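
A message of this shape could be built roughly as follows. This is only
a sketch using Python's standard email package, not a description of
what any particular mailer does; the subject and body strings are
made-up sample text.

```python
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText

subject_text = "テスト"    # sample subject (assumption)
body_text = "こんにちは"   # sample body (assumption)

# Contents: text/plain with charset=iso-2022-jp, sent as 7bit.
msg = MIMEText(body_text, "plain", "iso-2022-jp")

# Header: iso-2022-jp text wrapped as a MIME encoded word.
subject_header = Header(subject_text, "iso-2022-jp")
encoded_subject = subject_header.encode()
msg["Subject"] = subject_header

print(encoded_subject)
print(msg.as_string())

# A receiver can recover the original subject text again.
decoded_subject = str(make_header(decode_header(encoded_subject)))
```

Python's email package picks BASE64 for iso-2022-jp headers by default,
which matches the first of the two header options listed above.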

Although UTF-7 could be used as an alternative to iso-2022-jp
from a technical point of view, it is not popular today.

If you think backward compatibility and interoperability
are important (I believe they are the main reasons why we chose
SMTP as one of the Notification methods), I recommend that you
SHOULDN'T restrict the character set used in mailto: notification
to UTF-7.

Sorry for my terrible English...
--------
Yuji Sasaki
Company E-Mail :sasaki at jci.co.jp
Personal E-Mail:crazy17 at ibm.net
Nifty-Serve    :PFG02524 at nifty.ne.jp
1066 Saratoga Avenue Suite #100 San Jose CA 95129 USA
Tel:408-551-6470 Fax:408-551-6475



