IPP Mail Archive: Re: IPP> Mail encodings

From: ned.freed@innosoft.com
Date: Tue Apr 04 2000 - 12:55:29 EDT

    > I did some research on mail encodings.

    > (1) Mail contents
    > The MIME format is defined in RFC 2045. It extends the Internet
    > message format (RFC 822) to handle content other than plain
    > ASCII text. However, the restriction of RFC 822 still remains:
    > the contents of the mail must be 7bit.

    Not always. See RFC 1652 (the 8BITMIME SMTP extension). This extension is
    widely deployed and heavily used.
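
    As a rough illustration of how a client might act on RFC 1652: a server
    that supports the extension lists the keyword 8BITMIME in its EHLO reply.
    The sketch below only parses a reply that has already been received. The
    sample reply text and both function names are made up for this example.

```python
def advertises_8bitmime(ehlo_reply: str) -> bool:
    """Return True if an EHLO reply lists the 8BITMIME keyword (RFC 1652)."""
    for line in ehlo_reply.splitlines():
        # Reply lines look like "250-KEYWORD params" or "250 KEYWORD params".
        if len(line) < 4 or not line.startswith("250"):
            continue
        keyword = line[4:].split(" ", 1)[0].strip().upper()
        if keyword == "8BITMIME":
            return True
    return False


def choose_transfer_encoding(body: bytes, server_supports_8bit: bool) -> str:
    """Pick a Content-Transfer-Encoding for the given body bytes."""
    if all(b < 0x80 for b in body):
        return "7bit"
    # 8bit content may go out unencoded only if the server accepts it.
    return "8bit" if server_supports_8bit else "base64"


# Hypothetical EHLO reply, for illustration only.
SAMPLE_REPLY = "250-mail.example.org Hello\r\n250-8BITMIME\r\n250 SIZE 10240000\r\n"

if __name__ == "__main__":
    ok = advertises_8bitmime(SAMPLE_REPLY)
    print(ok, choose_transfer_encoding("caf\u00e9".encode("utf-8"), ok))
```

    Falling back to base64 when the keyword is absent is one reasonable
    choice; quoted-printable would serve equally well for mostly-ASCII text.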

    > (2) Mail headers
    > While the mail contents can include all 7bit codes, the
    > characters that can be used in mail headers are more restricted.
    > RFC 2047 defines a way to extend mail headers.
    > An encoded mail header looks like this:

    > Subject: =?iso-8859-1?q?this=20is=20some=20text?=

    > The key to header encoding is two ASCII characters, '=' and '?'.
    > The '=?' indicates the start point of the MIME encoding, while
    > the '?=' indicates the end point.
    > Between them are the character set ('iso-8859-1' in the previous
    > example) and the encoding method ('?q?' in the example).

    > There are two encodings, 'Quoted-Printable' and 'BASE64'.
    > The Quoted-Printable encoding ('q') is somewhat like URL encoding,
    > using the escape character '=' instead of the '%' used in URLs
    > (in the example above, '=20' represents a space).
    > The BASE64 encoding ('b') is more efficient but also more complex.
    > It is defined in RFC 2045.

    This is the current state of affairs, but work is already underway to do
    something similar to RFC 1652 for headers. However, waiting for the IDN
    work to reach its conclusion first seems like a good idea.
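
    For anyone who wants to experiment with the encoded-word syntax quoted
    above, here is a minimal sketch using Python's standard email.header
    module. The helper names and the sample string "Grüße" are chosen only
    for this example.

```python
from email.header import Header, decode_header, make_header

# The encoded header value quoted in the message above.
ENCODED = "=?iso-8859-1?q?this=20is=20some=20text?="


def decode_rfc2047(value: str) -> str:
    """Decode a header value that may contain RFC 2047 encoded words."""
    return str(make_header(decode_header(value)))


def encode_rfc2047(text: str, charset: str) -> str:
    """Encode text as RFC 2047 encoded words in the given charset."""
    return Header(text, charset).encode()


if __name__ == "__main__":
    print(decode_rfc2047(ENCODED))
    # The library picks 'q' or 'b' according to its own per-charset defaults.
    print(encode_rfc2047("Grüße", "iso-8859-1"))
    print(encode_rfc2047("Grüße", "utf-8"))
```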

    > (3) What we are doing for Japanese e-mail
    > The double-byte character sets in Japan are pretty complex.
    > Unix users like to use EUC or JIS (iso-2022-jp), while Windows
    > and Macintosh users use Shift-JIS. Although Unicode is the
    > standard character set for Windows NT and Windows CE, they can
    > also handle Shift-JIS for backward compatibility.

    > Having said that, the character set used in e-mail systems today
    > is iso-2022-jp (RFC 1468). The reason is simple: it is a 7bit code.
    > That means it can be sent in mail contents as the
    > "text/plain;charset=iso-2022-jp" MIME type. However, it cannot be
    > embedded directly in mail headers because iso-2022-jp uses escape
    > sequences to "swap" the code page.

    I believe this is an accurate assessment of the majority of use today. However,
    there are lots of exceptions in practice. UTF-8 is coming into use in some
    places, for example. And modern mail transports handle it fine without
    encoding.
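
    The 7bit property is easy to check by encoding the same text in each of
    the character sets mentioned and looking for bytes with the high bit set.
    This is only a sketch; the sample string and the helper name are
    arbitrary, and the codec names are the ones Python's standard library
    uses.

```python
# Sample text chosen for illustration.
TEXT = "日本語"


def has_8bit_bytes(data: bytes) -> bool:
    """Return True if any byte has the high bit set."""
    return any(b >= 0x80 for b in data)


# Python codec names for the character sets discussed above.
ENCODED = {
    name: TEXT.encode(name)
    for name in ("iso2022_jp", "shift_jis", "euc_jp", "utf-8")
}

if __name__ == "__main__":
    for name, data in ENCODED.items():
        print(f"{name:12} 8bit={has_8bit_bytes(data)} bytes={data!r}")
```

    Only the iso-2022-jp form stays within 7 bits, at the cost of the ESC
    (0x1b) sequences that switch between ASCII and the Japanese set.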

    > So, the standard Japanese e-mail is:

    > Header: iso-2022-jp MIME encoded with BASE64 or Quoted-Printable.
    > Contents: text/plain;charset=iso-2022-jp
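
    A message of the shape just described can be put together with Python's
    standard email package, as sketched below. The function name and the
    sample subject and body are invented for the example, and the choice of
    BASE64 for the header is the library's default for this charset rather
    than anything the description above requires.

```python
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText


def build_japanese_mail(subject: str, body: str) -> MIMEText:
    """Build a text/plain message with iso-2022-jp in header and body."""
    # Body: text/plain; charset="iso-2022-jp", sent as 7bit.
    msg = MIMEText(body, "plain", "iso-2022-jp")
    # Header: iso-2022-jp wrapped in RFC 2047 encoded words.
    msg["Subject"] = Header(subject, "iso-2022-jp")
    return msg


if __name__ == "__main__":
    mail = build_japanese_mail("日本語", "こんにちは")
    wire = mail.as_string()
    print(wire)
    # Recover the readable subject from the serialized header value.
    raw_subject = Header("日本語", "iso-2022-jp").encode()
    print(str(make_header(decode_header(raw_subject))))
```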

    > Although UTF-7 could be used as an alternative to iso-2022-jp
    > from a technical point of view, it is not popular today.

    And isn't likely to gain in popularity in the future. UTF-7 is a bad idea;
    UTF-8 should be used instead.

                                    Ned


