IPP Mail Archive: IPP> Mail encodings

From: Yuji Sasaki (sasaki@jci.co.jp)
Date: Tue Apr 04 2000 - 05:18:26 EDT

    I did some research on mail encodings.

    (1) Mail contents
    The MIME format is defined in RFC2045. It extends the mail
    message format (RFC822) to handle content other than plain
    ASCII text. However, the restriction of RFC822 still remains:
    the contents of the mail must be 7bit.

    (2) Mail headers
    While the mail contents can include all 7bit codes, the
    characters that can be used in mail headers are more restricted.
    RFC2047 defines a way to extend mail headers.
    An encoded mail header looks like this:

    Subject: =?iso-8859-1?q?this=20is=20some=20text?=

    The key to header encoding is two ASCII characters, '=' and '?'.
    The '=?' indicates the start point of a MIME encoded word, while
    the '?=' indicates the end point.
    Between them are the character set ('iso-8859-1' in the previous
    example) and the encoding method ('?q?' in the example).

    There are two encodings, 'Quoted-Printable' and 'BASE64'.
    The Quoted-Printable encoding ('q') is somewhat like URL encoding,
    using the escape character '=' instead of the '%' used in URLs
    (in the example above, '=20' represents a space).
    The BASE64 encoding ('b') is more efficient but also more complex.
    It is defined in RFC2045.
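
    As a rough illustration of the two encodings, here is a minimal
    Python sketch using the standard library. The helper names
    encode_word_b and decode_word are made up for this example, and
    the encoder ignores the length limit that RFC2047 places on an
    encoded word, so treat it as a sketch rather than a complete
    implementation.

```python
import base64
from email.header import decode_header


def encode_word_b(text, charset):
    """Build one encoded word using the 'b' (BASE64) encoding.

    Hypothetical helper: it does not split long input into
    several encoded words as a real mailer would have to.
    """
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return "=?%s?b?%s?=" % (charset, payload)


def decode_word(word):
    """Decode a header value that consists of a single encoded word."""
    data, charset = decode_header(word)[0]
    return data.decode(charset)


# The 'q' example from the text, and the same text in 'b' form.
q_word = "=?iso-8859-1?q?this=20is=20some=20text?="
b_word = encode_word_b("this is some text", "iso-8859-1")

print(decode_word(q_word))
print(b_word)
print(decode_word(b_word))
```

    Both forms decode to the same text; only the part between the
    last two '?' characters differs.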

    (3) What we are doing for Japanese e-mail
    The double-byte character sets in Japan are pretty complex.
    Unix users like to use EUC or JIS (iso-2022-jp), while Windows
    and Macintosh users use Shift-JIS. Although Unicode is the
    standard character set for Windows NT and Windows CE, they can
    also handle Shift-JIS for backward compatibility.

    Having said that, the character set used in e-mail systems today
    is iso-2022-jp (RFC1468). The reason is simple: it is a 7bit code.
    That means it can be sent in mail contents with the MIME type
    "text/plain;charset=iso-2022-jp". However, it cannot be embedded
    directly in mail headers, because iso-2022-jp uses escape
    sequences to "swap" the code page.
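
    The 7bit property can be checked with a small Python sketch.
    The sample string and the helper is_7bit are assumptions made
    for this illustration; the codec names are the ones Python
    uses for these character sets.

```python
def is_7bit(data):
    """Return True if every byte has the high bit clear."""
    return all(b < 0x80 for b in data)


sample = "日本語"  # arbitrary sample text for the comparison

jis = sample.encode("iso2022_jp")   # JIS (iso-2022-jp)
sjis = sample.encode("shift_jis")   # Shift-JIS
euc = sample.encode("euc_jp")       # EUC

for name, data in (("iso-2022-jp", jis), ("Shift-JIS", sjis), ("EUC", euc)):
    print(name, data, "7bit" if is_7bit(data) else "8bit")
```

    The iso-2022-jp bytes start with the escape sequence ESC $ B
    and end with ESC ( B, which is the "swap" of the code page
    mentioned above.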

    So, the standard Japanese e-mail is:

    Header: iso-2022-jp MIME encoded with BASE64 or Quoted-Printable.
    Contents: text/plain;charset=iso-2022-jp
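
    A message of this shape can be put together with Python's
    standard email package, as in the sketch below. The subject
    and body strings are placeholders, and this only shows how one
    library happens to produce this layout; it is not a statement
    about what any IPP implementation does.

```python
from email.header import Header
from email.mime.text import MIMEText

body = "こんにちは"   # placeholder body text
subject = "テスト"    # placeholder subject text

# Contents: text/plain;charset=iso-2022-jp
msg = MIMEText(body, "plain", "iso-2022-jp")

# Header: iso-2022-jp text wrapped in a MIME encoded word.
msg["Subject"] = Header(subject, "iso-2022-jp")

wire = msg.as_string()
print(wire)
```

    The printed message contains only 7bit characters, so it can
    pass through a mail system that does not accept 8bit data.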

    Although UTF-7 could be used as an alternative to iso-2022-jp
    from a technical point of view, it is not popular today.

    If you think backward compatibility and interoperability
    are important (I believe they are the main reasons why we chose
    SMTP as one of the Notification methods), I recommend that you
    SHOULD NOT restrict the character set used in mailto: notification
    to UTF-7.

    Sorry for my terrible English...
    --------
    Yuji Sasaki
    Company E-Mail :sasaki@jci.co.jp
    Personal E-Mail:crazy17@ibm.net
    Nifty-Serve :PFG02524@nifty.ne.jp
    1066 Saratoga Avenue Suite #100 San Jose CA 95129 USA
    Tel:408-551-6470 Fax:408-551-6475


