IPP Mail Archive: Re: IPP> MOD - Separate 'document-format' and 'document-language'

Ned Freed (Ned.Freed@innosoft.com)
Tue, 30 Sep 1997 17:34:13 -0700 (PDT)

> Thanks for the quick response. Actually I did re-read that section
> of RFC 2046 and the short list (US-ASCII and ISO-8859-X) were
> termed the 'Internet standard character sets'.

No, they were termed the "initial set of registered charsets". I just
didn't see fit to reiterate throughout the section that this is only the
initial set. However, since this is proving to be confusing, I'll make the
specification more explicit about this throughout the next time it comes
out.

Note, however, that this same section says:

    No charset name other than those defined above may be used in Internet
    mail without the publication of a formal specification and its
    registration with IANA, or by private agreement, in which case the
    charset name must begin with "X-".

In other words, the list may be amended at any time by registering the charset
with IANA and publishing a formal specification of it. And once this is done
the clear implication is that such charsets may be used in Internet mail.

Note also that this is specific to Internet mail. There are many uses of
MIME on the Internet other than Internet mail, and they are not bound by this
rule. (This omission is intentional, BTW.)
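
To make the rule concrete, here is a minimal Python sketch of the check it
implies for Internet mail. The names charset_allowed_in_mail and
SAMPLE_REGISTERED are hypothetical, and the set is only a tiny stand-in for
the IANA registry, not its actual contents.

```python
# Hypothetical stand-in for the IANA charset registry (illustrative only).
SAMPLE_REGISTERED = {"us-ascii", "iso-8859-1", "utf-8"}


def charset_allowed_in_mail(name, registered=SAMPLE_REGISTERED):
    """Return True if `name` is registered or is a private "X-" name."""
    lowered = name.lower()  # charset parameter values are not case-sensitive
    return lowered in registered or lowered.startswith("x-")


print(charset_allowed_in_mail("UTF-8"))            # registered
print(charset_allowed_in_mail("X-My-Charset"))     # private agreement
print(charset_allowed_in_mail("made-up-charset"))  # neither
```

Registering a new charset corresponds to adding an entry to the set; the
check itself never has to change.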

> It also says a
> little later, 'this standard does NOT endorse the use of any
> character set other than US-ASCII'.

No, it doesn't. The document says that it doesn't endorse the use of any
_particular_ charset other than US-ASCII. This is a _completely_ different
thing. Specifically, it means that the document authors (myself and Nathaniel
Borenstein) did not see fit to endorse the use of, say, iso-2022 derived
charsets rather than, say, iso-10646 derived charsets. (This was a big
unresolved issue at the time and the only way we could get consensus was to
explicitly say that we weren't endorsing anything in particular.) However, the
IAB has now specifically endorsed the use of iso-10646 derived charsets in
favor of iso-2022 derived ones and the IESG is about to turn this endorsement
into a formal policy for the IETF to follow. So this statement ceases to have
any relevance and will be removed from the document the next time around.

> I'm aware that UTF-8 is registered in the IANA character set
> registry. What I couldn't find was an unambiguous statement
> in RFC 2046 (or an updating RFC) that ANY IANA registered
> character set MAY be specified in a 'charset' parameter of
> a MIME 'media-type'. Can you point at such a statement, to
> help us all out?

No such statement exists because no such statement is required. As I pointed
out above, the only restriction is that unregistered charsets may not be used
in Internet mail except by private agreement.
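
For readers unfamiliar with where the 'charset' parameter sits, the following
is a small illustration using Python's standard email package. The message
text is made up for the example; only the library calls are real.

```python
from email import message_from_string

# A made-up message whose media type carries a charset parameter.
raw = "Content-Type: text/plain; charset=UTF-8\n\nhello\n"
msg = message_from_string(raw)

media_type = msg.get_content_type()   # the media type without parameters
charset = msg.get_content_charset()   # the charset value, lower-cased

print(media_type, charset)
```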

You seem to be missing one of the most basic precepts of MIME here. MIME
is mostly proscriptive in nature; it is rarely prescriptive. That is,
MIME tells you what you cannot do; it does not specifically tell you all the
things you can do. You're expected to infer that everything that isn't
specifically prohibited is allowed without being told so in so many words.

This differs greatly from many if not most other standards specifications, and
is one of the reasons why MIME has had so little trouble adapting to the
world's ever-changing use of different media types, charsets, and so on.

Now, I freely admit that the language in this particular part of MIME is
awkward. There's a reason for this, though -- at the time these documents were
advanced there was no charset registration procedure in place to reference.
The IETF has hard and fast rules about such references, and they had to be
followed regardless of how awkward the resulting prose ended up. Had they not
been, we'd be stuck with RFC1521 and RFC1522 to this day, since the charset
registration issue is still not completely dealt with.

Ned