IPP> MOD - ISSUES: Contents of "document-format" attribute

Tom Hastings hastings at cp10.es.xerox.com
Fri Sep 26 19:33:42 EDT 1997


Larry,


Thanks for the clarification.  It sure is simpler to treat utf-8 as
if it were a coded character set, since it can only be used with the ISO 10646
coded character set.  So we can use the charset attribute of a text
media type (MIME type) to indicate the coded character set, including utf-8.
Some character sets, such as utf-8, need the language specified in order
to choose the proper glyphs.  


I suggest that we enhance the current description of the IPP Model
"document-format" attribute to indicate the capability of using the
charset attribute of media types.  I'm still looking for a language
attribute for use with media types, but can't find it.  RFC 2184
didn't seem to have it. 


Keith,
We need your help with how to specify the human language for those
media types that need it.


Thanks,
Tom






Proposed clarifications for the IPP "document-format" attribute:


The current 9/26/97 text is:


4.2.16	document-format (mimeType)


This attribute defines the document format of the data to be printed.  The
standard values for this attribute are Internet Media types (MIME types) [??].  
For example, some values are:


'text/html': An HTML document
'text/plain': A plain text document
'application/postscript': A PostScript document


The IANA registry for such types is ???[???].


One special type is 'application/octet-stream'.   If the Printer object ...
[auto-sense discussion]






I suggest something like:


4.2.16	document-format (mimeType)


This attribute defines the document format of the data to be printed.  The
standard values for this attribute are Internet Media types 
(sometimes called MIME types) according to RFC 2046 [RFC-2046]
registered according to the procedures of RFC 2048 [RFC-2048].  
See the IANA Internet Media types registry [aa] for the media types 
currently registered.  RFC 2046 [RFC-2046] allows for the specification 
of coded character set and human language when needed for certain Media types.  
The coded character set registration entries [bb] labeled as "(preferred MIME
name)" SHALL be used as values of the 'charset' Media type attribute, if
indicated in the registration entry.


For example, some values are:


'text/html': An HTML document
'text/plain': A plain text document with CRLF line breaks (SHALL be assumed 
              to be US-ASCII)
'text/plain; charset=US-ASCII': A plain text document in US-ASCII
'text/plain; charset=ISO-8859-1': A plain text document in ISO 8859-1 (Latin 1)
'text/plain; charset=utf-8': A plain text document in ISO 10646 represented
                             as UTF-8 [28]
'application/postscript': A PostScript document (see RFC 2046)
'application/vnd.hp-PCL': A PCL document
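
As a rough illustration of how a recipient might interpret such values, here
is a minimal Python sketch that splits a "document-format" value into its
media type and charset.  The function name parse_document_format is
hypothetical, and defaulting 'text/plain' to US-ASCII only mirrors the
assumption stated in the examples above; none of this is taken from the IPP
Model text itself.

```python
def parse_document_format(value):
    """Split a document-format value into (media_type, charset).

    Illustrative sketch only: handles simple 'name=value' attributes
    separated by ';' and does not cover every quoting rule of RFC 2045.
    """
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()      # media type names are case-insensitive

    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, _, val = part.partition("=")
            params[name.strip().lower()] = val.strip().strip('"')

    charset = params.get("charset")
    # Assumption from the examples above: bare text/plain means US-ASCII.
    if charset is None and media_type == "text/plain":
        charset = "US-ASCII"
    return media_type, charset


print(parse_document_format("text/plain; charset=utf-8"))
print(parse_document_format("text/plain"))
print(parse_document_format("application/postscript"))
```

Media types other than 'text/plain' are returned with no charset when none is
supplied, since the proposed text only states a default for plain text.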


ISSUE:  I could not find how to specify the human language used in the
data.  RFC 2184 specifies human language for the attribute parameter values 
themselves, but not for the data, as far as I could see.  Keith, we need help here.


ISSUE:  RFC 822 and 2049 specify that text/plain SHALL use CRLF (two octets)
to indicate line breaks.  However, the coded character set US-ASCII standard
(X3.4) allows just LF for line breaks, if there is agreement between sender
and recipient.  Does that mean that 'text/plain; charset=US-ASCII' MUST
be supplied to allow line breaks to be represented as just LF (as in UNIX
and C)?  Or do we need a new media type other than text to indicate
that line breaks are indicated as single LF characters?
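
To make the octet-level difference concrete, the following Python sketch
shows one way a client could convert LF-only text into the CRLF form before
sending it as 'text/plain'.  The helper name to_crlf is hypothetical, and the
sketch does not settle the issue above; it only shows what the two
conventions look like on the wire.

```python
def to_crlf(data: bytes) -> bytes:
    """Return data with every line break written as CRLF (two octets).

    Existing CRLF pairs are first collapsed to LF so they are not doubled.
    A bare CR that is not followed by LF is left untouched.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


unix_text = b"line one\nline two\n"      # LF only, as in UNIX and C
wire_text = to_crlf(unix_text)           # CRLF form

print(wire_text)
print(len(unix_text), len(wire_text))    # one extra octet per line break
```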


One special type is 'application/octet-stream'.   If the Printer object ...
[no change with the auto-sense discussion].




Some additional biblio entries:


[RFC-2046] Multipurpose Internet Mail Extensions (MIME) Part Two: Media
     Types. N. Freed & N. Borenstein. November 1996. (Format: TXT=105854
     bytes) (Obsoletes RFC1521, RFC1522, RFC1590), RFC 2046.


[RFC-2048] Multipurpose Internet Mail Extensions (MIME) Part Four:
     Registration Procedures. N. Freed, J. Klensin & J. Postel. November
     1996. (Format: TXT=45033 bytes) (Obsoletes RFC1521, RFC1522, RFC1590)
     (Also BCP0013), RFC 2048.


[aa] IANA Registry of Media Types:
ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/


[bb] IANA Registry of Coded Character Sets:
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets






At 11:19 09/23/97 PDT, Larry Masinter wrote:
>Yes, UTF8 is logically a transfer encoding for the UCS coded character
>set, but in Internet protocols, it doesn't fill that role. In 
>Internet standards, there are two technical protocol elements:
>"content-transfer-encoding" used in MIME email and "transfer-encoding"
>used in HTTP. "UTF-8" is not a valid token for either field, since
>both content-transfer-encoding and transfer-encoding are 8-bit-to-8-bit
>transformations. Instead, the token "UTF-8" is only valid for the
>"charset" parameter.
>
>Larry
>-- 
>http://www.parc.xerox.com/masinter
>
>
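
The distinction Larry draws can be sketched in Python as two separate steps:
the charset step turns characters into octets, and a transfer encoding then
turns octets into other octets.  The function name label_and_encode and the
choice of base64 as the transfer encoding are assumptions made for
illustration only.

```python
import base64


def label_and_encode(text):
    """Return (content_type, content_transfer_encoding, body) for text.

    Step 1 (charset): characters -> octets, labeled with charset=utf-8.
    Step 2 (transfer encoding): octets -> octets, here base64.
    """
    octets = text.encode("utf-8")
    body = base64.b64encode(octets)
    return "text/plain; charset=utf-8", "base64", body


content_type, cte, body = label_and_encode("\u00e9")   # LATIN SMALL LETTER E WITH ACUTE
print(content_type)
print(cte)
print(body)

# Reversing both steps recovers the original characters.
print(base64.b64decode(body).decode("utf-8") == "\u00e9")
```

"utf-8" appears only in the charset label; the transfer-encoding field
carries a different kind of token.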


