PMP Mail Archive: Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET S

Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET S

Tom Hastings (hastings@cp10.es.xerox.com)
Wed, 23 Jul 1997 11:16:15 PDT

Bill,

Good questions. See if my answers make sense.

Tom

At 08:41 07/23/97 PDT, Bill Wagner wrote:
>
> Tom,
>
> This does seem to address the objections. But I do have some questions
> on how it would work.
>
> 1. I assume that there would be no checking of the operator entered
> identification information. That is, the operator/administrator set in
> whatever they want (or can), and get the same data bytes back. They
> are not restricted to the identified character coding set.

I would say that they would be restricted to the coded character set
identified by the prtGeneralStaticCodeSet object. Otherwise, another
application (that runs with a different default code set), might be
confused when reading the text. Thus a good application that
is writing Printer MIB objects should send such data in the identified
code set specified by prtGeneralStaticCodeSet. If the user only enters
ASCII characters (codes 32-126), then the application doesn't need to do
anything special, since ALL values of prtGeneralStaticCodeSet specify
US-ASCII in those code positions.

Perhaps we need to add a sentence to clarify this to section 2.2.1,
General Printer, page 14?

How about:

When writing objects with syntax OCTET STRING (that are not identified
as affected by localization), an application SHALL ensure that any
code characters written in the range 32-126 are US-ASCII and that any
code characters written in the range 128 to 255 are in the coded
character set and encoding scheme specified by the value of the
prtGeneralStaticCodeSet object.
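
As a rough illustration of what such an application-side check might look
like, here is a minimal Python sketch. The function name and the use of a
Python codec name to stand in for the value of prtGeneralStaticCodeSet are
assumptions made for the example, not anything defined by the Printer MIB:

```python
def encode_for_static_code_set(text: str, codec: str) -> bytes:
    """Encode text in the given code set and reject control codes.

    `codec` is a Python codec name standing in for the value of
    prtGeneralStaticCodeSet (a hypothetical mapping chosen by the caller).
    """
    # Raises UnicodeEncodeError if a character is not in the code set.
    data = text.encode(codec)
    for octet in data:
        # Code positions 0-31 and 127 are control codes.
        if octet < 32 or octet == 127:
            raise ValueError(f"control code {octet} not allowed")
    return data


# The same name produces different octets above 127 depending on the set.
latin1_octets = encode_for_static_code_set("B\u00fcro 3", "latin-1")
utf8_octets = encode_for_static_code_set("B\u00fcro 3", "utf-8")
```

Pure ASCII input comes out identical for every such code set, which is why
an application that only sends codes 32-126 needs no special handling.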

>
> 2. I assume that the static code set object would be read only. Or do
> you see an instance where someone would be sufficiently masochistic to
> make it read/write?

I proposed that the MIN-ACCESS be read-only so that we don't force
it to be read-write. When prtGeneralStaticCodeSet is read-only, whether
it is fixed by the implementation or is modifiable by means outside the
MIB depends on the implementation. However, for the masochistic (or an
implementation without a local operator panel), the object can be
read-write, as with other similar objects in the Printer MIB.
The DESCRIPTION suggests that, if the object can be written, it be
written once when the device is installed (and possibly again later if
the device is moved, which is really a re-installation).

>
> 3. Is there an identified need for the printer to use a character set
> other than UTF8 for the remaining, largely read only objects?

I suggest yes, in order to handle existing practice of the 5si using
HP Roman 8 and Lexmark using their 8-bit set. However, we want to steer
future implementations towards UTF8 only as recommended by the IAB in
RFC 2130. Thus the DEFVAL 106. Even RFC 2130 doesn't preclude other
char sets.
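
To make the reading side concrete, here is a minimal Python sketch of how a
management application might use the value of prtGeneralStaticCodeSet to
interpret octets above 127. The table is a small illustrative subset of IANA
MIBenum values, and the function name and the fallback behaviour for unknown
values are assumptions for the example:

```python
# MIBenum values from the IANA Character Set registry mapped to Python
# codec names. Only a small illustrative subset is listed.
MIBENUM_TO_CODEC = {
    3: "ascii",          # US-ASCII
    4: "latin-1",        # ISO 8859-1
    106: "utf-8",        # UTF-8, the proposed DEFVAL
    2004: "hp_roman8",   # HP Roman8
}


def decode_static_string(octets: bytes, static_code_set: int) -> str:
    """Decode an OCTET STRING value using the agent's static code set."""
    codec = MIBENUM_TO_CODEC.get(static_code_set)
    if codec is None:
        # Unknown code set: only code positions 32-126 are safe to show,
        # so anything else becomes the replacement character.
        return octets.decode("ascii", errors="replace")
    return octets.decode(codec, errors="replace")


raw = b"B\xc3\xbcro"
as_utf8 = decode_static_string(raw, 106)
as_latin1 = decode_static_string(raw, 4)
```

The same octets give two different strings here, which is exactly the
ambiguity the new object is meant to remove.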

>
> prtGeneralSerialNumber
> prtLocalizationLanguage (see note)
> prtLocalizationCountry (see note)
> prtInputName
> prtInputVendorName
> prtInputModel
> prtInputVersion
> prtInputSerialNumber
> prtOutputVendorName
> prtOutputModel
> prtOutputVersion
> prtOutputSerialNumber
> prtMarkerColorantValue
> prtChannelProtocolVersion
> prtInterpreterLangLevel
> prtInterpreterLangVersion
> prtInterpreterVersion
> prtChannelInformation (??? this is a mix of user derived and
> fixed info... I don't think any general statement of character set can
> hold. It would seem, for example, that Novell parameters (if 4.0 or
> higher) should be and to enter them in double byte would be ...bad.
> Perhaps the character set for Channel Information parameters be
> separately specified, per parameter, in the MIB descriptions?)

Exactly, see David's updated prtChannelInformation proposal. In other words,
prtGeneralStaticCodeSet does not apply or affect the code set and
encoding scheme for prtChannelInformation. Instead, each enum value
specifies in its description when registered what the code set and
encoding scheme is. But the code set and encoding scheme MUST have the
property that code positions 32-126 are US-ASCII and the LF code
terminates the data.

>
> 4. LocalizationLanguage (ISO639) and LocalizationCountry (ISO3166) are
> two character codes (with a size constraint to 2 octets); would these
> be recognizable/enterable in any likely character set to be specified?

No. They are always in English as specified in ISO 3166 and ISO 639.
Since any set that could be specified by prtGeneralStaticCodeSet MUST preserve
code positions 32-126 as US-ASCII, the values of language and country are
unaffected by prtGeneralStaticCodeSet by definition.
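
A quick way to see this is the following Python sketch, which encodes a
two-letter code in several code sets that keep US-ASCII in positions 32-126.
The list of codecs and the function name are illustrative assumptions:

```python
# Python codec names for a few code sets that preserve US-ASCII in
# code positions 32-126 (an illustrative list, not an exhaustive one).
ASCII_SUPERSET_CODECS = ["ascii", "latin-1", "utf-8", "euc_jp", "gb2312"]


def encodings_of(code: str) -> set:
    """Return the distinct octet sequences the code encodes to."""
    return {code.encode(codec) for codec in ASCII_SUPERSET_CODECS}


language_octets = encodings_of("en")   # ISO 639 language code
country_octets = encodings_of("US")    # ISO 3166 country code
```

Each set contains a single two-octet value, so the SIZE constraint of
2 octets holds regardless of the static code set.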

Tom

>
>
> Bill Wagner, Osicom/DPI
>
>______________________________ Reply Separator _________________________________
>Subject: PMP> URGENT: SYNTHESIS proposal on definition of OCTET STRIN
>Author: Tom Hastings <hastings@cp10.es.xerox.com> at Internet
>Date: 7/23/97 3:39 AM
>
>
>Please indicate your approval/disapproval of this SYNTHESIS proposal
>by noon PDT today (Wednesday, 7/23).
>
>I hope that this proposal continues to be acceptable to those that liked
>yesterday's proposal and is intended to satisfy the Area Director and those
>who disapproved of yesterday's proposal on the grounds that there wasn't
>a way to determine the coded character set.
>
>We've had a real good *technical* discussion on the e-mail list yesterday
>about the proposal to relax the restriction on the MIB objects defined
>as OCTET STRING from being only ASCII to being any char set that has
>US-ASCII in code positions 32-126. Thanks everyone for the hard work.
>
>The people in favor of the proposal are in favor of it because it
>describes existing (and planned) implementations
>(including the HP 5si and Lexmark printers)
>and it allows implementors to follow the IAB recommendation that UTF-8
>be the default character set (which has US-ASCII in code positions 32-126).
>The people against the proposal have picked up the flaw that it does not
>follow the Area Director's admonition that it must be possible to
>determine the code set being used by objects in the MIB.
>
>Fortunately, most of the data will be using code positions 32-126, so there
>isn't any ambiguity for those code positions. It is for code positions
>128-255 for which we have no means in the MIB to indicate what the code
>set is. For example, the code set could be UTF-8, ISO 8859-1 (Latin1),
>HP Roman8, JIS X0208 (Japanese Kanji two byte set), GB2312 (PRC Chinese
>Kanji two byte set), since these code sets are all supersets of US-ASCII.
>
>Rather than abandoning all hope, let's follow the Area Director's advice
>and provide the means to specify the code set for those OCTET STRING objects
>in question (those that are not already indicated as being subject to the
>localization mechanism in the Printer MIB). I suggest the following
>SYNTHESIS proposal, which should make all the commenters happy and follows
>the Area Director's advice (at the cost of a single object):
>
>1. Add a simple object to the General table that specifies the static
>code set for the OCTET STRING objects in question (those that are not
>already indicated as being subject to the localization mechanism in the
>Printer MIB)
>2. The object has MAX-ACCESS of read-write
>3. The object has MIN-ACCESS of read-only
>4. Let's just add the object to the MANDATORY General Group, rather than
>making the object OPTIONAL and putting it in a separate group and
>specifying that US-ASCII SHALL be used when the object is omitted.
>5. The default value for the object is specified as UTF-8 (enum 106)
>to follow the IAB recommendation.
>6. Let's call the new object: prtGeneralStaticCodeSet
>
>
>The complete text of this SYNTHESIS proposal affects:
>
>1. page 14 (as before) to broaden the restriction of OCTET STRING from
>just ASCII to coded character sets that are a superset of US-ASCII,
>but extended to mention the new prtGeneralStaticCodeSet object.
>
>2. page 75, change prtGeneralPrinterName from DisplayString to
>OCTET STRING (SIZE(0..63)) [or a different size if we agree to one]
>
>3. page 68, add prtGeneralStaticCodeSet to the prtGeneralTable
>
>4. page 76, to add the prtGeneralStaticCodeSet object DESCRIPTION.
>
>5. page 137 to add MIN-ACCESS read-only
>
>6. page 143 to add the prtGeneralStaticCodeSet object to the prtGeneralGroup
>
>7. page 172 to add a proper Bibliography section
>
>
> The seven changes are:
>
>1. Page 14, change the paragraph about OCTET STRING objects from:
>
> Localization is only performed on those strings in the MIB that
> are explicitly marked as being localized. All other character
> strings are returned in ASCII.
>
>to:
>
> Localization is only performed on those strings in the MIB
> represented by objects with syntax 'OCTET STRING' that
> are explicitly marked as being localized. The agent SHALL return
> all other OCTET STRING objects as coded character sets in which code
> positions 0-127 (decimal) SHALL be US-ASCII [US-ASCII] and the remaining
> code positions, 128-255, if used, SHALL be any other coded character
> set structured according to ISO 2022 [ISO 2022] in 8-bit environments,
> including multi-byte sets. Examples of coded character sets which
> meet these criteria are: US-ASCII, ISO 646:1991 IRV [ISO 646],
> ISO 8859-1 (Latin-1) [ISO 8859], any ISO 8859-n, HP Roman8,
> Windows Default 8-bit set, UTF-8 [UTF-8], US-ASCII plus
> JIS X0208-1990 Japanese [JIS X0208], US-ASCII plus GB2312-1980
> PRC Chinese [GB2312].
>
> Examples of coded character sets which do not meet these criteria are:
> national 7-bit sets (except US ASCII), EBCDIC, and ISO 10646 (Unicode)
> [ISO 10646]. In order to represent Unicode characters, use UTF-8
> [UTF-8].
>
> Control codes (code positions 0-31 and 127) SHALL NOT be used unless
> specifically specified in the DESCRIPTION of the object.
>
> In order for an application to be able to determine the coded
> character set returned by agents for objects of type OCTET STRING,
> (in order to be able to interpret code positions 128 to 255, since
> code positions 0 to 127 SHALL be US-ASCII), the
> prtGeneralStaticCodeSet object identifies the coded character set
> in use. This object is updated (infrequently) by system
> administrators when they install, upgrade, or move the managed
> system (to another physical and/or network location) or is
> fixed by the implementation.
>
>
>
>2. Page 75: Change the syntax of the MIB object: 'prtGeneralPrinterName'
> from 'DisplayString' which is restricted to US-ASCII to
> 'OCTET STRING (SIZE(0..63))'
> so that other sets may be used in code positions 128 to 255 and so that
> no control codes will be used (and so that lower and upper bounds on
> the length are specified).
>
>
>
>3. Page 68, add prtGeneralStaticCodeSet to the table by replacing:
>
> prtAlertAllEvents Counter32 -- Alert
> }
>
>with:
>
> prtAlertAllEvents Counter32, -- Alert
> prtGeneralStaticCodeSet CodedCharSet
> -- General
> }
>
>
>
>
>4. Page 76: Add prtGeneralStaticCodeSet object to the end of the
>prtGeneralTable:
>
>prtGeneralStaticCodeSet OBJECT-TYPE
> SYNTAX CodedCharSet
> MAX-ACCESS read-write
> STATUS current
> DESCRIPTION
> "A code set from the IANA Character Set registry [IANA] to be used
> when interpreting 'human-readable' string objects with syntax
> 'OCTET STRING' that are not explicitly marked as being subject
> to localization. The specified code set SHALL have the
> property that code positions 32 to 126 are US-ASCII [US-ASCII].
> Code positions 0 to 31 and 127 SHALL NOT be used
> unless the object DESCRIPTION explicitly permits such usage.
> Code positions 128 to 255 SHALL represent single-byte or multi-byte
> graphic characters structured according to ISO 2022 [ISO 2022].
>
> Usage: This object is updated (infrequently) by system
> administrators when they install, upgrade, or move the managed
> system (to another physical and/or network location) or is
> fixed by the implementation.
>
> Usage: This object NEED NOT contain a value which is 'known'
> to this network printer or network print server, and need NOT
> contain a value found in some 'prtLocalizationCharacterSet'
> object instance currently present in the 'prtLocalizationTable'
> (ie, the static char set may be 'opaque' to the managed system)."
> REFERENCE
> "See: Section 2.2.1, 'General Printer',
> 'prtGeneralCurrentLocalization' (dynamic strings), and
> 'prtGeneralConsoleLocalization' (console strings)
> objects in the General group of this Printer MIB."
> DEFVAL {106} -- UTF-8 [UTF-8]
> ::= { prtGeneralEntry 20 }
>
>
>
>
>5. Page 137, after prtGeneralSerialNumber, add MIN-ACCESS read-only:
>
> OBJECT prtGeneralStaticCodeSet
> MIN-ACCESS read-only
> DESCRIPTION
> "It is conformant to implement this object as read-only"
>
>
>
>6. page 143 to add the prtGeneralStaticCodeSet object to the prtGeneralGroup
>
>Change:
>
> prtGeneralGroup OBJECT-GROUP
> OBJECTS { prtGeneralConfigChanges,
> prtGeneralCurrentLocalization,
> prtGeneralReset, prtCoverDescription,
> prtCoverStatus,
> prtLocalizationLanguage, prtLocalizationCountry,
> prtLocalizationCharacterSet, prtStorageRefIndex,
> prtDeviceRefIndex, prtGeneralPrinterName,
> prtGeneralSerialNumber }
>
>to:
>
> prtGeneralGroup OBJECT-GROUP
> OBJECTS { prtGeneralConfigChanges,
> prtGeneralCurrentLocalization,
> prtGeneralReset, prtCoverDescription,
> prtCoverStatus,
> prtLocalizationLanguage, prtLocalizationCountry,
> prtLocalizationCharacterSet, prtStorageRefIndex,
> prtDeviceRefIndex, prtGeneralPrinterName,
> prtGeneralSerialNumber, prtGeneralStaticCodeSet }
>
>
>
>7. Page 172: Add a proper Bibliography section so that the above
>references can be made. I found a proper reference to US-ASCII in
>RFC 2044 (UTF-8) as:
>
> [US-ASCII] Coded Character Set--7-bit American Standard Code for
> Information Interchange, ANSI X3.4-1986.
>
>So it is ok to refer to ANSI standards from IETF standards.
>
>
>
>So I propose that the Bibliography section be:
>
> [GB2312] GB 2312-1980, "Chinese People's Republic of China (PRC)
> mixed one byte and two byte coded character set"
>
> [IANA] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
> 1700, ISI, October 1994.
>
> [ISO 646] ISO/IEC 646:1991, "Information technology -- ISO 7-bit coded
> character set for information interchange", JTC1/SC2.
>
> [ISO 8859] ISO/IEC 8859-1:1987, "Information technology -- 8-bit single
> byte coded graphic character sets -
> Part 1: Latin alphabet No. 1", JTC1/SC2.
>
> [ISO 2022] ISO/IEC 2022:1994 - "Information technology -- Character code
> structure and extension techniques", JTC1/SC2.
>
> [ISO 10646] ISO/IEC 10646-1:1993, "Information technology -- Universal
> Multiple-Octet Coded Character Set (UCS) - Part 1:
> Architecture and Basic Multilingual Plane", JTC1/SC2.
>
> [JIS X0208] JIS X0208-1990, "Japanese two byte coded character set."
>
> [NVT ASCII] J. Postel, J. Reynolds, "TELNET PROTOCOL SPECIFICATION",
> RFC 854, May 1983.
>
> [US-ASCII] Coded Character Set - 7-bit American Standard Code for
> Information Interchange, ANSI X3.4-1986.
>
> [UTF-7] Goldsmith, D., and M. Davis, "UTF-7", RFC1642, Taligent,
> Inc., July 1994.
>
> [UTF-8] F. Yergeau, "UTF-8, a transformation format of Unicode
> and ISO 10646", RFC 2044, October 1996.
>
>
>
>
>For reference:
>I've extracted all objects of type OCTET STRING from the MIB draft 02.
>I've put 'localized' in front of the ones whose DESCRIPTIONs say are
>localized according to prtGeneralCurrentLocalization and 'console
>localization' in front of the ones whose DESCRIPTIONs say are localized by
>prtConsoleLocalization:
>
> prtGeneralCurrentOperator OCTET STRING,
> prtGeneralServicePerson OCTET STRING,
> prtGeneralSerialNumber OCTET STRING,
>localized prtCoverDescription OCTET STRING,
> prtCoverDescription OCTET STRING,
> prtLocalizationLanguage OCTET STRING,
> prtLocalizationCountry OCTET STRING,
> prtInputMediaName OCTET STRING,
> prtInputName OCTET STRING,
> prtInputVendorName OCTET STRING,
> prtInputModel OCTET STRING,
> prtInputVersion OCTET STRING,
> prtInputSerialNumber OCTET STRING,
>localized prtInputDescription OCTET STRING,
> prtInputMediaType OCTET STRING,
> prtInputMediaColor OCTET STRING,
> prtOutputName OCTET STRING,
> prtOutputVendorName OCTET STRING,
> prtOutputModel OCTET STRING,
> prtOutputVersion OCTET STRING,
> prtOutputSerialNumber OCTET STRING,
>localized prtOutputDescription OCTET STRING,
>localized prtMarkerSuppliesDescription OCTET STRING,
> prtMarkerColorantValue OCTET STRING,
>localized prtMediaPathDescription OCTET STRING,
> prtChannelProtocolVersion OCTET STRING,
> prtInterpreterLangLevel OCTET STRING,
> prtInterpreterLangVersion OCTET STRING,
>localized prtInterpreterDescription OCTET STRING,
> prtInterpreterVersion OCTET STRING,
>console localization prtConsoleDisplayBufferText OCTET STRING
>console localization prtConsoleDescription OCTET STRING
>localized prtAlertDescription OCTET STRING,
>
>
>This proposal would add to the above list:
>
> prtGeneralPrinterName OCTET STRING
>
>
>NOTE:
>I have left out any attempts to fix the DESCRIPTION of
>prtChannelInformation in this proposal, since previous attempts to
>fix its descriptive contradictions have upset some members. I have
>also not proposed that its SYNTAX be changed from DisplayString
>to OCTET STRING for the same reason.
>
>
>
>