PMP Mail Archive: PMP> Revised proposal on definition of OCTET STRING to allow

Tom Hastings (hastings@cp10.es.xerox.com)
Mon, 21 Jul 1997 16:40:53 PDT

I have heard no objections to the main thrust of my suggestion
to allow additional characters in code positions 128-255
for objects of syntax OCTET STRING, as long as code positions
32-126 remained US-ASCII. The discussion has been about
prtChannelInformation (which I have removed from this proposal).
There have been no objections to changing the new object:
prtGeneralPrinterName from DisplayString to OCTET STRING either.

I assume that silence means acceptance on the main thrust
of the proposal????

However, to be clear, I've simplified the proposal and removed
any mention of prtChannelInformation and am re-circulating it.

I've talked with David Kellerman. As a result I have
modified my proposal to avoid mentioning prtChannelInformation (fixing
that description will be a separate issue). I have also changed
the proposal so that any object of type OCTET STRING SHALL use no control
codes, unless specifically specified in the DESCRIPTION (this should
cover prtChannelInformation, prtGeneralCurrentOperator, and
prtGeneralServicePerson which talk about LF).

I have also talked with Bob Pennecost. The HP 5si allows 8-bit data to
be written into read-write OCTET STRING objects. We tried
prtGeneralCurrentOperator and it accepted 8-bit Windows characters,
and a Windows SNMPC application correctly displayed them. Furthermore,
for the read-write prtInputMediaName object, the 5si will only accept
values that have been previously set by the 5si private MIB using 8-bit
characters.

So we have significant implementation practice of RFC 1759 that is *not*
limiting OCTET STRING to US-ASCII (7-bits, code positions 32-126) as specified
on page 14, top paragraph. So we need to fix page 14 and add a REFERENCE
section.

Briefly, the problems with the current Printer MIB draft are:

1. There are many objects of type OCTET STRING that are restricted to ASCII.
But ASCII is not a clearly defined term and existing practice is in
conflict with the most likely of the interpretations. Existing practice
is to use US-ASCII (ANSI X3.4) in code positions 0-127 and some other
coded character set in code positions 128-255. In other words, current
practice is to use 8-bit coded character sets in which code positions
0 to 127 are US-ASCII. Examples of such sets are: ISO Latin 1, HP Roman 8,
UTF-8, JIS X0208-1990 Japanese two byte set in 128-255 with US-ASCII in
0-127, GB 2312-1980 Chinese two-byte set in 128-255 with US-ASCII in 0-127.

2. One of the new Printer MIB v2 objects, 'prtGeneralPrinterName', has
been given a SYNTAX of 'DisplayString', which forces NVT ASCII only
(code positions 128 to 255 SHALL NOT be used), instead of
'OCTET STRING', which would give the same capabilities for other
sets with US-ASCII as a subset, as in 1 above.

3. There isn't a proper Bibliography section to refer to other standards
that are needed in order to understand references to terms such as "ASCII",
"NVT ASCII", "Unicode", "UTF-8", etc.

Explanation of the problems with suggested solutions and text.

1. There is a serious ambiguity in the 02 Printer MIB draft about the many
objects of syntax OCTET STRING that are indicated as not being localized.
Page 14 describes them:

Localization is only performed on those strings in the MIB that
are explicitly marked as being localized. All other character
strings are returned in ASCII.

There is no reference to what is meant by "ASCII".

The different interpretations of this include:

a. ANSI X3.4, the ANSI standard in positions 0 to 127, 128 to 255 SHALL NOT be
used.

b. NVT ASCII (RFC 854) in positions 0 to 127, 128 to 255 SHALL NOT be used.
NVT ASCII includes the following controls for virtual terminals: NUL (0),
LF (10), CR (13), BEL (7), BS (8), HT (9), VT (11), FF (12).

c. Some think that it is any coded character set in which ASCII is in the left
hand side, i.e., values 0 to 127 decimal and any other one or two octet coded
character set is from values 128 to 255, such as ISO 8859-1 (ISO Latin-1), the
Windows default set, HP Roman8, any of the eleven ISO 8859-n sets, UTF-8, JIS
X0208, GB2312, etc.

d. And some think it means any coded character set at all, including Unicode,
any national 7-bit set, so that ASCII doesn't even have to be in positions 0 to
127.

Suggested solution:

1. I propose that we clarify the Printer MIB to be interpretation c.
I believe that will also correspond to actual practice of implementing RFC
1759. For example, any of the ISO 8859-n sets (Latin 1, etc.) meets this
criterion. HP's Roman-8 also meets this criterion, as does the Windows
default 8-bit character set. For Asian markets, implementations may use either
UTF-8, which is a transformation of ISO 10646 (Unicode) that meets this
criterion, or US-ASCII in code points 0 to 127 and their national two-byte
coded character sets in code points 128 to 255 according to the code structure
of ISO 2022 for 8-bit environments.
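
As a rough illustration of the property being asked for (this is only a
sketch, not part of the proposed MIB text), the check below encodes a string
one character at a time and verifies that US-ASCII characters keep their
single US-ASCII octet while all other characters use only octets 128-255.
The function name ascii_preserving is hypothetical, and I am assuming that
Python's "euc_jp" and "gb2312" codecs are reasonable stand-ins for
"US-ASCII plus JIS X0208" and "US-ASCII plus GB 2312" in an 8-bit
ISO 2022 code structure.

```python
def ascii_preserving(text, encoding):
    """Return True if `encoding` represents `text` with US-ASCII in 0-127
    and every non-ASCII character in octets 128-255 only."""
    for ch in text:
        octets = ch.encode(encoding)
        if ord(ch) < 128:
            # A US-ASCII character must stay a single, unchanged octet.
            if octets != bytes([ord(ch)]):
                return False
        elif any(octet < 128 for octet in octets):
            # A non-ASCII character must not reuse code positions 0-127.
            return False
    return True


# Sets that fit the proposed rule (sample strings are arbitrary).
print(ascii_preserving("Caf\u00e9", "latin-1"))     # ISO 8859-1
print(ascii_preserving("Caf\u00e9", "cp1252"))      # a Windows 8-bit set
print(ascii_preserving("Caf\u00e9", "utf-8"))       # UTF-8
print(ascii_preserving("\u7d19 A4", "euc_jp"))      # JIS X0208 in 128-255
print(ascii_preserving("\u7eb8 A4", "gb2312"))      # GB 2312 in 128-255

# Raw two-octet Unicode does not fit: ASCII characters become two octets.
print(ascii_preserving("Caf\u00e9", "utf-16-be"))
```

The check is per character, so it says nothing about stateful encodings
that switch sets with escape sequences.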

So replace the second sentence of the paragraph on page 14:

All other character strings are returned in ASCII.

with:

The agent SHALL return all other character strings as coded
character sets in which code positions 0-127 (decimal) are
US-ASCII [US-ASCII] and the remaining values, 128-255, may be any other
coded character set, including multi-byte sets according to ISO 2022
[ISO 2022] in 8-bit environments. Examples of
coded character sets which meet this criterion are: US-ASCII,
ISO 646:1991 IRV [ISO 646], ISO 8859-1 (Latin-1) [ISO 8859],
any ISO 8859-n, HP Roman8, Windows Default 8-bit set, UTF-8 [UTF-8],
US-ASCII plus JIS X0208-1990 Japanese [JIS X0208], GB2312-1980 Chinese
[GB2312].

Examples of coded character sets which do not meet this criterion are:
national 7-bit sets (except US-ASCII), EBCDIC, and ISO 10646 (Unicode)
[ISO 10646]. In order to represent Unicode characters, use UTF-8.

Control codes (code positions 0-31 and 127) SHALL NOT be used unless
specifically specified in the DESCRIPTION of the object.
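
A minimal sketch of how a management application or agent might check a
value against the control-code rule above. The name check_octet_string and
the allowed_controls parameter are hypothetical; the parameter stands in
for whatever controls an object's DESCRIPTION explicitly permits (for
example LF). Octets 128-255 are accepted without interpretation, since the
proposed text allows any coded character set there.

```python
def check_octet_string(value, allowed_controls=frozenset()):
    """Return (offset, octet) pairs for control codes that are not allowed.

    Control codes are code positions 0-31 and 127. Octets 32-126 are
    US-ASCII graphic characters and octets 128-255 are not examined.
    """
    violations = []
    for offset, octet in enumerate(value):
        is_control = octet < 32 or octet == 127
        if is_control and octet not in allowed_controls:
            violations.append((offset, octet))
    return violations


LF = 10

# 8-bit data with no control codes passes.
print(check_octet_string(b"Tray 1 \xe9"))                     # []

# LF is rejected by default ...
print(check_octet_string(b"line1\nline2"))                    # [(5, 10)]

# ... but accepted when the object's DESCRIPTION allows it.
print(check_octet_string(b"line1\nline2", frozenset({LF})))   # []
```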

2. Change the syntax of the MIB object: 'prtGeneralPrinterName'
from 'DisplayString' which is restricted to US-ASCII to OCTET STRING,
so that other sets may be used in code positions 128 to 255 and so that
the restricted set of controls will be specified.

3. Add a proper Bibliography section so that the above references
can be made. I found a proper reference to US-ASCII in RFC 2044
(UTF-8) as:

[US-ASCII] Coded Character Set--7-bit American Standard Code for
Information Interchange, ANSI X3.4-1986.

So it is ok to refer to ANSI standards from IETF standards.

So I propose that the Bibliography section be:

[US-ASCII] Coded Character Set - 7-bit American Standard Code for
Information Interchange, ANSI X3.4-1986.

[ISO 646] ISO 646:1991, "Information technology - ISO 7-bit coded
character set for information interchange".

[ISO 8859] ISO 8859-1:1987, "Information technology - 8-bit single
byte coded graphic character sets -
Part 1: Latin alphabet No. 1"

[ISO 2022] ISO 2022:1994 - "Information technology - Character code
structure and extension techniques"

[ISO 10646] ISO 10646-1:1993, "Information technology - Universal
Multiple-Octet Coded Character Set (UCS) - Part 1:
Architecture and Basic Multilingual Plane"

[UTF-7] Goldsmith, D., and M. Davis, "UTF-7", RFC 1642, Taligent,
Inc., July 1994.

[UTF-8] F. Yergeau, "UTF-8, a transformation format of Unicode
and ISO 10646", RFC 2044, October 1996.

[NVT ASCII] J. Postel, J. Reynolds, "TELNET PROTOCOL SPECIFICATION",
RFC 854, May 1983.

[JIS X0208] JIS X0208-1990, "Japanese two byte coded character set."

[GB2312] GB 2312-1980, "Chinese People's Republic of China (PRC)
mixed one byte and two byte coded character set"

For reference:
I've extracted all objects of type OCTET STRING from the draft 02.
I've put "localized" in front of the ones whose DESCRIPTIONs say they are
localized according to prtGeneralCurrentLocalization and "console
localization" in front of the ones whose DESCRIPTIONs say they are
localized by prtConsoleLocalization:

prtGeneralCurrentOperator OCTET STRING,
prtGeneralServicePerson OCTET STRING,
prtGeneralSerialNumber OCTET STRING,
localized prtCoverDescription OCTET STRING,
prtCoverDescription OCTET STRING,
prtLocalizationLanguage OCTET STRING,
prtLocalizationCountry OCTET STRING,
prtInputMediaName OCTET STRING,
prtInputName OCTET STRING,
prtInputVendorName OCTET STRING,
prtInputModel OCTET STRING,
prtInputVersion OCTET STRING,
prtInputSerialNumber OCTET STRING,
localized prtInputDescription OCTET STRING,
prtInputMediaType OCTET STRING,
prtInputMediaColor OCTET STRING,
prtOutputName OCTET STRING,
prtOutputVendorName OCTET STRING,
prtOutputModel OCTET STRING,
prtOutputVersion OCTET STRING,
prtOutputSerialNumber OCTET STRING,
localized prtOutputDescription OCTET STRING,
localized prtMarkerSuppliesDescription OCTET STRING,
prtMarkerColorantValue OCTET STRING,
localized prtMediaPathDescription OCTET STRING,
prtChannelProtocolVersion OCTET STRING,
prtInterpreterLangLevel OCTET STRING,
prtInterpreterLangVersion OCTET STRING,
localized prtInterpreterDescription OCTET STRING,
prtInterpreterVersion OCTET STRING,
console localization prtConsoleDisplayBufferText OCTET STRING
console localization prtConsoleDescription OCTET STRING
localized prtAlertDescription OCTET STRING,

We want to add to the above list:

prtGeneralPrinterName OCTET STRING
prtChannelInformation OCTET STRING