IPP> ASIAN languages in IPP.

Tom Hastings hastings at cp10.es.xerox.com
Tue Nov 4 12:28:49 EST 1997


At 11:59 11/03/1997 PST, Yuji Sasaki wrote:
>Dear IPP members,
>
> I'm still concerned about what language should be used when a text attribute
>becomes mixed-language, such as: "%%[PrinterError:Offending command while
>printing file ******.ps%%]" (please assume ****** is any Japanese kanji
>you like, such as "TEMPURA", "FUJIYAMA" or "GEISHA"). Should it be English,
>Japanese, or do we not have to care?


Interesting question, since an error message attribute could contain such
a string.  Let's assume that the user submitted the request and specified
that Japanese was the natural language, because the client submitted
the "attributes-natural-language" with the value 'jp' (is that the
correct language code for Japanese?) and the "attributes-charset" with,
say, 'utf-8'.  Then there is no ambiguity in your example.  The Japanese
glyphs, not the Chinese glyphs, will be chosen for the file name.
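
As a minimal sketch of the request described above, the two attributes can be pictured as a small mapping.  The helper name make_operation_attributes is hypothetical and exists only for illustration; only the two attribute names come from the message.  On the question of the language code: the ISO 639 two-letter code for Japanese is 'ja' ('jp' is the ISO 3166 country code for Japan), so the sketch uses 'ja'.

```python
# Hypothetical helper: only the two attribute names are taken from the
# discussion; the function itself is an illustration, not an IPP API.
def make_operation_attributes(charset, natural_language):
    """Return the charset and natural-language attributes for a request."""
    return {
        "attributes-charset": charset.lower(),
        "attributes-natural-language": natural_language.lower(),
    }


# 'ja' is the ISO 639 language code for Japanese.
request_attrs = make_operation_attributes("utf-8", "ja")
print(request_attrs)
```

With these two values the Printer knows both how the text is encoded and which language's glyphs the client expects for CJK characters.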


The Printer object should have responded with the entire string in
Japanese, not just the file name.  I realize, though, that PostScript
only returns English error message text.  Even so, I suspect that
we don't really have a problem with IPP if the message is in a
charset that supports both ASCII characters and Kanji, such as UTF-8,
Shift JIS, or EUC JIS, since the first part of your example message would be
"ASCII" characters, which even in Japanese are displayed as:


    %%[PrinterError:Offending command while printing file


If the Printer object also supports one of the Japanese charsets, such as
'JIS_C6226-1983', 'JIS_X0212-1990', 'Shift_JIS', 'EUC-JP',
or 'Windows-31J', there also is no problem, because they all contain
ASCII characters for the English part of your example.  Thus the client
could have supplied one of these charset values, provided that the
Printer object also supported it.
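
The point about the ASCII part can be checked with a short sketch.  It assumes an illustrative Japanese file name (the original example elides it) and uses the Python codec names 'shift_jis', 'euc_jp' and 'cp932' (Microsoft code page 932, which corresponds to Windows-31J).  It covers only UTF-8 and those three charsets, not 'JIS_C6226-1983' or 'JIS_X0212-1990'.

```python
# The ASCII part of the example message, exactly as in the discussion.
prefix = "%%[PrinterError:Offending command while printing file "
# Illustrative file name only; the original example does not give one.
file_name = "富士山.ps"
message = prefix + file_name + "%%]"

codecs_to_try = ["utf-8", "shift_jis", "euc_jp", "cp932"]


def ascii_prefix_is_preserved(text, ascii_part, codec):
    """True if the encoded text starts with the plain ASCII bytes of
    ascii_part and decodes back to the original text unchanged."""
    encoded = text.encode(codec)
    return (encoded.startswith(ascii_part.encode("ascii"))
            and encoded.decode(codec) == text)


results = {codec: ascii_prefix_is_preserved(message, prefix, codec)
           for codec in codecs_to_try}
print(results)
```

In each of these encodings the English part occupies the same bytes as plain ASCII, and only the file name is encoded differently.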






PROPOSAL FOR AFTER IPP V1.0:


After version 1.0 of IPP is forwarded, I'd like to see us register two
new Job Description attributes that contain error messages encountered
during processing, such as the one in your example.  One attribute
would have a 'keyword' value for programs, and the other would have
a 'text' value for human users.  Maybe they should be multi-valued,
so that an implementation could indicate a number of problems if the
implementation wanted, but would not be required to.


Perhaps, call these attributes:




job-processing-error-id (1setOf type3 keyword)


This attribute indicates one (or more) errors encountered during the
processing of the job.  


The intent of this attribute is for program consumption while the intent 
of the corresponding "job-processing-error-message" is for human consumption.




job-processing-error-message (1setOf text)


This attribute indicates one (or more) errors encountered during the
processing of the job.


The intent of this attribute is for human consumption while the intent 
of the corresponding "job-processing-error-id" is for program consumption.
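
A rough sketch of how the two proposed attributes might sit side by side in an implementation follows.  The class name, the keyword 'offending-command-error', the file name, and the idea that the n-th id corresponds to the n-th message are all assumptions made for illustration; the proposal itself registers no keywords and does not say how values pair up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobProcessingErrors:
    """Hypothetical holder for the two proposed multi-valued attributes."""
    ids: List[str] = field(default_factory=list)       # keywords, for programs
    messages: List[str] = field(default_factory=list)  # text, for human users

    def add(self, keyword: str, text: str) -> None:
        # Assumption: values are kept in parallel, one message per keyword.
        self.ids.append(keyword)
        self.messages.append(text)

    def to_attributes(self) -> Dict[str, List[str]]:
        return {
            "job-processing-error-id": list(self.ids),
            "job-processing-error-message": list(self.messages),
        }


errors = JobProcessingErrors()
# 'offending-command-error' and 'report.ps' are made up for this example.
errors.add("offending-command-error",
           "%%[PrinterError:Offending command while printing file report.ps%%]")
attrs = errors.to_attributes()
print(attrs)
```

A program would branch on the keyword values, while a user interface would simply display the text values.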




ISSUES for discussion would include: 


1. Do we want to include warnings or not?  If so, how are they 
indicated? Do all keywords have the suffix "-error" or "-warning" in them?


2. In order to improve interoperability, we should list a whole bunch of
keywords with the registration of this attribute.  That takes time, which
is why we can't do this attribute now for IPP V1.0.


3. Should the type of the keyword be type2, so that review could eliminate
duplicates, or type3, so that there is no review after our initial set?


>
> I have another concern about using Unicode in a multilanguage environment.
>I know it is an IPP client/browser issue more than a protocol issue,
>but it is important for Asians like me.
>
> We have at least three Kanji codes: Chinese, Japanese, and Korean. But
>in the specification of ISO10646-1(UCS-4), most of them were combined into
>a single page, called the "CJK character set".
> The problem is, some of the Kanji characters in CJK look similar but
>have different "faces" depending on the language which the character
>"belongs to".
>
> In extreme cases, one string can include several languages like:
>
>The document named "Woo Hoi Chang" was printed from "Aoyama Tokyo".
>                    ~~~~~~~~~~~~~                    ~~~~~~~~~~~~
>                   (Chinese Kanji)                  (Japanese Kanji)


Such an example could even occur in IPP attributes, if the error
message was in, say, Japanese, but the file name was in, say, Chinese.
However, the chances are small, and in that case the file name would simply
get presented with Japanese glyphs instead of Chinese ones.


The most likely occurrence of mixed Chinese and Japanese would be
in the document data itself, which is a problem for the document format
specifications, such as HTML and PostScript, not for IPP.


The 'text/plain' document format would not be able to distinguish
the Chinese from the Japanese.


>
> In that case, even RFC2069 (adding language information to each string)
>is not enough. Still less the current version of IPP, which can have only one
>language for all text attributes within a session.
>
>In HTML4.0, a "LANG" tag is defined, so we can write:
>
>The document named <LANG="chinese">Woo Hoi Chang</LANG> was printed from
><LANG="japanese">Aoyama Tokyo</LANG>.
>
> But I don't want to use HTML as the IPP 1.0 presentation layer; it's too
>heavy for clients to implement.


Agreed.


>
> Practically, we Asians can tell what a word means even if the details
>are slightly different (like you guys can tell "colour" is the same word
>as "color").
> And I think we will handle the CJK differences by "assuming the native
>language". In the case above, all kanji will be displayed as "Japanese
>Kanji" in Japan, and as "Chinese Kanji" in China.
>
> But the problem still remains, especially for human names or
>names of places. We have to know the EXACTLY CORRECT kanji to identify
>particular persons/places, mostly for historical reasons. Just as in
>English, "Colour" and "Color" are the same but "Kristen" and "Cristen" are
>definitely different.
> Unfortunately, we don't have a standard method for using CJK in a
>multilanguage environment (except HTML4). Even in a single language (e.g.
>Japanese), we are still struggling to fit too many characters into the
>limited capacity of Unicode CJK.


Fortunately, for IPP, each 'name' attribute is separate, and so can have
its own "override natural-language", so that a job can contain 'name'
attributes in different languages.


>
> Do you think it is okay to use the "native language" as the default language
>for handling CJK characters (in other words, "depends on implementation")? 


So the client requests attributes to be returned in the user's natural
language, say, Japanese.  If the message contains ASCII characters,
they should be displayed correctly.  It could even contain accented
Latin characters or Cyrillic characters.  It's only for the CJK characters
that the ambiguity becomes a problem.  But a client for a Japanese user
requesting IPP attributes from a job with one or more Chinese name attributes
could still present each Chinese name in the correct Chinese glyphs to the
Japanese user, because the IPP Printer returns the natural-language
override on the name attributes indicating that the name is in Chinese,
not Japanese as requested.  Ok?
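
The client-side choice described above can be sketched as follows.  The representation is an assumption made for illustration: a name with an override is modelled as a (language, text) pair and a name without one as a plain string; the function name glyph_language is hypothetical.  The language tags 'ja' and 'zh' are the ISO 639 codes for Japanese and Chinese.

```python
def glyph_language(value, requested_language):
    """Pick the language whose glyphs should be used to display a name.

    value is either a plain string (no override, so the language requested
    for the whole response applies) or a (language, text) pair carrying a
    natural-language override for that one name.
    """
    if isinstance(value, tuple):
        language, _text = value
        return language
    return requested_language


# Two name values returned to a client that asked for Japanese ('ja'):
# the first carries a Chinese override, the second has no override.
names = [("zh", "Woo Hoi Chang"), "Aoyama Tokyo"]
chosen = [glyph_language(value, "ja") for value in names]
print(chosen)
```

Only names that differ from the requested language need to carry an override, so the common single-language case stays simple.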


> I think we have no alternative other than that. This will spoil exact
>international interoperability in IPP, but the problem is rooted in
>Unicode CJK itself and is not a matter for IPP. I hope future versions of IPP
>(and Unicode) will solve this problem.


See if my explanation solves your problem.  We hope it does, since that
is why we included the override mechanism in IPP/1.0.


>
> Sorry for persisting with this issue and (I guess) confusing you guys.
>But I'm afraid that IETF people will point out that it is unclear how to handle
>CJK characters in the IPP specification.
>
>Well, it is clear. Just say, "Depends on implementation" ;-).
>
>Sincerely,
>--------
>Yuji Sasaki
>E-Mail:sasaki at jci.co.jp
>
>
>


