From ElliottBradshaw at oaktech.com Fri Jan 3 14:16:03 2003
From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com)
Date: Wed May 6 13:53:38 2009
Subject: CR> CR teleconference and Implementor's Guide
Message-ID:

Call-in arrangements for the next CR conference call:

Time: 3:00 PM Eastern time Wed. 1/8
Dial in #: 888 205-5513 or 719 955-0562
Participant passcode: 176310

As our main topic I would like to go through the draft Implementor's Guide, which I have placed at:
ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm

As before, my biggest challenge is finding online, normative material for the details of the Asian character sets (except Korean, which is covered in an RFC). Ira and others have provided handy pointers to summaries by others, but I'm wondering where I find the horse's mouth.

Talk to you next Wed.

-Elliott

------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534

From imcdonald at sharplabs.com Fri Jan 3 15:10:27 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:38 2009
Subject: CR> CR teleconference [RFCs for Asian charsets]
Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE6A@mailsrvnt02.enet.sharplabs.com>

Hi Elliott,

Some more RFC references:

RFC 1468 - ISO-2022-JP (Japanese)
RFC 1554 - ISO-2022-JP-2 (Japanese)
RFC 2237 - ISO-2022-JP-1 (Japanese)
RFC 1557 - ISO-2022-KR (Korean)
RFC 1922 - ISO-2022-CN and ISO-2022-CN-EXT (Chinese)

Each of these refers in some detail to the underlying Japanese, Korean, or Chinese national standards that are placed in planes of these ISO-2022 encodings.

Cheers,
- Ira

From fujisawa.jun at canon.co.jp Mon Jan 6 05:43:59 2003
From: fujisawa.jun at canon.co.jp (Jun Fujisawa)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference and Implementor's Guide
In-Reply-To:
References:
Message-ID:

Hello Elliott,

At 2:16 PM -0500 03.1.3, ElliottBradshaw@oaktech.com wrote:
>As our main topic I would like to go through the draft Implementor's Guide,
>which I have placed at:
>ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm.

I would like to point out that the terms "repertoire" and "character set" as defined in the Terminology section do not seem to be consistent with the usage in the W3C Character Model. For example, the use of the term "character set" is discouraged in Section 3.6.2 of Character Model for the World Wide Web 1.0.

- Character Model for the World Wide Web 1.0

>As before, my biggest challenge is finding online, normative material for
>the details of the Asian character sets (except Korean, which is covered in
>an RFC).

Unfortunately, the only normative materials for the definitions of Japanese coded character sets (CCS) are the Japanese national standards.

- JIS X 0201
  Japanese Industrial Standards Committee. 7-bit and 8-bit coded character sets for information interchange, JIS X 0201:1997, Japanese Standards Association, 1997.

- JIS X 0208
  Japanese Industrial Standards Committee. 7-bit and 8-bit double byte coded KANJI sets for information interchange, JIS X 0208:1997, Japanese Standards Association, 1997.

- JIS X 0212
  Japanese Industrial Standards Committee. Code of the supplementary Japanese graphic character set for information interchange, JIS X 0212:1990, Japanese Standards Association, 1990.

- JIS X 0221
  Japanese Industrial Standards Committee. Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic

Also, I suggest consulting the following W3C Note for detailed information on some Japanese character encoding schemes (CES) and their mappings to Unicode.

- XML Japanese Profile

--
Jun Fujisawa

From ElliottBradshaw at oaktech.com Wed Jan 8 11:41:27 2003
From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference and Implementor's Guide
Message-ID:

Hello Fujisawa-san,

Thanks for the useful information. I think we can get a lot of what we need from the Japanese Profile document.

I am not entirely satisfied by the term "repertoire", and would like to have some discussion in the group. We are looking for a term that means "named subset of Unicode characters, without regard to encoding." Bluetooth uses "repertoire" in this way. Some other ideas:

-character complement
-Unicode Subset
-CCSS (Coded Character SubSet)

I'd like proposals for the term, as well as how we will actually define it.

With regard to Shift-JIS, I now understand that there is no universal mapping from it to Unicode. And many Japanese web pages still use Shift-JIS. So we may want to recommend that a Japanese-capable printer support Shift-JIS as well as UTF-8, and that a Japanese-capable client use Shift-JIS if it is available. Otherwise the client must map to Unicode and deal with the ambiguities of the different available mappings. I wonder how strongly we should follow Microsoft's lead in this area...
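The Shift-JIS point above (no single universal mapping to Unicode) can be illustrated with a small sketch. It is not from the thread: it assumes Python's standard codecs, where "shift_jis" follows the JIS X 0208 tables and "cp932" is the Microsoft variant, and the helper names are hypothetical. The same two bytes decode to different Unicode characters depending on which table a client or printer picks.

```python
import unicodedata

# Assumption: these two Python codec names stand in for "a JIS-based mapping"
# and "the Microsoft mapping" of Shift-JIS bytes.
CODECS = ("shift_jis", "cp932")


def decode_variants(raw):
    """Decode the same bytes with each codec; return a dict codec -> text."""
    return {name: raw.decode(name) for name in CODECS}


def describe(text):
    """Render each character as 'U+XXXX NAME' for easy comparison."""
    return ", ".join(
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "?")) for ch in text
    )


def can_encode(text, codec):
    """True if every character of text has a mapping in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# 0x81 0x60 is the "wave dash" position in the JIS X 0208 part of Shift-JIS.
sample = b"\x81\x60"
variants = decode_variants(sample)
for name, text in variants.items():
    print("%-10s -> %s" % (name, describe(text)))
```

A client that decoded with the Microsoft table and then tries to re-encode with the JIS table hits exactly the ambiguity described: `can_encode(variants["cp932"], "shift_jis")` is false, because the character it obtained has no entry in the other mapping.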
From ElliottBradshaw at oaktech.com Wed Jan 8 13:33:00 2003
From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference and Implementor's Guide
Message-ID:

See Rod's notes for some ideas on terminology.

----- Forwarded by Elliott Bradshaw/oaktech/us on 01/08/2003 01:33 PM -----

"Acosta, Roderick"
01/08/2003 01:08 PM
cc:
Subject: RE: CR> CR teleconference and Implementor's Guide

Elliott,

Some suggestions from a colleague of mine.

Character set:
Unicode is the default character set for HTML and XHTML. Valid Unicode values range from hexadecimal 0 to 10FFFF (decimal 0 to 1,114,111). Any valid Unicode character is associated with a code point in this range of scalar values. Unicode is an "ordered" character set because each character is represented by a unique scalar value.

Transformations or Encodings:
A Unicode scalar value can be expressed in a variety of digital forms, including UTF-8 and UTF-16. "UTF" stands for "Unicode Transformation Format". UTF-8 and UTF-16 are often called "encodings" because they represent ("encode") the full range of scalar values.

Unicode subset: What do we call it?
The Unicode character set supports a large number of characters that are derived from other legacy character sets such as ISO 8859-x and JIS X 0208. With the exception of ISO 8859-1, all legacy characters must be mapped to their equivalent Unicode value through an algorithmic and/or table-driven process. The ordering of characters in a legacy character set is not necessarily replicated in Unicode.

What does one call a subset of Unicode values that represents a range of characters from a common legacy character set? We would like to propose the term "character collection" because

a. it does not imply any particular ordering
b. it does represent a closed, enumerable set
c. it is distinct from "character set"

/Rod
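Rod's notes on scalar values and their encoded forms can be sketched as follows. This is only an illustration under stated assumptions: the function name and the choice of big-endian UTF-16 are mine, not from the thread, and surrogate code points are rejected because they are not scalar values.

```python
MAX_SCALAR = 0x10FFFF  # upper end of the range given in the notes above


def encoded_forms(code_point):
    """Return the UTF-8 and UTF-16 (big-endian) bytes of one scalar value as hex.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= MAX_SCALAR:
        raise ValueError("outside the Unicode range 0..10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not scalar values")
    ch = chr(code_point)
    return {
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
    }


for cp in (0x41, 0x3042, 0x10FFFF):
    print("U+%04X" % cp, encoded_forms(cp))
```

The scalar value stays the same in every case; only the byte sequence differs between the two transformation formats, which is the distinction between "character set" and "encoding" that the notes draw.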
From imcdonald at sharplabs.com Wed Jan 8 17:39:16 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference and Implementor's Guide
Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE77@mailsrvnt02.enet.sharplabs.com>

Hi folks,

Sorry I missed the telecon earlier today. I failed to note the earlier time (3pm EST rather than 5pm EST).

I wrote the following definition (for CUPS documentation), drawing on the POSIX.1 (ISO 9945-1) and Unicode 3.2 glossaries:

Character Repertoire:
(1) The complete set of characters defined in a given named character set, such as ISO 8859-1.
(2) The subset of characters defined in a large character set, such as Unicode 3.2, that are needed for an exact mapping to a smaller character set, such as ISO 8859-1.

For PWG CR, we could refine (2) above to fix Unicode 3.2 (or later) as the "large character set".

Cheers,
- Ira McDonald.

From ElliottBradshaw at oaktech.com Thu Jan 9 11:11:01 2003
From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference and Implementor's Guide
Message-ID:

Ira,

I think definition #2 covers exactly what we are trying to do. Is this form in prior use?

-Bluetooth BPP: yes
-Unicode: I couldn't get this meaning out of the Unicode glossary
-POSIX: ???

At the call yesterday there was some interest in the term "character collection" as an alternative to "repertoire". Have you encountered this?

Group: I am going to use Ira's definitions in the next version of the Guide.

From imcdonald at sharplabs.com Thu Jan 9 12:38:33 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference and Implementor's Guide
Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE7A@mailsrvnt02.enet.sharplabs.com>

Hi,

In some places POSIX uses the "collection of characters" phrasing. In others (especially in the revised POSIX:2000 spec) it uses the "subset of characters defined in a larger character set..." phrasing.

I think it's important to ALSO list the classic (1) definition in our spec. It makes clear where definition (2) came from.

The ISO 10646 folks have been developing named formal ISO Profiles (a kind of ISO derived standard) that define "character repertoires" that are subsets of ISO 10646/Unicode (not the subsets we want, by the way, but more generic ones like Western European coverage).

Cheers,
- Ira
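Definition (2) of "character repertoire" discussed above, the subset of Unicode needed for an exact mapping to a smaller character set, can be sketched in a few lines. This is an illustration only, assuming Python's standard codecs as the mapping tables and restricting itself to single-byte character sets such as ISO 8859-1; the function names are hypothetical and not taken from the Implementor's Guide.

```python
def repertoire(charset):
    """Unicode code points reachable from the bytes 0x00-0xFF of a
    single-byte character set (assumption: charset is a Python codec name)."""
    points = set()
    for byte in range(256):
        try:
            points.add(ord(bytes([byte]).decode(charset)))
        except UnicodeDecodeError:
            pass  # byte value not assigned in this character set
    return frozenset(points)


def in_repertoire(text, rep):
    """True if every character of text belongs to the repertoire."""
    return all(ord(ch) in rep for ch in text)


latin1 = repertoire("latin_1")      # ISO 8859-1
latin9 = repertoire("iso8859_15")   # ISO 8859-15, which adds the euro sign

print(len(latin1))
print(in_repertoire("caf\u00e9", latin1))
print(in_repertoire("\u20ac", latin1), in_repertoire("\u20ac", latin9))
```

Note that the result is a plain set of code points with no encoding attached, which matches the "named subset of Unicode characters, without regard to encoding" that the group is trying to name.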
From ElliottBradshaw at oaktech.com Tue Jan 14 13:33:49 2003
From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> Update Implementors Guide
Message-ID:

Following our discussions of last week, I have posted a new version at (same URL as before):
ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm

There are a lot of changes, but highlights include:

-definitions for charset and character repertoire
-more about Microsoft
-more about Asian character specs
-new section "Determining a Printer's Supported Repertoires" which gives some assumptions a client can make

Comments (on this list) please.

From imcdonald at sharplabs.com Wed Jan 15 12:47:27 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> Telecon on Jan 15 - Update Implementors Guide
Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE86@mailsrvnt02.enet.sharplabs.com>

Hi Elliott,

Is there a conference call today in about two hours?

Cheers,
- Ira McDonald
High North Inc

From imcdonald at sharplabs.com Wed Jan 15 14:56:41 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> NO Telecon on Jan 15
Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE87@mailsrvnt02.enet.sharplabs.com>

Hi folks,

I just spoke to Elliott Bradshaw on the phone. His office is without power (no email, no Internet, no voice mail).

There is NOT a telecon for PWG Character Repertoires WG today.

Please review the latest version of the CR Implementors Guide and send email comments before next Tuesday (face-to-face in Maui - you lucky devils...).

Cheers,
- Ira McDonald
High North Inc
Cheers, - Ira McDonald High North Inc

From harryl at us.ibm.com Mon Feb 10 18:15:44 2003 From: harryl at us.ibm.com (Harry Lewis) Date: Wed May 6 13:53:39 2009 Subject: CR> Possible change in D.C. Schedule Message-ID: We are currently scheduled for Tuesday, parallel with FSG. There may be a possibility to move our meeting to Monday. How would people on CR feel about this? Have folks already made travel reservations? ---------------------------------------------- Harry Lewis IBM Printing Systems ----------------------------------------------

From Rod.Acosta at AgfaMonotype.com Mon Feb 10 18:16:10 2003 From: Rod.Acosta at AgfaMonotype.com (Acosta, Roderick) Date: Wed May 6 13:53:39 2009 Subject: CR> Possible change in D.C. Schedule Message-ID: Fine with me (no reservations yet). /Rod

From ElliottBradshaw at oaktech.com Tue Feb 11 08:36:21 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> Possible change in D.C. Schedule Message-ID: It would be OK with me...I haven't made arrangements yet. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

From ElliottBradshaw at oaktech.com Thu Feb 27 11:30:33 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> Minutes, conference call Message-ID: Minutes from the January face-to-face for Character Repertoires are posted at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRMinutes-January-2003.html My next step is to create a document with a short list of names for preferred character encodings and repertoires. This will be reviewed and discussed, and ultimately published as a PWG document, similar to the media sizes list, and is meant to be suitable for reference from the Semantic Model.
I would like to schedule a conference call to review that document (which I will publish beforehand). How do people feel about 4:00 EST, on Wed. March 12? -Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

From imcdonald at sharplabs.com Thu Feb 27 16:25:36 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:39 2009 Subject: CR> Minutes, conference call 4pm EST Wed March 12? Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF0A@mailsrvnt02.enet.sharplabs.com> Hi Elliott, Thanks - good minutes. Yes - I can make a conference call at 4pm EST Wed March 12.

From imcdonald at sharplabs.com Mon Mar 3 11:42:39 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:39 2009 Subject: CR> FW: GB 18030 Information Required Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF14@mailsrvnt02.enet.sharplabs.com> Hi folks, Elliott - the first two white papers (links below) look highly useful.
Markus Scherer is a Unicode and charsets heavy at IBM. Cheers, - Ira McDonald High North Inc -----Original Message----- From: Markus Scherer [mailto:markus.scherer@jtcsv.com] Sent: Monday, March 03, 2003 10:26 AM To: vinay.aggarwal@rebus.co.in; charsets Subject: Re: GB 18030 Information Required

vinay.aggarwal@rebus.co.in wrote:
> Could you please let me know if following supports the GB18030?
> - Any web based application
> - Browser (Internet Explorer/ Netscape) based application

Yes and no. Generally, web-based applications and browsers and related protocols do support GB 18030 and Unicode and various other charsets. Specifically, you need to read about
- charsets, e.g., http://oss.software.ibm.com/icu/docs/papers/codepages_and_unicode.html
- GB 18030, e.g., http://oss.software.ibm.com/icu/docs/papers/gb18030.html
- Unicode, e.g., http://www.unicode.org/standard/WhatIsUnicode.html
and about the particular applications (and versions of them) that you intend to use.

markus

From ElliottBradshaw at oaktech.com Mon Mar 3 12:32:07 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> FW: GB 18030 Information Required Message-ID: Interesting. If I read this correctly, then 18030 is a mapping to ALL of Unicode. This would make it an encoding, but not a subset. If that's right, then we would treat it as a kind of charset, but not as a repertoire. Your thoughts? E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

From ElliottBradshaw at oaktech.com Mon Mar 3 15:11:38 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> Preferred Character Repertoires in Printers Message-ID: I have posted a draft document at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030228.html In it I take a shot at how to organize the use of character repertoires in printing from small devices. The meat of this document is pretty short; I'll add the potatoes when/if there is consensus that this is the right way to go. Following the discussion in Maui I have replaced ISO-8859 references with similar code charts from Unicode. Our next conference call is confirmed at: 4pm EST Wed March 12 In this call I would like to discuss comments on this document. Also please send points for discussion to the reflector beforehand. -Elliott Pete Z.: what other things should we consider so that this can be referenced from SM?
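As a side note on the GB 18030 question raised earlier in this thread (whether it is an encoding of all of Unicode rather than a subset), the following is a minimal sketch, not part of the original messages. It assumes Python's built-in "gb18030" codec; the sample characters and the helper name gb18030_profile are arbitrary choices for illustration.

```python
# Sketch only: relies on Python's standard "gb18030" codec.
# SAMPLES and gb18030_profile are illustrative names, not from the thread.
SAMPLES = ["A", "\u4e2d", "\u20ac", "\U0001F600", "\U0010FFFF"]


def gb18030_profile(chars):
    """Return {char: encoded byte length}, checking that each char round-trips."""
    profile = {}
    for ch in chars:
        encoded = ch.encode("gb18030")
        # A lossless round trip is what one expects of a UTF-like encoding.
        assert encoded.decode("gb18030") == ch
        profile[ch] = len(encoded)
    return profile


if __name__ == "__main__":
    for ch, length in gb18030_profile(SAMPLES).items():
        print(f"U+{ord(ch):04X}: {length} byte(s)")
```

If every sample round-trips, including a supplementary-plane code point, that is consistent with treating GB 18030 as a charset (an encoding) and not as a distinct repertoire.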
------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

From imcdonald at sharplabs.com Mon Mar 3 16:38:58 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:39 2009 Subject: CR> FW: GB 18030 Information Required Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF17@mailsrvnt02.enet.sharplabs.com> Hi Elliott, Yes - GB18030 is a mapping to EVERY codepoint in Unicode (not just the assigned ones, but all 1.1 million possible Unicode codepoints). But it's a multi-byte, variable-length encoding: each codepoint takes one, two, or four bytes in GB18030. As Markus Scherer says, it is best thought of as a Chinese-market UTF (Unicode Transformation Format), like UTF-8, UTF-16, and UTF-32. I agree with you, therefore, that PWG CR should view GB18030 as a valid 'charset' (which can be tagged) but NOT as a unique 'repertoire' (because it's a different encoding of Unicode). Cheers, - Ira McDonald High North Inc

From ElliottBradshaw at oaktech.com Mon Mar 10 16:52:49 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> Reminder: Conference call Wed. 3/12 at 4:00 Eastern Message-ID: Whether or not you can attend the call, please have a look at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030228.html and send any comments to the reflector. Our next CR conference call is this Wednesday at 4:00 Eastern. Call-in info: Dial in #: 888 205-5513 or 719 955-0562 Participant passcode: 176310 Our main agenda item is to review the above-referenced document, so that I can edit it prior to our face-to-face in DC. -Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

From jim.bigelow at hp.com Wed Mar 12 14:14:09 2003 From: jim.bigelow at hp.com (BIGELOW,JIM (HP-Boise,ex1)) Date: Wed May 6 13:53:39 2009 Subject: CR> Reminder: Conference call Wed.
3/12 at 4:00 Eastern Message-ID: <25C4C6009B5BD5118FF30003470BF7F509FDBF3E@xboi04.boi.hp.com> Hello, I've reviewed the draft and support rules 1 - 3 for conforming printers. I'd also like to point out that I think the character entities of XHTML-Print extend the languages supported beyond those in Unicode Basic Latin. However, I've not determined what those languages are beyond knowing they are not Russian, Greek, Hebrew, Arabic, Thai, or those of the PRC, Japan, Korea or Taiwan. Jim Bigelow Hewlett-Packard 208-396-2068 jim.bigelow@hp.com

From ElliottBradshaw at oaktech.com Thu Mar 13 15:41:27 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> PWG> PWG IEEE-ISTO number for Proposed XHTML/Print standard Message-ID: Harry, Apropos of this, I wanted to let you know the latest ideas for Character Repertoires. As decided at Maui, we plan to create a standards track document that can be referenced by the semantic model. This will describe a SM element that is used to advertise the repertoires supported by a device.
We will, at some future point, want to assign a PWG number to this. I will do my best to follow the existing process, then cut over to the new one when it is official. One problem is that we don't have a formal chartered CR group. Since this standard may be our entire work, I don't know that we need to go through chartering. Options are:
-do a CR charter
-create this document under some other group
-some sort of "individual submission" scheme
Should we discuss this at the plenary? E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

----- Forwarded by Elliott Bradshaw/oaktech/us on 03/13/2003 03:36 PM -----
"Hastings, Tom N" xerox.com> cc: pwg@pwg.org Sent by: owner-pwg@pwg.org Subject: PWG> PWG IEEE-ISTO number for Proposed XHTML/Print standard 03/13/2003 03:22 PM

Harry, Per the discussion today at the SM telecon on PWG process about standards numbers and what to do about allocating a PWG number for the Proposed PWG XHTML/Print standard as requested by Don for the W3C. In order to give Don a PWG number for the XHTML/Print Proposed PWG Standard, the next series of numbers not yet used is 5102.n. Currently, Proposed PWG standards have the following numbers:
5100.1, 5100.2, 5100.3, 5100.4 ... for IPP
5101.1 for the Media Standardized Names
So how about 5102.1 for XHTML/Print? If there are several documents, 5102.1 and 5102.2.

ISSUE: How to number future standards? We can decide later how to allocate numbers for:
PWG Semantic Model
Print Services Interface
IPPFAX
PDF/is
etc.
Is the 5102 series for document formats, so that PDF/is would go in that series? Should IPPFAX go in its own series, or should it be in the IPP 5100.n series? Should PWG Semantic Model be in its own series? Should PSI be in its own series? Or is there some common theme that would help put some of these in the same series?

ISSUE: A separate issue is what happens when the Proposed/Candidate Standard reaches Standard?
Does it get a new number or use the same number? If a new number, could it be derived by some algorithm from its original number, such as adding 50? So 5150.2 would be the Standard version of Proposed standard 5100.2. Tom

From harryl at us.ibm.com Thu Mar 13 16:50:40 2003 From: harryl at us.ibm.com (Harry Lewis) Date: Wed May 6 13:53:39 2009 Subject: CR> PWG> PWG IEEE-ISTO number for Proposed XHTML/Print standard In-Reply-To: Message-ID: Yes, I will (obviously) need to carve a large time slot in the plenary for said process discussions! ---------------------------------------------- Harry Lewis IBM Printing Systems ---------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.pwg.org/archives/cr/attachments/20030313/1d0ce8d1/attachment.html

From imcdonald at sharplabs.com Sat Mar 22 18:50:10 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:39 2009 Subject: CR> RE: Value matching in CR Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF3F@mailsrvnt02.enet.sharplabs.com>

Hi Elliott,

All existing UNIX implementations of POSIX locales do locale name matching (language/charset concatenations) based on the rules at (1) below. But POSIX itself does not formalize this matching rule (anywhere I've been able to find so far).

(1) Only for purposes of comparing two character repertoire names, Printers (or Clients) MUST:
    (a) convert all letters to lowercase;
    (b) remove all hyphens, underscores, and periods; and
    (c) truncate at the colon (the year of standard version separator), dropping the colon and any trailing date info

Although the character set with the common alias "Latin 1" has been registered with a 'Name:' of "ISO_8859-1:1987" in the IANA Charset Registry, it is also VERY commonly referred to by existing software as "iso8859-1" or "iso-8859-1" or "iso_8859.1" (notice the typical misuse of periods and inconsistent presence of hyphen after "iso"). It is highly desirable that IPP/PSI Printers/Clients behave like Web search engines and accept all approximate matches as equal.

(2) For purposes of displaying supported character repertoires in the future "repertoire-supported" Printer object attribute, Printers MUST:
    (a) use a 'namespace' prefix from the PWG CR standard (such as "unihan") in all lowercase, followed by a hyphen;
    (b) use the best practice name of the base charset - for the "iana" prefix, this MUST be the registered 'Name:' value (complete with the year of standard suffix after a colon) and MUST NOT be any registered 'Alias:' value. However, this value MUST be normalized to lowercase, consistent with the existing 'charset-supported' Printer attribute semantics.
And any embedded underscores MUST be changed to hyphens for consistency.

I'd like to say it's OK to retain the colon/date info for the comparisons, but it's really not safe, practically speaking. Note that the existing "charset-supported" attribute says that Printers MUST use the 'Name:' value and MUST NOT use any of the 'Alias:' values from the IANA Charset Registry.

An interesting sidelight: The Printer MIB (RFC 1759) uses the enum tags that are 'Alias:' values beginning with "cs" (and containing NO punctuation characters at all, as recommended by SMIv2 for MIBs). When the Printer MIB is "visible" through the future PWG WBMM interface (and the new Printer Device in the PWG Semantic Model), we'll be faced with another interesting name collision. Sigh...

Cheers, - Ira McDonald High North Inc -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Friday, March 21, 2003 11:50 AM To: McDonald, Ira Subject: Value matching in CR Hi Ira, I've been fiddling with the rules for matching CR values...in the last version I said that hyphens and underscores would be dropped before comparison. This may be a bit drastic...what if we say that a hyphen matches an underscore? Also, I think you said there was some reference we could use on the subject. True? Thanks, E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

From ElliottBradshaw at oaktech.com Mon Mar 24 16:38:05 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> New CR document: Standard for Character Repertoire Interoperability Message-ID: Folks, I have placed an updated version of the Character Repertoires document at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030317.html This is a standards track document intended to serve as a reference for the Semantic Model.
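Returning briefly to comparison rule (1) in Ira McDonald's 22 March message above, the rule can be sketched as follows. This is an illustration only, not text from the thread: the function names are hypothetical, and it assumes the year-of-standard separator is the colon seen in "ISO_8859-1:1987".

```python
import re


def comparison_key(name: str) -> str:
    """Build a key used ONLY for comparing two repertoire names (rule 1)."""
    base = name.split(":", 1)[0]       # (c) drop the separator and trailing date info
    base = base.lower()                # (a) convert all letters to lowercase
    return re.sub(r"[-_.]", "", base)  # (b) remove hyphens, underscores, periods


def names_match(a: str, b: str) -> bool:
    """True when two names are equal under the approximate-match rule."""
    return comparison_key(a) == comparison_key(b)


if __name__ == "__main__":
    registered = "ISO_8859-1:1987"
    for variant in ("iso8859-1", "iso-8859-1", "iso_8859.1"):
        print(variant, names_match(registered, variant))
```

The key is meant only for comparison; under rule (2), the displayed value would still be the registered name in normalized form.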
In addition to Yet Another Name, highlights of recent changes include:
-Changed the title to remove "Preferred"
-Marked some sections as Informative
-Formatting cleanup; addition of copyright notice, acknowledgements, etc.
-Clarification in the Abstract and Introduction of goals and non-goals
-More information about how this document relates to the Semantic Model
-Changed the details of syntax for repertoire names
-More information about rules for matching repertoire names
-Clarified the wording regarding font sensitivity
-Confirmed use of Unicode code charts for basic non-Asian repertoires
-Changed from the notion of "Preferred Repertoire" to "Basic Repertoire"; this emphasizes that the printer is free to advertise additional repertoires
-Included Latin-1 Supplement and Latin Extended-A as Basic Repertoires
-Added requirement to support the euro character
-References

Open issues (that I know of) are highlighted in the document. Based on recent meetings I think we are approaching consensus on a number of issues. Some specific things I want to work on next week:
-making this suitable for use by SM
-applying the appropriate PWG process: name, number, approval, etc.

Depending on how our discussion goes we may be able to move to Last Call in the fairly near future. Please send comments ASAP, esp. if you will not be in DC. Thanks, Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

From ElliottBradshaw at oaktech.com Fri Mar 28 08:46:37 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> Character Repertoire conference call in lieu of face-to-face Message-ID: Harry and ISTO have set up phone bridges for next week. The CR session will be Monday (3/31) at 12:30-2:00 Pacific, aka 3:30-5:00 Eastern.
To take part: Dial-In #: +1 719-457-0335 Participant Password: 400908# Our main agenda is to turn pages and comment on: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030317.html Pete Zehler has added a reference to this in the Semantic Model. His comment: The latest version of the SM now includes Character Repertoire. I have posted a preliminary version linked to the SM web site (or directly at ftp://ftp.pwg.org/pub/pwg/Semantic-Model/PWG-Semantic-Model-Latest.pdf) Take a look at three locations in the document and let me know if it's OK. The locations are Figure 5, Table 6 and the reference in section 11. (All sorted alphabetically) I know the reference will need to be updated as your document progresses. Near the end of Monday's meeting I plan to ask the group what we need to address to prepare for Last Call on this. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

From ElliottBradshaw at oaktech.com Mon Mar 31 15:30:31 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> RE: PWG> Character Repertoire conference call in lieu of face-to- face Message-ID: In case you didn't see this. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

----- Forwarded by Elliott Bradshaw/oaktech/us on 03/31/2003 03:31 PM -----
"BERKEMA,ALAN C (HP-Roseville,ex1)" hp.com> To: "'ElliottBradshaw@oaktech.com'", cr@pwg.org cc: pwg@pwg.org Subject: Character Repertoire conference call in lieu of face-to-face 03/31/2003 11:01 AM

Alan Berkema, You have successfully scheduled your meeting using Webex.
------------------------- TO START THIS MEETING -------------------------
On 3/31/2003, shortly before 12:30PM (GMT -08:00) Pacific Time, USA & Canada, click this URL: https://hp.webex.com/webex/e.php?AT=MO On the My Meetings page, click Start Now for this meeting.
------------------------- FIRST TIME USERS -------------------------
For fully interactive meetings, including the ability to present your documents and applications, a one-time setup takes less than 10 minutes. Click this URL to set up now: https://hp.webex.com/join/ Then click New User.
------------------------- MEETING SUMMARY -------------------------
Name: CR
Date: 3/31/2003
Time: 12:30PM, (GMT -08:00) Pacific Time, USA & Canada
Meeting Number: 28174637
Meeting Password: mynewcr
Teleconference: None.
Agenda:
Host Key: 348053 used to re-assign host privilege.
Host: Alan Berkema 1(916)7855605 mailto:alan_berkema@hp.com
http://www.webex.com We've got to start meeting like this(TM)

From ElliottBradshaw at oaktech.com Thu Apr 3 10:23:51 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> Minutes, next teleconf. Message-ID: Minutes for this week's meeting are posted at: http://www.pwg.org/cr/CRMinutes-March-2003.html Comments on the reflector are welcome. I would like to schedule our next teleconference. How do people feel about: Wed. 4/9 3:00 Eastern or Wed. 4/16 3:00 Eastern Our main discussion topic is the idea of moving some of the material from the normative spec into a Best Practices. Current discussion on this is summarized in the minutes. Lemme know, E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534

From imcdonald at sharplabs.com Mon Apr 14 17:47:59 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:39 2009 Subject: CR> Available character repertoires/fonts on various OS Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF7C@mailsrvnt02.enet.sharplabs.com> Hi folks, Richard Ishida (chair of W3C Internationalization GEO project) just asked a question about what fonts are usually installed on various operating system platforms (Windows/XP, Linux, etc.) and therefore available for use in HTML documents (on the W3C Internationalization mailing list). The best response so far is to look at Markus Kuhn's info at: http://www.cl.cam.ac.uk/~mgk25/unicode.html#fonts This discussion will probably be highly relevant for the PWG's Character Repertoires WG and for FSG's Driver and Renderer WG.

From ElliottBradshaw at oaktech.com Thu May 1 16:18:44 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> Next teleconference Message-ID: I plan to hold the next CR teleconference on Wed.
May 7 at 3 EDT / 12 PDT. (Limited to 1 hour) Our agenda is to discuss splitting out a Best Practices from the spec (as described in the posted minutes), and to work on a Charter for the group. If anyone wishing to attend has a conflict please let me know ASAP. E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From ElliottBradshaw at oaktech.com Mon May 5 14:34:59 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:39 2009 Subject: CR> CR conference call info. Message-ID: The next CR call is this Wednesday. Time: 3:00 EDT / 12:00 PDT Wed. 5/7 Dial in #: 888 205-5513 or 719 955-0562 Participant passcode: 176310 Agenda: -Best Practices, as described in the March minutes -New draft Charter for the group Both of these documents are posted at www.pwg.org/cr/index.htm. See you then, Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From imcdonald at sharplabs.com Sun May 18 13:31:24 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:39 2009 Subject: CR> Draft of IPP "repertoire-supported" Printer attribute Message-ID: <116DB56CD7DED511BC7800508B2CA53735CFE1@mailsrvnt02.enet.sharplabs.com> [With apologies for cross-posting to IPP and Character Repertoires mailing lists.] Background - the IEEE/ISTO PWG Character Repertoires standard (a standard for NAMES, not a standard requiring support of particular character repertoires) is nearly complete and is expected to be in PWG 'last call' during the June 2003 face-to-face PWG meeting. 
The latest CR working draft is at:
ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030317.html

------------------------------------------------------------------------

Hi folks, Sunday (18 May 2003)

Below is a draft version of the IPP "repertoire-supported" attribute, for inclusion in Appendix B 'Bindings to IPP' of the next working draft of the PWG Character Repertoires standard.

First, some background. When I started to write up this attribute, I realized that our (currently proposed) syntax for CR labels now uses characters that are not allowed in the IPP "keyword" datatype.

We _could_ add a new datatype (similar to "charset") to IPP called "repertoire". Tom Hastings has convinced me that a new "repertoire" datatype is a _very_ bad idea. Most importantly, it would break all existing IPP parsers.

Instead, Tom and I agree that we should alter our CR labels to achieve strict conformance to the IPP "keyword" syntax. Then IANA can register our small set of well-known CR/1.0 labels in the IANA IPP registry, along with the new IPP "repertoire-supported" attribute itself.

Cheers,
- Ira McDonald
  High North Inc

ISSUE: The Unihan names (based on the source legacy CJK charset) are _not_ disjoint (i.e., they DO overlap). Should we abandon their use in favor of IANA Charset Registry names? What value do these Unihan names add? (hint - read the attribute description below before commenting)

Describing the Unicode HAN character assignments based on Unicode code chart titles (from http://www.unicode.org/charts/) _does_ provide unique non-overlapping labels (e.g., 'unicode_cjk-radicals-supplement' which is the title for the Unicode character block starting at 'U+2E80').
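As a rough illustration of how a label such as 'unicode_cjk-radicals-supplement' could be derived from a code chart title, here is a minimal Python sketch. It assumes the two Mapping Rules and the ABNF given later in this message (lowercase the name, turn every other non-keyword character into a hyphen, prepend the namespace prefix). The names `to_label`, `is_valid_label`, `PREFIXES` and `LABEL_RE` are hypothetical helpers made up for this example; they are not part of the draft or of any IPP library.

```python
import re

# Namespace prefixes listed in the draft ABNF (rep-prefix).
PREFIXES = ("unicode", "unihan", "iana", "vendor")

# repertoire = rep-prefix "_" rep-name, where rep-name starts with a
# lowercase letter and continues with lowercase letters, digits, "-", ".", "_".
LABEL_RE = re.compile(r"(?:unicode|unihan|iana|vendor)_[a-z][a-z0-9._-]*")


def to_label(prefix: str, source_name: str) -> str:
    """Build a candidate repertoire label from a source-standard name."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown repertoire prefix: {prefix!r}")
    # Mapping Rule 1: uppercase ASCII letters become lowercase.
    lowered = "".join(c.lower() if "A" <= c <= "Z" else c for c in source_name)
    # Mapping Rule 2: any remaining non-keyword character (space included)
    # becomes a hyphen.
    mapped = re.sub(r"[^a-z0-9._-]", "-", lowered)
    return f"{prefix}_{mapped}"


def is_valid_label(label: str) -> bool:
    """Check a label against the draft ABNF as transcribed in LABEL_RE."""
    return LABEL_RE.fullmatch(label) is not None


if __name__ == "__main__":
    for title in ("CJK Radicals Supplement", "Basic Latin"):
        label = to_label("unicode", title)
        print(title, "->", label, is_valid_label(label))
```

Note that the sketch only checks syntax. Whether a given label actually names a registered repertoire would still have to be looked up in the relevant registry.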
------------------------------------------------------------------------

repertoire-supported (1setOf (keyword | name))

This REQUIRED IPP Printer Description attribute identifies some or all of the character repertoires that the IPP Printer object and contained IPP Job objects support for rendering of document data content.

At least the value 'unicode_basic-latin' MUST always be present, since conforming IPP Printers MUST support at least the character repertoire defined in the Unicode/4.0 'Basic Latin' code chart (and character block).

A character repertoire is defined as a named subset of the characters defined in a given character set standard (e.g., Unicode/4.0) that are supported for output rendering of document data.

The character set of the document data (e.g., the value of "document-charset" in the IPP Document object) constrains the relevant character repertoires (e.g., since ISO 8859-1 does not assign a codepoint to GREEK TONOS U+0384, that character _cannot_ be represented in the ISO 8859-1 character set).

Character repertoires of legacy character sets (e.g., ISO 8859-1 and ISO 8859-2) often overlap. However, character repertoires identified by the Unicode/4.0 code chart titles do _not_ overlap (i.e., they are disjoint). Therefore, a conforming IPP Printer SHOULD advertise "repertoire-supported" values based on the Unicode/4.0 code chart titles, to avoid ambiguity.

The ABNF [RFC2234] for legal values of "repertoire-supported" is:

    repertoire = rep-prefix "_" rep-name

    rep-prefix = "unicode" /    ; from Code Chart titles
                                ; of Unicode/4.0 char database
                 "unihan" /     ; from Code Chart titles
                                ; of Unicode/4.0 Unihan database
                 "iana" /       ; from Name or Alias fields in
                                ; IANA Charset Registry
                 "vendor"       ; from vendor-specific
                                ; repertoire names

    rep-name   = rep-alpha *(rep-char)

    rep-char   = rep-alpha / rep-digit /   ; alphanumeric or
                 "-" / "." / "_"           ; limited punctuation chars

    rep-alpha  = %x61-7A        ; lowercase a-z

    rep-digit  = %x30-39        ; decimal 0-9

Mapping Rule 1: If a source standard repertoire name (e.g., a value in the IANA Charset Registry [IANA-Charsets]) contains any uppercase alpha characters, those characters MUST be mapped to the IPP 'keyword' syntax by converting each of them to their corresponding lowercase alpha characters.

Mapping Rule 2: If a source standard repertoire name (e.g., a value in the IANA Charset Registry [IANA-Charsets]) contains any other non-keyword characters, those characters MUST be mapped to the IPP 'keyword' syntax by converting each of them (including space) to a hyphen "-" character.

From hastings at cp10.es.xerox.com Mon May 19 15:28:41 2003
From: hastings at cp10.es.xerox.com (Hastings, Tom N)
Date: Wed May 6 13:53:39 2009
Subject: CR> Draft of IPP "repertoire-supported" Printer attribute
Message-ID:

Minor comment: Why allow the "_" in the rep-char (repertoire character names), since space characters are to be mapped to "-", not "_"? The advantage of not allowing "_" is that a future field could be added using "_" as a field separator, but not if "_" could be in rep-char.

    rep-char   = rep-alpha / rep-digit /   ; alphanumeric or
                 "-" / "." / "_"           ; limited punctuation chars

Tom

From imcdonald at sharplabs.com Mon May 19 15:38:55 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> Draft of IPP "repertoire-supported" Printer attribute
Message-ID: <116DB56CD7DED511BC7800508B2CA53735CFE9@mailsrvnt02.enet.sharplabs.com>

Hi Tom,

Because "_" has already been used in the Name fields in the IANA Charset Registry, and we don't want to overly alter those in the 'iana_' namespace for repertoires.

Cheers,
- Ira

PS - The namespace prefix MUST use the only 'field separator' permanently.

PPS - Folks should ignore this discussion for a day or two. Elliott and I are working offline to refine the proposal and figure out the impacts on the main CR spec - thanks!

From imcdonald at sharplabs.com Fri May 30 19:09:05 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> PWG SM bindings for new RepertoireSupported element
Message-ID: <116DB56CD7DED511BC7800508B2CA53735D00F@mailsrvnt02.enet.sharplabs.com>

Hi Elliott, Friday (30 May 2003)

Per our conversation this afternoon, below is the complete verbatim text of Appendix C for the CR spec.

Cheers,
- Ira McDonald
  High North Inc

------------------------------------------------------------------------

C. Bindings to the PWG Semantic Model (Normative)

To add the RepertoireSupported element to the PWG Semantic Model, the following XML Schema fragments SHALL be added to the specified files.
Add the following simple type to the file 'PwgWellKnownValues.xsd':

Add the following element reference to the file 'PrinterDescription.xsd' in the complex type "PrinterDescription":

Add the following simple element to the file 'PrinterDescription.xsd' after the complex type "PrinterDescription":

RepertoireWKV KeywordNsExtensionPattern

From imcdonald at sharplabs.com Sun Jun 1 19:45:54 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> RE: PWG-ANNOUNCE> Character Repertoires Charter and Last Call
Message-ID: <116DB56CD7DED511BC7800508B2CA53735D012@mailsrvnt02.enet.sharplabs.com>

[I took pwg-announce off the cc: in this reply - added CR list]

Hi,

I agree with your suggestion that we should be using 'charset' (in the IETF/IANA sense) for a 'coded character set' (such as Unicode 4.0) in a 'character encoding scheme' (such as UTF-8). That would also be consistent with the usage in IPP/1.1 (RFC 2911), where the base datatype 'charset' is defined (on page 86) for the IPP Printer attributes 'charset-configured' and 'charset-supported'.

Also, the CR charter and eventual standard should have a reference to the W3C Character Model.

Cheers,
- Ira McDonald

-----Original Message-----
From: Jun Fujisawa [mailto:fujisawa.jun@canon.co.jp]
Sent: Saturday, May 31, 2003 7:12 PM
To: ElliottBradshaw@oaktech.com
Cc: pwg-announce@pwg.org
Subject: Re: PWG-ANNOUNCE> Character Repertoires Charter and Last Call

Hello Elliott,

At 5:20 PM -0400 03.5.29, ElliottBradshaw@oaktech.com wrote:
>A Charter has been reviewed within the CR group and there are no open
>issues.
>
>It is available online at
>ftp://ftp.pwg.org/pub/pwg/cr/charter/ch-cr10-20030507.html.
>
>So today I begin a 10-day Last Call for comments on this document, prior to
>a formal vote by the PWG.

I feel a little uncomfortable with the following paragraph in the Charter.

>In Unicode and W3C specifications, the term "character set" usually
>refers to a method of encoding a (possibly very large) set of characters,
>e.g. UTF-8. This tells how to encode a given character if it is present,
>but doesn't define which characters in that space are actually in use.

In the Character Model for the World Wide Web specification, the W3C clearly discourages the use of the term "character set" to refer to a method of encoding.

>[S] Specifications SHOULD avoid using the terms 'character set' and
>'charset' to refer to a character encoding, except when the latter is used
>to refer to the MIME charset parameter or its IANA-registered values.
>The terms 'character encoding', 'character encoding form' or 'character
>encoding scheme' are RECOMMENDED.

I suggest changing the wording to something like the following.

In Unicode and W3C specifications, the term "character set" usually refers to a (possibly very large) set of characters, e.g. ISO/IEC 10646. The term "character set", however, can be confusing in some cases, since the similar term "charset" is used as a MIME parameter, which refers to the combination of "coded character set" and "character encoding scheme", not just the former.

--
Jun Fujisawa

From imcdonald at sharplabs.com Thu Jun 5 11:57:35 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> Charset terminology
Message-ID: <116DB56CD7DED511BC7800508B2CA53735D02A@mailsrvnt02.enet.sharplabs.com>

Hi Elliott,

Below are terminology section updates for the CR spec for "charset".

Cheers,
- Ira McDonald
  High North Inc

----------------------------------------

Charset Terminology

The following terms are used in this specification, exactly as defined in section 1 'Definitions and Notation' of the IANA Charset Registration Procedures [RFC2978]: "character", "charset", "coded character set (CCS)", and "character encoding scheme (CES)".
charset: A coded character set (e.g., ISO/IEC 10646), optionally combined with a character encoding scheme (e.g., UTF-8).

From ElliottBradshaw at oaktech.com Tue Jun 10 17:28:28 2003
From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR documents and agenda
Message-ID:

Following our discussion at the last CR conference call, I have split out Best Practices and posted two documents:

ftp://ftp.pwg.org/pub/pwg/cr/wd/wd-crrs10-20030606.html
ftp://ftp.pwg.org/pub/pwg/cr/wd/wd-crbp10-20030606.html

At our face-to-face next week I would like to discuss:

1. Go through these documents and get any feedback prior to Last Call.

2. Comments on the draft Charter. So far the only issue is that we should use the term "charset" rather than "character set", and as you will see I have already made this change in the other documents. (I assume we will also act on the charter at plenary.)

3. Future work for the CR group. Possibilities include:
   -extensions to Best Practices to improve regional coverage (e.g. what is a good set of characters--not just basic--for Korea, etc.)
   -identifying and naming fonts
   -and any others that are suggested...

Best regards,
Elliott

------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534

From elliott.bradshaw at zoran.com Tue Aug 12 13:17:22 2003
From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR update and plans for NYC
Message-ID:

Hi CR folks,

I have been remiss in posting updates from the Portland meeting, but will do so Real Soon Now.

Between now and NYC I hope to confirm the charter, which needs one modification from the Last Call. New versions of the two documents, the spec for RepertoireSupported and the best practices, will be posted. I will ask for comments, and perhaps Last Call, on these prior to NYC.
There has also been discussion of another project that would define more complete coverage for regional products. E.g. a "good" set of repertoires for mainland China. Whether this would be packaged as a spec or as Best Practices is TBD. So, in summary: Items prior to NYC: -formal vote on charter -revisions, discussion, and maybe last call on two existing documents Agenda for NYC: -items from last call of two existing documents (this should be the last f2f for these) -overview and brainstorming about "good" regional coverage -------------------------------------------------------------------------------- Elliott Bradshaw Director, Software Engineering Zoran Imaging Group (formerly Oak Technology Imaging Group) 781 638-7534 From elliott.bradshaw at zoran.com Fri Aug 15 15:00:48 2003 From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com) Date: Wed May 6 13:53:39 2009 Subject: CR> CR Charter Message-ID: In June we completed Last Call for the CR working group charter. During the Last Call period, one issue was raised: 1. The term "character set" is too vague. We should use the more technically precise term "charset". I have posted a revised charter that addresses this issue. I adapted the section in the main Repertoire Supported document which defines charset and character repertoire, and used it here. This terminology was reviewed without comment at the Portland meeting. The result is posted at: ftp://ftp.pwg.org/pub/pwg/cr/charter/ch-cr10-20030813.html I therefore believe this charter is ready for a formal vote. I will wait a few days for anyone to voice an objection, then start a voting period next week. E. P.S. 
Minutes from Portland are posted at http://www.pwg.org/cr/CRMinutes-June-2003.html

--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Group (formerly Oak Technology Imaging Group)
781 638-7534

From imcdonald at sharplabs.com Sun Aug 24 16:51:47 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> FW: News: CharMod interim publication; Unicode Tech Note #10, Indic Scripts
Message-ID: <116DB56CD7DED511BC7800508B2CA537B00179@mailsrvnt02.enet.sharplabs.com>

Hi folks,

Please note the new working draft of W3C "Character Model" below. And (how timely, Elliott) the Unicode Technical Note (TN10) "Introduction to Indic Scripts" by the W3C's Richard Ishida.

Cheers,
- Ira McDonald
  High North Inc

-----Original Message-----
From: Richard Ishida [mailto:ishida@w3.org]
Sent: Sunday, August 24, 2003 10:18 AM
To: www-international@w3.org
Subject: News: CharMod interim publication; Unicode Tech Note #10, Indic Scripts

FYI

# 24 Aug 2003
"Character Model for the World Wide Web 1.0" Interim Working Draft Published

The Internationalization Working Group has released an interim Working Draft of the Character Model for the World Wide Web 1.0. The document addresses character encoding identification, early uniform normalization, string identity matching, string indexing, and URI conventions, building on the Universal Character Set defined by Unicode and ISO/IEC 10646. Read about the W3C Internationalization Activity.
http://www.w3.org/TR/2003/WD-charmod-20030822/

# 15 Aug 2003
Unicode Technical Note #10, "An Introduction to Indic Scripts" Published

A paper by Richard Ishida called "An Introduction to Indic Scripts" has been published as Unicode Technical Note #10. This paper provides an introduction to the major Indic scripts used on the Indian mainland.
Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu.
http://www.unicode.org/notes/tn10/

============
Richard Ishida
W3C
contact info: http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://www.w3.org/International/geo/
See the W3C Internationalization FAQ page
http://www.w3.org/International/questions.html

From imcdonald at sharplabs.com Fri Sep 12 15:21:45 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> FW: New FAQ: Script direction and languages
Message-ID: <116DB56CD7DED511BC7800508B2CA537B001AA@mailsrvnt02.enet.sharplabs.com>

-----Original Message-----
From: Richard Ishida [mailto:ishida@w3.org]
Sent: Friday, September 12, 2003 3:08 PM
To: www-international@w3.org
Subject: New FAQ: Script direction and languages

The latest FAQ published by the GEO task force is:

What directions are commonly localized languages written in?

Find it at: http://www.w3.org/International/questions/qa-scripts.html

You can find all the questions and answers, plus information about how to contribute, at http://www.w3.org/International/questions.html

We hope you find this a useful resource.

============
Richard Ishida
W3C
contact info: http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://www.w3.org/International/geo/
See the W3C Internationalization FAQ page
http://www.w3.org/International/questions.html

From jim.bigelow at hp.com Thu Sep 18 20:01:21 2003
From: jim.bigelow at hp.com (BIGELOW,JIM (HP-Boise,ex1))
Date: Wed May 6 13:53:39 2009
Subject: CR> W3C Character Model and Early Uniform Normalization
Message-ID: <020A3CF87FB5AC47AA67966B33845755057FB68B@xboi22.boise.itc.hp.com>

Hello,

I've been reading the W3C Working Draft, Character Model for the World Wide Web [1], which deals with the requirements on internet applications such as producers and consumers of XHTML-Print. This report [1] indicates that XHTML-Print, as a derivative of XHTML, is bound by it. Therefore, by extension, all XHTML-Print producing and consuming applications are bound by this report, although this is never explicitly stated in any version of the XHTML-Print specification [2,3].

One of the interesting parts of [1] is the requirement that applications that produce XHTML-Print should produce fully-normalized text [4], meaning, among other things, that it is in Unicode Normalization Form C [5], which favors the canonical composite forms of Unicode characters. From the printer's perspective, as a receiver of XHTML-Print documents, this makes its job easier since it can always assume that text is fully-normalized and it doesn't have to do so itself.

My question to you is, do you think that the XHTML-Print specification should be amended to cite the requirement that a conforming XHTML-Print document be fully-normalized? Furthermore, should a printer be required to check an XHTML-Print document to see that it is fully-normalized or should it assume so? Lastly, should a printer normalize text that is not fully-normalized or discard it?

Jim
--
Jim Bigelow, Editor: XHTML-Print & CSS Print Profile
Member: W3C HTML and CSS Working Groups
Hewlett-Packard
208-396-2068
jim.bigelow@hp.com

[1] http://www.w3.org/TR/charmod/
[2] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html
[3] http://www.w3.org/TR/xhtml-print/
[4] http://www.w3.org/TR/2003/WD-charmod-20030822/#sec-FullyNormalized
[5] http://www.unicode.org/unicode/reports/tr15/#Specification

From elliott.bradshaw at zoran.com Fri Sep 19 10:23:13 2003
From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> W3C Character Model and Early Uniform Normalization
In-Reply-To: <020A3CF87FB5AC47AA67966B33845755057FB68B@xboi22.boise.itc.hp.com>
Message-ID:

What are the XHTML-Print operations that are affected by normalization?
This discussion is useful for string processing (match, substring, sort) but I don't see how that affects printing. One possible area is CSS class names; are they restricted to ASCII?

Also, I don't see how a new report can change the definition of an existing spec (XHTML). Isn't this a separate set of rules that might be folded into future revisions?

I would rather see a use-case that makes sense for XHTML-Print before adding this in.

E.

P.S. Does it have any effect on current CR documents? I don't think so. There is no discussion of combining in there at all.

--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Group (formerly Oak Technology Imaging Group)
781 638-7534

From imcdonald at sharplabs.com Fri Sep 19 13:45:25 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> W3C Character Model and Early Uniform Normalization
Message-ID: <116DB56CD7DED511BC7800508B2CA537B001B8@mailsrvnt02.enet.sharplabs.com>

Hi,

My two cents:

(1) [answering Elliott] Unicode normalization has no impact at all on the CR specs - - they merely refer to character repertoires (often including both composed and uncomposed characters) which are defined (in _all_ cases) by some other standards body (Unicode, ISO, IANA, etc.).

(2) [answering Jim] No - a printer should _never_ throw away any document data that happens not to be normalized (it is actually very difficult to determine if that data is already in Unicode NFC or NFKC, except by doing the whole normalization and then doing binary compare of the results with original).

(3) [answering Jim] No - a printer should _never_ trust the sender/generator to have properly normalized Unicode data.
(4) [my own comment]
Early Uniform Normalization is important and useful for _very_ small pieces of data and _narrow_ fields of application (such as IETF's I18N Domain Names standards). The day will never come that receivers need not check for (or simply perform) normalization, if needed. Some rendering algorithms happen to require that Unicode data be pre-normalized, but that's an implementation nit.

Cheers,
- Ira McDonald
  High North Inc

From jim.bigelow at hp.com Mon Sep 22 18:37:33 2003
From: jim.bigelow at hp.com (BIGELOW,JIM (HP-Boise,ex1))
Date: Wed May 6 13:53:39 2009
Subject: CR> W3C Character Model and Early Uniform Normalization
Message-ID: <020A3CF87FB5AC47AA67966B3384575505A0A44A@xboi22.boise.itc.hp.com>

Elliott wrote:
> What are the XHTML-Print operations that are affected by normalization? This discussion is useful for string processing (match, substring, sort) but I don't see how that affects printing.
> One possible area is CSS class names; are they restricted to ASCII?

The CSS 2 specification for identifiers is in Section 4.1.3 [1] and states that CSS class names are not restricted to ASCII. So, if a class name is written with precomposed characters in one place and any one of the other equivalent sequences in another place, then the two instances would only match if they were normalized, preferably to Normalization Form C. The same holds true for id attribute values.

> Also, I don't see how a new report can change the definition of an existing spec (XHTML). Isn't this a separate set of rules that might be folded into future revisions?

I think that the report [2] has been around for a while and I've just now become aware of it. I think it's an omission that the XHTML-Print spec doesn't cite [2] as a normative reference. This allows naïve implementations to fail in the situation noted above. This could be addressed by adding a normative reference.

Jim

[1] http://www.w3.org/TR/REC-CSS2/syndata.html#q4
[2] http://www.w3.org/TR/charmod/

From jim.bigelow at hp.com Mon Sep 22 18:51:44 2003
From: jim.bigelow at hp.com (BIGELOW,JIM (HP-Boise,ex1))
Date: Wed May 6 13:53:39 2009
Subject: CR> W3C Character Model and Early Uniform Normalization
Message-ID: <020A3CF87FB5AC47AA67966B3384575505A0A46C@xboi22.boise.itc.hp.com>

Ira wrote:
> (2) [answering Jim]
> No - a printer should _never_ throw away any document data that happens not to be normalized ...

I agree. However, the XHTML-Print specs [1, 2, 3] state in their Printer Conformance sections that a printer may "flush or otherwise reject a non-conforming XHTML-Print document." This is the source of my worry that a printer could reject a document that is not normalized.

> (3) [answering Jim]
> No - a printer should _never_ trust the sender/generator to have properly normalized Unicode data.
If a very low cost printer assumed that an XHTML-Print document's content is normalized and it is not, the very worst that could happen is that word breaks occur in the wrong place, e.g., between a letter and its non-spacing mark, or class/id selectors don't match the value of the class/id attribute -- causing the misapplication of style sheet rules.

I think a printer should normalize and therefore correctly handle combining characters. I'm just wondering if other printer people think such a normalization should be mandated for all printers.

Jim

[1] ftp://ftp.pwg.org/pub/pwg/xhtml-print/drafts/xhtml-print-draft-095.pdf
[2] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html
[3]

From elliott.bradshaw at zoran.com Tue Sep 23 11:16:52 2003
From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> W3C Character Model and Early Uniform Normalization
In-Reply-To: <020A3CF87FB5AC47AA67966B3384575505A0A46C@xboi22.boise.itc.hp.com>
Message-ID:

I don't mind if we require that an XHTML-Print printer normalizes its input.

We probably have to stipulate what this means for CR. If a printer advertises a repertoire that supports two combinable characters, is it implicitly saying that it also supports the combination of the two? I am inclined to put this point in Best Practices and leave it out of the normative spec.

(I'll track this as a Last Call issue for CR.)

--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Group (formerly Oak Technology Imaging Group)
781 638-7534

From imcdonald at sharplabs.com Wed Sep 24 10:21:49 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> W3C Character Model and Early Uniform Normalization
Message-ID: <116DB56CD7DED511BC7800508B2CA537B001C3@mailsrvnt02.enet.sharplabs.com>

Hi Jim,

To reduce the implementation burden, I suggest that XHTML-Print state that a conforming Printer SHOULD normalize the document data to NFC (citing UAX-15 as the authoritative source).

Since W3C Charmod is still a working draft, XHTML-Print should NOT have a Normative reference to W3C Charmod (which would prevent publication of XHTML-Print as PWG Candidate Standard).

Because normalization is a fairly costly activity on large volumes of data (I wrote the normalization library for the forthcoming CUPS 1.2 release), I suggest that the XHTML-Print conformance be SHOULD rather than MUST.
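The two points raised in this thread (checking whether text is already in NFC costs about as much as normalizing it, and class/id names only match reliably after normalization) can be illustrated with a minimal sketch. It assumes Python and its standard unicodedata module, neither of which is mentioned in the original messages; the helper names is_nfc and same_identifier are hypothetical.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Check NFC the way described above: normalize, then compare with the input."""
    return unicodedata.normalize("NFC", text) == text


def same_identifier(a: str, b: str) -> bool:
    """Compare two class or id names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # ends with U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(is_nfc(precomposed))                       # True
print(is_nfc(decomposed))                        # False
print(precomposed == decomposed)                 # False: a raw comparison misses the match
print(same_identifier(precomposed, decomposed))  # True
```

A receiver that always normalizes before matching selectors does not need to trust the sender, which is in line with the SHOULD-level suggestion above.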
Cheers,
- Ira McDonald
  High North Inc

From imcdonald at sharplabs.com Thu Sep 25 14:47:30 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> FW: New W3C FAQ: CSS character encoding declarations
Message-ID: <116DB56CD7DED511BC7800508B2CA537B001CC@mailsrvnt02.enet.sharplabs.com>

-----Original Message-----
From: Richard Ishida [mailto:ishida@w3.org]
Sent: Thursday, September 25, 2003 4:59 AM
To: www-international@w3.org
Subject: New FAQ: CSS character encoding declarations

The latest FAQ addressed by the GEO task force is:

How do I declare the character encoding inside a CSS (Cascading Style Sheets) style sheet?

Find our answer at:
http://www.w3.org/International/questions/qa-css-charset.html

You can find all the FAQs, plus information about how to contribute, at
http://www.w3.org/International/questions.html

============
Richard Ishida
W3C

contact info:
http://www.w3.org/People/Ishida/

http://www.w3.org/International/
http://www.w3.org/International/geo/

See the W3C Internationalization FAQ page
http://www.w3.org/International/questions.html

From elliott.bradshaw at zoran.com Thu Oct 2 11:33:09 2003
From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR face-to-face next Wed.
Message-ID:

Does anyone desire to join this meeting by phone? At this point I haven't done anything about it.

E.
--------------------------------------------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Zoran Imaging Division (formerly Oak Technology Imaging Group)
781 638-7534

From ElliottBradshaw at oaktech.com Fri Jan 3 14:16:03 2003
From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference and Implementor's Guide
Message-ID:

Call-in arrangements for the next CR conference call:

Time: 3:00 PM Eastern time Wed. 1/8
Dial in #: 888 205-5513 or 719 955-0562
Participant passcode: 176310

As our main topic I would like to go through the draft Implementor's Guide, which I have placed at:
ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm.

As before, my biggest challenge is finding online, normative material for the details of the Asian character sets (except Korean, which is covered in an RFC). Ira and others have provided handy pointers to summaries by others, but I'm wondering where I find the horse's mouth.

Talk to you next Wed.

-Elliott

------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534

From imcdonald at sharplabs.com Fri Jan 3 15:10:27 2003
From: imcdonald at sharplabs.com (McDonald, Ira)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference [RFCs for Asian charsets]
Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE6A@mailsrvnt02.enet.sharplabs.com>

Hi Elliott,

Some more RFC references:

RFC 1468 - ISO-2022-JP (Japanese)
RFC 1554 - ISO-2022-JP-2 (Japanese)
RFC 2237 - ISO-2022-JP-1 (Japanese)
RFC 1557 - ISO-2022-KR (Korean)
RFC 1922 - ISO-2022-CN and ISO-2022-CN-EXT (Chinese)

Each of these refers in some detail to the underlying Japanese, Korean, or Chinese national standards that are placed in planes of these ISO-2022 encodings.
Cheers,
- Ira

From fujisawa.jun at canon.co.jp Mon Jan 6 05:43:59 2003
From: fujisawa.jun at canon.co.jp (Jun Fujisawa)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference and Implementor's Guide
In-Reply-To:
References:
Message-ID:

Hello Elliott,

At 2:16 PM -0500 03.1.3, ElliottBradshaw@oaktech.com wrote:
>As our main topic I would like to go through the draft Implementor's Guide,
>which I have placed at:
>ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm.

I would like to point out that the terms "repertoire" and "character set" as defined in the Terminology section do not seem to be consistent with the usage in the W3C Character Model. For example, the use of the term "character set" is discouraged in Section 3.6.2 of Character Model for the World Wide Web 1.0.

- Character Model for the World Wide Web 1.0

>As before, my biggest challenge is finding online, normative material for
>the details of the Asian character sets (except Korean, which is covered in
>an RFC).
Unfortunately, the only normative materials for the definitions of Japanese coded character sets (CCS) are Japanese national standards.

- JIS X 0201
  Japanese Industrial Standards Committee. 7-bit and 8-bit coded character sets for information interchange, JIS X 0201:1997, Japanese Standards Association, 1997.

- JIS X 0208
  Japanese Industrial Standards Committee. 7-bit and 8-bit double byte coded KANJI sets for information interchange, JIS X 0208:1997, Japanese Standards Association, 1997.

- JIS X 0212
  Japanese Industrial Standards Committee. Code of the supplementary Japanese graphic character set for information interchange, JIS X0212:1990, Japanese Standards Association, 1990.

- JIS X 0221
  Japanese Industrial Standards Committee. Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic

Also, I suggest consulting the following W3C Note for detailed information on some Japanese character encoding schemes (CES) and their mappings to Unicode.

- XML Japanese Profile

--
Jun Fujisawa

From ElliottBradshaw at oaktech.com Wed Jan 8 11:41:27 2003
From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com)
Date: Wed May 6 13:53:39 2009
Subject: CR> CR teleconference and Implementor's Guide
Message-ID:

Hello Fujisawa-san,

Thanks for the useful information. I think we can get a lot of what we need from the Japanese Profile document.

I am not entirely satisfied by the term "repertoire", and would like to have some discussion in the group. We are looking for a term that means "named subset of Unicode characters, without regard to encoding." Bluetooth uses "repertoire" in this way. Some other ideas:

-character complement
-Unicode Subset
-CCSS (Coded Character SubSet)

I'd like proposals for the term, as well as how we will actually define it.

With regard to Shift-JIS, I now understand that there is no universal mapping from it to Unicode. And, many Japanese web pages still use Shift-JIS.
So, we may want to recommend that a Japanese-capable printer support Shift-JIS as well as UTF-8, and that a Japanese-capable client use Shift-JIS if it is available. Otherwise the client must map to Unicode, and deal with the ambiguities of the different available mappings. I wonder how strongly we should follow Microsoft's lead in this area...

------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534

From ElliottBradshaw at oaktech.com Wed Jan 8 13:33:00 2003
From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com)
Date: Wed May 6 13:53:40 2009
Subject: CR> CR teleconference and Implementor's Guide
Message-ID:

See Rod's notes for some ideas on terminology.

------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534

----- Forwarded by Elliott Bradshaw/oaktech/us on 01/08/2003 01:33 PM -----

From: "Acosta, Roderick"
Date: 01/08/2003 01:08 PM
Subject: RE: CR> CR teleconference and Implementor's Guide

Elliott,

Some suggestions from a colleague of mine.

Character set:
Unicode is the default character set for HTML and XHTML. Valid Unicode values range from hexadecimal 0 to 10FFFF (decimal 0 to 1,114,111). Any valid Unicode character is associated with a codepoint in the above specified range of scalar numbers. Unicode is an "ordered" character set because each character is represented by a unique scalar value.

Transformations or Encodings:
A Unicode scalar value can be expressed in a variety of digital forms, including UTF-8 and UTF-16. "UTF" stands for "Unicode Transformation Format". UTF-8 and UTF-16 are often called "encodings" because they represent ("encode") the full range of scalar values.

Unicode subset: What do we call it?
The Unicode character set supports a large number of characters that are derived from other legacy character sets such as ISO 8859-x and JIS X 0208.
With the exception of ISO 8859-1, all legacy characters must be mapped to their equivalent Unicode value through an algorirthmic and/or table-driven process. The ordering of characters in a legacy character set is not necessarily replicated in Unicode. What does one call a subset of Unicode values that represent a range of characters from a common, legacy character set? We would like to propose the term "character collection" because a. it does not imply any particular ordering b. it does represent a closed, enumerable set c. it is distinct from "character set" /Rod -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Wednesday, January 08, 2003 9:41 AM To: Jun Fujisawa Cc: cr@pwg.org; owner-cr@pwg.org Subject: Re: CR> CR teleconference and Implementor's Guide Hello Fujisawa-san, Thanks for the useful information. I think we can get a lot of what we need from the Japanese Profile document. I am not entirely satisfied by the term "repertoire", and would like to have some discussion in the group. We are looking for a term that means "named subset of Unicode characters, without regard to encoding." Bluetooth uses "repertoire" in this way. Some other ideas: -character complement -Unicode Subset -CCSS (Coded Character SubSet) I'd like proposals for the term, as well as how we will actually define it. With regard to Shift-JIS, I now understand that there is no universal mapping from it to Unicode. And, many Japanese web pages still use Shift-JIS. So, we may want to recommend that a Japanese-capable printer support Shift-JIS as well as UTF-8, and that a Japanese-capable client use Shift-JIS if it is available. Otherwise the client must map to Unicode, and deal with the ambiguities of the different available mappings. I wonder how strongly we should follow Microsoft's lead in this area... 
------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 Jun Fujisawa cc: cr@pwg.org Sent by: Subject: Re: CR> CR teleconference and owner-cr@pwg.org Implementor's Guide 01/06/2003 05:43 AM Hello Elliott, At 2:16 PM -0500 03.1.3, ElliottBradshaw@oaktech.com wrote: >As our main topic I would like to go through the draft Implementor's Guide, >which I have placed at: >ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm. I would like to point out that the terms "repertoire" and "character set" as defined in Terminology section does not seem to be consistent with the usage in W3C Character Model. For example, the use of therm "character set" is discouraged in Section 3.6.2 of Character Model for the World Wide Web 1.0 - Character Model for the World Wide Web 1.0 >As before, my biggest challenge is finding online, normative material for >the details of the Asian character sets (except Korean, which is covered in >an RFC). Unfortunately, the only normative materials to the definitions of Japanese coded character sets (CCS) are Japanese national standards. - JIS X 0201 Japanese Industrial Standards Committee. 7-bit and 8-bit coded character sets for information interchange, JIS X 0201:1997, Japanese Standards Association, 1997. - JIS X 0208 Japanese Industrial Standards Committee. 7-bit and 8-bit double byte coded KANJI sets for information interchange, JIS X 0208:1997, Japanese Standards Association, 1997. - JIS X 0212 Japanese Industrial Standards Committee. Code of the supplementary Japanese graphic character set for information interchange, JIS X0212:1990, Japanese Standards Association, 1990. - JIS X 0221 Japanese Industrial Standards Committee. 
Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Also, I suggest to consult the following W3C Note for the detailed information on some Japanese character encoding schemes (CES) and their mappings to Unicode. - XML Japanese Profile -- Jun Fujisawa (See attached file: att25f6a.dat) -------------- next part -------------- A non-text attachment was scrubbed... Name: att25f6a.dat Type: application/ms-tnef Size: 5022 bytes Desc: not available Url : http://www.pwg.org/archives/cr/attachments/20030108/d3768b90/att25f6a-0001.bin From imcdonald at sharplabs.com Wed Jan 8 17:39:16 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> CR teleconference and Implementor's Guide Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE77@mailsrvnt02.enet.sharplabs.com> Hi folks, Sorry I missed the telecon earlier today. I failed to note the earlier time (3pm EST rather than 5pm EST). I wrote the following definition (for CUPS documentation), drawing on POSIX.1 (ISO 9945-1) and Unicode 3.2 glossaries: Character Repertoire: (1) The complete set of characters defined in a given named character set, such as ISO 8859-1. (2) The subset of characters defined in a large character set, such as Unicode 3.2, that are needed for an exact mapping to a smaller character set, such as ISO 8859-1. For PWG CR, we could refine (2) above to fix Unicode 3.2 (or later) as the "large character set". Cheers, - Ira McDonald. -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Wednesday, January 08, 2003 10:41 AM To: Jun Fujisawa Cc: cr@pwg.org; owner-cr@pwg.org Subject: Re: CR> CR teleconference and Implementor's Guide Hello Fujisawa-san, Thanks for the useful information. I think we can get a lot of what we need from the Japanese Profile document. I am not entirely satisfied by the term "repertoire", and would like to have some discussion in the group. 
We are looking for a term that means "named subset of Unicode characters, without regard to encoding." Bluetooth uses "repertoire" in this way. Some other ideas: -character complement -Unicode Subset -CCSS (Coded Character SubSet) I'd like proposals for the term, as well as how we will actually define it. With regard to Shift-JIS, I now understand that there is no universal mapping from it to Unicode. And, many Japanese web pages still use Shift-JIS. So, we may want to recommend that a Japanese-capable printer support Shift-JIS as well as UTF-8, and that a Japanese-capable client use Shift-JIS if it is available. Otherwise the client must map to Unicode, and deal with the ambiguities of the different available mappings. I wonder how strongly we should follow Microsoft's lead in this area... ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 Jun Fujisawa cc: cr@pwg.org Sent by: Subject: Re: CR> CR teleconference and owner-cr@pwg.org Implementor's Guide 01/06/2003 05:43 AM Hello Elliott, At 2:16 PM -0500 03.1.3, ElliottBradshaw@oaktech.com wrote: >As our main topic I would like to go through the draft Implementor's Guide, >which I have placed at: >ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm. I would like to point out that the terms "repertoire" and "character set" as defined in Terminology section does not seem to be consistent with the usage in W3C Character Model. For example, the use of therm "character set" is discouraged in Section 3.6.2 of Character Model for the World Wide Web 1.0 - Character Model for the World Wide Web 1.0 >As before, my biggest challenge is finding online, normative material for >the details of the Asian character sets (except Korean, which is covered in >an RFC). Unfortunately, the only normative materials to the definitions of Japanese coded character sets (CCS) are Japanese national standards. 
- JIS X 0201 Japanese Industrial Standards Committee. 7-bit and 8-bit coded character sets for information interchange, JIS X 0201:1997, Japanese Standards Association, 1997. - JIS X 0208 Japanese Industrial Standards Committee. 7-bit and 8-bit double byte coded KANJI sets for information interchange, JIS X 0208:1997, Japanese Standards Association, 1997. - JIS X 0212 Japanese Industrial Standards Committee. Code of the supplementary Japanese graphic character set for information interchange, JIS X0212:1990, Japanese Standards Association, 1990. - JIS X 0221 Japanese Industrial Standards Committee. Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Also, I suggest to consult the following W3C Note for the detailed information on some Japanese character encoding schemes (CES) and their mappings to Unicode. - XML Japanese Profile -- Jun Fujisawa From ElliottBradshaw at oaktech.com Thu Jan 9 11:11:01 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> CR teleconference and Implementor's Guide Message-ID: Ira, I think definition #2 covers exactly what we are trying to do. Is this form in prior use? -Bluetooth BPP: yes -Unicode: I couldn't get this meaning out of the Unicode glossary -Posix: ??? At the call yesterday there was some interest in the term "character collection" as an alternative to "repertoire". Have you encounted this? Group: I am going to use Ira's definitions in the next version of the Guide. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 "McDonald, Ira" , Jun Fujisawa 01/08/2003 cc: cr@pwg.org, owner-cr@pwg.org 05:39 PM Subject: RE: CR> CR teleconference and Implementor's Guide Hi folks, Sorry I missed the telecon earlier today. I failed to note the earlier time (3pm EST rather than 5pm EST). 
I wrote the following definition (for CUPS documentation), drawing on POSIX.1 (ISO 9945-1) and Unicode 3.2 glossaries: Character Repertoire: (1) The complete set of characters defined in a given named character set, such as ISO 8859-1. (2) The subset of characters defined in a large character set, such as Unicode 3.2, that are needed for an exact mapping to a smaller character set, such as ISO 8859-1. For PWG CR, we could refine (2) above to fix Unicode 3.2 (or later) as the "large character set". Cheers, - Ira McDonald. -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Wednesday, January 08, 2003 10:41 AM To: Jun Fujisawa Cc: cr@pwg.org; owner-cr@pwg.org Subject: Re: CR> CR teleconference and Implementor's Guide Hello Fujisawa-san, Thanks for the useful information. I think we can get a lot of what we need from the Japanese Profile document. I am not entirely satisfied by the term "repertoire", and would like to have some discussion in the group. We are looking for a term that means "named subset of Unicode characters, without regard to encoding." Bluetooth uses "repertoire" in this way. Some other ideas: -character complement -Unicode Subset -CCSS (Coded Character SubSet) I'd like proposals for the term, as well as how we will actually define it. With regard to Shift-JIS, I now understand that there is no universal mapping from it to Unicode. And, many Japanese web pages still use Shift-JIS. So, we may want to recommend that a Japanese-capable printer support Shift-JIS as well as UTF-8, and that a Japanese-capable client use Shift-JIS if it is available. Otherwise the client must map to Unicode, and deal with the ambiguities of the different available mappings. I wonder how strongly we should follow Microsoft's lead in this area... 
------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 Jun Fujisawa Sent by: owner-cr@pwg.org 01/06/2003 05:43 AM cc: cr@pwg.org Subject: Re: CR> CR teleconference and Implementor's Guide Hello Elliott, At 2:16 PM -0500 03.1.3, ElliottBradshaw@oaktech.com wrote: >As our main topic I would like to go through the draft Implementor's Guide, >which I have placed at: >ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm. I would like to point out that the terms "repertoire" and "character set" as defined in the Terminology section do not seem to be consistent with the usage in the W3C Character Model. For example, the use of the term "character set" is discouraged in Section 3.6.2 of Character Model for the World Wide Web 1.0 - Character Model for the World Wide Web 1.0 >As before, my biggest challenge is finding online, normative material for >the details of the Asian character sets (except Korean, which is covered in >an RFC). Unfortunately, the only normative materials for the definitions of Japanese coded character sets (CCS) are Japanese national standards. - JIS X 0201 Japanese Industrial Standards Committee. 7-bit and 8-bit coded character sets for information interchange, JIS X 0201:1997, Japanese Standards Association, 1997. - JIS X 0208 Japanese Industrial Standards Committee. 7-bit and 8-bit double byte coded KANJI sets for information interchange, JIS X 0208:1997, Japanese Standards Association, 1997. - JIS X 0212 Japanese Industrial Standards Committee. Code of the supplementary Japanese graphic character set for information interchange, JIS X 0212:1990, Japanese Standards Association, 1990. - JIS X 0221 Japanese Industrial Standards Committee.
Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Also, I suggest consulting the following W3C Note for detailed information on some Japanese character encoding schemes (CES) and their mappings to Unicode. - XML Japanese Profile -- Jun Fujisawa From imcdonald at sharplabs.com Thu Jan 9 12:38:33 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> CR teleconference and Implementor's Guide Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE7A@mailsrvnt02.enet.sharplabs.com> Hi, In some places POSIX uses the "collection of characters" phrasing. In others it uses (especially in the revised POSIX:2000 spec) the "subset of characters defined in a larger character set..." phrasing. I think it's important to ALSO list the classic (1) definition in our spec. It makes clear where definition (2) came from. The ISO 10646 folks have been developing named formal ISO Profiles (a kind of ISO derived standard) that define "character repertoires" that are subsets of ISO 10646/Unicode (not the subsets we want, by the way, but more generic ones like Western European coverage). Cheers, - Ira -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Thursday, January 09, 2003 10:11 AM To: McDonald, Ira Cc: cr@pwg.org Subject: RE: CR> CR teleconference and Implementor's Guide Ira, I think definition #2 covers exactly what we are trying to do. Is this form in prior use? -Bluetooth BPP: yes -Unicode: I couldn't get this meaning out of the Unicode glossary -POSIX: ??? At the call yesterday there was some interest in the term "character collection" as an alternative to "repertoire". Have you encountered this? Group: I am going to use Ira's definitions in the next version of the Guide.
------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 "McDonald, Ira" , Jun Fujisawa 01/08/2003 cc: cr@pwg.org, owner-cr@pwg.org 05:39 PM Subject: RE: CR> CR teleconference and Implementor's Guide Hi folks, Sorry I missed the telecon earlier today. I failed to note the earlier time (3pm EST rather than 5pm EST). I wrote the following definition (for CUPS documentation), drawing on POSIX.1 (ISO 9945-1) and Unicode 3.2 glossaries: Character Repertoire: (1) The complete set of characters defined in a given named character set, such as ISO 8859-1. (2) The subset of characters defined in a large character set, such as Unicode 3.2, that are needed for an exact mapping to a smaller character set, such as ISO 8859-1. For PWG CR, we could refine (2) above to fix Unicode 3.2 (or later) as the "large character set". Cheers, - Ira McDonald. -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Wednesday, January 08, 2003 10:41 AM To: Jun Fujisawa Cc: cr@pwg.org; owner-cr@pwg.org Subject: Re: CR> CR teleconference and Implementor's Guide Hello Fujisawa-san, Thanks for the useful information. I think we can get a lot of what we need from the Japanese Profile document. I am not entirely satisfied by the term "repertoire", and would like to have some discussion in the group. We are looking for a term that means "named subset of Unicode characters, without regard to encoding." Bluetooth uses "repertoire" in this way. Some other ideas: -character complement -Unicode Subset -CCSS (Coded Character SubSet) I'd like proposals for the term, as well as how we will actually define it. With regard to Shift-JIS, I now understand that there is no universal mapping from it to Unicode. And, many Japanese web pages still use Shift-JIS. 
So, we may want to recommend that a Japanese-capable printer support Shift-JIS as well as UTF-8, and that a Japanese-capable client use Shift-JIS if it is available. Otherwise the client must map to Unicode, and deal with the ambiguities of the different available mappings. I wonder how strongly we should follow Microsoft's lead in this area... ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 Jun Fujisawa Sent by: owner-cr@pwg.org 01/06/2003 05:43 AM cc: cr@pwg.org Subject: Re: CR> CR teleconference and Implementor's Guide Hello Elliott, At 2:16 PM -0500 03.1.3, ElliottBradshaw@oaktech.com wrote: >As our main topic I would like to go through the draft Implementor's Guide, >which I have placed at: >ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm. I would like to point out that the terms "repertoire" and "character set" as defined in the Terminology section do not seem to be consistent with the usage in the W3C Character Model. For example, the use of the term "character set" is discouraged in Section 3.6.2 of Character Model for the World Wide Web 1.0 - Character Model for the World Wide Web 1.0 >As before, my biggest challenge is finding online, normative material for >the details of the Asian character sets (except Korean, which is covered in >an RFC). Unfortunately, the only normative materials for the definitions of Japanese coded character sets (CCS) are Japanese national standards. - JIS X 0201 Japanese Industrial Standards Committee. 7-bit and 8-bit coded character sets for information interchange, JIS X 0201:1997, Japanese Standards Association, 1997. - JIS X 0208 Japanese Industrial Standards Committee. 7-bit and 8-bit double byte coded KANJI sets for information interchange, JIS X 0208:1997, Japanese Standards Association, 1997. - JIS X 0212 Japanese Industrial Standards Committee.
Code of the supplementary Japanese graphic character set for information interchange, JIS X0212:1990, Japanese Standards Association, 1990. - JIS X 0221 Japanese Industrial Standards Committee. Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Also, I suggest to consult the following W3C Note for the detailed information on some Japanese character encoding schemes (CES) and their mappings to Unicode. - XML Japanese Profile -- Jun Fujisawa From ElliottBradshaw at oaktech.com Tue Jan 14 13:33:49 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> Update Implementors Guide Message-ID: Following our discussions of last week, I have posted a new version at (same URL as before): ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm There are a lot of changes, but highlights include: -definitions for charset and character repertoire -more about Microsoft -more about Asian character specs -new section "Determining a Printer's Supported Repertoires" which gives some assumptions a client can make Comments (on this list) please. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From imcdonald at sharplabs.com Wed Jan 15 12:47:27 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> Telecon on Jan 15 - Update Implementors Guide Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE86@mailsrvnt02.enet.sharplabs.com> Hi Elliot, Is there a conference call today in about two hours? 
Cheers, - Ira McDonald High North Inc -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Tuesday, January 14, 2003 12:34 PM To: cr@pwg.org Subject: CR> Jan 15 - Update Implementors Guide Following our discussions of last week, I have posted a new version at (same URL as before): ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm There are a lot of changes, but highlights include: -definitions for charset and character repertoire -more about Microsoft -more about Asian character specs -new section "Determining a Printer's Supported Repertoires" which gives some assumptions a client can make Comments (on this list) please. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From imcdonald at sharplabs.com Wed Jan 15 14:56:41 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> NO Telecon on Jan 15 Message-ID: <116DB56CD7DED511BC7800508B2CA53735CE87@mailsrvnt02.enet.sharplabs.com> Hi folks, I just spoke to Elliott Bradshaw on the phone. His office is without power (no email, no Internet, no voice mail). There is NOT a telecon for PWG Character Repertoires WG today. Please review the latest version of CR Implementors Guide and send email comments before next Tuesday (face-to-face in Maui - you lucky devils...). Cheers, - Ira McDonald High North Inc -----Original Message----- From: McDonald, Ira [mailto:imcdonald@sharplabs.com] Sent: Wednesday, January 15, 2003 11:47 AM To: 'ElliottBradshaw@oaktech.com'; cr@pwg.org Subject: RE: CR> Telecon on Jan 15 - Update Implementors Guide Hi Elliot, Is there a conference call today in about two hours? 
Cheers, - Ira McDonald High North Inc -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Tuesday, January 14, 2003 12:34 PM To: cr@pwg.org Subject: CR> Jan 15 - Update Implementors Guide Following our discussions of last week, I have posted a new version at (same URL as before): ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm There are a lot of changes, but highlights include: -definitions for charset and character repertoire -more about Microsoft -more about Asian character specs -new section "Determining a Printer's Supported Repertoires" which gives some assumptions a client can make Comments (on this list) please. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From harryl at us.ibm.com Mon Feb 10 18:15:44 2003 From: harryl at us.ibm.com (Harry Lewis) Date: Wed May 6 13:53:40 2009 Subject: CR> Possible change in D.C. Schedule Message-ID: We are currently scheduled for Tuesday, parallel with FSG. There may be a possibility to move our meeting to Monday. How would people on CR feel about this? Have folks already made travel reservations? ---------------------------------------------- Harry Lewis IBM Printing Systems ---------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.pwg.org/archives/cr/attachments/20030210/88701121/attachment-0001.html From Rod.Acosta at AgfaMonotype.com Mon Feb 10 18:16:10 2003 From: Rod.Acosta at AgfaMonotype.com (Acosta, Roderick) Date: Wed May 6 13:53:40 2009 Subject: CR> Possible change in D.C. Schedule Message-ID: Fine with me (no reservations yet). /Rod -----Original Message----- From: Harry Lewis [mailto:harryl@us.ibm.com] Sent: Monday, February 10, 2003 4:16 PM To: cr@pwg.org Subject: CR> Possible change in D.C. Schedule We are currently scheduled for Tuesday, parallel with FSG. 
There may be a possibility to move our meeting to Monday. How would people on CR feel about this? Have folks already made travel reservations? ---------------------------------------------- Harry Lewis IBM Printing Systems ---------------------------------------------- From ElliottBradshaw at oaktech.com Tue Feb 11 08:36:21 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> Possible change in D.C. Schedule Message-ID: It would be OK with me...I haven't made arrangements yet. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 Harry Lewis Sent by: owner-cr@pwg.org 02/10/2003 06:15 PM cc: Subject: CR> Possible change in D.C. Schedule We are currently scheduled for Tuesday, parallel with FSG. There may be a possibility to move our meeting to Monday. How would people on CR feel about this? Have folks already made travel reservations? ---------------------------------------------- Harry Lewis IBM Printing Systems ---------------------------------------------- From ElliottBradshaw at oaktech.com Thu Feb 27 11:30:33 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> Minutes, conference call Message-ID: Minutes from the January face-to-face for Character Repertoires are posted at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRMinutes-January-2003.html My next step is to create a document with a short list of names for preferred character encodings and repertoires. This will be reviewed and discussed, and ultimately published as a PWG document, similar to the media sizes list, and is meant to be suitable for reference from the Semantic Model.
I would like to schedule a conference call to review that document (which I will publish beforehand). How do people feel about 4:00 EST, on Wed. March 12? -Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From imcdonald at sharplabs.com Thu Feb 27 16:25:36 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> Minutes, conference call 4pm EST Wed March 12? Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF0A@mailsrvnt02.enet.sharplabs.com> Hi Elliott, Thanks - good minutes. Yes - I can make a conference call at 4pm EST Wed March 12. -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Thursday, February 27, 2003 10:31 AM To: cr@pwg.org Subject: CR> Minutes, conference call Minutes from the January face-to-face for Character Repertoires are posted at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRMinutes-January-2003.html My next step is to create a document with a short list of names for preferred character encodings and repertoires. This will be reviewed and discussed, and ultimately published as a PWG document, similar to the media sizes list, and is meant to be suitable for reference from the Semantic Model. I would like to schedule a conference call to review that document (which I will publish beforehand). How do people feel about 4:00 EST, on Wed. March 12? -Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From imcdonald at sharplabs.com Mon Mar 3 11:42:39 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> FW: GB 18030 Information Required Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF14@mailsrvnt02.enet.sharplabs.com> Hi folks, Elliott - the first two white papers (links below) look highly useful.
Markus Scherer is a Unicode and charsets heavy at IBM. Cheers, - Ira McDonald High North Inc -----Original Message----- From: Markus Scherer [mailto:markus.scherer@jtcsv.com] Sent: Monday, March 03, 2003 10:26 AM To: vinay.aggarwal@rebus.co.in; charsets Subject: Re: GB 18030 Information Required vinay.aggarwal@rebus.co.in wrote: > Could you please let me know if following supports the GB18030? > - Any web based application > - Browser (Internet Explorer/ Netsacpe) based application Yes and no. Generally, web-based applications and browsers and related protocols do support GB 18030 and Unicode and various other charsets. Specifically, you need to read about - charsets, e.g., http://oss.software.ibm.com/icu/docs/papers/codepages_and_unicode.html - GB 18030, e.g., http://oss.software.ibm.com/icu/docs/papers/gb18030.html - Unicode, e.g., http://www.unicode.org/standard/WhatIsUnicode.html and about the particular applications (and versions of them) that you intend to use. markus From ElliottBradshaw at oaktech.com Mon Mar 3 12:32:07 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> FW: GB 18030 Information Required Message-ID: Interesting. If I read this correctly, then 18030 is a mapping to ALL of Unicode. This would make it an encoding, but not a subset. If that's right, then we would treat it as a kind of charset, but not as a repertoire. Your thoughts? E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 "McDonald, Ira" plabs.com> cc: Sent by: Subject: CR> FW: GB 18030 Information Required owner-cr@pwg.or g 03/03/2003 11:42 AM Hi folks, Elliot - the first two white papers (links below) look highly useful. Markus Scherer is a Unicode and charsets heavy at IBM. 
Cheers, - Ira McDonald High North Inc -----Original Message----- From: Markus Scherer [mailto:markus.scherer@jtcsv.com] Sent: Monday, March 03, 2003 10:26 AM To: vinay.aggarwal@rebus.co.in; charsets Subject: Re: GB 18030 Information Required vinay.aggarwal@rebus.co.in wrote: > Could you please let me know if following supports the GB18030? > - Any web based application > - Browser (Internet Explorer/ Netscape) based application Yes and no. Generally, web-based applications and browsers and related protocols do support GB 18030 and Unicode and various other charsets. Specifically, you need to read about - charsets, e.g., http://oss.software.ibm.com/icu/docs/papers/codepages_and_unicode.html - GB 18030, e.g., http://oss.software.ibm.com/icu/docs/papers/gb18030.html - Unicode, e.g., http://www.unicode.org/standard/WhatIsUnicode.html and about the particular applications (and versions of them) that you intend to use. markus From ElliottBradshaw at oaktech.com Mon Mar 3 15:11:38 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> Preferred Character Repertoires in Printers Message-ID: I have posted a draft document at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030228.html In it I take a shot at how to organize the use of character repertoires in printing from small devices. The meat of this document is pretty short; I'll add the potatoes when/if there is consensus that this is the right way to go. Following the discussion in Maui I replaced ISO-8859 references with similar code charts from Unicode. Our next conference call is confirmed for: 4pm EST Wed March 12 In this call I would like to discuss comments on this document. Also please send points for discussion to the reflector beforehand. -Elliott Pete Z.: what other things should we consider so that this can be referenced from SM?
------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From imcdonald at sharplabs.com Mon Mar 3 16:38:58 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> FW: GB 18030 Information Required Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF17@mailsrvnt02.enet.sharplabs.com> Hi Elliot, Yes - GB18030 is a mapping to EVERY codepoint in Unicode (not just the assigned ones, but all 1.1 million possible Unicode codepoints). But it's a multi-byte, variable-length (one to four bytes) set of codepoints in GB18030. As Markus Scherer says it is best thought of as a Chinese-market UTF (Unicode Transformation Format), like UTF-8, UTF-16, and UTF-32. I agree with you therefore, that PWG CR should view GB18030 as a valid 'charset' (which can be tagged) but NOT as a unique 'repertoire' (because it's a different encoding of Unicode). Cheers, - Ira McDonald High North Inc -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Monday, March 03, 2003 11:32 AM To: McDonald, Ira Cc: 'cr@pwg.org'; owner-cr@pwg.org Subject: Re: CR> FW: GB 18030 Information Required Interesting. If I read this correctly, then 18030 is a mapping to ALL of Unicode. This would make it an encoding, but not a subset. If that's right, then we would treat it as a kind of charset, but not as a repertoire. Your thoughts? E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 "McDonald, Ira" plabs.com> cc: Sent by: Subject: CR> FW: GB 18030 Information Required owner-cr@pwg.or g 03/03/2003 11:42 AM Hi folks, Elliot - the first two white papers (links below) look highly useful. Markus Scherer is a Unicode and charsets heavy at IBM. 
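To make the "Chinese-market UTF" description of GB 18030 in this thread concrete, here is a hedged Python sketch using the standard library's gb18030 codec. It checks that characters inside and outside the traditional Chinese sets all encode (in one, two, or four bytes) and decode back unchanged, which is the behaviour of an encoding of Unicode rather than of a separate repertoire. The helper name gb18030_length and the sample characters are illustrative choices, not taken from any PWG or GB document.

```python
# Sketch only: GB 18030 treated as a variable-length encoding of Unicode.
# gb18030_length is a hypothetical helper name used for illustration.

def gb18030_length(ch: str) -> int:
    """Return how many bytes GB 18030 uses for a single character."""
    encoded = ch.encode("gb18030")
    # Round trip: decoding must give back the original character.
    assert encoded.decode("gb18030") == ch
    return len(encoded)


if __name__ == "__main__":
    # ASCII letter, a common Han character, a C1 control, and the first
    # code point of the supplementary planes.
    samples = ["A", "\u4e2d", "\u0080", "\U00010000"]
    for ch in samples:
        print(f"U+{ord(ch):04X}: {gb18030_length(ch)} byte(s)")
```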
Cheers, - Ira McDonald High North Inc -----Original Message----- From: Markus Scherer [mailto:markus.scherer@jtcsv.com] Sent: Monday, March 03, 2003 10:26 AM To: vinay.aggarwal@rebus.co.in; charsets Subject: Re: GB 18030 Information Required vinay.aggarwal@rebus.co.in wrote: > Could you please let me know if following supports the GB18030? > - Any web based application > - Browser (Internet Explorer/ Netsacpe) based application Yes and no. Generally, web-based applications and browsers and related protocols do support GB 18030 and Unicode and various other charsets. Specifically, you need to read about - charsets, e.g., http://oss.software.ibm.com/icu/docs/papers/codepages_and_unicode.html - GB 18030, e.g., http://oss.software.ibm.com/icu/docs/papers/gb18030.html - Unicode, e.g., http://www.unicode.org/standard/WhatIsUnicode.html and about the particular applications (and versions of them) that you intend to use. markus From ElliottBradshaw at oaktech.com Mon Mar 10 16:52:49 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> Reminder: Conference call Wed. 3/12 at 4:00 Eastern Message-ID: Whether or not you can attend the call, please have a look at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030228.html and send any comments to the reflector. Our next CR conference call is this Wednesday at 4:00 Eastern. Call-in info: Dial in #: 888 205-5513 or 719 955-0562 Participant passcode: 176310 Our main agenda item is to review the above-referenced document, so that I can edit it prior to our face-to-face in DC. -Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From jim.bigelow at hp.com Wed Mar 12 14:14:09 2003 From: jim.bigelow at hp.com (BIGELOW,JIM (HP-Boise,ex1)) Date: Wed May 6 13:53:40 2009 Subject: CR> Reminder: Conference call Wed. 
3/12 at 4:00 Eastern Message-ID: <25C4C6009B5BD5118FF30003470BF7F509FDBF3E@xboi04.boi.hp.com> Hello, I've reviewed the draft and support rules 1 - 3 for conforming printers. I'd also like to point out that I think that the character entities of XHTML-Print extend the languages supported beyond those in Unicode Basic Latin. However, I've not determined what those languages are beyond knowing they are not Russian, Greek, Hebrew, Arabic, Thai, or those of the PRC, Japan, Korea or Taiwan. Jim Bigelow Hewlett-Packard 208-396-2068 jim.bigelow@hp.com > -----Original Message----- > From: ElliottBradshaw@oaktech.com > [mailto:ElliottBradshaw@oaktech.com] > Sent: Monday, March 10, 2003 1:53 PM > To: cr@pwg.org > Subject: CR> Reminder: Conference call Wed. 3/12 at 4:00 Eastern > > > Whether or not you can attend the call, please have a look at: > > > ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030228.html > > and send any comments to the reflector. > > > Our next CR conference call is this Wednesday at 4:00 > Eastern. Call-in > info: > > Dial in #: 888 205-5513 or 719 955-0562 > Participant passcode: 176310 > > > Our main agenda item is to review the above-referenced > document, so that I can edit it prior to our face-to-face in DC. > > -Elliott > > > ------------------------------------------ > Elliott Bradshaw > Director, Software Engineering > Oak Technology Imaging Group > 781 638-7534 > > From ElliottBradshaw at oaktech.com Thu Mar 13 15:41:27 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> PWG> PWG IEEE-ISTO number for Proposed XHTML/Print standard Message-ID: Harry, Apropos of this, I wanted to let you know the latest ideas for Character Repertoires. As decided at Maui, we plan to create a standards track document that can be referenced by the semantic model. This will describe an SM element that is used to advertise the repertoires supported by a device.
We will, at some future point, want to assign a PWG number to this. I will do my best to follow the existing process, then cut over to the new one when it is official. One problem is that we don't have a formal chartered CR group. Since this standard may be our entire work, I don't know that we need to go through chartering. Options are: -do a CR charter -create this document under some other group -some sort of "individual submission" scheme Should we discuss this at the plenary? E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 ----- Forwarded by Elliott Bradshaw/oaktech/us on 03/13/2003 03:36 PM ----- "Hastings, Tom N" Sent by: owner-pwg@pwg.org 03/13/2003 03:22 PM cc: pwg@pwg.org Subject: PWG> PWG IEEE-ISTO number for Proposed XHTML/Print standard Harry, Per the discussion today at the SM telecon on PWG process about standards numbers and what to do about allocating a PWG number for the Proposed PWG XHTML/Print standard as requested by Don for the W3C. In order to give Don a PWG number for the XHTML/Print Proposed PWG Standard, the next series of numbers not yet used is 5102.n. Currently, Proposed PWG standards have the following numbers: 5100.1, 5100.2, 5100.3, 5100.4 ... for IPP; 5101.1 for the Media Standardized Names. So how about 5102.1 for XHTML/Print? If there are several documents, 5102.1 and 5102.2. ISSUE: How to number future standards? We can decide later how to allocate numbers for: PWG Semantic Model, Print Services Interface, IPPFAX, PDF/is, etc. Is the 5102 series for document formats, so that PDF/is would go in that series? Should IPPFAX go in its own series, or should it be in the IPP 5100.n series? Should PWG Semantic Model be in its own series? Should PSI be in its own series? Or is there some common theme that would help put some of these in the same series? ISSUE: A separate issue is what happens when the Proposed/Candidate Standard reaches Standard?
Does it get a new number or use the same number? If a new number could it be some algorithm from its original number, such as adding 50. So 5150.2 would be the Standard version of Proposed standard 5100.2. Tom From harryl at us.ibm.com Thu Mar 13 16:50:40 2003 From: harryl at us.ibm.com (Harry Lewis) Date: Wed May 6 13:53:40 2009 Subject: CR> PWG> PWG IEEE-ISTO number for Proposed XHTML/Print standard In-Reply-To: Message-ID: Yes, I will (obviously) need to carve a large time slot in the plenary for said process discussions! ---------------------------------------------- Harry Lewis IBM Printing Systems ---------------------------------------------- ElliottBradshaw@oaktech.com Sent by: owner-cr@pwg.org 03/13/2003 01:41 PM To: Harry Lewis/Boulder/IBM@IBMUS cc: "Hastings, Tom N" , pzehler@crt.xerox.com, cr@pwg.org Subject: CR> PWG> PWG IEEE-ISTO number for Proposed XHTML/Print standard Harry, Apropos of this, I wanted to let you know the latest ideas for Character Repertoires. As decided at Maui, we plan to create a standards track document that can be referenced by the semantic model. This will describe a SM element that is used to advertise the repertoires supported by a device. We will, at some future point, want to assign a PWG number to this. I will do my best to follow the existing process, then cut over to the new one when it is official. One problem is that we don't have a formal chartered CR group. Since this standard may be our entire work, I don't know that we need to go through chartering. Options are: -do a CR charter -create this document under some other group -some sort of "individual submission" scheme Should we discuss this at the plenary? E. 
------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 ----- Forwarded by Elliott Bradshaw/oaktech/us on 03/13/2003 03:36 PM ----- "Hastings, Tom N" Sent by: owner-pwg@pwg.org 03/13/2003 03:22 PM cc: pwg@pwg.org Subject: PWG> PWG IEEE-ISTO number for Proposed XHTML/Print standard Harry, Per the discussion today at the SM telecon on PWG process about standards numbers and what to do about allocating a PWG number for the Proposed PWG XHTML/Print standard as requested by Don for the W3C. In order to give Don a PWG number for the XHTML/Print Proposed PWG Standard, the next series of numbers not yet used is 5102.n. Currently, Proposed PWG standards have the following numbers: 5100.1, 5100.2, 5100.3, 5100.4 ... for IPP; 5101.1 for the Media Standardized Names. So how about 5102.1 for XHTML/Print? If there are several documents, 5102.1 and 5102.2. ISSUE: How to number future standards? We can decide later how to allocate numbers for: PWG Semantic Model, Print Services Interface, IPPFAX, PDF/is, etc. Is the 5102 series for document formats, so that PDF/is would go in that series? Should IPPFAX go in its own series, or should it be in the IPP 5100.n series? Should PWG Semantic Model be in its own series? Should PSI be in its own series? Or is there some common theme that would help put some of these in the same series? ISSUE: A separate issue is what happens when the Proposed/Candidate Standard reaches Standard? Does it get a new number or use the same number? If a new number, could it be derived by some algorithm from its original number, such as adding 50? Then 5150.2 would be the Standard version of Proposed standard 5100.2. Tom -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.pwg.org/archives/cr/attachments/20030313/1d0ce8d1/attachment-0001.html From imcdonald at sharplabs.com Sat Mar 22 18:50:10 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> RE: Value matching in CR Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF3F@mailsrvnt02.enet.sharplabs.com> Hi Elliott, All existing UNIX implementations of POSIX locales do locale name matching (language/charset concatenations) based on the rules at (1) below. But POSIX itself does not formalize this matching rule (anywhere I've been able to find so far). (1) Only for purposes of comparing two character repertoire names, Printers (or Clients) MUST: (a) convert all letters to lowercase; (b) remove all hyphens, underscores, and periods; and (c) truncate at the colon (the year-of-standard version separator), dropping it and any trailing date info. Although the character set with the common alias "Latin 1" has been registered with a 'Name:' of "ISO_8859-1:1987" in the IANA Charset Registry, it is also VERY commonly referred to by existing software as "iso8859-1" or "iso-8859-1" or "iso_8859.1" (notice the typical misuse of periods and inconsistent presence of hyphen after "iso"). It is highly desirable that IPP/PSI Printers/Clients behave like Web search engines and accept all approximate matches as equal. (2) For purposes of displaying supported character repertoires in the future "repertoire-supported" Printer object attribute, Printers MUST: (a) use a 'namespace' prefix from the PWG CR standard (such as "unihan") in all lowercase, followed by a hyphen; (b) use the best practice name of the base charset - for the "iana" prefix, this MUST be the registered 'Name:' value (complete with the year of standard suffix after a colon) and MUST NOT be any registered 'Alias:' value. However, this value MUST be normalized to lowercase, consistent with the existing 'charset-supported' Printer attribute semantics.
And any imbedded underscores MUST be changed to hyphens for consistency. I'd like to say it's OK to retain the colon/date info for the comparisons, but it's really not safe, practically speaking. Note that the existing "charset-supported" attribute says that Printers MUST use the 'Name:' value and MUST NOT use any of the 'Alias:' values from the IANA Charset Registry. An interesting sidelight: The Printer MIB (RFC 1759) uses the enum tags that are 'Alias:' values beginning with "cs" (and containing NO punctuation characters at all, as recommended by SMIv2 for MIBs). When the Printer MIB is "visible" through the future PWG WBMM interface (and the new Printer Device in the PWG Semantic Model), we'll be faced with another interesting name collision. Sigh... Cheers, - Ira McDonald High North Inc -----Original Message----- From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com] Sent: Friday, March 21, 2003 11:50 AM To: McDonald, Ira Subject: Value matching in CR Hi Ira, I've been fiddling with the rules for matching CR values...in the last version I said that hyphens and underscores would be dropped before comparison. This may be a bit drastic...what if we say that a hyphen matches an underscore? Also, I think you said there was some reference would could use on the subject. True? Thanks, E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From ElliottBradshaw at oaktech.com Mon Mar 24 16:38:05 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> New CR document: Standard for Character Repertoire Interoperabiliy Message-ID: Folks, I have placed an updated version of the Chracter Repertoires document at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030317.html This is a standards track document intended to serve as a reference for the Semantic Model. 
In addition to Yet Another Name, highlights of recent changes include:
-Changed the title to remove "Preferred"
-Marked some sections as Informative
-Formatting cleanup; addition of copyright notice, acknowledgements, etc.
-Clarification in the Abstract and Introduction of goals and non-goals
-More information about how this document relates to the Semantic Model
-Changed the details of syntax for repertoire names
-More information about rules for matching repertoire names
-Clarified the wording regarding font sensitivity
-Confirmed use of Unicode code charts for basic non-Asian repertoires
-Changed from the notion of "Preferred Repertoire" to "Basic Repertoire"; this emphasizes that the printer is free to advertise additional repertoires
-Included Latin-1 Supplement and Latin Extended-A as Basic Repertoires
-Added requirement to support the euro character
-References
Open issues (that I know of) are highlighted in the document. Based on recent meetings I think we are approaching consensus on a number of issues. Some specific things I want to work on next week: -making this suitable for use by SM -applying the appropriate PWG process: name, number, approval, etc. Depending on how our discussion goes we may be able to move to Last Call in the fairly near future. Please send comments ASAP, esp. if you will not be in DC. Thanks, Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From ElliottBradshaw at oaktech.com Fri Mar 28 08:46:37 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> Character Repertoire conference call in lieu of face-to-face Message-ID: Harry and ISTO have set up phone bridges for next week. The CR session will be Monday (3/31) at 12:30-2:00 Pacific, aka 3:30-5:00 Eastern.
To take part: Dial-In #: +1 719-457-0335 Participant Password: 400908# Our main agenda is to turn pages and comment on: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030317.html Pete Zehler has added a reference to this in the Semantic Model. His comment: The latest version of the SM now includes Character Repertoire. I have posted a preliminary version linked to the SM web site (or directly at ftp://ftp.pwg.org/pub/pwg/Semantic-Model/PWG-Semantic-Model-Latest.pdf) Take a look at three locations in the document and let me know if it's OK. The locations are Figure 5, Table 6 and the reference in section 11. (All sorted alphabetically) I know the reference will need to be updated as your document progresses. Near the end of Monday's meeting I plan to ask the group what we need to address to prepare for Last Call on this. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From ElliottBradshaw at oaktech.com Mon Mar 31 15:30:31 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> RE: PWG> Character Repertoire conference call in lieu of face-to- face Message-ID: In case you didn't see this. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 ----- Forwarded by Elliott Bradshaw/oaktech/us on 03/31/2003 03:31 PM ----- From: "BERKEMA,ALAN C (HP-Roseville,ex1)" hp.com> To: "'ElliottBradshaw@oaktech.com'", cr@pwg.org cc: pwg@pwg.org Subject: Character Repertoire conference call in lieu of face-to-face 03/31/2003 11:01 AM Alan Berkema, You have successfully scheduled your meeting using Webex. ------------------------- TO START THIS MEETING ------------------------- On 3/31/2003, shortly before 12:30PM (GMT -08:00) Pacific Time, USA & Canada, click this URL: https://hp.webex.com/webex/e.php?AT=MO On the My Meetings page, click Start Now for this meeting.
------------------------- FIRST TIME USERS ------------------------- For fully interactive meetings, including the ability to present your documents and applications, a one-time setup takes less than 10 minutes. Click this URL to set up now: https://hp.webex.com/join/ Then click New User. ------------------------- MEETING SUMMARY ------------------------- Name: CR Date: 3/31/2003 Time: 12:30PM, (GMT -08:00) Pacific Time, USA & Canada Meeting Number: 28174637 Meeting Password: mynewcr Teleconference: None. Agenda: Host Key: 348053 used to re-assign host privilege. Host: Alan Berkema 1(916)7855605 mailto:alan_berkema@hp.com http://www.webex.com We've got to start meeting like this(TM)
------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From ElliottBradshaw at oaktech.com Thu Apr 3 10:23:51 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> Minutes, next teleconf. Message-ID: Minutes for this week's meeting are posted at: http://www.pwg.org/cr/CRMinutes-March-2003.html Comments on the reflector are welcome. I would like to schedule our next teleconference. How do people feel about: Wed. 4/9 3:00 Eastern or Wed. 4/16 3:00 Eastern Our main discussion topic is the idea of moving some of the material from the normative spec into a Best Practices. Current discussion on this is summarized in the minutes. Lemme know, E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From imcdonald at sharplabs.com Mon Apr 14 17:47:59 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> Available character repertoires/fonts on various OS Message-ID: <116DB56CD7DED511BC7800508B2CA53735CF7C@mailsrvnt02.enet.sharplabs.com> Hi folks, Richard Ishida (chair of W3C Internationalization GEO project) just asked a question about what fonts are usually installed on various operating system platforms (Windows/XP, Linux, etc.) and therefore available for use in HTML documents (on the W3C Internationalization mailing list). The best response so far is to look at Markus Kuhn's info at: http://www.cl.cam.ac.uk/~mgk25/unicode.html#fonts This discussion will probably be highly relevant for the PWG's Character Repertoires WG and for FSG's Driver and Renderer WG. From ElliottBradshaw at oaktech.com Thu May 1 16:18:44 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> Next teleconference Message-ID: I plan to hold the next CR teleconference on Wed. 
May 7 at 3 EDT / 12 PDT. (Limited to 1 hour) Our agenda is to discuss splitting out a Best Practices from the spec (as described in the posted minutes), and to work on a Charter for the group. If anyone wishing to attend has a conflict please let me know ASAP. E. ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From ElliottBradshaw at oaktech.com Mon May 5 14:34:59 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> CR conference call info. Message-ID: The next CR call is this Wednesday. Time: 3:00 EDT / 12:00 PDT Wed. 5/7 Dial in #: 888 205-5513 or 719 955-0562 Participant passcode: 176310 Agenda: -Best Practices, as described in the March minutes -New draft Charter for the group Both of these documents are posted at www.pwg.org/cr/index.htm. See you then, Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From imcdonald at sharplabs.com Sun May 18 13:31:24 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> Draft of IPP "repertoire-supported" Printer attribute Message-ID: <116DB56CD7DED511BC7800508B2CA53735CFE1@mailsrvnt02.enet.sharplabs.com> [With apologies for cross-posting to IPP and Character Repertoires mailing lists.] Background - the IEEE/ISTO PWG Character Repertoires standard (a standard for NAMES, not a standard requiring support of particular character repertoires) is nearly complete and is expected to be in PWG 'last call' during the June 2003 face-to-face PWG meeting. 
The latest CR working draft is at: ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030317.html ------------------------------------------------------------------------ Hi folks, Sunday (18 May 2003) Below is a draft version of the IPP "repertoire-supported" attribute, for inclusion in Appendix B 'Bindings to IPP' of the next working draft of the PWG Character Repertoires standard. First, some background. When I started to write up this attribute, I realized that our (currently proposed) syntax for CR labels now uses characters that are not allowed in the IPP "keyword" datatype. We _could_ add a new datatype (similar to "charset") to IPP called "repertoire". Tom Hastings has convinced me that a new "repertoire" datatype is a _very_ bad idea. Most importantly, it would break all existing IPP parsers. Instead, Tom and I agree that we should alter our CR labels to achieve strict conformance to the IPP "keyword" syntax. Then IANA can register our small set of well-known CR/1.0 labels in the IANA IPP registry, along with the new IPP "repertoire-supported" attribute itself. Cheers, - Ira McDonald High North Inc ISSUE: The Unihan names (based on the source legacy CJK charset) are _not_ disjoint (i.e., they DO overlap). Should we abandon their use in favor of IANA Charset Registry names. What value do these Unihan names add? (hint - read the attribute description below before commenting) Describing the Unicode HAN character assignments based on Unicode code chart titles (from http://www.unicode.org/charts/) _does_ provide unique non-overlapping labels (e.g., 'unicode_cjk-radicals-supplement' which is the title for the Unicode character block starting at 'U+2E80'). 
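[Editorial sketch, not part of the original message.] The example label above can be derived mechanically from a code chart title. The Python below is a minimal sketch, assuming the two mapping rules and the `repertoire` grammar given in the attribute draft that follows; the helper name `to_repertoire_label` and the regular expression are hypothetical illustrations, not anything defined by the CR spec or by IPP.

```python
import re

# Regex transcription of the draft's ABNF (an assumption of this sketch):
# rep-prefix "_" rep-name, where rep-name starts with a-z and continues
# with a-z, 0-9, "-", "." or "_".
REPERTOIRE_RE = re.compile(r"(unicode|unihan|iana|vendor)_[a-z][a-z0-9._-]*")


def to_repertoire_label(prefix: str, source_name: str) -> str:
    """Hypothetical helper: map a source name to a repertoire label."""
    # Mapping Rule 1: uppercase letters become lowercase.
    lowered = source_name.lower()
    # Mapping Rule 2: every remaining non-keyword character
    # (including space) becomes a hyphen.
    mapped = re.sub(r"[^a-z0-9._-]", "-", lowered)
    label = f"{prefix}_{mapped}"
    if REPERTOIRE_RE.fullmatch(label) is None:
        raise ValueError(f"not a legal repertoire label: {label!r}")
    return label


print(to_repertoire_label("unicode", "CJK Radicals Supplement"))
# prints: unicode_cjk-radicals-supplement
```

Note that under these rules the colon in an IANA name such as "ISO_8859-1:1987" also becomes a hyphen, which is one reason the treatment of the year suffix matters.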
------------------------------------------------------------------------ repertoire-supported (1setOf (keyword | name)) This REQUIRED IPP Printer Description attribute identifies some or all of the character repertoires that the IPP Printer object and contained IPP Job objects support for rendering of document data content. At least the value 'unicode_basic-latin' MUST always be present, since conforming IPP Printers MUST support at least the character repertoire defined in the Unicode/4.0 'Basic Latin' code chart (and character block). A character repertoire is defined as a named subset of the characters defined in a given character set standard (e.g., Unicode/4.0) that are supported for output rendering of document data. The character set of the document data (e.g., the value of "document-charset" in the IPP Document object) constrains the relevant character repertoires (e.g., since ISO 8859-1 does not assign a codepoint to GREEK TONOS U+0384, that character _cannot_ be represented in the ISO 8859-1 character set). Character repertoires of legacy character sets (e.g., ISO 8859-1 and ISO 8859-2) often overlap. However, character repertoires identified by the Unicode/4.0 code chart titles do _not_ overlap (i.e., they are disjoint). Therefore, a conforming IPP Printer SHOULD advertise "repertoire-supported" values based on the Unicode/4.0 code chart titles, to avoid ambiguity. The ABNF [RFC2234] for legal values of "repertoire-supported" is:

    repertoire = rep-prefix "_" rep-name
    rep-prefix = "unicode" /  ; from Code Chart titles of Unicode/4.0 char database
                 "unihan" /   ; from Code Chart titles of Unicode/4.0 Unihan database
                 "iana" /     ; from Name or Alias fields in IANA Charset Registry
                 "vendor"     ; from vendor-specific repertoire names
    rep-name   = rep-alpha *(rep-char)
    rep-char   = rep-alpha / rep-digit /  ; alphanumeric or
                 "-" / "." / "_"          ; limited punctuation chars
    rep-alpha  = %x61-7A                  ; lowercase a-z
    rep-digit  = %x30-39                  ; decimal 0-9

Mapping Rule 1: If a source standard repertoire name (e.g., a value in the IANA Charset Registry [IANA-Charsets]) contains any uppercase alpha characters, those characters MUST be mapped to the IPP 'keyword' syntax by converting each of them to their corresponding lowercase alpha characters. Mapping Rule 2: If a source standard repertoire name (e.g., a value in the IANA Charset Registry [IANA-Charsets]) contains any other non-keyword characters, those characters MUST be mapped to the IPP 'keyword' syntax by converting each of them (including space) to a hyphen "-" character. From hastings at cp10.es.xerox.com Mon May 19 15:28:41 2003 From: hastings at cp10.es.xerox.com (Hastings, Tom N) Date: Wed May 6 13:53:40 2009 Subject: CR> Draft of IPP "repertoire-supported" Printer attribute Message-ID: Minor comment: Why allow the "_" in the rep-char (repertoire character names), since space characters are to be mapped to "-", not "_"? The advantage of not allowing "_" is that a future field could be added using "_" as a field separator, but not if "_" could be in rep-char. rep-char = rep-alpha / rep-digit / ; alphanumeric or "-" / "." / "_" ; limited punctuation chars Tom From imcdonald at sharplabs.com Mon May 19 15:38:55 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> Draft of IPP "repertoire-supported" Printer attribute Message-ID: <116DB56CD7DED511BC7800508B2CA53735CFE9@mailsrvnt02.enet.sharplabs.com> Hi Tom, Because "_" has already been used in the Name fields in the IANA Charset Registry, and we don't want to overly alter those in the 'iana_' namespace for repertoires. Cheers, - Ira PS - The namespace prefix MUST use the only 'field separator' permanently. PPS - Folks should ignore this discussion for a day or two. Elliott and I are working offline to refine the proposal and figure out the impacts on the main CR spec - thanks! -----Original Message----- From: Hastings, Tom N [mailto:hastings@cp10.es.xerox.com] Sent: Monday, May 19, 2003 3:29 PM To: McDonald, Ira Cc: 'cr@pwg.org'; 'ipp@pwg.org' Subject: RE: CR> Draft of IPP "repertoire-supported" Printer attribute Minor comment: Why allow the "_" in the rep-char (repertoire character names), since space characters are to be mapped to "-", not "_"? The advantage of not allowing "_" is that a future field could be added using "_" as a field separator, but not if "_" could be in rep-char. rep-char = rep-alpha / rep-digit / ; alphanumeric or "-" / "."
/ "_" ; limited punctuation chars Tom From imcdonald at sharplabs.com Fri May 30 19:09:05 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> PWG SM bindings for new RepertoireSupported element Message-ID: <116DB56CD7DED511BC7800508B2CA53735D00F@mailsrvnt02.enet.sharplabs.com> Hi Elliot, Friday (30 May 2003) Per our conversation this afternoon, below is the complete verbatim text of Appendix C for the CR spec. Cheers, - Ira McDonald High North Inc ------------------------------------------------------------------------ C. Bindings to the PWG Semantic Model (Normative) To add the RepertoireSupported element to the PWG Semantic Model, the following XML Schema fragments SHALL be added to the specified files.
Add the following simple type to the file 'PwgWellKnownValues.xsd': Add the following element reference to the file 'PrinterDescription.xsd' in the complex type "PrinterDescription": Add the following simple element to the file 'PrinterDescription.xsd' after the complex type "PrinterDescription": RepertoireWKV KeywordNsExtensionPattern From imcdonald at sharplabs.com Sun Jun 1 19:45:54 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> RE: PWG-ANNOUNCE> Character Repertoires Charter and Last Call Message-ID: <116DB56CD7DED511BC7800508B2CA53735D012@mailsrvnt02.enet.sharplabs.com> [I took pwg-announce off the cc: in this reply - added CR list] Hi, I agree with your suggestion that we should be using 'charset' (in the IETF/IANA sense) for a 'coded character set' (such as Unicode 4.0) in a 'character encoding scheme' (such as UTF-8). That would also be consistent with the usage in IPP/1.1 (RFC 2911), where the base datatype 'charset' is defined (on page 86) for the IPP Printer attributes 'charset-configured' and 'charset-supported'. Also, the CR charter and eventual standard should have a reference to the W3C Character Model. Cheers, - Ira McDonald -----Original Message----- From: Jun Fujisawa [mailto:fujisawa.jun@canon.co.jp] Sent: Saturday, May 31, 2003 7:12 PM To: ElliottBradshaw@oaktech.com Cc: pwg-announce@pwg.org Subject: Re: PWG-ANNOUNCE> Character Repertoires Charter and Last Call Hello Elliott, At 5:20 PM -0400 03.5.29, ElliottBradshaw@oaktech.com wrote: >A Charter has been reviewed within the CR group and there are no open >issues. > >It is available online at >ftp://ftp.pwg.org/pub/pwg/cr/charter/ch-cr10-20030507.html. > >So today I begin a 10-day Last Call for comments on this document, prior to >a formal vote by the PWG. I feel a little uncomfortable with the following paragraph in the Charter. 
>In Unicode and W3C specifications, the term "character set" usually >refers to a method of encoding a (possibly very large) set of characters, >e.g. UTF-8. This tells how to encode a given character if it is present, >but doesn't define which characters in that space are actually in use. In the Character Model for the World Wide Web specification, the W3C clearly discourages using the term "character set" to refer to a method of encoding. >[S] Specifications SHOULD avoid using the terms 'character set' and >'charset' to refer to a character encoding, except when the latter is used >to refer to the MIME charset parameter or its IANA-registered values. >The terms 'character encoding', 'character encoding form' or 'character >encoding scheme' are RECOMMENDED. I suggest changing the wording to something like the following. In Unicode and W3C specifications, the term "character set" usually refers to a (possibly very large) set of characters, e.g. ISO/IEC 10646. The term "character set", however, can be confusing in some cases, since the similar term "charset" is used as a MIME parameter, which refers to the combination of "coded character set" and "character encoding scheme", not just the former. -- Jun Fujisawa From imcdonald at sharplabs.com Thu Jun 5 11:57:35 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> Charset terminology Message-ID: <116DB56CD7DED511BC7800508B2CA53735D02A@mailsrvnt02.enet.sharplabs.com> Hi Elliott, Below are terminology section updates for the CR spec for "charset". Cheers, - Ira McDonald High North Inc ---------------------------------------- Charset Terminology The following terms are used in this specification, exactly as defined in section 1 'Definitions and Notation' of the IANA Charset Registration Procedures [RFC2978]: "character", "charset", "coded character set (CCS)", and "character encoding scheme (CES)".
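[Editorial sketch, not part of the original message.] The CCS/CES distinction in these RFC 2978 terms can be seen in a few lines of Python: the coded character set assigns a character its integer code point, and the character encoding scheme turns that integer into octets. The helper name `ccs_and_ces` is hypothetical, and the euro sign is used only as a convenient example character.

```python
from typing import Tuple


def ccs_and_ces(char: str, encoding: str) -> Tuple[str, bytes]:
    """Hypothetical helper: return (code point label, encoded octets)."""
    # CCS step: the character's integer in ISO/IEC 10646 / Unicode.
    code_point = f"U+{ord(char):04X}"
    # CES step: how that integer is serialized as octets.
    octets = char.encode(encoding)
    return code_point, octets


euro = "\u20ac"
print(ccs_and_ces(euro, "utf-8"))      # ('U+20AC', b'\xe2\x82\xac')
print(ccs_and_ces(euro, "utf-16-be"))  # same code point, different octets
```

The same code point yields different octet sequences under different encoding schemes, which is why a MIME "charset" value such as UTF-8 names the combination rather than the character collection alone.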
charset: A coded character set (e.g., ISO/IEC 10646), optionally combined with a character encoding scheme (e.g., UTF-8). From ElliottBradshaw at oaktech.com Tue Jun 10 17:28:28 2003 From: ElliottBradshaw at oaktech.com (ElliottBradshaw@oaktech.com) Date: Wed May 6 13:53:40 2009 Subject: CR> CR documents and agenda Message-ID: Following our discussion at the last CR conference call, I have split out Best Practices and posted two documents: ftp://ftp.pwg.org/pub/pwg/cr/wd/wd-crrs10-20030606.html ftp://ftp.pwg.org/pub/pwg/cr/wd/wd-crbp10-20030606.html At our face-to-face next week I would like to discuss: 1. Go through these documents and get any feedback prior to Last Call. 2. Comments on the draft Charter. So far the only issue is that we should use the term "charset" rather than "character set", and as you will see I have already made this change in the other documents. (I assume we will also act on the charter at plenary.) 3. Future work for the CR group. Possibilities include: -extensions to Best Practices to improve regional coverage (e.g. what is a good set of characters--not just basic--for Korea, etc.) -identifying and naming fonts -and any others that are suggested... Best regards, Elliott ------------------------------------------ Elliott Bradshaw Director, Software Engineering Oak Technology Imaging Group 781 638-7534 From elliott.bradshaw at zoran.com Tue Aug 12 13:17:22 2003 From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com) Date: Wed May 6 13:53:40 2009 Subject: CR> CR update and plans for NYC Message-ID: Hi CR folks, I have been remiss in posting updates from the Portland meeting, but will do so Real Soon Now. Between now and NYC I hope to confirm the charter, which needs one modification from the Last Call. New versions of the two documents, the spec for RepertoireSupported and the best practices, will be posted. I will ask for comments, and perhaps Last Call, on these prior to NYC.
There has also been discussion of another project that would define more complete coverage for regional products. E.g. a "good" set of repertoires for mainland China. Whether this would be packaged as a spec or as Best Practices is TBD. So, in summary: Items prior to NYC: -formal vote on charter -revisions, discussion, and maybe last call on two existing documents Agenda for NYC: -items from last call of two existing documents (this should be the last f2f for these) -overview and brainstorming about "good" regional coverage -------------------------------------------------------------------------------- Elliott Bradshaw Director, Software Engineering Zoran Imaging Group (formerly Oak Technology Imaging Group) 781 638-7534 From elliott.bradshaw at zoran.com Fri Aug 15 15:00:48 2003 From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com) Date: Wed May 6 13:53:40 2009 Subject: CR> CR Charter Message-ID: In June we completed Last Call for the CR working group charter. During the Last Call period, one issue was raised: 1. The term "character set" is too vague. We should use the more technically precise term "charset". I have posted a revised charter that addresses this issue. I adapted the section in the main Repertoire Supported document which defines charset and character repertoire, and used it here. This terminology was reviewed without comment at the Portland meeting. The result is posted at: ftp://ftp.pwg.org/pub/pwg/cr/charter/ch-cr10-20030813.html I therefore believe this charter is ready for a formal vote. I will wait a few days for anyone to voice an objection, then start a voting period next week. E. P.S. 
Minutes from Portland are posted at http://www.pwg.org/cr/CRMinutes-June-2003.html -------------------------------------------------------------------------------- Elliott Bradshaw Director, Software Engineering Zoran Imaging Group (formerly Oak Technology Imaging Group) 781 638-7534 From imcdonald at sharplabs.com Sun Aug 24 16:51:47 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> FW: News: CharMod interim publication; Unicode Tech Note #10, Indic Scripts Message-ID: <116DB56CD7DED511BC7800508B2CA537B00179@mailsrvnt02.enet.sharplabs.com> Hi folks, Please note the new working draft of W3C "Character Model" below. And (how timely, Elliot) the Unicode Technical Note (TN10) "Introduction to Indic Scripts" by the W3C's Richard Ishida. Cheers, - Ira McDonald High North Inc -----Original Message----- From: Richard Ishida [mailto:ishida@w3.org] Sent: Sunday, August 24, 2003 10:18 AM To: www-international@w3.org Subject: News: CharMod interim publication; Unicode Tech Note #10, Indic Scripts FYI # 24 Aug 2003 "Character Model for the World Wide Web 1.0" Interim Working Draft Published The Internationalization Working Group has released an interim Working Draft of the Character Model for the World Wide Web 1.0. The document addresses character encoding identification, early uniform normalization, string identity matching, string indexing, and URI conventions, building on the Universal Character Set defined by Unicode and ISO/IEC 10646. Read about the W3C Internationalization Activity. http://www.w3.org/TR/2003/WD-charmod-20030822/ # 15 Aug 2003 Unicode Technical Note #10, "An Introduction to Indic Scripts" Published A paper by Richard Ishida called "An Introduction to Indic Scripts" has been published as Unicode Technical Note #10. This paper provides an introduction to the major Indic scripts used on the Indian mainland.
Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. http://www.unicode.org/notes/tn10/ ============ Richard Ishida W3C contact info: http://www.w3.org/People/Ishida/ http://www.w3.org/International/ http://www.w3.org/International/geo/ See the W3C Internationalization FAQ page http://www.w3.org/International/questions.html From imcdonald at sharplabs.com Fri Sep 12 15:21:45 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> FW: New FAQ: Script direction and languages Message-ID: <116DB56CD7DED511BC7800508B2CA537B001AA@mailsrvnt02.enet.sharplabs.com> -----Original Message----- From: Richard Ishida [mailto:ishida@w3.org] Sent: Friday, September 12, 2003 3:08 PM To: www-international@w3.org Subject: New FAQ: Script direction and languages The latest FAQ published by the GEO task force is: What directions are commonly localized languages written in? Find it at: http://www.w3.org/International/questions/qa-scripts.html You can find all the questions and answers, plus information about how to contribute, at http://www.w3.org/International/questions.html We hope you find this a useful resource. ============ Richard Ishida W3C contact info: http://www.w3.org/People/Ishida/ http://www.w3.org/International/ http://www.w3.org/International/geo/ See the W3C Internationalization FAQ page http://www.w3.org/International/questions.html From jim.bigelow at hp.com Thu Sep 18 20:01:21 2003 From: jim.bigelow at hp.com (BIGELOW,JIM (HP-Boise,ex1)) Date: Wed May 6 13:53:40 2009 Subject: CR> W3C Character Model and Early Uniform Normalization Message-ID: <020A3CF87FB5AC47AA67966B33845755057FB68B@xboi22.boise.itc.hp.com> Hello, I've been reading the W3C Working Draft, Character Model for the World Wide Web [1], which deals with the requirements on internet applications such as producers and consumers of XHTML-Print.
This report [1] indicates that XHTML-Print, as a derivative of XHTML, is bound by it. Therefore, by extension, all XHTML-Print producing and consuming applications are bound by this report, although this is never explicitly stated in any version of the XHTML-Print specification [2,3]. One of the interesting parts of [1] is the requirement that applications that produce XHTML-Print should produce fully-normalized text [4], meaning, among other things, that it is in Unicode Normalization Form C [5], which favors the canonical composite forms of Unicode characters. From the printer's perspective, as a receiver of XHTML-Print documents, this makes its job easier since it can always assume that text is fully-normalized and it doesn't have to do so itself. My question to you is, do you think that the XHTML-Print specification should be amended to cite the requirement that a conforming XHTML-Print document be fully-normalized? Furthermore, should a printer be required to check an XHTML-Print document to see that it is fully-normalized or should it assume so? Lastly, should a printer normalize text that is not fully-normalized or discard it? Jim -- Jim Bigelow, Editor: XHTML-Print & CSS Print Profile Member: W3C HTML and CSS Working Groups Hewlett-Packard 208-396-2068 jim.bigelow@hp.com [1] http://www.w3.org/TR/charmod/ [2] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html [3] http://www.w3.org/TR/xhtml-print/ [4] http://www.w3.org/TR/2003/WD-charmod-20030822/#sec-FullyNormalized [5] http://www.unicode.org/unicode/reports/tr15/#Specification From elliott.bradshaw at zoran.com Fri Sep 19 10:23:13 2003 From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com) Date: Wed May 6 13:53:40 2009 Subject: CR> W3C Character Model and Early Uniform Normalization In-Reply-To: <020A3CF87FB5AC47AA67966B33845755057FB68B@xboi22.boise.itc.hp.com> Message-ID: What are the XHTML-Print operations that are affected by normalization?
This discussion is useful for string processing (match, substring, sort) but I don't see how that affects printing. One possible area is CSS class names; are they restricted to ASCII? Also, I don't see how a new report can change the definition of an existing spec (XHTML). Isn't this a separate set of rules that might be folded into future revisions? I would rather see a use-case that makes sense for XHTML-Print before adding this in. E. P.S. Does it have any effect on current CR documents? I don't think so. There is no discussion of combining in there at all. -------------------------------------------------------------------------------- Elliott Bradshaw Director, Software Engineering Zoran Imaging Group (formerly Oak Technology Imaging Group) 781 638-7534 From imcdonald at sharplabs.com Fri Sep 19 13:45:25 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> W3C Character Model and Early Uniform Normalization Message-ID: <116DB56CD7DED511BC7800508B2CA537B001B8@mailsrvnt02.enet.sharplabs.com> Hi, My two cents: (1) [answering Elliot] Unicode normalization has no impact at all on the CR specs - - they merely refer to character repertoires (often including both composed and uncomposed characters) which are defined (in _all_ cases) by some other standards body (Unicode, ISO, IANA, etc.). (2) [answering Jim] No - a printer should _never_ throw away any document data that happens not to be normalized (it is actually very difficult to determine if that data is already in Unicode NFC or NFKC, except by doing the whole normalization and then doing binary compare of the results with original). (3) [answering Jim] No - a printer should _never_ trust the sender/generator to have properly normalized Unicode data.
(4) [my own comment] Early Uniform Normalization is important and useful for _very_ small pieces of data and _narrow_ fields of application (such as IETF's I18N Domain Names standards). The day will never come that receivers need not check for (or simply perform) normalization, if needed. Some rendering algorithms happen to require that Unicode data be pre-normalized, but that's an implementation nit. Cheers, - Ira McDonald High North Inc From jim.bigelow at hp.com Mon Sep 22 18:37:33 2003 From: jim.bigelow at hp.com (BIGELOW,JIM (HP-Boise,ex1)) Date: Wed May 6 13:53:40 2009 Subject: CR> W3C Character Model and Early Uniform Normalization Message-ID: <020A3CF87FB5AC47AA67966B3384575505A0A44A@xboi22.boise.itc.hp.com> Elliott wrote: > What are the XHTML-Print operations that are affected by > normalization? This discussion is useful for string > processing (match, substring, sort) but I don't see how that > affects printing.
One possible area is CSS class names; are > they restricted to ASCII? The CSS 2 specification for identifiers is in Section 4.1.3 [1] and states that CSS class names are not restricted to ASCII. So, if a class name is written with precomposed characters in one place and any one of the other equivalent sequences in another place, then the two instances would only match if they were normalized, preferably to Normalization Form C. The same holds true for id attribute values. > > Also, I don't see how a new report can change the definition > of an existing spec (XHTML). Isn't this a separate set of > rules that might be folded into future revisions? > I think that the report [2] has been around for a while and I've just now become aware of it. I think it's an omission that the XHTML-Print spec doesn't reference [2] as a normative reference. This allows naive implementations to fail in the situation noted above. This could be addressed by adding a normative reference. Jim [1] http://www.w3.org/TR/REC-CSS2/syndata.html#q4 [2] http://www.w3.org/TR/charmod/ From jim.bigelow at hp.com Mon Sep 22 18:51:44 2003 From: jim.bigelow at hp.com (BIGELOW,JIM (HP-Boise,ex1)) Date: Wed May 6 13:53:40 2009 Subject: CR> W3C Character Model and Early Uniform Normalization Message-ID: <020A3CF87FB5AC47AA67966B3384575505A0A46C@xboi22.boise.itc.hp.com> Ira wrote: > > (2) [answering Jim] > No - a printer should _never_ throw away any document data > that happens not to be normalized ... I agree. However, the XHTML-Print specs [1, 2, 3] state in their Printer Conformance sections that a printer may "flush or otherwise reject a non-conforming XHTML-Print document." This is the source of my worry that a printer could reject a document that is not normalized. > > (3) [answering Jim] > No - a printer should _never_ trust the sender/generator > to have properly normalized Unicode data.
If a very low cost printer assumed that an XHTML-Print document's content is normalized and it is not, the very worst that could happen is that word breaks occur in the wrong place, e.g., between a letter and its non-spacing mark, or class/id selectors don't match the value of the class/id attribute -- causing the misapplication of style sheet rules. I think a printer should normalize and therefore correctly handle combining characters. I'm just wondering if other printer people think such a normalization should be mandated for all printers. Jim [1] ftp://ftp.pwg.org/pub/pwg/xhtml-print/drafts/xhtml-print-draft-095.pdf [2] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html [3] From elliott.bradshaw at zoran.com Tue Sep 23 11:16:52 2003 From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com) Date: Wed May 6 13:53:40 2009 Subject: CR> W3C Character Model and Early Uniform Normalization In-Reply-To: <020A3CF87FB5AC47AA67966B3384575505A0A46C@xboi22.boise.itc.hp.com> Message-ID: I don't mind if we require that an XHTML-Print printer normalizes its input. We probably have to stipulate what this means for CR. If a printer advertises a repertoire that supports two combinable characters, is it implicitly saying that it also supports the combination of the two? I am inclined to put this point in Best Practices and leave it out of the normative spec. (I'll track this as a Last Call issue for CR.) -------------------------------------------------------------------------------- Elliott Bradshaw Director, Software Engineering Zoran Imaging Group (formerly Oak Technology Imaging Group) 781 638-7534 From imcdonald at sharplabs.com Wed Sep 24 10:21:49 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> W3C Character Model and Early Uniform Normalization Message-ID: <116DB56CD7DED511BC7800508B2CA537B001C3@mailsrvnt02.enet.sharplabs.com> Hi Jim, To reduce the implementation burden, I suggest that XHTML-Print state that a conforming Printer SHOULD normalize the document data to NFC (citing UAX-15 as the authoritative source). Since W3C Charmod is still a working draft, XHTML-Print should NOT have a Normative reference to W3C Charmod (which would prevent publication of XHTML-Print as PWG Candidate Standard). Because normalization is a fairly costly activity on large volumes of data (I wrote the normalization library for the forthcoming CUPS 1.2 release), I suggest that the XHTML-Print conformance be SHOULD rather than MUST.
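As a rough illustration of the normalization discussed in this thread, here is a minimal Python sketch using only the standard-library `unicodedata` module. It shows two points raised above: checking for NFC by normalizing and comparing with the original, and class/id names that match only after both sides are normalized. The helper names (`to_nfc`, `is_nfc`, `same_identifier`) and the sample strings are hypothetical and are not taken from XHTML-Print, CSS, or CUPS.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """Check for NFC by normalizing and comparing with the original."""
    return to_nfc(text) == text


def same_identifier(a: str, b: str) -> bool:
    """Compare two class/id names after normalizing both to NFC."""
    return to_nfc(a) == to_nfc(b)


# Hypothetical class name written two canonically equivalent ways.
precomposed = "caf\u00e9"   # 'e with acute' as one code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

print(precomposed == decomposed)                  # raw code point comparison
print(same_identifier(precomposed, decomposed))   # comparison after NFC
print(is_nfc(precomposed), is_nfc(decomposed))
```

A receiver that compares the raw strings treats the two spellings as different names, while one that normalizes first treats them as the same name. The cost is one normalization pass over each string compared, which is consistent with the SHOULD rather than MUST suggestion for large volumes of data.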
Cheers, - Ira McDonald High North Inc From imcdonald at sharplabs.com Thu Sep 25 14:47:30 2003 From: imcdonald at sharplabs.com (McDonald, Ira) Date: Wed May 6 13:53:40 2009 Subject: CR> FW: New W3C FAQ: CSS character encoding declarations Message-ID: <116DB56CD7DED511BC7800508B2CA537B001CC@mailsrvnt02.enet.sharplabs.com> -----Original Message----- From: Richard Ishida [mailto:ishida@w3.org] Sent: Thursday, September 25, 2003 4:59 AM To: www-international@w3.org Subject: New FAQ: CSS character encoding declarations The latest FAQ addressed by the GEO task force is: How do I declare the character encoding inside a CSS (Cascading Style Sheets) style sheet? Find our answer at: http://www.w3.org/International/questions/qa-css-charset.html You can find all the FAQs, plus information about how to contribute, at http://www.w3.org/International/questions.html ============ Richard Ishida W3C contact info: http://www.w3.org/People/Ishida/ http://www.w3.org/International/ http://www.w3.org/International/geo/ See the W3C Internationalization FAQ page http://www.w3.org/International/questions.html From elliott.bradshaw at zoran.com Thu Oct 2 11:33:09 2003 From: elliott.bradshaw at zoran.com (elliott.bradshaw@zoran.com) Date: Wed May 6 13:53:40 2009 Subject: CR> CR face-to-face next Wed. Message-ID: Does anyone desire to join this meeting by phone? At this point I haven't done anything about it. E. -------------------------------------------------------------------------------- Elliott Bradshaw Director, Software Engineering Zoran Imaging Division (formerly Oak Technology Imaging Group) 781 638-7534