UPD Mail Archive: UPD> character sets in device fonts

UPD> character sets in device fonts

From: Norbert Schade (nschade@xionics.com)
Date: Tue Mar 28 2000 - 16:55:19 EST

Next message: don@lexmark.com: "UPD> Tokyo meeting"

Previous message: Norbert Schade: "UPD> True Type fonts"

You may want to find out about the reasons why I designed the character set
handling the way I introduced it almost three weeks ago. See some
explanations in the following section.

Character selection in device fonts

The target format of this spec concerning device fonts (in case they
survive in the long run) is based on Unicode handling. That means a
character would be identified by its Unicode value. Character sets would
become redundant.

As long as this is not reality, we need character sets. I am moving further
and further away from thinking in terms of 1-byte and 2-byte character sets.
I've learnt that a set is a more or less arbitrary collection of characters.
But the fact is that character sets are needed to identify characters in a
font.

DeviceCharacterSets
I want to follow some rules when defining ways to identify characters in
the device:
1. I do not see any reason to specify all predefined character sets of a PDL
with all characters.
The ideal solution would be to specify each character exactly once,
whichever character set is used for that character.
2. The ideal solution could have a serious impact on the print file size and
especially on performance, and PDLs nowadays have a number of character
sets that are at least close to operating system character sets. I therefore
consider it useful to specify some character sets used in the PDL with all
characters. Those character sets will be the ones used as primary and
secondary sets in the CharacterSetIdentification.
This avoids unnecessary switching and the next character will most likely be
found more quickly.
This could result in a device font with several Windows character sets plus
some sets with only a few characters, namely those that are really new
compared to the primary/secondary sets.
This would make it possible to list all characters without an overwhelming
amount of data. The target of this priority is to list all characters while
allowing some redundancy for performance advantages.
3. If operating system character sets are not available in the numbers
needed, and if user-defined character sets are supported, those could be
the alternative. We can discuss whether this should be the top priority.
This approach offers the easiest switch from character sets to real Unicode
handling in the device. Then there would not be any DeviceCharacterSets
section.
It remains to be investigated whether these DeviceCharacterSets should be
specified outside the individual font specification, as many if not most of
them will appear in many fonts. The font-specific section would then only
list which character sets are available, not the ranges. It should be a
target to avoid redundancy where possible. But in this case we would lose
the independence of each font specification.
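
A minimal sketch of that trade-off, in Python. The set names, font names and
range values below are hypothetical and are not part of the spec; the point
is only that ranges are stored once and each font refers to them by name.

```python
# Hypothetical shared table, defined once outside any font:
# set name -> list of (first Unicode, last Unicode, first output value).
SHARED_SETS = {
    "set-a": [(0x0020, 0x007E, 0x20), (0x00A0, 0x00FF, 0xA0)],
    "set-b": [(0x20AC, 0x20AC, 0x80)],
}

# Each font lists only the names of the sets it offers, not the ranges.
FONTS = {
    "Arial": ["set-a", "set-b"],
    "Courier": ["set-a"],
}


def ranges_for_font(font_name):
    """Resolve a font's set names to the shared range definitions."""
    return {name: SHARED_SETS[name] for name in FONTS[font_name]}


print(ranges_for_font("Courier"))
```

The cost mentioned above is visible here: a font entry cannot be read on its
own without the shared table.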

Ranges
I think ranges are at least an alternative to matrix definitions. The
problem with matrix definitions is always how to handle both 1-byte (16x16
matrix) and multibyte character sets.
Listing each character with its Unicode value and the corresponding binary
output would create long lists in many cases, while not every pair of data
really provides new information. A range from U+0020 to U+007E, for example,
could just show the binary output for the first Unicode value. The binary
values for all following characters in that range can easily be calculated.
This is therefore a kind of compression.
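
The calculation could look like the following sketch in Python. The class
and function names are made up for illustration, and the example range
assumes that printable ASCII is output unchanged; neither is taken from the
spec.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CharRange:
    first_unicode: int  # first code point in the range
    last_unicode: int   # last code point in the range (inclusive)
    first_output: int   # binary output value for first_unicode


def output_for(ranges: List[CharRange], code_point: int) -> Optional[int]:
    """Return the device output value for code_point, or None if uncovered."""
    for r in ranges:
        if r.first_unicode <= code_point <= r.last_unicode:
            # Only the first output value is stored; the others follow by offset.
            return r.first_output + (code_point - r.first_unicode)
    return None


# Hypothetical set: U+0020..U+007E, with U+0020 output as 0x20.
ascii_ranges = [CharRange(0x0020, 0x007E, 0x20)]
print(hex(output_for(ascii_ranges, ord("A"))))  # prints 0x41
```

One range entry replaces 95 individual Unicode/output pairs in this case.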

CharacterSetIdentification
After having defined all necessary DeviceCharacterSets we can think about
how to state which operating system character sets the font can cover.
For any OS char set there might be a device char set that is a perfect
match. This one would be listed as primary and that's it. In case it's just
the best one available and very good, but not perfect, it would still be
listed as primary, but one or more others could be added as secondary sets.
I assume that the search for a character would always start in the character
set where the driver found the last character, then continue in the primary
and then in the secondary sets. This should provide good performance with a
maximum of flexibility.
CharacterSubstitutionTables can provide further help in the search process.
It remains to be investigated whether characters that are not listed in
either the primary or the secondary sets can be used for that OS char set at
all.
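
The assumed search order can be sketched in Python as follows. The set
names, the table contents and the function name are hypothetical, and the
sketch leaves out CharacterSubstitutionTables entirely.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical device character sets:
# set name -> {Unicode code point: binary output value}.
DEVICE_SETS: Dict[str, Dict[int, int]] = {
    "latin": {0x0041: 0x41, 0x00E9: 0xE9},
    "symbols": {0x20AC: 0x80},
    "extra": {0x0152: 0x8C},
}


def find_character(
    code_point: int,
    last_set: Optional[str],
    primary: str,
    secondary: List[str],
) -> Optional[Tuple[str, int]]:
    """Search the last-used set, then the primary, then the secondary sets."""
    order: List[str] = []
    for name in [last_set, primary, *secondary]:
        if name is not None and name not in order:
            order.append(name)  # keep first occurrence, skip duplicates
    for name in order:
        table = DEVICE_SETS.get(name, {})
        if code_point in table:
            return name, table[code_point]
    return None  # not reachable through the sets listed for this OS char set


print(find_character(0x20AC, "latin", "latin", ["symbols"]))
```

In this sketch a character that sits only in a set outside the primary and
secondary lists is found solely when that set happens to be the last one
used, which is exactly the open question above.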

I hope I have made it clear enough that this structure would support one
Arial font for all OS char sets, and not one per OS char set as is done in
most of today's drivers.

Norbert


This archive was generated by hypermail 2b29 : Tue Mar 28 2000 - 17:02:01 EST