UPD Mail Archive: UPD> character sets in device fonts

UPD> character sets in device fonts

From: Norbert Schade (nschade@xionics.com)
Date: Tue Mar 28 2000 - 16:55:19 EST


    You may want to find out about the reasons why I designed the character set
    handling the way I introduced it almost three weeks ago. See some
    explanations in the following section.

    Character selection in device fonts

    The target format of this spec concerning device fonts (in case they
    survive in the long run) is based on Unicode handling. That means a
    character would be identified by its Unicode code point. Character sets
    would become redundant.

    As long as this is not reality, we need character sets. I am moving away
    more and more from thinking in terms of 1-byte and 2-byte character sets.
    I’ve learnt that a set is a more or less arbitrary collection of
    characters. But the fact is that character sets are needed to identify
    characters in a font.

    DeviceCharacterSets
    I want to follow some rules when defining ways to identify characters in
    the device:
    1. I do not see any reason to specify all predefined character sets of a PDL
    with all characters.
    The ideal solution would be to specify each character exactly once,
    whichever character set is used for that character.
    2. As the ideal solution could have a serious impact on the print file size
    and especially on performance, and as PDLs nowadays have a number of
    character sets that are at least close to operating system character sets,
    I consider it useful to specify some character sets used in the PDL with all
    characters. Those character sets will be the ones used as primary and
    secondary sets in the CharacterSetIdentification.
    This avoids unnecessary switching, and the next character will most likely
    be found more quickly.
    This could result in a device font with several Windows character sets plus
    some sets with only a few characters, namely those that are really new
    compared to the primary/secondary sets.
    This would allow all characters to be listed without an overwhelming amount
    of data. The aim of this priority is to list all characters while allowing
    some redundancy for performance advantages.
    3. If operating system character sets are not available in the numbers
    needed and if user-defined character sets are supported, those could be the
    alternative. We can discuss whether this should be the top priority.
    This approach offers the easiest switch from character sets to real Unicode
    handling in the device. Then there would not be any DeviceCharacterSets
    section.
    It remains to be investigated whether these DeviceCharacterSets should be
    specified outside the individual font specification, as many if not most of
    them will appear in many fonts. The font-specific section would then only
    list which character sets are available, not the ranges. It should be a
    target to avoid redundancy where possible. But in this case we would lose
    the independence of each font specification.
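
    As a rough illustration of the trade-off just described, the sketch below
    keeps the character set definitions in one shared table and lets each font
    list only the names of the sets it supports. All set names, the second font
    name, and the range values are hypothetical and not taken from the spec.

```python
from typing import Dict, List, Tuple

# One range: (first_unicode, last_unicode, first_device_code).
Range = Tuple[int, int, int]

# Hypothetical shared table: each device character set is defined once,
# outside any individual font specification.
SHARED_CHARACTER_SETS: Dict[str, List[Range]] = {
    "FullLatin": [(0x0020, 0x007E, 0x20)],
    "EuroOnly": [(0x20AC, 0x20AC, 0x80)],
}

# Hypothetical font sections: they list set names only, not the ranges.
FONTS: Dict[str, List[str]] = {
    "Arial": ["FullLatin", "EuroOnly"],
    "ExampleMono": ["FullLatin"],
}


def ranges_for_font(font_name: str) -> Dict[str, List[Range]]:
    """Resolve a font's set names against the shared table."""
    return {name: SHARED_CHARACTER_SETS[name] for name in FONTS[font_name]}


arial_sets = ranges_for_font("Arial")
mono_sets = ranges_for_font("ExampleMono")
```

    The cost is visible here as well: a font section can no longer be read on
    its own, because its ranges live in the shared table.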

    Ranges
    I think ranges are at least an alternative to matrix definitions. The
    problem with matrix definitions is always how to handle both 1-byte (16x16
    matrix) and multibyte character sets.
    Listing each character with its Unicode value and the corresponding binary
    output would create long lists in many cases, while not every pair really
    provides new information. A range from U+0020 to U+007E, for example,
    could just show the binary output for the first Unicode value. The binary
    values for all following characters in that range can easily be calculated.
    This is therefore a kind of compression.
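
    The calculation for a range can be sketched as follows. This is only a
    minimal illustration, assuming the device codes inside a range increase by
    one for each consecutive Unicode value. The class name, field names, and
    the device codes used are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CharRange:
    first_unicode: int  # first code point covered by the range
    last_unicode: int   # last code point covered (inclusive)
    first_output: int   # device code listed for first_unicode

    def lookup(self, code_point: int) -> Optional[int]:
        """Return the device code for code_point, or None if out of range."""
        if self.first_unicode <= code_point <= self.last_unicode:
            # Only the first device code is stored; the rest is an offset.
            return self.first_output + (code_point - self.first_unicode)
        return None


# The range from the text: U+0020 to U+007E, starting at device code 0x20.
ascii_range = CharRange(0x0020, 0x007E, 0x20)
# A hypothetical range whose device codes differ from the Unicode values.
accent_range = CharRange(0x00C0, 0x00C5, 0x80)

code_for_a = ascii_range.lookup(0x0041)
code_for_a_circumflex = accent_range.lookup(0x00C2)
```

    One stored triple replaces 95 explicit pairs for the first range, which is
    where the compression comes from.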

    CharacterSetIdentification
    After having defined all necessary DeviceCharacterSets, we can think about
    how to tell which operating system character sets the font can fulfill.
    For any OS char set there might be a device char set that is a perfect
    match. This one would be listed as primary and that’s it. In case it’s just
    the best one available and very good, but not perfect, it would still be
    listed as primary, but one or more others could be added as secondary sets.
    I assume that the search for a character would always start in the character
    set where the driver found the last character, then continue in the primary
    and then in the secondary sets. This should provide good performance with a
    maximum of flexibility.
    CharacterSubstitutionTables can provide further help in the search process.
    It remains to be investigated whether characters that are not listed in
    either the primary or the secondary sets can be used for that OS char set
    at all.
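
    The assumed search order (last hit first, then primary, then secondary)
    could look roughly like the sketch below. It is not a definition of driver
    behaviour: the class, the set names, and the sample mappings are
    hypothetical, and CharacterSubstitutionTables are left out.

```python
from typing import Dict, List, Optional, Tuple

# A device character set, simplified to: Unicode code point -> device code.
CharSet = Dict[int, int]


class CharSetSearcher:
    def __init__(self, device_sets: Dict[str, CharSet],
                 primary: str, secondary: List[str]) -> None:
        self.device_sets = device_sets
        self.order = [primary] + list(secondary)
        self.last_hit: Optional[str] = None  # set where the last char was found

    def find(self, code_point: int) -> Optional[Tuple[str, int]]:
        """Return (set name, device code), or None if no listed set has it."""
        candidates: List[str] = []
        if self.last_hit is not None:
            candidates.append(self.last_hit)
        # Primary first, then secondary sets, skipping the one already tried.
        candidates += [name for name in self.order if name != self.last_hit]
        for name in candidates:
            code = self.device_sets[name].get(code_point)
            if code is not None:
                self.last_hit = name
                return name, code
        return None


# Hypothetical data: one full primary set and one small secondary set.
device_sets = {
    "PrimarySet": {0x0041: 0x41, 0x00E9: 0xE9},
    "ExtraSet": {0x20AC: 0x80},
}
searcher = CharSetSearcher(device_sets, "PrimarySet", ["ExtraSet"])
euro = searcher.find(0x20AC)      # found in the secondary set
letter_a = searcher.find(0x0041)  # ExtraSet is tried first, then PrimarySet
missing = searcher.find(0x1234)   # not listed in any set
```

    Starting with the last hit means a run of characters from the same set
    needs no switching, at the price of one extra miss when the run ends.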

    I hope I made it clear enough that this structure would support one Arial
    font for all OS char sets, not one per OS char set as is done in most of
    today's drivers.

    Norbert


