IFX Mail Archive: RE: IFX> PDF/is Issue.

RE: IFX> PDF/is Issue.

From: Rick Seeler (rseeler@adobe.com)
Date: Thu Mar 06 2003 - 14:37:15 EST

    Yes, the stream length should precede the stream if possible (this is allowed).
    But when the stream may be long, this may not be possible for the
    Producer. In that case, the length should be an indirect object reference to
    a length object that comes immediately after the stream.
    As for your idea of scanning for an "endstream" that is followed by the size
    object: this still has the same problem as scanning for "endstream" alone; it
    just matches on more data and so has a smaller likelihood of a false match.
    Given that, and what I discussed in my previous e-mail on this subject (to Rob
    Buckley), I think the best approach might be:
    1) The Producer MUST always write the stream length of all 'Content Streams' and
    'ICC Profile' streams directly in the stream dictionary (before the stream).
    2) When writing image streams, the Producer MAY write the stream length either
    before or after the stream, as it prefers.
    3) When an image stream's length follows the stream (as an indirect object), the
    Consumer SHOULD decode the image stream to determine its length, when possible.
    But the Consumer MAY (at its peril) scan for the 'endstream' marker.
    How does this sound as a solution?
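
A minimal Python sketch of the two Producer options in this proposal: length written directly in the stream dictionary, or length written as an indirect object right after the stream. The function names, object numbers, and exact whitespace are assumptions for illustration, not taken from the PDF/is draft.

```python
import io

def write_stream_direct(out, obj_num, data):
    """Length known up front: write it directly in the stream dictionary."""
    out.write(b"%d 0 obj\n<< /Length %d >>\nstream\n" % (obj_num, len(data)))
    out.write(data)
    out.write(b"\nendstream\nendobj\n")

def write_stream_deferred(out, obj_num, len_obj_num, chunks):
    """Length unknown up front: the dictionary holds an indirect reference,
    and the length object is written immediately after the stream object."""
    out.write(b"%d 0 obj\n<< /Length %d 0 R >>\nstream\n" % (obj_num, len_obj_num))
    total = 0
    for chunk in chunks:          # data is written as it is produced
        out.write(chunk)
        total += len(chunk)
    out.write(b"\nendstream\nendobj\n")
    out.write(b"%d 0 obj\n%d\nendobj\n" % (len_obj_num, total))
    return total
```

The deferred variant never buffers the whole stream, which is the point for a low-resource Producer; the cost is that a streaming Consumer does not see the length until the stream has already gone by.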


    -----Original Message-----
    From: owner-ifx@pwg.org [mailto:owner-ifx@pwg.org] On Behalf Of Poysa, Kari
    Sent: Thursday, March 06, 2003 7:15 AM
    To: 'Carl Kugler'
    Cc: ifx@pwg.org
    Subject: RE: IFX> PDF/is Issue.

    In my opinion the goal should be to write the stream length immediately to the
    stream dictionary.
    Also, the likelihood of "endstream" appearing in the data is small. We could
    also require that if a low resource streaming writer is not able to add the
    length directly into the stream dictionary, then the PDF object for the length
    MUST immediately follow the stream object. This way, the reader can scan for
    "endstream" (but of course only if the length was not in the stream
    dictionary) and make sure that it is the correct "endstream" by verifying that
    it is immediately followed by something that looks like a length object. Could
    reader implementers comment on this?
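
The check described here could be sketched roughly as follows in Python. The byte layout after the stream ("endstream", "endobj", then an "N 0 obj" holding a bare integer) and the name find_stream_end are assumptions for illustration. The sketch also goes one step beyond "looks like a length object" by requiring the declared length to equal the number of bytes scanned so far, which is an added assumption, not part of the suggestion above.

```python
import re

# Candidate terminator: endstream/endobj followed at once by an integer object.
LENGTH_AFTER = re.compile(
    rb"\r?\nendstream\r?\nendobj\r?\n\d+ 0 obj\r?\n(\d+)\r?\nendobj"
)

def find_stream_end(buf, data_start):
    """Return the stream length, or None if no consistent terminator is found."""
    pos = data_start
    while True:
        m = LENGTH_AFTER.search(buf, pos)
        if m is None:
            return None
        candidate_len = m.start() - data_start
        if int(m.group(1)) == candidate_len:   # declared length must agree
            return candidate_len
        pos = m.start() + 1                    # false hit inside the data; keep scanning
```

This lowers the chance of a false match but cannot remove it: binary data could still contain a terminator whose embedded number happens to agree with its own offset.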
    I think introducing an additional filter like ASCII85 just for spotting the end
    of stream adds unnecessary complexity to both writer and reader, increases file
    sizes and also requires more memory and processing as the stream cannot be
    passed directly to a decompressor.
        --- Kari ---

    -----Original Message-----
    From: Carl Kugler [mailto:kugler@us.ibm.com]
    Sent: Wednesday, March 05, 2003 10:50 AM
    Cc: ifx@pwg.org
    Subject: RE: IFX> PDF/is Issue.

    I like the chunking approach. It is efficient, reliable, and has low overhead
    for reasonably sized chunks. Also fits well in a typical implementation that
    writes a chunk of data at a time.


            "Zehler, Peter" <PZehler@crt.xerox.com>
    Sent by: owner-ifx@pwg.org

    03/05/2003 05:00 AM

            To: "'Rick Seeler'" <rseeler@adobe.com>, ifx@pwg.org
            Subject: RE: IFX> PDF/is Issue.

    Why not just increase the size of the length field signature? Could this be
    done by adding data or comments to the length object, or by adding
    another object? I don't know PDF very well. I don't think we need a 0%
    probability of confusion, just a statistically insignificant chance.

    Peter Zehler
    Xerox Architecture Center
    Email: PZehler@crt.xerox.com
    Voice: (585) 265-8755
    FAX: (585) 265-8871
    US Mail: Peter Zehler

            Xerox Corp.
           800 Phillips Rd.
           M/S 128-30E
           Webster NY, 14580-9701

    -----Original Message-----
    From: Rick Seeler [mailto:rseeler@adobe.com]
    Sent: Tuesday, March 04, 2003 1:29 PM
    To: ifx@pwg.org
    Subject: IFX> PDF/is Issue.

    During prototyping of PDF/is the following problem arose:
    How does the Consumer know when the end of a data stream (See section 3.2.7 of
    [pdf]) is reached? Normally, in a PDF, the Consumer would consult the stream
    length field. The problem here is where to put the length field. If the length
    were placed before the stream, the Consumer would know how long the stream is.
    This requires the Producer to know the stream's length before writing it to the
    Consumer. If, instead, the length were written at the end of the stream, this
    would solve the Producer's problem but the Consumer would not know how to find
    the length since they can't identify, 100% of the time, where the stream ends
    and where the length object is.
    An example will illustrate:
    First, the normal case...
    sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....
    12 0 obj
    3456 <- the length of the previous stream.
    But, what if the data looked like this...
    sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....
    endstream <- the binary data could have a string of bytes that looked
    like this.
    12 0 obj
    4567 <- the length of the previous stream.
    Of course, you could look at the bytes after the word 'endstream' to see
    whether this is really the end of the stream; but one can always construct a
    stream that matches your parsing algorithm's expectations (although with a
    decreasing likelihood of occurrence).
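
The failure case can be reproduced in a few lines of Python. This is only an illustration under assumed framing (CRLF line ends, a made-up payload, a hypothetical naive_stream_end helper); it shows a scan for the marker stopping inside the binary data instead of at the real end.

```python
def naive_stream_end(buf, start=0):
    """Return the offset of the first '\\r\\nendstream' at or after start."""
    return buf.find(b"\r\nendstream", start)

# Binary payload that happens to contain the terminator bytes.
payload = b"\x00\x01binary\r\nendstream\x02more binary"

# Producer output: payload, real terminator, then the trailing length object.
framed = payload + b"\r\nendstream\r\n12 0 obj\r\n%d\r\nendobj\r\n" % len(payload)

guess = naive_stream_end(framed)   # stops at the marker embedded in the payload
true_end = len(payload)            # where the stream actually ends
```

Here guess is smaller than true_end, so a Consumer trusting the naive scan would truncate the stream and then misparse the remaining bytes as PDF objects.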
    Possible solutions:
    1) Write all data using ASCII85 encoding (See Section 3.3.2 of [pdf]). This
    will increase stream lengths by 25%. ASCII85 has a stream delimiter which would
    solve this problem -- the end of the stream can be known for certain and the
    length field can be placed after the stream.
    2) Require the Producer to write the stream length before any stream (the
    streams would stay binary). The Producer can use banding to break up large
    images into small enough chunks so the Producer can cache the stream before
    writing it.
    3) Offer a combination of 1 & 2. The Producer would cache streams if possible,
    but may use ASCII85, if necessary.
    4) The Producer must make certain that no stream contains the byte sequence
    "\0D\0Aendstream" in its data. This is how the spec is currently defined
    -- but this may be too onerous for the Producer.
    Any other ideas? I'm personally leaning toward solution #3.
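
For option 1, the 25% figure and the unambiguous delimiter can be checked with Python's standard base64 module. This is a sketch, not a PDF writer: ascii85_pdf is a hypothetical helper, and the sample data is chosen to avoid all-zero 4-byte groups, which ASCII85 abbreviates to a single 'z'.

```python
import base64

def ascii85_pdf(data):
    """Encode data as ASCII85 and append the '~>' end-of-data marker."""
    return base64.a85encode(data) + b"~>"

raw = bytes(range(1, 201))          # 200 bytes, no all-zero 4-byte group
encoded = ascii85_pdf(raw)
body = encoded[:-2]                 # encoded data without the '~>' marker

# Every 4 input bytes become 5 output characters.
overhead = len(body) / len(raw)

# '~' is outside the ASCII85 alphabet, so the marker cannot occur in the body.
marker_is_unambiguous = b"~" not in body
```

Because the end-of-data marker cannot appear inside the encoded body, a Consumer can find the end of the stream by scanning, and the length may safely follow the stream. The price is the larger output and the extra decode step noted elsewhere in this thread.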


    This archive was generated by hypermail 2b29 : Thu Mar 06 2003 - 14:37:39 EST