IFX Mail Archive: RE: IFX> PDF/is Issue.

From: Carl Kugler (kugler@us.ibm.com)
Date: Wed Mar 05 2003 - 10:50:07 EST

    I like the chunking approach. It is efficient, reliable, and has low
    overhead for reasonably sized chunks. It also fits well in a typical
    implementation, which writes one chunk of data at a time.
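
A minimal sketch of what a chunked framing could look like. The thread does not spell out a chunk format, so everything here is an assumption made for illustration: the 4-byte big-endian length prefix, the zero-length terminator chunk, and the hypothetical helper names `write_chunks` and `read_chunks`.

```python
import io
import struct

def write_chunks(chunks, out):
    """Write each chunk as <4-byte big-endian length><data>, then a zero-length terminator."""
    for chunk in chunks:
        if chunk:  # a zero length is reserved for the terminator, so skip empty chunks
            out.write(struct.pack(">I", len(chunk)))
            out.write(chunk)
    out.write(struct.pack(">I", 0))

def read_chunks(inp):
    """Read chunks until the zero-length terminator and return the reassembled data."""
    parts = []
    while True:
        header = inp.read(4)
        if len(header) != 4:
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack(">I", header)
        if length == 0:
            return b"".join(parts)
        body = inp.read(length)
        if len(body) != length:
            raise ValueError("truncated chunk body")
        parts.append(body)

buf = io.BytesIO()
write_chunks([b"binary\r\nendstream", b"more binary"], buf)
buf.seek(0)
data = read_chunks(buf)
```

With this kind of framing the Producer only needs to buffer one chunk at a time, the overhead is a fixed 4 bytes per chunk, and the data may contain any byte sequence, including "endstream".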

            -Carl

    "Zehler, Peter" <PZehler@crt.xerox.com>
    Sent by: owner-ifx@pwg.org
    03/05/2003 05:00 AM

     
            To: "'Rick Seeler'" <rseeler@adobe.com>, ifx@pwg.org
            cc:
            Subject: RE: IFX> PDF/is Issue.

    Rick,
    Why not just increase the size of the length field signature? Could this
    be done by adding data or comments to the length object, or by adding
    another object? I don't know PDF very well. I don't think we need a
    0% probability of confusion, just a statistically insignificant chance.
    Pete
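
A rough way to put numbers on "statistically insignificant", assuming (and this is a strong assumption) that stream bytes are independent and uniformly random. The function name `expected_false_matches` is hypothetical.

```python
def expected_false_matches(stream_len: int, marker_len: int) -> float:
    """Expected number of accidental marker occurrences in uniformly random data.

    Each of the (stream_len - marker_len + 1) start positions matches a fixed
    marker_len-byte marker with probability 256 ** -marker_len.
    """
    positions = max(stream_len - marker_len + 1, 0)
    return positions / 256.0 ** marker_len

# Compare an 11-byte marker (CR LF "endstream") with a hypothetical 24-byte signature
# for a 1 MiB stream.
short_marker = expected_false_matches(2**20, 11)
long_marker = expected_false_matches(2**20, 24)
```

Each extra signature byte cuts the estimate by a factor of 256. Real image data is not uniformly random, though, and data that deliberately embeds the signature defeats this estimate entirely.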
     
    Peter Zehler
    XEROX
    Xerox Architecture Center
    Email: PZehler@crt.xerox.com
    Voice: (585) 265-8755
    FAX: (585) 265-8871
    US Mail: Peter Zehler
            Xerox Corp.
            800 Phillips Rd.
            M/S 128-30E
            Webster NY, 14580-9701
    -----Original Message-----
    From: Rick Seeler [mailto:rseeler@adobe.com]
    Sent: Tuesday, March 04, 2003 1:29 PM
    To: ifx@pwg.org
    Subject: IFX> PDF/is Issue.

    During prototyping of PDF/is the following problem arose:
     
    How does the Consumer know when the end of a data stream (See section
    3.2.7 of [pdf]) is reached? Normally, in a PDF, the Consumer would
    consult the stream length field. The problem here is where to put the
    length field. If the length were placed before the stream, the Consumer
    would know how long the stream is. This requires the Producer to know the
    stream's length before writing it to the Consumer. If, instead, the
    length were written at the end of the stream, this would solve the
    Producer's problem, but the Consumer would not know how to find the
    length, since it cannot identify, 100% of the time, where the stream
    ends and where the length object begins.
     
    An example will illustrate:
    First, the normal case...
     
    stream
    sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data
    here)....
    84trhdvfyu7wgf4.nbdrgur4uaru4gb
    endstream
    12 0 obj
    3456 <- the length of the previous stream.
    endobj
     
    But, what if the data looked like this...
     
    stream
    sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data
    here)....
    endstream <- the binary data could have a string of bytes that
    looked like this.
    84trhdvfyu7wgf4.nbdrgur4uaru4gb
    endstream
    12 0 obj
    4567 <- the length of the previous stream.
    endobj
     
    Of course, you could look at the bytes that follow the word 'endstream'
    to see if this is really the end of the stream; but it is always
    possible to construct a stream that matches your parsing algorithm's
    expectations (although such streams become less and less likely).
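
The failure mode can be reproduced in a few lines. This is only an illustrative sketch: the payload bytes are made up, and `naive_stream_end` is a hypothetical Consumer that stops at the first CR LF "endstream" it sees.

```python
MARKER = b"\r\nendstream"

def naive_stream_end(buf: bytes) -> int:
    """Return the offset of the first marker, which a naive Consumer treats as the stream end."""
    return buf.find(MARKER)

# Binary payload that happens to contain the marker bytes.
payload = b"\x00\x01binary" + MARKER + b"\x02more binary"

# What the Producer sends: the payload, the real marker, then the trailing length object.
framed = (payload + MARKER + b"\r\n12 0 obj\r\n"
          + str(len(payload)).encode("ascii") + b"\r\nendobj\r\n")

cut = naive_stream_end(framed)
truncated = framed[:cut]  # shorter than the real payload
```

Here `cut` falls inside the payload, so the Consumer would treat the remaining payload bytes as PDF syntax and never reach the real length object.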
     
    Possible solutions:
    1) Write all data using ASCII85 encoding (See Section 3.3.2 of [pdf]).
    This will increase stream lengths by 25%. ASCII85 has a stream delimiter
    which would solve this problem -- the end of the stream can be known for
    certain and the length field can be placed after the stream.
    2) Require the Producer to write the stream length before any stream (the
    streams would stay binary). The Producer can use banding to break up
    large images into small enough chunks so the Producer can cache the stream
    before sending.
    3) Offer a combination of 1 & 2. The Producer would cache streams if
    possible, but may use ASCII85, if necessary.
    4) The Producer must make certain that no stream contains the byte
    sequence "\0D\0Aendstream" in its data. This is how the spec is
    defined currently -- but this may be too onerous for the Producer.
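
A small sketch of why solution 1 gives an unambiguous end of stream. It uses Python's standard `base64.a85encode` and appends the "~>" end-of-data marker by hand; the helper names `encode_stream` and `decode_stream` are hypothetical, and this is not a complete PDF filter implementation.

```python
import base64

EOD = b"~>"  # ASCII85 end-of-data marker; '~' never occurs in the encoded data

def encode_stream(data: bytes) -> bytes:
    """ASCII85-encode data and terminate it with the end-of-data marker."""
    return base64.a85encode(data) + EOD

def decode_stream(buf: bytes):
    """Decode up to the first end-of-data marker; return (data, bytes consumed)."""
    end = buf.find(EOD)
    if end < 0:
        raise ValueError("no ASCII85 end-of-data marker found")
    return base64.a85decode(buf[:end]), end + len(EOD)

# Binary data containing the troublesome byte sequence is safe once encoded.
data = b"\r\nendstream" + bytes(range(256)) * 4
encoded = encode_stream(data)
decoded, used = decode_stream(encoded + b"\r\nendstream\r\n12 0 obj")
```

Every 4 input bytes become 5 output characters, which is where the 25% growth comes from. Because the encoded alphabet excludes '~', the Consumer can find the end of the stream without knowing its length in advance.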
     
    Any other ideas? I'm personally leaning toward solution #3.
     
    -Rick
     



    This archive was generated by hypermail 2b29 : Wed Mar 05 2003 - 10:50:44 EST