IFX> PDF/is Issue.

Wed Mar 5 10:50:07 EST 2003

I like the chunking approach.  It is efficient, reliable, and has low 
overhead for reasonably sized chunks.  It also fits well with a typical 
implementation that writes one chunk of data at a time.
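
A minimal sketch of one possible chunking scheme follows.  It assumes
each chunk is written as a 4-byte big-endian length followed by that many
bytes, with a zero-length chunk marking the end of the stream.  That
framing and the names write_chunked, read_chunked, and roundtrip are
illustrative assumptions; the thread does not define a wire format.

```python
import io
import struct

def write_chunked(out, pieces):
    """Write each piece as <4-byte big-endian length><bytes>; a zero length ends the stream."""
    for piece in pieces:
        if piece:  # skip empty pieces so they are not mistaken for the terminator
            out.write(struct.pack(">I", len(piece)))
            out.write(piece)
    out.write(struct.pack(">I", 0))  # terminator chunk

def read_chunked(inp):
    """Read chunks until the zero-length terminator and return the joined payload."""
    parts = []
    while True:
        header = inp.read(4)
        if len(header) != 4:
            raise ValueError("truncated chunk header")
        (n,) = struct.unpack(">I", header)
        if n == 0:
            return b"".join(parts)
        chunk = inp.read(n)
        if len(chunk) != n:
            raise ValueError("truncated chunk")
        parts.append(chunk)

def roundtrip(pieces):
    """Write the pieces to an in-memory buffer and read them back."""
    buf = io.BytesIO()
    write_chunked(buf, pieces)
    buf.seek(0)
    return read_chunked(buf)
```

Under this assumed framing the Producer only needs to buffer one chunk at
a time, and the Consumer never scans the payload for a marker, so data
that happens to contain "endstream" is harmless.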

        -Carl

"Zehler, Peter" <PZehler at crt.xerox.com>
Sent by: owner-ifx at pwg.org
03/05/2003 05:00 AM

        To:     "'Rick Seeler'" <rseeler at adobe.com>, ifx at pwg.org
        cc: 
        Subject:        RE: IFX> PDF/is Issue.

Rick,
Why not just increase the size of the length field signature?  Could this 
be done by adding data or comments to the length object, or by adding 
another object?  I don't know PDF very well.  I don't think we need a 0% 
probability of confusion, just a statistically insignificant chance.
Pete
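
As a rough back-of-the-envelope check on "statistically insignificant",
and assuming the stream bytes are uniformly random (real image data is
not), the chance that an n-byte stream contains a given k-byte signature
is at most about n / 256^k.  The helper name below is hypothetical.

```python
def false_match_upper_bound(stream_len, signature_len):
    """Union bound on P(signature appears somewhere) for uniformly random bytes."""
    positions = max(stream_len - signature_len + 1, 0)
    return min(1.0, positions / 256.0 ** signature_len)

# The 11-byte sequence CR LF "endstream" in a 1 MiB stream.
p11 = false_match_upper_bound(1 << 20, 11)
# A hypothetical 32-byte signature in the same stream.
p32 = false_match_upper_bound(1 << 20, 32)
```

The bound shrinks by a factor of 256 for every byte added to the
signature, but it says nothing about data that is deliberately or
systematically constructed to contain the signature.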

Peter Zehler 
XEROX 
Xerox Architecture Center 
Email: PZehler at crt.xerox.com 
Voice:    (585) 265-8755 
FAX:      (585) 265-8871 
US Mail: Peter Zehler 
        Xerox Corp. 
        800 Phillips Rd. 
        M/S 128-30E 
        Webster NY, 14580-9701 
-----Original Message-----
From: Rick Seeler [mailto:rseeler at adobe.com]
Sent: Tuesday, March 04, 2003 1:29 PM
To: ifx at pwg.org
Subject: IFX> PDF/is Issue.

During prototyping of PDF/is the following problem arose:

How does the Consumer know when the end of a data stream (see section 
3.2.7 of [pdf]) is reached?  Normally, in a PDF, the Consumer would 
consult the stream length field.  The problem here is where to put the 
length field.  If the length were placed before the stream, the Consumer 
would know how long the stream is, but this requires the Producer to know 
the stream's length before writing it to the Consumer.  If, instead, the 
length were written after the end of the stream, this would solve the 
Producer's problem, but the Consumer would not know how to find the 
length, since it cannot identify, 100% of the time, where the stream ends 
and where the length object begins.

An example will illustrate:
First, the normal case...

stream
sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....
84trhdvfyu7wgf4.nbdrgur4uaru4gb
endstream
12 0 obj
3456    <- the length of the previous stream.
endobj

But, what if the data looked like this...

stream
sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....
endstream            <- the binary data could contain a string of bytes that looks like this.
84trhdvfyu7wgf4.nbdrgur4uaru4gb
endstream
12 0 obj
4567    <- the length of the previous stream.
endobj

Of course, you could look at the bytes following the word 'endstream' to 
see whether this is really the end of the stream, but one can always 
construct a stream that matches your parsing algorithm's expectations 
(although with decreasing probability of occurrence).
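
The failure can be reproduced with a few lines.  This is only a sketch:
the payload bytes are made up, and naive_stream_end is a hypothetical
stand-in for a Consumer that scans forward for the marker.

```python
MARKER = b"\r\nendstream"

def naive_stream_end(buf):
    """Return the offset of the first marker, as a forward-scanning Consumer would see it."""
    return buf.find(MARKER)

# Binary payload that happens to contain the marker bytes in the middle.
payload = b"\x01\x02binary" + MARKER + b"\x03\x04more binary"

# Stream data, the real end marker, then the trailing length object.
wire = (payload + MARKER + b"\r\n12 0 obj\r\n"
        + str(len(payload)).encode() + b"\r\nendobj\r\n")

found = naive_stream_end(wire)
truncated = found != len(payload)  # True: the scan stops inside the payload
```

Here the scan stops at the embedded marker, so the Consumer would cut the
stream short and then fail to parse the length object that it expects to
follow.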

Possible solutions:
1) Write all data using ASCII85 encoding (See Section 3.3.2 of [pdf]). 
This will increase stream lengths by 25%.  ASCII85 has a stream delimiter 
which would solve this problem -- the end of the stream can be known for 
certain and the length field can be placed after the stream.
2) Require the Producer to write the stream length before any stream (the 
streams would stay binary).  The Producer can use banding to break up 
large images into small enough chunks so the Producer can cache the stream 
before sending.
3) Offer a combination of 1 & 2.  The Producer would cache streams if 
possible, but may use ASCII85, if necessary.
4) The Producer must make certain that no stream contains the byte 
sequence "\0D\0Aendstream" in its data.  This is how the spec is 
currently defined, but it may be too onerous for the Producer.
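
The 25% figure and the delimiter property in solution 1 can be checked
with Python's standard base64 module, whose a85encode/a85decode accept
adobe=True to frame the data with <~ and ~>.  This is a sketch only: the
function names are hypothetical, and whether PDF/is would use the leading
<~ is not something the thread settles.

```python
import base64

def encode_stream(data):
    """ASCII85-encode data with Adobe framing (<~ ... ~>)."""
    return base64.a85encode(data, adobe=True)

def read_until_eod(buf):
    """Decode up to the first ~> marker; return (decoded bytes, bytes consumed)."""
    end = buf.index(b"~>")  # '~' is outside the ASCII85 alphabet, so this is unambiguous
    consumed = end + 2
    return base64.a85decode(buf[:consumed], adobe=True), consumed

# Payload containing the troublesome marker bytes.
raw = b"\x00\x01data\r\nendstream" + bytes(range(256))
encoded = encode_stream(raw)

# Whatever follows the stream (here, a length object) is left untouched.
wire = encoded + b"\r\nendstream\r\n12 0 obj\r\n"
decoded, consumed = read_until_eod(wire)
```

Every 4 input bytes become 5 output characters (apart from the "z"
shortcut for four zero bytes), which is where the 25% growth comes from.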

Any other ideas?  I'm personally leaning toward solution #3.

-Rick
