<br><font size=2 face="sans-serif">Of course, if you had to read in reverse order starting from the end of the file, that would have performance implications, too. It would be impossible for the reader to start generating output until the writer had finished generating and transmitting the document.</font>
<br>
<br><font size=2 face="sans-serif"> -Carl</font>
<br>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td>
<td><font size=1 face="sans-serif"><b>"Poysa, Kari" &lt;Kari.Poysa@usa.xerox.com&gt;</b></font>
<p><font size=1 face="sans-serif">03/12/2003 07:04 AM</font>
<br>
<td><font size=1 face="Arial"> </font>
<br><font size=1 face="sans-serif"> To: "Hastings, Tom N" &lt;hastings@cp10.es.xerox.com&gt;, "'Rick Seeler'" &lt;rseeler@adobe.com&gt;, Carl Kugler/Boulder/IBM@IBMUS</font>
<br><font size=1 face="sans-serif"> cc: ifx@pwg.org</font>
<br><font size=1 face="sans-serif"> Subject: RE: IFX> PDF/is Issue.</font>
<br></table>
<br>
<br>
<br><font size=2 color=blue face="Arial">Tom, the Length being discussed here is actually the byte count of the streams of the Image XObjects that belong to the Page. So if the Page is composed of more than one image (a.k.a. banding), the sender does not need to cache even a full page's worth of compressed data in order to write the Image XObject's stream length in the stream dictionary.</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">Full PDF allows the writer to enter an indirect object reference into the required Length entry. This makes writers easy to implement, because the separate object for the length can be written after all of the image data has been written. Such PDF files are then read in reverse order, starting from the end of the file. This works well if one has a file system to store the complete PDF file. So requiring the Length to be a direct value in the stream dictionary would most likely force existing writer software to be modified. One could not keep writing the same kind of files and claim that they are PDF/is compliant.</font>
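<br><font size=2 color=blue face="Arial">As a rough illustration of the indirect Length approach described above, here is a minimal Python sketch of a streaming writer. The function name, the object numbers, and the abbreviated stream dictionary are assumptions made for illustration only (a real Image XObject dictionary needs further entries), and this is not taken from any PDF/is draft.</font>

```python
import io
import zlib

def write_image_stream(out, obj_num, len_obj_num, chunks):
    """Write a Flate-compressed stream whose /Length is an indirect reference.

    The writer never buffers the compressed data: it only counts the bytes
    it emits and writes the length object after the stream has gone out.
    Returns the number of bytes written between the stream keywords.
    """
    # Hypothetical, abbreviated dictionary: only /Length and /Filter are shown.
    out.write(b"%d 0 obj\n<< /Length %d 0 R /Filter /FlateDecode >>\nstream\n"
              % (obj_num, len_obj_num))
    compressor = zlib.compressobj()
    length = 0
    for chunk in chunks:
        data = compressor.compress(chunk)
        out.write(data)
        length += len(data)
    tail = compressor.flush()
    out.write(tail)
    length += len(tail)
    out.write(b"\nendstream\nendobj\n")
    # The separate length object immediately follows the stream object.
    out.write(b"%d 0 obj\n%d\nendobj\n" % (len_obj_num, length))
    return length

buf = io.BytesIO()
written = write_image_stream(buf, 11, 12, [b"\x00" * 4096, b"\xff" * 4096])
```

<br><font size=2 color=blue face="Arial">A writer that must emit a direct Length would instead have to hold all of the compressed bytes until the compressor is flushed, which is the caching cost under discussion in this thread.</font>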
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial"> --- Kari ---</font>
<br><font size=2 face="Tahoma">-----Original Message-----<b><br>
From:</b> Hastings, Tom N <b><br>
Sent:</b> Tuesday, March 11, 2003 5:49 PM<b><br>
To:</b> Poysa, Kari; 'Rick Seeler'; 'Carl Kugler'<b><br>
Cc:</b> ifx@pwg.org<b><br>
Subject:</b> RE: IFX> PDF/is Issue.<br>
</font>
<br><font size=2 color=blue face="Arial">Kari,</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">I think you summed up the tradeoff between the Sender and the Receiver simply when you said:</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">"If we require the reader to be able to cache a page's worth of uncompressed data, surely we can require the writer to cache a page's worth of compressed data [in order to determine the length and send that length in the stream]."</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">I assume that PDF has the notion of a length for each page, right? So we require that the Sender put in a length field for each page of data at the front of each page of data. Can that length field be sent with the data in some manner, so that the Sender doesn't have to know the lengths of all of the pages before sending any?</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">Tom</font>
<br><font size=2 face="Tahoma">-----Original Message-----<b><br>
From:</b> Poysa, Kari [mailto:Kari.Poysa@usa.xerox.com]<b><br>
Sent:</b> Friday, March 07, 2003 15:04<b><br>
To:</b> 'Rick Seeler'; 'Carl Kugler'<b><br>
Cc:</b> ifx@pwg.org<b><br>
Subject:</b> RE: IFX> PDF/is Issue.<br>
</font>
<br><font size=2 color=blue face="Arial">Rick, I bet this solution can be implemented, but it does have some problems for the reader that unfortunately I did not see earlier. The difficulty really is whether we want to make life easy for the streaming writer or the reader. </font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">If the length follows the image stream, the reader must scan the filtered stream to find the end of the stream. This can make the reader implementation both cumbersome and slow, especially if the stream has to be fully decoded during PDF file parsing instead of simply extracting the correct amount of binary data and passing it to a separate decompression module. The PDF file parser would have to know details of the compressed streams that should really be of no interest to it, which also makes it harder to build applications from third-party components.</font>
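<br><font size=2 color=blue face="Arial">To make the contrast concrete, here is a minimal Python sketch of the "simply extracting the correct amount of binary data" case, where the stream dictionary carries a direct Length. The function name and the simplified object layout (a dictionary followed by a line containing only the stream keyword) are assumptions for illustration, not a complete PDF parser.</font>

```python
import io
import re
import zlib

def read_stream_with_direct_length(src):
    """Read one stream object whose dictionary holds a direct /Length value.

    Returns the raw (still filtered) stream bytes without inspecting them.
    """
    header = b""
    # Collect everything up to and including the "stream" keyword line.
    while not header.endswith(b"stream\n"):
        byte = src.read(1)
        if not byte:
            raise ValueError("unexpected end of input before stream data")
        header += byte
    # Accept "/Length 3456" but reject an indirect reference "/Length 12 0 R".
    match = re.search(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)", header)
    if match is None:
        raise ValueError("no direct /Length in stream dictionary")
    length = int(match.group(1))
    raw = src.read(length)  # exactly the stream bytes, no scanning needed
    if len(raw) != length:
        raise ValueError("stream is shorter than its declared /Length")
    return raw

# The payload may contain the text "endstream" without confusing the reader.
payload = zlib.compress(b"endstream inside the data " * 50)
doc = (b"11 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(payload)
       + payload + b"\nendstream\nendobj\n")
raw = read_stream_with_direct_length(io.BytesIO(doc))
pixels = zlib.decompress(raw)  # handed to a separate decompression step
```

<br><font size=2 color=blue face="Arial">The parser here never looks inside the compressed bytes, which is the separation between parser and decompressor argued for above.</font>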
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">In addition, if the reader attempts to decode the stream, how much data should be cached and decoded at a time? If the end of the stream is not found on the first attempt, one has to pass additional data to the decoder and continue decoding from where the previous data ended. This can delay achieving robust implementations. The alternative, searching for the "endstream" text, is not 100% reliable (although very close) and is a wasted step, since no decompression has been achieved yet.</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">This issue is really at the heart of what "streamable" means, and also has a big impact on what kind of low resource applications PDF/is can be used for. I think we should consider it a "MUST" for the writer to prefix the stream with its length, since the goal is to make the file format streamable especially at a low resource reader. If we require the reader to be able to cache a page's worth of uncompressed data, surely we can require the writer to cache a page's worth of compressed data.</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">I do understand Ira McDonald's note about streaming writers (see separate email). Perhaps the question of whether to prefix or postfix image streams with their lengths should be a negotiable capability between the sender and receiver?</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial"> --- Kari ---</font>
<br><font size=2 face="Tahoma">-----Original Message-----<b><br>
From:</b> Rick Seeler [mailto:rseeler@adobe.com]<b><br>
Sent:</b> Thursday, March 06, 2003 2:37 PM<b><br>
To:</b> 'Poysa, Kari'; 'Carl Kugler'<b><br>
Cc:</b> ifx@pwg.org<b><br>
Subject:</b> RE: IFX> PDF/is Issue.<br>
</font>
<br><font size=2 color=blue face="Arial">Kari,</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">Yes, the stream length should precede the stream, if possible (this is allowed). But, in the case where the stream may be long, this may not be possible for the Producer. In that case, the length should be an indirect object reference to the length that should come immediately after the stream.</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">As for your idea of scanning for an "endstream" that is followed by the size object: this still has the same problem as scanning for "endstream" alone; it just matches against more data and so has a smaller likelihood of a false match.</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">Given that, and what I discussed in my previous e-mail on this subject (to Rob Buckley), I think the best approach might be the following:</font>
<br><font size=2 color=blue face="Arial">1) The Producer MUST always write the stream length of all 'Content Streams' and 'ICC Profile' streams immediately in the object dictionary (before the stream).</font>
<br><font size=2 color=blue face="Arial">2) When writing image streams, the Producer MAY either write the stream length before or after the stream, as they prefer.</font>
<br><font size=2 color=blue face="Arial">3) When the length of an image stream follows the stream (as an indirect object), the Consumer SHOULD decode the image stream to determine the stream length, when possible. But the Consumer MAY (at its peril) scan for the 'endstream' marker.</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">How does this sound as a solution?</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=3 face="Times New Roman"> </font>
<p><font size=2 face="Times New Roman">-Rick</font>
<p><font size=2 face="Tahoma">-----Original Message-----<b><br>
From:</b> owner-ifx@pwg.org [mailto:owner-ifx@pwg.org] <b>On Behalf Of </b>Poysa, Kari<b><br>
Sent:</b> Thursday, March 06, 2003 7:15 AM<b><br>
To:</b> 'Carl Kugler'<b><br>
Cc:</b> ifx@pwg.org<b><br>
Subject:</b> RE: IFX> PDF/is Issue.<br>
</font>
<br><font size=2 color=blue face="Arial">In my opinion the goal should be to write the stream length directly into the stream dictionary. </font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">Also, the likelihood of "endstream" appearing in the data is small. We could also require that if a low-resource streaming writer is not able to add the length directly into the stream dictionary, then the PDF object for the length MUST immediately follow the stream object. This way, the reader can scan for "endstream" (but of course only if the length was not in the stream dictionary) and make sure that it is the correct "endstream" by verifying that it is immediately followed by something that looks like a length object. Could reader implementers comment on this?</font>
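<br><font size=2 color=blue face="Arial">A minimal Python sketch of this verification idea follows. The function name and the regular expression are assumptions for illustration. The sketch also goes one step beyond the proposal by checking that the number in the trailing length object equals the number of bytes scanned, which makes a false match less likely but still not impossible.</font>

```python
import re

# An end-of-line, the "endstream" keyword, an optional "endobj", and then an
# object that contains nothing but an integer (the candidate length object).
_TAIL = re.compile(
    rb"\r?\nendstream\s+(?:endobj\s+)?\d+\s+\d+\s+obj\s+(\d+)\s+endobj")

def find_stream_end(buf, start):
    """Return the offset where the stream data that begins at `start` ends.

    A candidate "endstream" is accepted only if it is immediately followed by
    a length object whose value equals the number of bytes scanned so far.
    Returns None if no candidate passes the check.
    """
    for match in _TAIL.finditer(buf, start):
        if int(match.group(1)) == match.start() - start:
            return match.start()
    return None

# Stream data that happens to contain a decoy "endstream" plus length object.
data = b"abc\nendstream\n12 0 obj\n9999\nendobj\nxyz"
buf = (b"stream\n" + data
       + b"\nendstream\nendobj\n12 0 obj\n%d\nendobj\n" % len(data))
start = len(b"stream\n")
end = find_stream_end(buf, start)
```

<br><font size=2 color=blue face="Arial">Note that this reader has to buffer the data while it scans, so it gives up some of the streaming benefit that a length in the stream dictionary would provide.</font>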
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial">I think introducing an additional filter like ASCII85 just for spotting the end of stream adds unnecessary complexity to both writer and reader, increases file sizes and also requires more memory and processing as the stream cannot be passed directly to a decompressor.</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 color=blue face="Arial"> --- Kari ---</font>
<br><font size=2 face="Tahoma">-----Original Message-----<b><br>
From:</b> Carl Kugler [mailto:kugler@us.ibm.com]<b><br>
Sent:</b> Wednesday, March 05, 2003 10:50 AM<b><br>
Cc:</b> ifx@pwg.org<b><br>
Subject:</b> RE: IFX> PDF/is Issue.<br>
</font>
<br><font size=2 face="sans-serif"><br>
I like the chunking approach. It is efficient, reliable, and has low overhead for reasonably sized chunks. It also fits well in a typical implementation that writes a chunk of data at a time.</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="sans-serif"><br>
-Carl</font><font size=3 face="Times New Roman"> <br>
<br>
<br>
</font>
<table width=100%>
<tr valign=top>
<td width=2%>
<td width=41%><font size=1 face="sans-serif"><b>"Zehler, Peter" &lt;PZehler@crt.xerox.com&gt;</b></font><font size=3 face="Times New Roman"> </font><font size=1 face="sans-serif"><br>
Sent by: owner-ifx@pwg.org</font><font size=3 face="Times New Roman"> </font>
<p><font size=1 face="sans-serif">03/05/2003 05:00 AM</font><font size=3 face="Times New Roman"> </font>
<td width=55%><font size=1 face="Arial"> </font><font size=1 face="sans-serif"><br>
To: "'Rick Seeler'" &lt;rseeler@adobe.com&gt;, ifx@pwg.org</font><font size=3 face="Times New Roman"> </font><font size=1 face="sans-serif"><br>
cc: </font><font size=3 face="Times New Roman"> </font><font size=1 face="sans-serif"><br>
Subject: RE: IFX> PDF/is Issue.</font><font size=3 face="Times New Roman"> </font></table>
<br><font size=3 face="Times New Roman"><br>
<br>
</font><font size=2 color=blue face="Arial"><br>
Rick,</font><font size=3 face="Times New Roman"> </font><font size=2 color=blue face="Arial"><br>
Why not just increase the size of the length field signature? Could this be done by adding data or comments to the length object, or by adding another object? I don't know PDF very well. I don't think we need a 0% probability of confusion, just a statistically insignificant chance.</font><font size=3 face="Times New Roman"> </font><font size=2 color=blue face="Arial"><br>
Pete</font><font size=3 face="Times New Roman"> <br>
</font>
<p><font size=3 face="Impact">Peter Zehler</font><font size=3 face="Times New Roman"> </font><font size=3 color=red face="Times New Roman"><br>
XEROX</font><font size=3 face="Times New Roman"> </font><font size=2 face="Tahoma"><br>
Xerox Architecture Center</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
Email: PZehler@crt.xerox.com</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
Voice: (585) 265-8755</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
FAX: (585) 265-8871 <br>
US Mail: Peter Zehler</font><font size=3 face="Times New Roman"> </font>
<p><font size=2 face="Arial"> Xerox Corp.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
800 Phillips Rd.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
M/S 128-30E</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
Webster NY, 14580-9701</font><font size=3 face="Times New Roman"> </font>
<p><font size=2 face="Tahoma">-----Original Message-----<b><br>
From:</b> Rick Seeler [mailto:rseeler@adobe.com]<b><br>
Sent:</b> Tuesday, March 04, 2003 1:29 PM<b><br>
To:</b> ifx@pwg.org<b><br>
Subject:</b> IFX> PDF/is Issue.</font><font size=3 face="Times New Roman"><br>
</font><font size=2 face="Arial"><br>
During prototyping of PDF/is the following problem arose:</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="Arial"><br>
How does the Consumer know when the end of a data stream (See section 3.2.7 of [pdf]) is reached? Normally, in a PDF, the Consumer would consult the stream length field. The problem here is where to put the length field. If the length were placed before the stream, the Consumer would know how long the stream is. This requires the Producer to know the stream's length before writing it to the Consumer. If, instead, the length were written at the end of the stream, this would solve the Producer's problem but the Consumer would not know how to find the length since they can't identify, 100% of the time, where the stream ends and where the length object is.</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="Arial"><br>
An example will illustrate:</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
First, the normal case...</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="Arial"><br>
stream</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
84trhdvfyu7wgf4.nbdrgur4uaru4gb</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
endstream</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
12 0 obj</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
3456 <- the length of the previous stream.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
endobj</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="Arial"><br>
But, what if the data looked like this...</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="Arial"><br>
stream</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
endstream <- the binary data could have a string of bytes that looked like this.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
84trhdvfyu7wgf4.nbdrgur4uaru4gb</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
endstream</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
12 0 obj</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
4567 <- the length of the previous stream.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
endobj</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
Of course, you could look to bytes after the appearance of the word 'endstream' to see if this is really the end of the stream; but you can always come up with a stream that could match your parsing algorithm's expectations (although with decreasing percentage of occurrence).</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="Arial"><br>
Possible solutions:</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
1) Write all data using ASCII85 encoding (See Section 3.3.2 of [pdf]). This will increase stream lengths by 25%. ASCII85 has a stream delimiter which would solve this problem -- the end of the stream can be known for certain and the length field can be placed after the stream.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
2) Require the Producer to write the stream length before any stream (the streams would stay binary). The Producer can use banding to break up large images into small enough chunks so the Producer can cache the stream before sending.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
3) Offer a combination of 1 & 2. The Producer would cache streams if possible, but may use ASCII85, if necessary.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
4) The Producer must make certain that no stream contains the byte sequence "\0D\0Aendstream" in its data. This is how the spec is defined currently -- but this may be too onerous for the Producer.</font><font size=3 face="Times New Roman"> </font>
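<br><font size=2 face="Arial">The 25% figure in option 1 can be checked with a short Python sketch. It uses the standard library's base64.a85encode; the function name ascii85_overhead and the sample data are assumptions for illustration. Python's adobe=True framing adds a leading "&lt;~" as well as the trailing "~&gt;", so the sketch strips both before measuring.</font>

```python
import base64

def ascii85_overhead(raw):
    """Return (encoded body, size ratio) for ASCII85-encoding `raw`."""
    encoded = base64.a85encode(raw, adobe=True)  # framed as <~ ... ~>
    body = encoded[2:-2]                         # drop the <~ and ~> framing
    return body, len(body) / len(raw)

# 4000 bytes with no all-zero 4-byte group, so no "z" shorthand is used.
sample = bytes(range(1, 201)) * 20
body, ratio = ascii85_overhead(sample)

# Every 4 input bytes become 5 output characters.
print(ratio)
# The "~" of the end-of-data marker never occurs inside the encoded body,
# because it lies outside the ASCII85 alphabet.
print(b"~" in body)
```

<br><font size=2 face="Arial">Because the "~&gt;" marker cannot occur inside the encoded data, the Consumer can find the end of the stream with certainty, at the cost of the larger stream and an extra decoding pass.</font>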
<br><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
Any other ideas? I'm personally leaning toward solution #3.</font><font size=3 face="Times New Roman"> <br>
</font>
<p><font size=2 face="Times New Roman">-Rick</font><font size=3 face="Times New Roman"> </font>
<p>
<p>