<br><font size=2 face="sans-serif">I like the chunking approach. It is efficient, reliable, and has low overhead for reasonably sized chunks. It also fits well with a typical implementation that writes one chunk of data at a time.</font>
<br>
<br><font size=2 face="sans-serif"> -Carl</font>
<br>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td>
<td><font size=1 face="sans-serif"><b>"Zehler, Peter" &lt;PZehler@crt.xerox.com&gt;</b></font>
<br><font size=1 face="sans-serif">Sent by: owner-ifx@pwg.org</font>
<p><font size=1 face="sans-serif">03/05/2003 05:00 AM</font>
<br>
<td><font size=1 face="Arial"> </font>
<br><font size=1 face="sans-serif"> To: "'Rick Seeler'" &lt;rseeler@adobe.com&gt;, ifx@pwg.org</font>
<br><font size=1 face="sans-serif"> cc: </font>
<br><font size=1 face="sans-serif"> Subject: RE: IFX> PDF/is Issue.</font>
<br></table>
<br>
<br>
<br><font size=2 color=blue face="Arial">Rick,</font>
<br><font size=2 color=blue face="Arial">Why not just increase the size of the length field signature? Could this be done by adding data or comments to the length object, or by adding another object? I don't know PDF very well. I don't think we need a 0% probability of confusion, just a statistically insignificant one.</font>
<br><font size=2 color=blue face="Arial">Pete</font>
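<br><font size=2 face="Arial">To put a rough number on "statistically insignificant": assuming the stream bytes behave like uniformly random data (roughly true for compressed image data, but still an assumption), a fixed k-byte signature matches at any one position with probability 256^-k, so by linearity of expectation an N-byte stream contains on average (N - k + 1) / 256^k accidental copies. The function name below is illustrative only.</font>

```python
def expected_false_matches(stream_len: int, sig_len: int) -> float:
    """Expected number of accidental occurrences of a fixed sig_len-byte
    signature inside stream_len bytes, ASSUMING uniformly random bytes."""
    positions = max(stream_len - sig_len + 1, 0)
    return positions / 256 ** sig_len


# "\r\nendstream" is 11 bytes; longer signatures are hypothetical.
for k in (11, 16, 32):
    print(k, expected_false_matches(10 * 2**20, k))
```

<br><font size=2 face="Arial">The caveat is that this only holds for random-looking data: a stream that embeds another PDF/is file would contain the same fixed signature by construction, however long the signature is.</font>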
<br><font size=3 face="Times New Roman"> </font>
<p><font size=3 face="Impact">Peter Zehler</font><font size=3 face="Times New Roman"> </font><font size=3 color=red face="Times New Roman"><br>
XEROX</font><font size=3 face="Times New Roman"> </font><font size=2 face="Tahoma"><br>
Xerox Architecture Center</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
Email: PZehler@crt.xerox.com</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
Voice: (585) 265-8755</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
FAX: (585) 265-8871 <br>
US Mail: Peter Zehler</font><font size=3 face="Times New Roman"> </font>
<p><font size=2 face="Arial"> Xerox Corp.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
800 Phillips Rd.</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
M/S 128-30E</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
Webster NY, 14580-9701</font><font size=3 face="Times New Roman"> </font>
<p><font size=2 face="Tahoma">-----Original Message-----<b><br>
From:</b> Rick Seeler [mailto:rseeler@adobe.com]<b><br>
Sent:</b> Tuesday, March 04, 2003 1:29 PM<b><br>
To:</b> ifx@pwg.org<b><br>
Subject:</b> IFX> PDF/is Issue.<br>
</font>
<br><font size=2 face="Arial">During prototyping of PDF/is, the following problem arose:</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 face="Arial">How does the Consumer know when the end of a data stream (see Section 3.2.7 of [pdf]) has been reached? Normally, in a PDF, the Consumer would consult the stream's length field. The problem here is where to put the length field. If the length were placed before the stream, the Consumer would know how long the stream is, but this requires the Producer to know the stream's length before writing it to the Consumer. If, instead, the length were written at the end of the stream, this would solve the Producer's problem, but the Consumer would not know how to find the length, since it cannot reliably (100% of the time) tell where the stream ends and where the length object begins.</font>
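<br><font size=2 face="Arial">For comparison, a minimal Python sketch of the length-first case, assuming the Consumer already holds the Length value when it reaches the stream data. Because it copies exactly Length bytes, stream data that happens to contain endstream is harmless. The helper name and the trailing sanity check are illustrative assumptions, not taken from the spec.</font>

```python
def read_stream_by_length(buf: bytes, start: int, length: int) -> tuple[bytes, int]:
    """Return (data, offset just past the endstream keyword) for a stream
    whose data begins at start and whose Length is known in advance."""
    data = buf[start:start + length]
    if len(data) != length:
        raise ValueError("truncated stream")
    pos = start + length
    # Skip the end-of-line bytes that separate the data from the keyword.
    while pos < len(buf) and buf[pos] in b"\r\n":
        pos += 1
    # Illustrative sanity check: the keyword must follow the Length bytes.
    if not buf.startswith(b"endstream", pos):
        raise ValueError("Length does not match the stream data")
    return data, pos + len(b"endstream")
```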
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 face="Arial">An example will illustrate:</font>
<br><font size=2 face="Arial">First, the normal case...</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 face="Arial">stream</font>
<br><font size=2 face="Arial">sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....</font>
<br><font size=2 face="Arial">84trhdvfyu7wgf4.nbdrgur4uaru4gb</font>
<br><font size=2 face="Arial">endstream</font>
<br><font size=2 face="Arial">12 0 obj</font>
<br><font size=2 face="Arial">3456 <- the length of the previous stream.</font>
<br><font size=2 face="Arial">endobj</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 face="Arial">But what if the data looked like this...</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 face="Arial">stream</font>
<br><font size=2 face="Arial">sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....</font>
<br><font size=2 face="Arial">endstream <- the binary data could contain a string of bytes that looks like this.</font>
<br><font size=2 face="Arial">84trhdvfyu7wgf4.nbdrgur4uaru4gb</font>
<br><font size=2 face="Arial">endstream</font>
<br><font size=2 face="Arial">12 0 obj</font>
<br><font size=2 face="Arial">4567 <- the length of the previous stream.</font>
<br><font size=2 face="Arial">endobj</font>
<br><font size=2 face="Arial"> </font>
<br><font size=2 face="Arial">Of course, you could examine the bytes after each occurrence of 'endstream' to check whether it really marks the end of the stream, but you can always construct stream data that matches your parsing algorithm's expectations (although such matches become less likely as the check becomes stricter).</font>
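<br><font size=2 face="Arial">A minimal Python sketch of that lookahead idea, assuming a hypothetical Consumer heuristic that accepts \r\nendstream only when an indirect-object header such as "12 0 obj" follows it. The heuristic skips a stray endstream in the data, yet stream data that contains the whole pattern still ends the stream early.</font>

```python
import re

# Hypothetical heuristic: accept "\r\nendstream" only when it is followed
# by whitespace and an indirect-object header such as "12 0 obj".
END_MARKER = b"\r\nendstream"
LOOKAHEAD = re.compile(rb"\s+\d+\s+\d+\s+obj\b")


def find_stream_end(buf: bytes, start: int) -> int:
    """Return the offset where the stream data beginning at start appears
    to end, or -1 if no plausible end is found."""
    pos = buf.find(END_MARKER, start)
    while pos != -1:
        if LOOKAHEAD.match(buf, pos + len(END_MARKER)):
            return pos
        # Not followed by an object header: keep searching.
        pos = buf.find(END_MARKER, pos + 1)
    return -1
```

<br><font size=2 face="Arial">With data such as b"abc\r\nendstream\r\n99 0 obj junk", the function returns the offset of the embedded marker, so the Consumer would truncate the stream. Each extra check makes such data rarer, never impossible.</font>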
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 face="Arial">Possible solutions:</font>
<br><font size=2 face="Arial">1) Write all data using ASCII85 encoding (see Section 3.3.2 of [pdf]). This will increase stream lengths by 25%. ASCII85 has an end-of-data delimiter (~&gt;), which would solve this problem: the end of the stream can be known for certain, and the length field can be placed after the stream.</font>
<br><font size=2 face="Arial">2) Require the Producer to write the stream length before each stream (the streams would stay binary). The Producer can use banding to break large images into chunks small enough that it can cache each stream before sending it.</font>
<br><font size=2 face="Arial">3) Offer a combination of 1 and 2: the Producer would cache streams when possible but could fall back to ASCII85 when necessary.</font>
<br><font size=2 face="Arial">4) Require the Producer to make certain that no stream contains the byte sequence "\0D\0Aendstream" in its data. This is how the spec is currently defined, but it may be too onerous for the Producer.</font>
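<br><font size=2 face="Arial">The four options can be compared in one small Python sketch. It assumes Python's standard base64.a85encode and base64.a85decode use the same ASCII85 alphabet as PDF (the characters ! through u, plus z for an all-zero group), so the ~&gt; end-of-data marker can never appear inside an encoded body. write_stream is a hypothetical option 3 Producer: it caches up to an assumed cache_limit bytes and, if everything fits, writes a binary stream with /Length first (option 2); otherwise it falls back to ASCII85 (option 1), encoding whole 4-byte groups as chunks arrive so the full stream never has to be held in memory. contains_forbidden sketches the option 4 check, which must also catch the sequence when it straddles two chunks. All names and dictionary contents are illustrative, not taken from the PDF/is draft.</font>

```python
import base64
import itertools
from typing import Iterable, Iterator

EOD = b"~>"                   # ASCII85 end-of-data marker
FORBIDDEN = b"\r\nendstream"  # the "\0D\0Aendstream" sequence (option 4)


def a85_pieces(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Option 1: ASCII85-encode chunks incrementally. Only whole 4-byte
    groups are encoded before the end, so the joined output equals
    base64.a85encode() of all the data, followed by the ~> marker."""
    carry = b""
    for chunk in chunks:
        buf = carry + chunk
        cut = len(buf) - len(buf) % 4
        if cut:
            yield base64.a85encode(buf[:cut])
        carry = buf[cut:]
    if carry:
        yield base64.a85encode(carry)
    yield EOD


def write_stream(chunks: Iterable[bytes], cache_limit: int) -> Iterator[bytes]:
    """Option 3: cache up to cache_limit bytes; if the whole stream fits,
    write it in binary with /Length first (option 2), else use option 1."""
    it = iter(chunks)
    cached, size = [], 0
    for chunk in it:
        cached.append(chunk)
        size += len(chunk)
        if size > cache_limit:
            break
    else:
        # Everything fit, so the length is known before any data is sent.
        yield b"<< /Length %d >>\r\nstream\r\n" % size
        yield from cached
        yield b"\r\nendstream\r\n"
        return
    # Too big to cache: encode the cached chunks plus the remaining ones.
    # A length object could then follow the stream, as option 1 suggests.
    yield b"<< /Filter /ASCII85Decode >>\r\nstream\r\n"
    yield from a85_pieces(itertools.chain(cached, it))
    yield b"\r\nendstream\r\n"


def read_a85(buf: bytes, start: int) -> tuple[bytes, int]:
    """Consumer side of option 1: the body never contains '~', so the
    first ~> is always the real end of the data."""
    end = buf.index(EOD, start)
    return base64.a85decode(buf[start:end]), end + len(EOD)


def contains_forbidden(chunks: Iterable[bytes]) -> bool:
    """Option 4: detect FORBIDDEN even when it straddles two chunks by
    carrying the last len(FORBIDDEN) - 1 bytes into the next search."""
    tail, keep = b"", len(FORBIDDEN) - 1
    for chunk in chunks:
        window = tail + chunk
        if FORBIDDEN in window:
            return True
        tail = window[-keep:]
    return False
```

<br><font size=2 face="Arial">For bytes 1 through 255 repeated four times (1,020 bytes, no all-zero groups), the ASCII85 body is exactly 1,275 characters, the 25% growth noted in option 1. Note also that a streaming Producer that trips the option 4 check partway through cannot recall data it has already sent, which is part of why that rule may be onerous.</font>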
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 face="Arial">Any other ideas? I'm personally leaning toward solution #3.</font>
<br><font size=3 face="Times New Roman"> </font>
<p><font size=2 face="Times New Roman">-Rick</font>
<p><font size=3 face="Times New Roman"> </font>
<p>
<p>