<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<TITLE>Message</TITLE>
<META content="MSHTML 6.00.2800.1141" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>During prototyping
of PDF/is, the following problem arose:</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>How does the
Consumer know when the end of a data stream (see Section 3.2.7 of [pdf]) has
been reached? Normally, in a PDF, the Consumer would consult the stream's length
field. The problem here is where to put that field. If the length were placed
before the stream, the Consumer would know how long the stream is, but the
Producer would have to know the stream's length before writing it to the
Consumer. If, instead, the length were written after the end of the stream, this
would solve the Producer's problem, but the Consumer could not reliably find the
length, since it cannot identify, 100% of the time, where the stream ends and
where the length object begins.</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>An example will
illustrate:</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>First, the normal
case...</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>stream</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data
here)....</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>84trhdvfyu7wgf4.nbdrgur4uaru4gb</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>endstream</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>12 0
obj</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>3456 &lt;- the length of the previous
stream.</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>endobj</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>But, what if the
data looked like this...</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>stream</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data
here)....</FONT></SPAN></DIV>
<DIV><SPAN
class=358045117-04032003>endstream
&lt;- the binary data could contain a string of bytes that looks like
this.</SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>84trhdvfyu7wgf4.nbdrgur4uaru4gb</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>endstream</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>12 0
obj</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>4567 &lt;- the length of the previous
stream.</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2>endobj</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003></SPAN> </DIV>
<DIV><SPAN class=358045117-04032003>Of course, the Consumer could examine the
bytes that follow the word 'endstream' to check whether this is really the end
of the stream; but one can always come up with a stream that matches the parsing
algorithm's expectations (although with decreasing
probability).</SPAN></DIV></FONT></SPAN></DIV>
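<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>The failure can be
reproduced with a minimal Python sketch. It assumes a Consumer that simply
scans for the first "\0D\0Aendstream"; the payload bytes and the helper name
naive_stream_end are made up for illustration and are not part of the
spec.</FONT></SPAN></DIV>
<PRE>
```python
# Keyword the (hypothetical) Consumer scans for: CR LF followed by "endstream".
EOL_KEYWORD = b"\r\nendstream"


def naive_stream_end(buf, start=0):
    """Return the index of the first end-of-stream keyword at or after start."""
    return buf.find(EOL_KEYWORD, start)


# Made-up binary payload that happens to contain the keyword itself.
payload = b"sdljfiwe" + b"\r\nendstream" + b"84trhdvf"

# What the Producer writes: the stream, then the length in a trailing object.
wire = (
    b"stream\r\n"
    + payload
    + b"\r\nendstream\r\n12 0 obj\r\n"
    + str(len(payload)).encode("ascii")
    + b"\r\nendobj\r\n"
)

data_start = len(b"stream\r\n")
end = naive_stream_end(wire, data_start)
recovered = wire[data_start:end]

# The Consumer stops at the false keyword and loses the rest of the data.
print(len(recovered), "bytes recovered of", len(payload))
```
</PRE>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>The Consumer stops
at the keyword embedded in the data, so it never reaches the real length
object.</FONT></SPAN></DIV>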
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>Possible
solutions:</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>1) Write all data
using ASCII85 encoding (see Section 3.3.2 of [pdf]). This will increase
stream lengths by 25%. ASCII85 has an end-of-data delimiter, which would solve
this problem: the end of the stream can be known for certain, and the length
field can be placed after the stream.</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>2) Require the
Producer to write the stream length before every stream (the streams would stay
binary). The Producer can use banding to break large images into chunks small
enough that it can cache each stream before
sending.</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>3) Offer a
combination of 1 and 2. The Producer would cache streams where possible,
but could use ASCII85 when necessary.</FONT></SPAN></DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>4) Require the
Producer to make certain that no stream contains the byte sequence
"\0D\0Aendstream" in its data. This is how the spec is currently
defined, but it may be too onerous for the Producer.</FONT></SPAN></DIV>
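<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>To make the
combined approach concrete, here is a rough Python sketch. It is only an
illustration: the "L " and "A " markers, the CACHE_LIMIT value and the function
names are hypothetical and not PDF syntax, and for brevity the whole stream is
passed in as one bytes object, whereas a real Producer would encode ASCII85
incrementally. It relies on the fact that ASCII85 output never contains '~', so
the '~>' marker cannot appear inside the encoded data.</FONT></SPAN></DIV>
<PRE>
```python
import base64

CACHE_LIMIT = 64 * 1024  # hypothetical Producer buffer size, in bytes


def frame_stream(data, cache_limit=CACHE_LIMIT):
    """Frame one stream: length-first binary if it fits the cache, else ASCII85."""
    if cache_limit >= len(data):
        # Length is known up front, so write it before the raw binary data.
        return b"L " + str(len(data)).encode("ascii") + b"\r\n" + data
    # Too large to cache: ASCII85-encode and terminate with the '~>' marker.
    return b"A " + base64.a85encode(data) + b"~>"


def read_stream(framed):
    """Recover the original bytes from either (hypothetical) framing."""
    if framed.startswith(b"L "):
        header, _, rest = framed.partition(b"\r\n")
        length = int(header[2:])
        return rest[:length]  # trust the length, never scan for a keyword
    end = framed.index(b"~>")  # unambiguous: '~' is not an ASCII85 digit
    return base64.a85decode(framed[2:end])


# Payload that would fool a keyword-scanning Consumer.
payload = b"abc\r\nendstream" + bytes(range(256))
for limit in (1024, 16):
    framed = frame_stream(payload, cache_limit=limit)
    print(framed[:2], read_stream(framed) == payload)
```
</PRE>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>Either way the
Consumer never has to guess where the data ends; the price of the ASCII85 path
is five output characters for every four input bytes.</FONT></SPAN></DIV>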
<DIV><SPAN class=358045117-04032003><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=358045117-04032003><FONT face=Arial size=2>Any other
ideas? I'm personally leaning toward solution #3.</FONT></SPAN></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV><!-- Converted from text/plain format -->
<P><FONT size=2>-Rick<BR></FONT></P>
<DIV><FONT face=Arial size=2></FONT> </DIV></BODY></HTML>