<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=US-ASCII">
<META content="MSHTML 5.50.4919.2200" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=265555414-06032003>In my
opinion, the goal should be to write the stream length directly into the stream
dictionary. </SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003></SPAN></FONT><FONT face=Arial color=#0000ff
size=2><SPAN class=265555414-06032003>Also, the likelihood of "endstream"
occurring in the data is small. We could also require that if a low-resource
streaming writer is not able to put the length directly into the stream
dictionary, then the PDF object holding the length MUST immediately follow the stream
object. This way, the reader can scan for "endstream" (but of course only if
the length was not in the stream dictionary) and make sure that it is the
correct "endstream" by verifying that it is immediately followed by something
that looks like a length object. Could reader implementers comment on
this?</SPAN></FONT></DIV>
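<P><FONT face=Arial size=2>A minimal Python sketch of the reader-side check proposed above, assuming the length is written as an indirect object (such as "12 0 obj 3456 endobj" in Rick's example below) that immediately follows "endstream". The function names, the simplified regular expression, and the extra requirement that the declared length equal the number of data bytes before the candidate "endstream" are illustrative assumptions, not part of any PDF/is draft. The check makes a false match far less likely, but a stream crafted to contain a matching sequence would still fool it.</FONT></P>
<PRE>
```python
import re

# An indirect object holding a single integer, e.g. b"\n12 0 obj\n3456\nendobj".
# Whitespace handling is simplified for this sketch.
LENGTH_OBJ = re.compile(rb"\s+\d+\s+\d+\s+obj\s+(\d+)\s+endobj")
KEYWORD = b"endstream"


def eol_before(buf: bytes, pos: int) -> int:
    """Size of the end-of-line marker (CRLF, LF or CR) just before pos, else 0."""
    if buf[pos - 2:pos] == b"\r\n":
        return 2
    if buf[pos - 1:pos] in (b"\n", b"\r"):
        return 1
    return 0


def naive_stream_end(buf: bytes, start: int) -> int:
    """Data length if the first 'endstream' is trusted (may stop too early)."""
    pos = buf.index(KEYWORD, start)
    return pos - eol_before(buf, pos) - start


def find_stream_end(buf: bytes, start: int) -> int:
    """Data length of the stream whose data begins at offset `start`.

    A candidate 'endstream' is accepted only if it is preceded by an EOL,
    is immediately followed by an integer-valued indirect object, and that
    integer equals the number of data bytes before the EOL.
    """
    pos = buf.find(KEYWORD, start)
    while pos != -1:
        eol = eol_before(buf, pos)
        m = LENGTH_OBJ.match(buf, pos + len(KEYWORD))
        if eol and m and int(m.group(1)) == pos - eol - start:
            return pos - eol - start
        pos = buf.find(KEYWORD, pos + 1)
    raise ValueError("no endstream followed by a consistent length object")


# Rick's second example: the binary data itself contains "\r\nendstream".
data = b"binary\r\nendstream\r\nmore binary"
doc = (b"stream\r\n" + data + b"\r\nendstream\n12 0 obj\n"
       + str(len(data)).encode() + b"\nendobj\n")
start = len(b"stream\r\n")
print(naive_stream_end(doc, start), find_stream_end(doc, start))  # prints "6 30"
```
</PRE>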
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=265555414-06032003>I
think introducing an additional filter such as ASCII85 just to mark the end of the
stream adds unnecessary complexity to both writer and reader, increases file
sizes, and requires more memory and processing, since the stream cannot be
passed directly to a decompressor.</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003> --- Kari ---</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Carl Kugler
[mailto:kugler@us.ibm.com]<BR><B>Sent:</B> Wednesday, March 05, 2003 10:50
AM<BR><B>Cc:</B> ifx@pwg.org<BR><B>Subject:</B> RE: IFX> PDF/is
Issue.<BR><BR></FONT></DIV><BR><FONT face=sans-serif size=2>I like the
chunking approach. It is efficient and reliable, and it has low overhead for
reasonably sized chunks. It also fits well with a typical implementation that
writes one chunk of data at a time.</FONT> <BR><BR><FONT face=sans-serif
size=2> -Carl</FONT> <BR><BR><BR><BR>
<TABLE width="100%">
<TBODY>
<TR vAlign=top>
<TD>
<TD><FONT face=sans-serif size=1><B>"Zehler, Peter"
&lt;PZehler@crt.xerox.com&gt;</B></FONT> <BR><FONT face=sans-serif
size=1>Sent by: owner-ifx@pwg.org</FONT>
<P><FONT face=sans-serif size=1>03/05/2003 05:00 AM</FONT> <BR></P>
<TD><FONT face=Arial size=1> </FONT><BR><FONT
face=sans-serif size=1> To:
"'Rick Seeler'" &lt;rseeler@adobe.com&gt;,
ifx@pwg.org</FONT> <BR><FONT face=sans-serif size=1>
cc: </FONT> <BR><FONT face=sans-serif
size=1> Subject:
RE: IFX> PDF/is Issue.</FONT>
<BR></TR></TBODY></TABLE><BR><BR><BR><FONT face=Arial color=blue
size=2>Rick,</FONT> <BR><FONT face=Arial color=blue size=2>Why not just
increase the size of the length field signature? Could this be done by
adding data or comments to the length object, or by adding another
object? I don't know PDF very well. I don't think we need a 0%
probability of confusion, just a statistically insignificant one.</FONT>
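<P><FONT face=Arial size=2>Pete's point can be made concrete with a short calculation. A minimal Python sketch, assuming the stream bytes behave like independent, uniformly random values (roughly true for compressed data, not for structured or deliberately crafted data): the expected number of accidental occurrences of a k-byte signature in n bytes is (n - k + 1) / 256^k, so every extra byte of signature divides the expectation by 256. The function name and the signature lengths tried are illustrative only.</FONT></P>
<PRE>
```python
def expected_false_matches(n_bytes: int, signature_len: int) -> float:
    """Expected accidental occurrences of one fixed byte signature.

    Assumes every byte of the stream data is independent and uniformly
    distributed, which compressed data only approximates. Each of the
    n - k + 1 possible start positions matches with probability 256**-k,
    so by linearity of expectation the expected count is their product.
    """
    positions = max(n_bytes - signature_len + 1, 0)
    return positions / 256 ** signature_len


# b"\r\nendstream" is 11 bytes; the longer signatures are hypothetical.
for k in (11, 16, 24):
    print(k, expected_false_matches(10**9, k))
```
</PRE>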
<BR><FONT face=Arial color=blue size=2>Pete</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT>
<P><FONT face=Impact size=3>Peter Zehler</FONT><FONT face="Times New Roman"
size=3> </FONT><FONT face="Times New Roman" color=red
size=3><BR>XEROX</FONT><FONT face="Times New Roman" size=3> </FONT><FONT
face=Tahoma size=2><BR>Xerox Architecture Center</FONT><FONT
face="Times New Roman" size=3> </FONT><FONT face=Arial size=2><BR>Email:
PZehler@crt.xerox.com</FONT><FONT face="Times New Roman" size=3> </FONT><FONT
face=Arial size=2><BR>Voice: (585) 265-8755</FONT><FONT
face="Times New Roman" size=3> </FONT><FONT face=Arial size=2><BR>FAX:
(585) 265-8871 <BR>US Mail: Peter Zehler</FONT><FONT
face="Times New Roman" size=3> </FONT>
<P><FONT face=Arial size=2> Xerox Corp.</FONT><FONT
face="Times New Roman" size=3> </FONT><FONT face=Arial size=2><BR>
800 Phillips Rd.</FONT><FONT face="Times New Roman"
size=3> </FONT><FONT face=Arial size=2><BR> M/S
128-30E</FONT><FONT face="Times New Roman" size=3> </FONT><FONT face=Arial
size=2><BR> Webster NY, 14580-9701</FONT><FONT
face="Times New Roman" size=3> </FONT>
<P><FONT face=Tahoma size=2>-----Original Message-----<B><BR>From:</B> Rick
Seeler [mailto:rseeler@adobe.com]<B><BR>Sent:</B> Tuesday, March 04, 2003 1:29
PM<B><BR>To:</B> ifx@pwg.org<B><BR>Subject:</B> IFX> PDF/is
Issue.<BR></FONT><BR><FONT face=Arial size=2>During prototyping of PDF/is, the
following problem arose:</FONT> <BR><FONT face="Times New Roman"
size=3> </FONT> <BR><FONT face=Arial size=2>How does the Consumer know
when the end of a data stream (see Section 3.2.7 of [pdf]) has been reached?
Normally, in a PDF file, the Consumer would consult the stream's length field.
The problem here is where to put the length field. If the length
were placed before the stream, the Consumer would know how long the stream is,
but this requires the Producer to know the stream's length before writing it to
the Consumer. If, instead, the length were written after the
stream, this would solve the Producer's problem, but the Consumer would not
know how to find the length, since it cannot identify with 100% certainty where
the stream ends and where the length object begins.</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT> <BR><FONT face=Arial size=2>An
example will illustrate:</FONT> <BR><FONT face=Arial size=2>First, the normal
case...</FONT> <BR><FONT face="Times New Roman" size=3> </FONT> <BR><FONT
face=Arial size=2>stream</FONT> <BR><FONT face=Arial
size=2>sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data
here)....</FONT> <BR><FONT face=Arial
size=2>84trhdvfyu7wgf4.nbdrgur4uaru4gb</FONT> <BR><FONT face=Arial
size=2>endstream</FONT> <BR><FONT face=Arial size=2>12 0 obj</FONT> <BR><FONT
face=Arial size=2>3456 &lt;- the length of the previous
stream.</FONT> <BR><FONT face=Arial size=2>endobj</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT> <BR><FONT face=Arial size=2>But
what if the data looked like this?</FONT> <BR><FONT face="Times New Roman"
size=3> </FONT> <BR><FONT face=Arial size=2>stream</FONT> <BR><FONT
face=Arial size=2>sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary
data here)....</FONT> <BR><FONT face=Arial size=2>endstream
<- the binary data could have a string of bytes
that looked like this.</FONT> <BR><FONT face=Arial
size=2>84trhdvfyu7wgf4.nbdrgur4uaru4gb</FONT> <BR><FONT face=Arial
size=2>endstream</FONT> <BR><FONT face=Arial size=2>12 0 obj</FONT> <BR><FONT
face=Arial size=2>4567 &lt;- the length of the previous
stream.</FONT> <BR><FONT face=Arial size=2>endobj</FONT> <BR><FONT face=Arial
size=2> </FONT> <BR><FONT face=Arial size=2>Of course, you could examine the
bytes after each occurrence of the keyword 'endstream' to see whether it is really
the end of the stream, but you can always construct a stream that would
match your parsing algorithm's expectations (although such streams occur less
and less often as the check becomes stricter).</FONT> <BR><FONT face="Times New Roman"
size=3> </FONT> <BR><FONT face=Arial size=2>Possible solutions:</FONT>
<BR><FONT face=Arial size=2>1) Write all data using ASCII85 encoding (see
Section 3.3.2 of [pdf]). This increases stream lengths by 25%, since every
4 bytes are encoded as 5 characters. ASCII85 has an end-of-data marker (~&gt;),
which would solve this problem: the end of the stream can be known for certain,
and the length field can be placed after the stream.</FONT> <BR><FONT face=Arial size=2>2) Require the Producer
to write the stream length before each stream (the streams would stay binary).
The Producer can use banding to break up large images into chunks small enough
for the Producer to cache each stream before sending it.</FONT> <BR><FONT
face=Arial size=2>3) Offer a combination of 1 &amp; 2. The Producer
would cache streams if possible, but may use ASCII85 if necessary.</FONT>
<BR><FONT face=Arial size=2>4) The Producer must make certain that no stream
contains the byte sequence "\0D\0Aendstream" (CR, LF, then "endstream") in its data. This is
how the spec is currently defined, but this may be too onerous for the
Producer.</FONT> <BR><FONT face="Times New Roman" size=3> </FONT>
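<P><FONT face=Arial size=2>For solution 1, a minimal Python sketch using the standard library's base64.a85encode and base64.a85decode; the function names and the way the data is framed are assumptions of the sketch. Because the ASCII85 alphabet is '!' through 'u' plus 'z', the character '~' never appears in encoded data, so the first "~&gt;" marks the end unambiguously even when the raw data contains "endstream".</FONT></P>
<PRE>
```python
import base64

EOD = b"~>"  # ASCII85 end-of-data marker


def write_ascii85_data(data: bytes) -> bytes:
    """Encode raw stream data as ASCII85 and terminate it with '~>'."""
    return base64.a85encode(data) + EOD


def read_ascii85_data(buf: bytes, start: int):
    """Decode ASCII85 data beginning at `start`.

    Returns (decoded_bytes, offset_just_past_EOD). '~' is outside the
    ASCII85 alphabet ('!'..'u' and 'z'), so the first '~>' is the real end.
    """
    end = buf.index(EOD, start)
    return base64.a85decode(buf[start:end]), end + len(EOD)


raw = b"\x00\x01binary\r\nendstream\r\nmore binary"  # contains the keyword
encoded = write_ascii85_data(raw)
decoded, after = read_ascii85_data(encoded + b"\r\nendstream", 0)
print(decoded == raw, len(encoded) - len(EOD), len(raw))  # prints "True 40 32"
```
</PRE>
<P><FONT face=Arial size=2>The 32 raw bytes become 40 encoded characters, the 25% growth described above; groups of four zero bytes collapse to a single 'z', so real overhead can be slightly lower.</FONT></P>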
<BR><FONT face=Arial size=2>Any other ideas? I'm personally leaning
toward solution #3.</FONT> <BR><FONT face="Times New Roman"
size=3> </FONT>
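<P><FONT face=Arial size=2>Solution 3 might look roughly like this Python sketch on the Producer side, assuming a hypothetical per-stream cache budget (CACHE_LIMIT) and hypothetical object numbers. If the whole stream fits in the cache, it is written in binary with a direct /Length (solution 2); otherwise the Producer switches to ASCII85, ends the data with "~&gt;", and writes the length afterwards as an indirect object referenced from the stream dictionary (solution 1). For testability the sketch returns bytes; a real low-memory Producer would send each piece as soon as it is produced.</FONT></P>
<PRE>
```python
import base64
from typing import Iterable

CACHE_LIMIT = 64 * 1024  # hypothetical per-stream buffer budget, in bytes


def write_stream(obj_num: int, len_obj_num: int, chunks: Iterable[bytes]) -> bytes:
    """Emit one stream object, choosing between the two strategies."""
    it = iter(chunks)
    buffered = bytearray()
    for chunk in it:
        buffered += chunk
        if len(buffered) > CACHE_LIMIT:
            break
    else:
        # Everything fit in the cache: the length is known up front and
        # the data stays binary (solution 2).
        header = f"{obj_num} 0 obj\n<< /Length {len(buffered)} >>\nstream\r\n"
        return header.encode() + bytes(buffered) + b"\r\nendstream\nendobj\n"

    # Too large to cache: encode as ASCII85 so the end is unambiguous, and
    # write the length after the stream as an indirect object (solution 1).
    header = (f"{obj_num} 0 obj\n<< /Length {len_obj_num} 0 R "
              f"/Filter /ASCII85Decode >>\nstream\r\n")
    out = bytearray(header.encode())
    body_start = len(out)
    pending = bytes(buffered)
    for chunk in it:
        pending += chunk
        cut = len(pending) - len(pending) % 4  # encode whole 4-byte groups only
        out += base64.a85encode(pending[:cut])
        pending = pending[cut:]
    out += base64.a85encode(pending) + b"~>"  # final partial group, then EOD
    length = len(out) - body_start  # encoded bytes, including '~>'
    out += b"\r\nendstream\nendobj\n"
    out += f"{len_obj_num} 0 obj\n{length}\nendobj\n".encode()
    return bytes(out)
```
</PRE>
<P><FONT face=Arial size=2>Kari's objection still applies to the fallback path: the ASCII85 data is about 25% larger and must be decoded before it can be handed to a decompressor.</FONT></P>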
<P><FONT face="Times New Roman" size=2>-Rick</FONT>
<P></P></BLOCKQUOTE></BODY></HTML>