<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=US-ASCII">
<TITLE>Message</TITLE>
<META content="MSHTML 6.00.2800.1106" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003>Kari,</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=375564522-11032003>I
think you summed up the tradeoff between the Sender and 
the Receiver simply when you said:</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=375564522-11032003>"If we
require the reader to be able to cache a page's worth of uncompressed data,
surely we can require the writer to cache a page's worth of compressed data [in
order to determine the length and send that length in the
stream]."</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=375564522-11032003>I
assume that PDF has the notion of a length for each page, right? So we
require that the Sender put a length field at the front 
of each page of data. Can that length field be sent with the data in some 
manner, so that the Sender doesn't have to know the lengths of all of the pages
before sending any?</SPAN></FONT></DIV>
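<DIV><FONT face=Arial size=2>A minimal sketch of what sending the length with each page could look like, assuming a hypothetical framing (a 4-byte big-endian length in front of each compressed page) rather than actual PDF syntax. The names write_page and read_pages are illustrative only. The writer buffers a single compressed page at a time and never needs the lengths of later pages.</FONT></DIV>
<PRE>
```python
import io
import struct
import zlib

def write_page(out, raw_page: bytes) -> None:
    """Compress one page, then emit a 4-byte length followed by the data."""
    compressed = zlib.compress(raw_page)      # only this page is buffered
    out.write(struct.pack(">I", len(compressed)))
    out.write(compressed)

def read_pages(inp):
    """Yield decompressed pages, locating each one by its length prefix alone."""
    while True:
        header = inp.read(4)
        if not header:
            return                            # clean end of input
        (length,) = struct.unpack(">I", header)
        yield zlib.decompress(inp.read(length))

# Hypothetical page contents, written one page at a time.
wire = io.BytesIO()
for page in (b"page one " * 100, b"page two " * 50):
    write_page(wire, page)
wire.seek(0)
pages = list(read_pages(wire))
```
</PRE>
<DIV><FONT face=Arial size=2>The cost on the Sender side is one page of compressed data held in memory before the first byte of that page goes out.</FONT></DIV>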
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003>Tom</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Poysa, Kari
[mailto:Kari.Poysa@usa.xerox.com]<BR><B>Sent:</B> Friday, March 07, 2003
15:04<BR><B>To:</B> 'Rick Seeler'; 'Carl Kugler'<BR><B>Cc:</B>
ifx@pwg.org<BR><B>Subject:</B> RE: IFX> PDF/is Issue.<BR><BR></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003>Rick, I bet this solution can be implemented, but it
does have some problems for the reader that unfortunately I did not see
earlier. The difficulty really is whether we want to make life easy for the
streaming writer or the reader. </SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=796070222-07032003>If
the length follows the image stream, the reader must scan the filtered stream
to find the end of the stream. This can make the reader implementation both
cumbersome and slow, especially if the stream has to be fully decoded during
the PDF file parsing, instead of simply extracting the correct amount of
binary data and passing it to a separate decompression module. The PDF file 
parser would have to know details of the compressed streams that should 
really be of no interest to it, which makes building applications from 
third-party components harder.</SPAN></FONT></DIV>
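<DIV><FONT face=Arial size=2>To illustrate the separation described above, here is a rough sketch of a parser that uses only a direct /Length value to slice out the stream payload and hands it on untouched. The function name extract_stream and the sample object are assumptions made for illustration, and the parsing is deliberately simplistic (it is not a conforming PDF parser).</FONT></DIV>
<PRE>
```python
import re
import zlib

def extract_stream(obj: bytes) -> bytes:
    """Slice out the stream payload using only the direct /Length value."""
    length = int(re.search(rb"/Length (\d+)", obj).group(1))
    start = obj.index(b"stream\r\n") + len(b"stream\r\n")
    return obj[start:start + length]     # no knowledge of the filter needed

# Hypothetical stream object with the length written before the data.
payload = zlib.compress(b"image data " * 20)
obj = (b"5 0 obj\r\n<< /Filter /FlateDecode /Length %d >>\r\nstream\r\n" % len(payload)
       + payload + b"\r\nendstream\r\nendobj\r\n")

raw = extract_stream(obj)                # parser side: plain byte slicing
decoded = zlib.decompress(raw)           # separate decompression module
```
</PRE>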
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=796070222-07032003>In
addition, if the reader attempts to decode the stream, how much data should be
cached and decoded at a time? If the end of the stream is not found on the first 
attempt, one has to pass additional data to the decoder and continue decoding 
from where the previous data ended. This complexity can delay reaching a robust 
implementation. The alternative, searching for the "endstream" text, is not 
100% reliable (although very close) and is a wasted step, since it achieves no 
decompression.</SPAN></FONT></DIV>
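<DIV><FONT face=Arial size=2>The incremental decoding described above might look roughly like this for a Flate-compressed stream, using Python's zlib as a stand-in decoder. The chunk size and the name find_stream_end are assumptions, and other filters would need their own end-of-data detection.</FONT></DIV>
<PRE>
```python
import zlib

def find_stream_end(data: bytes, chunk_size: int = 64):
    """Feed fixed-size chunks to a Flate decoder until it reports end of stream.

    Returns (decoded_bytes, compressed_length).
    """
    decoder = zlib.decompressobj()
    decoded = bytearray()
    consumed = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        decoded += decoder.decompress(chunk)   # continues where the last chunk ended
        consumed += len(chunk)
        if decoder.eof:
            # Bytes past the end of the compressed stream were not consumed.
            return bytes(decoded), consumed - len(decoder.unused_data)
    raise ValueError("input ended before the compressed stream did")

# Hypothetical stream followed by a trailing length object.
compressed = zlib.compress(b"scanline " * 500)
trailer = (b"\r\nendstream\r\n12 0 obj\r\n"
           + str(len(compressed)).encode() + b"\r\nendobj\r\n")
decoded, length = find_stream_end(compressed + trailer, chunk_size=16)
```
</PRE>
<DIV><FONT face=Arial size=2>Note that the file parser now has to drive the decoder itself, which is the coupling between parsing and decompression that a length prefix avoids.</FONT></DIV>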
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=796070222-07032003>This
issue is really at the heart of what "streamable" means, and also has a big
impact on what kind of low-resource applications PDF/is can be used for. I 
think we should consider it a "MUST" for the writer to prefix the stream with 
its length, since the goal is to make the file format streamable, especially for 
a low-resource reader. If we require the reader to be able to cache a page's 
worth of uncompressed data, surely we can require the writer to cache a page's
worth of compressed data.</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=796070222-07032003>I do
understand Ira McDonald's note about streaming writers (see separate e-mail). 
Possibly the question of whether to prefix or postfix image streams with their 
lengths should be a negotiable capability between the sender and 
receiver?</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003> --- Kari ---</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Rick Seeler
[mailto:rseeler@adobe.com]<BR><B>Sent:</B> Thursday, March 06, 2003 2:37
PM<BR><B>To:</B> 'Poysa, Kari'; 'Carl Kugler'<BR><B>Cc:</B>
ifx@pwg.org<BR><B>Subject:</B> RE: IFX> PDF/is
Issue.<BR><BR></FONT></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>Kari,</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>Yes, the stream length should precede the stream, if possible (this
is allowed). But, in the case where the stream may be long, this may
not be possible for the Producer. In that case, the length should be 
an indirect object reference to a length object that comes immediately 
after the stream.</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff size=2>As
for your idea of scanning for an "endstream" that is followed by the size 
object: this still has the same problem as scanning for "endstream" alone. 
It just matches on more data, so a false match is less 
likely.</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>Given that, and what I discussed in my previous e-mail on this
subject (to Rob Buckley), I think the best approach might be
to:</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff size=2>1)
The Producer MUST always write the stream length of all 'Content Streams'
and 'ICC Profile' streams immediately in the object dictionary (before the
stream).</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff size=2>2)
When writing image streams, the Producer MAY write the 
stream length either before or after the stream, as it 
prefers.</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff size=2>3)
When an image stream's length follows the stream (as an indirect object), the Consumer 
SHOULD decode the image stream to determine its length, when 
possible. But the Consumer MAY (at its peril) scan for 
the 'endstream' marker.</FONT></SPAN></DIV>
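<DIV><FONT face=Arial size=2>On the Consumer side, the three rules above come down to inspecting the /Length entry and picking a strategy. A small sketch, assuming the stream dictionary is available as raw bytes. The function name and the strategy labels are made up for illustration.</FONT></DIV>
<PRE>
```python
import re

INDIRECT = re.compile(rb"/Length\s+(\d+)\s+(\d+)\s+R")   # e.g. /Length 12 0 R
DIRECT = re.compile(rb"/Length\s+(\d+)")                 # e.g. /Length 3456

def length_strategy(dictionary: bytes):
    """Decide how the Consumer should find the end of the stream."""
    m = INDIRECT.search(dictionary)
    if m:
        # Length object follows the stream: decode to find the end (rule 3).
        return ("decode-to-find-end", int(m.group(1)))
    m = DIRECT.search(dictionary)
    if m:
        # Length precedes the stream: read exactly that many bytes.
        return ("read-exact", int(m.group(1)))
    raise ValueError("stream dictionary has no /Length entry")

before = length_strategy(b"/Filter /FlateDecode /Length 3456")
after = length_strategy(b"/Filter /FlateDecode /Length 12 0 R")
```
</PRE>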
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>How does this sound as a solution?</FONT></SPAN></DIV>
<DIV> </DIV>
<DIV> </DIV><!-- Converted from text/plain format -->
<P><FONT size=2>-Rick<BR></FONT></P>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV></DIV>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left><FONT
face=Tahoma size=2>-----Original Message-----<BR><B>From:</B>
owner-ifx@pwg.org [mailto:owner-ifx@pwg.org] <B>On Behalf Of </B>Poysa,
Kari<BR><B>Sent:</B> Thursday, March 06, 2003 7:15 AM<BR><B>To:</B> 'Carl
Kugler'<BR><B>Cc:</B> ifx@pwg.org<BR><B>Subject:</B> RE: IFX> PDF/is
Issue.<BR><BR></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003>In my opinion the goal should be to write the
stream length immediately to the stream dictionary. </SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003></SPAN></FONT><FONT face=Arial color=#0000ff
size=2><SPAN class=265555414-06032003>Also, the likelihood of 
"endstream" occurring in the data is small. We could also require 
that if a low-resource streaming writer is not able to add the length 
directly into the stream dictionary, then the PDF object for the length 
MUST immediately follow the stream object. This way, the reader can scan 
for "endstream" (but of course only if the length was not in the stream 
dictionary) and make sure that it is the correct "endstream" by 
verifying that it is immediately followed by something that looks like a 
length object. Could reader implementers comment on 
this?</SPAN></FONT></DIV>
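<DIV><FONT face=Arial size=2>A rough sketch of the scan-and-verify idea, assuming the exact byte layout used in the examples elsewhere in this thread (CRLF line ends, a bare integer as the length object). The pattern and the name scan_for_end are illustrative. The second case shows how binary data that happens to contain the same pattern still produces a false match.</FONT></DIV>
<PRE>
```python
import re

# "endstream" immediately followed by something shaped like a length object.
CANDIDATE = re.compile(rb"\r\nendstream\r\n\d+ \d+ obj\r\n\d+\r\nendobj")

def scan_for_end(data: bytes) -> int:
    """Return the offset of the first 'endstream' followed by a length-like object."""
    m = CANDIDATE.search(data)
    if m is None:
        raise ValueError("no plausible end of stream found")
    return m.start()

tail = b"\r\nendstream\r\n12 0 obj\r\n%d\r\nendobj\r\n"

# Ordinary payload: the scan finds the real end.
plain = b"A" * 40
end_ok = scan_for_end(plain + tail % len(plain))

# Payload that embeds the same pattern: the scan stops too early.
tricky = b"B" * 10 + tail % 10 + b"C" * 10
end_bad = scan_for_end(tricky + tail % len(tricky))
```
</PRE>
<DIV><FONT face=Arial size=2>Here end_ok equals the true payload length, while end_bad points at the embedded look-alike instead of the real end of the stream.</FONT></DIV>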
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003>I think introducing an additional filter like
ASCII85 just for spotting the end of stream adds unnecessary complexity to
both writer and reader, increases file sizes and also requires more memory
and processing as the stream cannot be passed directly to a
decompressor.</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003> --- Kari
---</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Carl Kugler
[mailto:kugler@us.ibm.com]<BR><B>Sent:</B> Wednesday, March 05, 2003
10:50 AM<BR><B>Cc:</B> ifx@pwg.org<BR><B>Subject:</B> RE: IFX> PDF/is
Issue.<BR><BR></FONT></DIV><BR><FONT face=sans-serif size=2>I like the
chunking approach. It is efficient, reliable, and has low overhead 
for reasonably sized chunks. It also fits well in a typical 
implementation that writes a chunk of data at a time.</FONT> 
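<DIV><FONT face=Arial size=2>The chunking proposal itself is not quoted in this thread, so the following is only a guess at its general shape: each chunk carries its own small length prefix and a zero-length chunk marks the end, much like HTTP chunked transfer coding. The 2-byte prefix and the function names are assumptions.</FONT></DIV>
<PRE>
```python
import io
import struct

def write_chunked(out, chunks) -> None:
    """Write each non-empty chunk with a 2-byte length prefix, then a terminator."""
    for chunk in chunks:
        if chunk:
            out.write(struct.pack(">H", len(chunk)))   # chunk must fit in 65535 bytes
            out.write(chunk)
    out.write(struct.pack(">H", 0))                    # zero-length chunk marks the end

def read_chunked(inp) -> bytes:
    """Reassemble the stream without scanning the data for any marker."""
    data = bytearray()
    while True:
        (size,) = struct.unpack(">H", inp.read(2))
        if size == 0:
            return bytes(data)
        data += inp.read(size)

# The payload may contain "endstream" without confusing the reader.
parts = [b"first chunk ", b"\r\nendstream\r\n", b"last chunk"]
wire = io.BytesIO()
write_chunked(wire, parts)
wire.seek(0)
result = read_chunked(wire)
overhead = len(wire.getvalue()) - len(result)          # 2 bytes per chunk plus terminator
```
</PRE>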
<BR><BR><FONT face=sans-serif size=2>
-Carl</FONT> <BR><BR><BR><BR>
<TABLE width="100%">
<TBODY>
<TR vAlign=top>
<TD>
<TD><FONT face=sans-serif size=1><B>"Zehler, Peter"
&lt;PZehler@crt.xerox.com&gt;</B></FONT> <BR><FONT face=sans-serif 
size=1>Sent by: owner-ifx@pwg.org</FONT>
<P><FONT face=sans-serif size=1>03/05/2003 05:00 AM</FONT>
<BR></P>
<TD><FONT face=Arial size=1>
</FONT><BR><FONT face=sans-serif size=1>
To: "'Rick Seeler'"
&lt;rseeler@adobe.com&gt;, ifx@pwg.org</FONT> <BR><FONT 
face=sans-serif size=1> cc:
</FONT> <BR><FONT face=sans-serif
size=1> Subject:
RE: IFX> PDF/is Issue.</FONT>
<BR></TD></TR></TBODY></TABLE><BR><BR><BR><FONT face=Arial color=blue
size=2>Rick,</FONT> <BR><FONT face=Arial color=blue size=2>Why not just
increase the size of the length field signature? Could this be
done by the addition of data or comments in the length object or by
adding another object? I don't know PDF very well. I don't 
think we need 0% probability of confusion, just a statistically 
insignificant chance.</FONT> <BR><FONT face=Arial color=blue
size=2>Pete</FONT> <BR><FONT face="Times New Roman" size=3> </FONT>
<P><FONT face=Impact size=3>Peter Zehler</FONT><FONT
face="Times New Roman" size=3> </FONT><FONT face="Times New Roman"
color=red size=3><BR>XEROX</FONT><FONT face="Times New Roman" size=3>
</FONT><FONT face=Tahoma size=2><BR>Xerox Architecture
Center</FONT><FONT face="Times New Roman" size=3> </FONT><FONT
face=Arial size=2><BR>Email: PZehler@crt.xerox.com</FONT><FONT
face="Times New Roman" size=3> </FONT><FONT face=Arial size=2><BR>Voice:
(585) 265-8755</FONT><FONT face="Times New Roman" size=3>
</FONT><FONT face=Arial size=2><BR>FAX: (585)
265-8871 <BR>US Mail: Peter Zehler</FONT><FONT face="Times New Roman"
size=3> </FONT>
<P><FONT face=Arial size=2> Xerox
Corp.</FONT><FONT face="Times New Roman" size=3> </FONT><FONT face=Arial
size=2><BR> 800 Phillips Rd.</FONT><FONT
face="Times New Roman" size=3> </FONT><FONT face=Arial size=2><BR>
M/S 128-30E</FONT><FONT face="Times New Roman"
size=3> </FONT><FONT face=Arial size=2><BR>
Webster NY, 14580-9701</FONT><FONT face="Times New Roman" size=3>
</FONT>
<P><FONT face=Tahoma size=2>-----Original Message-----<B><BR>From:</B>
Rick Seeler [mailto:rseeler@adobe.com]<B><BR>Sent:</B> Tuesday, March
04, 2003 1:29 PM<B><BR>To:</B> ifx@pwg.org<B><BR>Subject:</B> IFX>
PDF/is Issue.<BR></FONT><BR><FONT face=Arial size=2>During prototyping
of PDF/is the following problem arose:</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT> <BR><FONT face=Arial
size=2>How does the Consumer know when the end of a data stream (See
section 3.2.7 of [pdf]) is reached? Normally, in a PDF, the
Consumer would consult the stream length field. The problem here
is where to put the length field. If the length were placed before
the stream, the Consumer would know how long the stream is. This
requires the Producer to know the stream's length before writing it to
the Consumer. If, instead, the length were written at the end of
the stream, this would solve the Producer's problem but the Consumer
would not know how to find the length since they can't identify, 100% of
the time, where the stream ends and where the length object is.</FONT>
<BR><FONT face="Times New Roman" size=3> </FONT> <BR><FONT
face=Arial size=2>An example will illustrate:</FONT> <BR><FONT
face=Arial size=2>First, the normal case...</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT> <BR><FONT face=Arial
size=2>stream</FONT> <BR><FONT face=Arial
size=2>sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data
here)....</FONT> <BR><FONT face=Arial
size=2>84trhdvfyu7wgf4.nbdrgur4uaru4gb</FONT> <BR><FONT face=Arial
size=2>endstream</FONT> <BR><FONT face=Arial size=2>12 0 obj</FONT>
<BR><FONT face=Arial size=2>3456 <- the length of the
previous stream.</FONT> <BR><FONT face=Arial size=2>endobj</FONT>
<BR><FONT face="Times New Roman" size=3> </FONT> <BR><FONT
face=Arial size=2>But, what if the data looked like this...</FONT>
<BR><FONT face="Times New Roman" size=3> </FONT> <BR><FONT
face=Arial size=2>stream</FONT> <BR><FONT face=Arial
size=2>sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data
here)....</FONT> <BR><FONT face=Arial size=2>endstream
<- the binary data could have a string of
bytes that looked like this.</FONT> <BR><FONT face=Arial
size=2>84trhdvfyu7wgf4.nbdrgur4uaru4gb</FONT> <BR><FONT face=Arial
size=2>endstream</FONT> <BR><FONT face=Arial size=2>12 0 obj</FONT>
<BR><FONT face=Arial size=2>4567 <- the length of the
previous stream.</FONT> <BR><FONT face=Arial size=2>endobj</FONT>
<BR><FONT face=Arial size=2> </FONT> <BR><FONT face=Arial size=2>Of
course, you could look at the bytes after the appearance of the word 
'endstream' to see if this is really the end of the stream; but you can 
always come up with a stream that matches your parsing algorithm's 
expectations (although with a decreasing probability of occurrence).</FONT> 
<BR><FONT face="Times New Roman" size=3> </FONT> <BR><FONT
face=Arial size=2>Possible solutions:</FONT> <BR><FONT face=Arial
size=2>1) Write all data using ASCII85 encoding (See Section 3.3.2 of
[pdf]). This will increase stream lengths by 25%. ASCII85
has a stream delimiter which would solve this problem -- the end of the
stream can be known for certain and the length field can be placed after
the stream.</FONT> <BR><FONT face=Arial size=2>2) Require the Producer
to write the stream length before any stream (the streams would stay
binary). The Producer can use banding to break up large images
into small enough chunks so the Producer can cache the stream before
sending.</FONT> <BR><FONT face=Arial size=2>3) Offer a combination of 1
& 2. The Producer would cache streams if possible, but may use
ASCII85, if necessary.</FONT> <BR><FONT face=Arial size=2>4) The Producer 
must make certain that no stream contains the byte sequence 
"\0D\0Aendstream" in its data. This is how the spec is 
defined currently -- but this may be too onerous for the 
Producer.</FONT> <BR><FONT face="Times New Roman" size=3> </FONT>
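<DIV><FONT face=Arial size=2>The 25% figure for solution 1 follows from ASCII85 mapping every 4 bytes to 5 characters. A quick check using Python's base64.a85encode, which with adobe=True also adds a 2-byte start marker that a PDF stream would not need. The sample data is arbitrary and chosen to avoid all-zero groups, which the encoding abbreviates.</FONT></DIV>
<PRE>
```python
import base64

data = bytes(range(1, 201))                 # 200 bytes, no all-zero 4-byte groups
encoded = base64.a85encode(data, adobe=True)

body = encoded[2:-2]                        # strip the 2-byte start and end markers
overhead = len(body) / len(data) - 1        # 5 output characters per 4 input bytes
has_end_marker = encoded.endswith(b"~>")    # unambiguous end-of-data delimiter

roundtrip = base64.a85decode(encoded, adobe=True)
```
</PRE>
<DIV><FONT face=Arial size=2>Because "~>" cannot occur inside the encoded body, the Consumer can find the end of the stream by scanning, at the cost of the larger stream and an extra decoding pass.</FONT></DIV>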
<BR><FONT face=Arial size=2>Any other ideas? I'm personally
leaning toward solution #3.</FONT> <BR><FONT face="Times New Roman"
size=3> </FONT>
<P><FONT face="Times New Roman" size=2>-Rick</FONT>
<P><FONT face="Times New Roman" size=3></FONT>
<P>
<P></P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>