<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<TITLE>Message</TITLE>
<META content="MSHTML 6.00.2800.1141" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff
size=2>Kari,</FONT></SPAN></DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff size=2>Thanks
for the explanation. That helps a lot for those of us not very familiar
with PDF.</FONT></SPAN></DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff size=2>Just
to check our objectives for PDF/is:</FONT></SPAN></DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff size=2>We
want to make sure that existing PDF readers can read PDF/is without any
modification, right? </FONT></SPAN></DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff size=2>In
other words, the PDF/is specification is a subset of the full PDF
specification.</FONT></SPAN></DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff
size=2>However, PDF/is writers will most likely be new or modified PDF writers,
right?</FONT></SPAN></DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff size=2>In
other words, since PDF/is is a subset of PDF, the PDF/is writer has to make sure
it doesn't emit those features or representations of PDF that are outside the
PDF/is subset when creating a conforming PDF/is file.</FONT></SPAN></DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=271484416-14032003><FONT face=Arial color=#0000ff
size=2>Tom</FONT></SPAN></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Poysa, Kari <BR><B>Sent:</B>
Wednesday, March 12, 2003 06:04<BR><B>To:</B> Hastings, Tom N; 'Rick Seeler';
'Carl Kugler'<BR><B>Cc:</B> ifx@pwg.org<BR><B>Subject:</B> RE: IFX> PDF/is
Issue.<BR><BR></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=250572413-12032003>Tom,
The Length being discussed here actually is the byte count of the streams
of Image XObjects that belong to the Page. So if the Page is
composed of more than one image (a.k.a. banding), then the sender does not
need to cache even a full page's worth of compressed data in order to be able
to write the Image XObject's stream length in the stream
dictionary.</SPAN></FONT></DIV>
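<DIV><FONT face=Arial color=#0000ff size=2>A minimal sketch of the banding idea above, in Python. It assumes Flate (zlib) compression and an 8-bit gray raster; the helper name band_objects, the object numbers and the dictionary entries are illustrative assumptions, not taken from the PDF/is draft. Each band is compressed on its own, so the writer buffers only one band of compressed data before it knows the value for a direct Length entry.</FONT></DIV>
<PRE>
```python
import zlib

def band_objects(raster, row_bytes, rows_per_band, first_obj_num=10):
    """Yield one serialized image stream object per band of a gray raster."""
    band_bytes = row_bytes * rows_per_band
    for index, start in enumerate(range(0, len(raster), band_bytes)):
        band = raster[start:start + band_bytes]
        data = zlib.compress(band)
        # Only this band's compressed bytes are buffered, so its length is
        # known before the stream dictionary is written.
        header = (
            "%d 0 obj\n"
            "<< /Type /XObject /Subtype /Image /Width %d /Height %d "
            "/ColorSpace /DeviceGray /BitsPerComponent 8 "
            "/Filter /FlateDecode /Length %d >>\n"
            "stream\n"
        ) % (first_obj_num + index, row_bytes, len(band) // row_bytes, len(data))
        yield header.encode("ascii") + data + b"\nendstream\nendobj\n"

# A 100 x 64 page of 8-bit gray pixels, written as four 16-row bands.
page = bytes(100 * 64)
bands = list(band_objects(page, row_bytes=100, rows_per_band=16))
print(len(bands), "bands written")
```
</PRE>
<DIV><FONT face=Arial color=#0000ff size=2>The band height is the tuning knob in this sketch: smaller bands mean less buffering in the sender, at the cost of more objects per page.</FONT></DIV>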
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=250572413-12032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=250572413-12032003>Full
PDF allows the writer to enter an indirect object reference into the required
Length entry. This makes it easy to implement writers because the separate
object for the length can be written after all of the image data has been
written. The PDF files are then read in the reverse order starting from the
end of the file. This works well if one has a file system to store the
complete PDF file. So requiring the Length to be a direct value in the
stream dictionary most likely would cause existing writer SW to have to be
modified. One could not keep writing the same kind of files and claim
them PDF/is compliant.</SPAN></FONT></DIV>
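<DIV><FONT face=Arial color=#0000ff size=2>To make the two forms of the Length entry concrete, here is a small Python sketch that serializes the same stream both ways. The function names and object numbers are hypothetical, and the dictionaries are stripped down to the Length entry only. With a direct value the writer must hold all of the data before it can emit the dictionary; with an indirect reference it can emit chunks as they arrive and write the length object afterwards.</FONT></DIV>
<PRE>
```python
def stream_with_direct_length(obj_num, data):
    """Length is a direct value, so the data must be complete before writing."""
    head = "%d 0 obj\n<< /Length %d >>\nstream\n" % (obj_num, len(data))
    return head.encode("ascii") + data + b"\nendstream\nendobj\n"

def stream_with_indirect_length(obj_num, length_obj_num, chunks):
    """Length is a reference; the length object is written after the data."""
    head = "%d 0 obj\n<< /Length %d 0 R >>\nstream\n" % (obj_num, length_obj_num)
    out = [head.encode("ascii")]
    total = 0
    for chunk in chunks:  # chunks can be emitted as they are produced
        out.append(chunk)
        total += len(chunk)
    out.append(b"\nendstream\nendobj\n")
    # The separate length object follows the stream it describes.
    out.append(("%d 0 obj\n%d\nendobj\n" % (length_obj_num, total)).encode("ascii"))
    return b"".join(out)

direct = stream_with_direct_length(11, b"ABCDEFGH")
indirect = stream_with_indirect_length(11, 12, [b"ABCD", b"EFGH"])
print(direct.decode("ascii"))
print(indirect.decode("ascii"))
```
</PRE>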
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=250572413-12032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=250572413-12032003> --- Kari ---</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Hastings, Tom N
<BR><B>Sent:</B> Tuesday, March 11, 2003 5:49 PM<BR><B>To:</B> Poysa, Kari;
'Rick Seeler'; 'Carl Kugler'<BR><B>Cc:</B> ifx@pwg.org<BR><B>Subject:</B>
RE: IFX> PDF/is Issue.<BR><BR></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003>Kari,</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=375564522-11032003>I
think you neatly summed up the tradeoff between the Sender
and the Receiver when you said:</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003>"If we require the reader to be able to cache a
page's worth of uncompressed data, surely we can require the writer to cache
a page's worth of compressed data [in order to determine the length and send
that length in the stream]."</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=375564522-11032003>I
assume that PDF has the notion of a length for each page, right? So we
require that the Sender put in a length field for each page of data at the
front of each page of data. Can that length field be sent with the
data in some manner, so that the Sender doesn't have to know the lengths of
all of the pages before sending any?</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=375564522-11032003>Tom</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Poysa, Kari
[mailto:Kari.Poysa@usa.xerox.com]<BR><B>Sent:</B> Friday, March 07, 2003
15:04<BR><B>To:</B> 'Rick Seeler'; 'Carl Kugler'<BR><B>Cc:</B>
ifx@pwg.org<BR><B>Subject:</B> RE: IFX> PDF/is
Issue.<BR><BR></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003>Rick, I bet this solution can be implemented, but
it does have some problems for the reader that unfortunately I did not see
earlier. The difficulty really is whether we want to make life easy for
the streaming writer or the reader. </SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003>If the length follows the image stream, the
reader must scan the filtered stream to find the end of the stream. This
can make the reader implementation both cumbersome and slow, especially if
the stream has to be fully decoded during the PDF file parsing, instead of
simply extracting the correct amount of binary data and passing it to a
separate decompression module. The PDF file parser would have to know
details of the compressed streams which should really be of no interest to
the PDF file parser module and makes creating applications from 3rd party
components harder.</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003>In addition, if the reader attempts to decode the
stream, how much data should be cached and decoded at a time? If the end
of stream is not found at first attempt, one has to pass additional data
to the decoder and continue decoding from where the previous data ended. This
makes a robust implementation harder to achieve. The alternative, searching for
the "endstream" text, is not 100% reliable (although very close) and is a
wasted step since no decompression is achieved yet.</SPAN></FONT></DIV>
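<DIV><FONT face=Arial color=#0000ff size=2>The difference between the two reader strategies can be sketched in a few lines of Python. The function names and the sample bytes are made up for illustration. When the length is known up front, the parser slices out the right number of bytes without looking at them; when it has to scan, binary data that happens to contain the marker bytes cuts the stream short.</FONT></DIV>
<PRE>
```python
def read_with_length(buf, start, length):
    """Take exactly `length` bytes; the parser never inspects the data."""
    return buf[start:start + length]

def read_by_scanning(buf, start, marker=b"endstream"):
    """Stop at the first marker match, which may lie inside the data."""
    end = buf.find(marker, start)
    if end == -1:
        raise ValueError("no endstream marker found")
    return buf[start:end]

# Binary data that happens to contain the marker bytes.
payload = b"\x01\x02endstream\x03\x04"
prefix = b"stream\n"
buf = prefix + payload + b"\nendstream\nendobj\n"

by_length = read_with_length(buf, len(prefix), len(payload))
by_scan = read_by_scanning(buf, len(prefix))
print(by_length == payload)  # True
print(by_scan == payload)    # False: truncated at the embedded marker
```
</PRE>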
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003>This issue is really at the heart of what
"streamable" means, and also has a big impact on what kind of low resource
applications PDF/is can be used for. I think we should consider it a
"MUST" for the writer to prefix the stream with its length, since the goal
is to make the file format streamable, especially for a low-resource reader.
If we require the reader to be able to cache a page's worth of
uncompressed data, surely we can require the writer to cache a page's
worth of compressed data.</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003>I do understand Ira McDonald's note about
streaming writers (see separate Email). Possibly the question of whether to
prefix or postfix image streams with their lengths should be a negotiable
capability between the sender and receiver?</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=796070222-07032003> --- Kari
---</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Rick Seeler
[mailto:rseeler@adobe.com]<BR><B>Sent:</B> Thursday, March 06, 2003 2:37
PM<BR><B>To:</B> 'Poysa, Kari'; 'Carl Kugler'<BR><B>Cc:</B>
ifx@pwg.org<BR><B>Subject:</B> RE: IFX> PDF/is
Issue.<BR><BR></FONT></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>Kari,</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>Yes, the stream length should precede the stream, if possible
(this is allowed). But, in the case where the stream may be long,
this may not be possible for the Producer. In that case, the
length should be an indirect object reference to the length that should
come immediately after the stream.</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>As for your idea of scanning for "endstream" followed by the size
object: this still has the same problem as scanning for "endstream" alone;
there is just more data to match and a smaller likelihood of a false
match.</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>Given that, and what I discussed in my previous e-mail on this
subject (to Rob Buckley), I think the best approach might be
to:</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>1) The Producer MUST always write the stream length of all
'Content Streams' and 'ICC Profile' streams immediately in the object
dictionary (before the stream).</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>2) When writing image streams, the Producer MAY
either write the stream length before or after the stream, as they
prefer.</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>3) When the length of an image stream follows the stream (indirect
object), the Consumer SHOULD decode the image stream to determine its
length, when possible. But, the Consumer MAY (at
its peril) scan for the 'endstream' marker.</FONT></SPAN></DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=010362619-06032003><FONT face=Arial color=#0000ff
size=2>How does this sound as a solution?</FONT></SPAN></DIV>
<DIV> </DIV>
<DIV> </DIV><!-- Converted from text/plain format -->
<P><FONT size=2>-Rick<BR></FONT></P>
<BLOCKQUOTE dir=ltr
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">
<DIV></DIV>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left><FONT
face=Tahoma size=2>-----Original Message-----<BR><B>From:</B>
owner-ifx@pwg.org [mailto:owner-ifx@pwg.org] <B>On Behalf Of
</B>Poysa, Kari<BR><B>Sent:</B> Thursday, March 06, 2003 7:15
AM<BR><B>To:</B> 'Carl Kugler'<BR><B>Cc:</B>
ifx@pwg.org<BR><B>Subject:</B> RE: IFX> PDF/is
Issue.<BR><BR></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003>In my opinion the goal should be to write the
stream length immediately to the stream dictionary.
</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003></SPAN></FONT><FONT face=Arial color=#0000ff
size=2><SPAN class=265555414-06032003>Also, the likelihood that
"endstream" occurs in the data is small. We could also
require that if a low resource streaming writer is not able to add the
length directly into the stream dictionary, then the PDF object for the
length MUST immediately follow the stream object. This way, the reader
can scan for "endstream" (but of course only if the length was not
in the stream dictionary) and make sure that it is the correct
"endstream" by verifying that it is immediately followed by
something that looks like a length object. Could reader implementers
comment on this?</SPAN></FONT></DIV>
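<DIV><FONT face=Arial color=#0000ff size=2>As a rough illustration of the check proposed above, this Python sketch accepts a stream end marker only when it is followed by something shaped like a length object. The exact layout it expects (end-of-line, the marker, "endobj", then an object holding a single integer) is an assumption made for the example, as are the function name and the sample bytes. The check reduces false matches but cannot rule them out, since binary data could in principle contain the whole pattern.</FONT></DIV>
<PRE>
```python
import re

# Marker, "endobj", then an object whose only content is an integer.
LENGTH_AFTER_STREAM = re.compile(
    rb"\r?\nendstream\r?\nendobj\r?\n(\d+) (\d+) obj\r?\n(\d+)\r?\nendobj"
)

def find_stream_end(buf, start):
    """Return (end_offset, declared_length) for the first verified marker."""
    match = LENGTH_AFTER_STREAM.search(buf, start)
    if match is None:
        raise ValueError("no verified end-of-stream marker found")
    return match.start(), int(match.group(3))

# The payload contains a bare marker that is not followed by a length object.
payload = b"\x01\x02\r\nendstream\x03\x04"
prefix = b"stream\r\n"
buf = prefix + payload + b"\r\nendstream\r\nendobj\r\n12 0 obj\r\n15\r\nendobj\r\n"

end, declared = find_stream_end(buf, len(prefix))
data = buf[len(prefix):end]
print(data == payload, declared == len(payload))
```
</PRE>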
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003>I think introducing an additional filter like
ASCII85 just for spotting the end of stream adds unnecessary
complexity to both writer and reader, increases file sizes and also
requires more memory and processing as the stream cannot be passed
directly to a decompressor.</SPAN></FONT></DIV>
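<DIV><FONT face=Arial color=#0000ff size=2>The size cost of ASCII85 is easy to measure with Python's standard base64 module. This is only a sketch: the sample data is a stand-in for compressed image bytes, and base64.a85encode with adobe=True also adds a leading marker that a PDF ASCII85 stream does not need, since PDF relies on the trailing end-of-data marker alone. Every 4 input bytes become 5 output characters, which gives the increase of roughly 25%.</FONT></DIV>
<PRE>
```python
import base64

# Stand-in for compressed image data (4096 bytes, no all-zero groups).
data = bytes(range(256)) * 16
encoded = base64.a85encode(data, adobe=True)

# Adobe framing adds "<~" in front and the "~>" end-of-data marker.
print(encoded[:2], encoded[-2:])
overhead = (len(encoded) - len(data)) / len(data)
print("size increase: %.1f%%" % (overhead * 100))

# Finding the end needs no length: "~" cannot occur inside the encoded data.
end = encoded.index(b"~>")
assert base64.a85decode(encoded[:end + 2], adobe=True) == data
```
</PRE>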
<DIV><FONT face=Arial color=#0000ff size=2></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=265555414-06032003> --- Kari
---</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
size=2>-----Original Message-----<BR><B>From:</B> Carl Kugler
[mailto:kugler@us.ibm.com]<BR><B>Sent:</B> Wednesday, March 05, 2003
10:50 AM<BR><B>Cc:</B> ifx@pwg.org<BR><B>Subject:</B> RE: IFX>
PDF/is Issue.<BR><BR></FONT></DIV><BR><FONT face=sans-serif size=2>I
like the chunking approach. It is efficient, reliable, and has
low overhead for reasonably sized chunks. Also fits well in a
typical implementation that writes a chunk of data at a time.</FONT>
<BR><BR><FONT face=sans-serif size=2>
-Carl</FONT> <BR><BR><BR><BR>
<TABLE width="100%">
<TBODY>
<TR vAlign=top>
<TD>
<TD><FONT face=sans-serif size=1><B>"Zehler, Peter"
<PZehler@crt.xerox.com></B></FONT> <BR><FONT
face=sans-serif size=1>Sent by: owner-ifx@pwg.org</FONT>
<P><FONT face=sans-serif size=1>03/05/2003 05:00 AM</FONT>
<BR></P>
<TD><FONT face=Arial size=1>
</FONT><BR><FONT face=sans-serif size=1>
To: "'Rick Seeler'"
<rseeler@adobe.com>, ifx@pwg.org</FONT> <BR><FONT
face=sans-serif size=1> cc:
</FONT> <BR><FONT face=sans-serif
size=1> Subject:
RE: IFX> PDF/is Issue.</FONT>
<BR></TD></TR></TBODY></TABLE><BR><BR><BR><FONT face=Arial
color=blue size=2>Rick,</FONT> <BR><FONT face=Arial color=blue
size=2>Why not just increase the size of the length field signature?
Could this be done by the addition of data or comments in the
length object or by adding another object? I don't know PDF
very well. I don't think we need a 0% probability of confusion,
just a statistically insignificant chance.</FONT> <BR><FONT
face=Arial color=blue size=2>Pete</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT>
<P><FONT face=Impact size=3>Peter Zehler</FONT><FONT
face="Times New Roman" size=3> </FONT><FONT face="Times New Roman"
color=red size=3><BR>XEROX</FONT><FONT face="Times New Roman"
size=3> </FONT><FONT face=Tahoma size=2><BR>Xerox Architecture
Center</FONT><FONT face="Times New Roman" size=3> </FONT><FONT
face=Arial size=2><BR>Email: PZehler@crt.xerox.com</FONT><FONT
face="Times New Roman" size=3> </FONT><FONT face=Arial
size=2><BR>Voice: (585) 265-8755</FONT><FONT
face="Times New Roman" size=3> </FONT><FONT face=Arial
size=2><BR>FAX: (585) 265-8871 <BR>US Mail:
Peter Zehler</FONT><FONT face="Times New Roman" size=3> </FONT>
<P><FONT face=Arial size=2> Xerox
Corp.</FONT><FONT face="Times New Roman" size=3> </FONT><FONT
face=Arial size=2><BR> 800 Phillips
Rd.</FONT><FONT face="Times New Roman" size=3> </FONT><FONT
face=Arial size=2><BR> M/S
128-30E</FONT><FONT face="Times New Roman" size=3> </FONT><FONT
face=Arial size=2><BR> Webster NY,
14580-9701</FONT><FONT face="Times New Roman" size=3> </FONT>
<P><FONT face=Tahoma size=2>-----Original
Message-----<B><BR>From:</B> Rick Seeler
[mailto:rseeler@adobe.com]<B><BR>Sent:</B> Tuesday, March 04, 2003
1:29 PM<B><BR>To:</B> ifx@pwg.org<B><BR>Subject:</B> IFX> PDF/is
Issue.<BR></FONT><BR><FONT face=Arial size=2>During prototyping of
PDF/is the following problem arose:</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT> <BR><FONT face=Arial
size=2>How does the Consumer know when the end of a data stream (See
section 3.2.7 of [pdf]) is reached? Normally, in a PDF, the
Consumer would consult the stream length field. The problem
here is where to put the length field. If the length were
placed before the stream, the Consumer would know how long the
stream is. This requires the Producer to know the stream's length
before writing it to the Consumer. If, instead, the length
were written at the end of the stream, this would solve the
Producer's problem but the Consumer would not know how to find the
length since they can't identify, 100% of the time, where the stream
ends and where the length object is.</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT> <BR><FONT face=Arial
size=2>An example will illustrate:</FONT> <BR><FONT face=Arial
size=2>First, the normal case...</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT> <BR><FONT face=Arial
size=2>stream</FONT> <BR><FONT face=Arial
size=2>sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary
data here)....</FONT> <BR><FONT face=Arial
size=2>84trhdvfyu7wgf4.nbdrgur4uaru4gb</FONT> <BR><FONT face=Arial
size=2>endstream</FONT> <BR><FONT face=Arial size=2>12 0 obj</FONT>
<BR><FONT face=Arial size=2>3456 <- the length of
the previous stream.</FONT> <BR><FONT face=Arial
size=2>endobj</FONT> <BR><FONT face="Times New Roman"
size=3> </FONT> <BR><FONT face=Arial size=2>But, what if the
data looked like this...</FONT> <BR><FONT face="Times New Roman"
size=3> </FONT> <BR><FONT face=Arial size=2>stream</FONT>
<BR><FONT face=Arial
size=2>sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary
data here)....</FONT> <BR><FONT face=Arial size=2>endstream
<- the binary data could have a
string of bytes that looked like this.</FONT> <BR><FONT face=Arial
size=2>84trhdvfyu7wgf4.nbdrgur4uaru4gb</FONT> <BR><FONT face=Arial
size=2>endstream</FONT> <BR><FONT face=Arial size=2>12 0 obj</FONT>
<BR><FONT face=Arial size=2>4567 <- the length of
the previous stream.</FONT> <BR><FONT face=Arial
size=2>endobj</FONT> <BR><FONT face=Arial size=2> </FONT>
<BR><FONT face=Arial size=2>Of course, you could look to bytes after
the appearance of the word 'endstream' to see if this is really the
end of the stream; but you can always come up with a stream that
could match your parsing algorithm's expectations (although with
decreasing percentage of occurrence).</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT> <BR><FONT face=Arial
size=2>Possible solutions:</FONT> <BR><FONT face=Arial size=2>1)
Write all data using ASCII85 encoding (See Section 3.3.2 of [pdf]).
This will increase stream lengths by 25%. ASCII85 has a
stream delimiter which would solve this problem -- the end of the
stream can be known for certain and the length field can be placed
after the stream.</FONT> <BR><FONT face=Arial size=2>2) Require the
Producer to write the stream length before any stream (the streams
would stay binary). The Producer can use banding to break up
large images into small enough chunks so the Producer can cache the
stream before sending.</FONT> <BR><FONT face=Arial size=2>3) Offer a
combination of 1 & 2. The Producer would cache streams if
possible, but may use ASCII85, if necessary.</FONT> <BR><FONT
face=Arial size=2>4) The Producer must make certain that no stream
contains the byte sequence "\0D\0Aendstream" in its data.
This is how the spec is defined currently -- but this may be
too onerous for the Producer.</FONT> <BR><FONT
face="Times New Roman" size=3> </FONT> <BR><FONT face=Arial
size=2>Any other ideas? I'm personally leaning toward solution
#3.</FONT> <BR><FONT face="Times New Roman" size=3> </FONT>
<P><FONT face="Times New Roman" size=2>-Rick</FONT>
<P><FONT face="Times New Roman" size=3></FONT>
<P>
<P></P></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>