IFX Mail Archive: RE: IFX> PDF/is Issue.

From: Buckley, Robert R (RBuckley@crt.xerox.com)
Date: Thu Mar 06 2003 - 10:20:42 EST

Rick et al.,

I would go with #2. In fact, in our prototyping we include the codestream
length explicitly in the image object, rather than as an indirect object
reference, to address the problem you describe. I would not support #1,
simply because it would increase file size.
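
To make the distinction concrete, here is a minimal sketch of the two ways the length could be written. The function name image_object, the object numbering, and the stripped-down image dictionary (no Width, Height, or filter entries) are assumptions for illustration only, not taken from the PDF/is draft or from the prototype.

```python
def image_object(obj_num: int, codestream: bytes, direct_length: bool = True) -> bytes:
    """Serialize a simplified image stream object (hypothetical helper)."""
    if direct_length:
        # Length is written inline, so a streaming Consumer sees it before the data.
        length_entry = b"%d" % len(codestream)
        trailer = b""
    else:
        # Length is an indirect reference; the referenced object follows the stream.
        length_obj = obj_num + 1  # assumed numbering, for illustration only
        length_entry = b"%d 0 R" % length_obj
        trailer = b"%d 0 obj\n%d\nendobj\n" % (length_obj, len(codestream))
    header = b"%d 0 obj\n<< /Type /XObject /Subtype /Image /Length " % obj_num
    return (header + length_entry + b" >>\nstream\r\n"
            + codestream + b"\r\nendstream\nendobj\n" + trailer)


direct = image_object(11, b"abc")
indirect = image_object(11, b"abc", direct_length=False)
```

With the direct form the Consumer can read exactly the stated number of bytes; with the indirect form the value only arrives after the stream it describes.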

Rob

-----Original Message-----
From: Rick Seeler [mailto:rseeler@adobe.com]
Sent: Tuesday, March 04, 2003 1:29 PM
To: ifx@pwg.org
Subject: IFX> PDF/is Issue.

During prototyping of PDF/is the following problem arose:

How does the Consumer know when the end of a data stream (See section 3.2.7
of [pdf]) is reached? Normally, in a PDF, the Consumer would consult the
stream length field. The problem here is where to put the length field. If
the length were placed before the stream, the Consumer would know how long
the stream is. But this requires the Producer to know the stream's length
before writing it to the Consumer. If, instead, the length were written after
the end of the stream, this would solve the Producer's problem, but the
Consumer would not know how to find the length, since it can't identify, 100%
of the time, where the stream ends and where the length object begins.

An example will illustrate:
First, the normal case...

stream
sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....
84trhdvfyu7wgf4.nbdrgur4uaru4gb
endstream
12 0 obj
3456 <- the length of the previous stream.
endobj

But, what if the data looked like this...

stream
sdljfiwefnwfubrevurewliysnhr;hgawebfz;h;uwre (lots of binary data here)....
endstream <- the binary data could have a string of bytes that looked like this.
84trhdvfyu7wgf4.nbdrgur4uaru4gb
endstream
12 0 obj
4567 <- the length of the previous stream.
endobj

Of course, you could look at the bytes after the appearance of the word
'endstream' to see if this is really the end of the stream; but one can
always come up with a stream that matches your parsing algorithm's
expectations (although with decreasing probability of occurrence).
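
The failure mode described above can be reproduced in a few lines. This is only an illustrative sketch: the payload bytes, the object number, and the function name naive_stream_end are made up, and the marker used is the "\0D\0Aendstream" sequence mentioned in the message.

```python
MARKER = b"\r\nendstream"


def naive_stream_end(buf: bytes) -> int:
    """Offset of the first marker, which is where a forward-scanning Consumer would stop."""
    return buf.find(MARKER)


# Binary payload that happens to contain the marker bytes in the middle.
payload = b"\x01\x02binary" + MARKER + b"more binary\x03"

# The length is written only after the stream, as in the example above.
framed = (payload + MARKER + b"\r\n12 0 obj\r\n"
          + str(len(payload)).encode("ascii") + b"\r\nendobj\r\n")

cut = naive_stream_end(framed)
truncated = cut < len(payload)  # True: the scan stopped inside the data
```

Scanning for the last marker instead does not help in general, since a streaming Consumer does not know how much more data will arrive.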

Possible solutions:
1) Write all data using ASCII85 encoding (See Section 3.3.2 of [pdf]). This
will increase stream lengths by 25%. ASCII85 has a stream delimiter which
would solve this problem -- the end of the stream can be known for certain
and the length field can be placed after the stream.
2) Require the Producer to write the stream length before any stream (the
streams would stay binary). The Producer can use banding to break up large
images into small enough chunks so the Producer can cache the stream before
sending.
3) Offer a combination of 1 & 2. The Producer would cache streams if
possible, but may use ASCII85, if necessary.
4) Require the Producer to make certain that no stream contains the byte
sequence "\0D\0Aendstream" in its data. This is how the spec is defined
currently -- but this may be too onerous for the Producer.
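
For comparison, options 1 and 2 can be sketched as follows. This is a rough illustration under stated assumptions, not the PDF/is wire format: the band header layout, the function names, and the band size are hypothetical. The ASCII85 part uses Python's base64.a85encode and appends the "~>" end-of-data marker by hand.

```python
import base64
import re

EOD = b"~>"  # ASCII85 end-of-data marker; "~" never occurs in the encoded body


def ascii85_stream(data: bytes) -> bytes:
    """Option 1: encode so the end of the stream is unambiguous."""
    return base64.a85encode(data) + EOD


def length_first_bands(image: bytes, band_size: int):
    """Option 2: cache one band at a time and write its length before its data."""
    for start in range(0, len(image), band_size):
        band = image[start:start + band_size]
        yield b"<< /Length %d >>\nstream\r\n" % len(band) + band + b"\r\nendstream\n"


HEADER = re.compile(rb"<< /Length (\d+) >>\nstream\r\n")  # assumed header layout


def read_band(buf: bytes, pos: int = 0):
    """Consumer side: read exactly Length bytes; never scan the data for a marker."""
    m = HEADER.match(buf, pos)
    n = int(m.group(1))
    start = m.end()
    return buf[start:start + n], start + n + len(b"\r\nendstream\n")


data = bytes(range(1, 101))            # 100 bytes, no all-zero groups
encoded = ascii85_stream(data)         # 4 input bytes become 5 output characters

image = b"abc\r\nendstream" + b"xyz" * 10   # data that contains the marker bytes
wire = b"".join(length_first_bands(image, band_size=16))

received, pos = b"", 0
while pos < len(wire):
    band, pos = read_band(wire, pos)
    received += band
```

The encoded form carries the 25% overhead noted in option 1, while the banded form stays binary and costs only the Producer's per-band buffer.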

Any other ideas? I'm personally leaning toward solution #3.

-Rick

This archive was generated by hypermail 2b29 : Thu Mar 06 2003 - 10:21:24 EST