[IPP] IPP Scan question.

Mon Sep 29 16:55:08 UTC 2014

Many thanks Pete. This has been a great help to me in understanding IPP
Scan.

Bill Wagner

From: Zehler, Peter [mailto:Peter.Zehler at xerox.com] 
Sent: Monday, September 29, 2014 7:25 AM
To: William A Wagner
Cc: ipp at pwg.org
Subject: RE: IPP Scan question.

Bill,

My responses are in-line below.

Pete

Peter Zehler

PARC, A Xerox Company
800 Phillips Rd, 128-27E
Webster NY, 14580-9701
Email: Peter.Zehler at Xerox.com
Office: +1 (585) 265-8755

Mobile: +1 (585) 329-9508
FAX: +1 (585) 265-7441

From: William A Wagner [mailto:wamwagner at comcast.net] 
Sent: Friday, September 26, 2014 5:19 PM
To: Zehler, Peter
Cc: ipp at pwg.org
Subject: RE: IPP Scan question.

Pete,

Thanks for the correction. A few items for clarification to see if I got it
right. I would very much appreciate either agreement with or correction of
my interpretation.

1.  Hardcopy document refers just to the material that is being scanned and
has no other correlation to Digital Document. For example, conceivably, a
set of pages from multiple books could be scanned with the image data
ultimately appearing in a single document. Alternatively, multiple pages for
a single hardcopy document could be scanned and appear as multiple Digital
Documents.

<PZ>Correct</PZ>

2.  Image refers to the content in a specified scan region of a hardcopy
document. Unless the scanner is set up to combine the data from multiple
scanned images, an image cannot refer to more than the content of one sheet
side. Therefore, for example, a scan job concerned with multiple pages of a
hardcopy document will contain multiple images.

<PZ>Correct</PZ>

3.  There is a 1:1 relationship between Digital Documents and files, in
that each Digital Document is formatted and stored as a separate file.

<PZ>That is true of the IPP binding for Network Scanning.  The MFD Network
Scan model permitted more than one file to be associated with a Digital
Document.  We never discussed the packaging of a multi-file Digital
Document.  We could add a "run list" that referenced all the files or we
could, as I did in an early prototype, collect all the image files for a
Digital Document into a folder and the Document Object would reference the
folder.   It will be up to the SM 3 group to determine what will be done
with the SM3 Network Scan specification.  The SM3 model can be simplified or
there could be an "IPP Production Scanning Set 1" drafted. :-)</PZ>

4.  In general, the relationship of images to digital documents is format
(and perhaps implementation) specific. If the 'document-format-accepted' is
a document format such as PDF, there may be multiple images per document. If
'document-format-accepted' is an image format such as JPEG or GIF, each
image is more likely a separate digital document.

<PZ>That is correct, I think.  As far as I know there is no multipage
JPEG format.</PZ>

5.  In IPP Scan Push mode, all Digital Documents produced in a Job are
scanned and formatted in the same way and stored to the same
destination(s), as specified in the CreateJob.

<PZ>Not necessarily, the client could specify multiple acceptable formats.
I see no limitation preventing a smart scan service storing content that
contains text in a pdf document and photos in jpeg files. The usual case
would be to consistently store all the Digital Documents produced in the
Job.</PZ> 

a.  If the job produces multiple digital documents, the  destination is a
directory with each Digital Document being a separate file in that
directory. 

<PZ>Correct, it would be an error to specify a file as a destination for a
multi-paged Digital Document.  The error would be something like
"conflicting-attributes" or "document-access-error"</PZ>

b.  If the job produces a single document, the specified destination is
the file.

<PZ>If it is a file that is where the Digital Document is stored.  If it is
a directory, a file would be created in that directory to hold the Digital
Document.</PZ>

6.  In IPP Scan Push mode, the client may specify multiple destinations.

<PZ>Yes.</PZ>

7.  In IPP Scan Pull mode, each digital document produced by a job is sent
back to the client in response to a GetNextDocumentImage request from the
Client. Each document may contain data from a single image or from multiple
images. There may be multiple documents as part of a single job. 

<PZ>Yes</PZ>

a.  Unlike in Push mode, the Compression Accepted and Document Format
Accepted may be separately specified in each GetNextDocumentImage request.
(I find this rather odd - especially since each such request does not
necessarily correspond to either a Document or an image. )

<PZ>While this is true I would not expect the client to change the
acceptable format/compression throughout the exchange.  I believe it is the
Scan Service that is in control of the format/compression as the Digital
Document is being delivered.  The acceptable format/compression operational
attributes for the "Get-Next-Document-Images" Request could be deleted from
the specification without any problems.  It would probably remove some
confusion.  The format/compression attributes in the response and in the
"Create-Job" request</PZ>

b.  The mode of operation of GetNextDocumentImage depends upon whether Wait
Mode is agreed upon. In Wait mode, data is sent as it becomes available and
can be accepted. If not in Wait mode (or if Wait mode is interrupted or
timed out), the client must issue a GetNextDocumentImage for each
buffer's-worth of data.

<PZ>Yes, it is a choice between synchronous and asynchronous network
operations.</PZ>

8.  This mode of transfer suggests that GetNextDocumentImage does not
refer either to getting an Image or getting a Document, it just pulls data
in a mode determined by the Wait mode. That data may be formatted into one
or more Digital Documents, depending on format and contents.
<PZ>While that is true we had to call it something.  We went through a
couple of name changes.  In IPP it's just "0x004A".  It is pulling the data
for an image(s) within a document.  Subsequent responses may pull the data
for an image(s) from within another document.  I'd prefer not to entertain
cosmetic changes at this time.</PZ>
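
The push-mode destination rules discussed under items 5a and 5b above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the "scan-NNNN" file naming, and the use of plain paths instead of destination URIs are all hypothetical, and Pete's "multi-paged Digital Document" is read here as a job that yields more than one Digital Document. The error text borrows one of the two status names Pete floated; the specification may settle on something else.

```python
from pathlib import PurePosixPath


def plan_push_files(destination, is_directory, document_count, extension):
    """Return one target path per Digital Document produced by a push job.

    Hypothetical helper: real IPP Scan destinations are URIs, and the
    file naming inside a directory is implementation specific.
    """
    if document_count < 1:
        raise ValueError("a job must produce at least one document")

    dest = PurePosixPath(destination)
    if not is_directory:
        if document_count > 1:
            # Item 5a: a single file cannot hold several Digital Documents.
            raise ValueError(
                "conflicting-attributes: file destination for a multi-document job"
            )
        # Item 5b: the Digital Document is stored in the named file.
        return [str(dest)]

    # Directory destination: one new file per Digital Document.
    return [
        str(dest / f"scan-{n:04d}.{extension}")
        for n in range(1, document_count + 1)
    ]
```

For example, a directory destination with two JPEG documents yields two paths under that directory, while a file destination with more than one document raises the error.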

I also understand that you suggest that the Scan Service in SM3 be changed
to agree with IPP Scan.

<PZ>As previously stated, it will be up to the SM 3 group to determine what
will be done with the SM3 Network Scan specification.  The SM3 model can be
simplified or there could be an "IPP Production Scanning Set 1" drafted.
:-)</PZ>

Many thanks,

Bill Wagner

-----Original Message-----
From: Zehler, Peter [mailto:Peter.Zehler at xerox.com] 
Sent: Friday, September 26, 2014 9:20 AM
To: William A Wagner; 'Michael Sweet'
Cc: ipp at pwg.org; cloud at pwg.org
Subject: RE: IPP Scan question.

All,

IPP Scan can support multiple document jobs.  There are attributes that
allow the printer to declare that capability
("multiple-document-jobs-supported") as well as operational attributes
("document-number", "last-document") to segment the data pulled from the
scan service into multiple files (i.e. one file per document, number of
images in a file is format and implementation specific).  During the
prototype I used a scanner that emitted JPG or PDF.  When loading a stack of
media into the ADF each image acquisition resulted in an image.  The number
of document objects generated was dictated by output file type.  In the IPP
binding I limited the file to document object association to 1 to 1.  I did
not want to deal with the complexities of associating multiple files with a
single document object.    The abstract MFD Scan model did allow multiple
files per document.
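
A minimal Python sketch of the "output file type dictates the number of document objects" behavior described above. The function name and the set of multi-image formats are assumptions for illustration; only JPG and PDF come from the prototype described here, and a real Scan Service may group images differently.

```python
# Assumption: only PDF is treated as a multi-image container here.
MULTI_IMAGE_FORMATS = {"application/pdf"}


def group_images_into_documents(image_ids, document_format):
    """Group acquired images into document objects (one file each)."""
    images = list(image_ids)
    if not images:
        return []
    if document_format in MULTI_IMAGE_FORMATS:
        # All images from the ADF stack land in one multipage file.
        return [images]
    # Single-image formats such as JPEG: one document object per image.
    return [[image] for image in images]
```

With three images, "image/jpeg" gives three single-image documents and "application/pdf" gives one three-image document, matching the 1 to 1 file to document object association of the IPP binding.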

Running a stack of paper using JPG as the "document-format-accepted"
resulted in multiple files, each of which was associated with a single
document.  Running that same stack of paper using PDF as the
"document-format-accepted" resulted in a single multipage file associated
with a single document.  From the client perspective,
Get-Next-Document-Images behaved a bit differently for each job.  With the JPG
output the responses had a document number that changed throughout the scan
job retrieval.  The number of responses with the same document number varied
based on the complexity of the image.  Each time the document number
changed, the output file was closed and a new one was opened.  The last
Get-Next-Document-Images for the last document in the job set
"last-document" to true.  In a push job version of this scan job, the same
number of files was created at the destination.  With the PDF output the
responses had a document number that remained the same throughout the scan job
retrieval.  When the last Get-Next-Document-Images for the job had
"last-document" set to true, the output file was closed.  In a push job version
of this scan job, one file was created at the destination.
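
The client-side handling described above (start a new output file whenever the document number changes, stop once "last-document" is true) might look roughly like the following Python sketch. The PullResponse class and the function name are hypothetical stand-ins for decoded Get-Next-Document-Images responses; no real IPP library is used, and the data is kept in memory rather than written to files.

```python
from dataclasses import dataclass


@dataclass
class PullResponse:
    """Hypothetical decoded Get-Next-Document-Images response."""
    document_number: int
    last_document: bool
    data: bytes


def split_into_documents(responses):
    """Collect response data into one byte string per document number."""
    documents = {}
    for response in responses:
        # A new document number opens a new buffer (a new output file).
        buffer = documents.setdefault(response.document_number, bytearray())
        buffer.extend(response.data)
        if response.last_document:
            # The final response of the job: nothing more to pull.
            break
    return {number: bytes(buf) for number, buf in documents.items()}
```

With JPG-style responses the document number changes and several entries result; with PDF-style responses the number stays the same and a single entry results.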

The MFD Scan model was created with the idea that the same protocol would be
used locally or remotely.  Therefore there was considerably more control over
the behavior of the scanner itself.  The IPP Scan service simplified a
number of aspects to address the 98% needs for network scanning in a mobile
environment.  I expect the MFD Scan service would be adjusted to better
reflect implementation experience within the PWG (i.e., IPP Scan) and in the
industry (e.g., WS-Scan, UPnP Scan, vendor specific scan).

Peter Zehler


-----Original Message-----

From: William A Wagner [mailto:wamwagner at comcast.net]
Sent: Thursday, September 25, 2014 2:15 PM
To: 'Michael Sweet'
Cc: Zehler, Peter; ipp at pwg.org; cloud at pwg.org
Subject: RE: IPP Scan question.

Michael,

Thank you for your response.

1. I agree that Figure 3 of the MFD Scan spec definitely indicates that
there can be multiple images in one scan document; I do not see where it
indicates that there cannot be multiple documents in a job. Furthermore,
Figure 4 of that same document (with the associated text) definitely states
that, for a multi-document Job, "Job object contains multiple Document
objects. Each Document can have a different set of processing parameters."

And further that the Scan Service semantic model may allow the End User to
specify a multi-document Job as a service output. If we have intentionally
decided to not consider multi-document jobs in IPP, that should be made
clear. I think it is to be determined if we decide to eliminate them from
the SM3. (Incidentally, I do not see a compelling Use Case for
multi-document Scan Jobs, although some may exist.)

2. I get your explanation that Get-Next-Document-Images refers to multiple
images of a document, and that "last-document" refers to the last image of a
document. But these names are misleading. Do we use 'Images' to refer to
anything other than 'Document Images'?

I apologize for not commenting on the IPP Scan document earlier, but I think
the one document per job characteristic, despite what one might expect from
the names, should be made more clear. Also, as you suggest, the fact that,
for Pull Scan, GetNextDocumentImages can redefine Compression Accepted
and Document Format Accepted for each image of a potentially multi-image
document needs clarification.

Thanks,

Bill Wagner 

-----Original Message-----

From: Michael Sweet [mailto:msweet at apple.com]
Sent: Thursday, September 25, 2014 9:12 AM
To: William A Wagner
Cc: Zehler, Peter; ipp at pwg.org; cloud at pwg.org
Subject: Re: IPP Scan question.

Bill,

> On Sep 21, 2014, at 9:50 AM, William A Wagner <wamwagner at comcast.net> wrote:

> ...

> It is also clear from the IPP Scan specification GetNextDocumentImages
> operation that a scan job can have multiple documents.

I don't think these are multiple document objects, however.

Get-Next-Document-Images is a convenient way to pull one or more
images/pages from the scanner, but from the point of view of the model they
are part of one document object and would be delivered (in the case of push
scan) as a single file.

> 

> The Cloud conference call comment is that  FetchJob (corresponding to
> Destination,  DestinationAccesses, and  InputElements for Scan with no
> need to have a FetchDocument operation.  This  suggests that there is but
> one document (possibly with multiple destinations) in a Scan Job.
>
> Alternatively,  it may be that the Input Parameters and Destinations for
> each one of multiple documents are defined in the CreateJob.  This seems
> inconsistent with the general Imaging Service model.

In the case of Scan, the CreateScanJob operation is instantiating a single
scan job containing a single document object that may have multiple digital
representations (e.g. PDF, TIFF, etc.) of the same images.  Figure 3 on page
22 of the MFD Scan spec seems pretty clear on that point.  This is similar
to how the Copy and FaxIn services work (single document jobs).

Print, FaxOut, and Transform can support multiple digital document inputs
(and thus multiple document objects).

I think the only inconsistency here is that some job services support
multiple document objects and some don't.  But I don't think that hurts the
overall model - just something worth pointing out.

(and perhaps as well worth considering/mentioning that most Print and FaxOut
service implementations only support single document jobs...)

> The IPP Scan specification definitely refers to multiple documents in one
> scan job.  However, Figure 1 can be interpreted to mean that the only
> operation necessary for Scan is a CreateJob, with GetNextDocumentImages
> necessary if it is a Pull Scan Job. Indeed, InputAttributes is defined to be
> in the CreateJob request as well as are the Job Template attributes defining
> destination; but it does not appear that different InputAttributes and/or
> destinations can be specified for different documents.

I think the choice of reusing the "last-document" operation attribute in the
response of Get-Next-Document-Images operation is causing confusion here. It
really is (semantically) "last-document-image".

Pete, do you think this is worth an editorial change before publication,
either the attribute name or the description ("indicating that the last
document IMAGE has been reached")?

> [Also, Compression Accepted and Document Format Accepted are defined
> in CreateJob, but also in GetNextDocumentImages for Pull Scans. Can it
> be assumed that requests in GetNextDocumentImages take precedence?]

I think this needs some clarification - you put those in Create-Job for a
Push Scan and in Get-Next-Document-Images for a Pull Scan.

> Do I correctly understand that, although there may be multiple documents
> in a scan job, they must all have the same InputAttributes and the same
> destination(s)?  An alternate approach might have been to send a
> SetDocumentAttributes for each document to be scanned, which contained
> the input parameters and destination for each specific document/image file;
> that would have been consistent with the Model.

Currently you scan whatever is at the input source and send it to the
destination(s) or pull the images with Get-Next-Document-Images.  The only
way to break things up is to create multiple jobs and specify the number of
images for each job in the "input-images-to-transfer" member attribute.
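
As a rough Python sketch of that "one job per batch" approach: the helper below only plans the per-job image counts and wraps each in a dictionary. The function name and dictionary layout are hypothetical, and placing "input-images-to-transfer" inside an "input-attributes" collection is an assumption based on the InputAttributes mentioned in this thread; no IPP request is actually built or sent.

```python
def plan_scan_jobs(total_images, images_per_job):
    """Split a stack of images into per-job attribute sets."""
    if total_images < 0 or images_per_job < 1:
        raise ValueError("total_images must be >= 0 and images_per_job >= 1")

    jobs = []
    remaining = total_images
    while remaining > 0:
        count = min(images_per_job, remaining)
        # Assumed layout: the member attribute lives in "input-attributes".
        jobs.append({"input-attributes": {"input-images-to-transfer": count}})
        remaining -= count
    return jobs
```

Seven images split three at a time would plan three jobs transferring 3, 3, and 1 images.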

> For Cloud, we need to decide whether we should reflect the Semantic Model
> (with which we should best be consistent) or the IPP Scan Binding. Or do we
> need to change the semantic model?

The intent is that IPP Scan would update the SM definition of SM Scan, since
SM Scan doesn't deal with Pull Scan.

> Also, a few minor editorial comments/questions I had while looking up
> stuff.
>
> 1. Table 1 lists Get-Next-Document-Images and refers to PWG 5100.SCAN.  I
> take it that this means to have the specification refer to itself, but it
> is confusing even if the proper number is inserted. Better to refer to the
> internal paragraph.

Agreed.

> 2. Figure 1 refers to the operation as GetNextDocumentImage rather than
> GetNextDocumentImages
>
> 3. In para 7.1.1, under Group 2: Job Template Attributes is a reference to
> section 8.28.1.7.2.  There is no such section (should it be 8.2?)
>
> 4. Although the text makes a distinction between Print Jobs and Scan Jobs,
> section 8.2.1.1 refers to a Print Job.

Thanks for catching these!

_________________________________________________________

Michael Sweet, Senior Printing System Engineer, PWG Chair
