[Cloud] IPP Scan question.

Fri Sep 26 13:20:22 UTC 2014

All,

IPP Scan can support multiple-document jobs.  There are attributes that allow the printer to declare that capability ("multiple-document-jobs-supported") as well as operation attributes ("document-number", "last-document") to segment the data pulled from the scan service into multiple files (i.e., one file per document; the number of images in a file is format and implementation specific).  During the prototype I used a scanner that emitted JPG or PDF.  When I loaded a stack of media into the ADF, each image acquisition resulted in an image.  The number of document objects generated was dictated by the output file type.  In the IPP binding I limited the file to document object association to 1 to 1.  I did not want to deal with the complexities of associating multiple files with a single document object.  The abstract MFD Scan model did allow multiple files per document.

Running a stack of paper using JPG as the "document-format-accepted" resulted in multiple files, each of which was associated with a single document.  Running that same stack of paper using PDF as the "document-format-accepted" resulted in a single multipage file associated with a single document.  From the client perspective, Get-Next-Document-Images behaved a bit differently for each job.  With the JPG output the responses had a document number that changed throughout the scan job retrieval.  The number of responses with the same document number varied based on the complexity of the image.  Each time the document number changed, the output file was closed and a new one was opened.  The last Get-Next-Document-Images response for the last document in the job had "last-document" set to true.  In a push job version of this scan job, the same number of files was created at the destination.  With the PDF output the responses had a document number that remained the same throughout the scan job retrieval.  When the last Get-Next-Document-Images response for the job had "last-document" set to true, the output file was closed.  In a push job version of this scan job, one file was created at the destination.
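
As a rough illustration of the client-side behavior described above, here is a minimal Python sketch. It does not use a real IPP library: ImageResponse and collect_documents are hypothetical names, and the responses are simulated in memory rather than pulled from a scanner. The only things taken from the description are the "document-number" and "last-document" attributes and the rule that a change in document number closes one output file and opens the next.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ImageResponse:
    """Hypothetical stand-in for one Get-Next-Document-Images response."""
    document_number: int  # value of the "document-number" attribute
    last_document: bool   # value of the "last-document" attribute
    data: bytes           # chunk of image data carried by this response


def collect_documents(responses: Iterable[ImageResponse]) -> Dict[int, bytes]:
    """Return one output "file" (as bytes) per document number."""
    files: Dict[int, bytes] = {}
    current: Optional[int] = None
    buffer = bytearray()
    for resp in responses:
        # A new document number means: close the current file, open a new one.
        if current is not None and resp.document_number != current:
            files[current] = bytes(buffer)
            buffer = bytearray()
        current = resp.document_number
        buffer.extend(resp.data)
        # "last-document" = true ends retrieval for the whole job.
        if resp.last_document:
            break
    if current is not None:
        files[current] = bytes(buffer)
    return files


# Simulated JPG job: the document number changes, so several files result.
jpg_job = [
    ImageResponse(1, False, b"A1"),
    ImageResponse(1, False, b"A2"),
    ImageResponse(2, False, b"B1"),
    ImageResponse(3, True, b"C1"),
]
# Simulated PDF job: the document number stays the same, so one file results.
pdf_job = [
    ImageResponse(1, False, b"P1"),
    ImageResponse(1, False, b"P2"),
    ImageResponse(1, True, b"P3"),
]

jpg_files = collect_documents(jpg_job)
pdf_files = collect_documents(pdf_job)
```

With the simulated data, the JPG job yields three files and the PDF job yields one, which mirrors the difference I saw between the two formats.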

The MFD Scan model was created with the idea that the same protocol would be used locally or remotely.  Therefore there was considerably more control over the behavior of the scanner itself.  The IPP Scan service simplified a number of aspects to address 98% of the needs for network scanning in a mobile environment.  I expect the MFD Scan service would be adjusted to better reflect implementation experience within the PWG (i.e., IPP Scan) and in the industry (e.g., WS-Scan, UPnP Scan, vendor-specific scan).

Peter Zehler

PARC, A Xerox Company
800 Phillips Rd, 128-27E
Webster NY, 14580-9701
Email: Peter.Zehler at Xerox.com
Office: +1 (585) 265-8755
Mobile: +1 (585) 329-9508
FAX: +1 (585) 265-7441

-----Original Message-----
From: William A Wagner [mailto:wamwagner at comcast.net] 
Sent: Thursday, September 25, 2014 2:15 PM
To: 'Michael Sweet'
Cc: Zehler, Peter; ipp at pwg.org; cloud at pwg.org
Subject: RE: IPP Scan question.

Michael,

Thank you for your response.

1. I agree that Figure 3 of the MFD Scan spec definitely indicates that there can be multiple images in one scan document; I do not see where it indicates that there cannot be multiple documents in a job. Furthermore, Figure 4 of that same document (with the associated text) definitely states that, for a multi-document Job, "Job object contains multiple Document objects. Each Document can have a different set of processing parameters."
And further that the Scan Service semantic model may allow the End User to specify a multi-document Job as a service output. If we have intentionally decided to not consider multi-document jobs in IPP, that should be made clear. I think it is to be determined if we decide to eliminate them from the SM3. (Incidentally, I do not see a compelling Use Case for multi-document Scan Jobs, although some may exist.)

2. I get your explanation that Get-Next-Document-Images refers to multiple images of a document, and that "last-document" refers to the last image of a document. But these names are misleading. Do we use 'Images' to refer to anything other than 'Document Images'?

I apologize for not commenting on the IPP Scan document earlier, but I think the one-document-per-job characteristic, despite what one might expect from the names, should be made clearer. Also, as you suggest, it should be made clear that, for Pull Scan, GetNextDocumentImages can redefine Compression Accepted and Document Format Accepted for each image of a potentially multi-image document.
Thanks,
Bill Wagner 

-----Original Message-----
From: Michael Sweet [mailto:msweet at apple.com]
Sent: Thursday, September 25, 2014 9:12 AM
To: William A Wagner
Cc: Zehler, Peter; ipp at pwg.org; cloud at pwg.org
Subject: Re: IPP Scan question.

Bill,

> On Sep 21, 2014, at 9:50 AM, William A Wagner <wamwagner at comcast.net> wrote:
> ...
> It is also clear from the IPP Scan specification GetNextDocumentImages operation that a scan job can have multiple documents.

I don't think these are multiple document objects, however.
Get-Next-Document-Images is a convenient way to pull one or more images/pages from the scanner, but from the point of view of the model they are part of one document object and would be delivered (in the case of push scan) as a single file.

> 
> The Cloud conference call comment is that  FetchJob (corresponding to Destination,  DestinationAccesses, and  InputElements for Scan with no need to have a FetchDocument operation.  This  suggests that there is but one document (possibly with multiple destinations) in a Scan Job.
> Alternatively,  it may be that the Input Parameters and Destinations for each one of multiple documents are defined in the CreateJob.  This seems inconsistent with the general Imaging Service model.

In the case of Scan, the CreateScanJob operation is instantiating a single scan job containing a single document object that may have multiple digital representations (e.g. PDF, TIFF, etc.) of the same images.  Figure 3 on page 22 of the MFD Scan spec seems pretty clear on that point.  This is similar to how the Copy and FaxIn services work (single document jobs).

Print, FaxOut, and Transform can support multiple digital document inputs (and thus multiple document objects).

I think the only inconsistency here is that some job services support multiple document objects and some don't.  But I don't think that hurts the overall model - just something worth pointing out.

(It is perhaps also worth considering/mentioning that most Print and FaxOut service implementations only support single document jobs...)

> The IPP Scan specification definitely refers to multiple documents in one scan job.  However, Figure 1 can be interpreted to mean that  the only operation necessary for Scan is a CreateJob, with GetNextDocumentImages necessary if it is a Pull Scan Job. Indeed, InputAttributes is defined to be in the CreateJob request as well as are the Job Template attributes defining destination; but it does not appear that different InputAttributes and/or destinations can be specified for different documents.

I think the choice of reusing the "last-document" operation attribute in the response of Get-Next-Document-Images operation is causing confusion here. It really is (semantically) "last-document-image".

Pete, do you think this is worth an editorial change before publication, either the attribute name or the description ("indicating that the last document IMAGE has been reached")?

> [Also,  Compression Accepted and Document Format Accepted are defined 
> in CreateJob, but also in GetNextDocumentImages for Pull Scans. Can it 
> be assumed that requests in GetNextDocumentImages take precedence?]

I think this needs some clarification - you put those in Create-Job for a Push Scan and in Get-Next-Document-Images for a Pull Scan.

> Do I correctly understand that, although there may be multiple documents in a scan job, they must all have the same InputAttributes and the same destination(s)?  An alternate approach might have been to send  a SetDocumentAttributes for each document to be scanned, which contained the input parameters and destination for each specific document/image file; that would have been consistent with the Model.

Currently you scan whatever is at the input source and send it to the
destination(s) or pull the images with Get-Next-Document-Images.  The only way to break things up is to create multiple jobs and specify the number of images for each job in the "input-images-to-transfer" member attribute.
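
To illustrate the multiple-job workaround just described, here is a minimal Python sketch. It only builds plain dictionaries, not real IPP requests: plan_scan_jobs is a hypothetical helper, and placing "input-images-to-transfer" inside an "input-attributes" collection is an assumption made for the example.

```python
from typing import Dict, List


def plan_scan_jobs(images_per_job: List[int]) -> List[Dict]:
    """Build one hypothetical job description per desired output document."""
    requests = []
    for count in images_per_job:
        if count < 1:
            raise ValueError("each job must transfer at least one image")
        requests.append({
            "operation": "Create-Job",
            # Assumed layout: the member attribute lives in "input-attributes".
            "input-attributes": {"input-images-to-transfer": count},
        })
    return requests


# A stack to be split into three documents of 2, 3 and 1 images.
jobs = plan_scan_jobs([2, 3, 1])
```

Each entry corresponds to one job, so a stack of six sheets split this way would produce three separate single-document jobs.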

> For Cloud, we need to decide whether we should reflect the Semantic Model (with which we should best be consistent) or the IPP Scan Binding. Or do we need to change the semantic model?

The intent is that IPP Scan would update the SM definition of SM Scan, since SM Scan doesn't deal with Pull Scan.

> Also, a few minor editorial comments/questions I had while looking up stuff.
>
> 1. Table 1 lists Get-Next-Document-Images and refers to PWG 5100.SCAN.  I take it that this  means to have the specification refer to itself, but it is confusing even if the proper number is inserted. Better to refer to the internal paragraph.

Agreed.

> 2. Figure 1 refers to the operation as GetNextDocumentImage rather than GetNextDocumentImages
>
> 3. In para 7.1.1, under Group 2: Job Template Attributes is a reference to section 8.28.1.7.2.  There is no such section (should it be 8.2?)
>
> 4. Although the text makes a distinction between Print Jobs and Scan Jobs, section 8.2.1.1 refers to a Print Job.

Thanks for catching these!

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair