IPP Mail Archive: IPP> PRO - Suggested text for form-data

Carl-Uno Manros (cmanros@cp10.es.xerox.com)
Wed, 19 Mar 1997 10:32:53 PST

Hi,

I was in Palo Alto yesterday and managed to corner Larry Masinter on the
form-data document. He knocked up this version on-the-fly, and asked us to
give back any comments on the scope and content over the next couple of
days, make that Friday this week. The document will initially be published
as a personal Internet-Draft from Larry, but can then proceed directly from
there to the standards process, without the need to go through any WG
(according to Larry).

Larry will send the document to the IETF before the deadline for Memphis.

Regards,

Carl-Uno

-- 
Internet Draft					Larry Masinter
March 18, 1997
Expires in 6 months	

multipart/form-data: a format for returning the values obtained from filling out a form

Status of this Memo

<Internet draft boilerplate>

This memo defines an Experimental Protocol for the Internet community. This memo does not specify an Internet standard of any kind. Discussion and suggestions for improvement are requested. Distribution of this memo is unlimited.

1. Abstract

This specification defines an Internet Media Type, multipart/form-data, which can be used by a wide variety of applications and transported by a wide variety of protocols as a way of returning a set of values as the result of a user filling out a form. Typical applications include form values generated by HTML forms and submitted by HTTP post or by electronic mail, but the format is independent of those contexts. This data type is unchanged from its original description as part of RFC 1867.

2. Use of multipart/form-data

The definition of multipart/form-data is included in section 3. A boundary is selected that does not occur in any of the data. (This selection is sometimes done probabilistically.) Each field of the form is sent, in the order in which it occurs in the form, as a part of the multipart stream. Each part identifies the INPUT name within the original form. Each part should be labelled with an appropriate content-type if the media type is known (e.g., inferred from the file extension or operating system typing information) or as application/octet-stream.
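The steps in this paragraph can be sketched in Python as follows. This is only an illustration, not part of the draft: the function names, the "form-boundary-" prefix, and the (name, value, content type) tuple layout are assumptions made for the example. The boundary is chosen at random and re-drawn if it happens to occur in any value.

```python
import secrets

CRLF = b"\r\n"

def choose_boundary(values):
    """Draw random boundaries until one occurs in none of the values."""
    while True:
        # Hypothetical naming scheme; any sufficiently random token would do.
        boundary = b"form-boundary-" + secrets.token_hex(16).encode("ascii")
        if not any(boundary in value for value in values):
            return boundary

def encode_form(fields):
    """Encode fields as multipart/form-data.

    fields is a list of (name, value_bytes, content_type_or_None),
    already in the order in which the fields occur in the form.
    Returns (boundary, body).
    """
    boundary = choose_boundary([value for _, value, _ in fields])
    lines = []
    for name, value, content_type in fields:
        lines.append(b"--" + boundary)
        lines.append(b'content-disposition: form-data; name="'
                     + name.encode("ascii") + b'"')
        if content_type is not None:
            lines.append(b"content-type: " + content_type.encode("ascii"))
        lines.append(b"")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(b"--" + boundary + b"--")  # closing delimiter
    lines.append(b"")
    return boundary, CRLF.join(lines)
```

The sender would then announce the result with a header such as "Content-Type: multipart/form-data; boundary=..." carrying the returned boundary.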

If multiple files are selected, they should be transferred together using the multipart/mixed format.

While the HTTP protocol can transport arbitrary BINARY data, the default for mail transport (e.g., if the ACTION is a "mailto:" URL) is the 7BIT encoding. The value supplied for a part may need to be encoded and the "content-transfer-encoding" header supplied if the value does not conform to the default encoding. [See section 5 of RFC 1521 for more details.]
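As a rough sketch of that decision for mail transport, the Python below leaves a value untouched when it already looks like 7BIT data and otherwise falls back to base64. The function names are hypothetical, the 7BIT test is a simplification of the RFC 1521 rules (no NUL, no octets above 127, CR and LF only as CRLF pairs, short lines), and base64 is just one of the encodings a sender might choose.

```python
import base64

def is_7bit(value: bytes) -> bool:
    """Simplified check that value can travel under the default 7BIT encoding."""
    if any(octet == 0 or octet > 127 for octet in value):
        return False
    # CR and LF may only appear together as CRLF, and lines must stay short.
    lines = value.split(b"\r\n")
    return all(b"\r" not in line and b"\n" not in line and len(line) <= 998
               for line in lines)

def encode_for_mail(value: bytes):
    """Return (extra_header_or_None, value_to_send) for one part."""
    if is_7bit(value):
        return None, value
    # encodebytes() wraps at 76 characters with bare LF; MIME wants CRLF.
    encoded = base64.encodebytes(value).replace(b"\n", b"\r\n")
    return b"content-transfer-encoding: base64", encoded
```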

The original local file name may be supplied as well, either as a 'filename' parameter of the 'content-disposition: form-data' header or, in the case of multiple files, in a 'content-disposition: file' header of the subpart. The client application should make a best effort to supply the file name; if the file name of the client's operating system is not in US-ASCII, the file name might be approximated or encoded using the method of RFC 1522. This is a convenience for those cases where, for example, the uploaded files might contain references to each other, e.g., a TeX file and its .sty auxiliary style description.

On the server end, the ACTION might point to an HTTP URL that implements the form's action via CGI. In such a case, the CGI program would note that the content-type is multipart/form-data and parse the various fields (checking for validity, writing the file data to local files for subsequent processing, etc.).
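A receiving program along these lines might look like the sketch below, which leans on Python's standard email package to do the MIME parsing. The function name and the returned tuple layout are assumptions for illustration; a real server would also validate the fields and would likely stream large files to disk rather than hold them in memory.

```python
from email import policy
from email.parser import BytesParser

def parse_form_data(content_type: str, body: bytes):
    """Return a list of (field name, filename or None, media type, raw bytes).

    content_type is the value of the request's Content-Type header,
    including its boundary parameter.
    """
    # Put the content-type header back in front of the body so the
    # MIME parser sees one complete multipart message.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=policy.default).parsebytes(prefix + body)
    if message.get_content_type() != "multipart/form-data":
        raise ValueError("expected multipart/form-data")
    fields = []
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        fields.append((
            name,
            part.get_filename(),            # None when no file name was sent
            part.get_content_type(),        # text/plain when none was given
            part.get_payload(decode=True),  # undoes any transfer encoding
        ))
    return fields
```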

3. Definition of multipart/form-data

The media-type multipart/form-data follows the rules of all multipart MIME data streams as outlined in RFC 1521. It is intended for use in returning the data that comes about from filling out a form. In a form (in HTML, although other applications may also use forms), there is a series of fields to be supplied by the user who fills out the form. Each field has a name. Within a given form, the names are unique.

multipart/form-data contains a series of parts. Each part is expected to contain a content-disposition header where the value is "form-data" and a name attribute specifies the field name within the form, e.g., 'content-disposition: form-data; name="xxxxx"', where xxxxx is the field name corresponding to that field. Field names originally in non-ASCII character sets may be encoded using the method outlined in RFC 1522.

As with all multipart MIME types, each part has an optional Content-Type which defaults to text/plain. If the contents of a file are returned via filling out a form, then the file input is identified as application/octet-stream or the appropriate media type, if known. If multiple files are to be returned as the result of a single form entry, they can be returned as multipart/mixed embedded within the multipart/form-data.
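To make the nesting concrete, here is a small Python sketch that builds the single form-data part for a field carrying several files. The function name and argument layout are hypothetical, and the caller is assumed to supply an inner boundary that differs from the outer one and does not occur in any file. The subparts use the 'content-disposition: file' header mentioned in section 2.

```python
CRLF = b"\r\n"

def mixed_part(field_name: str, files, inner_boundary: bytes) -> bytes:
    """Build one form-data part whose body is multipart/mixed.

    files is a list of (filename, content_type, data_bytes).
    """
    lines = [
        b'content-disposition: form-data; name="'
        + field_name.encode("ascii") + b'"',
        b"Content-Type: multipart/mixed; boundary=" + inner_boundary,
        b"",  # end of the outer part's headers
    ]
    for filename, content_type, data in files:
        lines += [
            b"--" + inner_boundary,
            b'content-disposition: file; filename="'
            + filename.encode("ascii") + b'"',
            b"Content-Type: " + content_type.encode("ascii"),
            b"",  # end of this subpart's headers
            data,
        ]
    lines.append(b"--" + inner_boundary + b"--")  # closes the inner multipart
    return CRLF.join(lines)
```

The returned bytes would sit between two delimiters of the enclosing multipart/form-data stream, exactly like any other part.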

Each part may be encoded and the "content-transfer-encoding" header supplied if the value of that part does not conform to the default encoding.

File inputs may also identify the file name. The file name may be described using the 'filename' parameter of the "content-disposition" header. This is not required, but is strongly recommended in any case where the original filename is known. This is useful or necessary in many applications.

4. Other considerations

4.1 Compression, encryption

Some of the data in forms may be compressed or encrypted, using other MIME mechanisms.

4.2 Transmitting long files in form-data

<discussion of how this can work> In some situations, it might be advisable to have the server validate various elements of the form data (user name, account, etc.) before actually preparing to receive the data. However, after some consideration, it seemed best to require that servers that wish to do this should implement this as a series of forms, where some of the data elements that were previously validated might be sent back to the client as 'hidden' fields, or by arranging the form so that the elements that need validation occur first. This puts the onus of maintaining the state of a transaction only on those servers that wish to build a complex application, while allowing those cases that have simple input needs to be built simply.

The HTTP protocol may require a content-length for the overall transmission. Even if it were not to do so, HTTP clients are encouraged to supply content-length for overall file input so that a busy server could detect if the proposed file data is too large to be processed reasonably and just return an error code and close the connection without waiting to process all of the incoming data. Some current implementations of CGI require a content-length in all POST transactions.

In any case, an HTTP server may abort a file upload in the middle of the transaction if the file being received is too large.
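Both behaviours described in this section, rejecting on the declared content-length and aborting part-way through, can be sketched in a few lines of Python. The exception class, function name, and parameters are assumptions for the example; how the error is turned into an HTTP response and a closed connection is left to the server.

```python
import io

class UploadTooLarge(Exception):
    """Raised when an upload is refused or aborted for being too large."""

def read_upload(stream, declared_length, max_bytes, chunk_size=8192):
    """Read an upload from stream, enforcing max_bytes.

    declared_length is the client's content-length, or None if absent.
    """
    # With a content-length, a busy server can refuse before reading anything.
    if declared_length is not None and declared_length > max_bytes:
        raise UploadTooLarge("declared content-length exceeds the limit")
    received = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        received += chunk
        # Without a trustworthy length, abort in the middle of the transfer.
        if len(received) > max_bytes:
            raise UploadTooLarge("upload exceeded the limit mid-transfer")
    return bytes(received)
```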

4.3 Other choices for return transmission of binary data

Various people have suggested using a new MIME top-level type "aggregate", e.g., aggregate/mixed, or a content-transfer-encoding of "packet" to express indeterminate-length binary data, rather than relying on the multipart-style boundaries. While we are not opposed to doing so, this would require additional design and standardization work to get acceptance of "aggregate". On the other hand, the 'multipart' mechanisms are well established, simple to implement on both the sending client and receiving server, and as efficient as other methods of dealing with multiple combinations of binary data.

4.5 Transmitting form-data via mail

Some forms will allow the results to be mailed, e.g., by supplying a "mailto" URL as the form's action. In this case, a mail-appropriate encoding must be chosen for the form and its data.

4.6 Remote files with third-party transfer

In some scenarios, the user operating the client software might want to specify a URL for remote data rather than a local file. In this case, is there a way to allow the browser to send to the server a pointer to the external data rather than the entire contents? This capability could be implemented, for example, by having the client send to the server data of type "message/external-body" with "access-type" set to, say, "uri", and the URL of the remote data in the body of the message.

4.7 CRLF used as line separator

As with all MIME transmissions, CRLF is used as the separator for lines in a POST of the data in multipart/form-data.

4.8 Relationship to multipart/related

The MIMESGML group is proposing a new type called multipart/related. While it contains similar features to multipart/form-data, the use and application of form-data is different enough that form-data is being described separately.

It might be possible at some point to encode the result of HTML forms (including files) in a multipart/related body part; this is not incompatible with this proposal.

4.9 Non-ASCII field names

Note that MIME headers are generally required to consist only of 7-bit data in the US-ASCII character set. Hence field names should be encoded according to the prescriptions of RFC 1522 if they contain characters outside of that set. In HTML 2.0, the default character set is ISO-8859-1, but non-ASCII characters in field names should be encoded.
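For illustration, the RFC 1522 encoded-word form of a field name can be produced and reversed with Python's standard email.header module, as sketched below. The helper names are hypothetical, ISO-8859-1 is assumed as the source character set because it is the HTML 2.0 default mentioned above, and only short names (which need no header folding) are considered.

```python
from email.header import Header, decode_header

def encode_field_name(name: str, charset: str = "iso-8859-1") -> str:
    """Return name unchanged if it is US-ASCII, else as an encoded-word."""
    if name.isascii():
        return name
    return Header(name, charset=charset).encode()

def decode_field_name(value: str) -> str:
    """Reverse encode_field_name on the receiving side."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their charset.
            pieces.append(chunk.decode(charset or "ascii"))
        else:
            # Plain ASCII text with no encoded-words comes back as str.
            pieces.append(chunk)
    return "".join(pieces)
```

The encoded result is what would be placed in the name attribute of the content-disposition header; the same approach applies to non-ASCII file names.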

5. Security Considerations

TBD

6. Author's Addresses

Larry Masinter
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304

Phone: (415) 812-4365
Fax: (415) 812-4333
EMail: masinter@parc.xerox.com

Carl-Uno Manros
Principal Engineer - Advanced Printing Standards - Xerox Corporation
701 S. Aviation Blvd., El Segundo, CA, M/S: ESAE-231
Phone +1-310-333 8273, Fax +1-310-333 5514
Email: manros@cp10.es.xerox.com