IPP Mail Archive: IPP> URI changes in RFC 2396

Ira McDonald x10962 (imcdonal@eso.mc.xerox.com)
Wed, 19 Aug 1998 11:06:41 PDT

Hi folks, Wednesday (19 August 1998)

Below is Appendix G, 'Summary of Non-editorial Changes', from RFC 2396
(the new URI Generic Syntax spec, which is a Draft Standard).

Cheers,
- Ira McDonald (High North)

>------------------------------------------------------------------------
>RFC 2396 URI Generic Syntax August 1998
>
>G. Summary of Non-editorial Changes
>
>G.1. Additions
>
> Section 4 (URI References) was added to stem the confusion regarding
> "what is a URI" and how to describe fragment identifiers given that
> they are not part of the URI, but are part of the URI syntax and
> parsing concerns. In addition, it provides a reference definition
> for use by other IETF specifications (HTML, HTTP, etc.) that have
> previously attempted to redefine the URI syntax in order to account
> for the presence of fragment identifiers in URI references.
>
> Section 2.4 was rewritten to clarify a number of misinterpretations
> and to leave room for fully internationalized URI.
>
> Appendix F on abbreviated URLs was added to describe the shortened
> references often seen on television and magazine advertisements and
> explain why they are not used in other contexts.
>
>G.2. Modifications from both RFC 1738 and RFC 1808
>
> Changed to URI syntax instead of just URL.
>
> Confusion regarding the terms "character encoding", the URI
> "character set", and the escaping of characters with %<hex><hex>
> equivalents has (hopefully) been reduced. Many of the BNF rule names
> regarding the character sets have been changed to more accurately
> describe their purpose and to encompass all "characters" rather than
> just US-ASCII octets. Unless otherwise noted here, these
> modifications do not affect the URI syntax.
>
> Both RFC 1738 and RFC 1808 refer to the "reserved" set of characters
> as if URI-interpreting software were limited to a single set of
> characters with a reserved purpose (i.e., as meaning something other
> than the data to which the characters correspond), and that this set
> was fixed by the URI scheme. However, this has not been true in
> practice; any character that is interpreted differently when it is
> escaped is, in effect, reserved. Furthermore, the interpreting
> engine on a HTTP server is often dependent on the resource, not just
> the URI scheme. The description of reserved characters has been
> changed accordingly.
>
> The plus "+", dollar "$", and comma "," characters have been added to
> those in the "reserved" set, since they are treated as reserved
> within the query component.
>
> The tilde "~" character was added to those in the "unreserved" set,
> since it is extensively used on the Internet in spite of the
> difficulty to transcribe it with some keyboards.
>
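As a rough illustration of the two character-set changes above (not part
of the RFC text), the sets from RFC 2396 Sections 2.2 and 2.3 can be
written out and used to decide which octets get a %<hex><hex> escape.
The helper name escape() and the choice of UTF-8 are assumptions made
for this sketch only.

```python
import string

# Character classes as listed in RFC 2396, Sections 2.2 and 2.3.
RESERVED = set(";/?:@&=+$,")      # "+", "$" and "," are the new members
MARK = set("-_.!~*'()")           # "~" is the new member
UNRESERVED = set(string.ascii_letters + string.digits) | MARK


def escape(text: str, keep_reserved: bool = False) -> str:
    """Escape every octet that is not unreserved (hypothetical helper).

    With keep_reserved=True, reserved characters are passed through so
    they keep their delimiter meaning.
    """
    out = []
    # UTF-8 is an assumption here; the RFC does not mandate an encoding.
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        if ch in UNRESERVED or (keep_reserved and ch in RESERVED):
            out.append(ch)
        else:
            out.append("%{:02X}".format(octet))
    return "".join(out)


print(escape("~user"))    # tilde is unreserved, so it is left alone
print(escape("a+b,c$"))   # reserved characters used as data are escaped
```
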
> The syntax for URI scheme has been changed to require that all
> schemes begin with an alpha character.
>
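The scheme rule in RFC 2396 is alpha *( alpha | digit | "+" | "-" | "." ).
A minimal sketch of a check for it follows; the function name
is_valid_scheme() is made up for this example.

```python
import re

# scheme = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*\Z")


def is_valid_scheme(name: str) -> bool:
    """Return True if name fits the RFC 2396 scheme production."""
    return SCHEME_RE.match(name) is not None


for candidate in ("http", "z39.50r", "3com"):
    print(candidate, is_valid_scheme(candidate))
```
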
> The "user:password" form in the previous BNF was changed to a
> "userinfo" token, and the possibility that it might be
> "user:password" made scheme specific. In particular, the use of
> passwords in the clear is not even suggested by the syntax.
>
> The question-mark "?" character was removed from the set of allowed
> characters for the userinfo in the authority component, since testing
> showed that many applications treat it as reserved for separating the
> query component from the rest of the URI.
>
> The semicolon ";" character was added to those stated as being
> reserved within the authority component, since several new schemes
> are using it as a separator within userinfo to indicate the type of
> user authentication.
>
> RFC 1738 specified that the path was separated from the authority
> portion of a URI by a slash. RFC 1808 followed suit, but with a
> fudge of carrying around the separator as a "prefix" in order to
> describe the parsing algorithm. RFC 1630 never had this problem,
> since it considered the slash to be part of the path. In writing
> this specification, it was found to be impossible to accurately
> describe and retain the difference between the two URI
> <foo:/bar> and <foo:bar>
> without either considering the slash to be part of the path (as
> corresponds to actual practice) or creating a separate component just
> to hold that slash. We chose the former.
>
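To see the distinction in practice, a parser that treats the slash as
part of the path reports different paths for the two URI. The sketch
below uses Python's urllib.parse.urlsplit as one such parser; the
wrapper path_of() is only there for the illustration.

```python
from urllib.parse import urlsplit


def path_of(uri: str) -> str:
    """Return the path component, leading slash included if present."""
    return urlsplit(uri).path


for uri in ("foo:/bar", "foo:bar"):
    print(uri, "-> path:", repr(path_of(uri)))
```
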
>G.3. Modifications from RFC 1738
>
> The definition of specific URL schemes and their scheme-specific
> syntax and semantics has been moved to separate documents.
>
> The URL host was defined as a fully-qualified domain name. However,
> many URLs are used without fully-qualified domain names (in contexts
> for which the full qualification is not necessary), without any host
> (as in some file URLs), or with a host of "localhost".
>
> The URL port is now *digit instead of 1*digit, since systems are
> expected to handle the case where the ":" separator between host and
> port is supplied without a port.
>
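One way to handle the empty-port case is to accept zero or more digits
after the colon and fall back to the scheme's default port. This is a
loose sketch, not the RFC's grammar: the pattern, the function
split_authority(), and the host name example.com are all assumptions.

```python
import re
from typing import Optional, Tuple

# Loose pattern for [userinfo@]host[:port]; note port is *digit, so it
# may be empty even when the ":" is present.
AUTHORITY_RE = re.compile(
    r"(?:(?P<userinfo>[^@]*)@)?(?P<host>[^:@]*)(?::(?P<port>[0-9]*))?\Z"
)


def split_authority(authority: str,
                    default_port: int) -> Tuple[Optional[str], str, int]:
    """Split an authority string, treating an empty port as the default."""
    m = AUTHORITY_RE.match(authority)
    if m is None:
        raise ValueError("unrecognized authority: %r" % authority)
    port_text = m.group("port")
    # port_text is None (no colon) or "" (colon with no digits): use default.
    port = int(port_text) if port_text else default_port
    return m.group("userinfo"), m.group("host"), port


print(split_authority("example.com:", 80))
print(split_authority("example.com:8080", 80))
```
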
> The recommendations for delimiting URI in context (Appendix E) have
> been adjusted to reflect current practice.
>
>G.4. Modifications from RFC 1808
>
> RFC 1808 (Section 4) defined an empty URL reference (a reference
> containing nothing aside from the fragment identifier) as being a
> reference to the base URL. Unfortunately, that definition could be
> interpreted, upon selection of such a reference, as a new retrieval
> action on that resource. Since the normal intent of such references
> is for the user agent to change its view of the current document to
> the beginning of the specified fragment within that document, not to
> make an additional request of the resource, a description of how to
> correctly interpret an empty reference has been added in Section 4.
>
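A user agent could detect such a reference before deciding whether to
issue a request at all. The sketch below assumes that "empty apart from
the fragment" is the only test needed; the function name
is_same_document_reference() is hypothetical.

```python
from urllib.parse import urlsplit


def is_same_document_reference(ref: str) -> bool:
    """True if ref contains nothing aside from an optional fragment."""
    parts = urlsplit(ref)
    return not (parts.scheme or parts.netloc or parts.path or parts.query)


for ref in ("#section2", "", "other.html#section2"):
    action = "scroll current view" if is_same_document_reference(ref) else "retrieve"
    print(repr(ref), "->", action)
```
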
> The description of the mythical Base header field has been replaced
> with a reference to the Content-Location header field defined by
> MHTML [RFC2110].
>
> RFC 1808 described various schemes as either having or not having the
> properties of the generic URI syntax. However, the only requirement
> is that the particular document containing the relative references
> have a base URI that abides by the generic URI syntax, regardless of
> the URI scheme, so the associated description has been updated to
> reflect that.
>
> The BNF term <net_loc> has been replaced with <authority>, since the
> latter more accurately describes its use and purpose. Likewise, the
> authority is no longer restricted to the IP server syntax.
>
> Extensive testing of current client applications demonstrated that
> the majority of deployed systems do not use the ";" character to
> indicate trailing parameter information, and that the presence of a
> semicolon in a path segment does not affect the relative parsing of
> that segment. Therefore, parameters have been removed as a separate
> component and may now appear in any path segment. Their influence
> has been removed from the algorithm for resolving a relative URI
> reference. The resolution examples in Appendix C have been modified
> to reflect this change.
>
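Two of the Appendix C examples show the effect: a semicolon inside a
segment is carried along as ordinary path data during resolution. The
sketch uses Python's urllib.parse.urljoin as a stand-in resolver (it is
not the RFC's reference algorithm, though it agrees on these cases).

```python
from urllib.parse import urljoin

# Base URI used by the resolution examples in RFC 2396, Appendix C.
BASE = "http://a/b/c/d;p?q"

for ref in ("g;x", "g;x=1/./y"):
    print(ref, "->", urljoin(BASE, ref))
```
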
> Implementations are now allowed to work around misformed relative
> references that are prefixed by the same scheme as the base URI, but
> only for schemes known to use the <hier_part> syntax.
>
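For example, with the Appendix C base URI, the reference "http:g" can be
resolved leniently as if it were just "g". Python's urllib.parse.urljoin
happens to take this lenient route, so it serves as a quick
demonstration; a strict resolver would return "http:g" unchanged.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"  # base URI from RFC 2396, Appendix C

# Same scheme as the base and no authority: treated as a relative path.
lenient = urljoin(BASE, "http:g")
# A different scheme is never merged with the base.
other = urljoin(BASE, "ftp:g")

print(lenient)
print(other)
```
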