IPP Mail Archive: Re: IPP> ADM - Minutes from PWG IPP Phone Conference - 98011

Re: IPP> ADM - Minutes from PWG IPP Phone Conference - 98011

Carl Kugler (kugler@us.ibm.com)
Tue, 20 Jan 1998 12:22:07 -0500

I had to see for myself so I've tried to work out some rough numbers.

Assumptions:
1) The boundary delimiter has the maximum legal length of 70 characters.
2) The boundary delimiter and the encapsulated data are generated by random,
uncorrelated processes.
3) The encapsulated data is a string of 8-bit octets.
4) The encapsulated data is more than 70 octets long.

Let N be the number of octets in the encapsulated data.
The number of substrings of length 70 in the encapsulated data is N - 70 + 1, or
N - 69.
A substring matches the boundary delimiter with probability (1 / 256) ^ 70,
since there are 256 possibilities for each character and all characters have to
match, in order.
The expected number of matches is therefore (N - 69) * (1 / 256) ^ 70, or
(N - 69) / (256 ^ 70).

So, for example, transferring 1 GB files, you'd expect (1E9 - 69) / (256 ^ 70),
or 2.65E-160 failures per submission, which works out to a failure rate of
1 in 3.77E+159 trials. Of course, the failure rate is lower for smaller
files; 1 in 3.77E+162 for 1 MB files.

If we challenge assumption 3 and say we're transferring 1 GB 7bit ASCII files
(127 possible values per octet, since 7bit data excludes NUL),
then the failure rate increases to 1 in 1.85E+138.
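
For anyone who wants to check the arithmetic, here is a rough Python sketch of
the estimate above. It works in log10 so the tiny probabilities don't underflow.
The function names are made up for illustration, and the alphabet size of 127
for the 7bit case is an assumption inferred from the 1.85E+138 figure (128
values minus NUL). It should reproduce the three failure rates quoted above.

```python
import math

def log10_expected_matches(n_octets, boundary_len=70, alphabet=256):
    """log10 of the expected number of accidental boundary matches."""
    # Number of substrings of length boundary_len in n_octets of data.
    windows = n_octets - boundary_len + 1
    # Each window matches with probability (1 / alphabet) ** boundary_len.
    return math.log10(windows) - boundary_len * math.log10(alphabet)

def one_in(n_octets, boundary_len=70, alphabet=256):
    """Return (mantissa, exponent): failure rate is 1 in mantissa * 10**exponent."""
    x = -log10_expected_matches(n_octets, boundary_len, alphabet)
    exponent = math.floor(x)
    return 10 ** (x - exponent), exponent

cases = [
    ("1 GB, 8-bit octets", 10**9, 256),
    ("1 MB, 8-bit octets", 10**6, 256),
    ("1 GB, 7bit ASCII", 10**9, 127),  # assumption: 127 values, NUL excluded
]
for label, n, alphabet in cases:
    mantissa, exponent = one_in(n, alphabet=alphabet)
    print(f"{label}: 1 in {mantissa:.2f}E+{exponent}")
```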

In conclusion, I'd have to agree that this probability is insignificant (if my
assumptions are valid and I've done the math right).

-Carl

ipp-owner@pwg.org on 01/19/98 10:58:54 PM
Please respond to ipp-owner@pwg.org @ internet
To: Carl Kugler/Boulder/IBM@ibmus
cc: ipp@pwg.org @ internet
Subject: Re: IPP> ADM - Minutes from PWG IPP Phone Conference - 98011

> The weakness with the MIME way is that it's either unsafe or slow -- either
> you arbitrarily pick a boundary string and hope that it doesn't appear in the
> binary data, or you prescan the data to make sure. Content-length avoids those
> problems.

Actually, the fact of the matter is that it doesn't have to be either -- it is
quite easy to generate boundaries which in practice are so statistically
unlikely to ever appear in the message text that the chances of, say, message
corruption as a result of undetected network errors are many orders of
magnitude greater.
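
As a minimal sketch of what generating such a boundary could look like (not
taken from any particular MIME implementation), the Python below draws 70
characters at random from letters and digits. The name make_boundary and the
choice of alphabet are assumptions made for illustration; letters and digits
are a conservative subset of the characters a MIME boundary may contain.

```python
import secrets
import string

# Assumption: letters and digits only, a conservative subset of the
# characters permitted in a MIME boundary.
ALPHABET = string.ascii_letters + string.digits

def make_boundary(length=70):
    """Return a random boundary string of the given length (1 to 70)."""
    if not 1 <= length <= 70:
        raise ValueError("MIME boundaries are limited to 1..70 characters")
    # secrets.choice draws from the OS random source, so the boundary is
    # uncorrelated with the data being encapsulated.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

boundary = make_boundary()
print(boundary)
```

With 62 possible characters in each of 70 positions there are 62 ^ 70 (about
1E+125) equally likely boundaries, so the chance of an accidental match is
still negligible even on this reduced alphabet.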

Ned
