We suggest to specify a datatype resp. a family of datatypes ExecutableRealKulischMiranker depending on roundings r, which we present later. These datatypes consist of
At the first sight it seems to be doubtful that formula (RG) can be met by an executable specification. In order to determine the approximation r(x ° y), the exact but unknown result x ° y seems to be required. But [KM81] shows by means of algorithms for ° e {+,*,-,/} that whenever x ° y is not representable in the finite floating point system, it is sufficient to replace it by an appropriate and representable value x õ y. The latter will have the property r(x ° y) = r(x õ y). Thus the algorithms compute first the value x õ y in a sligthly larger floating point system and then apply a rounding r on this value to obtain x ° r y = r( x ° y) = r(x õ y). The algorithms presented in [KM81] work for the roundings
forall x e R*: \¯/(x):= max{ y e S* | y < x }.
forall x e R*: /\(x):= min{ y e S* | y > x }.
forall x e R*, x > 0: []b(x):= \¯/(x) and forall x e R*, x < 0: []b(x):= - []b(-x).
forall x e R*, x > 0: []0(x):= /\(x) and forall x e R*, x < 0: []0(x):= - []0(-x).
where
forall x e R*, x e [0, be1 -1): []µ(x):= 0, forall x e R*, x > be1 -1: []µ(x):= {
\¯/(x) , x e [\¯/(x), sµ(x)) /\(x) , x e [sµ(x),\¯/(x)], forall x e R*, x < 0: []µ(x):= - []µ(-x),
forall x e R*: sµ:= \¯/(x) + (/\(x) - \¯/(x) ) ·µ/b.If b is an even number, []b/2 denotes the rounding to the nearest floating point number.
Further, [KM81] present algorithms to compute the result of the rounded sum operator for all of the above mentioned roundings.
Thus we suggest to specify ExecutableRealKulischMiranker for all theses roundings. This leads - depending on the algorithms presented in [KM81] - to one specification with a parameter for the rounding or to a family of specifications, where each specification "implements" a special rounding.
For a comprehensive discussion of the IEEE standards we refer to [Gol91]. He summarizes the standards as follows:
There are two different IEEE standards for floating-point computation. IEEE 754 is a binary standard that requires b=2, l=24 for single precision and l=53 for double precision [IEE87]. It also specifies the precise layout of bits in a single and double precision. IEEE 854 allows either b=2 or b=10 and unlike 754, does not specify how floating-point numbers are encoded into bits [Cod84]. It does not require a particular value for l, but instead it specifies constraints on the allowable values of l for single and double precision.8
The standard IEEE 754 e.g. defines the following format parameters:
parameter single single extended double double extended l 24 > 32 53 > 64 e2 +127 > +1023 +1023 > 16383 e1 -126 < +1022 -1022 < 16382
Both IEEE standards cover the basic operations +,-,*,/, and also specify square root, remainder and the two conversions between integer and floating point numbers and between the internal format and decimal output. They require all these operations to be exactly rounded. That is, the result must be computed exactly and then rounded9. There are several rounding modes to be provided: the default is rounding to the nearest floating point number. But also rounding toward 0, rounding toward + infty and rounding toward - infty are covered.
We suggest to specify both standards as ExecutableReal754 and ExecutableReal854 with suitable parameters like single, double, base, etc.
The datatype ExecutableRealKulischMiranker and both specifications based on the IEEE standards, ExecutableReal754 and ExecutableReal854, are on the one hand independent of the specifications BasicReal and AbstractExecutableReal as they rely on floating-point systems, which can be realized in CASL without any connection to the sort Real. On the other hand they are closely related with these specifications by the roundings.
The main differences between the two proposals are:
Compared with the proposal based on [KM81] the IEEE standards include as additional operations square root, remainder and the two conversions.