A real number $x$ is said to be effective if there exists an algorithm which, given a required tolerance $\varepsilon$, returns a binary approximation $\tilde{x}$ for $x$ with $|\tilde{x} - x| \leq \varepsilon$. Effective real numbers are interesting in areas of numerical analysis where numerical instability is a major problem.
One key problem with effective real numbers is to perform intermediate computations at the smallest precision which is sufficient to guarantee an exact end-result. In this paper we first review two classical techniques to achieve this: a priori error estimates and interval analysis. We next present two new techniques: “relaxed evaluations” reduce the amount of re-evaluations at larger precisions and “balanced error estimates” automatically provide good tolerances for intermediate computations.
Keywords: effective real number, algorithm, interval analysis, error estimates.
A dyadic number is a number of the form $m 2^e$ with $m, e \in \mathbb{Z}$. We denote by $\mathbb{D} = \mathbb{Z}\, 2^{\mathbb{Z}}$ the set of dyadic numbers and $\mathbb{D}^{>} = \{\varepsilon \in \mathbb{D} : \varepsilon > 0\}$. Given $x \in \mathbb{R}$ and $\varepsilon \in \mathbb{D}^{>}$, an $\varepsilon$-approximation for $x$ is a number $\tilde{x} \in \mathbb{D}$ with $|\tilde{x} - x| \leq \varepsilon$. An approximation algorithm for $x$ is an algorithm which takes a tolerance $\varepsilon \in \mathbb{D}^{>}$ on input and which returns an $\varepsilon$-approximation for $x$. A real number $x$ is said to be effective, if it admits an approximation algorithm. The aim of this paper is to review several recent techniques for computations with effective real numbers and to present a few new ones.
Effective real numbers can be useful in areas of numerical analysis where numerical instability is a major problem, like singularity theory. In such areas, multiple precision arithmetic is often necessary and the required precisions for auxiliary computations may vary heavily. We hope that the development of an adequate computational theory for effective real numbers will allow us both to automatically perform the error analysis and to compute the required precisions at which computations have to take place.
We recall that there exists no general zero-test for effective real numbers. Nevertheless, exact or heuristically reliable zero-tests do exist for interesting subfields, which contain transcendental constants. However, this topic will not be studied in this paper and we refer to [1, 2] for some recent work on this matter.
A first classical technique relies on a priori error estimates: if we wish to evaluate an expression $f(x_1, \ldots, x_n)$ with precision $\varepsilon$, then we determine tolerances $\varepsilon_1, \ldots, \varepsilon_n$ such that the multiple precision evaluation of $f$ at any $\varepsilon_i$-approximations for the $x_i$ always yields an $\varepsilon$-approximation for $f(x_1, \ldots, x_n)$. This technique will be reviewed in section 2.
In some cases, a priori error estimates are quite pessimistic.
An alternative technique for computing with effective real numbers is interval arithmetic. In this case, we approximate an effective real number $x$ by an interval $[\underline{x}, \overline{x}]$ which contains $x$. Then the evaluation of $y = f(x_1, \ldots, x_n)$ comes down to the determination of an interval $[\underline{y}, \overline{y}]$ such that $y \in [\underline{y}, \overline{y}]$ whenever $x_i \in [\underline{x}_i, \overline{x}_i]$ for each $i$. For continuous functions $f$, when starting with a very precise approximation of $x_1, \ldots, x_n$ (i.e. $\overline{x}_i - \underline{x}_i$ is very small), the obtained a posteriori error estimate $\overline{y} - \underline{y}$ for $y$ will also become as small as desired. In section 3, this technique of a posteriori error estimates will be reviewed in more detail, as well as an efficient representation for high-precision intervals.
Unfortunately, in some cases, both a priori and a posteriori error estimates are quite pessimistic. In sections 4 and 5, we will therefore present two new techniques for the computation of "adaptive error estimates". These techniques combine the advantages of a priori and a posteriori error estimates, while eliminating their major disadvantages. Moreover, these new techniques remain reasonably easy to implement in a general purpose system for computations with effective real numbers. Under certain restrictions, the new techniques are also close to being optimal. This point will be discussed in section 6.
The approaches presented in sections 2 and 3 have been proposed, often independently, by several authors [3, 4, 5] and we notice the existence of several similarities with the theory of effective power series [6, 7]. In the past, we experimentally implemented the approaches of sections 2 and 3 in the case of power series (not officially distributed) resp. real numbers [8]. We are currently working on a more robust C++ library based on the ideas in this paper [9]. Currently, this library contains the basic arithmetic operations. Some other implementations with similar objectives are currently known to the author [10, 11]. All these implementations are free software and they are mainly based on the GMP [12] and MPFR [13] libraries.
A natural way to represent an effective real number $x$ is by its approximation algorithm. Conceptually speaking, this means that we view $x$ as a black box which can be asked for approximations up to any desired precision.
In an object oriented language like C++, this can be implemented using an abstract base class real_rep with a virtual method for the approximation algorithm. Effective real numbers will then be pointers to real_rep. For instance,
class real_rep {
public:
virtual dyadic approx (const dyadic& tol) = 0;
};
typedef real_rep* real;
Here dyadic stands for the class of dyadic numbers. In practice, these may be taken to be arbitrary precision floating point numbers. For simplicity, we also do not care about memory management. In a real implementation, one would need a mechanism for reference counting or conservative garbage collection.
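For concreteness, a minimal toy version of such a class might look as follows. This sketch is ours, not the library's actual dyadic type: the mantissa is a fixed-size machine integer instead of an arbitrary precision one, and overflow and signed-shift caveats are ignored, but it provides the few operations (exact addition, comparison, halving via >>) used by the examples in this paper.

#include <cstdint>

// Toy dyadic number m * 2^e.  A real implementation would store an arbitrary
// precision mantissa (e.g. on top of a bignum library); here we use a 64 bit
// integer, which is enough to illustrate the interface.
struct dyadic {
  std::int64_t m;  // mantissa
  int e;           // exponent

  dyadic (std::int64_t m2 = 0, int e2 = 0): m (m2), e (e2) {}

  // Halving a tolerance k times: (m * 2^e) >> k  ==  m * 2^(e-k).
  dyadic operator >> (int k) const { return dyadic (m, e - k); }

  // Exact addition: align the exponents, then add the mantissas.
  friend dyadic operator + (const dyadic& a, const dyadic& b) {
    if (a.e >= b.e) return dyadic ((a.m << (a.e - b.e)) + b.m, b.e);
    else            return dyadic (a.m + (b.m << (b.e - a.e)), a.e);
  }

  // Comparison, again by aligning the exponents.
  friend bool operator < (const dyadic& a, const dyadic& b) {
    if (a.e >= b.e) return (a.m << (a.e - b.e)) < b.m;
    else            return a.m < (b.m << (b.e - a.e));
  }
};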
Now assume that we want to evaluate $y = f(x_1, \ldots, x_n)$ for a given tolerance $\varepsilon$, where $x_1, \ldots, x_n$ are effective real numbers and $f$ is an operation like $+$, $-$ or $\times$. In order to make $y$ effective, we need to construct an approximation algorithm for $y$ as a function of the approximation algorithms for $x_1, \ldots, x_n$. This involves both the dyadic approximation of the evaluation of $f$ at dyadic approximations of the $x_i$ and the determination of tolerances $\varepsilon_1, \ldots, \varepsilon_n$ for the dyadic approximations of the $x_i$. More precisely, $\varepsilon_1, \ldots, \varepsilon_n$ should be such that for any $\tilde{x}_1, \ldots, \tilde{x}_n \in \mathbb{D}$ with $|\tilde{x}_i - x_i| \leq \varepsilon_i$, we have

$$|\tilde{f}_{\delta}(\tilde{x}_1, \ldots, \tilde{x}_n) - f(x_1, \ldots, x_n)| \leq \varepsilon,$$

where $\tilde{f}_{\delta}$ stands for a dyadic approximation algorithm for $f$ which depends on the tolerance $\delta$. For instance, in the case when $f$ is the addition, we may use exact arithmetic on $\mathbb{D}$ (so that $\tilde{f}_{\delta} = f$ for all $\delta$) and take $\varepsilon_1 = \varepsilon_2 = \varepsilon / 2$. This yields the following class for representing sums of real numbers:
class add_real_rep: public real_rep {
  real x, y;
public:
  add_real_rep (const real& x2, const real& y2):
    x (x2), y (y2) {}
  dyadic approx (const dyadic& tol) {
    return x->approx (tol >> 1) + y->approx (tol >> 1); }
};
The addition can now be implemented as follows:
inline real operator + (const real& x, const real& y) {
  return new add_real_rep (x, y); }
Notice that, in a sense, we have really represented the sum of $x$ and $y$ by the expression $x + y$ (more generally, such expressions are dags, i.e. directed acyclic graphs). Nevertheless, the representation using an abstract class real_rep provides additional flexibility. For instance, we may attach additional information to the class real_rep, like the best currently known approximation for the number (thereby avoiding unnecessary recomputations). In practice, it is also good to provide an additional abstract method for computing a rough upper bound for the number. This gives a fine-grained control over potential cancellations.
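As an illustration of the previous paragraph, a cached variant of real_rep might look as follows. This is our own sketch (the names cached_real_rep, compute and bound are not from the text): the best approximation computed so far is remembered, and a virtual method returns a rough upper bound for the absolute value of the number.

// Variant of real_rep which caches the best approximation computed so far
// and exposes a rough upper bound on the number (for cancellation control).
class cached_real_rep {
  dyadic cache;      // best approximation computed so far
  dyadic cache_tol;  // tolerance for which 'cache' was computed
  bool   cached;
public:
  cached_real_rep (): cached (false) {}
  virtual ~cached_real_rep () {}

  // The approximation algorithm proper, provided by each subclass.
  virtual dyadic compute (const dyadic& tol) = 0;

  // Rough upper bound for the absolute value of the represented number.
  virtual dyadic bound () = 0;

  // Front end: only recompute when a strictly better tolerance is required.
  dyadic approx (const dyadic& tol) {
    if (!cached || tol < cache_tol) {
      cache     = compute (tol);
      cache_tol = tol;
      cached    = true;
    }
    return cache;
  }
};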
The above approach heavily relies on the computation of a priori error estimates (i.e. the computation of the tolerances $\varepsilon_i$). If no additional techniques are used, then this leads to the following disadvantages:

P1. We do not take advantage of the fact that the numeric evaluation of $f$ may lead to an approximation of $f(x_1, \ldots, x_n)$ which is far better than the required tolerance $\varepsilon$. Indeed, multiple precision computations are usually done with a precision which is a multiple of the number of bits $w$ in a machine word. "On average", we therefore gain something like $w/2$ bits of precision. In section 4, we will show that it may actually be profitable to systematically compute more than necessary.
P2. The error estimates may be pessimistic, due to badly balanced expressions. For instance, consider the $\varepsilon$-approximation of a sum $x_1 + (x_2 + (\cdots + x_n) \cdots)$, which corresponds to a completely unbalanced tree. Then the above technique would lead to the computation of an $\varepsilon / 2^i$-approximation of $x_i$ for each $i < n$ and an $\varepsilon / 2^{n-1}$-approximation of $x_n$. If $n$ is large, then $\varepsilon / 2^{n-1}$ is unnecessarily small, since a mere $\varepsilon / n$-approximation for each $x_i$ would do. In section 5 we will consider a general technique for computing "balanced error estimates". Badly balanced expressions naturally occur when evaluating polynomials using Horner's rule (see the sketch below).
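For instance, evaluating a polynomial with Horner's rule over the real type of the previous paragraphs produces exactly such a comb-shaped dag. The snippet below is ours and assumes that an operator * on real has been defined analogously to operator +:

// Evaluate c[0] + c[1]*x + ... + c[n]*x^n with Horner's rule.  The additions
// alone form a chain of depth n, so the tolerance-halving strategy of this
// section already requests a tolerance of about eps/2^i from the part of
// degree >= i, even though eps/(n+1) per term would suffice.
real horner (const real* c, int n, const real& x) {
  real r = c[n];
  for (int i = n - 1; i >= 0; i--)
    r = c[i] + r * x;
  return r;
}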
An alternative technique for computing with effective real numbers is interval arithmetic. The idea is to systematically compute interval approximations instead of floating point approximations. These intervals must be such that the real numbers we are interested in are certified to lie in their respective interval approximations.
More precisely, given $x \in \mathbb{R}$ and $\varepsilon \in \mathbb{D}^{>}$, an $\varepsilon$-interval for $x$ is a closed interval $\mathbf{x} = [\underline{x}, \overline{x}]$ with $x \in \mathbf{x}$ and radius $r(\mathbf{x}) = (\overline{x} - \underline{x})/2 \leq \varepsilon$. Concretely speaking, we may represent such intervals by their endpoints $\underline{x}$ and $\overline{x}$. Alternatively, if the precisions of $\underline{x}$ and $\overline{x}$ are large, then it may be more efficient to represent the interval by its center $c(\mathbf{x}) = (\underline{x} + \overline{x})/2$ and its radius $r(\mathbf{x})$. Indeed, the exact endpoints of the interval are not that important. Hence, modulo a slight increase of the radius, we may always assume that the radius can be stored in a "single precision dyadic number" $m 2^e$ with $|m| < 2^w$, where $w$ is the number of bits in a machine word. This trick allows us to reduce the number of multiple precision computations by a factor of two.
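A center-radius representation along these lines could be sketched as follows. This is our code, named centered_interval to avoid confusion with the interval class used later in the paper; a full implementation would also round the center and enlarge the radius so as to keep the enclosure certified.

// Interval represented by its center and its radius.  The center is a
// multiple precision dyadic number, while a "single precision" dyadic number
// (small mantissa, arbitrary exponent) suffices for the radius.
struct centered_interval {
  dyadic center;
  dyadic radius;

  centered_interval (const dyadic& c, const dyadic& r):
    center (c), radius (r) {}

  // Addition: the centers add exactly and the radii add; only the radius
  // operation would need upward rounding in a certified implementation.
  friend centered_interval operator + (const centered_interval& a,
                                       const centered_interval& b) {
    return centered_interval (a.center + b.center, a.radius + b.radius);
  }
};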
Now assume that we want to compute an $\varepsilon$-approximation for $y = f(x_1, \ldots, x_n)$, where $x_1, \ldots, x_n$ are effective real numbers and where $f$ is a continuous function. Assume also that we have reasonable initial intervals for the $x_i$. Then, starting with a low precision $p$, we first compute $2^{-p}$-intervals $\mathbf{x}_1, \ldots, \mathbf{x}_n$ for the $x_i$. We next evaluate $f$ using interval arithmetic. This yields an interval $\mathbf{y}$ with $y \in \mathbf{y}$. If $r(\mathbf{y}) \leq \varepsilon$, then $c(\mathbf{y})$ is an $\varepsilon$-approximation for $y$. Otherwise, we increase the precisions of the approximations and repeat the computations. Under relatively mild assumptions on the way we evaluate $f$, this procedure will eventually stop, since $f$ is continuous.
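Schematically, the loop may be written as follows. This is our own sketch: it assumes an interval class with free functions radius and center, the usual comparison operators on dyadic, and argument objects which can return a $2^{-p}$-interval at a requested precision $p$; it simply doubles the precision at every round, which is only one possible schedule (see problem P3 below).

#include <functional>
#include <vector>

// A posteriori evaluation of y = f(x_1, ..., x_n): increase the working
// precision until the computed enclosure of y is tight enough.
dyadic approx_a_posteriori (
    const std::vector<std::function<interval (int)> >& args, // 2^-p intervals
    const std::function<interval (const std::vector<interval>&)>& f,
    const dyadic& eps)
{
  for (int p = 64; ; p *= 2) {                // naive doubling schedule (P3)
    std::vector<interval> xs;
    for (const auto& arg : args)
      xs.push_back (arg (p));                 // 2^-p interval for each x_i
    interval y = f (xs);                      // certified interval evaluation
    if (radius (y) <= eps)
      return center (y);                      // tight enough: done
  }
}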
Although this technique of a posteriori error estimates does solve the problems P1 and P2 raised in the previous section, it also induces some new problems. Most importantly, we have lost the fine-grained control over the precisions in intermediate computations during the evaluation of $f$. Indeed, we only have control over the overall starting precision $p$. This disadvantage is reflected in two ways:
P3. It is not clear how to increase the precision $p$. Ideally speaking, $p$ should be increased in such a way that the computation time of $f(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ is doubled at each iteration. In that case (see section 4), the overall computation time is bounded by a constant times the computation time w.r.t. the least precision $p$ which leads to an $\varepsilon$-approximation for $y$. Now the problem is that this "overall computation time" of $f$ can be estimated well by hand for elementary operations like $+$, $\times$, $\exp$, etc., but not necessarily for more complicated functions.
P4. Consider the case when, somewhere during the evaluation of $f$, we need to compute the sum $x + y$ of a very large number $x$ and a very small number $y$. Assume moreover that $x$ can be approximated very fast, but that the approximation of $y$ requires a lot of time. Since $y$ is very small w.r.t. $x$, it will then be possible to skip the computation of $y$, by setting $\tilde{y} = 0$, unless the required precision is very large. However, the above technique of a posteriori error estimates does not allow for this optimization.
Relaxed evaluation can be used in order to "combine the good ideas" of sections 2 and 3. The idea is to endow real_rep with an extra field interval best which corresponds to the best currently known interval approximation of the real number. Moreover, when requesting a better approximation, we will actually compute a much better approximation, so as to avoid expensive recomputations when we repeatedly ask for slightly better approximations. This anticipating strategy was first introduced in the case of power series, where it leads to important algorithmic gains [14, 7]. In our context, the proposed method is quite different though, because of the problem with carries.
More precisely, assume that we have an $n$-ary operation $f$, such that the $d$-digit approximation of $f$ at dyadic numbers with $d$ digits has time complexity $T(d)$. The complexity $T(d)$ for an elementary operation is usually a well-understood, regular function (in general, $T(d)/d$ is increasing, etc.). For instance, we have $T(d) = O(d)$ for addition and $T(d) = O(d^{\alpha})$ for multiplication, where $\alpha = \alpha(d)$ depends on the precision and decreases slowly from $2$ to $1$.
Now assume that we have a $p$-digit approximation of $f(x_1, \ldots, x_n)$ and that we request a slightly better approximation. Then we let $p'$ be such that $T(p') \geq 2\, T(p)$ and we replace our $p$-digit approximation by a $p'$-digit approximation. This strategy has the property that the successive evaluation of $f$ at digits $p_1 < \cdots < p_k$ requires a time $T(n_1) + \cdots + T(n_l)$, where $n_1 < \cdots < n_l$ are such that $T(n_{i+1}) \geq 2\, T(n_i)$ for $i < l$ and $T(n_l) \leq 2\, T(p_k)$. Consequently

$$T(n_1) + \cdots + T(n_l) \leq 2\, T(n_l) \leq 4\, T(p_k).$$

More generally, the evaluation of $f$ at different precisions $p_1 \leq \cdots \leq p_k$ requires at most four times as much time as the evaluation of $f$ at precision $p_k$.
The combination of a priori and a posteriori error estimates in the relaxed strategy clearly solves problems P1 and P4. Furthermore, at the level of the class real_rep, we have a fine-grained control over how to increase the computational precision. Moreover, we have shown how to perform this increase in an efficient way, as a function of the cost function $T$. This will avoid expensive recomputations when the same number occurs many times in a dag. In other words, the relaxed technique also solves problem P3.
Let us illustrate the relaxed strategy on the concrete example of the computation of the sine function. We assume that we have implemented a suitable class interval for certified arbitrary precision computations with intervals. First of all, real_rep now becomes:
class real_rep {
protected:
interval best;
public:
inline real_rep (): best (interval::fuzzy) {}
virtual interval approx (const dyadic& tol) = 0;
};
Here interval::fuzzy stands for the interval $[-\infty, +\infty]$. Sines of numbers are represented
by instances of the following class:
class sin_real_rep: public real_rep {
real x;
public:
inline sin_real_rep (const real& x2): x (x2) {
best= interval (-1, 1); }
interval approx (const dyadic& tol);
};
The approximation algorithm is given by
interval sin_real_rep::approx (const dyadic& tol) {
if (tol < radius (best)) {
interval xa= x->approx (tol * (1 - DELTA));
int required_prec= 2 - expo (tol);
int proposed_prec= next_prec (-expo (radius (best)));
xa= truncate (xa, max (required_prec, proposed_prec));
best= sin (xa);
}
return best;
}
This algorithm requires some explanations. First of all, DELTA stands for a small number like $2^{-w}$, where $w$ is the machine word size. We use DELTA for countering the effect of rounding errors in the subsequent computations. Next, expo stands for the function which gives the exponent of a dyadic number; we have $2^{\operatorname{expo}(x) - 1} \leq |x| < 2^{\operatorname{expo}(x)}$ for all non-zero $x \in \mathbb{D}$. The function next_prec (see below) applied to a precision $p$ computes the precision $p'$ such that $T(p') \approx 2\, T(p)$, where $T$ stands for the complexity of the evaluation of the sine
function. Now we recall that xa may contain an
approximation for x which is much better than the
approximation we need (this is typically the case if a more precise
value for the argument is needed in another part of the global
computation). We therefore truncate the precision of xa
at the precision we need, before computing the next approximation for
the sine of x. The function next_prec
may for instance be implemented as follows:
int next_prec (int prec) {
if (prec <= (16 * WORD_SIZE)) return 1.41 * prec +
WORD_SIZE;
else if (prec <= 256 * WORD_SIZE) return 1.55 * prec;
else if (prec <= 4096 * WORD_SIZE) return 1.71 * prec;
else return 2 * prec;
}
The different thresholds correspond to the precisions where we use naive multiplication, Karatsuba multiplication and two ranges of F.F.T. multiplication; the growth factors are chosen so that each step roughly doubles the cost of a multiplication at the corresponding precision (for instance, $1.41^2 \approx 2$ in the naive range and $1.55^{\log_2 3} \approx 2$ in the Karatsuba range). We finally have the following algorithm for computing sines:
inline real sin (const real& x) {
return new sin_real_rep (x); }
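The text does not show how leaves of the dag are represented. For completeness, here is one possible minimal leaf class together with a small usage example; this is our code, and it assumes that interval can be constructed from its two dyadic endpoints and that dyadic has constructors taking a mantissa and an exponent.

// Leaf node: an exactly known dyadic constant, represented by the
// degenerate interval [c, c], which satisfies any tolerance.
class dyadic_real_rep: public real_rep {
public:
  inline dyadic_real_rep (const dyadic& c) {
    best= interval (c, c); }
  interval approx (const dyadic& tol) {
    return best; }
};

inline real constant (const dyadic& c) {
  return new dyadic_real_rep (c); }

// Usage: an enclosure of sin 1 with tolerance 2^-100.
//   real x= constant (dyadic (1));
//   real y= sin (x);
//   interval enc= y->approx (dyadic (1, -100));   // the dyadic 1 * 2^-100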
Remark 1. In the above algorithm, it may happen that the precision $p'$ of xa is only slightly higher than the precision $p$ of best. In that case, the next approximation of $\sin x$ is only computed at precision $p'$ and not at a precision (like next_prec ($p$)) which doubles the computation time.
When the above problem occurs repeatedly, we thereby lose the principal advantage of the relaxed approach. Nevertheless, it should be noticed that the slight redundancy in the precision $p'$ can only occur if this higher precision was requested by another computation than the one of $\sin x$ (except when the operation has an exponential computational complexity). In fact, the author currently does not know of any natural example of a computation where the above problem occurs systematically (so as to make the computation time increase by more than a constant factor).
One remedy is to delay the actual evaluation best=sin(xa) until the very end and first gradually increase precisions until we are sure that the total re-evaluation cost is about twice as large as the previous evaluation cost (considering only those nodes which are re-evaluated). However, this is only possible if we do not need the actual result of a re-evaluation during the “prospective phase” in which we increase the precisions.
Yet another approach would be to systematically use the real number analogue of relaxed power series evaluations [7]. However, this requires the re-implementation of all basic operations on floating point numbers in a relaxed way. It also involves an additional overhead.
One of the main problems with a priori error estimates that we mentioned in section 2 is that of badly balanced expressions, which lead to overly pessimistic bounds. In this section we introduce the general technique of "balanced error estimates", which solves this problem as well as possible in a general purpose system. The idea behind balanced error estimates is to "distribute the required tolerance over the subexpressions in a way which is proportional to their weights". Here leaves have weight $1$ and the weight of an expression $e = f(e_1, \ldots, e_n)$ is given by $w(e) = w(e_1) + \cdots + w(e_n)$.
Remark 2. […], where $w$ is the size of a machine word.
Let us first illustrate the strategy of balanced error estimates in the case of addition. So assume that we want to compute an $\varepsilon$-interval for a sum $x + y$, where $x$ has weight $p$ as an expression and $y$ has weight $q$. Then we will compute a $\frac{p}{p+q}\varepsilon$-interval for $x$ and a $\frac{q}{p+q}\varepsilon$-interval for $y$. For example, in the case of the expression $(x_1 + x_2) + x_3$ with leaves of weight $1$, a tolerance $\varepsilon$ at the root is distributed as $\frac{2}{3}\varepsilon$ for $x_1 + x_2$ and $\frac{1}{3}\varepsilon$ for $x_3$, so that each leaf ends up with a tolerance of $\frac{1}{3}\varepsilon$.
Subtraction is treated in a similar way as addition.
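In code, the weighted splitting for addition could be implemented along the following lines. This sketch is ours: the weight fields are not part of the classes shown before, and it assumes that dyadic supports multiplication and division by machine integers (any rounding of the division towards zero only makes the first tolerance smaller, hence remains safe).

// Balanced addition node: the tolerance is split between the two operands
// proportionally to their weights instead of being halved.
class balanced_add_real_rep: public real_rep {
  real x, y;
  long wx, wy;                            // weights of the subexpressions
public:
  balanced_add_real_rep (const real& x2, const real& y2, long wx2, long wy2):
    x (x2), y (y2), wx (wx2), wy (wy2) {}
  interval approx (const dyadic& tol) {
    if (tol < radius (best)) {
      dyadic tx= (tol * wx) / (wx + wy);  // share of x: (wx/(wx+wy)) * tol
      dyadic ty= tol - tx;                // y receives the remainder
      best= x->approx (tx) + y->approx (ty);
    }
    return best;
  }
};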
Remark 3. Consider an expression $e$ represented by a dag with one root and only additions and subtractions. If a tolerance $\varepsilon$ is specified at the root, then the above algorithm requires a tolerance $\frac{w(e')}{w(e)}\varepsilon$ at each node $e'$ of weight $w(e')$. Notice that any correct algorithm should require a tolerance of at most $\varepsilon$ at such a node instead of $\frac{w(e')}{w(e)}\varepsilon$. In other words, the precision with respect to which we compute never exceeds the absolutely necessary precision by more than $\log_2 w(e)$ bits.
Remark 4. The precision loss $\log_2 w(e)$ can be linear in the size of the dag, as we saw in the case of repeated squarings (and more generally in the case of loops or recursive functions). Nevertheless, if $w(e)$ is exponential in the size, then the precision loss generally only causes the overall complexity to increase by a constant factor. Let us illustrate this point by an example: consider the computation of a $2^{-n}$-approximation of $x_k$, where $x_0 = x$ and $x_{i+1} = x_i + x_i$ for $0 \leq i < k$, so that $w(x_k) = 2^k$. Then our algorithm requires the computation of a $2^{-n-k+i}$-approximation of each $x_i$, which takes a time $O(k (n + k))$. The optimal algorithm would only require a $2^{-n-k}$-approximation of the leaf $x_0 = x$ (the intermediate results then being exact dyadic doublings), which again leads to a global complexity of $O(k (n + k))$. This example should be compared to the computation of a $2^{-n}$-approximation of a chain $x_1 + (x_2 + (\cdots + x_k) \cdots)$ with $k$ distinct leaves. Our balanced algorithm only requires a time $O(k (n + \log k))$, which is optimal in this case, whereas a non-balanced algorithm would yield a quadratic complexity $O(k (n + k))$.
In the case of multiplication, the above additive distribution of tolerances can no longer be used and one has to resort to "relative tolerances". Here a relative tolerance $\delta$ for $x$ corresponds to an absolute tolerance of $\delta |x|$. Now let $x$ and $y$ be effective real numbers of weights $p$ and $q$. Assume for simplicity that we are in the regular case when we have good starting approximations $\tilde{x}$ and $\tilde{y}$ for $x$ and $y$, say of relative tolerances $2^{-w}$, where $w$ is the machine word size. Then, in order to compute an $\varepsilon$-interval for $x y$, we first distribute the relative tolerance $\delta = \varepsilon / |\tilde{x}\, \tilde{y}|$ over $x$ and $y$ proportionally to their weights, and then compute a $\frac{p}{p+q}\delta\, |\tilde{x}|$-interval for $x$ and a $\frac{q}{p+q}\delta\, |\tilde{y}|$-interval for $y$. The product of these approximations yields an $\varepsilon$-interval for $x y$.
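To see why relative tolerances are the natural quantities to distribute here, one may use the following elementary bound (in our notation, not taken from the original text): if $|\tilde{x} - x| \leq \delta_x |x|$ and $|\tilde{y} - y| \leq \delta_y |y|$, then

$$|\tilde{x}\,\tilde{y} - x y| \;\leq\; |\tilde{x}|\,|\tilde{y} - y| + |y|\,|\tilde{x} - x| \;\leq\; (\delta_x + \delta_y + \delta_x \delta_y)\, |x y|,$$

so the relative error of a product is essentially the sum of the relative errors of its factors; the second order term $\delta_x \delta_y$ is absorbed by a small safety margin in an actual implementation.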
More generally, assume that $x_1, \ldots, x_n$ are effective real numbers of weights $w_1, \ldots, w_n$ and that we want to compute an $\varepsilon$-interval for $f(x_1, \ldots, x_n)$, where $f$ is a differentiable $n$-ary operation for which we are given an arbitrary precision evaluation algorithm at dyadic intervals. Before balancing tolerances as above, we first have to determine the numbers which have to be subdivided (i.e. $\varepsilon$ in the case of addition and the relative tolerance $\delta$ (really the quantities $\varepsilon/(2 M_1)$ and $\varepsilon/(2 M_2)$ introduced below) in the case of multiplication).
We define the optimal tolerances for $x_1, \ldots, x_n$ to be the maximal numbers $\eta_1, \ldots, \eta_n$, such that for any intervals $\mathbf{x}_1 \ni x_1, \ldots, \mathbf{x}_n \ni x_n$ with

$$r(\mathbf{x}_i) \leq \eta_i \qquad (1 \leq i \leq n) \qquad\qquad (1)$$

we have $r(f(\mathbf{x}_1, \ldots, \mathbf{x}_n)) \leq \varepsilon$, where $f(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ denotes the exact image interval. Let us assume for simplicity that we are in the regular case when there exist (computable) bounds $M_1, \ldots, M_n$ with $M_i \geq |\partial f / \partial x_i|$, which are valid on all intervals satisfying (1). Then (1) implies in particular that $r(f(\mathbf{x}_1, \ldots, \mathbf{x}_n)) \leq M_1\, r(\mathbf{x}_1) + \cdots + M_n\, r(\mathbf{x}_n)$ for such intervals. Conversely, $r(\mathbf{x}_i) \leq \varepsilon/(n M_i)$ for all $i$ implies $r(f(\mathbf{x}_1, \ldots, \mathbf{x}_n)) \leq \varepsilon$. This proves that $\eta_i \geq \varepsilon/(n M_i)$ for all $i$. For instance, in the case of addition, we may take $M_1 = M_2 = 1$ and we obtain $\eta_i \geq \varepsilon/2$. In the case of multiplication, we may take $M_1 = |x_2| + \eta_2$ and $M_2 = |x_1| + \eta_1$ and obtain $\eta_1 \geq \frac{\varepsilon}{2(|x_2| + \eta_2)}$ and $\eta_2 \geq \frac{\varepsilon}{2(|x_1| + \eta_1)}$.
Now, assuming that we have sharp bounds $M_i$ of this kind (so that $\varepsilon/(n M_i)$ is a sharp lower bound for the optimal tolerance $\eta_i$), we compute a $\frac{w_i}{w_1 + \cdots + w_n} \frac{\varepsilon}{M_i}$-interval for each $x_i$. Then we obtain the requested $\varepsilon$-interval for $f(x_1, \ldots, x_n)$ by evaluating $f$ at these intervals at a sufficient precision.
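For multiplication, a node implementing this recipe might look as follows. This is again our own sketch: upper_bound is a hypothetical helper returning a dyadic upper bound for the absolute values contained in an interval, divisions of dyadic numbers are assumed to be rounded downwards, the coarse enclosures obtained with tolerance $1$ are assumed to contain all later, finer enclosures, and safety margins for rounding and for the second order term of the product are omitted.

// Balanced multiplication node (regular case): bounds M1 >= max |y| and
// M2 >= max |x| are read off from coarse enclosures, and the tolerance is
// then distributed over the two factors proportionally to their weights.
class balanced_mul_real_rep: public real_rep {
  real x, y;
  long wx, wy;                                 // weights of the subexpressions
public:
  balanced_mul_real_rep (const real& x2, const real& y2, long wx2, long wy2):
    x (x2), y (y2), wx (wx2), wy (wy2) {}
  interval approx (const dyadic& tol) {
    if (tol < radius (best)) {
      dyadic M1= upper_bound (y->approx (dyadic (1)));   // |d(xy)/dx| <= M1
      dyadic M2= upper_bound (x->approx (dyadic (1)));   // |d(xy)/dy| <= M2
      dyadic tx= ((tol * wx) / (wx + wy)) / M1;          // weighted share of x
      dyadic ty= ((tol * wy) / (wx + wy)) / M2;          // weighted share of y
      best= x->approx (tx) * y->approx (ty);             // certified product
    }
    return best;
  }
};

In the purely irregular case discussed below, M1 or M2 may vanish or be a gross overestimate, which is precisely where this simple strategy breaks down.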
Remark 5. In the regular case, the precision with respect to which we compute never exceeds the absolutely necessary precision by more than $\log_2 (w_1 + \cdots + w_n) + O(1)$ bits.
The approach of balanced error estimates is more delicate in the irregular case. It is instructive to first gain a good understanding of the case of multiplication; the other operations can probably be treated in a similar way.
So assume that we want to compute a product $x y$. Let us first consider the semi-regular case when we still have a good initial approximation $\tilde{x}$ for one of the arguments, say for $x$, but not for the other. Then we first compute an $\frac{\varepsilon}{2(|\tilde{x}| + 1)}$-interval for $y$. In case when this yields an approximation $\tilde{y}$ for $y$ with a good relative tolerance, then we may proceed as in the regular case. Otherwise, we obtain at least an upper bound for $|y|$ and we let this upper bound play the role of $|\tilde{y}|$.
This leaves us with the purely irregular case when we neither have strictly positive lower bounds for $|x|$ nor for $|y|$. In this case, we have the choice between computing a more precise approximation for $x$ or for $y$, but it is not clear a priori which choice is optimal from the complexity point of view. One solution to this problem may be to continuously improve the precisions of the approximations for both $x$ and $y$, while distributing the available computation time equally over $x$ and $y$. It is not clear yet to us how to estimate the complexity of such an algorithm. Another solution, which is easier to implement, is to use rough upper bounds for $|x|$ and $|y|$ instead of their actual values, while increasing the precision continuously. However, this strategy is certainly not optimal in some cases.
We have presented two new “adaptive error estimating techniques” for computing with effective real numbers. We believe these techniques to be optimal in virtually all situations which occur in practice. It remains interesting to study the theoretical optimality in more detail. This first requires the introduction of a suitable notion of optimality.
Assume that we have a library of effective real functions $f_1, \ldots, f_l$, some of which may have arity $0$ (i.e. constants). Then any computation using this library involves only a finite number $N$ of expressions $x_1, \ldots, x_N$ constructed using the $f_i$. These expressions actually correspond to a directed acyclic graph, since they may share common subexpressions.
Assume that the aim of the computation is to obtain an $\varepsilon_i$-approximation for each $x_i$ with $i \in I$, for some subset $I \subseteq \{1, \ldots, N\}$. A certified solution to this problem consists of assigning a dyadic interval $\mathbf{x}_j$ to each node $x_j$ of the dag, such that

The interval associated to each $x_i$ with $i \in I$ satisfies $x_i \in \mathbf{x}_i$ and $r(\mathbf{x}_i) \leq \varepsilon_i$.

For each subexpression $x_j = f(x_{j_1}, \ldots, x_{j_k})$ with $f$ in the library, the interval associated to $x_j$ is obtained by evaluating $f$ at the intervals associated to $x_{j_1}, \ldots, x_{j_k}$.
The certified solution is said to be optimal, if the time needed to compute all interval labels is minimal. Modulo rounding errors, finding such an optimal solution corresponds to determining optimal tolerances for the leaves of the dag (which is equivalent to determining the lengths of the intervals associated to the leaves).
Remark 6. In practice, the tolerances $\varepsilon_i$ may themselves depend on the way we compute things. Also, we may require the $\varepsilon_i$-approximations for the $x_i$ during the intermediate computations, and in an arbitrary order. A more elaborate model would therefore consider a sequence of problems of the above type, with growing dags, growing sets $I$ and decreasing tolerances $\varepsilon_i$.
Clearly, it is difficult to efficiently find optimal certified solutions. In many cases, it should nevertheless be possible to find almost optimal solutions for which the cost of the $\varepsilon_i$-approximations exceeds the optimal cost by at most a constant factor.
We have already pointed out that our strategies are non-optimal in general (see remarks 1 and 4, and the discussion on the irregular case in section 5). Nevertheless, in some special cases, we do approach optimality.
First of all, if we are computing with expression trees (and not dags), then the problem in remark 1 cannot occur and the precision loss in remark 4 is at most logarithmic in the size of the expression.
Secondly, assume that all approximation algorithms for the $f_i$ have a polynomial time complexity $T_i$, and consider a computation which essentially spends its time in high precision arithmetic. Then the precision loss in remark 4 only causes a slow-down by a constant factor, since $T_i(O(d)) = O(T_i(d))$ for polynomial $T_i$.
D. Richardson, How to recognize zero, JSC 24 (1997) 627–645.
J. van der Hoeven, Zero-testing, witness conjectures and differential diophantine approximation, Tech. Rep. 2001-62, Prépublications d'Orsay (2001).
J. Blanck, V. Brattka, P. Hertling (Eds.), Computability and complexity in analysis, Vol. 2064 of Lect. Notes in Comp. Sc., Springer, 2001.
J. Blanck, General purpose exact real arithmetic, Tech. Rep. CSR 21-200, Luleå University of Technology, Sweden, http://www.sm.luth.se/~jens/ (2002).
M. Grimmer, K. Petras, N. Revol, Multiple precision interval packages: Comparing different approaches, Tech. Rep. RR 2003-32, LIP, École Normale Supérieure de Lyon (2003).
J. van der Hoeven, Automatic asymptotics, Ph.D. thesis, École polytechnique, France (1997).
J. van der Hoeven, Relax, but don't be too lazy, JSC 34 (2002) 479–542.
J. van der Hoeven, GMPX, a C-extension library for gmp, http://www.math.u-psud.fr/~vdhoeven/, no longer maintained (1999).
J. van der Hoeven, Mmxlib, a C++ core library for Mathemagix, http://www.math.u-psud.fr/~vdhoeven/ (2003–2004).
N. Müller, iRRAM, exact arithmetic in C++, http://www.informatik.uni-trier.de/iRRAM/ (2000–2004).
N. Revol, MPFI, a multiple precision interval arithmetic library, http://perso.ens-lyon.fr/nathalie.revol/software.html (2001–2004).
T. Granlund, Others, GMP, the GNU multiple precision arithmetic library, http://www.swox.com/gmp (1991–2004).
G. Hanrot, V. Lefèvre, K. Ryde, P. Zimmermann, MPFR, a C library for multiple-precision floating-point computations with exact rounding, http://www.mpfr.org (2000–2004).
J. van der Hoeven, Lazy multiplication of formal power series, in: W. W. Küchlin (Ed.), Proc. ISSAC '97, Maui, Hawaii, 1997, pp. 17–20.