A real number $x$ is said to be effective if there exists an algorithm which, given a required tolerance $\varepsilon > 0$, returns a binary approximation $\tilde{x}$ for $x$ with $|\tilde{x} - x| \leq \varepsilon$. Effective real numbers are interesting in areas of numerical analysis where numerical instability is a major problem.
One key problem with effective real numbers is to perform intermediate computations at the smallest precision which is sufficient to guarantee an exact end-result. In this paper we first review two classical techniques to achieve this: a priori error estimates and interval analysis. We next present two new techniques: “relaxed evaluations” reduce the amount of re-evaluations at larger precisions and “balanced error estimates” automatically provide good tolerances for intermediate computations.
Keywords: effective real number, algorithm, interval analysis, error estimates.
A dyadic number is a number of the form $m 2^{e}$ with $m, e \in \mathbb{Z}$. We denote by $\mathbb{D} = \mathbb{Z} 2^{\mathbb{Z}}$ the set of dyadic numbers and $\mathbb{D}_{>} = \{\varepsilon \in \mathbb{D} : \varepsilon > 0\}$. Given $x \in \mathbb{R}$ and $\varepsilon \in \mathbb{D}_{>}$, an $\varepsilon$-approximation for $x$ is a number $\tilde{x} \in \mathbb{D}$ with $|\tilde{x} - x| \leq \varepsilon$. An approximation algorithm for $x$ is an algorithm which takes a tolerance $\varepsilon \in \mathbb{D}_{>}$ on input and which returns an $\varepsilon$-approximation for $x$. A real number $x$ is said to be effective if it admits an approximation algorithm. The aim of this paper is to review several recent techniques for computations with effective real numbers and to present a few new ones.
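For instance, for $x = 1/3$ and $\varepsilon = 2^{-10}$, the dyadic number $\tilde{x} = 341 \cdot 2^{-10}$ is an $\varepsilon$-approximation for $x$, since
$$\left| 341 \cdot 2^{-10} - \tfrac{1}{3} \right| = \left| \tfrac{1023}{3072} - \tfrac{1024}{3072} \right| = \tfrac{1}{3072} \leq 2^{-10}.$$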
Effective real numbers can be useful in areas of numerical analysis where numerical instability is a major problem, like singularity theory. In such areas, multiple precision arithmetic is often necessary and the required precisions for auxiliary computations may vary heavily. We hope that the development of an adequate computational theory for effective real numbers will allow us both to perform the error analysis automatically and to compute the precisions at which intermediate computations have to take place.
We recall that there exists no general zero-test for effective real numbers. Nevertheless, exact or heuristically reliable zero-tests do exist for interesting subfields which contain transcendental constants. However, this topic will not be studied in this paper and we refer to [1, 2] for some recent work on this matter.
In an object-oriented language like C++, effective real numbers can be represented directly by their approximation algorithms, as instances of an abstract class with a virtual approximation method. We will recall this classical technique of a priori error estimates in section 2.
In some cases, a priori error estimates are quite pessimistic. An alternative technique for computing with effective real numbers is interval arithmetic. In this case, we approximate an effective real number $x$ by an interval $\boldsymbol{x}$ which contains $x$. Then the evaluation of an expression $y = f(x_1, \ldots, x_n)$ comes down to the determination of an interval $\boldsymbol{y}$ with
$$y = f(x_1, \ldots, x_n) \in \boldsymbol{y}.$$
For continuous functions $f$, when starting with very precise interval approximations of the arguments (i.e. the radii of the $\boldsymbol{x}_i$ are very small), the obtained a posteriori error estimate for $y$ (the radius of $\boldsymbol{y}$) will also become as small as desired. In section 3, this technique of a posteriori error estimates will be reviewed in more detail, as well as an efficient representation for high-precision intervals.
Unfortunately, in some cases, both a priori and a posteriori error estimates are quite pessimistic. In sections 4 and 5, we will therefore present two new techniques for the computation of “adaptive error estimates”. These techniques combine the advantages of a priori and a posteriori error estimates, while eliminating their major disadvantages. Moreover, these new techniques remain reasonably easy to implement in a general purpose system for computations with effective real numbers. Under certain restrictions, the new techniques are also close to being optimal. This point will be discussed in section 6.
The approaches presented in sections 2 and 3 have been proposed, often independently, by several authors [3, 4, 5], and we notice the existence of several similarities with the theory of effective power series [6, 7]. In the past, we experimentally implemented the approaches of sections 2 and 3 in the case of power series (not officially distributed) and real numbers [8], respectively. We are currently working on a more robust C++ library based on the ideas in this paper [9]. Currently, this library contains the basic arithmetic operations. Some other implementations with similar objectives are currently known to the author [10, 11]. All these implementations are free software and they are mainly based on the GMP [12] and MPFR [13] libraries.
A natural way to represent an effective real number $x$ is by its approximation algorithm. Conceptually speaking, this means that we view $x$ as a black box which can be asked for approximations up to any desired precision.
In an object-oriented language like C++, this can be implemented using an abstract base class real_rep with a virtual method for the approximation algorithm. Effective real numbers will then be pointers to real_rep. For instance,
class real_rep {
public:
virtual dyadic approx (const dyadic& tol) = 0;
};
typedef real_rep* real;
Here dyadic stands for the class of dyadic numbers. In practice, these may be taken to be arbitrary precision floating point numbers. For simplicity, we also do not care about memory management. In a real implementation, one would need a mechanism for reference counting or conservative garbage collection.
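For instance, a leaf of this representation could wrap an exact dyadic constant. The following minimal sketch is our own (the class const_real_rep does not occur in the text); for brevity, double stands in for an arbitrary precision dyadic type, and the real_rep interface is repeated so that the fragment compiles on its own.

typedef double dyadic;         // stand-in for an arbitrary precision dyadic type

class real_rep {
public:
  virtual dyadic approx (const dyadic& tol) = 0;
  virtual ~real_rep () {}      // virtual destructor, added for safety
};
typedef real_rep* real;

// hypothetical leaf class: an exact dyadic constant
class const_real_rep: public real_rep {
  dyadic c;
public:
  const_real_rep (const dyadic& c2): c (c2) {}
  dyadic approx (const dyadic& tol) {
    return c;                  // an exact value is an eps-approximation for every eps > 0
  }
};

inline real constant (const dyadic& c) {
  return new const_real_rep (c); }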
Now assume that we want to evaluate $y = f(x_1, \ldots, x_n)$ for a given tolerance $\varepsilon$, where $x_1, \ldots, x_n$ are effective real numbers and $f$ is an operation like addition, multiplication or exponentiation. In order to make $y$ effective, we need to construct an approximation algorithm for $y$ as a function of $\varepsilon$. This involves both the dyadic approximation of the evaluation of $f$ at dyadic approximations of the $x_i$ and the determination of tolerances $\varepsilon_1, \ldots, \varepsilon_n$ for the dyadic approximations of the $x_i$. More precisely, the $\varepsilon_i$ should be such that for any $\tilde{x}_1, \ldots, \tilde{x}_n \in \mathbb{D}$ with $|\tilde{x}_i - x_i| \leq \varepsilon_i$ for all $i$, we have
$$|\tilde{f}(\tilde{x}_1, \ldots, \tilde{x}_n) - f(x_1, \ldots, x_n)| \leq \varepsilon,$$
where $\tilde{f}$ stands for a dyadic approximation algorithm for $f$ which depends on the tolerance $\varepsilon$. For instance, in the case when $f$ is the addition, we may use exact arithmetic on $\mathbb{D}$ (so that $\tilde{f}(\tilde{x}_1, \tilde{x}_2) = \tilde{x}_1 + \tilde{x}_2$ for all tolerances) and take $\varepsilon_1 = \varepsilon_2 = \varepsilon / 2$. This yields the following class for representing sums of real numbers:
class add_real_rep: public real_rep {
real x, y;
public:
add_real_rep (const real& x2, const real& y2):
x (x2), y (y2) {}
dyadic approx (const dyadic& tol) {
return x->approx (tol >> 1) + y->approx (tol >> 1); }
};
The addition can now be implemented as follows:
inline real operator + (const real& x, const real& y) {
return new add_real_rep (x, y); }
Notice that, in a sense, we have really represented the sum of $x$ and $y$ by the expression $x + y$ (more generally, such expressions are dags (directed acyclic graphs)). Nevertheless, the representation using an abstract class real_rep provides additional flexibility. For instance, we may attach additional information to the class real_rep, like the best currently known approximation for the number (thereby avoiding unnecessary recomputations). In practice, it is also good to provide an additional abstract method for computing a rough upper bound for the number. This gives a fine-grained control over potential cancellations.
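A possible shape for such an extended interface is sketched below; the member and method names (best, best_tol, exponent_bound) are our own and merely illustrate the idea, with double again standing in for a genuine dyadic type. An implementation of approx would first check whether best_tol is already at most the requested tolerance and, if so, return best without any further work.

typedef double dyadic;         // stand-in for an arbitrary precision dyadic type

class real_rep {
protected:
  dyadic best;                 // best currently known dyadic approximation
  dyadic best_tol;             // tolerance which has been certified for best
public:
  virtual dyadic approx (const dyadic& tol) = 0;
  // rough upper bound: an integer e with |x| <= 2^e, which helps to detect
  // potential cancellations before computing at high precision
  virtual int exponent_bound () = 0;
  virtual ~real_rep () {}
};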
The above approach heavily relies on the computation of a priori error estimates (i.e. the computation of the tolerances $\varepsilon_i$). If no additional techniques are used, then this leads to the following disadvantages:
P1. We do not take advantage of the fact that the numeric evaluation of $f$ may lead to an approximation of $y$ which is far better than the required tolerance $\varepsilon$. Indeed, multiple precision computations are usually done with a precision which is a multiple of the number $w$ of bits in a machine word. “On average”, we therefore gain something like $w/2$ bits of precision. In section 4, we will show that it may actually be profitable to systematically compute more than necessary.
P2. The error estimates may be pessimistic, due to badly balanced expressions. For instance, consider the $\varepsilon$-approximation of a sum $x_1 + (x_2 + (\cdots + x_n))$, which corresponds to a right-leaning expression tree. Then the above technique would lead to the computation of an $\varepsilon / 2^{i}$-approximation of $x_i$ for $i < n$ and an $\varepsilon / 2^{n-1}$-approximation of $x_n$. If $n$ is large, then $\varepsilon / 2^{n-1}$ is unnecessarily small, since a mere $\varepsilon / n$-approximation for each $x_i$ would do. In section 5 we will consider a general technique for computing “balanced error estimates”. Badly balanced expressions naturally occur when evaluating polynomials using Horner's rule.
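To make the loss concrete, take for instance $n = 32$ and $\varepsilon = 2^{-10}$ (illustrative numbers): the recursive halving requests a $2^{-41}$-approximation of $x_{32}$, whereas the tolerance $\varepsilon / 32 = 2^{-15}$ for every $x_i$ already guarantees
$$\left| (\tilde{x}_1 + \cdots + \tilde{x}_{32}) - (x_1 + \cdots + x_{32}) \right| \leq 32 \cdot 2^{-15} = 2^{-10} = \varepsilon.$$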
An alternative technique for computing with effective real numbers is interval arithmetic. The idea is to systematically compute interval approximations instead of floating point approximations. These intervals must be such that the real numbers we are interested in are certified to lie in their respective interval approximations.
More precisely, given $x \in \mathbb{R}$ and $\varepsilon \in \mathbb{D}_{>}$, an $\varepsilon$-interval for $x$ is a closed interval $\boldsymbol{x} = [\underline{x}, \overline{x}]$ with dyadic endpoints, such that $x \in \boldsymbol{x}$ and the radius $r(\boldsymbol{x}) = (\overline{x} - \underline{x}) / 2$ satisfies $r(\boldsymbol{x}) \leq \varepsilon$. Concretely speaking, we may represent such intervals by their endpoints $\underline{x}$ and $\overline{x}$. Alternatively, if the precisions of $\underline{x}$ and $\overline{x}$ are large, then it may be more efficient to represent the interval by its center $(\underline{x} + \overline{x}) / 2$ and its radius. Indeed, the exact endpoints of the interval are not that important. Hence, modulo a slight increase of the radius, we may always assume that the radius can be stored in a “single precision dyadic number” $m 2^{e}$ with $|m| < 2^{w}$, where $w$ is the number of bits in a machine word. This trick allows us to reduce the number of multiple precision computations by a factor of two.
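A minimal sketch of such a center-radius representation, together with interval addition, is shown below; this is our own simplified illustration (with double standing in for a multiple precision center), not the data structure of an actual library.

#include <cmath>

struct interval {
  double center;   // a multiple precision dyadic number in a real implementation
  double radius;   // conceptually a single precision dyadic number
};

// Addition: one (multiple precision) operation on the centers, plus cheap
// single word arithmetic on the radii. With the endpoint representation we
// would instead need two multiple precision additions; this is the factor
// of two mentioned in the text.
inline interval add (const interval& a, const interval& b) {
  interval c;
  c.center = a.center + b.center;
  // enlarge the radius slightly in order to absorb the rounding error
  // made when adding the centers
  c.radius = a.radius + b.radius + std::ldexp (std::fabs (c.center), -52);
  return c;
}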
Now assume that we want to compute an $\varepsilon$-approximation for $y = f(x_1, \ldots, x_n)$, where $x_1, \ldots, x_n$ are effective real numbers and where $f$ is a continuous function. Assume also that we have reasonable initial interval approximations for the $x_i$. Then, starting with a low precision $p$, we first compute $2^{-p}$-intervals $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ for the $x_i$. We next evaluate $f$ using interval arithmetic. This yields an interval $\boldsymbol{y}$ with
$$f(x_1, \ldots, x_n) \in \boldsymbol{y}.$$
If $r(\boldsymbol{y}) \leq \varepsilon$, then the center of $\boldsymbol{y}$ is an $\varepsilon$-approximation for $y$. Otherwise, we increase the precision $p$ and repeat the computations. Under relatively mild assumptions on the way we evaluate $f$, this procedure will eventually stop, since $f$ is continuous.
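The following self-contained toy program (ours) illustrates the precision-increasing loop; it only mimics arbitrary precision by rounding doubles to p significant bits, but the control structure (compute intervals at precision p, evaluate, check the radius a posteriori, then double p) is the one described above.

#include <cmath>
#include <cstdio>

struct interval { double lo, hi; };

double round_to (double v, int p) {         // keep only p leading bits of v
  int e;
  std::frexp (v, &e);
  double unit = std::ldexp (1.0, e - p);
  return std::floor (v / unit) * unit;
}

// p-bit interval enclosure of sqrt(2), playing the role of the
// approximation algorithm of an effective real number
interval approx_sqrt2 (int p) {
  int e;
  double lo = round_to (std::sqrt (2.0), p);
  std::frexp (lo, &e);
  return { lo, lo + std::ldexp (1.0, e - p) };
}

interval mul (interval a, interval b) {     // assumes a, b > 0
  return { a.lo * b.lo, a.hi * b.hi };
}

int main () {
  double eps = 1e-6;                        // requested tolerance for y = x * x
  for (int p = 4; ; p *= 2) {               // increase the precision
    interval x = approx_sqrt2 (p);
    interval y = mul (x, x);                // evaluate f by interval arithmetic
    double r = (y.hi - y.lo) / 2;
    std::printf ("p = %3d   radius of y = %g\n", p, r);
    if (r <= eps) {                         // a posteriori check
      std::printf ("result: %.10f\n", (y.lo + y.hi) / 2);
      break;
    }
  }
  return 0;
}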
Although this technique of a posteriori error estimates does solve the problems P1 and P2 raised in the previous section, it also induces some new problems. Most importantly, we have lost the fine-grained control over the precisions of intermediate computations during the evaluation of $f$. Indeed, we only have control over the overall starting precision $p$. This disadvantage is reflected in two ways:
P3. It is not clear how to increase $p$. Ideally speaking, $p$ should be increased in such a way that the computation time of $\boldsymbol{y}$ is doubled at each iteration. In that case (see section 4), the overall computation time is bounded by a constant times the computation time w.r.t. the least precision which leads to an $\varepsilon$-approximation for $y$. Now the problem is that this “overall computation time” can be estimated well by hand for elementary operations like $+$, $\times$, $\exp$, etc., but not necessarily for more complicated functions.
P4. Consider the case when, somewhere during the evaluation of $f$, we need to compute the sum of a very large number $u$ and a very small number $v$. Assume moreover that $u$ can be approximated very fast, but that the approximation of $v$ requires a lot of time. Since $v$ is very small w.r.t. $u$, it will then be possible to skip the computation of $v$, by setting $\tilde{v} = 0$, unless the required precision is very large. However, the above technique of a posteriori error estimates does not allow for this optimization.
Relaxed evaluation can be used in order to “combine the good ideas” of sections 2 and 3. The idea is to endow real_rep with an extra field interval best which corresponds to the best currently known interval approximation of the real number. Moreover, when requesting a better approximation, we will actually compute a much better approximation, so as to avoid expensive recomputations when we repeatedly ask for slightly better approximations. This anticipating strategy was first introduced in the case of power series, where it leads to important algorithmic gains [14, 7]. In our context, the proposed method is quite different though, because of the problem with carries.
More precisely, assume that we have an $n$-ary operation $f$, such that the computation of a $p$-digit approximation of $f$ at dyadic numbers with $O(p)$ digits has time complexity $T(p)$. The complexity $T(p)$ for an elementary operation is usually a well-understood, regular function (in general, $T(p)/p$ is increasing, etc.). For instance, we have $T(p) = O(p)$ for addition and $T(p) = O(p^{1 + \delta})$ for multiplication, where $0 \leq \delta \leq 1$ and $\delta$ decreases slowly from $1$ to $0$ as $p$ grows (naive, Karatsuba and F.F.T. multiplication).
Now assume that we have a $p$-digit approximation of $f(x_1, \ldots, x_n)$ and that we request a slightly better approximation. Then we let $p'$ be minimal such that $T(p') \geq 2\, T(p)$ and we replace our $p$-digit approximation by a $p'$-digit approximation. This strategy has the property that the successive evaluation of $f$ at precisions $p_1 \leq \cdots \leq p_k$ digits requires a time $T(q_1) + \cdots + T(q_l)$, where the precisions $q_1 < \cdots < q_l$ which are actually used satisfy $T(q_{i+1}) \geq 2\, T(q_i)$ for $i < l$ and (up to the granularity of the precisions) $T(q_l) \leq 2\, T(p_k)$. Consequently,
$$T(q_1) + \cdots + T(q_l) \leq 2\, T(q_l) \leq 4\, T(p_k).$$
More generally, the evaluation of $f$ at several different precisions requires at most four times as much time as a single evaluation of $f$ at the highest of these precisions.
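For instance, assuming for concreteness that $T(p) = p$ (as for addition) and that the first evaluation is performed at $64$ digits, successive requests at precisions up to $p_k = 65$ lead to actual evaluations at $64$ and $128$ digits only, for a total cost of
$$T(64) + T(128) = 192 \leq 4\, T(65) = 260.$$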
The combination of a priori and a posteriori error estimates in the relaxed strategy clearly solves problems P1 and P4. Furthermore, at the level of the class real_rep, we have a fine-grained control over how to increase the computational precision. Moreover, we have shown how to perform this increase in an efficient way, as a function of the complexity $T$. This will avoid expensive recomputations when the same subexpression occurs many times in a dag. In other words, the relaxed technique also solves problem P3.
Let us illustrate the relaxed strategy on the concrete example of the computation of the sine function. We assume that we have implemented a suitable class interval for certified arbitrary precision computations with intervals. First of all, real_rep now becomes:
class real_rep {
protected:
interval best;
public:
inline real_rep (): best (interval::fuzzy) {}
virtual interval approx (const dyadic& tol) = 0;
};
Here interval::fuzzy stands for the interval $(-\infty, +\infty)$. Sines of numbers are represented by instances of the following class:
class sin_real_rep: public real_rep {
real x;
public:
inline sin_real_rep (const real& x2): x (x2) {
best= interval (-1, 1); }
interval approx (const dyadic& tol);
};
The approximation algorithm is given by
interval sin_real_rep::approx (const dyadic& tol) {
if (tol < radius (best)) {
interval xa= x->approx (tol * (1 - DELTA));
int required_prec= 2 - expo (tol);
int proposed_prec= next_prec (-expo (radius (best)));
xa= truncate (xa, max (required_prec, proposed_prec));
best= sin (xa);
}
return best;
}
This algorithm requires some explanations. First of all, DELTA stands for a small number like $2^{-w}$, where $w$ is the machine word size. We use DELTA for countering the effect of rounding errors in the subsequent computations. Next, expo stands for the function which gives the exponent of a dyadic number; we have $2^{\operatorname{expo}(x) - 1} \leq |x| < 2^{\operatorname{expo}(x)}$ for all $x \neq 0$. The function next_prec (see below), applied to a precision $p$, computes a precision $p'$ such that $T(p') \approx 2\, T(p)$, where $T$ stands for the complexity of the evaluation of the sine function. Now we recall that xa may contain an approximation for x which is much better than the approximation we need (this is typically the case if a more precise value for the argument is needed in another part of the global computation). We therefore truncate the precision of xa at the precision we need, before computing the next approximation for the sine of x. The function next_prec may for instance be implemented as follows:
int next_prec (int prec) {
if (prec <= (16 * WORD_SIZE)) return 1.41 * prec + WORD_SIZE;
else if (prec <= 256 * WORD_SIZE) return 1.55 * prec;
else if (prec <= 4096 * WORD_SIZE) return 1.71 * prec;
else return 2 * prec;
}
The different thresholds correspond to the precisions where we use naive multiplication, Karatsuba multiplication and two ranges of F.F.T. multiplication. We finally have the following algorithm for computing sines:
inline real sin (const real& x) {
return new sin_real_rep (x); }
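As a plausibility check of these growth factors, one may model the cost of multiplication in the three regimes by $T(p) \approx p^{2}$ (naive), $T(p) \approx p^{\log_2 3}$ (Karatsuba) and $T(p) \approx p \log p$ (F.F.T.), and verify that each factor multiplies the modelled cost by roughly two. The little program below is our own and only serves this illustrative purpose; all constant factors are ignored.

#include <cmath>
#include <cstdio>

int main () {
  // naive multiplication: T(p) ~ p^2, precision factor 1.41
  std::printf ("naive:     %.2f\n", std::pow (1.41, 2.0));
  // Karatsuba multiplication: T(p) ~ p^(log2 3), precision factor 1.55
  std::printf ("karatsuba: %.2f\n", std::pow (1.55, std::log2 (3.0)));
  // F.F.T. range: T(p) ~ p log p, precision factors 1.71 and 2,
  // evaluated at two sample precisions
  double p1 = 100000.0, p2 = 1000000.0;
  std::printf ("fft:       %.2f  %.2f\n",
               1.71 * std::log2 (1.71 * p1) / std::log2 (p1),
               2.0  * std::log2 (2.0  * p2) / std::log2 (p2));
  return 0;
}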
Remark 1
When the above problem occurs repeatedly, we thereby lose the principal advantage of the relaxed approach. Nevertheless, it should be noticed that the slight redundancy in the precision can only occur if this higher precision was requested by another computation than the present one (except when the operation has an exponential computational complexity). In fact, the author currently does not know of any natural example of a computation where the above problem occurs systematically (so as to make the computation time increase by more than a constant factor).
One remedy is to delay the actual evaluation best=sin(xa) until the very end and first gradually increase precisions until we are sure that the total re-evaluation cost is about twice as large as the previous evaluation cost (considering only those nodes which are re-evaluated). However, this is only possible if we do not need the actual result of a re-evaluation during the “prospective phase” in which we increase the precisions.
Yet another approach would be to systematically use the real number analogue of relaxed power series evaluations [7]. However, this requires the re-implementation of all basic operations on floating point numbers in a relaxed way. It also involves an additional overhead.
One of the main problems with a priori error estimates that we have mentioned in section 2 are badly balanced expressions, which lead to overly pessimistic bounds. In this section we introduce the general technique of “balanced error estimates”, which solves this problem as well as possible in a general purpose system. The idea behind balanced error estimates is to “distribute the required tolerance over the subexpressions in a way which is proportional to their weights”. Here leaves have weight $1$ and the weight of an expression $f(e_1, \ldots, e_n)$ is given by the sum of the weights of the subexpressions $e_1, \ldots, e_n$.
Remark 2
Let us first illustrate the strategy of balanced error estimates in the case of addition. So assume that we want to compute an $\varepsilon$-interval for a sum $x + y$, where $x$ has weight $p$ as an expression and $y$ has weight $q$. Then we will compute a $\frac{p}{p+q} \varepsilon$-interval for $x$ and a $\frac{q}{p+q} \varepsilon$-interval for $y$. For example, in the case of a right-leaning sum $x_1 + (x_2 + (\cdots + x_n))$, each leaf $x_i$ then receives the tolerance $\varepsilon / n$, instead of the exponentially small tolerances from section 2.
Subtraction is treated in the same way as addition.
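The following small self-contained program (ours, purely illustrative) prints both tolerance distributions for a right-leaning sum of $n = 8$ terms, so that the effect of the weight-proportional splitting can be compared directly with the halving scheme of section 2.

#include <cstdio>
#include <vector>

int main () {
  int n = 8;
  double eps = 1.0;
  std::vector<double> naive (n), balanced (n);
  // naive scheme: every + node passes half of its tolerance to each operand
  double t = eps;
  for (int i = 0; i < n - 1; i++) {
    naive[i] = t / 2;                // left leaf x_{i+1} of the i-th + node
    t /= 2;                          // tolerance passed to the right subtree
  }
  naive[n - 1] = t;                  // the last leaf gets the same as x_{n-1}
  // balanced scheme: the tolerance is split in proportion to the weights
  t = eps;
  for (int i = 0; i < n - 1; i++) {
    int weight_right = n - 1 - i;    // the right subtree contains n-1-i leaves
    balanced[i] = t / (1 + weight_right);
    t -= balanced[i];                // remaining tolerance for the right subtree
  }
  balanced[n - 1] = t;
  for (int i = 0; i < n; i++)
    std::printf ("x%-2d  naive %-12g balanced %g\n", i + 1, naive[i], balanced[i]);
  return 0;
}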
Remark 3
Remark 4
Then our algorithm requires the computation of a -approximation of each , which takes a time . The optimal algorithm would only require a -approximation of each , which again leads to a global complexity of . This example should be compared to the computation of a -approximation of
Our balanced algorithm only requires a time , which is optimal in this case, whereas a non-balanced algorithm would yield a quadratic complexity .
In the case of multiplication, the above additive distribution of tolerances can no longer be used and one has to resort to “relative tolerances”. Here a relative tolerance $\eta$ for $x$ corresponds to an absolute tolerance of $\eta |x|$. Now let $x$ and $y$ be effective real numbers of weights $p$ and $q$. Assume for simplicity that we are in the regular case when we have good starting approximations for $x$ and $y$, say of relative tolerances $2^{-w}$, where $w$ is the machine word size. Then, in order to compute an $\varepsilon$-interval for $x y$, we first distribute the corresponding relative tolerance for $x y$ over $x$ and $y$ in proportion to their weights, and then compute an interval for $x$ and an interval for $y$ with these relative tolerances. The product of these approximations yields an $\varepsilon$-interval for $x y$.
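The reason why relative tolerances behave additively under multiplication is the following routine estimate, spelled out here for convenience: if $|\tilde{x} - x| \leq \eta_1 |x|$ and $|\tilde{y} - y| \leq \eta_2 |y|$, then
$$|\tilde{x} \tilde{y} - x y| \leq |x|\, |\tilde{y} - y| + |\tilde{y}|\, |\tilde{x} - x| \leq (\eta_1 + \eta_2 + \eta_1 \eta_2)\, |x y|,$$
so that, up to the second order term $\eta_1 \eta_2$, a relative tolerance for a product may be split additively over its factors.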
More generally, assume that $x_1, \ldots, x_n$ are effective real numbers of weights $w_1, \ldots, w_n$ and that we want to compute an $\varepsilon$-interval for $f(x_1, \ldots, x_n)$, where $f$ is a differentiable $n$-ary operation for which we are given an arbitrary precision evaluation algorithm at dyadic intervals. Before balancing tolerances as above, we first have to determine the quantities which have to be subdivided (i.e. $\varepsilon$ itself in the case of addition, and $\varepsilon / |y|$ resp. $\varepsilon / |x|$ (really bounds of this kind, as we will see below) in the case of multiplication).
We define the optimal tolerances for $x_1, \ldots, x_n$ to be numbers $\varepsilon_1, \ldots, \varepsilon_n$, as large as possible (so that the $x_i$ need to be approximated at the smallest possible precisions), such that for any intervals $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ with
$$x_i \in \boldsymbol{x}_i, \qquad r(\boldsymbol{x}_i) \leq \varepsilon_i \qquad (i = 1, \ldots, n), \tag{1}$$
we have $r(f(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n)) \leq \varepsilon$. Let us assume for simplicity that we are in the regular case when there exist (computable) bounds
$$\left| \frac{\partial f}{\partial z_i} (z_1, \ldots, z_n) \right| \leq M_i \qquad (i = 1, \ldots, n),$$
which are valid for all $z_1, \ldots, z_n$ in the intervals under consideration. Then, by the mean value theorem and neglecting rounding errors in the interval evaluation of $f$, (1) implies that
$$r(f(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n)) \leq M_1 \varepsilon_1 + \cdots + M_n \varepsilon_n,$$
so that the condition $M_1 \varepsilon_1 + \cdots + M_n \varepsilon_n \leq \varepsilon$ guarantees $r(f(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n)) \leq \varepsilon$. In particular, we may take $\varepsilon_i = \varepsilon / (n M_i)$ for $i = 1, \ldots, n$. For instance, in the case of addition, we may take $M_1 = M_2 = 1$ and we obtain $\varepsilon_1 = \varepsilon_2 = \varepsilon / 2$. In the case of multiplication, we may take $M_1 = \overline{y}$ and $M_2 = \overline{x}$, where $\overline{x} \geq |x|$ and $\overline{y} \geq |y|$ are upper bounds deduced from the known approximations, and we obtain $\varepsilon_1 = \varepsilon / (2 \overline{y})$ and $\varepsilon_2 = \varepsilon / (2 \overline{x})$.
Now, assuming that we have sharp lower bounds for the optimal tolerances $\varepsilon_i$ (like $\varepsilon / (n M_i)$ above), we compute a $\frac{w_i}{w_1 + \cdots + w_n} \frac{\varepsilon}{M_i}$-interval for each $x_i$; this is the weight-balanced version of these bounds. Then we obtain the requested $\varepsilon$-interval for $f(x_1, \ldots, x_n)$ by evaluating $f$ at these intervals at a sufficient precision.
Remark 5
The approach of balanced error estimates is more delicate in the irregular case. It is instructive to first gain a good understanding of the case of multiplication; the other operations can probably be treated in a similar way.
So assume that we want to compute a product $x y$. Let us first consider the semi-regular case when we still have a good initial approximation for one of the arguments, say $x$, but not for the other. Then we first compute a $\frac{q}{p+q} \frac{\varepsilon}{\overline{x}}$-interval for $y$, where $\overline{x}$ is an upper bound for $|x|$ deduced from its good initial approximation. In the case when this yields an approximation for $y$ with a good relative tolerance, we may proceed as in the regular case. Otherwise, we obtain at least an upper bound for $|y|$ and we let this upper bound play the role of $\overline{y}$ in the error estimates.
This leaves us with the purely irregular case when we have strictly positive lower bounds neither for $|x|$ nor for $|y|$. In this case, we have the choice between computing a more precise approximation for $x$ or for $y$, but it is not clear a priori which choice is optimal from the complexity point of view. One solution to this problem may be to continuously improve the precisions of the approximations for both $x$ and $y$, while distributing the available computation time equally over $x$ and $y$. It is not clear yet to us how to estimate the complexity of such an algorithm. Another solution, which is easier to implement, is to use rough upper bounds for $|x|$ and $|y|$ instead of sharp ones, while increasing the precision continuously. However, this strategy is certainly not optimal in some cases.
We have presented two new “adaptive error estimating techniques” for computing with effective real numbers. We believe these techniques to be optimal in virtually all situations which occur in practice. It remains interesting to study the theoretical optimality in more detail. This first requires the introduction of a suitable notion of optimality.
Assume that we have a library of effective real functions $f_1, \ldots, f_k$, some of which may have arity $0$ (i.e. effective real constants). Then any computation using this library involves only a finite number of expressions constructed using the $f_i$. These expressions actually correspond to a directed acyclic graph (dag), since they may share common subexpressions.
Assume that the aim of the computation is to obtain an $\varepsilon$-approximation for each expression $e$ with $e \in E$, for some subset $E$ of the expressions in the dag. A certified solution to this problem consists of assigning a dyadic interval to each node of the dag, such that
The interval $\boldsymbol{e}$ associated to each $e$ with $e \in E$ satisfies $r(\boldsymbol{e}) \leq \varepsilon$.
For each subexpression $e = f_i(e_1, \ldots, e_{n_i})$ with $n_i \geq 1$, the interval associated to $e$ is obtained by evaluating $f_i$ at the intervals associated to $e_1, \ldots, e_{n_i}$.
The certified solution is said to be optimal, if the time needed to compute all interval labels is minimal. Modulo rounding errors, finding such an optimal solution corresponds to determining optimal tolerances for the leaves of the dag (which is equivalent to determining the lengths of the intervals associated to the leaves).
Remark 6
Clearly, it is difficult to efficiently find optimal certified solutions. In many cases, it should nevertheless be possible to find almost optimal solutions, for which the cost of the $\varepsilon$-approximations exceeds the optimal cost by at most a constant factor.
We have already pointed out that our strategies are non-optimal in general (see remarks 1 and 4, and the discussion on the irregular case in section 5). Nevertheless, in some special cases, we do approach optimality.
First of all, if we are computing with expression trees (and not dags), then the problem in remark 1 cannot occur and the precision loss in remark 4 is at most logarithmic in the size of the expression.
Secondly, assume that all approximation algorithms for the $f_i$ have a polynomial time complexity $T(p) = O(p^{c})$ for some constant $c$, and consider a computation which essentially spends its time in high precision arithmetic. Then the precision loss in remark 4 only causes a slow-down by a constant factor, since $T(p + O(\log p)) = O(T(p))$.
D. Richardson, How to recognize zero, JSC 24 (1997) 627–645.
J. van der Hoeven, Zero-testing, witness conjectures and differential diophantine approximation, Tech. Rep. 2001-62, Prépublications d'Orsay (2001).
J. Blanck, V. Brattka, P. Hertling (Eds.), Computability and complexity in analysis, Vol. 2064 of Lect. Notes in Comp. Sc., Springer, 2001.
J. Blanck, General purpose exact real arithmetic, Tech. Rep. CSR 21-200, Luleå University of Technology, Sweden, http://www.sm.luth.se/~jens/ (2002).
M. Grimmer, K. Petras, N. Revol, Multiple precision interval packages: Comparing different approaches, Tech. Rep. RR 2003-32, LIP, École Normale Supérieure de Lyon (2003).
J. van der Hoeven, Automatic asymptotics, Ph.D. thesis, École polytechnique, France (1997).
J. van der Hoeven, Relax, but don't be too lazy, JSC 34 (2002) 479–542.
J. van der Hoeven, GMPX, a C-extension library for gmp, http://www.math.u-psud.fr/~vdhoeven/, no longer maintained (1999).
J. van der Hoeven, Mmxlib, a C++ core library for Mathemagix, http://www.math.u-psud.fr/~vdhoeven/ (2003–2004).
N. Müller, iRRAM, exact arithmetic in C++, http://www.informatik.uni-trier.de/iRRAM/ (2000–2004).
N. Revol, MPFI, a multiple precision interval arithmetic library, http://perso.ens-lyon.fr/nathalie.revol/software.html (2001–2004).
T. Granlund et al., GMP, the GNU multiple precision arithmetic library, http://www.swox.com/gmp (1991–2004).
G. Hanrot, V. Lefèvre, K. Ryde, P. Zimmermann, MPFR, a C library for multiple-precision floating-point computations with exact rounding, http://www.mpfr.org (2000–2004).
J. van der Hoeven, Lazy multiplication of formal power series, in: W. W. Küchlin (Ed.), Proc. ISSAC '97, Maui, Hawaii, 1997, pp. 17–20.