This work has been supported by the ANR-09-JCJC-0098-01 grant.
The technique of relaxed power series expansion provides an efficient way to solve so-called recursive equations of the form $f = \Phi(f)$, where the unknown $f$ is a vector of power series, and where the solution can be obtained as the limit of the sequence $0, \Phi(0), \Phi^2(0), \ldots$. With respect to other techniques, such as Newton's method, two major advantages are its generality and the fact that it takes advantage of possible sparseness of $\Phi$. In this paper, we consider more general implicit equations of the form $\Phi(f) = 0$. Under mild assumptions on such an equation, we will show that it can be rewritten as a recursive equation. If we are actually computing with analytic functions, then recursive equations also provide a systematic device for the computation of verified error bounds. We will show how to apply our results in this context.
Let $\mathbb{K}$ be an effective field of constants of characteristic zero. This means that elements in $\mathbb{K}$ can be encoded by data structures on a computer and that we have algorithms for performing the field operations of $\mathbb{K}$.
Let $f$ be a column vector of indeterminate series in $\mathbb{K}[[z]]^r$. We may also consider $f$ as a power series $f = f_0 + f_1 z + f_2 z^2 + \cdots$ with vector coefficients $f_n \in \mathbb{K}^r$. Let $\Phi$ be a column vector of expressions built up from $f$, $z$ and constants in $\mathbb{K}$ using ring operations, differentiation and integration (with constant term zero). Finally, let $c_0, \ldots, c_{l-1} \in \mathbb{K}^r$ be a finite number of initial conditions. Assume that the system
$$\Phi(f) = 0, \qquad f_0 = c_0,\ \ldots,\ f_{l-1} = c_{l-1} \tag{1}$$
admits a unique solution $f \in \mathbb{K}[[z]]^r$. In this paper, we are interested in the efficient computation of this solution up to a given order $n$.
In the most favourable case, the equation is of the form
$$f = \Psi(f), \tag{2}$$
where the coefficient $\Psi(f)_n$ of $z^n$ in $\Psi(f)$ only depends on earlier coefficients $f_0, \ldots, f_{n-1}$ of $f$, for each $n$. In that case, (2) actually provides us with a recurrence relation $f_n = \Psi(f)_n$ for the computation of the solution. Using the technique of relaxed power series expansions [vdH02, vdH07a], which will briefly be recalled in section 2.4, it is then possible to compute the expansion up till order $n$ in time
$$m_\Psi\, \mathsf{R}(n) + O(s_\Psi\, n), \tag{3}$$
where $m_\Psi$ is the number of multiplications occurring in $\Psi$, $s_\Psi$ is the total size of $\Psi$ as an expression, and $\mathsf{R}(n)$ denotes the complexity of relaxed multiplication of two power series up till order $n$. Here we assume that $\Psi$ is represented by a directed acyclic graph, with possible common subexpressions. For large $n$, we have $\mathsf{R}(n) = O(\mathsf{M}(n) \log n)$, where $\mathsf{M}(n)$ denotes the complexity [CT65, SS71, CK91] of multiplying two polynomials of degrees $< n$. If $\mathbb{K}$ admits sufficiently many $2^p$-th roots of unity, then we even have $\mathsf{M}(n) = O(n \log n)$ and $\mathsf{R}(n) = O(n \log n\, \mathrm{e}^{2 \sqrt{\log 2\, \log \log n}})$. For moderate $n$, when polynomial multiplication is done naively or using Karatsuba's method, relaxed multiplication is as efficient as the truncated multiplication of polynomials at order $n$.
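As a toy illustration of such a recurrence (our own example, not taken from the original text), consider the recursive equation $f = 1 + z f^2$, whose solution has the Catalan numbers as coefficients. The coefficient of $z^n$ in the right-hand side only involves $f_0, \ldots, f_{n-1}$, so a memoized coefficient stream may call itself with a delay of one:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):
    """n-th coefficient of the unique power series solution of f = 1 + z*f^2."""
    if n == 0:
        return 1
    # (z*f^2)_n = (f^2)_{n-1} = sum_{k=0}^{n-1} f_k f_{n-1-k}
    return sum(f(k) * f(n - 1 - k) for k in range(n))

assert [f(n) for n in range(6)] == [1, 1, 2, 5, 14, 42]  # Catalan numbers
```

Each coefficient only triggers the computation of earlier ones, which is exactly what makes the equation recursive.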
One particularly important example of an equation of the above type is the integration of a dynamical system
$$f' = \Psi(f), \qquad f_0 = c, \tag{4}$$
where $\Psi$ is algebraic (i.e. $\Psi$ involves no differentiation or integration). In that case, given the solution $\tilde f$ up till order $n$, we may consider the linearized system
$$g' = J_\Psi(\tilde f)\, g$$
up till order $2n$, where $J_\Psi$ stands for the Jacobian matrix associated to $\Psi$ at $\tilde f$. If we have a fundamental system of solutions of this linearized equation up till order $n$, then one step of Newton's method allows us to find the solution of (4) and a new fundamental system of solutions of the linearized equation up till order $2n$ [BK78, BCO+06]. A careful analysis shows that this leads to an algorithm of time complexity (5).
In [vdH10], this bound has been further improved to (6), under the assumptions that $\mathbb{K}$ admits sufficiently many $2^p$-th roots of unity and that one further technical condition is satisfied.
Although the complexity (5) is asymptotically better than (3) for very large $n$, the relaxed approach often turns out to be more efficient in practice. Indeed, Newton's method suffers both from a larger constant factor and from the fact that we profit less from the potential sparsity of the system. In particular, if the system is sparse, then the relaxed approach is generally faster. Moreover, as long as multiplications are done in the naive or Karatsuba model, the relaxed approach is optimal in the sense that the computation of the solution takes roughly the same time as its verification. Another advantage of the relaxed approach is that it generalizes to more general functional equations and to partial differential equations.
Let us now return to our original implicit system (1). If (1) is a system of differentially algebraic equations, then we may also seek to apply Newton's method. For non-degenerate systems, and assuming that we have computed the solution and a fundamental system of solutions for the linearized equation up till order $n$, one step of Newton's method yields an extension of these solutions up till order $2n - c$, for a fixed constant $c$. From an asymptotic point of view, this means that the complexities (5) and (6) remain valid, modulo multiplication of the dimension by the differential order of the system in these bounds.
Another approach for the resolution of (1) is to keep differentiating the system with respect to $z$ until it becomes equivalent to a system of the form (2). For instance, if $\Phi$ is algebraic, then differentiation of (1) yields
$$J_\Phi(f)\, f' + \frac{\partial \Phi}{\partial z}(f) = 0.$$
Consequently, if $J_\Phi(f)$ is invertible, then
$$f = c_0 - \int J_\Phi(f)^{-1}\, \frac{\partial \Phi}{\partial z}(f)$$
provides us with an equivalent system which can be solved by one of the previous methods. Unfortunately, this method requires the computation of the Jacobian, so we no longer exploit the potential sparsity of the original system.
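As a toy instance of this differentiation process (again our own example), take $\Phi(f) = f^2 - 1 - z$ with initial condition $f_0 = 1$, so that $f = \sqrt{1 + z}$:

```latex
% Differentiating f^2 - 1 - z = 0 gives 2 f f' - 1 = 0.  The "Jacobian"
% J_\Phi(f) = 2 f is invertible, since its constant term 2 f_0 = 2 is nonzero.
% The implicit equation hence becomes the recursive integral equation
\[
  f \;=\; 1 + \int \frac{\mathrm{d} z}{2 f},
\]
% whose coefficient of z^n only involves f_0, \ldots, f_{n-1}, because
% integration shifts all coefficients by one order.
```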
Yet another recent approach [vdH09b] is to consider the not yet computed coefficients of $f$ as formal unknowns, and to solve the system of equations $\Phi(f)_n = 0$ for increasing values of $n$. For large $n$, this system usually reduces to a linear system of equations. In particular, the coefficients of series with unknown coefficients are not polynomials but merely linear combinations of the unknowns. Using the so-called “substitution product”, the multiplication of series with unknown coefficients can be done while taking advantage of this linearity.
In this paper, we will present a variant of the approach of [vdH09b]. Roughly speaking, we reconsider the series with unknown coefficients as vectors of partially unknown series. Technically speaking, this is done via the concept of anticipators, which will be introduced in section 3. Using this technique, and under mild assumptions, we show in section 4.1 how to rewrite the original system of equations into a new system which is both recursive and not much larger in size. We may then apply a standard relaxed algorithm for its resolution. This leads to slightly sharper complexity bounds than those from [vdH09b], which will be presented in section 4.2.
The main interest of the new technique, though, is that it allows for the direct resolution of implicit equations by existing software for recursive equations. Moreover, recursive equations naturally occur in the area of reliable computing [Moo66, AH83, Neu90, MKC09, Rum10]. In this setting, our power series are replaced by so-called Taylor models [MB96, MB04], which systematically require certified bounds for the remainders on polydisks. In section 5, we will show how our results apply in this case. In particular, the new algorithms provide an asymptotically efficient device for the certified resolution of implicit equations and the integration of dynamical systems on implicitly defined varieties.
Remark
Assume that we have fixed a set $\mathcal{F}$ of function symbols, together with an arity $r_\phi \in \mathbb{N}$ for each $\phi \in \mathcal{F}$. A dag over $\mathcal{F}$ is then a rooted finite directed acyclic $\mathcal{F}$-labeled graph, such that each node labeled by $\phi$ has $r_\phi$ successors, together with an ordering on the successors. For instance,
is a typical dag for the expression , with , and . We will denote by $s_\Phi$ the number of nodes of a dag $\Phi$ (also called its size) and by $m_\Phi$ its number of multiplications (also called its multiplicative size). For our example dag , we thus have and . We will denote by $\mathrm{Dag}(\mathcal{F})$ the set of dags over $\mathcal{F}$.
More generally, we may consider multivariate dags with an arbitrary number of roots, which again come with an ordering. For instance,
is a bivariate dag which represents a vector of two expressions and . We will denote by $r_\Phi$ the number of roots of a multivariate dag $\Phi$, which we will also call its dimension. We will write $\Phi = (\Phi_1, \ldots, \Phi_{r_\Phi})$, where $\Phi_i$ stands for the subdag whose root is the $i$-th root of $\Phi$. We will denote by $\mathrm{Dag}(\mathcal{F})^r$ the set of multivariate dags over $\mathcal{F}$ of dimension $r$.
Consider a linear operator $L$ on $\mathbb{K}[[z]]$. We say that $L$ is a coefficientwise operator if there exist fixed constants $\lambda_0, \lambda_1, \ldots \in \mathbb{K}$ such that
$$(L f)_n = \lambda_n f_n$$
for all $f \in \mathbb{K}[[z]]$ and $n \in \mathbb{N}$. For every , the operator defined by
is an example of a coefficientwise operator. The truncation operators , which are defined by
constitute another family of examples. We will denote by and the sets of all operators of the form with resp. with . Finally, we define the coefficientwise operators $\theta$ and $\theta^{-1}$ by
$$(\theta f)_n = n\, f_n, \qquad (\theta^{-1} f)_n = \begin{cases} f_n / n & \text{if } n \geq 1 \\ 0 & \text{if } n = 0, \end{cases}$$
and we notice that $\theta^{-1}$ is the inverse of $\theta$ on $z\, \mathbb{K}[[z]]$.
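As an illustration (a sketch of our own, with names of our choosing), coefficientwise operators act diagonally on the stream of coefficients, which makes them trivial to implement on truncated series:

```python
from fractions import Fraction

def coefficientwise(lam, coeffs):
    """Apply the coefficientwise operator (L f)_n = lam(n) * f_n."""
    return [lam(n) * c for n, c in enumerate(coeffs)]

f = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 6)]     # exp(z) + O(z^4)

theta_f = coefficientwise(lambda n: Fraction(n), f)                # theta = z d/dz
trunc_f = coefficientwise(lambda n: 1 if n < 2 else 0, f)          # truncation at order 2
back = coefficientwise(lambda n: Fraction(1, n) if n else 0, theta_f)
assert back[1:] == f[1:]   # theta^{-1} inverts theta away from the constant term
```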
Let $f$ be a vector of “indeterminate series” in $\mathbb{K}[[z]]^r$. We will sometimes consider $f$ as a series with formal coefficients
$$f = f_0 + f_1 z + f_2 z^2 + \cdots, \qquad f_n \in \mathbb{K}^r.$$
Let be a set of coefficientwise linear operators. In what follows, we will take
and denote by the set of dags over . Similarly, we set and . Dags in , and will respectively be called algebraic, differential and integral. Notice that polynomials in $\mathbb{K}[z]$ are regarded as dags of size $1$, independently of their degree; this is motivated by the fact that coefficient extraction is trivial for explicit polynomials.
Clearly, any dag can be considered as a function . Given a small symbolic perturbation , we may expand as a Taylor series in
and truncation at order yields
We claim that can be regarded as a dag in . For instance, if
then
In general, the claim is easily shown by induction over .
We claim that any dag can be regarded as a series in
such that each coefficient is a dag over
Indeed, by induction over the size of , we first define the valuation of by
We next define the coefficients by another induction over the size of . If , then we take . Otherwise, we take
As a result of the claim, we emphasize that the coefficient $\Phi(f)_n$ only depends on the coefficients up till order $n$ of $f$.
Remark
where is computed using a relaxed multiplication algorithm.
Let us briefly recall the technique of relaxed power series computations, which is explained in more detail in [vdH02]. In this computational model, a power series $f \in \mathbb{K}[[z]]$ is regarded as a stream of coefficients $f_0, f_1, \ldots$. When performing an operation on power series, it is required that each coefficient of the result is output as soon as sufficiently many coefficients of the inputs are known, so that its computation does not depend on the further coefficients. For instance, in the case of a multiplication $h = f g$, we require that $h_n$ is output as soon as $f_0, \ldots, f_n$ and $g_0, \ldots, g_n$ are known. In particular, we may use the naive formula $h_n = \sum_{k=0}^{n} f_k g_{n-k}$ for the computation of $h_n$.
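The following minimal sketch (ours, for illustration) shows this model in action: series are memoized coefficient streams, and the naive product formula makes each output coefficient available as soon as the corresponding input coefficients are:

```python
from functools import lru_cache

def lazy_mul(f, g):
    """Relaxed product of two coefficient streams, using the naive formula."""
    @lru_cache(maxsize=None)
    def h(n):
        # (f*g)_n needs exactly f_0..f_n and g_0..g_n, so the product is relaxed
        return sum(f(k) * g(n - k) for k in range(n + 1))
    return h

geo = lru_cache(maxsize=None)(lambda n: 1)            # 1/(1-z) = 1 + z + z^2 + ...
sq = lazy_mul(geo, geo)
assert [sq(n) for n in range(5)] == [1, 2, 3, 4, 5]   # 1/(1-z)^2
```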
The additional constraint on the time when coefficients should be output has the important advantage that the inputs may depend on the output, provided that we add a small delay. For instance, the exponential $g = \exp f$ of a power series $f$ with $f_0 = 0$ may be computed in a relaxed way using the formula
$$g = 1 + \int f'\, g.$$
Indeed, when using the naive formula for products, the coefficient $g_n$ is given for $n \geq 1$ by
$$g_n = \frac{1}{n} \sum_{k=1}^{n} k\, f_k\, g_{n-k},$$
and the right-hand side only depends on the previously computed coefficients $g_0, \ldots, g_{n-1}$.
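A direct transcription of this recurrence into the stream framework sketched above (again our own illustration) is self-referential but well-founded, since $g_n$ only calls $g_0, \ldots, g_{n-1}$:

```python
from fractions import Fraction
from functools import lru_cache

def lazy_exp(f):
    """Relaxed exponential of a coefficient stream f with f(0) = 0."""
    @lru_cache(maxsize=None)
    def g(n):
        if n == 0:
            return Fraction(1)
        # g_n = (1/n) * sum_{k=1}^{n} k f_k g_{n-k}: one coefficient of delay
        return Fraction(1, n) * sum(k * f(k) * g(n - k) for k in range(1, n + 1))
    return g

e = lazy_exp(lambda n: Fraction(int(n == 1)))         # f = z
assert [e(n) for n in range(4)] == [1, 1, Fraction(1, 2), Fraction(1, 6)]
```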
The main drawback of the relaxed approach is that we cannot directly use fast algorithms on polynomials for computations with power series. For instance, assuming that $\mathbb{K}$ has sufficiently many $2^p$-th roots of unity and that field operations in $\mathbb{K}$ can be done in time $O(1)$, two polynomials of degrees $< n$ can be multiplied in time $\mathsf{M}(n) = O(n \log n)$, using FFT multiplication [CT65]. Given the truncations of two power series $f$ and $g$ at order $n$, we may thus compute their truncated product at order $n$ in time $O(n \log n)$ as well. This is much faster than computing the same product with the naive relaxed multiplication algorithm. However, when using FFT multiplication, the formula for each coefficient of the product depends on all input coefficients $f_0, \ldots, f_{n-1}$ and $g_0, \ldots, g_{n-1}$, so the fast algorithm is not relaxed. Fortunately, efficient relaxed multiplication algorithms do exist:
In what follows, we will denote by $\mathsf{R}(n)$ the complexity of relaxed multiplication up till order $n$. Let us now consider a general equation of the form
$$f = \Phi(f), \tag{7}$$
where $\Phi$ is an $r$-dimensional dag. We say that (7) is a recursive equation if each coefficient $\Phi(f)_n$ only depends on earlier coefficients of $f$; that is, $\Phi(f)_n$ is a function of $f_0, \ldots, f_{n-1}$, for all $n$. In order to solve (7) up till order $n$, we then need to perform $m_\Phi$ relaxed multiplications at order $n$ and $O(s_\Phi)$ coefficientwise operations or integrations at order $n$. This yields the following complexity bound:
When solving an implicit equation in $\mathbb{K}[[z]]^r$ using a relaxed algorithm, the coefficients $f_0, f_1, \ldots$ are computed only gradually. During the resolution process, it might happen that we wish to evaluate dags at higher orders than the number of known coefficients of $f$. That is, given $\Phi$ and $\nu \in \mathbb{N}$, we might need $\Phi(f)_{n+\nu}$, even though only $f_0, \ldots, f_n$ are known. In that case, we have a problem, but we may still do the best we can, and compute $\Phi(f_0 + \cdots + f_n z^n)_{n+\nu}$ instead of $\Phi(f)_{n+\nu}$.
This motivates the introduction of the $\nu$-th order anticipator of $\Phi$ by
where
On the one hand, we will show in this section that can be computed simultaneously by a dag of multiplicative size and total size . On the other hand, we will show that is essentially a linear perturbation of , which can be computed explicitly.
Let us show how to compute a dag for . The following rules are straightforward:
As to multiplication, for , we have
Consequently,
for some polynomial with and (in particular, ). Notice also that
Now assume that and are known. Then we may simultaneously compute in the following way:
This computation involves one series product and additions and scalar multiplications. For large , we may further reduce the cost to since the computation of really comes down to the computation of two truncated power series products and at order . In summary, we obtain
Since only depends on , we notice that
In general, for and , we may expand
Let denote the -th basis element of , so that for all . When considering as a column vector, it follows by linearity that
where is a row matrix with entries
Notice that depends on , but does not.
Let us investigate the functions more closely. If is algebraic, then we have
whence
In particular, is actually constant. If is differential, of differential order (this means that is the maximal number of -nodes on a path from the root of to a leaf), then, considering as a differential polynomial in , we have
whence
is a polynomial of degree at most . Similarly, if is algebraic in , where , then
whence
Consequently, there exists a polynomial of degree with
for all .
For more general integral dags , it can be checked by induction over the size of that is still a rational function in , which remains bounded for , and whose denominator has integer coefficients. Similarly, if , then is a rational function in , whose denominator has integer coefficients.
Assume now that we want to solve a system of power series equations
$$\Phi(f) = 0, \qquad f_0 = c_0,\ \ldots,\ f_{l-1} = c_{l-1}, \tag{14}$$
where $\Phi$ is a vector of dags and $c_0, \ldots, c_{l-1}$ are a finite number of initial conditions. For definiteness, it is also important that (14) admits a unique solution $f$. This will be guaranteed by an even stronger technical assumption, to be detailed below. Roughly speaking, for a given index $\nu$, we will assume that each coefficient $f_n$ with $n \geq l$ can be determined as a function of the previous coefficients using only the equations .
Let . For each and , we introduce the matrix
the block matrix
the , and block column vectors
and the column vector
In view of (10), the equations then translate into
We will say that (14) is -predictive or predictive of index , if, for all , there exist and matrices and , such that
In that case, we have
whence
provides us with an explicit formula for . Now let and be the operators on vectors of power series with the property that and . Then we may rewrite (16) into
This is the desired recursive equation for .
Remark
Let us first consider the case of an algebraic dag . In that case, the matrix does not depend on and its coefficients are explicitly given by (11). We may now determine and matrices and with
using Gaussian elimination in order, and whenever such matrices exist.
The equation
Proof. By what precedes, the operators and in (17) are really the constant matrices from (18). By lemma 6, the size of the right-hand side of (17) as a dag is therefore bounded by and its multiplicative size is exactly . The result thus follows from proposition 5.
Assume now that . Then we claim that there exists an algorithm for checking -predictivity and for constructing a general formula for the corresponding matrices and . Indeed, we recall from section 3.2 that is the evaluation at of a matrix with entries in and denominators in . We may thus use Gaussian elimination in order to compute and matrices and with entries in and
whenever such matrices exist. For those which are not positive integer roots of one of the denominators of the entries of , we now have and . For each of the finite number of integer roots , we may directly compute the matrices and by Gaussian elimination over , whenever such matrices exist.
Proof. The computation of (and the finite number of exceptional for which is a root of one of the denominators) is a precomputation. The determination of every next can be done in time , via a diagonal translation of and evaluation of the rational functions which are the entries of the bottom submatrix. Now assume that we maintain upper and lower triangular matrices , and a permutation matrix at each stage such that . Then the determination of , and as a function of and can be done in time using naive linear algebra. The determination of and from , and can again be done in time . Consequently, the cost of applying the operators and during the relaxed resolution of (17) at order is bounded by . The cost of the evaluation of the remaining dag is bounded by , as in the algebraic case.
If $\mathbb{K} \subseteq \mathbb{C}$, then power series solutions to recursive or implicit equations often converge in a small disk around the origin. For instance, if is an algebraic dag and , then the recursive equation
admits a convergent solution, by the Cauchy–Kovalevskaya theorem. Given a point $z$ in a sufficiently small disk where the solution $f$ converges, we might like to compute an approximation $\tilde c$ of $f(z)$, together with a bound $\varepsilon$ for the error: $|f(z) - \tilde c| \leq \varepsilon$. Interval arithmetic [Moo66, AH83, Neu90, MKC09, Rum10] provides a classical framework for deriving such enclosures in a systematic way. We will rather rely on ball arithmetic [vdH09a], which is more suitable for complex and multiple precision arithmetic.
Let us briefly recall the principles behind ball arithmetic. Given a normed vector space $V$, we will denote by $\mathcal{B}(V, \mathbb{R})$ or $\mathcal{B}(V)$ the set of closed balls with centers in $V$ and radii in $\mathbb{R}^{\geqslant}$. Given such a ball $B$, we will denote its center by $c_B$ and its radius by $r_B$. Conversely, given $c \in V$ and $r \in \mathbb{R}^{\geqslant}$, we will denote by $\mathcal{B}(c, r)$ the closed ball with center $c$ and radius $r$.
A continuous operation $\phi : V_1 \times \cdots \times V_k \to W$ is said to lift into an operation on balls, which is usually also denoted by $\phi$, if the inclusion property
$$\phi(x_1, \ldots, x_k) \in \phi(B_1, \ldots, B_k) \tag{19}$$
is satisfied for any balls $B_1, \ldots, B_k$ and any $x_1 \in B_1, \ldots, x_k \in B_k$. For instance, if $V$ is a Banach algebra, then we may take
$$\mathcal{B}(c_1, r_1) + \mathcal{B}(c_2, r_2) = \mathcal{B}(c_1 + c_2,\ r_1 + r_2)$$
$$\mathcal{B}(c_1, r_1) - \mathcal{B}(c_2, r_2) = \mathcal{B}(c_1 - c_2,\ r_1 + r_2)$$
$$\mathcal{B}(c_1, r_1)\, \mathcal{B}(c_2, r_2) = \mathcal{B}(c_1 c_2,\ (|c_1| + r_1)\, r_2 + r_1 |c_2|).$$
Similar formulas can be given for division and elementary functions.
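A minimal sketch in exact arithmetic (our own illustration, which deliberately ignores the rounding-error terms discussed below) shows how such formulas enforce the inclusion property:

```python
from fractions import Fraction

class Ball:
    """Closed ball B(c, r) with exact rational center and radius."""
    def __init__(self, center, radius):
        self.c, self.r = Fraction(center), Fraction(radius)
    def __add__(self, other):
        return Ball(self.c + other.c, self.r + other.r)
    def __mul__(self, other):
        # |xy - c1 c2| <= (|c1| + r1) r2 + r1 |c2| whenever |x - c1| <= r1, |y - c2| <= r2
        return Ball(self.c * other.c,
                    (abs(self.c) + self.r) * other.r + self.r * abs(other.c))

x, y = Ball(2, Fraction(1, 10)), Ball(3, Fraction(1, 100))
print((x * y).c, (x * y).r)   # 6 and 321/1000
```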
It is convenient to extend the notion of a ball to more general radius types, which only carry a partial ordering. This allows us for instance to regard a vector of balls as a “vectorial ball” with center and radius . If , then we write if and only if for all . A similar remark holds for matrices and power series with ball coefficients.
In concrete machine computations, numbers are usually approximated by floating point numbers with a finite precision. Let $\mathbb{D}$ be the set of floating point numbers at a given working precision, which we will assume fixed. It is customary to include the infinities $\pm \infty$ in $\mathbb{D}$ as well. The IEEE 754 standard [ANS08] specifies how to perform basic arithmetic with floating point numbers in a predictable way, by specifying a rounding mode (down, up and nearest). A multiple precision implementation of this standard is available in the MPFR library [HLRZ00].
Let and or and . We will denote by or the set of closed balls in with centers in and radii in . In this case, we will also allow for balls with an infinite radius. A continuous operation is again said to lift to an operation on balls if (19) holds for any and . The formulas for the ring operations may now be adapted to
where , and are reliable bounds for the rounding errors induced by the corresponding floating point operations on the centers; see [vdH09a] for more details.
In order to ease the remainder of our exposition, we will avoid technicalities related to rounding problems, and compute with “idealized” balls with centers in and radii in . For those who are familiar with rounding errors, it should not be difficult though to adapt our results to more realistic machine computations.
If we are computing with analytic functions on a disk, or multivariate analytic functions on a polydisk, then Taylor models [MB96, MB04] provide a suitable functional analogue of ball arithmetic. We will use a multivariate setup with $z = (z_1, \ldots, z_d)$ as our coordinates and a polydisk for a fixed , where we use vector notation. Taylor models come in different blends, depending on whether we use a global error bound on or individual bounds for the coefficients of the polynomial approximation. Individual bounds are sharper (especially if we truncate at a small order, so that the remainder is not that small), but they are more expensive to compute. Our general setup will cover all possible blends of Taylor models.
We first need some more definitions and notations. Assume that is given the natural partial ordering. Let denote the -th canonical basis vector of , so that and for . For every , we will write . A subset is called an initial segment, if for any and with , we have . In that case, we write and . In what follows, we assume that and are fixed initial segments of with . For instance, we may take and or or .
Recall that or . Given a series , we will write for its support. Given a subset and a subset , we write and . If is analytic on , then we denote its sup norm by
A Taylor model is a tuple , where , and are as above, and . We will write for the set of such Taylor models. Given and , we will also denote and . Given an analytic function on , we write , if there exists a decomposition
with and for all . Optionally, we may also require that for all . Given two Taylor models , we will say that is included in , and we write if for any . This holds in particular if , in which case we say that is strongly included in and write .
Addition, subtraction and scalar multiplication are defined in a natural way on Taylor models. For multiplication, we need a projection with for all and if . One way to construct such a mapping is as follows. For , we must take . For , let be the largest such that . Then we recursively define . Given , we now define their product by
Using the observation that , this product satisfies the inclusion property that for any analytic functions and on . Finally, let be a coefficientwise operator on such that there exist constants with
for any which converges on . Then we say that is bounded and we may lift into an operator on Taylor models by taking
Indeed, the formula (20) implies the inclusion property for any .
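To make one blend of this arithmetic concrete, here is a small univariate sketch with a single global remainder bound (our own simplification: the projection just truncates to the retained orders, and sup norms on the disk $|z| \leq \rho$ are overestimated by $\sum_k |c_k| \rho^k$):

```python
from fractions import Fraction

def tm_mul(P, eps_P, Q, eps_Q, order, rho):
    """Product of Taylor models (P, eps_P) and (Q, eps_Q) on the disk |z| <= rho.

    Each model stands for all analytic f with |f - P| <= eps on the disk."""
    PQ = [Fraction(0)] * (len(P) + len(Q) - 1)
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            PQ[i + j] += p * q
    head, tail = PQ[:order], PQ[order:]
    norm = lambda C, s=0: sum(abs(c) * rho ** (k + s) for k, c in enumerate(C))
    # fg - head = tail + P e_g + e_f Q + e_f e_g, bounded term by term on the disk
    eps = norm(tail, order) + norm(P) * eps_Q + eps_P * norm(Q) + eps_P * eps_Q
    return head, eps

# (1 + z +- 1/1000)^2 on |z| <= 1/2, truncated at order 2
print(tm_mul([1, 1], Fraction(1, 1000), [1, 1], Fraction(1, 1000), 2, Fraction(1, 2)))
```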
Remark
Remark
Most of the definitions and results of sections 2.1 and 2.4 generalize in a straightforward way to the multivariate case. Only the complexity results need additional attention, since $\mathsf{R}(n)$ needs to be replaced by a suitable multivariate analogue; see [vdH02, LS03, vdH05, vdHL10, vdHS10] for various results in this direction. Let us take
with for all and , and where is the set of bounded coefficientwise operators on . Given , consider the equation
This equation is said to be recursive if only depends on with , for every . In that case, the equation admits a unique solution, whose coefficients may be computed using a relaxed algorithm.
Let , , , , and be as in the previous section. Consider a polydisk with . Then any Taylor model can naturally be reinterpreted as a Taylor model in , which we will denote by .
Proof. Given power series and , we will say that is a majorant for , and write , if for all . Let be an arbitrary element of and consider the sequence . By induction, we have . In particular, there exists a bound with for all . For any , it follows that
Since (21) is recursive, the sequence tends coefficientwise to the unique power series solution of (21). More precisely, for any and such that , we have . Therefore, given , the valuation of is at least . Consequently,
In particular,
We conclude that for , whence converges to on the polydisk . Since for all , we also get , by continuity.
Remark
With the notations from the previous section, assume in addition that . Let and let be a power series with ball coefficients. Then the restriction can naturally be reinterpreted as a Taylor model in , which we will denote by . For what follows, we recall that is computed in , whereas is computed in .
Proof. The lemma follows by an easy induction over the size of . Let us for instance treat the case when . Then
for all .
Proof. Let be the exact solution to (21). Let , and
Consider the Taylor model
By lemma 14, we have
for sufficiently small .
Remark
We also notice that the coefficients of which are not in do not depend on . For large expansion orders, it may be wise to implement Taylor models in such a way that these coefficients need not be recomputed for different values of . In [vdH07b, vdH09a], we describe how to do this in the univariate case.
Let us now return to the system (14) of implicit equations and assume that is algebraic in . We will use a domain of Taylor models with , and . Using the theory of sections 3 and 4.1, we may construct a recursive equation for the solution of (14). By (17), this equation has the form
for certain matrices and whose entries are coefficientwise operators. In order to apply theorems 12 and 15, we need to show that these operators are bounded.
Let us first show that is bounded. Let be a convergent series on . For any and , we have
We may therefore take for and .
Now consider one of the entries of or . By choosing sufficiently large, we may assume without loss of generality that is given by the evaluation of a rational function in at for all . Moreover, the coefficients of the matrix (15) are given by (13), whence they are bounded for . Consequently, also remains bounded for , and there exists an absolutely convergent expansion
Hence, we may regard as an infinite sum
Taking sufficiently large, we may also ensure that (22) converges for . Consequently, it suffices to take
for all . We may use crude bounds for the remaining with . Combining theorems 9, 12 and 15, we now obtain the following theorem.
In the Taylor model setting, it is interesting to study the solution under small perturbations of the initial conditions. This can be done by switching to a multivariate context as in the previous subsections. In the system (14), we now replace the time by . We also replace the initial conditions by Taylor models in which only depend on , and such that for all . Although the above theory mostly generalizes in a straightforward way, there is a small quirk: when a matrix of the type (15) does not have full rank, then the rank of a small perturbation of it is generally strictly higher. Consequently, -predictivity is not preserved under small perturbations.
Nevertheless, if we restrict our attention to -predictive systems, for which the matrix (15) does have full rank, then small perturbations of the system remain -predictive, and the generalization of theorem 17 provides us with the complete flow of the implicit equation at any order. Such -predictive systems are very common in practice; a well-known example is the pendulum, whose equations are given by
More generally, dynamical systems on implicitly defined varieties are -predictive systems. Notice that the classical implicit function theorem also falls into this category.
Of course, if the solution is known in a certain disk (as well as some of its derivatives, if necessary), then we may use the value of at a given point as a new initial condition at and compute an analytic continuation of the solution around with the same method. In general, -predictivity is preserved throughout this process, whence we obtain an efficient method for the high precision integration of dynamical systems on implicitly defined varieties. Predictive systems of a higher index usually degrade into -predictive systems after one analytic continuation. If not, then there is usually an algebraic reason for this, in which case the system can be rewritten as a “simpler” system using differential algebra [Rit50, Kol73, Bou94]. Unfortunately, although the order (or ranking) of the new system is lower, its size as a dag usually explodes.
G. Alefeld and J. Herzberger. Introduction to interval analysis. Academic Press, New York, 1983.
ANSI/IEEE. IEEE standard for binary floating-point arithmetic. Technical report, ANSI/IEEE, New York, 2008. ANSI-IEEE Standard 754-2008. Revision of IEEE 754-1985, approved on June 12, 2008 by IEEE Standards Board.
A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, É. Schost, and A. Sedoglavic. Fast computation of power series solutions of systems of differential equations. Preprint, April 2006. Submitted, 13 pages.
R.P. Brent and H.T. Kung. Fast algorithms for manipulating formal power series. Journal of the ACM, 25:581–595, 1978.
J. Berthomieu and R. Lebreton. Relaxed $p$-adic Hensel lifting for algebraic systems. Work in preparation, 2011.
F. Boulier. Étude et implantation de quelques algorithmes en algèbre différentielle. PhD thesis, University of Lille I, 1994.
D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.
J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp., 19:297–301, 1965.
G. Hanrot, V. Lefèvre, K. Ryde, and P. Zimmermann. MPFR, a C library for multiple-precision floating-point computations with exact rounding. http://www.mpfr.org, 2000.
E.R. Kolchin. Differential algebra and algebraic groups. Academic Press, New York, 1973.
G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. SADIO Electronic Journal on Informatics and Operations Research, 5(1):1–10, September 2003.
K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational differentiation: techniques, applications and tools, pages 63–74. SIAM, Philadelphia, 1996.
K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSU Report MSUHEP 40910, Michigan State University, 2004.
R.E. Moore, R.B. Kearfott, and M.J. Cloud. Introduction to Interval Analysis. SIAM Press, 2009.
R.E. Moore. Interval Analysis. Prentice Hall, Englewood Cliffs, N.J., 1966.
A. Neumaier. Interval methods for systems of equations. Cambridge university press, Cambridge, 1990.
J.F. Ritt. Differential algebra. Amer. Math. Soc., New York, 1950.
S.M. Rump. Verification methods: Rigorous results using floating-point arithmetic. Acta Numerica, 19:287–449, 2010.
A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.
J. van der Hoeven. Lazy multiplication of formal power series. In W. W. Küchlin, editor, Proc. ISSAC '97, pages 17–20, Maui, Hawaii, July 1997.
J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.
J. van der Hoeven. Notes on the Truncated Fourier Transform. Technical Report 2005-5, Université Paris-Sud, Orsay, France, 2005.
J. van der Hoeven. New algorithms for relaxed multiplication. JSC, 42(8):792–802, 2007.
J. van der Hoeven. On effective analytic continuation. MCS, 1(1):111–175, 2007.
J. van der Hoeven. Ball arithmetic. Technical report, HAL, 2009. http://hal.archives-ouvertes.fr/hal-00432152/fr/.
J. van der Hoeven. Relaxed resolution of implicit equations. Technical report, HAL, 2009. http://hal.archives-ouvertes.fr/hal-00441977/fr/.
J. van der Hoeven. Newton's method and FFT trading. JSC, 45(8):857–878, 2010.
J. van der Hoeven and G. Lecerf. On the bit-complexity of sparse polynomial multiplication. Technical report, HAL, 2010. http://hal.archives-ouvertes.fr/hal-00476223/fr/.
J. van der Hoeven and É. Schost. Multi-point evaluation in higher dimensions. Technical report, HAL, 2010. http://hal.archives-ouvertes.fr/hal-00477658/fr/.