This work was partially supported by the ANR Gecko project.
Let $f$ and $g$ be two convergent power series in $\mathbb{R}[[z]]$ or $\mathbb{C}[[z]]$, whose first $n$ terms are given numerically with a $\lambda n$-bit precision, for a fixed constant $\lambda > 0$. Assuming that $g(0) = 0$, we will show in this paper that the first $n$ coefficients of $f \circ g$ can be computed with a $\lambda n$-bit precision in time $\tilde{O}(n^2)$. Using Newton iteration, a similar complexity bound holds for power series reversion of $g$. Our method relies on fast multi-point evaluation, which will be recalled and further detailed for numeric polynomials. We also discuss relaxed variants of our algorithm.
Let $\mathbb{K}$ be an effective ring of coefficients (i.e. we have algorithms for performing the ring operations). Let $f, g \in \mathbb{K}[[z]]$ be two power series with $g(0) = 0$, so that the composition $f \circ g$ is well-defined. We are interested in algorithms for fast composition: given $f_0, \ldots, f_{n - 1}$ and $g_1, \ldots, g_{n - 1}$, how many arithmetic operations in $\mathbb{K}$ are needed in order to compute $(f \circ g)_0, \ldots, (f \circ g)_{n - 1}$?
A first efficient general purpose algorithm of time complexity $O(\sqrt{n \log n} \, \mathsf{M}(n))$ was given in [BK78, CT65, SS71]. Here $\mathsf{M}(n)$ denotes the complexity of the multiplication of two polynomials of degrees $\leqslant n$, and we have $\mathsf{M}(n) = O(n \log n \log \log n)$ [CK91]. In the case when $g$ is polynomial [BK78] or algebraic [vdH02], this complexity further reduces to $O(\mathsf{M}(n) \log n)$. For some very special series $g$, there even exist $O(\mathsf{M}(n))$ algorithms; see [BSS08] for an overview. In positive characteristic $p$, right composition can also be performed in quasi-linear time [Ber98].
In this paper, we are interested in efficient algorithms when $\mathbb{K}$ is a ring of numbers, such as $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{C}$, or a ring of floating point numbers. In that case, we are interested in the bit complexity of the composition, which means that we also have to take into account the bit precision of the underlying integer arithmetic. In particular, we will denote by $\mathsf{I}(p)$ the time needed for multiplying two $p$-bit integers. We have $\mathsf{I}(p) = O(p \log p \log \log p)$ [SS71] and even $\mathsf{I}(p) = O(p \log p \, 2^{\log^{\ast} p})$ [Für07], where $\log^{\ast}$ satisfies $\log^{\ast} p = \min \{ k \in \mathbb{N} : \log^{\circ k} p \leqslant 1 \}$. If all coefficients $f_i$ and $g_i$ correspond to $p$-bit numbers, then we will search for a composition algorithm which runs in quasi-linear time $\tilde{O}(n p)$. Throughout the paper, we will assume that $\mathsf{I}(p) / p$ is increasing and that $\mathsf{I}(O(p)) \subseteq O(\mathsf{I}(p))$.
In section 2, we start by reviewing multiplication and division of polynomials with numeric coefficients. Multiplication of two polynomials of degrees $\leqslant n$ with $p$-bit coefficients can be done in time $O(\mathsf{I}(n (p + \log n)))$ using Kronecker's substitution [Kro82]. Division essentially requires a higher complexity $O(\mathsf{I}(n (n + p)))$ [Sch82], due to the fact that we may lose $O(n)$ bits of precision when dividing by a polynomial of degree $n$. Nevertheless, we will see that a best possible complexity can be achieved in terms of the output precision.
In section 3, we will study multi-point evaluation of a polynomial of degree $n$ at $n$ points in the case of numeric coefficients. Adapting the classical binary splitting algorithm [AHU74] of time complexity $O(\mathsf{M}(n) \log n)$ to the numeric context, we will prove the complexity bound $O(\mathsf{I}(n (n + p)) \log n)$. In the case when $p \ll n$, this complexity bound is again non-optimal in many cases. It would be nice if a bound of the form $\tilde{O}(n (n + p))$ could be achieved, and we will present some ideas in this direction in section 3.2.
In section 4, we turn to the problem of numeric power series composition. Under the assumptions $g(0) = 0$ and $p = O(n)$, we first show that the computation of the first $n$ coefficients of a convergent power series is essentially equivalent to its evaluation at $n$ equally spaced points on a given circle (this fact was already used in [Sch82], although not stated as explicitly). The composition problem can therefore be reduced to one fast Fourier transform, one multi-point evaluation and one inverse Fourier transform, leading to a complexity $O(\mathsf{I}(n^2) \log n)$. We first prove this result under certain normalization conditions for floating point coefficients. We next extend the result to different types of numeric coefficients (including integers and rationals) and also consider non-convergent series. The good complexity of these algorithms should not come as a full surprise: under the analogy between numbers and power series, it was already known [BK77] how to compose multivariate power series fast.
In the last section, we conclude by giving relaxed versions [vdH02] of our composition algorithm. This allows for the resolution of functional equations for numeric power series which involve composition. Unfortunately, we have not yet achieved a quasi-linear bound, even though some progress has been made on the exponent. The special case of power series reversion may be reduced more directly to composition, using Newton's method [BK75].
Let $V$ be a normed vector space. Given $x \in V$ and $\varepsilon > 0$, we say that $\tilde{x} \in V$ is an $\varepsilon$-approximation of $x$ if $\| \tilde{x} - x \| \leqslant \varepsilon$. We will also say that $\tilde{x}$ is an approximation of $x$ with absolute error $\varepsilon$ or an absolute precision of $\log_2 \varepsilon^{-1}$ bits. For polynomials $P = P_0 + \cdots + P_n z^n \in \mathbb{C}[z]$, we will use the norm $\| P \| = \max \{ |P_0|, \ldots, |P_n| \}$, and notice that $\| P Q \| \leqslant (n + 1) \, \| P \| \, \| Q \|$ for polynomials $P, Q$ of degrees $\leqslant n$. For vectors $v \in \mathbb{C}^n$, we will use the norm $\| v \| = \max \{ |v_1|, \ldots, |v_n| \}$.
Given two polynomials $P, Q \in \mathbb{Z}[z]$ with $\deg P < n$, $\deg Q < n$, $\| P \| < 2^p$ and $\| Q \| < 2^p$, take $b = 2 p + \lceil \log_2 n \rceil$. Then the product $P Q$ can be read off from the integer product $P(2^b) \, Q(2^b)$, since the coefficients of the result all fit into $b$ bits. This method is called Kronecker multiplication [Kro82] and its cost is bounded by $\mathsf{I}(2 n b) = O(\mathsf{I}(n (p + \log n)))$. Clearly, the method generalizes to the case when $P, Q \in \mathbb{Z}[\mathrm{i}][z]$ or when the coefficients of $P$ and $Q$ may be negative. A direct consequence for the computation with floating point polynomials is the following:
Proof. Let $b = p + O(\log n)$ be a suitable working precision and consider the integer polynomials $\hat{P}, \hat{Q} \in \mathbb{Z}[\mathrm{i}][z]$ with $\hat{P}_k = \lfloor 2^b \tilde{P}_k \rceil$ and $\hat{Q}_k = \lfloor 2^b \tilde{Q}_k \rceil$ for all $k$.
By assumption, the bit-sizes of the coefficients of $\hat{P}$ and $\hat{Q}$ are bounded by $O(b)$. Therefore, we may compute the exact product $\hat{P} \hat{Q}$ in time $O(\mathsf{I}(n (p + \log n)))$, using Kronecker multiplication. A straightforward error analysis shows that, for a suitable choice of the constant hidden in $b$, we have
$\| 2^{-2 b} \hat{P} \hat{Q} - P Q \| \leqslant 2^{-p}.$
This proves that $2^{-2 b} \hat{P} \hat{Q}$ is a $2^{-p}$-approximation of $P Q$. One may optionally truncate the mantissas of the coefficients of the result so as to fit into $O(p + \log n)$ bits.
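The integer packing behind Kronecker multiplication is easy to make explicit. The following Python sketch (ours, for illustration) handles nonnegative coefficients only; signed or Gaussian integer coefficients would require the usual splitting or offset tricks mentioned above.

```python
# A minimal sketch of Kronecker multiplication for polynomials with
# nonnegative integer coefficients. Each product coefficient is a sum of
# at most n terms of less than 2p bits, so slots of b = 2p + log2(n) + 1
# bits cannot overflow.

def kronecker_mul(P, Q):
    """Multiply coefficient lists P and Q via one big-integer product."""
    n = max(len(P), len(Q))
    p = max(max(P).bit_length(), max(Q).bit_length())
    b = 2 * p + n.bit_length()
    # Pack: evaluate P and Q at x = 2**b.
    A = sum(c << (i * b) for i, c in enumerate(P))
    B = sum(c << (i * b) for i, c in enumerate(Q))
    C = A * B                       # one integer product, of cost I(O(n*b))
    # Unpack the base-2**b digits of C: these are the product coefficients.
    mask = (1 << b) - 1
    return [(C >> (i * b)) & mask for i in range(len(P) + len(Q) - 1)]

# (1 + 2z)(3 + z) = 3 + 7z + 2z^2
assert kronecker_mul([1, 2], [3, 1]) == [3, 7, 2]
```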
Remark
Let $f \in \mathbb{C}[[z]]$ with $f_0 = 1$ and let $g = 1 / f$. Let $B \geqslant 1$ be such that $|f_k| \leqslant B^k$ and $|g_k| \leqslant B^k$ for all $k$. Given $n$ and $p$, we may compute a $2^{-p}$-approximation of $g_0, \ldots, g_{n - 1}$ in time $O(\mathsf{I}(n (p + n \log B)))$.
Proof. Assume that we are given an approximation $\tilde{g}$ of $g$. Then we may compute a better approximation using the classical Newton iteration
$\tilde{g}^{\mathrm{new}} = \tilde{g} + \tilde{g} \, (1 - f \tilde{g}). \qquad (1)$
If $\varepsilon = 1 - f \tilde{g}$ is “small”, then
$1 - f \tilde{g}^{\mathrm{new}} = (1 - f \tilde{g})^2 = \varepsilon^2 \qquad (2)$
is about twice as “small”, in the sense that the number of correct bits roughly doubles. We will apply the Newton iteration to a polynomial $\tilde{g}$ whose first coefficients are good approximations of the corresponding coefficients of $g$, and whose remaining coefficients are less good approximations.
Given a polynomial $P = P_0 + P_1 z + \cdots$, we write $P_{;m} = P_0 + \cdots + P_{m - 1} z^{m - 1}$ and $P_{m;} = P_m + P_{m + 1} z + \cdots$, so that $P = P_{;m} + P_{m;} z^m$.
Assume that and , with . Setting and , we have
The relation (2) yields
whence
When starting with , we may take . If is sufficiently large, then one Newton iteration yields . Applying the Newton iteration once more for , we obtain . In view of lemma 1 and the assumption , the cost of two such Newton iterations is bounded by for a suitable constant .
We are now in a position to prove the lemma. Since an increase of by leaves the desired complexity bound unaltered, we may assume without loss of generality that . Then the cost of the inversion at order satisfies . We conclude that .
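As a toy illustration of the iteration (1), the following Python sketch (ours) inverts a power series with $f_0 = 1$ by doubling the truncation order at each step. It uses plain floating point coefficients and schoolbook truncated products, so it only demonstrates the quadratic convergence of (1) and none of the precision management of the lemma.

```python
# Newton iteration (1) for g = 1/f with f(0) = 1, doubling the order.

def mul_trunc(a, b, n):
    """First n coefficients of the product of the series a and b."""
    c = [0.0] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            c[i + j] += ai * bj
    return c

def inverse(f, n):
    """First n coefficients of g = 1/f, assuming f[0] = 1."""
    g = [1.0]                                    # g = 1/f_0
    m = 1
    while m < n:
        m = min(2 * m, n)
        e = mul_trunc(f, g, m)                   # f*g = 1 + O(z^{m/2})
        e = [1.0 - e[0]] + [-c for c in e[1:]]   # e = 1 - f*g
        delta = mul_trunc(g, e, m)               # correction g*(1 - f*g)
        g = [x + y for x, y in zip(g + [0.0] * (m - len(g)), delta)]
    return g

# 1/(1 - z) = 1 + z + z^2 + ...
assert inverse([1.0, -1.0, 0.0, 0.0], 4) == [1.0, 1.0, 1.0, 1.0]
```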
Remark
Assuming , it follows that
We may thus take . Inversely, we have
In other words, our choice of is at most a factor two worse than the optimal value.
Remark
Choose $r$ sufficiently small and $N$ sufficiently large, such that
(3)
Evaluate $1 / f$ at the points $r \omega^k$, where $\omega = \mathrm{e}^{2 \pi \mathrm{i} / N}$, using one direct FFT for the polynomial $f_{;N}$ and $N$ scalar inversions.
Let $\tilde{g}$ be the polynomial of degree $< N$ we recover when applying the inverse FFT on these inverted values. Then (3) implies that $\tilde{g}_{;n}$ is the desired approximation of $g_{;n}$.
Because of (5) below, we may always take $N = O(n)$, which gives a complexity bound of the form $O(\mathsf{I}(n (p + n \log B)))$. In fact, the FFT of a numeric $p$-bit polynomial of degree $n$ can be computed in time $O(\mathsf{I}(n (p + \log n)))$ [Sch82, Section 3], which allows for a further reduction of this bound.
In practice, we may also start with a small $N$ and double $N$ until the computed approximation $\tilde{g}$ of $g$ satisfies the equation $f \tilde{g} = 1$ up to a sufficient precision. This leads to the same complexity bound as in the lemma.
It is not clear which of the two methods is more efficient: Newton's method performs a certain amount of recomputations, whereas the alternative method requires us to work at a sufficiently large degree $N$ for which (3) holds.
Given power series $f \in \mathbb{C}[[z]]$ and $F \in \mathbb{R}^{\geqslant}[[z]]$, where $\mathbb{R}^{\geqslant} = \{ x \in \mathbb{R} : x \geqslant 0 \}$, we will say that $F$ majorates $f$, and write $f \trianglelefteq F$, if $|f_k| \leqslant F_k$ for all coefficients $f_k$ of $f$. This notation applies in particular to polynomials.
Proof. Then and , whence
We conclude by applying lemma 3 for and .
The following result was first proved in [Sch82, Section 4].
Let $A \in \mathbb{C}[z]$ be of degree $< 2 n$ with $\| A \| \leqslant 1$, and let $B = (z - z_1) \cdots (z - z_n)$ for points $z_i \in \mathbb{C}$ with $|z_i| \leqslant 1$ for all $i$. Consider the Euclidean division
$A = Q B + R$, with $\deg R < n$. Given $p$, we may compute $2^{-p}$-approximations of $Q$ and $R$ in time $O(\mathsf{I}(n (n + p)))$.
Proof. Setting $\hat{A}(z) = z^{2 n - 1} A(1 / z)$ and $\hat{B}(z) = z^n B(1 / z)$, we may write
Setting $\hat{Q}(z) = z^{n - 1} Q(1 / z)$, we then have
Our lemma now follows from lemmas 6 and 1.
Remark
For applications where Euclidean division is used as a subalgorithm, it is necessary to investigate the effect of small perturbations of the inputs $A$ and $B$ on the outputs $Q$ and $R$.
Proof. With the notations of the proof of lemma 7, let and be the truncations of the power series and at order . Let us first consider the case when , so that
Then and
Let us next consider the case when . Then
whence and
In general, the successive application of the second and the first case yield
We have also seen in the proof of lemma 6 that , and .
Proof. With the notations from the proof of lemma 7, we have
whence
and
Consider a complex polynomial
$P = P_0 + P_1 z + \cdots + P_n z^n \in \mathbb{C}[z],$
which has been normalized so that $\| P \| \leqslant 1$. Let $z_1, \ldots, z_n \in \mathbb{C}$ be pairwise distinct points with $|z_i| \leqslant 1$ for all $i$. The problem of multi-point evaluation is to find an efficient algorithm for the simultaneous evaluation of $P$ at $z_1, \ldots, z_n$. A $2^{-p}$-approximation of $(P(z_1), \ldots, P(z_n))$ will also be called a $2^{-p}$-evaluation of $P$ at $z_1, \ldots, z_n$. In order to simplify our exposition, we will assume that $n$ is a power of two; this assumption only affects the complexity analysis by a constant factor.
An efficient and classical algorithm for multi-point evaluation [AHU74] relies on binary splitting: let $q' = (z - z_1) \cdots (z - z_{n / 2})$ and $q'' = (z - z_{n / 2 + 1}) \cdots (z - z_n)$. We compute the remainders $P' = P \;\mathrm{rem}\; q'$ and $P'' = P \;\mathrm{rem}\; q''$ of the Euclidean divisions of $P$ by $q'$ and $q''$.
Then $P(z_i) = P'(z_i)$ for all $i \leqslant n / 2$ and $P(z_i) = P''(z_i)$ for all $i > n / 2$.
In other words, we have reduced the original problem to two problems of the same type, but of size $n / 2$.
For the above algorithm to be fast, the partial products $q'$ and $q''$ are also computed using binary splitting, but in the opposite direction. In order to avoid recomputations, the partial products are stored in a binary tree of depth $\log_2 n$. At depth $k$, we have $2^k$ nodes, labeled by polynomials $q_{k, 0}, \ldots, q_{k, 2^k - 1}$, where $q_{k, i} = (z - z_{i n / 2^k + 1}) \cdots (z - z_{(i + 1) n / 2^k}).$
For instance, for $n = 4$, the tree is given by $q_{0, 0} = (z - z_1) (z - z_2) (z - z_3) (z - z_4)$, $q_{1, 0} = (z - z_1) (z - z_2)$, $q_{1, 1} = (z - z_3) (z - z_4)$ and $q_{2, i} = z - z_{i + 1}$ for $i = 0, \ldots, 3$.
We have $q_{k, i} = q_{k + 1, 2 i} \, q_{k + 1, 2 i + 1}. \qquad (6)$
For a given polynomial $P$ of degree $< n$, we may now compute $P_{0, 0} = P$ and, recursively, $P_{k + 1, 2 i} = P_{k, i} \;\mathrm{rem}\; q_{k + 1, 2 i}$ and $P_{k + 1, 2 i + 1} = P_{k, i} \;\mathrm{rem}\; q_{k + 1, 2 i + 1}. \qquad (7)$
This second computation can be carried out in place. At the last stage, we obtain $P_{\log_2 n, i} = P(z_{i + 1})$ for $i = 0, \ldots, n - 1$.
More generally,
$P_{k, i} = P \;\mathrm{rem}\; q_{k, i} \qquad (8)$
for all $0 \leqslant k \leqslant \log_2 n$ and $0 \leqslant i < 2^k$.
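The following Python sketch (ours) implements this scheme in exact small-scale arithmetic: it pushes $P$ down the subproduct tree by Euclidean division, as in (6)-(8). For brevity, the subproducts $q_{k, i}$ are recomputed during the recursion instead of being stored in the tree, and no precision management is attempted.

```python
# Multi-point evaluation by binary splitting over the subproduct tree.

def poly_mul(a, b):
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def poly_rem(a, b):
    """Remainder of a modulo b, where b is monic."""
    r = list(a)
    while len(r) >= len(b):
        c = r[-1]
        for i in range(len(b)):
            r[len(r) - len(b) + i] -= c * b[i]
        r.pop()
    return r

def subproduct(pts):
    """(z - z_1) ... (z - z_k), coefficients in increasing degree."""
    if len(pts) == 1:
        return [-pts[0], 1.0]
    m = len(pts) // 2
    return poly_mul(subproduct(pts[:m]), subproduct(pts[m:]))

def eval_all(P, pts):
    """Evaluate the coefficient list P at all points (len(pts) a power of 2)."""
    if len(pts) == 1:
        return poly_rem(P, [-pts[0], 1.0])   # P rem (z - z_1) = [P(z_1)]
    m = len(pts) // 2
    left, right = pts[:m], pts[m:]
    return (eval_all(poly_rem(P, subproduct(left)), left) +
            eval_all(poly_rem(P, subproduct(right)), right))

# P = 1 + z^2 at the points 0, 1, 2, 3
assert eval_all([1.0, 0.0, 1.0], [0.0, 1.0, 2.0, 3.0]) == [1.0, 2.0, 5.0, 10.0]
```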
Proof. In the algorithm, let $p'$ be an auxiliary precision, and assume that all multiplications (6) and all Euclidean divisions (7) are computed up to an absolute error $2^{-p'}$. Let $\tilde{q}_{k, i}$ and $\tilde{P}_{k, i}$ denote the results of these approximate computations. We have
It follows by induction that
From (8) and lemma 10, we have
(11)
In view of (7) and lemma 9, we also have
By induction over $k$, we obtain
Using our choice of $p'$, we conclude that
Let us now estimate the complexity of the algorithm. In view of (9) and lemma 1, we may compute the $\tilde{q}_{k, i}$ at a fixed depth $k$ in time $O(\mathsf{I}(n (n + p)))$. In view of (11) and lemma 7, the computation of the $\tilde{P}_{k + 1, i}$ from the $\tilde{P}_{k, i}$ also takes a time $O(\mathsf{I}(n (n + p)))$ for fixed $k$. Since there are $O(\log n)$ stages, the time complexity of the complete algorithm is bounded by $O(\mathsf{I}(n (n + p)) \log n)$.
If $p = O(n)$, then the bound from lemma 11 reduces to $O(\mathsf{I}(n^2) \log n)$, which is not very satisfactory. Indeed, in the very special case when the $z_i$ are the $n$-th roots of unity, we may achieve the complexity $O(\mathsf{I}(n (p + \log n)))$, by using the fast Fourier transform. In general, we may strive for a bound of the form $\tilde{O}(n (n + p))$. We have not yet been able to obtain this complexity, but we will now describe some partial results in this direction. Throughout the section, we assume that $p = O(n)$.
Proof. Let , and for all . Each of the points is at the distance of at most to one of the points . We will compute using a Taylor series expansion of at . We first observe that
(12)
Using fast Fourier transforms of size , we first compute -approximations of for all and . This can be done in time
From (12) it follows that
The -evaluation of sums of the form using Horner's rule takes a time .
Proof. The proof of the above proposition adapts to the case when for all . Let us now subdivide the unit disk into annuli and the disk of radius . For a fixed annulus, we may evaluate at each of the inside the annulus in time . For , we have
provided that . Consequently, taking , we may evaluate at the remaining points in time . Under the condition , and up to logarithmic terms, the sum
becomes minimal for and .
Proof. We will only provide a sketch of the proof. As in the proof of proposition 12, we first compute the Taylor series expansion of at up to order . This task can be performed in time via a division of by followed by a Taylor shift. We next evaluate this expansion at for all . According to proposition 16, this can be done in time .
Proof. Again, we will only provide a sketch of the proof. The main idea is to use the binary splitting algorithm from the previous subsection. Let $\omega = \mathrm{e}^{2 \pi \mathrm{i} / n}$. If $z_i = \omega^{i - 1}$ for all $i$, then we obtain $q_{0, 0} = z^n - 1$ and the algorithm reduces to a fast Fourier transform. If each $z_i$ is a slight perturbation of $\omega^{i - 1}$, then each $q_{k, i}$ becomes a slight perturbation of the corresponding subproduct of $z^n - 1$. In particular, the coefficients of the $q_{k, i}$ only have a polynomial growth in $n$. Consequently, the Euclidean division by $q_{k, i}$ accounts for a loss of at most $O(\log n)$ instead of $O(n)$ bits of precision. The binary splitting algorithm can therefore be carried out using a fixed precision of $p + O(\log^2 n)$ bits.
Sometimes, it is possible to decompose $\{ z_1, \ldots, z_n \} = Z_1 \amalg \cdots \amalg Z_l$, such that a fast multi-point evaluation algorithm is available for each set of points $Z_j$. This leads to the question of evaluating a polynomial of degree $n$ at only $m \leqslant n$ points.
Proof. Without loss of generality, we may assume that $n$ and $m$ are powers of two. We decompose $P$ as follows:
$P = P^{(0)} + P^{(1)} z^m + \cdots + P^{(n / m - 1)} z^{n - m}, \qquad \deg P^{(j)} < m.$
We may compute $2^{-q}$-evaluations of the $P^{(j)}$ at $z_1, \ldots, z_m$ using $n / m$ multi-point evaluations of polynomials of degree $< m$. We may also $2^{-q}$-approximate the powers $z_1^m, \ldots, z_m^m$ in time $O(m \log m \, \mathsf{I}(q))$, using binary powering. Using Horner's rule, we may finally $2^{-p}$-evaluate
$P(z_i) = P^{(0)}(z_i) + P^{(1)}(z_i) \, z_i^m + \cdots + P^{(n / m - 1)}(z_i) \, z_i^{n - m}$ for $i = 1, \ldots, m$ in time $O(n \, \mathsf{I}(q))$.
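The reduction in this proof is straightforward to express in code. In the following Python sketch (ours), the parameter eval_m is a stand-in for whatever fast evaluator is available for the blocks of degree $< m$; the naive evaluator below is only used to make the example self-contained.

```python
# Evaluate P of degree < n at m <= n points, given an evaluator for
# polynomials of degree < m: split P into blocks, evaluate each block,
# and recombine with Horner's rule in z^m.

def eval_blocks(P, pts, eval_m):
    m = len(pts)
    blocks = [P[j:j + m] for j in range(0, len(P), m)]
    vals = [eval_m(b, pts) for b in blocks]   # the values P^{(j)}(z_i)
    zm = [z ** m for z in pts]                # z_i^m (binary powering in practice)
    out = []
    for i in range(m):
        acc = 0.0
        for v in reversed(vals):              # Horner's rule in z_i^m
            acc = acc * zm[i] + v[i]
        out.append(acc)
    return out

def eval_naive(b, pts):
    return [sum(c * z ** k for k, c in enumerate(b)) for z in pts]

# P = 1 + 2z + 3z^2 + 4z^3 at the points 2 and 3
assert eval_blocks([1.0, 2.0, 3.0, 4.0], [2.0, 3.0], eval_naive) == [49.0, 142.0]
```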
Let $f = f_0 + f_1 z + \cdots \in \mathbb{C}[[z]]$ be a convergent power series with radius of convergence $\rho > 0$. Given $r$ with $0 < r < \rho$, we denote $\| f \|_r = \max_{|z| = r} |f(z)|$.
Using Cauchy's formula, we have
$f_k = \frac{1}{2 \pi \mathrm{i}} \oint_{|z| = r} \frac{f(z)}{z^{k + 1}} \, \mathrm{d} z. \qquad (13)$
For $r$ with $0 < r < \rho$, it follows that
$|f_k| \leqslant \frac{\| f \|_r}{r^k}. \qquad (14)$
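As a quick sanity check of (14), one may verify the bound numerically for $f(z) = 1 / (1 - z)$, for which $\rho = 1$, $f_k = 1$ for all $k$, and $\| f \|_r = 1 / (1 - r)$ for $0 < r < 1$:

```python
# Numerical check of the Cauchy bound (14) for f(z) = 1/(1 - z).
r = 0.5
norm_f_r = 1.0 / (1.0 - r)                # ||f||_r, attained at z = r
for k in range(10):
    f_k = 1.0                             # k-th Taylor coefficient of f
    assert abs(f_k) <= norm_f_r / r ** k  # |f_k| <= ||f||_r / r^k
```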
Let $\mathsf{T}(n, p)$ be the time needed to compute $2^{-p}$-approximations of $f_0, \ldots, f_{n - 1}$. Given $r < \rho$, let $\mathsf{E}(n, p)$ be the time needed to $2^{-p}$-evaluate $f$ at $n$ points $z_1, \ldots, z_n$ with $|z_i| \leqslant r$ for all $i$.
Proof. We first consider the case when $r = 1$. Let $N$ be the smallest power of two with
(15)
For all $k \geqslant N$, the bound (14) implies
(16)
The computation of $2^{-p'}$-approximations of $f_0, \ldots, f_{N - 1}$ takes a time $\mathsf{T}(N, p')$. The $2^{-p}$-evaluation of $f_{;N}$ at the $N$-th roots of unity can be done using fast Fourier transforms of size $N$. This requires a time $O(\mathsf{I}(N (p + \log N)))$. If $N = O(n)$, we thus obtain the bound in (a) for the case $r = 1$.
The general case is reduced to the case $r = 1$ via the change of variables $z \rightarrow r z$. This requires the computations to be carried out with an additional precision of $O(n \, |\log_2 r|)$ bits, leading to the complexity bound in (a).
As to (b), we again start with the case when $r = 1$. Taking $N$ to be the smallest power of two with (15), we again obtain (16) for all $k \geqslant N$. If $|z| = 1$, then we also notice that $|f_{;N}(z) - f(z)| \leqslant 2^{-p - 1}$ for all such $z$. We may $2^{-p'}$-evaluate $f$ at the $N$-th roots of unity in time $\mathsf{E}(N, p')$. We next retrieve the coefficients of the polynomial $f_{;N}$ up to precision $2^{-p}$ using one inverse fast Fourier transform of size $N$. In the case when $N = O(n)$, we thus obtain the bound in (b) for the case $r = 1$.
In general, replacing $p$ by $p + O(n \, |\log_2 r|)$ yields the bound in (b).
Let us now consider two convergent power series $f, g \in \mathbb{C}[[z]]$ with $g(0) = 0$, and radii of convergence $\rho_f > 0$ and $\rho_g > 0$. Then $h = f \circ g$ is well-defined and has a radius of convergence $\rho_h > 0$. In fact, we may choose $r > 0$ such that the series $f$, $g$ and $h$ all converge on the closed disc of radius $r$ and such that $\| g \|_r < \rho_f$. Let $B$ be such that $\| f \|_r \leqslant B$, $\| g \|_r \leqslant B$ and $\| h \|_r \leqslant B$. Then (14) becomes
$|h_k| \leqslant \frac{B}{r^k}, \qquad (17)$
and similarly for $f$ and $g$. In view of lemma 17, it is natural to use an evaluation-interpolation scheme for the computation of $2^{-p}$-approximations of the coefficients $h_0, \ldots, h_{n - 1}$ of $h = f \circ g$.
For a sufficiently large power of two $N$ and $\omega = \mathrm{e}^{2 \pi \mathrm{i} / N}$, we evaluate $g_{;n}$ at the points $r \omega^k$ using one direct FFT. Using the fact that $|g_{;n}(r \omega^k)| < \rho_f$ for all $k$ and the algorithm for multi-point evaluation from the previous section, we next compute the values $f(g_{;n}(r \omega^k))$. Using one inverse FFT, we finally recover the coefficients of the polynomial $\tilde{h}$ of degree $< N$ with $\tilde{h}(r \omega^k) = f(g_{;n}(r \omega^k))$ for all $k$. Using the tail bound (17) for $f$, $g$ and $h$, and a sufficiently large $N$, the differences $|\tilde{h}_k - h_k|$ can be made as small as needed. More precisely, the algorithm goes as follows:
Algorithm compose
Input: $n$, $p$, and approximations of $f_0, \ldots, f_{n - 1}$ and $g_1, \ldots, g_{n - 1}$, such that we have bounds $\| f \|_r \leqslant B$, $\| g \|_r \leqslant B$ and $\| h \|_r \leqslant B$, for certain $r > 0$ and $B \geqslant 1$ as above
Output: $2^{-p}$-approximations $\tilde{h}_0, \ldots, \tilde{h}_{n - 1}$ of the first $n$ coefficients of $h = f \circ g$
Step 1. [Determine auxiliary degree and precision]
Let $N$ be the smallest power of two for which the tail bounds (17) ensure that all truncation and aliasing errors below are bounded by $2^{-p - 1}$
Let $q \geqslant p$ be an auxiliary precision, such that all rounding errors below are bounded by $2^{-p - 1}$ as well
Step 2. [Evaluate $g$ on the scaled roots of unity $r \omega^k$]
Let $\omega := \mathrm{e}^{2 \pi \mathrm{i} / N}$
Compute a $2^{-q}$-approximation $(\tilde{v}_0, \ldots, \tilde{v}_{N - 1})$ of $(g_{;n}(r \omega^0), \ldots, g_{;n}(r \omega^{N - 1}))$, using one direct FFT
We will show that $|\tilde{v}_k| < \rho_f$ for all $k$
Step 3. [Evaluate $f$ on the values $\tilde{v}_k$]
Compute a $2^{-q}$-approximation $(\tilde{w}_0, \ldots, \tilde{w}_{N - 1})$ of $(f(\tilde{v}_0), \ldots, f(\tilde{v}_{N - 1}))$, using the multi-point evaluation algorithm from the previous section
We will show that $|\tilde{w}_k - h(r \omega^k)| \leqslant 2^{-p - 1}$ for all $k$
Step 4. [Interpolate]
Compute a $2^{-q}$-approximation $(\tilde{h}_0, \ldots, \tilde{h}_{N - 1})$ of the coefficients of the polynomial $\tilde{h}$ of degree $< N$ with $\tilde{h}(r \omega^k) = \tilde{w}_k$ for all $k$, using one inverse FFT
We will show that $|\tilde{h}_k - h_k| \leqslant 2^{-p}$ for all $k < n$
Return $\tilde{h}_0, \ldots, \tilde{h}_{n - 1}$
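The following Python sketch (ours) illustrates the evaluation-interpolation scheme behind compose using numpy. The fixed radius r = 0.5 and transform length N = 128 are crude stand-ins for the choices made in step 1, $f$ is applied to the values by Horner's rule instead of the fast multi-point evaluation of section 3, and real coefficients are assumed when the real parts are taken at the end.

```python
# Evaluation-interpolation composition: one direct transform for g, an
# evaluation of f at the values, and one inverse transform (realized by
# numpy's forward FFT, which matches the required sign convention).

import numpy as np

def compose(f, g, n, N=128, r=0.5):
    """First n coefficients of f(g(z)), assuming g[0] = 0 and that the
    values g(r*omega^k) stay well inside the disk of convergence of f."""
    w = r * np.exp(2j * np.pi * np.arange(N) / N)    # points r*omega^k
    gw = np.polyval(list(reversed(g)), w)            # g at these points
    fgw = np.polyval(list(reversed(f)), gw)          # f at the values
    h = np.fft.fft(fgw) / N                          # h_k * r^k, up to aliasing
    return [(h[k] / r ** k).real for k in range(n)]  # undo the scaling by r

# f = truncation of 1/(1 - u), g = z + z^2, so that
# f o g = 1/(1 - z - z^2) = 1 + z + 2z^2 + 3z^3 + 5z^4 + ... (Fibonacci)
f = [1.0] * 128
g = [0.0, 1.0, 1.0]
assert [round(c, 6) for c in compose(f, g, 5)] == [1.0, 1.0, 2.0, 3.0, 5.0]
```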
Proof. Let us first prove the correctness of the algorithm. The choice of $N$ and the tail bound (17) for $g$ imply that each value $g_{;n}(r \omega^k)$ is close to $g(r \omega^k)$. This ensures that we indeed have $|\tilde{v}_k| < \rho_f$ for all $k$ at the end of step 2. For the values computed in step 3, we also have
This proves the bound stated at the end of step 3. As to the last bound, let $\hat{h}$ be the polynomial of degree $< N$ with $\hat{h}(r \omega^k) = h(r \omega^k)$ for all $k$. Then $\hat{h}$ is close to $\tilde{h}$, and its coefficients are close to those of $h$ by the tail bound (17). Using the fact that $\| \mathrm{FFT}^{-1}(v) \| \leqslant \| v \|$ for all vectors $v$ of length $N$, we obtain
This proves the second bound. Our choice of $q$ implies the correctness of the algorithm.
As to the complexity bound, the FFT in step 2 can be done in time $O(\mathsf{I}(N (q + \log N)))$.
By lemma 11, the multi-point evaluation in step 3 can be performed in time $O(\mathsf{I}(N (N + q)) \log N)$.
The inverse FFT in step 4 can be performed within the same time as a direct FFT. This leads to the overall complexity bound $O(\mathsf{I}(n (n + p)) \log n)$, since $N = O(n)$ and $q = p + O(n)$.
Proof. Let be sufficiently small such that and . Consider the series and , so that
Let . By the theorem, we may compute -approximations of in time . Using additional -bit multiplications, we may compute
with for all . This can again be done in time .
Proof. It suffices to take in corollary 19 and round the results.
Proof. Applying corollary 19 for , we obtain -approximations of in time . Using additional -bit multiplications of total cost , we may compute
with $|\tilde{h}_k - h_k| < 1 / 2$ for all $k$. The integer coefficients $h_k$ are then obtained by rounding the approximations $\tilde{h}_k$ to the nearest integers.
Proof. Without loss of generality, we may assume that for . In the proof of corollary 19, we may therefore choose , which yields and . It follows that and we conclude in a similar way as in the proof of corollary 19.
Let $f, g \in \mathbb{C}[[z]]$ be such that $g_0 = 0$, and let $h = f \circ g$. Given an order $n$, we have shown in the previous section how to compute $h_0, \ldots, h_{n - 1}$ efficiently, as a function of $f_0, \ldots, f_{n - 1}$ and $g_1, \ldots, g_{n - 1}$. Since $h_k$ only depends on $f_0, \ldots, f_k$ and $g_1, \ldots, g_k$, we may consider the relaxed composition problem, in which $f$ and $g$ are given progressively, and where we require each coefficient $h_k$ to be output as soon as $f_0, \ldots, f_k$ and $g_1, \ldots, g_k$ are known. Similarly, in the semi-relaxed composition problem, the coefficients of $g$ are known beforehand, but the coefficients of $f$ are given progressively. In that case, we require $h_k$ to be output as soon as $f_0, \ldots, f_k$ are known. The relaxed and semi-relaxed settings are particularly useful for the resolution of implicit equations involving functional composition. We refer to [vdH02, vdH07] for more details about relaxed computations with power series.
Let $n$ be a power of two. In this section, we consider the problem of computing the semi-relaxed composition $h = f \circ g$ up to order $n$. If $n = 1$, then we simply have $h_0 = f_0$. For $n \geqslant 2$ and $m = n / 2$, we denote $f_{;m} = f_0 + \cdots + f_{m - 1} z^{m - 1}$ and $f_{m;} = f_m + f_{m + 1} z + \cdots$,
and similarly for $g$ and $h$. Our algorithm relies on the identity
$f \circ g = f_{;m} \circ g + (f_{m;} \circ g) \, g^m. \qquad (18)$
The first part $f_{;m} \circ g$ of (18) is computed recursively, using one semi-relaxed composition at order $m$: since $g_0 = 0$, this already yields the coefficients $h_0, \ldots, h_{m - 1}$.
As soon as $f_{;m}$ is completely known, we compute $f_{;m} \circ g$ at order $n$, using one of the algorithms for composition from the previous section. We also compute $g^m$ at order $n$ using binary powering. We are now allowed to recursively compute the second part of $h$, using $h_{m;} = (f_{;m} \circ g)_{m;} + (f_{m;} \circ g) \, (g^m)_{m;}$.
This involves one semi-relaxed composition at order $m$ and one semi-relaxed multiplication at order $m$.
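The following Python sketch (ours) shows the splitting (18) in its plain divide-and-conquer form, without the relaxed interleaving or any precision management; the recursion on $f$ terminates when a single coefficient is left, and $g^m$ is obtained by binary powering.

```python
# Composition via the splitting (18): f o g = f_{;m} o g + (f_{m;} o g) * g^m.

def mul_trunc(a, b, n):
    c = [0.0] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            c[i + j] += ai * bj
    return c

def pow_trunc(g, e, n):
    """g^e modulo z^n, by binary powering."""
    r = [1.0] + [0.0] * (n - 1)
    base = (g + [0.0] * n)[:n]
    while e:
        if e & 1:
            r = mul_trunc(r, base, n)
        base = mul_trunc(base, base, n)
        e >>= 1
    return r

def compose_dc(f, g, n):
    """First n coefficients of f o g, assuming g[0] = 0."""
    if len(f) <= 1:
        return (f + [0.0] * n)[:n]
    m = (len(f) + 1) // 2
    low = compose_dc(f[:m], g, n)      # f_{;m} o g
    high = compose_dc(f[m:], g, n)     # f_{m;} o g
    gm = pow_trunc(g, m, n)            # g^m mod z^n
    return [x + y for x, y in zip(low, mul_trunc(high, gm, n))]

# f(u) = 1 + u + u^2/2 + u^3/6 and g = z + z^2 give
# f(g(z)) = 1 + z + 3/2 z^2 + 7/6 z^3 + O(z^4)
result = compose_dc([1.0, 1.0, 0.5, 1.0 / 6], [0.0, 1.0, 1.0], 4)
assert [round(c, 9) for c in result] == [1.0, 1.0, 1.5, round(7 / 6, 9)]
```

In the semi-relaxed variant described above, the two recursive calls become semi-relaxed compositions at order $m$, and the final product is carried out by one semi-relaxed multiplication.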
Assume now that the bounds (17) hold for $f$, $g$ and $h$, for suitable constants $r$ and $B$. Consider the problem of computing $2^{-p}$-approximations of $h_0, \ldots, h_{n - 1}$. In our algorithm, it will suffice to conduct the various computations with the following precisions:
We recursively compute -approximations of .
We compute -approximations of .
We recursively compute -approximations of .
We compute -approximations of the coefficients of .
We compute -approximations of the coefficients of .
Let us show that this indeed enables us to obtain the desired $2^{-p}$-approximations of $h_0, \ldots, h_{n - 1}$. This is clear for the first half $h_0, \ldots, h_{m - 1}$. As to the second half, we have the bounds
and
These bounds justify the extra numbers of bits needed in the computations of $f_{;m} \circ g$ and $g^m$, respectively.
Let us finally analyze the complexity of the above algorithm. We will denote by $\mathsf{M}(n)$ the complexity of multiplying two $O(n + p)$-bit integer polynomials of degrees $< n$. Using Kronecker multiplication, we have $\mathsf{M}(n) = O(\mathsf{I}(n (n + p)))$. We denote by $\mathsf{M}^{\ast}(n)$ the cost of a semi-relaxed multiplication of two such polynomials. Using the fast relaxed multiplication algorithm from [vdH02], we have $\mathsf{M}^{\ast}(n) = O(\mathsf{M}(n) \log n)$. We will denote by $\mathsf{C}(n)$ and $\mathsf{C}^{\ast}(n)$ the cost of classical and semi-relaxed composition for $f$, $g$ and $p$ as described above. By theorem 18, we have $\mathsf{C}(n) = O(\mathsf{I}(n (n + p)) \log n)$. The complexity of the semi-relaxed composition satisfies the following recursive bound:
$\mathsf{C}^{\ast}(n) \leqslant 2 \, \mathsf{C}^{\ast}(n / 2) + \mathsf{C}(n) + \mathsf{M}^{\ast}(n / 2) + O(\mathsf{M}(n) \log n).$
Using theorem 18, it follows by induction that
$\mathsf{C}^{\ast}(n) = O(\mathsf{C}(n) \log n). \qquad (19)$
From [BK75, vdH02], we also have
(20)
Applying (19) for and (20) on , we obtain
We have proved the following theorem:
Let $f$, $g$ and $p$ be as in the previous section and $h = f \circ g$. Assume also that $g_1 \neq 0$. In order to compute the relaxed composition $h$, we again use the formula (18), in combination with
$f_{;m} \circ g = f_{;m} \circ g_{;m} + (f_{;m}' \circ g_{;m}) \cdot (g - g_{;m}) + O(z^{2 m}).$
The first $m$ coefficients of $h$ are still computed recursively, by performing a relaxed composition at order $m$. As soon as $f_{;m}$ and $g_{;m}$ are completely known, we may compute the composition $f_{;m} \circ g_{;m}$ at order $n$ using the algorithm from section 4. Differentiation and division by $g_{;m}'$ also yields $f_{;m}' \circ g_{;m}$ at order $n$; here we use the assumption $g_1 \neq 0$. The product $(f_{;m}' \circ g_{;m}) \cdot (g - g_{;m})$ can therefore be computed using one semi-relaxed multiplication.
Let . This time, the intermediate computations should be conducted with the following precisions:
We recursively compute -approximations of the coefficients of .
We compute -approximations of the coefficients of .
We compute -approximations of the coefficients of .
We compute -approximations of the coefficients of .
Indeed, we have
Denoting by $\mathsf{C}^{\ast\ast}(n)$ the complexity of relaxed composition, we obtain
It follows that
Remark
Theorem 24 improves on this bound in the frequent case when $p = O(n)$. Unfortunately, we have not yet been able to prove a quasi-linear bound for relaxed composition.
Remark
In particular, the functional inverse $g^{\mathrm{inv}}$ of $g$ can be evaluated efficiently on small disks, for instance using a numeric Newton iteration. Consequently, the Taylor coefficients of $g^{\mathrm{inv}}$ can be computed efficiently using lemma 17.
A. Aho, J. Hopcroft, and J. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Massachusetts, 1974.
D.J. Bernstein. Composing power series over a finite ring in essentially linear time. JSC, 26(3):339–341, 1998.
R.P. Brent and H.T. Kung. $O((n \log n)^{3/2})$ algorithms for composition and reversion of power series. In J.F. Traub, editor, Analytic Computational Complexity, Pittsburgh, 1975. Proc. of a symposium on analytic computational complexity held by Carnegie-Mellon University.
R.P. Brent and H.T. Kung. Fast algorithms for composition and reversion of multivariate power series. In Proc. Conf. Th. Comp. Sc., pages 149–158, Waterloo, Ontario, Canada, August 1977. University of Waterloo.
R.P. Brent and H.T. Kung. Fast algorithms for manipulating formal power series. Journal of the ACM, 25:581–595, 1978.
Alin Bostan, Bruno Salvy, and Éric Schost. Power series composition and change of basis. In J. Rafael Sendra and Laureano González-Vega, editors, ISSAC, pages 269–276, Linz/Hagenberg, Austria, July 2008. ACM.
D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.
J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comput., 19:297–301, 1965.
M. Fürer. Faster integer multiplication. In Proceedings of the Thirty-Ninth ACM Symposium on Theory of Computing (STOC 2007), pages 57–66, San Diego, California, 2007.
L. Kronecker. Grundzüge einer arithmetischen Theorie der algebraischen Grössen. Jour. für die reine und ang. Math., 92:1–122, 1882.
A. Schönhage. Asymptotically fast algorithms for the numerical multiplication and division of polynomials with complex coefficients. In J. Calmet, editor, EUROCAM '82: European Computer Algebra Conference, volume 144 of Lect. Notes Comp. Sci., pages 3–15, Marseille, France, April 1982. Springer.
A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.
J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.
Joris van der Hoeven. New algorithms for relaxed multiplication. JSC, 42(8):792–802, 2007.