In previous work, we have introduced the technique of relaxed power series computations. With this technique, it is possible to solve implicit equations almost as quickly as doing the operations which occur in the implicit equation. Here “almost as quickly” means that we need to pay a logarithmic overhead. In this paper, we will show how to reduce this logarithmic factor in the case when the constant ring has sufficiently many $2^p$-th roots of unity.
Let $R$ be an effective ring and consider two power series $f = f_0 + f_1 z + f_2 z^2 + \cdots$ and $g = g_0 + g_1 z + g_2 z^2 + \cdots$ in $R[[z]]$. In this paper we will be concerned with the efficient computation of the first $n$ coefficients of the product $h = f g$. If the first $n$ coefficients of $f$ and $g$ are known beforehand, then we may use any fast multiplication for polynomials in order to achieve this goal, such as divide and conquer multiplication [KO63, Knu97], which has a time complexity $O(n^{\log 3/\log 2})$, or F.F.T. multiplication [CT65, SS71, CK91, vdH02a], which has a time complexity $O(n \log n \log \log n)$.
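As a side illustration (ours, not from the paper), here is a minimal sketch of the divide and conquer scheme just mentioned: Karatsuba's trick replaces the four half-size products of the naive splitting by three, which is where the $O(n^{\log 3/\log 2})$ bound comes from. All code sketches below are in Python, and all function names are ours.

```python
# Karatsuba multiplication: three recursive half-size products instead of four.

def karatsuba(f, g):
    """Product of two equal-length coefficient lists (length a power of two)."""
    n = len(f)
    if n == 1:
        return [f[0] * g[0], 0]
    m = n // 2
    f0, f1, g0, g1 = f[:m], f[m:], g[:m], g[m:]
    low = karatsuba(f0, g0)                              # f0 * g0
    high = karatsuba(f1, g1)                             # f1 * g1
    mid = karatsuba([a + b for a, b in zip(f0, f1)],
                    [a + b for a, b in zip(g0, g1)])     # (f0+f1) * (g0+g1)
    res = [0] * (2 * n)
    for i, c in enumerate(low):
        res[i] += c                                      # low part
        res[i + m] += mid[i] - c - high[i]               # middle part
        res[i + n] += high[i]                            # high part
    return res

# (1 + 2z)(3 + 4z) = 3 + 10z + 8z^2
print(karatsuba([1, 2], [3, 4])[:3])
```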
For certain computations, and most importantly the resolution of implicit equations, it is interesting to use so-called “relaxed algorithms”, which output the first $i$ coefficients of $h = f g$ as soon as the first $i$ coefficients of $f$ and $g$ are known, for each $i \leq n$. This allows for instance the computation of the exponential $f = \exp g$ of a series $g$ with $g_0 = 0$ using the formula

$$ f = \int f g' \tag{1} $$
More precisely, this formula shows that the computation of $f$ reduces to one differentiation, one relaxed product and one relaxed integration. Since differentiation and relaxed integration can be performed in linear time, it follows that $n$ terms of $f$ can be computed in time $\mathsf{R}(n) + O(n)$, where $\mathsf{R}(n)$ denotes the time complexity of relaxed multiplication. In [vdH97, vdH02a], we proved the following theorem:

Theorem 1. Let $\mathsf{M}(n)$ denote the time complexity of zealous polynomial multiplication at order $n$. Then there exists a relaxed multiplication algorithm of time complexity $\mathsf{R}(n) = O(\mathsf{M}(n) \log n)$ and space complexity $O(n)$.
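To make formula (1) concrete, here is a minimal sketch (ours) of the online computation of $\exp g$, assuming exact rational coefficients and a naive $O(n^2)$ inner loop standing in for the relaxed product:

```python
# f = exp(g) coefficient by coefficient via formula (1), f = int(f g').
# The sum below is the naive relaxed product: each output coefficient
# only uses already-known input coefficients.

from fractions import Fraction

def exp_series(g, n):
    """First n coefficients of exp(g) for a series g with g[0] = 0."""
    assert g[0] == 0
    dg = [Fraction(k + 1) * g[k + 1] for k in range(len(g) - 1)]  # g'
    f = [Fraction(1)]                        # f_0 = exp(0) = 1
    for k in range(1, n):
        # (f * g')_{k-1} only involves f_0, ..., f_{k-1}
        c = sum(f[i] * dg[k - 1 - i] for i in range(k) if k - 1 - i < len(dg))
        f.append(c / k)                      # relaxed integration
    return f

# exp(z) = 1 + z + z^2/2 + z^3/6 + ...
print(exp_series([Fraction(0), Fraction(1), Fraction(0), Fraction(0)], 4))
```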
In this paper, we will improve the time complexity bound in this theorem in the case when $R$ admits $2^p$-th roots of unity for all $p \in \mathbb{N}$. In section 2, we first reduce this problem to the case of “semi-relaxed multiplication”, when one of the arguments is fixed and the other one relaxed. More precisely, let $f$ and $g$ be power series, such that $g$ is known up to order $n$. Then a semi-relaxed multiplication algorithm computes the product $h = f g$ up to order $n$ and outputs $h_i$ as soon as $f_0, \ldots, f_i$ are known, for all $i < n$. In section 3, we show that the $O(\log n)$ overhead in theorem 1 can be reduced to $O((\log n)^{\log 3/\log 2 - 1})$. In section 4, the technique of section 3 is further improved so as to yield an $O(\mathrm{e}^{2 \sqrt{\log 2 \log \log n}})$ overhead.
In the sequel, we will use the following notations from [vdH02a]: we denote by $R[z]_n$ the set of truncated power series of order $n$, i.e. polynomials of the form $f = f_0 + f_1 z + \cdots + f_{n-1} z^{n-1}$. Given $f \in R[[z]]$ and $i \leq j$, we will denote $f_{i;j} = f_i z^i + f_{i+1} z^{i+1} + \cdots + f_{j-1} z^{j-1}$.
Remark 2. The shape of the new complexity bound for relaxed multiplication might surprise the reader. It should be noticed that the time complexity of Toom–Cook's algorithm for multiplication [Too63, Coo66] has a similar shape $n\, 2^{O(\sqrt{\log n})}$ [Knu97, Section 4.3, p. 286 and exercise 5, p. 300]. Indeed, whereas our algorithm from section 3 has a Karatsuba-like flavour, the algorithm from section 4 uses a generalized subdivision which is similar to the one used by Toom and Cook. An interesting question is whether even better time complexities can be obtained (in analogy with FFT-multiplication). However, we have not managed so far to reduce the cost of relaxed multiplication to $O(n \log n \log \log n)$ or $O(n \log n)$. Nevertheless, it should be noticed that the function $\mathrm{e}^{2 \sqrt{\log 2 \log \log n}}$ grows very slowly; in practice, it very much behaves like a constant (see section 5).
Remark 3. In fact, we expect fast algorithms for formal power series to be one of the building bricks for effective analysis [vdH06b]. Therefore, even small improvements in the complexity of relaxed multiplication should lead to global speed-ups for this kind of software.
In [vdH97, vdH02a], we have stated
several fast algorithms for relaxed multiplication. Let us briefly
recall some of the main concepts and ideas. For details, we refer to [vdH02a]. Throughout this section, $f$ and $g$ are two power series in $R[[z]]$.
We call

$$ f_{0;n}\, g_{0;n} = \sum_{i < n} \sum_{j < n} f_i g_j\, z^{i+j} \tag{2} $$

the full product of $f$ and $g$ at order $n$, and

$$ (f_{0;n}\, g_{0;n})_{0;n} = \sum_{i + j < n} f_i g_j\, z^{i+j} \tag{3} $$

the truncated product of $f$ and $g$ at order $n$.

A zealous full (resp. truncated) multiplication algorithm for $f$ and $g$ at order $n$ takes $f_{0;n}$ and $g_{0;n}$ on input and computes their product $h$ as in (2) (resp. (3)).
A relaxed full (resp. truncated) multiplication algorithm for $f$ and $g$ at order $n$ successively takes the pairs $(f_0, g_0), (f_1, g_1), \ldots, (f_{n-1}, g_{n-1})$ on input and successively computes $h_{0;1}, h_{1;2}, \ldots, h_{n-1;n}$ (resp. the corresponding coefficients of (3)), where $h$ stands for the product (2). Here it is understood that $h_{i;i+1}$ is output as soon as $f_0, \ldots, f_i$ and $g_0, \ldots, g_i$ are known.

A semi-relaxed full (resp. truncated) multiplication algorithm for $f$ and $g$ at order $n$ takes $g_{0;n}$ and the successive values $f_0, f_1, \ldots, f_{n-1}$ on input and successively computes $h_{0;1}, h_{1;2}, \ldots, h_{n-1;n}$ (resp. the corresponding coefficients of (3)). Here it is understood that $h_{i;i+1}$ is output as soon as $f_0, \ldots, f_i$ are known.
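The following minimal sketch (ours) makes the relaxed interface concrete, using the naive quadratic strategy: pairs $(f_i, g_i)$ are fed in one by one, and each $h_i$ is returned as soon as it becomes available.

```python
# Naive O(n^2) relaxed (online) multiplication.

class RelaxedProduct:
    def __init__(self):
        self.f, self.g = [], []

    def put(self, fi, gi):
        """Feed the next pair of coefficients; return h_i."""
        self.f.append(fi)
        self.g.append(gi)
        i = len(self.f) - 1
        # h_i only involves f_0..f_i and g_0..g_i, so it can be output now.
        return sum(self.f[j] * self.g[i - j] for j in range(i + 1))

p = RelaxedProduct()
# (1 + z + z^2) * (1 + 2z + 3z^2) = 1 + 3z + 6z^2 + ...
print([p.put(fi, gi) for fi, gi in [(1, 1), (1, 2), (1, 3)]])  # [1, 3, 6]
```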
We will denote by $\mathsf{M}(n)$, $\mathsf{R}(n)$ and $\mathsf{S}(n)$ the time complexities of full zealous, relaxed and semi-relaxed multiplication at order $n$, where it is understood that the ring operations in $R$ can be performed in time $O(1)$. We notice that full zealous multiplication is equivalent to polynomial multiplication. Hence, classical fast multiplication algorithms can be applied in this case [KO63, Too63, Coo66, CT65, SS71, CK91, vdH02a].
The main idea behind efficient algorithms for relaxed multiplication is to anticipate future computations. More precisely, the computation of a full product (2) can be represented by an $n \times n$ square with entries $f_i g_j$, $0 \leq i, j < n$. As soon as $f_0, \ldots, f_i$ and $g_0, \ldots, g_i$ are known, it becomes possible to compute the contributions of the products $f_j g_k$ with $j, k \leq i$ to $h$, even though the contributions of $f_j g_k$ with $j + k > i$ are not yet needed. The next idea is to subdivide the $n \times n$ square into smaller squares, in such a way that the contribution of each small square to $h$ can be computed using a zealous algorithm. Now the contribution of such a small square is of the form $f_{i_1;i_2}\, g_{j_1;j_2}$. Therefore, the requirement $\max(i_2, j_2) \leq i_1 + j_1 + 1$ suffices to ensure that the resulting algorithm will be relaxed. In the left hand image of figure 1, we have shown the subdivision from the main algorithm of [vdH97, vdH02a], which has time complexity $O(\mathsf{M}(n) \log n)$.
Figure 1. Subdivisions of the $n \times n$ multiplication square: the subdivision used by the relaxed algorithm of [vdH97, vdH02a] (left) and the corresponding subdivision for semi-relaxed multiplication (right).
There is an alternative interpretation of the left hand image in figure 1: when interpreting the big square as a multiplication $h = f g$ at order $n = 2m$, we may regard it as the sum of four multiplications

$$ h = f_{0;m}\, g_{0;m} + f_{0;m}\, g_{m;2m} + f_{m;2m}\, g_{0;m} + f_{m;2m}\, g_{m;2m}. $$

Now $f_{0;m}\, g_{0;m}$ is a relaxed multiplication at order $m$, but $f_{0;m}\, g_{m;2m}$ is even semi-relaxed, since $f_0, \ldots, f_{m-1}$ are already known by the time that we need $g_m, \ldots, g_{2m-1}$. Similarly, $f_{m;2m}\, g_{0;m}$ corresponds to a semi-relaxed product and $f_{m;2m}\, g_{m;2m}$ to a zealous product. This shows that

$$ \mathsf{R}(2m) \leq \mathsf{R}(m) + 2\, \mathsf{S}(m) + \mathsf{M}(m) + O(m). $$

Similarly, we have

$$ \mathsf{S}(2m) \leq 2\, \mathsf{S}(m) + 2\, \mathsf{M}(m) + O(m), $$

as illustrated in the right-hand image of figure 1. Under suitable regularity hypotheses for $\mathsf{M}$ and $\mathsf{S}$, the above relations imply:

Theorem 10. Under suitable regularity hypotheses (see [vdH02a]), we have
(a) $\mathsf{R}(n) = O(\mathsf{M}(n) \log n)$;
(b) $\mathsf{R}(n) = O(\mathsf{S}(n))$.
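To illustrate where these recurrences come from, here is a sketch (ours) of the classical dyadic scheduling for a semi-relaxed product: the fixed operand $g$ is cut into blocks $g_{b;2b}$, and each block is multiplied zealously with successive length-$b$ slices of $f$, so that every $h_i$ is still emitted on time. Naive block products are used for brevity; with $\mathsf{M}(n)$-time block products the total cost becomes $O(\mathsf{M}(n) \log n)$.

```python
# Semi-relaxed product with dyadic blocks of the fixed operand g.

def semi_relaxed(f_stream, g, n):
    h = [0] * (2 * n)
    f = []
    for i, fi in enumerate(f_stream):           # f_i arrives at time i
        f.append(fi)
        h[i] += fi * g[0]                       # block g_{0;1}
        b = 1
        while 2 * b <= n and (i + 1) % b == 0:
            # zealous product f_{i+1-b;i+1} * g_{b;2b}; its earliest
            # contribution is to h_{i+1}, which is not needed yet.
            for j in range(b):
                for k in range(b):
                    h[i + 1 - b + j + b + k] += f[i + 1 - b + j] * g[b + k]
            b *= 2
        yield h[i]                              # h_i is complete at time i

print(list(semi_relaxed(iter([1, 1, 1, 1]), [1, 2, 3, 4], 4)))
# (1+z+z^2+z^3)(1+2z+3z^2+4z^3) = 1 + 3z + 6z^2 + 10z^3 + ...
```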
A consequence of part (b) of theorem 10 is that it suffices to design fast algorithms for semi-relaxed multiplication in order to obtain fast algorithms for relaxed multiplication. This fact may be reinterpreted by observing that the fast relaxed multiplication algorithm actually applies Newton's method in a hidden way. Indeed, since Brent and Kung [BK78], it is well known that Newton's method can also be used in the context of formal power series in order to solve differential or functional equations. One step of Newton's method at order $2m$ involves the recursive application of the method at order $m$ and the resolution of a linear equation at order $2m$. The resolution of the linear equation corresponds to the computation of the two semi-relaxed products.
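For comparison, here is the classical Newton iteration for power series inversion in the spirit of [BK78] (the code itself is our sketch): the precision of the inverse doubles at every step, and each step reduces to zealous truncated products plus a linear correction.

```python
# Newton iteration for 1/f on power series: h <- h*(2 - f*h) mod z^(2k).

from fractions import Fraction

def mul_trunc(a, b, n):
    """Truncated product of coefficient lists a, b at order n."""
    c = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[:n - i]):
            c[i + j] += ai * bj
    return c

def inverse(f, n):
    """First n coefficients of 1/f, assuming f[0] != 0 and n a power of two."""
    h = [Fraction(1) / f[0]]
    k = 1
    while k < n:
        k *= 2
        fh = mul_trunc(f[:k], h, k)
        corr = [-c for c in fh]
        corr[0] += 2                     # 2 - f*h: the linear correction
        h = mul_trunc(h, corr, k)        # doubled precision
    return h[:n]

# 1/(1 - z) = 1 + z + z^2 + z^3 + ...
print(inverse([Fraction(1), Fraction(-1), Fraction(0), Fraction(0)], 4))
```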
Assume from now on that $R$ admits an $n$-th root of unity $\omega_n$ for every power of two $n = 2^p$. Given an element $f = f_0 + f_1 z + \cdots + f_{n-1} z^{n-1} \in R[z]_n$, let

$$ \mathrm{FFT}(f) = \big(f(1), f(\omega_n), f(\omega_n^2), \ldots, f(\omega_n^{n-1})\big) \in R^n $$

denote its Fourier transform, and let $\mathrm{FFT}^{-1}$ be the inverse mapping of $\mathrm{FFT}$. It is well known that both $\mathrm{FFT}$ and $\mathrm{FFT}^{-1}$ can be computed in time $O(n \log n)$. Furthermore, if $f, g \in R[z]_n$ are such that $f g \in R[z]_n$, then

$$ f g = \mathrm{FFT}^{-1}\big(\mathrm{FFT}(f)\, \mathrm{FFT}(g)\big), $$

where the product in $R^n$ is the componentwise (scalar) multiplication.
Now consider a decomposition $n = n_1 n_2$ with $n_1, n_2 \geq 2$. Then a truncated power series $f \in R[z]_n$ can be rewritten as a series

$$ \tilde f = \tilde f_0 + \tilde f_1 u + \cdots + \tilde f_{n_1 - 1} u^{n_1 - 1} $$

in $u$, where $\tilde f_i = f_{i n_2;(i+1) n_2}/z^{i n_2} \in R[z]_{n_2}$. This series may again be reinterpreted as a series with coefficients in $R[z]_{2 n_2}$, and we have $f = \sigma(\tilde f)$, where $\sigma$ is the mapping which substitutes $z^{n_2}$ for $u$. Also, the FFT-transform $\mathrm{FFT}: R[z]_{2 n_2} \to R^{2 n_2}$ may be extended to a mapping $R[z]_{2 n_2}[u]_m \to (R^{2 n_2})[u]_m$ for each $m$, and similarly for its inverse $\mathrm{FFT}^{-1}$. Now the formula

$$ f g = \sigma\big(\mathrm{FFT}^{-1}(\mathrm{FFT}(\tilde f)\, \mathrm{FFT}(\tilde g))\big) $$

yields a way to compute $f g$ by reusing the Fourier transforms of the “bunches of coefficients” $\tilde f_i$ and $\tilde g_j$ many times.
In the context of a semi-relaxed multiplication $h = f g$ with fixed argument $g$, the above scheme almost reduces the computation of an $n \times n$ product with coefficients in $R$ to the computation of an $n_1 \times n_1$ product with coefficients in $R^{2 n_2}$. The only problem which remains is that we can only compute $\mathrm{FFT}(\tilde f_i)$ when $f_{i n_2}, \ldots, f_{(i+1) n_2 - 1}$ are all known. Consequently, the products $\tilde f_i\, \tilde g_0$ have to be computed separately, using a traditional semi-relaxed multiplication. In other words, we have reduced the computation of a semi-relaxed $n \times n$ product with coefficients in $R$ to the computation of $n_1$ semi-relaxed $n_2 \times n_2$ products with coefficients in $R$, one semi-relaxed $n_1 \times n_1$ product with coefficients in $R^{2 n_2}$, and $O(n_1)$ FFT-transforms of length $2 n_2$. This has been illustrated in figure 2.
Figure 2. Reduction of a semi-relaxed $n \times n$ product over $R$ to $n_1$ semi-relaxed $n_2 \times n_2$ products over $R$ and one semi-relaxed $n_1 \times n_1$ product over $R^{2 n_2}$.
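The following self-contained sketch (ours; it works over complex floating point coefficients rather than an abstract ring $R$, and uses a textbook radix-2 FFT) implements the above reduction: the transforms of the blocks $\tilde g_j$ are computed once and reused against every completed block of $f$, while the critical diagonal products $\tilde f_i\, \tilde g_0$ are handled by a naive semi-relaxed loop.

```python
import cmath

def fft(a, inverse=False):
    """Radix-2 FFT of a list whose length is a power of two (unnormalized)."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    w = [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n // 2)]
    return ([even[k] + w[k] * odd[k] for k in range(n // 2)] +
            [even[k] - w[k] * odd[k] for k in range(n // 2)])

def semi_relaxed_blocked(f_stream, g, n1, n2):
    """Online h = f*g with g (length n = n1*n2) fixed and f streamed."""
    n = n1 * n2
    pad = lambda blk: list(blk) + [0.0] * (2 * n2 - len(blk))
    ghat = [fft(pad(g[j*n2:(j+1)*n2])) for j in range(n1)]  # reused transforms
    h = [0.0] * (2 * n)
    f = []
    for i, fi in enumerate(f_stream):
        f.append(fi)
        for t in range(n2):              # naive semi-relaxed f~_k * g~_0 part
            h[i + t] += fi * g[t]
        k, r = divmod(i, n2)
        if r == n2 - 1:                  # block f~_k complete: one FFT of it,
            fhat = fft(pad(f[k*n2:]))    # reused against all g~_j with j >= 1
            for j in range(1, n1):
                prod = fft([x * y for x, y in zip(fhat, ghat[j])], inverse=True)
                for t in range(2 * n2):
                    h[(k + j)*n2 + t] += prod[t].real / (2 * n2)
        yield round(h[i], 6)             # h_i is complete once f_i has arrived

# (1 + z + z^2 + z^3) * (1 + 2z + 3z^2 + 4z^3) = 1 + 3z + 6z^2 + 10z^3 + ...
print(list(semi_relaxed_blocked(iter([1.0, 1.0, 1.0, 1.0]),
                                [1.0, 2.0, 3.0, 4.0], 2, 2)))
```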
In order to obtain an efficient algorithm, we may choose $n_1$ and $n_2$ close to $\sqrt{n}$:

Theorem 11. Assume that $R$ admits a $2^p$-th root of unity for each $p \in \mathbb{N}$. Then there exists a relaxed multiplication algorithm of time complexity $\mathsf{R}(n) = O(n (\log n)^{\log 3/\log 2})$ and space complexity $O(n \log n)$.
Proof. In view of section 2, it suffices to consider the case of a semi-relaxed product. Let $\mathsf{S}(n)$ denote the time complexity of the above method. Then we observe that

$$ \mathsf{S}(n) \leq n_1\, \mathsf{S}(n_2) + 2 n_2\, \mathsf{S}(n_1) + O(n \log n). $$

Taking $n_1 = n_2 = \sqrt{n}$, and $K(n) = \mathsf{S}(n)/(n \log n)$, we obtain

$$ K(n) \leq \tfrac{3}{2}\, K(\sqrt{n}) + O(1), $$

from which we deduce that $K(n) = O((\log n)^{\log 3/\log 2 - 1})$ and $\mathsf{S}(n) = O(n (\log n)^{\log 3/\log 2})$. Similarly, the space complexity $\mathsf{P}(n)$ satisfies the bound

$$ \mathsf{P}(n) \leq \mathsf{P}(n_2) + 2 n_2\, \mathsf{P}(n_1) + O(n). $$

Setting $Q(n) = \mathsf{P}(n)/n$, it follows that

$$ Q(n) \leq (2 + o(1))\, Q(\sqrt{n}) + O(1). $$

Consequently, $Q(n) = O(\log n)$ and $\mathsf{P}(n) = O(n \log n)$.
More generally, if $n = n_1 \cdots n_k$ with $n_1, \ldots, n_k \geq 2$, then we may reduce the computation of a semi-relaxed $n \times n$ product with coefficients in $R$ into the computation of

$n/n_k$ semi-relaxed $n_k \times n_k$ products over $R$ of the form $\tilde f_i\, \tilde g_0$;

$O(n/n_k)$ FFT-transforms of length $2 n_k$;

$n/(n_{k-1} n_k)$ semi-relaxed $n_{k-1} \times n_{k-1}$ products over $R^{2 n_k}$;

$O(n/(n_{k-1} n_k))$ FFT-transforms of length $2 n_{k-1}$;

$n/(n_{k-2} n_{k-1} n_k)$ semi-relaxed $n_{k-2} \times n_{k-2}$ products over $R^{4 n_{k-1} n_k}$;

$O(n/(n_{k-2} n_{k-1} n_k))$ FFT-transforms of length $2 n_{k-2}$;

and so on, down to one semi-relaxed $n_1 \times n_1$ product over $R^{2^{k-1} n_2 \cdots n_k}$.

This computation is illustrated in figure 3. From the complexity point of view, it leads to the following theorem:
Theorem 12. Assume that $R$ admits a $2^p$-th root of unity for each $p \in \mathbb{N}$. Then there exists a relaxed multiplication algorithm of time complexity

$$ \mathsf{R}(n) = O\big(n \log n\, \mathrm{e}^{2 \sqrt{\log 2 \log \log n}}\big) $$

and space complexity $O\big(n\, \mathrm{e}^{\sqrt{\log 2 \log \log n}}\big)$.
Proof. In view of theorem 10(b), it suffices to consider the case of a semi-relaxed product. Denoting by $\mathsf{S}(n)$ the time complexity of the above method, and sharing the FFT-transforms of the blocks between the different levels of the subdivision, we have

$$ \mathsf{S}(n) \leq 2 m\, n^{1 - 1/m}\, \mathsf{S}(n^{1/m}) + C\, m\, n \log n \tag{4} $$

for every $m \geq 2$ and a suitable constant $C$. Let

$$ T(n) = \frac{\mathsf{S}(n)}{n \log n}. $$

Dividing (4) by $n \log n$, it follows for any $m \geq 2$ that

$$ T(n) \leq 2\, T(n^{1/m}) + C m. \tag{5} $$

Applying this relation $k$ times, we obtain

$$ T(n) \leq 2^k\, T(n^{1/m^k}) + C m (2^k - 1). \tag{6} $$

For a fixed $m$ such that $k = (\log \log n)/(\log m)$ is an integer, the order $n^{1/m^k}$ remains bounded, and we obtain

$$ T(n) = O\big(2^{\frac{\log \log n}{\log m}}\, m\big). \tag{7} $$

The minimum of (7) is reached when its derivative w.r.t. $m$ cancels. This happens for

$$ \log m = \sqrt{\log 2 \log \log n}. $$

Plugging this value into (7), we obtain

$$ T(n) = O\big(\mathrm{e}^{2 \sqrt{\log 2 \log \log n}}\big). $$

Substitution of $\mathsf{S}(n) = T(n)\, n \log n$ finally gives the desired estimate

$$ \mathsf{S}(n) = O\big(n \log n\, \mathrm{e}^{2 \sqrt{\log 2 \log \log n}}\big). \tag{8} $$

In order to be painstakingly correct, we notice that we really proved (7) for $n$ of the form $2^{m^k}$ and (8) for $n$ of the form $2^{m^k}$ with $\log m$ close to $\sqrt{\log 2 \log \log n}$. Of course, we may always replace $n$ and $m$ by larger values which do have this form. Since these replacements only introduce additional constant factors in the complexity bounds, the bound (8) holds for general $n$.
As to the space complexity $\mathsf{P}(n)$, we have

$$ \mathsf{P}(n) \leq 2\, n^{1 - 1/m}\, \mathsf{P}(n^{1/m}) + C n. $$

Let

$$ U(n) = \frac{\mathsf{P}(n)}{n}. $$

Dividing by $n$, it follows for any $m \geq 2$ that

$$ U(n) \leq 2\, U(n^{1/m}) + C $$

for some fixed constant $C$. Applying this bound $k$ times, we obtain

$$ U(n) \leq 2^k\, U(n^{1/m^k}) + C (2^k - 1). $$

For $k = (\log \log n)/(\log m)$, this bound simplifies to

$$ U(n) = O\big(2^{\frac{\log \log n}{\log m}}\big). $$

Taking $m$ and $k$ as above, it follows that

$$ U(n) = O\big(\mathrm{e}^{\sqrt{\log 2 \log \log n}}\big). $$

Substitution of $\mathsf{P}(n) = U(n)\, n$ finally gives us the desired estimate

$$ \mathsf{P}(n) = O\big(n\, \mathrm{e}^{\sqrt{\log 2 \log \log n}}\big) $$

for the space complexity. For similar reasons as above, the bound holds for general $n$.
We implemented the algorithm from section 3 in the Mmxlib library [vdH02b]. Instead of taking $n_1 \approx n_2 \approx \sqrt{n}$, we took $n_1$ small, with $n_2$ in the FFT range, and used a naive multiplication algorithm on the FFT-ed blocks. The reason behind this change is that $n$ needs to be reasonably large in order to profit from the better asymptotic complexity of relaxed multiplication. In practice, the optimal choice of $n_1$ is obtained by taking $n_1$ quite small.
Moreover, our implementation uses a truncated version of relaxed multiplication [vdH02a, Section 4.4.2]. In particular, the use of naive multiplication on the FFT-ed blocks allows us to gain a factor $2$ at the top-level. For small values of $n_2$, we also replaced FFT transforms by “Karatsuba transforms”: given a polynomial $P \in R[z]_{2^k}$, we may form a polynomial $\check P$ in $k$ variables $z_1, \ldots, z_k$ with coefficients $\check P_{i_1, \ldots, i_k} = P_{i_1 + 2 i_2 + \cdots + 2^{k-1} i_k}$ for $i_1, \ldots, i_k \in \{0, 1\}$. Then the Karatsuba transform of $P$ is the vector $\mathrm{KT}(P)$ of size $3^k$, whose entries are the evaluations of $\check P$ at the points $(z_1, \ldots, z_k) \in \{0, 1, \infty\}^k$, evaluation at $\infty$ meaning that we take the coefficient of the corresponding variable.
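As an illustration (ours), the Karatsuba transform can be computed recursively: each level splits the coefficient list into two halves and evaluates at $0$, $1$ and $\infty$, so a list of length $2^k$ maps to a vector of size $3^k$, and multiplication becomes a pointwise product of transforms (the inverse recombination is not shown).

```python
# Recursive Karatsuba transform: evaluation at {0, 1, infinity} per variable.

def kt(p):
    """Karatsuba transform of a coefficient list of length 2^k (size 3^k)."""
    if len(p) == 1:
        return p
    half = len(p) // 2
    p0, p1 = p[:half], p[half:]               # split: P = P0 + P1 * z^half
    s = [a + b for a, b in zip(p0, p1)]       # evaluation at z = 1
    return kt(p0) + kt(s) + kt(p1)            # points 0, 1, "infinity"

# The three entries per level are exactly Karatsuba's three sub-products.
print(kt([1, 2, 3, 4]))   # transform of 1 + 2z + 3z^2 + 4z^3, size 9
```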
We have tested both (truncated) relaxed and semi-relaxed multiplication for different types of coefficients on an Intel Xeon processor. The results of our benchmarks can be found in tables 1 and 2 below. Our benchmarks start at the order where FFT multiplication becomes useful. Notice that working with orders in between powers of two does not give us any significant advantage, because the top-level product on FFT-ed blocks is naive. In table 1, the choice of $n_1$ as a function of $n$ has been optimized for complex double coefficients. No particular optimization effort was made for the coefficient types in table 2, and it might be possible to gain further on our timings.
[Tables 1 and 2: benchmark timings for (truncated) relaxed and semi-relaxed multiplication; table 1 for complex double coefficients, table 2 for the other coefficient types.]
Remark 13. Implicit equations can also be solved in time $O(\mathsf{M}(n))$ using Newton's method [BK78]. Although this is better from an asymptotic point of view, the ratio $\mathsf{R}(n)/\mathsf{M}(n)$ rarely reaches large values in our tables. Consequently, relaxed algorithms are often better. A similar phenomenon was already observed in [vdH02a, Tables 4 and 5]. It would be interesting to pursue the comparisons in view of some recent advances concerning Newton's method [BCO+06, vdH06a]; see also [Sed01, Section 5.2.1].
Remark 14. Yet another approach is to take $n_2$ small and to use evaluation (interpolation) for polynomials of small degrees at small sets of points, in the style of Karatsuba and Toom–Cook. From an asymptotic point of view, this again yields fast relaxed multiplication. Moreover, the approach naturally combines with the generalization of pair/odd decompositions [HZ02], which also yields an optimal bound for truncated multiplications. In fact, we notice that truncated pair/odd Karatsuba multiplication is “essentially relaxed” [vdH02a, Section 4.2]. On the negative side, these theoretically fast algorithms have bad space complexities and they are difficult to implement. In order to obtain good timings, it seems to be necessary to use dedicated code generation at different (ranges of) orders $n$, which can be done using the Mathemagix language [vdH02b].
We have shown how to improve the complexity of relaxed multiplication in the case when the coefficient ring admits sufficiently many $2^p$-th roots of unity. The improvement is based on reusing FFT-transforms of pieces of the multiplicands at different levels of the underlying binary splitting algorithm. The new approach has proved to be efficient in practice (see tables 1 and 2). For further studies, it would be interesting to examine the price of artificially adding $2^p$-th roots of unity, as in Schönhage–Strassen's algorithm. In practice, we notice that it is often possible, and better, to “cut the coefficients into pieces” and to replace them by polynomials over the complexified doubles or over $\mathbb{Z}/p\mathbb{Z}$ with $p$ a prime of the form $p = q\, 2^r + 1$. However, this approach requires more implementation effort.
A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, É. Schost, and A. Sedoglavic. Fast computation of power series solutions of systems of differential equations. Preprint, April 2006. Submitted, 13 pages.
R.P. Brent and H.T. Kung. Fast algorithms for manipulating formal power series. Journal of the ACM, 25:581–595, 1978.
D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.
S.A. Cook. On the minimum computation time of functions. PhD thesis, Harvard University, 1966.
J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp., 19:297–301, 1965.
G. Hanrot, M. Quercia, and P. Zimmermann. The middle product algorithm I. Speeding up the division and square root of power series. AAECC, 14(6):415–438, 2004.
G. Hanrot and P. Zimmermann. A long note on Mulders' short product. Research Report 4654, INRIA, December 2002. Available from http://www.loria.fr/~hanrot/Papers/mulders.ps.
D.E. Knuth. The Art of Computer Programming, volume 2: Seminumerical Algorithms. Addison-Wesley, 3rd edition, 1997.
A. Karatsuba and J. Ofman. Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7:595–596, 1963.
A. Sedoglavic. Méthodes seminumériques en algèbre différentielle; applications à l'étude des propriétés structurelles de systèmes différentiels algébriques en automatique. PhD thesis, École polytechnique, 2001.
A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.
A.L. Toom. The complexity of a scheme of functional elements realizing the multiplication of integers. Soviet Mathematics, 4(2):714–716, 1963.
J. van der Hoeven. Lazy multiplication of formal power series. In W. W. Küchlin, editor, Proc. ISSAC '97, pages 17–20, Maui, Hawaii, July 1997.
J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.
J. van der Hoeven et al. Mmxlib: the standard library for Mathemagix, 2002. http://www.mathemagix.org/mml.html.
J. van der Hoeven. New algorithms for relaxed multiplication. Technical Report 2003-44, Université Paris-Sud, Orsay, France, 2003.
J. van der Hoeven. Relaxed multiplication using the middle product. In Manuel Bronstein, editor, Proc. ISSAC '03, pages 143–147, Philadelphia, USA, August 2003.
J. van der Hoeven. Newton's method and FFT trading. Technical Report 2006-17, Univ. Paris-Sud, 2006. Submitted to JSC.
J. van der Hoeven. On effective analytic continuation. Technical Report 2006-15, Univ. Paris-Sud, 2006.