In this paper, we present a truncated version of the classical Fast Fourier Transform. When applied to polynomial multiplication, this algorithm has the nice property of eliminating the “jumps” in the complexity at powers of two. When applied to the multiplication of multivariate polynomials or truncated multivariate power series, we gain a logarithmic factor with respect to the best previously known algorithms.
Let $R$ be an effective ring of constants (i.e. the usual arithmetic operations $+$, $-$ and $\cdot$ can be carried out by algorithm). If $R$ has a primitive $n$-th root of unity with $n$ a power of two, then the product of two polynomials $P, Q \in R[x]$ with $\deg PQ < n$ can be computed in time $O(n \log n)$ using the Fast Fourier Transform or FFT [CT65]. If $R$ does not admit a primitive $n$-th root of unity, then one needs an additional overhead of $O(\log \log n)$ in order to carry out the multiplication, by artificially adding new roots of unity [SS71, CK91].
Besides the fact that the asymptotic complexity of the FFT involves a large constant factor, another classical drawback is that the complexity function admits important jumps at each power of two. These jumps can be reduced by using roots of unity of small composite orders. They can also be smoothened by decomposing an $n$-multiplication into several smaller multiplications at powers of two, following the binary expansion of $n$. However, these tricks are not very elegant, they are cumbersome to implement, and they do not allow us to eliminate the jump problem completely.
In section 3, we present a new kind of “Truncated Fourier Transform” or TFT, which allows for the fast evaluation of a polynomial in any number $l$ of well-chosen roots of unity. This algorithm coincides with the usual FFT if $l$ is a power of two, but it behaves smoothly for intermediate values. In section 4, we also show that the inverse operation of interpolation can be carried out with the same complexity (modulo a few additional shifts).
The TFT makes it possible to speed up the multiplication of univariate polynomials by a constant factor between $1$ and $2$. In the case of multivariate polynomials, the repeated gain of such a constant factor leads to the gain of a non-trivial asymptotic factor. More precisely, assuming that $R$ admits sufficiently many $2^p$-th roots of unity, we will show in section 5 that the product of two multivariate polynomials $P, Q \in R[z_1, \ldots, z_d]$ can be computed in time $O(s \log s)$, where $n = \deg PQ + 1$ and $s = \binom{n+d-1}{d}$ is the number of expected coefficients of $PQ$. The best previously known algorithm [CKL89], based on sparse polynomial multiplication, has time complexity $O(s \log^2 s)$.
In section 6 we finally give an algorithm for the multiplication of truncated multivariate power series. This algorithm, whose time complexity is again $O(s \log s)$, improves the best previously known algorithm [LS03] by a factor of $\log s$. Moreover, both in the case of multivariate polynomials and in that of power series, we expect the corresponding constant factor to be better.
Let $R$ be an effective ring of constants, with $n = 2^p$ for some $p \in \mathbb{N}$ and a primitive $n$-th root of unity $\omega$ (i.e. $\omega^{n/2} = -1$). The discrete Fast Fourier Transform (FFT) of an $n$-tuple $(a_0, \ldots, a_{n-1}) \in R^n$ (with respect to $\omega$) is the $n$-tuple $(\hat{a}_0, \ldots, \hat{a}_{n-1}) = \mathrm{FFT}_\omega(a_0, \ldots, a_{n-1})$ with
$$\hat{a}_i = \sum_{j=0}^{n-1} \omega^{i j} a_j.$$
In other words, $\hat{a}_i = A(\omega^i)$, where $A$ denotes the polynomial $A = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$.
The FFT can be computed efficiently using binary splitting: writing
$$(a_0, \ldots, a_{n-1}) = (b_0, c_0, b_1, c_1, \ldots, b_{n/2-1}, c_{n/2-1}),$$
we recursively compute the Fourier transforms of $(b_0, \ldots, b_{n/2-1})$ and $(c_0, \ldots, c_{n/2-1})$ with respect to $\omega^2$:
$$(\hat{b}_0, \ldots, \hat{b}_{n/2-1}) = \mathrm{FFT}_{\omega^2}(b_0, \ldots, b_{n/2-1});$$
$$(\hat{c}_0, \ldots, \hat{c}_{n/2-1}) = \mathrm{FFT}_{\omega^2}(c_0, \ldots, c_{n/2-1}).$$
Then we have
$$\mathrm{FFT}_\omega(a_0, \ldots, a_{n-1}) = (\hat{b}_0 + \hat{c}_0, \ldots, \hat{b}_{n/2-1} + \omega^{n/2-1} \hat{c}_{n/2-1}, \hat{b}_0 - \hat{c}_0, \ldots, \hat{b}_{n/2-1} - \omega^{n/2-1} \hat{c}_{n/2-1}),$$
i.e. $\hat{a}_i = \hat{b}_{i \bmod n/2} + \omega^i \hat{c}_{i \bmod n/2}$ for all $i$. This algorithm requires $\frac{1}{2} n p$ multiplications with powers of $\omega$ and $n p$ additions (or subtractions).
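As an illustration, the binary-splitting scheme above may be sketched as follows in Python over the complex numbers (in the paper, $R$ may be any ring with suitable roots of unity; the function name and the default choice $\omega = e^{2\pi \mathrm{i}/n}$ are ours):

```python
import cmath

def fft(a, w=None):
    """FFT of the tuple a (len(a) a power of two) with respect to w,
    by binary splitting into even- and odd-indexed parts."""
    n = len(a)
    if w is None:
        w = cmath.exp(2j * cmath.pi / n)  # a primitive n-th root of unity
    if n == 1:
        return list(a)
    b = fft(a[0::2], w * w)  # transform of (b_0, ..., b_{n/2-1}) w.r.t. w^2
    c = fft(a[1::2], w * w)  # transform of (c_0, ..., c_{n/2-1}) w.r.t. w^2
    return ([b[i] + w**i * c[i] for i in range(n // 2)] +
            [b[i] - w**i * c[i] for i in range(n // 2)])
```

The two recursive calls operate on the even- and odd-indexed coefficients, exactly as in the splitting $(a_0, \ldots, a_{n-1}) = (b_0, c_0, \ldots, b_{n/2-1}, c_{n/2-1})$ above.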
In practice, it is most efficient to implement an in-place variant of the above algorithm. We will denote by $[i]_p$ the bitwise mirror of $i$ at length $p$ (for instance, $[3]_5 = 24$ and $[11]_5 = 26$). At step $0$, we start with the vector
$$x_0 = (x_{0,0}, \ldots, x_{0,n-1}) = (a_0, \ldots, a_{n-1}).$$
At step $s \in \{1, \ldots, p\}$, we set

(1) $\quad \begin{pmatrix} x_{s,\, i m_{s-1} + j} \\ x_{s,\, i m_{s-1} + m_s + j} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \omega^{j 2^{s-1}} & -\omega^{j 2^{s-1}} \end{pmatrix} \begin{pmatrix} x_{s-1,\, i m_{s-1} + j} \\ x_{s-1,\, i m_{s-1} + m_s + j} \end{pmatrix}$

for all $i \in \{0, \ldots, 2^{s-1} - 1\}$ and $j \in \{0, \ldots, m_s - 1\}$, where $m_s = 2^{p-s}$. Using induction over $s$, it can easily be seen that
$$x_{s,\, i m_s + j} = \sum_{k=0}^{2^s - 1} \omega^{[i]_s (k m_s + j)}\, a_{k m_s + j}$$
for all $i \in \{0, \ldots, 2^s - 1\}$ and $j \in \{0, \ldots, m_s - 1\}$. In particular,
$$x_{p,i} = \hat{a}_{[i]_p}$$
for all $i \in \{0, \ldots, n-1\}$. This algorithm of “repeated crossings” is illustrated in figure 1.
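The in-place variant and the bitwise mirror might be sketched as follows in Python (again over the complex numbers; the naming is ours):

```python
import cmath

def mirror(i, p):
    """Bitwise mirror [i]_p of i at length p, e.g. mirror(3, 5) == 24."""
    return int(format(i, '0{}b'.format(p))[::-1], 2)

def fft_inplace(x, w=None):
    """Repeated crossings (1); on return, x[i] holds the transform value
    at w**mirror(i, p), i.e. the FFT in bit-reversed order."""
    n = len(x)
    p = n.bit_length() - 1
    if w is None:
        w = cmath.exp(2j * cmath.pi / n)
    for s in range(1, p + 1):
        m = n >> s                       # m_s = 2^(p-s)
        for i in range(0, n, 2 * m):     # blocks of length m_{s-1} = 2 m_s
            for j in range(m):
                u, v = x[i + j], x[i + m + j]
                x[i + j] = u + v
                x[i + m + j] = w**(j << (s - 1)) * (u - v)
    return x
```

Only one vector of length $n$ is used: position $i$ successively holds $x_{0,i}, x_{1,i}, \ldots, x_{p,i}$.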
A classical application of the FFT is the multiplication of polynomials $A = a_0 + \cdots + a_{n-1} x^{n-1}$ and $B = b_0 + \cdots + b_{n-1} x^{n-1}$. Assuming that $\deg AB < n$, we first evaluate $A$ and $B$ in $1, \omega, \ldots, \omega^{n-1}$ using the FFT:
$$(A(1), A(\omega), \ldots, A(\omega^{n-1})) = \mathrm{FFT}_\omega(a_0, \ldots, a_{n-1});$$
$$(B(1), B(\omega), \ldots, B(\omega^{n-1})) = \mathrm{FFT}_\omega(b_0, \ldots, b_{n-1}).$$
We next compute the evaluations $(A(1) B(1), \ldots, A(\omega^{n-1}) B(\omega^{n-1}))$ of $AB$ at $1, \omega, \ldots, \omega^{n-1}$. We finally have to recover $AB$ from these values using the inverse FFT. But the inverse FFT with respect to $\omega$ is nothing else than $1/n$ times the direct FFT with respect to $\omega^{-1}$. Indeed, for all $(a_0, \ldots, a_{n-1}) \in R^n$ and all $i \in \{0, \ldots, n-1\}$, we have

(2) $\quad \mathrm{FFT}_{\omega^{-1}}(\mathrm{FFT}_\omega(a_0, \ldots, a_{n-1}))_i = n a_i,$

since
$$\mathrm{FFT}_{\omega^{-1}}(\mathrm{FFT}_\omega(a))_i = \sum_{j=0}^{n-1} \sum_{k=0}^{n-1} \omega^{j(k-i)} a_k = \sum_{k=0}^{n-1} a_k \sum_{j=0}^{n-1} \omega^{j(k-i)} = n a_i,$$
because $\sum_{j=0}^{n-1} \omega^{j(k-i)} = 0$ whenever $k \neq i$. This yields a multiplication algorithm of time complexity $O(n \log n)$, when assuming that $R$ admits enough primitive $2^p$-th roots of unity. In the case that $R$ does not, new roots of unity can be added artificially [SS71, CK91, vdH02] so as to yield an algorithm of time complexity $O(n \log n \log \log n)$.
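The complete evaluate–multiply–interpolate scheme, with interpolation via formula (2), may be sketched as follows for integer coefficients (a sketch over the complex numbers with rounding; the names are ours):

```python
import cmath

def fft(a, w):
    n = len(a)
    if n == 1:
        return list(a)
    b, c = fft(a[0::2], w * w), fft(a[1::2], w * w)
    return ([b[i] + w**i * c[i] for i in range(n // 2)] +
            [b[i] - w**i * c[i] for i in range(n // 2)])

def poly_mul(P, Q):
    """Product of integer polynomials (coefficient lists): evaluate at
    the n-th roots of unity, multiply pointwise, and interpolate by
    applying the direct FFT w.r.t. w**-1 and dividing by n, as in (2)."""
    n = 1
    while n < len(P) + len(Q) - 1:   # need deg PQ < n with n a power of two
        n *= 2
    w = cmath.exp(2j * cmath.pi / n)
    A = fft(P + [0] * (n - len(P)), w)
    B = fft(Q + [0] * (n - len(Q)), w)
    C = fft([x * y for x, y in zip(A, B)], 1 / w)
    return [round((z / n).real) for z in C[:len(P) + len(Q) - 1]]
```

For instance, `poly_mul([1, 2, 3], [4, 5])` returns `[4, 13, 22, 15]`, the coefficients of $(1 + 2x + 3x^2)(4 + 5x)$.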
The algorithm from the previous section has the disadvantage that $n$ needs to be a power of two. If we want to multiply two polynomials $A, B \in R[x]$ with $2^{p-1} < \deg AB + 1 \leq 2^p$, then we need to carry out the FFT at precision $2^p$, thereby losing a factor which may approach $2$. This factor can be reduced using several tricks. For instance, one may cut the product into several smaller products whose sizes are powers of two, following the binary expansion of $\deg AB + 1$. This is efficient when the degree slightly exceeds a power of two, but not very good when it comes close to the next one. In the latter case, one may also use an FFT at a precision of the form $3 \cdot 2^{p-2}$, by using $3 \times 3$-matrices at one step of the FFT computation. However, all these tricks of the trade require a large amount of hacking and one always continues to lose a non-trivial factor between $1$ and $2$.
The idea behind the Truncated Fourier Transform is to provide an efficient algorithm for the evaluation of polynomials in any number of distinct points. Moreover, the inverse operation of interpolation can be carried out with the same complexity (modulo a few additional shifts). This technique will eliminate the “jumps” in the complexity of FFT multiplication.
So let $n = 2^p$ and $l \leq n$ (usually, $n/2 < l \leq n$), and let $\omega$ be a primitive $n$-th root of unity. Given an $l$-tuple $(a_0, \ldots, a_{l-1})$, we will evaluate the corresponding polynomial $A = a_0 + a_1 x + \cdots + a_{l-1} x^{l-1}$ in $\omega^{[0]_p}, \omega^{[1]_p}, \ldots, \omega^{[l-1]_p}$. We call $(A(\omega^{[0]_p}), \ldots, A(\omega^{[l-1]_p}))$ the Truncated Fourier Transform (TFT) of $(a_0, \ldots, a_{l-1})$. Now consider the completion of the $l$-tuple $(a_0, \ldots, a_{l-1})$ into the $n$-tuple $(a_0, \ldots, a_{l-1}, 0, \ldots, 0)$. When using the in-place algorithm from the previous section in order to compute $\mathrm{FFT}_\omega(a_0, \ldots, a_{l-1}, 0, \ldots, 0)$, we claim that many of the computations of the $x_{s,i}$ can actually be skipped (see figure 2). Indeed, at stage $s$, it suffices to compute the vector $(x_{s,0}, \ldots, x_{s, \lceil l/m_s \rceil m_s - 1})$. Besides $(x_{s,0}, \ldots, x_{s,l-1})$, we therefore compute at most $m_s$ additional values at each stage. In total, we therefore compute at most $l p + n$ values $x_{s,i}$. This proves the following result:
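The skipping strategy may be sketched as follows in Python (a block-granular sketch, our naming; for simplicity it computes whole crossings even when only their first output is needed, so it does slightly more work than the count above):

```python
import cmath

def tft(a, n):
    """TFT of the l-tuple a at order n = 2^p: the first l entries of the
    bit-reversed FFT of a padded with zeros, computed while skipping the
    crossings that are not needed."""
    l, p = len(a), n.bit_length() - 1
    w = cmath.exp(2j * cmath.pi / n)
    x = list(a) + [0] * (n - l)
    for s in range(1, p + 1):
        m = n >> s                     # m_s
        bound = -(-l // m) * m         # ceil(l / m_s) * m_s values suffice
        for i in range(0, bound, 2 * m):
            for j in range(m):
                u, v = x[i + j], x[i + m + j]
                x[i + j] = u + v
                x[i + m + j] = w**(j << (s - 1)) * (u - v)
    return x[:l]
```

When $l = n$, the bound equals $n$ at every stage and the routine degenerates into the full in-place FFT.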
Remark
Remark
Remark
Inverting the Truncated Fourier Transform
Unfortunately, the inverse TFT cannot be computed using a formula similar to (2). Indeed, starting with the values $x_{p,i}$, we would need to compute an increasing number of values $x_{s,i}$ when $s$ decreases. Therefore we will rather invert the algorithm which computes the TFT, with the difference that we will sometimes need values $x_{s,i}$ with $i \geq l$ in order to compute values $x_{s-1,i}$ with $i < l$. We will use the fact that whenever one value among $x_{s,\, i m_{s-1} + j}, x_{s,\, i m_{s-1} + m_s + j}$ and one value among $x_{s-1,\, i m_{s-1} + j}, x_{s-1,\, i m_{s-1} + m_s + j}$ are known in the cross relation (1), then we can deduce the other two from them using one multiplication by a power of $\omega$ and two “shifted” additions or subtractions (i.e. the results may have to be divided by $2$).
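To make this operation count concrete, here is a small check of the fact for a single crossing, writing (1) as $(u, v) \mapsto (u + v,\ \omega^t (u - v))$ (the concrete values and the choice of $\omega^t$ are ours):

```python
import cmath

t_pow = cmath.exp(2j * cmath.pi / 8)      # some power omega^t of omega
u, v = 3.0, 5.0
U, V = u + v, t_pow * (u - v)             # one crossing of relation (1)

# Knowing u (stage s-1) and U (stage s): recover v and V.
v1 = U - u
V1 = t_pow * (u - v1)

# Knowing U and V (stage s): recover u and v with "shifted" additions,
# i.e. the results have to be divided by 2.
u2 = (U + V / t_pow) / 2
v2 = (U - V / t_pow) / 2
```

In each case one multiplication (or division) by a power of $\omega$ and two additions/subtractions suffice.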
More precisely, recall that at each stage $s$, the direct TFT computes the prefix $(x_{s,0}, \ldots, x_{s, \lceil l/m_s \rceil m_s - 1})$ of $x_s$. We use a recursive algorithm which takes the values $x_{p,0}, \ldots, x_{p,l-1}$ and $x_{0,l}, \ldots, x_{0,n-1}$ on input, and which computes $x_{0,0}, \ldots, x_{0,l-1}$. If $l = 0$, then we have nothing to do. Otherwise, we distinguish two cases (in each of which the algorithm is applied recursively to a half of the vector, with $n$ and $\omega$ replaced by $n/2$ and $\omega^2$):
If $l \geq n/2$, then we first compute $(x_{1,0}, \ldots, x_{1,n/2-1})$ from $(x_{p,0}, \ldots, x_{p,n/2-1})$ using repeated crossings. We next deduce $x_{0,i}$ and $x_{1,n/2+i}$ from $x_{1,i}$ and $x_{0,n/2+i}$ for all $l - n/2 \leq i < n/2$. Invoking our algorithm recursively on the second half, we now obtain $x_{1,n/2}, \ldots, x_{1,l-1}$. We finally compute $x_{0,i}$ and $x_{0,n/2+i}$ from $x_{1,i}$ and $x_{1,n/2+i}$ for $0 \leq i < l - n/2$.
If $l < n/2$, then we first compute $x_{1,i}$ from $x_{0,i}$ and $x_{0,n/2+i}$ for $l \leq i < n/2$. Invoking our algorithm recursively on the first half, we next compute $x_{1,0}, \ldots, x_{1,l-1}$. We finally deduce $x_{0,i}$ from $x_{1,i}$ and $x_{0,n/2+i}$ for all $0 \leq i < l$.
The two cases are illustrated in figures 3 resp. 4. Since $x_{0,i} = a_i$ for all $i$, the application of our algorithm to the full vector computes the inverse TFT. We notice that the values $x_{s,i}$ with $i < l$ are computed in decreasing order (with respect to $s$) and the values $x_{s,i}$ with $i \geq l$ in increasing order. In other words, the algorithm may be designed in such a way that it remains in place. We have proved:
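The recursion may be sketched as follows in Python (our own formulation; for readability it works on copies of the two halves, whereas the version described above remains fully in place):

```python
import cmath

def ifft_full(x, w):
    """Undo the repeated crossings (1) on a complete block: the exact
    inverse of the in-place FFT of section 2 with respect to w."""
    n = len(x)
    p = n.bit_length() - 1
    for s in range(p, 0, -1):
        m = n >> s
        for i in range(0, n, 2 * m):
            for j in range(m):
                U, V = x[i + j], x[i + m + j]
                d = V / w**(j << (s - 1))
                x[i + j] = (U + d) / 2
                x[i + m + j] = (U - d) / 2

def itft(x, l, w):
    """Inverse TFT: on input, x[i] holds the TFT value for i < l and the
    known coefficient a_i (zero at top level) for i >= l; on return,
    x[i] holds a_i for i < l."""
    n = len(x)
    if l == 0 or n == 1:
        return
    h = n // 2
    left, right = x[:h], x[h:]
    if l >= h:
        ifft_full(left, w * w)            # repeated crossings on first half
        for i in range(l - h, h):         # one input and one output known
            x[i] = left[i] - right[i]               # a_i
            right[i] = w**i * (x[i] - right[i])     # x_{1, h+i}
        itft(right, l - h, w * w)         # recover remaining inputs of the
        for i in range(l - h):            # second half; both outputs known
            d = right[i] / w**i                     # a_i - a_{h+i}
            x[i] = (left[i] + d) / 2
            x[h + i] = (left[i] - d) / 2
    else:
        for i in range(l, h):             # both inputs known
            left[i] = left[i] + right[i]            # x_{1, i}
        itft(left, l, w * w)
        for i in range(l):
            x[i] = left[i] - right[i]               # a_i = x_{1,i} - a_{h+i}
```

Composing `tft` with `itft` on the zero-padded input recovers the original coefficients, up to rounding errors.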
Remark
Let $R$ be a ring with a privileged sequence of roots of unity (see remark 2). Given a non-zero multivariate polynomial
$$f = \sum_{i_1, \ldots, i_d} f_{i_1, \ldots, i_d}\, z_1^{i_1} \cdots z_d^{i_d} \in R[z_1, \ldots, z_d]$$
in $d$ variables $z_1, \ldots, z_d$, we define the total degree of $f$ by
$$\deg f = \max \{ i_1 + \cdots + i_d : f_{i_1, \ldots, i_d} \neq 0 \}.$$
We let $\deg 0 = -\infty$. Now let $f, g \in R[z_1, \ldots, z_d]$ be such that $\deg fg < n$. In this section we present an algorithm to compute the product $fg$, which has a good complexity in terms of the number
$$s = \#\{(i_1, \ldots, i_d) \in \mathbb{N}^d : i_1 + \cdots + i_d < n\} = \binom{n+d-1}{d}$$
of expected coefficients of $fg$. When computing $fg$ using the classical FFT with respect to each of the variables $z_1, \ldots, z_d$, we need a time $O(n^d \log n)$, which is much bigger than $O(s \log s)$ in general. When using multiplication of sparse polynomials [CKL89], we need a time $O(s \log^2 s)$ with a non-trivial constant factor. Our algorithm is based on the TFT with respect to all variables and we will show that it has a complexity $O(s \log s)$.
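The count $s$ can be cross-checked by direct enumeration for small $n$ and $d$ (the helper names are ours):

```python
from itertools import product
from math import comb

def expected_coeffs(n, d):
    """Number s of monomials z1^i1 ... zd^id of total degree < n."""
    return comb(n + d - 1, d)

def expected_coeffs_naive(n, d):
    """Same count, by enumerating all exponent vectors below degree n."""
    return sum(1 for i in product(range(n), repeat=d) if sum(i) < n)
```

For instance, for $n = 4$ and $d = 3$ both functions return $\binom{6}{3} = 20$.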
Given $f$ with $\deg f < n$, the TFT of $f$ with respect to one variable $z_k$ at order $n$ is defined by
$$\mathrm{TFT}_{z_k}(f) = \big( f|_{z_k = \omega^{[0]_p}},\, f|_{z_k = \omega^{[1]_p}},\, \ldots,\, f|_{z_k = \omega^{[n-1]_p}} \big),$$
where $n \leq 2^p$ and $\omega$ is the primitive $2^p$-th root of unity from the privileged sequence. We recall that the result does not depend on the choice of $p$. The TFT with respect to all variables at order $n$ is defined by
$$\mathrm{TFT}(f) = \mathrm{TFT}_{z_1}(\mathrm{TFT}_{z_2}(\cdots \mathrm{TFT}_{z_d}(f) \cdots)),$$
where the transforms with respect to the different variables commute (see figure 5). We have
$$\mathrm{TFT}(f)_{i_1, \ldots, i_d} = f(\omega^{[i_1]_p}, \ldots, \omega^{[i_d]_p}).$$
Given $f, g \in R[z_1, \ldots, z_d]$ with $\deg fg < n$, we will use the formula
$$fg = \mathrm{TFT}^{-1}(\mathrm{TFT}(f)\, \mathrm{TFT}(g)),$$
in which the product of the two transforms is computed pointwise, in order to compute the product $fg$.
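For $d = 2$, the evaluate–multiply–interpolate scheme behind this formula may be sketched as follows, using full FFTs in each variable instead of the truncated transform (so this sketch does not yet achieve the complexity gain of the TFT; the names are ours):

```python
import cmath

def fft(a, w):
    n = len(a)
    if n == 1:
        return list(a)
    b, c = fft(a[0::2], w * w), fft(a[1::2], w * w)
    return ([b[i] + w**i * c[i] for i in range(n // 2)] +
            [b[i] - w**i * c[i] for i in range(n // 2)])

def fft2d(m, w1, w2):
    rows = [fft(row, w2) for row in m]            # transform in z2
    cols = [fft(col, w1) for col in zip(*rows)]   # transform in z1
    return [list(row) for row in zip(*cols)]

def mul2d(f, g):
    """Product of bivariate integer polynomials, f[i][j] = coefficient
    of z1^i z2^j, by evaluation and interpolation in both variables."""
    r1, r2 = len(f) + len(g) - 1, len(f[0]) + len(g[0]) - 1
    n1 = n2 = 1
    while n1 < r1:
        n1 *= 2
    while n2 < r2:
        n2 *= 2
    w1, w2 = cmath.exp(2j * cmath.pi / n1), cmath.exp(2j * cmath.pi / n2)
    def pad(m):
        return ([list(r) + [0] * (n2 - len(r)) for r in m] +
                [[0] * n2 for _ in range(n1 - len(m))])
    F, G = fft2d(pad(f), w1, w2), fft2d(pad(g), w1, w2)
    H = fft2d([[a * b for a, b in zip(u, v)] for u, v in zip(F, G)],
              1 / w1, 1 / w2)
    return [[round((H[i][j] / (n1 * n2)).real) for j in range(r2)]
            for i in range(r1)]
```

For instance, $(1 + z_1 + z_2)(1 + z_1 z_2) = 1 + z_1 + z_2 + z_1 z_2 + z_1^2 z_2 + z_1 z_2^2$ is recovered exactly.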
In order to compute $\mathrm{TFT}(f)$, we apply TFTs with respect to the variables $z_1, \ldots, z_d$ successively. At the stage when $z_k$ is transformed, we compute the TFT of a univariate polynomial in $z_k$ of length $l = n - \sigma$ for each choice of the remaining indices with sum $\sigma < n$ (if $l = 1$, then the TFT of the corresponding polynomial is given by itself, so we have nothing to do). One such computation takes a time $\leq c\, l\, (\log_2 l + 2)$ for some universal constant $c$, by using the TFT at order $2^p$ with minimal $p$ (so $p$ may vary as a function of $l$, but $c$ does not). Since each transform preserves the truncated region (see figure 5), the number of choices of the remaining indices with sum $\sigma$ is $N(\sigma) = \binom{\sigma + d - 2}{d - 2}$ at every stage, so that the computation of $\mathrm{TFT}(f)$ takes a time $C_{\mathrm{TFT}}$ with
$$C_{\mathrm{TFT}} \leq c\, d \sum_{\sigma=0}^{n-2} N(\sigma)\, (n - \sigma)\, (\log_2 (n - \sigma) + 2).$$
Dividing by $s$ and bounding $\log_2 (n - \sigma) + 2$ by $\log_2 n + 2$, we obtain

(3) $\displaystyle \quad \frac{C_{\mathrm{TFT}}}{s} \leq c\, d\, (\log_2 n + 2)\, \frac{1}{s} \sum_{\sigma=0}^{n-2} N(\sigma)\, (n - \sigma).$

If $d$ is fixed and $n \to \infty$, then the remaining sum is bounded by $s$, so that $C_{\mathrm{TFT}}/s \leq c\, d\, (\log_2 n + 2) = O(\log s)$, since $\log s \sim d \log_2 n$. If $n$ is fixed and $d \to \infty$, then, using the identity $\sum_{\sigma=0}^{n-2} N(\sigma) = \binom{n+d-3}{d-1}$ and Stirling's formula, we get
$$\frac{1}{s} \sum_{\sigma=0}^{n-2} N(\sigma)\, (n - \sigma) \leq \frac{n (n-1)}{d}\, (1 + o(1)),$$
so that $C_{\mathrm{TFT}}/s = O(1)$, while $\log s \sim (n-1) \log_2 d \to \infty$. In both cases, we find that $C_{\mathrm{TFT}} = O(s \log s)$. We have proved:
Since power series have infinitely many terms, implementing an operation on power series really corresponds to implementing the operation for polynomial approximations at all degrees. As usual, multiplication is a particularly important operation. Given $f, g \in R[[z_1, \ldots, z_d]]$, truncated so that only the coefficients of total degree $< n$ are known, we will show how to compute the truncated product of $f$ and $g$ at total degree $< n$.
The first idea [LS03] is to use homogeneous coordinates instead of the usual ones:
$$\tilde{f}(z_1, \ldots, z_{d-1}, t) = f(z_1 t, \ldots, z_{d-1} t, t) = \sum_{i_1, \ldots, i_d} f_{i_1, \ldots, i_d}\, z_1^{i_1} \cdots z_{d-1}^{i_{d-1}}\, t^{i_1 + \cdots + i_d}.$$
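With polynomials stored as dictionaries mapping exponent tuples to coefficients (a representation we choose purely for illustration), the passage to homogeneous coordinates is a one-line re-indexing:

```python
def homogenize(f):
    """Map f(z1, ..., zd) to f(z1*t, ..., z_{d-1}*t, t): the exponent of
    the last variable is replaced by the total degree of the monomial."""
    return {i[:-1] + (sum(i),): c for i, c in f.items()}
```

For instance, the monomial $5 z_1 z_2^2$ (exponent $(1, 2)$) becomes $5 z_1 t^3$ (exponent $(1, 3)$).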
This transformation takes no time, since it corresponds to some re-indexing. We next compute the TFTs of $\tilde{f}$ and $\tilde{g}$ in the variables $z_1, \ldots, z_{d-1}$ at order $n$:
$$F = \mathrm{TFT}_{z_1, \ldots, z_{d-1}}(\tilde{f}); \qquad G = \mathrm{TFT}_{z_1, \ldots, z_{d-1}}(\tilde{g}).$$
We next compute the truncated products at order $n$ (with respect to $t$) of the obtained univariate polynomials in $t$. After transforming the results of these multiplications back using the inverse TFT, we obtain the homogeneous representation $\tilde{h}$ of the truncated product $h$ of $f$ and $g$: the coefficient of $z_1^{i_1} \cdots z_{d-1}^{i_{d-1}} t^{j}$ in $\tilde{h}$ is the coefficient $h_{i_1, \ldots, i_{d-1},\, j - i_1 - \cdots - i_{d-1}}$ of $h$.
The total computation time is bounded by $O(s \log s)$. Using the fact that the number of coefficients of $\tilde{f}$ coincides with the number $s$ of truncated coefficients of $f$, we have proved the following theorem:
Remark
The “slicing technique” from section 6.3.5 in [vdH02] may then be used in order to obtain complexity bounds of the same type.
Remark
The author would like to thank the first referee for his enthusiastic and helpful comments. This referee also implemented the algorithms from sections 3 and 4 and he reports a behaviour which is close to the expected one. In response to some other comments and suggestions, we conclude with the following remarks:
The results of the paper may be generalized to characteristic $2$ and general rings along similar lines as in [CK91]. The crucial remark is that, if $\omega$ is a primitive $3^p$-th root of unity and the crossing relation (1) is replaced by its $3 \times 3$ analogue, then, for all $s$, we may compute $x_{s-1}$ in terms of $x_s$ by using only additions, subtractions, multiplications by powers of $\omega$ and divisions by $3$.
Theorem 1 in [CKL89] implies theorem 7 with $O(s \log s)$ replaced by $O(s \log^2 s)$. The technique from [CKL89] is actually more general: let $f, g \in R[z_1, \ldots, z_d]$ and assume that we know (bounds for) the supports of $f$, $g$ and $fg$.
If $f$ and $g$ are not “extraordinarily sparse”, then $fg$ may be computed in time $O(s \log^2 s)$, where $s$ denotes the cardinality of the support of $fg$. It would be interesting to prove something similar in our context, so as to examine to which extent we need the density hypothesis. Using remark 4 in a recursive way, we expect that there exists an algorithm of complexity $O(s \log s)$, for a suitable definition of $s$.
The terminology of privileged sequences may seem to be an overkill. Indeed, in practice, we rather need a single, sufficiently large root of unity in order to carry out a given computation. Nevertheless, from a theoretical point of view, this paper suggests that it may be interesting to study “fractal FFT-transforms” of power series with convergence radius $\geq 1$ with respect to a privileged sequence of roots of unity.
Two referees pointed us to the recent on-line paper [Ber], which also contains the idea of evaluating in the roots of unity $\omega^{[0]_p}, \ldots, \omega^{[l-1]_p}$ in order to multiply polynomials $P, Q$ with $\deg PQ < l$.
D. Bernstein. Fast multiplication and its applications. Available from http://cr.yp.to/papers.html#multapps. See section 4, page 11.
D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.
J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proc. ISSAC '89, pages 121–128, Portland, Oregon, 1989. ACM Press.
J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Computat., 19:297–301, 1965.
Guillaume Hanrot, Michel Quercia, and Paul Zimmermann. Speeding up the division and square root of power series. Research Report 3973, INRIA, July 2000. Available from http://www.inria.fr/RRRT/RR-3973.html.
Guillaume Hanrot, Michel Quercia, and Paul Zimmermann. The middle product algorithm I. Speeding up the division and square root of power series. Accepted for publication in AAECC, 2002.
Guillaume Hanrot and Paul Zimmermann. A long note on Mulders' short product. JSC, 37(3):391–401, 2004.
G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. SADIO Electronic Journal on Informatics and Operations Research, 5(1):1–10, September 2003.
T. Mulders. On short multiplication and division. AAECC, 11(1):69–88, 2000.
Victor Y. Pan. Simple multivariate polynomial multiplication. JSC, 18(3):183–186, 1994.
A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.
J. van der Hoeven. Lazy multiplication of formal power series. In W. W. Küchlin, editor, Proc. ISSAC '97, pages 17–20, Maui, Hawaii, July 1997.
J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.
J. van der Hoeven. New algorithms for relaxed multiplication. Technical Report 2003-44, Université Paris-Sud, Orsay, France, 2003.
J. van der Hoeven. Relaxed multiplication using the middle product. In Manuel Bronstein, editor, Proc. ISSAC '03, pages 143–147, Philadelphia, USA, August 2003.