|
In this paper, we present fast algorithms for the product of two
multivariate polynomials in sparse representation. The bit
complexity of our algorithms are studied in detail for various
types of coefficients, and we derive new complexity results for
the power series multiplication in many variables. Our algorithms
are implemented and freely available within the
|
It is classical [SS71, Sch77, CK91, Für07] that the product of two integers, or two univariate polynomials over any ring, can be performed in softly linear time for the usual dense representations. More precisely, two integers of bit-size at most can be multiplied in time on a Turing machine, and the product of two univariate polynomials can be done with arithmetic operations in the coefficient ring.
Multivariate polynomials often admit many zero coefficients, so a sparse representation is usually preferred over a dense one: each monomial is a pair made of a coefficient and an exponent written in the dense binary representation. A natural problem is whether the product of two sparse multivariate polynomials in can be computed in softly linear time. We will assume to be given a subset of size that contains the support of . We also let be the minimal numbers with .
For coefficient fields of characteristic zero, it is proved in [CKL89] that can be computed using operations over the ground field. This algorithm uses fast evaluation and interpolation at suitable points built from prime numbers. Unfortunately, the method hides an expensive growth of intermediate integers involved in the linear transformations, which prevents the algorithm from being softly linear in terms of bit-complexity.
In this paper, we turn our attention to various concrete coefficient rings for which it makes sense to study the problem in terms of bit-complexity. For these rings, we will present multiplication algorithms which admit softly linear bit-complexities, under the mild assumption that . Our approach is similar to [CKL89], but relies on a different kind of evaluation points. Furthermore, finite fields form an important special case for our method.
Let us briefly outline the structure of this paper. In section 2,
we start with a presentation of the main algorithm over an arbitrary
effective algebra with elements of sufficiently
high order. In section 3, we treat various specific
coefficient rings. In section 4 we give an application to
multivariate power series multiplication. In the last section 5,
we report on timings with our
Let be an effective algebra over an effective field , i.e. all algebra and field operations can be performed by algorithm.
We will denote by the cost for multiplying two univariate polynomials of degree over in terms of the number of arithmetic operations in . Similarly, we denote by the time needed to multiply two integers of bit-size at most . One can take [CK91] and [Für07], where represents the iterated logarithm of . Throughout the paper, we will assume that and are increasing. We also assume that and .
Given a multivariate polynomial and an index , we denote and let be the coefficient of in . The support of is defined by and we denote by its cardinality.
In the sparse representation, the polynomial is stored as a sequence of exponent-coefficient pairs . Natural numbers are represented by their sequences of binary digits. The bit-size of an exponent is , where . We let be the bit-size of .
In this and the next section, we are interested in the multiplication of two sparse polynomials . We assume given a finite set , such that . We will write for its size and for its bit-size. We also denote and . For each , we introduce , which sharply bounds the partial degree in for each monomial in . We denote .
Given pairwise distinct points and , let be the linear map which sends to , with . In the canonical basis, this map corresponds to left multiplication by the generalized Vandermonde matrix
The computation of and its inverse (if ) correspond to the problems of multi-point evaluation and interpolation of a polynomial. Using binary splitting, it is classical [MB72, Str73, BM74] that both problems can be solved in time . Notice that the algorithms only require vectorial operations in (additions, subtractions and multiplications with elements in ).
Our main algorithm relies on the efficient computations of the transpositions of and . The map corresponds to left multiplication by
By the transposition principle [Bor56, Ber], the operations and can again be computed in time .
There is an efficient direct approach for the computation of [BLS03]. Given a vector with entries , the entries of are identical to the first coefficients of the power series
The numerator and denominator of this rational function can be computed using binary splitting. If , then this requires vectorial operations in [GG02, Theorem 10.10]. The truncated division of the numerator and denominator at order requires vectorial operations in . If , then we cut the sum into parts of size , and obtain the complexity bound .
Inversely, assume that we wish to recover from , when . For simplicity, we assume that the are non-zero (this will be the case in the sequel). Setting , and , we notice that for all . Hence, the computation of the reduces to two multi-point evaluations of and at and divisions. This amounts to a total of vectorial operations in and divisions in .
Let be a new variable. We introduce the vector spaces
Given , let . The Kronecker isomorphism , is the unique -linear map with
It corresponds to the evaluation at , so that .
Assume now that we are given an element of multiplicative order at least and consider the following evaluation map
We propose to compute though the equality .
Given , let be the matrix of restricted to the space of polynomials with support included in . Setting , we have
Taking resp. , this allows us to compute and using our algorithm for transposed multi-point evaluation from section 2.2. We obtain using one Hadamard product . Taking , the points are pairwise distinct, since the are smaller than the order of . Hence is invertible and we recover from using transposed multi-point interpolation.
Proof. By classical binary powering, the computation of the sequence takes operations in because each does appear in the entries of . Then the computation of all the for (resp. and ) requires (resp. and ) products in . Using the complexity results from section 2.2, we may compute and using vectorial operations in . We deduce using more multiplications in . Again using the results from section 2.2, we retrieve the coefficients after further vectorial operations in and divisions in . Adding up the various contributions, we obtain the theorem.
Remark
If is the finite field with elements, then its multiplicative group is cyclic of order . Whenever , it follows that the main theorem 1 applies for any primitive element of this group.
Usually, is given as the quotient for some monic and irreducible polynomial of degree . In that case, a multiplication in amounts to ring operations in . An inversion in requires an extended gcd computation in and gives rise to operations in . Using Kronecker multiplication, we can also take . Using these estimates, Theorem 1 implies:
ring operations in and inversions in .
Applying the general purpose algorithm from [CK91], two polynomials of degree over can be multiplied in time . Alternatively, we may lift the multiplicants to polynomials in , use Kronecker multiplication and reduce modulo . As long as , this yields the better complexity bound . Theorem 1 therefore implies:
Remark
Remark
One approach for the multiplication of polynomials with integer coefficients is to reduce the problem modulo a suitable prime number . This prime number should be sufficiently large such that can be read off from and such that admits elements of order .
Let denote the maximal bit-length of the coefficients of and similarly for and . Since
we have
It therefore suffices to take . Corollary 4 now implies:
Remark
This conjecture is supported by numerical evidence. Setting
the conjecture implies that the smallest prime number with satisfies . Using a polynomial time primality test [AKS04], it follows that this number can be computed by brute force in time . In addition, in order to satisfy the complexity bound it suffices to tabulates prime numbers of sizes 2, 4, 8, 16, etc.
In our algorithm and Theorem 1, we regard the computation of a prime number as a precomputation. This is reasonable if is not too large. Now the quantity usually remains reasonably small. Hence, our assumption that is not too large only gets violated if becomes large. In that case, we will rather use Chinese remaindering. We first compute prime numbers with
Each contains a primitive root of unity of order . We next proceed as before, with and such that for each . Indeed, the fact that each is invertible implies that is invertible.
We will say that form a reduced sequence of prime moduli with order and capacity , whenever , , and . We then have the following refinement of Corollary 7:
An important kind of sparse polynomials are power series in several variables, truncated by total degree. Such series are often used in long term integration of dynamical systems [MB96, MB04], in which case their coefficients are floating point numbers rather than integers. Assume therefore that and are polynomials with floating coefficients with a precision of bits.
Let be the maximal exponent of the coefficients of . For a so called discrepancy , fixed by the user, we let be the integer polynomial with
for all . We have and
for the sup-norm on the coefficients. If all coefficients of have a similar order of magnitude, in the sense that the minimal exponent of the coefficients is at least , then we actually have . Applying a similar decomposition to , we may compute the product
using the algorithm from section 2 and convert the resulting coefficients back into floating point format.
Usually, the coefficients of a univariate power series are approximately in a geometric progression . In that case, the coefficients of the power series with are approximately of the same order of magnitude. In the multivariate case, the coefficients still have a geometric increase on diagonals , but the parameter depends on the diagonal. After a suitable change of variables , the coefficients in a big zone near the main diagonal become of approximately the same order of magnitude. However, the discrepancy usually needs to be chosen proportional to the total truncation degree in order to ensure sufficient accuracy elsewhere.
Let us now consider the case when . Let and denote the least common multiples of the denominators of the coefficients of resp. . One obvious way to compute is to set , , and compute using one of the methods from section 3.2. This approach works well in many cases (e.g. when and are truncations of exponential generating series). Unfortunately, this approach is deemed to be very slow if the size of or is much larger than the size of any of the coefficients of .
An alternative, more heuristic approach is the following. Let be an increasing sequence of prime numbers with and such that each is relatively prime to the denominators of each of the coefficients of and . For each , we may then multiply and using the algorithm from section 2. For , we may recover using Chinese remaindering and attempt to reconstruct from using rational number reconstruction [GG02, Chapter 5]. If this yields the same result for a given and , then the reconstructed is likely to be correct at those stages.
Of course, if we have an a priori bound on the bit sizes of the coefficients of , then we may directly take a sufficient number of primes such that can be reconstructed from its reduction modulo .
Let be an algebraic number field. For some algebraic integer , we may write . Let be the monic polynomial of minimal degree with . Given a prime number , the polynomial induces an algebraic extension of , where . Reduction modulo of a sparse polynomial then yields a sparse polynomial . We have seen in section 3.1 how to multiply sparse polynomials over the finite field . Choosing one or more sufficiently large prime numbers , we may thus apply the same approaches as in section 3.2 in order to multiply sparse polynomials over . Using the techniques from section 3.4, we next deal with the case of sparse polynomials over .
Given , let . The total degree of a polynomial is defined by
Given a subset , we define the restriction of to by
For , we define initial segments of by
Then
is the set of polynomials of total degree . Given , the aim of this section is to describe efficient algorithms for the computation of . We will follow and extend the strategy described in [LS03].
Remark
Given a polynomial , we define its projective transform by
If , then , where
Inversely, for any , there exists a unique with . The transformation is an injective morphism of -algebras. Consequently, given , we will compute the truncated product using
Given a polynomial and , let
If , then , with
Let be an element of of sufficiently high order . Taking as above, the construction in section 2.3 yields a -linear and invertible evaluation mapping
such that for all with , we have
(1) |
This map extends to using
Given and , the relation (1) yields
In particular, if , then
Since is invertible, this yields an efficient way to compute .
The number of coefficients of a truncated series in is given by
The size of is smaller by a factor between and :
Proof. The transforms and require a negligible amount of time. The computation of the evaluation points only involves products in , when exploiting the fact that is an initial segment. The computation of and requires vectorial operations in . The computation of can be done using general operations in . Recovering again requires vectorial operations in , as well as divisions in .
In the case when , the assumption , with , guarantees that the coefficients of the result can be reconstructed from their reductions modulo . Combining this observation with Chinese remaindering, we obtain:
We have implemented the fast series product of the previous section
within the C++ library multimix of
|
||||||||||||||||||||||||||||||
|
||||||||||||||||||||
|
||||||||||||||||||||||||||||||
M. Agrawal, N. Kayal, and N. Saxena. Primes is in p. Annals of Mathematics, 160(2):781–793, 2004.
D. Bernstein. The transposition principle. Available
from
http://cr.yp.to/transposition.html.
A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Proceedings of ISSAC 2003, pages 37–44. ACM, 2003.
A. Borodin and R.T. Moenck. Fast modular transforms. Journal of Computer and System Sciences, 8:366–386, 1974.
J. L. Bordewijk. Inter-reciprocity applied to electrical networks. Applied Scientific Research B: Electrophysics, Acoustics, Optics, Mathematical Methods, 6:1–74, 1956.
Johannes Buchmann and Victor Shoup. Constructing nonresidues in finite fields and the extended riemann hypothesis. In STOC '91: Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 72–79, New York, NY, USA, 1991. ACM.
D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.
J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proc. ISSAC '89, pages 121–128, Portland, Oregon, A.C.M., New York, 1989. ACM Press.
H. Cramér. On the order of magnitude of the difference between consecutive prime numbers. Acta Arithmetica, 2:23–46, 1936.
M. Fürer. Faster integer multiplication. In Proceedings of the Thirty-Ninth ACM Symposium on Theory of Computing (STOC 2007), pages 57–66, San Diego, California, 2007.
J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 2-nd edition, 2002.
G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. SADIO Electronic Journal on Informatics and Operations Research, 5(1):1–10, September 2003.
R.T. Moenck and A. Borodin. Fast modular transforms via division. In Thirteenth annual IEEE symposium on switching and automata theory, pages 90–96, Univ. Maryland, College Park, Md., 1972.
K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational differentiation: techniques, applications and tools, pages 63–74, SIAM, Philadelphia, 1996.
K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSU Report MSUHEP 40910, Michigan State University, 2004.
A. Schönhage. Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta Informatica, 7:395–398, 1977.
A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.
V. Strassen. Die Berechnungskomplexität von elementarsymmetrischen Funktionen und von Interpolationskoeffizienten. Numer. Math., 20:238–251, 1973.
J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.
J. van der Hoeven et al. Mathemagix, 2002. http://www.mathemagix.org.