|
In this paper, we present fast algorithms for the product of two
multivariate polynomials in sparse representation. The bit
complexity of our algorithms are studied in detail for various
types of coefficients, and we derive new complexity results for
the power series multiplication in many variables. Our algorithms
are implemented and freely available within the
|
It is classical [SS71, Sch77, CK91,
Für07] that the product of two integers, or two
univariate polynomials over any ring, can be performed in softly
linear time for the usual dense representations. More precisely,
two integers of bit-size at most can be
multiplied in time
on a Turing machine, and the
product of two univariate polynomials can be done with
arithmetic operations in the coefficient ring.
Multivariate polynomials often admit many zero coefficients, so a
sparse representation is usually preferred over a dense one:
each monomial is a pair made of a coefficient and an exponent written in
the dense binary representation. A natural problem is whether the
product of two sparse multivariate polynomials
in
can be computed in softly linear time. We
will assume to be given a subset
of size
that contains the support of
. We also let
be the minimal
numbers with
.
For coefficient fields of characteristic zero, it is proved in [CKL89]
that can be computed using
operations over the ground field. This algorithm uses fast evaluation
and interpolation at suitable points built from prime numbers.
Unfortunately, the method hides an expensive growth of intermediate
integers involved in the linear transformations, which prevents the
algorithm from being softly linear in terms of bit-complexity.
In this paper, we turn our attention to various concrete coefficient
rings for which it makes sense to study the problem in terms of
bit-complexity. For these rings, we will present multiplication
algorithms which admit softly linear bit-complexities, under the mild
assumption that . Our
approach is similar to [CKL89], but relies on a different
kind of evaluation points. Furthermore, finite fields form an important
special case for our method.
Let us briefly outline the structure of this paper. In section 2,
we start with a presentation of the main algorithm over an arbitrary
effective algebra with elements of sufficiently
high order. In section 3, we treat various specific
coefficient rings. In section 4 we give an application to
multivariate power series multiplication. In the last section 5,
we report on timings with our
Let be an effective algebra over an effective
field
, i.e. all
algebra and field operations can be performed by algorithm.
We will denote by the cost for multiplying two
univariate polynomials of degree
over
in terms of the number of arithmetic operations in
. Similarly, we denote by
the time needed to multiply two integers of bit-size at
most
. One can take
[CK91] and
[Für07],
where
represents the iterated logarithm of
. Throughout the paper, we will
assume that
and
are
increasing. We also assume that
and
.
Given a multivariate polynomial and an index
, we denote
and let
be the coefficient of
in
. The support of
is defined by
and we
denote by
its cardinality.
In the sparse representation, the polynomial is
stored as a sequence of exponent-coefficient pairs
. Natural numbers are represented by their
sequences of binary digits. The bit-size of an exponent
is
, where
. We let
be the bit-size of
.
In this and the next section, we are interested in the multiplication
of two sparse polynomials
. We assume given a finite set
, such that
.
We will write
for its size and
for its bit-size. We also denote
and
. For each
, we introduce
,
which sharply bounds the partial degree in
for
each monomial in
. We denote
.
Given pairwise distinct points
and
, let
be the linear map which sends
to
, with
.
In the canonical basis, this map corresponds to left multiplication by
the generalized Vandermonde matrix
The computation of and its inverse
(if
)
correspond to the problems of multi-point evaluation and interpolation
of a polynomial. Using binary splitting, it is classical [MB72,
Str73, BM74] that both problems can be solved
in time
. Notice that the
algorithms only require vectorial operations in
(additions, subtractions and multiplications with elements in
).
Our main algorithm relies on the efficient computations of the
transpositions of
and
. The map
corresponds to left multiplication by
By the transposition principle [Bor56, Ber],
the operations and
can
again be computed in time
.
There is an efficient direct approach for the computation of [BLS03]. Given a vector
with entries
, the entries
of
are identical to the
first
coefficients of the power series
The numerator and denominator of this rational function can be computed
using binary splitting. If ,
then this requires
vectorial operations in
[GG02, Theorem 10.10]. The truncated
division of the numerator and denominator at order
requires
vectorial operations in
. If
,
then we cut the sum into
parts of size
, and obtain the complexity bound
.
Inversely, assume that we wish to recover from
, when
. For simplicity, we assume that the
are non-zero (this will be the case in the sequel).
Setting
,
and
, we notice that
for all
.
Hence, the computation of the
reduces to two
multi-point evaluations of
and
at
and
divisions. This
amounts to a total of
vectorial operations in
and
divisions in
.
Let be a new variable. We introduce the vector
spaces
Given , let
. The Kronecker isomorphism
, is the unique
-linear map with
It corresponds to the evaluation at ,
so that
.
Assume now that we are given an element of
multiplicative order at least
and consider the
following evaluation map
We propose to compute though the equality
.
Given , let
be the matrix of
restricted to the space of
polynomials with support included in
.
Setting
, we have
Taking resp.
, this allows us to compute
and
using our algorithm for transposed
multi-point evaluation from section 2.2. We obtain
using one Hadamard product
. Taking
,
the points
are pairwise distinct, since the
are smaller than the order of
. Hence
is invertible and we
recover
from
using
transposed multi-point interpolation.
and
in
and an element
of order at least d, then the product
can be computed using
products in
,
inversions in
,
products in
, and
vectorial operations in
.
Proof. By classical binary powering, the
computation of the sequence takes
operations in
because each
does appear in the entries of
. Then the computation of all the
for
(resp.
and
) requires
(resp.
and
)
products in
. Using the
complexity results from section 2.2, we may compute
and
using
vectorial operations in
. We
deduce
using
more
multiplications in
. Again
using the results from section 2.2, we retrieve the
coefficients
after
further vectorial operations in
and
divisions in
.
Adding up the various contributions, we obtain the theorem.
Remark
If is the finite field
with
elements, then its multiplicative group is
cyclic of order
. Whenever
, it follows that the main
theorem 1 applies for any primitive element
of this group.
Usually, is given as the quotient
for some monic and irreducible polynomial
of degree
. In that case, a
multiplication in
amounts to
ring operations in
. An
inversion in
requires an extended gcd
computation in
and gives rise to
operations in
.
Using Kronecker multiplication, we can also take
. Using these estimates, Theorem 1
implies:
. Given two
polynomials
and
in
, the product
can be computed using
ring operations in and
inversions in
.
Applying the general purpose algorithm from [CK91], two
polynomials of degree over
can be multiplied in time
.
Alternatively, we may lift the multiplicants to polynomials in
, use Kronecker multiplication and
reduce modulo
. As long as
, this yields the better
complexity bound
. Theorem 1 therefore implies:
Remark then it is always possible to build an algebraic extension
of suitable degree
in order to apply the
corollary. Such constructions are classical, see for instance [GG02,
Chapter 14]. We need to have
,
so
should be taken of the order
, which also corresponds to the additional
overhead induced by this method.
Remark
can be constructed in polynomial time [BS91]. If
is odd, then
is a primive element
if and only if
. In
, the smallest
such that
is a primitive element satisfies
.
One approach for the multiplication of
polynomials with integer coefficients is to reduce the problem modulo a
suitable prime number
. This
prime number should be sufficiently large such that
can be read off from
and such that
admits elements of order
.
Let denote the maximal bit-length of the
coefficients of
and similarly for
and
. Since
we have
It therefore suffices to take .
Corollary 4 now implies:
Remark denote the
-th
prime number. The prime number theorem implies that
. Cramér's conjecture [Cra36]
states that
This conjecture is supported by numerical evidence. Setting
the conjecture implies that the smallest prime number
with
satisfies
.
Using a polynomial time primality test [AKS04], it follows
that this number can be computed by brute force in time
. In addition, in order to satisfy the
complexity bound it suffices to tabulates prime numbers of sizes 2, 4,
8, 16, etc.
In our algorithm and Theorem 1, we regard the computation
of a prime number as a precomputation. This is
reasonable if
is not too large. Now the quantity
usually remains reasonably small. Hence, our
assumption that
is not too large only gets
violated if
becomes large. In that case, we will
rather use Chinese remaindering. We first compute
prime numbers
with
Each contains a primitive root of unity
of order
.
We next proceed as before, with
and
such that
for each
. Indeed, the fact that each
is invertible implies that
is invertible.
We will say that form a reduced sequence of
prime moduli with order
and capacity
, whenever
,
,
and
. We
then have the following refinement of Corollary 7:
and a reduced sequence
of prime moduli with order
and
capacity
, we can compute
in time
An important kind of sparse polynomials are power series in several
variables, truncated by total degree. Such series are often used in long
term integration of dynamical systems [MB96, MB04],
in which case their coefficients are floating point numbers rather than
integers. Assume therefore that and
are polynomials with floating coefficients with a
precision of
bits.
Let be the maximal exponent of the coefficients
of
. For a so called
discrepancy
, fixed
by the user, we let
be the integer polynomial
with
for all . We have
and
for the sup-norm on the coefficients. If all coefficients of have a similar order of magnitude, in the sense that the
minimal exponent of the coefficients is at least
, then we actually have
. Applying a similar decomposition to
, we may compute the product
using the algorithm from section 2 and convert the resulting coefficients back into floating point format.
Usually, the coefficients of a univariate power
series
are approximately in a geometric
progression
. In that case,
the coefficients of the power series
with
are approximately of the same order of magnitude. In
the multivariate case, the coefficients still have a geometric increase
on diagonals
, but the
parameter
depends on the diagonal. After a
suitable change of variables
,
the coefficients in a big zone near the main diagonal become of
approximately the same order of magnitude. However, the discrepancy
usually needs to be chosen proportional to the total truncation degree
in order to ensure sufficient accuracy elsewhere.
Let us now consider the case when .
Let
and
denote the least
common multiples of the denominators of the coefficients of
resp.
.
One obvious way to compute
is to set
,
,
and compute
using one of the methods from
section 3.2. This approach works well in many cases
(e.g. when
and
are truncations of exponential generating series). Unfortunately, this
approach is deemed to be very slow if the size of
or
is much larger than the size of any of the
coefficients of
.
An alternative, more heuristic approach is the following. Let be an increasing sequence of prime numbers with
and such that each
is relatively
prime to the denominators of each of the coefficients of
and
. For each
, we may then multiply
and
using the algorithm from
section 2. For
,
we may recover
using Chinese remaindering and
attempt to reconstruct
from
using rational number reconstruction [GG02, Chapter 5]. If
this yields the same result for a given
and
, then the reconstructed
is likely to be correct at those stages.
Of course, if we have an a priori bound on the bit sizes of the
coefficients of , then we may
directly take a sufficient number of primes
such
that
can be reconstructed from its reduction
modulo
.
Let be an algebraic number field. For some
algebraic integer
, we may
write
. Let
be the monic polynomial of minimal degree
with
. Given a prime number
, the polynomial
induces an algebraic extension
of
, where
.
Reduction modulo
of a sparse polynomial
then yields a sparse polynomial
. We have seen in section 3.1 how
to multiply sparse polynomials over the finite field
. Choosing one or more sufficiently large prime
numbers
, we may thus apply
the same approaches as in section 3.2 in order to multiply
sparse polynomials over
.
Using the techniques from section 3.4, we next deal with
the case of sparse polynomials over
.
Given , let
. The total degree of a polynomial
is defined by
Given a subset , we define
the restriction
of
to
by
For , we define initial
segments
of
by
Then
is the set of polynomials of total degree .
Given
, the aim of this
section is to describe efficient algorithms for the computation of
. We will follow and extend the
strategy described in [LS03].
Remark with
,
but for the sake of simplicity, we will stick to ordinary total degrees.
Given a polynomial , we
define its projective transform
by
If , then
, where
Inversely, for any , there
exists a unique
with
. The transformation
is an
injective morphism of
-algebras.
Consequently, given
, we will
compute the truncated product
using
Given a polynomial and
, let
If , then
, with
Let be an element of
of
sufficiently high order
.
Taking
as above, the construction in section 2.3 yields a
-linear
and invertible evaluation mapping
such that for all with
, we have
![]() |
(1) |
This map extends to using
Given and
,
the relation (1) yields
In particular, if , then
Since is invertible, this yields an efficient
way to compute
.
The number of coefficients of a truncated series in
is given by
The size of
is smaller
by a factor between
and
:
and an element
of order at least
, we can
compute
using
inversions in
,
general operations in
,
and
vectorial operations in
.
Proof. The transforms
and
require a negligible amount of time. The
computation of the evaluation points
only
involves
products in
, when exploiting the fact that
is an initial segment. The computation of
and
requires
vectorial
operations in
. The
computation of
can be done using
general operations in
.
Recovering
again requires
vectorial operations in
, as
well as
divisions in
.
, where
is a prime number with
, we can compute
in time
.
In the case when , the
assumption
, with
, guarantees that the coefficients
of the result
can be reconstructed from their
reductions modulo
. Combining
this observation with Chinese remaindering, we obtain:
and a reduced sequence
of prime moduli with order
and capacity
, we can
compute
in time
We have implemented the fast series product of the previous section
within the C++ library multimix of
, with
, on a 2.4 GHz Intel(R) Core(TM)2
Duo platform. Recall that
is the number of the
variables and
the truncation order. Timings are
given in milliseconds. The line naive corresponds to the naive
multiplication, that performs all the two by two monomial products,
while the line fast stands for the algorithm of the previous
section. We have added the size
of the support
of the series, together with the cost
of our
univariate multiplication in size
for
comparison. An empty cell corresponds to a computation that needed more
than 10 minutes. The following tables demonstrate that the new fast
algorithms are relevant to practice and that the theoretical soflty
linear asymptotic cost can really be observed.
|
||||||||||||||||||||||||||||||
|
||||||||||||||||||||
|
||||||||||||||||||||||||||||||
M. Agrawal, N. Kayal, and N. Saxena. Primes is in p. Annals of Mathematics, 160(2):781–793, 2004.
D. Bernstein. The transposition principle. Available
from
http://cr.yp.to/transposition.html.
A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Proceedings of ISSAC 2003, pages 37–44. ACM, 2003.
A. Borodin and R.T. Moenck. Fast modular transforms. Journal of Computer and System Sciences, 8:366–386, 1974.
J. L. Bordewijk. Inter-reciprocity applied to electrical networks. Applied Scientific Research B: Electrophysics, Acoustics, Optics, Mathematical Methods, 6:1–74, 1956.
Johannes Buchmann and Victor Shoup. Constructing nonresidues in finite fields and the extended riemann hypothesis. In STOC '91: Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 72–79, New York, NY, USA, 1991. ACM.
D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.
J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proc. ISSAC '89, pages 121–128, Portland, Oregon, A.C.M., New York, 1989. ACM Press.
H. Cramér. On the order of magnitude of the difference between consecutive prime numbers. Acta Arithmetica, 2:23–46, 1936.
M. Fürer. Faster integer multiplication. In Proceedings of the Thirty-Ninth ACM Symposium on Theory of Computing (STOC 2007), pages 57–66, San Diego, California, 2007.
J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 2-nd edition, 2002.
G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. SADIO Electronic Journal on Informatics and Operations Research, 5(1):1–10, September 2003.
R.T. Moenck and A. Borodin. Fast modular transforms via division. In Thirteenth annual IEEE symposium on switching and automata theory, pages 90–96, Univ. Maryland, College Park, Md., 1972.
K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational differentiation: techniques, applications and tools, pages 63–74, SIAM, Philadelphia, 1996.
K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSU Report MSUHEP 40910, Michigan State University, 2004.
A. Schönhage. Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta Informatica, 7:395–398, 1977.
A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.
V. Strassen. Die Berechnungskomplexität von elementarsymmetrischen Funktionen und von Interpolationskoeffizienten. Numer. Math., 20:238–251, 1973.
J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.
J. van der Hoeven et al. Mathemagix, 2002. http://www.mathemagix.org.