|
In 2008, Kedlaya and Umans designed the first multivariate multi-point evaluation algorithm over finite fields with an asymptotic complexity that can be made arbitrarily close to linear. However, it remains a major challenge to make their algorithm efficient for practical input sizes. In this paper, we revisit and improve their algorithm, while keeping this ultimate goal in mind. In addition we sharpen the known complexity bounds for modular composition of univariate polynomials over finite fields. |
Let be a commutative ring with unity, and let
be a multivariate polynomial. The
multi-point evaluation problem consists in evaluating
at several given points
in
. Let
be
polynomials in
of degrees
and let
be a monic polynomial in
of degree
. The
modular composition problem consists in computing
modulo
. This
is equivalent to the computation of the remainder
of the Euclidean division of
by
. It turns out that these two problems are
related and that they form important building blocks in computer
algebra. Theoretically speaking, Kedlaya and Umans have given efficient
solutions to both problems when
is a finite ring
of the form
where
is a
monic polynomial [34]. The design of practically efficient
algorithms remains an important challenge. The purpose of this paper is
to revisit the algorithms by Kedlaya and Umans in detail, to sharpen
their theoretical complexity bounds, and to get some insight into the
required data size for which this approach outperforms asymptotically
slower algorithms.
Let denote a complexity function that bounds the
number of operations in
required to multiply two
polynomials of degree
in
. We will often use the soft-Oh notation:
means that
;
see [13, chapter 25, section 7] for technical details. The
least integer larger than or equal to
is written
. The largest integer smaller than or equal to
is written
. The
-module
of polynomials of degree
is denoted by
.
In the univariate case when ,
the evaluation of
at
points in
can be achieved with
operations in
. We refer the
reader to [13, chapter 10] for the description of the well
known algorithm based on remainder trees. Algorithms with the smallest
constant hidden in the “
”
may be found in [6]. By allowing precomputations that only
depend on the set of points, this evaluation complexity even drops to
as shown in [23]. For specific sets
of points, such as geometric progressions or TFT points, multi-point
evaluation requires only
operations in
; see [4, 7,
21].
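For concreteness, the following Python sketch illustrates the remainder (subproduct) tree technique behind fast univariate multi-point evaluation over Z/pZ. The helper names are ours and the polynomial arithmetic is naive, so only the recursive structure of the tree is shown, not the softly linear running time.

```python
# Minimal sketch of univariate multi-point evaluation over Z/pZ via a
# remainder (subproduct) tree.  Polynomials are coefficient lists, low degree
# first.  The helper names are ours and the inner arithmetic is quadratic.

def poly_mul(a, b, p):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % p
    return c

def poly_rem(a, b, p):
    a = a[:]                                  # b is monic in our usage
    while len(a) >= len(b):
        q = a[-1] % p
        for i in range(len(b)):
            a[len(a) - len(b) + i] = (a[len(a) - len(b) + i] - q * b[i]) % p
        a.pop()
    return a or [0]

def subproduct(points, p):
    m = [1]
    for x in points:                          # prod (X - x)
        m = poly_mul(m, [(-x) % p, 1], p)
    return m

def eval_tree(f, points, p):
    """Values of f at the given points, by recursive remaindering."""
    if len(points) == 1:
        acc = 0
        for c in reversed(f):                 # f mod (X - x) is f(x)
            acc = (acc * points[0] + c) % p
        return [acc]
    mid = len(points) // 2
    left, right = points[:mid], points[mid:]
    return (eval_tree(poly_rem(f, subproduct(left, p), p), left, p)
            + eval_tree(poly_rem(f, subproduct(right, p), p), right, p))

# Example: f = 1 + 2 X + 3 X^2 evaluated at 0, ..., 4 modulo 101.
print(eval_tree([1, 2, 3], [0, 1, 2, 3, 4], 101))   # [1, 6, 17, 34, 57]
```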
The univariate situation does not extend to several variables, unless
the set of evaluation points has good
properties. For instance if
has the form
with
,
then fast univariate evaluations may be applied coordinate by
coordinate. Fast algorithms also exist for suitable initial segments of
such Cartesian products [29]. Other specific families of
sets of points are used for fast evaluation and interpolation of
multivariate polynomials in sparse representation; see [1,
24] for some recent results.
In the bivariate case when ,
a smart use of the univariate case leads to a cost
, where
bounds the
partial degrees of
[36, Theorem 3].
In 2004, Nüsken and Ziegler improved this bound to
[36, Result 4]—here the constant
is such that a
matrix over
may be multiplied by another
rectangular
matrix with
operations in
. When
is a field the best
currently known bound
is due to Huang and Pan
[30, Theorem 10.1].
In 2008, Kedlaya and Umans achieved a major breakthrough for the general
case [33]. In [34, Corollary 4.3] they showed
the following statement (simplified here for conciseness): let be a fixed rational value, given
in
with partial degrees in any
at most
, and evaluation
points
in
,
then
can be computed with
bit operations, provided that
and where
is a constant independent of
. This result was stated for random access memory
machines. In fact, some of the underlying arguments (such as the use of
lookup tables) need to be adapted to make them work properly on Turing
machines. This is one of our contributions in this paper. In a nutshell
the Kedlaya–Umans algorithm proceeds as follows (see section 3.1):
If is “sufficiently small”, then
we exhaustively evaluate
at all points in
, using fast univariate
multi-point evaluation with respect to each coordinate.
Otherwise, the evaluation of at
is reduced to the evaluation of the preimage
of
in
at the preimages
of
in
. Through the Chinese
remaindering theorem, the latter evaluation over
further reduces to several independent multi-point evaluation
problems modulo “many” prime numbers
that are “much smaller” than
. These evaluations of
mod
at
are handled
recursively, for
.
Let us first discuss the standard modular composition problem when . Let
and
be polynomials in
of
respective degrees
,
and
, with
monic. The naive modular composition algorithm
takes
operations in
. In 1978, Brent and Kung [9] gave an
algorithm with cost
, which
uses the baby-step giant-step technique [39].
Their algorithm even yields a sub-quadratic cost
when using fast linear algebra; see [32, p. 185].
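To illustrate the baby-step giant-step technique, here is a minimal Python sketch of Brent and Kung's modular composition over Z/pZ. The function names are ours, the polynomial products are naive, and the linear-algebra step behind the sub-quadratic algebraic cost is written as plain loops, so the sketch only shows how g is cut into blocks of about sqrt(deg g) coefficients.

```python
# Baby-step giant-step modular composition over Z/pZ (a simplified sketch).
import math

def pmul(u, v, p):
    w = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] = (w[i + j] + ui * vj) % p
    return w

def prem(u, f, p):                        # f is monic
    u = u[:]
    while len(u) >= len(f):
        q = u[-1] % p
        for i in range(len(f)):
            u[len(u) - len(f) + i] = (u[len(u) - len(f) + i] - q * f[i]) % p
        u.pop()
    return u or [0]

def padd(u, v, p):
    m = max(len(u), len(v))
    u, v = u + [0] * (m - len(u)), v + [0] * (m - len(v))
    return [(x + y) % p for x, y in zip(u, v)]

def modcomp_bsgs(g, a, f, p):
    """Compute g(a) mod f over Z/pZ."""
    m = math.isqrt(len(g) - 1) + 1                 # block size ~ sqrt(deg g)
    pows = [[1]]                                   # baby steps a^0, ..., a^(m-1)
    for _ in range(m - 1):
        pows.append(prem(pmul(pows[-1], a, p), f, p))
    giant = prem(pmul(pows[-1], a, p), f, p)       # giant step a^m mod f
    result = [0]
    for block in reversed([g[i:i + m] for i in range(0, len(g), m)]):
        val = [0]
        for c, pw in zip(block, pows):             # block evaluated at a
            val = padd(val, [c * x % p for x in pw], p)
        result = padd(prem(pmul(result, giant, p), f, p), val, p)
    return result

# Example: g = X^2 + 1, a = X + 2, f = X^3 - 2 over Z/101Z.
print(modcomp_bsgs([1, 0, 1], [2, 1], [(-2) % 101, 0, 0, 1], 101))  # [5, 4, 1]
```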
The major breakthrough for this problem is again due to Kedlaya and
Umans [33, 34] in the case when is a finite field
(and even more
generally a finite ring of the form
for any
integer
and
monic). For
any fixed real value
, they
have shown that the composition
could be
computed modulo
using
bit operations.
The special case of power series composition corresponds to . The best known complexity bound in the
algebraic model when
is a field, written
for convenience, is still due to Brent and Kung: in
[9], they achieved
operations in
, under the condition that
is invertible and that the characteristic of
is at least
,
where
. The variant proposed
by van der Hoeven [20, section 3.4.3] removes the condition
on
. For fields with small
characteristic
, Bernstein
[3] proposed an algorithm that is softly linear in
but linear in
.
These algorithms have been generalized to moduli
of the form
in [25]; it is shown
therein that such a composition reduces to one power series composition
at order
over
,
plus
compositions modulo
, and one characteristic polynomial computation
modulo
. Let us finally
mention that an optimized variant, in terms of the constant hidden in
the “
”, of the
Brent–Kung algorithm has been proposed recently by Johansson in
[31], and that series with integer, rational or floating
point coefficients can often be composed in quasi-linear time in
suitable bit complexity models, as shown by Ritzmann [40];
see also [22].
Multi-point evaluation and modular composition are instances of
evaluation problems at points lying in different extensions of . The former case involves several
points with coordinates in
.
The latter case involves one point in the extension
. In the next paragraphs we summarize known
conversions between evaluation problems.
When , several algorithms are
known for converting evaluations at any set of points to specific sets
of points. For instance evaluating at roots of unity can be done fast
thanks to the seminal FFT algorithm, so we usually build fast algorithms
upon FFTs. Typically fast polynomial products are reduced to FFTs over
synthetic roots of unity lying in suitable extensions of
by means of the Schönhage–Strassen algorithm.
And since fast multi-point evaluation reduces to polynomial products, it in turn reduces to FFTs. Such reductions to FFTs are omnipresent in
computer algebra.
Let us still assume that .
Let
, let
be given evaluation points in a field
,
and let
be pairwise distinct evaluation points
in
. Let
and let
be such that
for
. Setting
we have
. So the evaluations
of
at
reduce to
evaluations and interpolations in degree
at the
chosen points plus one modular composition. Conversely given a modulus
, one may benefit from
factorizations of
to compose modulo
. We have studied this approach when
has factors with large multiplicities in [25],
when it splits into linear factors over
in [26], and also when it factors over an algebraic extension of
in [27].
The key idea of Nüsken and Ziegler to speed up multi-point
evaluation is a reduction to modular composition; then their
aforementioned complexity bound follows from a variant of the
Brent–Kung algorithm. Assume .
In order to evaluate
at
points
they first compute
and interpolate
such that
for
(assuming that the
are
pairwise distinct, which is not restrictive). Then they compute
and deduce
as
.
Over finite fields, Kedlaya and Umans showed an equivalence between
multi-point evaluation and modular composition. Using Kronecker
segmentation, Theorem 3.1 from [34] reduces such a
composition to multi-point evaluation for an increased number of
variables. Kedlaya and Umans' reduction in the opposite direction is
close to the one of Nüsken and Ziegler. Let
be pairwise distinct points in
.
For each
, they interpolate
such that
for
. Then they compute
, so that
.
On machines with random access memory, arbitrary memory accesses take constant time. This does not reflect the actual behavior of real computers, on which memory is organized into different levels, with efficient hardware support for copying contiguous blocks of memory from one level to another. In this paper, we opt for the standard Turing machine model with a finite number of tapes [38], which charges a “maximal penalty” for non-contiguous memory accesses. This means in particular that complexity bounds established for this model are likely to hold for any more or less realistic alternative model. Our first contribution in the present paper is to show that Kedlaya and Umans' complexity bounds hold in the Turing machine model.
Our second contribution concerns sharper and more precise bit complexity
bounds. For multi-point evaluation over ,
we achieve softly linear time in the bit size of
and obtain more general explicit bounds in terms of
,
,
the partial and total degrees of
,
without the assumption
as in [34,
Corollary 4.3]. We also highlight the advantage of taking
much larger than the dense size of the support of
. In particular, we analyze
the threshold for which the average cost per evaluation point
stabilizes. Our algorithm closely follows the main ideas of Kedlaya and
Umans, but with two notable changes. On the one hand, using precise
estimates for the first Chebyshev function, we obtain sharper bounds for
the prime numbers to be used during the multi-modular stage of the
algorithm; see section 3.3. On the other hand, when
becomes very large, we fall back to the naive
evaluation algorithm, and thereby achieve a softly linear dependence in
.
Let us now turn to multi-point evaluation over an extension ring of the
form , where
is monic of degree
. Kedlaya
and Umans proposed a reduction to multi-point evaluation over
, with
large, based on Kronecker substitution. In section 4, we
propose an alternative approach, based on univariate polynomial
evaluation, interpolation, and Chinese remaindering, to directly reduce
to several compositions over suitable finite prime fields.
Our detailed analysis of multi-point evaluation is used in section 6 in order to obtain refined bounds for univariate modular
composition. In [34, Corollary 7.2] it is shown that
univariate modular composition in degree over a
finite field
can be done in time
. Making use of the same reduction to
multi-point evaluation, the exponent in
can be
made more explicit: in Theorem 6.2 we prove the bound
The new complexity bounds for multi-point evaluation are also crucial for our new bit complexity bounds for multivariate modular composition and the application to polynomial system solving in [28].
Section 7 addresses the special case when
is a field
of small positive characteristic
. We closely revisit the
method proposed in [34, section 6], and again make the
complexity bound more explicit. Again we quantify the number of
evaluation points from which the average cost per point stabilizes, and
we deduce a sharpened complexity bound for modular composition.
We consider Turing machines with sufficiently many tapes. In fact seven
tapes are usually sufficient to implement all useful complexity bounds
for the elementary operations on polynomials, series and matrices
involved in the present paper (standard algorithms may be found in [42]). The number of symbols used by the machine is not of the
utmost importance, since it only impacts complexity bounds by constant
factors. In the sequel, Turing machines will always have two symbols
“” and
“
”, as well as a
few specific additional ones dedicated to data representation.
Some algebraic structures involve a natural bit size for representing
their elements (e.g. modular integers, finite fields); others involve a
variable size (e.g. arbitrarily large integers, arrays, polynomials). In
both cases, elements are seen as sequences of symbols on tapes ended by
a specific symbol, written “”
in the sequel. Because the heads of the machine can only move one cell left or right at a time, algorithms must take care to consume data in as contiguous a way as possible. In particular, we notice that loop
counters must be used with care: for instance, the naive implementation
of a loop “for
from 1 to
” involves a non-constant number of
bit operations at each iteration: to increment
by
and to test whether
is less than
.
In this section we gather standard data types and elementary operations
needed in the next sections. We freely use well known complexity bounds
for polynomials and matrices from [13] and refer to [42] for more details on Turing machine implementations.
We use binary representation for integers, so that
has bit size
. A modular
integer in
is represented by its natural
representative in
. Integers
may be added in linear time. The expression
will
represent a nondecreasing cost function for multiplying two integers of
bit sizes
, which satisfies
for all
.
At the time of writing, the best known complexity bound is
, where
;
see [16-18] and historical references therein.
The integer division in bit sizes
takes time
(see Lemma 2.15 below for
instance), and the extended gcd costs
by [41]. Overall, all arithmetic operations in
take softly linear time.
One dimensional arrays are sequences of elements ended with the symbol
“”.
Example is stored as
.
For bidimensional arrays we use column-major representation. Precisely
an array of size
(
rows and
columns), is stored
as the vector of its columns, that is
.
Such arrays are allowed to contain elements of different types and
sizes.
Example over
is stored as
.
In the Turing machine model, it is not known how to perform transpositions of bidimensional arrays in linear time. The following lemma shows how to perform transpositions with a logarithmic overhead. The special case when all entries have the same bit size was previously treated in [5, appendix]. Notice that transpositions do not preserve the total bit size for non-square matrices, due to changes in the number of “#” symbols.
be an
matrix. Let
denote the size of
for all
, and
define
. Then
can be transposed in time
.
Proof. We first handle the case
using the following “divide and conquer” algorithm. If
, then the array
is encoded as
and we write its transpose
on the output tape using one linear traversal.
Otherwise, we split
into two matrices
and
on separate tapes, where
is made of the
first rows of
, and
of the
remaining ones. We recursively transpose
and
and glue the results
together on the output tape.
Clearly, the case when can be handled in time
, as well as the algorithm
for splitting
into
and
, and the algorithm for
gluing the transposes of
and
together into the transpose of
.
Let
be a constant such that each of these
algorithms takes time at most
.
Let
and
.
Let us show by induction over
that the
transposition algorithm takes time
.
This is clear for
. For
, the computation time is bounded
by
The case when is handled in an essentially
similar way, by reversing the steps of the algorithm: if
, then
is rewritten
into
using one linear pass. If
, then we recursively apply the algorithm on
the first
and the last
columns, and merge the results in a linear pass. The entire computation
can be done in time
, by a
similar complexity analysis as above.
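The recursion of the above proof can be mimicked as follows in Python, with a column-major array stored as a list of its columns; this of course does not model the tape movements, it merely shows the divide-and-conquer structure on the rows.

```python
# Python sketch of the divide-and-conquer transposition of Lemma 2.3.
# On a Turing machine every split and glue below is one linear pass, which is
# where the logarithmic overhead comes from.  Names are ours.

def transpose(cols, r):
    """Return the columns of the transpose (i.e. the rows of the original)."""
    if r == 1:
        return [[col[0] for col in cols]]      # a single row becomes one column
    h = r // 2
    top = [col[:h] for col in cols]            # first h rows
    bottom = [col[h:] for col in cols]         # remaining r - h rows
    return transpose(top, h) + transpose(bottom, r - h)

# Example: the 2 x 3 array with rows (1, 2, 3) and (4, 5, 6), stored column-wise.
print(transpose([[1, 4], [2, 5], [3, 6]], 2))   # [[1, 2, 3], [4, 5, 6]]
```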
For univariate polynomials we use dense representation, which means that
a polynomial of degree is stored as the vector
of its
coefficients from degrees
to
. Additions
and subtractions take linear time in
.
Let
denote a cost function that yields an upper
bound for the number of operations in
needed to
multiply two polynomials in
.
For a general ring
one may take
thanks to [10]. For finite fields better bounds exist, and
we write
for the time taken by a Turing machine
to multiply two polynomials in
.
In what follows, any finite field with
and
prime is always assumed to be
given as
with
monic and
irreducible of degree
.
Elements of
are stored as their natural
representatives in
.
Additions and subtractions in
take linear time,
one product takes time
and one inversion
: see [13, part II],
for instance. In [15, 19], it was shown that
.
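As a concrete illustration of this representation, the following Python sketch multiplies and inverts elements of the finite field given by their natural representatives modulo μ; the names are ours, the product is naive, and the inversion simply raises to the power q − 2 instead of using the asymptotically fast methods cited above.

```python
# Sketch of arithmetic in F_q = F_p[z]/(mu), elements stored as coefficient
# lists of degree < deg(mu), low degree first.  Names are ours.

def fq_mul(a, b, mu, p):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % p
    while len(c) >= len(mu):                    # reduce modulo the monic mu
        q = c[-1] % p
        for i in range(len(mu)):
            c[len(c) - len(mu) + i] = (c[len(c) - len(mu) + i] - q * mu[i]) % p
        c.pop()
    return c + [0] * (len(mu) - 1 - len(c))

def fq_inv(a, mu, p):
    """Inverse of a nonzero element, as a^(q-2) with q = p^deg(mu)."""
    e = p ** (len(mu) - 1) - 2
    r, b = [1] + [0] * (len(mu) - 2), a
    while e:
        if e & 1:
            r = fq_mul(r, b, mu, p)
        b = fq_mul(b, b, mu, p)
        e >>= 1
    return r

# Example: F_9 = F_3[z]/(z^2 + 1); check that z * z^(-1) = 1.
mu, p = [1, 0, 1], 3
z = [0, 1]
print(fq_mul(z, fq_inv(z, mu, p), mu, p))       # [1, 0]
```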
For a polynomial in a given number of variables
, we use the recursive
dense representation, by viewing
as an
element of
. In particular,
admits the same representation as its expansion
as a univariate polynomial in
. In our algorithms, the number of variables
is not part of the representation of
, so it must be supplied as a
separate parameter.
Example of degree
is represented by
.
The bivariate polynomial
is represented by
.
The support of
is defined as the set of monomials with nonzero coefficients and we
write
for its cardinality. Assuming that, apart
from the mandatory trailing “#” symbol, the representations
of coefficients in
do not involve the
“
” symbol (this
can always be achieved through suitable renaming
), we denote the number of “
” symbols involved in the representation
of
by
.
We notice that
.
Proof. This follows by an easy induction over : for
,
we have nothing to do. If
and
, then we get
which concludes the proof.
If all the equal
,
then
is the constant polynomial
and its representation is
with
symbols “
”. If
for all
,
then the number of
becomes
.
Proof. We use a similar induction as in the proof of Lemma 2.5:
If , then we already observed
that
. If
, then
.
For the remainder of this subsection, we assume that the size of the
elements in
is bounded by a constant
. In particular, the total size of
a multivariate polynomial
is bounded by
.
Proof. Recall that both and
are considered to be the inputs. We use the
following recursive algorithm: if
,
then we have nothing to do. If
,
then we write
and recursively compute partial
degree bounds
for the coefficients. We next
return
. The lemma clearly
holds for
. By induction, the
recursive computations can be done in time
The computation of the maxima can be done using one linear pass in time
Determining requires an additional time
. Altogether, the computation takes
time
.
Proof. We use the following recursive algorithm: if
, then we have nothing to do.
If
, then we write
and recursively compute total degree bounds
for the coefficients. We next return
if
and
otherwise. The
complexity bound follows using a similar induction argument as in the
previous lemma.
In the following paragraphs we recall the costs of integer multi-remaindering and univariate multi-point evaluation together with the inverse problems. The underlying techniques are well known. They are recalled in the context of Turing machines for convenience.
be an integer, let
be an array of integers in
,
and let
be integers such that
. Then
may be
computed in time
.
Proof. We first compute the bidimensional array
in time by using fast multi-remaindering [13, chapter 10]. Then we appeal to Lemma 2.3 to
obtain
in time
.
be an integer, let
have its coefficients in
,
and let
be integers such that
. Then
can be
computed in time
, where
.
Proof. We first extract the vector
of all the nonzero coefficients of
in time
. We use the latter lemma on
, which incurs time
. Then we recover
from
and
,
for
, in time
.
be pairwise coprime integers, let
, and let
. Then the unique vector
with entries in
such that
for
can be computed in time
.
Proof. First we transpose the bidimensional array
of size
and then use
Chinese remaindering
times. By Lemma 2.3
and [13, chapter 10], this can be done in time
.
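The two directions can be illustrated by the following Python sketch, in which both the multi-remaindering and the Chinese remaindering are performed naively; the real algorithms of Lemmas 2.9 and 2.11 rely on remainder trees and on the array transpositions of Lemma 2.3 to reach the stated costs. The function names below are ours.

```python
# Naive illustration of integer multi-remaindering and of its inverse by
# Chinese remaindering (quadratic, for clarity only).

def multi_remainder(a, moduli):
    """One row of residues per modulus, for every entry of the array a."""
    return [[x % m for x in a] for m in moduli]

def crt(residues, moduli):
    """Recover x modulo the product of the pairwise coprime moduli."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x = (x + r * Mi * pow(Mi, -1, m)) % M
    return x

moduli = [5, 7, 11]
a = [123, 456]
table = multi_remainder(a, moduli)
for j, x in enumerate(a):
    assert crt([row[j] for row in table], moduli) == x % (5 * 7 * 11)
```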
be an integer and let
be an array of polynomials in
. For any set of points
we
may compute
in time
.
Proof. We first compute the bidimensional array
in time , using fast
univariate multi-point evaluation [13, chapter 12]. We next
transpose the array in time
,
using Lemma 2.3.
be an integer, let
, and let
be elements
in
. Then
can be computed in time
where is the cardinality of the support of
in the variables
.
Proof. We first extract the vector
of all the nonzero coefficients of
in time
together with
of the same
support as
. We use the
latter lemma on
, which
incurs time
Then for we recover the evaluation of
at
from
and
in time
.
be a prime number, let
be distinct elements of
and let
be vectors in
.
Then the unique vector of polynomials
with
for
and
can be computed in time
.
Proof. We first transpose the
array made of
to obtain
in time by Lemma 2.3. We next
obtain the result through
interpolations, in
time
by [13, chapter 12].
We will have to use the lexicographic order on , written
,
defined by
Notice that in the dense polynomial representation used here,
coefficients are stored according to the lexicographic order on the
exponent vectors; this corresponds to the usual lexicographic monomial
order induced by .
We use fixed point representation to approximate real numbers. The
number , with
and
, is
represented by
, where
“.” is a specific symbol of the machine. A negative number
starts with the symbol “-”. The bit size of
is
. The integer
is called the (absolute) precision. Additions,
subtractions and reducing the precision take linear time. The product of
two fixed point numbers of bit size
takes time
. We gather well known
results from [8].
Proof. Without loss of generality we may assume
. We perform the following
long integer division in time
:
Then we have
Consequently we may take .
Proof. We write with
and
so
. By using [8, Theorem 6.1],
and
may be computed to precisions
and
respectively in time
. Let us write
and
the respective approximations,
which satisfy
and . Then
.
.
An approximation
of
to
precision
such that
may be computed in time
.
Proof. We write with
and
, so
we have
. By using [8,
Lemma 2.3],
may be approximated to precision
in time
.
In this section is an integer with
and
is a polynomial in
with partial degree
in
for
, and total
degree
. We assume that
is given in recursive dense representation for
, as described in the previous
section, so the size of its representation is
where
and
represents the
number of
in the representation of
. The upper bounds
and
for the partial and total degrees are not
necessarily sharp. Throughout this section, we assume that
and that these bounds satisfy
(3.1)
We wish to evaluate at
points in
, written
.
In order to evaluate at a point
the initial idea consists in performing the
evaluation over
, that is to
say by discarding the modulus
.
We write
for the natural preimage of
with coefficients in
,
and
for the natural preimage of
with entries in
. In order to
compute
we construct an ad hoc sequence
of primes
such that
. In this way,
may be
recovered by Chinese remaindering from
.
Minimizing the bit size of is of the utmost
importance for efficiency. For this purpose we introduce the following
quantity
which bounds the cardinality of the
support of
both in terms of the partial and
total degrees:
On the other hand the quantity
(3.2)
is used as a bound for . It
satisfies the following inequality to be used several times in the
proofs when
:
(3.3)
Proof. We first prove that
(3.4)
This inequality trivially holds when .
Suppose by induction that it holds up to
variables. Then
again holds for
variables.
On the other hand we have
The conclusion simply follows from the assumption .
Given and
evaluation
points
in
,
the multi-point evaluation algorithm of this section works as follows:
If is “sufficiently small”, then
we evaluate
at all points in
and read off the needed values. This task is detailed
in the next subsection.
Otherwise we compute prime numbers such that
. This is addressed in
sections 3.3 and 3.4.
We evaluate at all
modulo
for
by
calling the algorithm recursively.
We reconstruct the values of at all
by Chinese remaindering and perform the final
divisions by
.
We will be able to take all the of the order of
. Therefore, the bit size of
the modulus when entering the first recursive call is of the order
, then
at
depth
, then
at depth
, etc. The total bit
size of all recursive problems to be solved at depth
turns out to grow with
. In
section 3.5 we study the complexity of the algorithm in
terms of the depth
. Section
3.6 is devoted to finding a suitable value for
to end the recursive calls. Roughly speaking, the property
“sufficiently small” from step 1 becomes “
is of the order
”,
so the time spent in the exhaustive evaluation of step 1 is of the order
.
We begin by studying the base case of the multi-modular algorithm,
i.e. the exhaustive evaluation at all points of . We recall that this algorithm is
used for sufficiently small values of
.
We regard the evaluation of
at all points in
as a vector in
.
Proof. Detecting if is the
constant polynomial
takes time
. If so, then it suffices to copy
onto the destination tapes
times.
This costs
. From now on we
assume that
is not a constant, whence
,
and
.
We interpret as a univariate polynomial
and recursively evaluate the coefficients
at all points in
.
After one
matrix transposition of cost
, this yields a vector of
univariate polynomials
where ranges over
.
Using
multi-point evaluations of these
polynomials at all
of cost
, we finally obtain the vector of all
with
. Denoting
by
the cost of the algorithm, we thus obtain
By induction over , it
follows that
which implies the claimed bound.
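The recursion of this proof may be sketched as follows in Python, with the recursive dense representation mimicked by nested lists: a polynomial in the last variable whose coefficients are polynomials in the remaining variables. The inner univariate evaluations are naive here, whereas the lemma uses fast univariate multi-point evaluation; the names are ours.

```python
# Sketch of the exhaustive evaluation of Lemma 3.2 over (Z/rZ)^n.

def eval_all(f, n, r):
    """Values of f on (Z/rZ)^n, last coordinate varying fastest."""
    if n == 0:
        return [f % r]
    tables = [eval_all(c, n - 1, r) for c in f]    # values of each coefficient
    values = []
    for j in range(r ** (n - 1)):                  # one point of (Z/rZ)^(n-1)
        for x in range(r):                         # last coordinate
            acc = 0
            for t in reversed(tables):             # Horner in the last variable
                acc = (acc * x + t[j]) % r
            values.append(acc)
    return values

# Example: f = x1 + 2*x2 over Z/3Z, stored as [[0, 1], [2]].
print(eval_all([[0, 1], [2]], 2, 3))   # [0, 2, 1, 1, 0, 2, 2, 1, 0]
```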
In order to evaluate at a specific sequence
of points in
,
we next wrap the latter lemma in the following algorithm that simply
reads off the requested values once the exhaustive evaluation is done.
This task is immediate in the RAM model, but induces a logarithmic
overhead on Turing machines.
Algorithm
Evaluate at all points of
sorted lexicographically.
For from
to
do
Set and
.
Sort the vectors of pairs into
according to the second coordinate
(where
denotes a permutation of
).
Deduce the vector by using the
exhaustive evaluation of step 1.
Sort the latter vector according to the first coordinate in
order to obtain .
Return .
Proof. The cost of step 1 is given in Lemma 3.2.
In the Turing machine model the loop counter and
the bounds
and
do not
need to be explicitly computed in step 2. Instead it suffices to
allocate an array of
bits once on an auxiliary
tape and use it to split the sequence of evaluation points into
subsequences of
elements—except the last
one which has cardinality
.
With this point of view in mind, each step 2.b and 2.d requires time
so the sum over all the values of is
Each step 2.c takes time .
The total cost of all steps 2.c is therefore bounded by
.
Multi-modular techniques classically involve bounds on the first Chebyshev function
For many applications, crude estimates on
suffice. For our refined complexity analysis of the Kedlaya–Umans
algorithm, we rely on the somewhat better estimate
(3.6)
More precisely, it was shown by Barkley Rosser and Schoenfeld [2,
Theorem 31] that, for all ,
Even sharper bounds may be found in [12], but they will not
be necessary here. From now on represents a
constant in
such that
(3.7)
Proof. For fixed and large
, one has
Taking and
sufficiently
large (say
), it follows that
Then it suffices to further increase so that the
implication also holds on the interval
.
In the rest of this section the constant of the
lemma will be used via the following function:
(3.8)
It is also convenient to introduce the function
that will bound the inflation of the modulus at successive recursive
calls of our main algorithm. We will write for
the
-th iterate of this
function.
Generating prime numbers is a standard task. In the next paragraphs we include the needed results for completeness.
Proof. We use the well known Eratosthenes sieve. On
the same tape we generate all the integer multiples of 2
not larger than
, followed by
all the multiples of 3 not larger than
,
then all the multiples of
not larger than
, etc. The total number of
multiples generated in this way is
.
These multiples can all be generated in time
. Then we sort these integers in increasing order
and remove duplicates in time
.
The integers
which are not listed in this way
are exactly the requested prime numbers, which can thus be deduced with
further
bit operations.
The following algorithm computes consecutive prime numbers larger than a
given integer , such that
their product exceeds a given threshold
.
Algorithm
Set .
Compute all the consecutive prime numbers
less or equal to
.
If , then increase
by
and go back
to step 2.
Compute the first index such that
, and set
and
.
If , then return
.
While do
Compute .
If , then
, else
.
Return .
. In addition we have
.
Proof. After step 3 we have , so the rest of the algorithm corresponds to a
binary search to obtain the minimal index
such
that
. During the
“while” loop of step 6 we have
and
. So when
the minimal sequence is actually found. Since
at
each step of the “while” loop, either
increases or
decreases strictly. Consequently
the algorithm returns the correct result.
We exit step 3 once . Thanks
to (3.6), this condition is met for
, after time
,
by Lemma 3.5. The binary search also takes time
.
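For illustration, here is a simplified Python version of Algorithm 3.2 that merely sieves up to a growing bound instead of performing the binary search of the algorithm; the names are ours.

```python
# Find consecutive primes larger than r whose product exceeds B (simplified).

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i::i] = [False] * len(flags[i * i::i])
    return [i for i in range(2, n + 1) if flags[i]]

def primes_with_product_above(r, B):
    bound = max(2 * r, 16)
    while True:
        product, chosen = 1, []
        for q in primes_up_to(bound):
            if q <= r:
                continue
            chosen.append(q)
            product *= q
            if product > B:
                return chosen
        bound *= 2                      # not enough primes yet, enlarge the sieve

# Example: primes above 10 whose product exceeds 10**6.
print(primes_with_product_above(10, 10 ** 6))   # [11, 13, 17, 19, 23]
```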
We are now ready to present the multi-modular evaluation algorithm. The
parameter indicates the allowed depth for the
recursive calls.
Algorithm
.
.
If or if
is a
constant polynomial, then use Algorithm 3.1.
Call Algorithm 3.2 with and
to compute the shortest sequence of
consecutive prime numbers
with
and
.
Let be the canonical preimage of
in
,
with coefficients in
.
Let
represent the canonical preimage of
in
,
for
.
Compute and
, for
.
For from
to
, compute
modulo
by calling the
algorithm recursively with
.
Use Chinese remaindering to recover .
Compute the remainders by of these
values to obtain and return
.
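The following Python sketch mirrors the structure of Algorithm 3.3 in a much simplified form: the polynomial and the points are lifted to the integers, evaluated modulo several small primes whose product exceeds a crude bound on the lifted values, and the results are recovered by Chinese remaindering and reduced modulo r. The recursive call of step 5 is replaced by a naive evaluation modulo each prime, and all names are ours.

```python
# Much simplified sketch of the multi-modular evaluation strategy.
from math import prod

def eval_mod(f, point, m):
    """Naively evaluate f = {exponent tuple: coefficient} at a point modulo m."""
    total = 0
    for exps, c in f.items():
        term = c % m
        for x, e in zip(point, exps):
            term = term * pow(x, e, m) % m
        total = (total + term) % m
    return total

def primes_with_product_above(B):
    ps, q = [], 2
    while prod(ps, start=1) <= B:
        if all(q % t for t in ps):      # q is prime: no earlier prime divides it
            ps.append(q)
        q += 1
    return ps

def multimodular_eval(f, points, r):
    d = max(sum(e) for e in f)                        # total degree bound
    bound = sum(c % r for c in f.values()) * (r - 1) ** d
    primes = primes_with_product_above(bound)
    values = []
    for pt in points:
        residues = [eval_mod(f, pt, p) for p in primes]
        M, x = prod(primes), 0
        for res, p in zip(residues, primes):          # Chinese remaindering
            Mi = M // p
            x = (x + res * Mi * pow(Mi, -1, p)) % M
        values.append(x % r)
    return values

# Example: f = x1*x2 + 3 over Z/100Z, evaluated at (7, 9).
f = {(1, 1): 1, (0, 0): 3}
print(multimodular_eval(f, [(7, 9)], 100))   # [66]
```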
, that (3.1)
holds, and that
has partial degree
in
for
and total degree
. When
the algorithm takes time
For a fixed value of the algorithm takes
time
where is defined in
Proof. Lemma 3.1 ensures that the
multi-modular approach works well, whence the correctness of the
algorithm. From now on assume and
. Inequality (3.1), combined with
the definition of
, implies
. If
is bounded, then so are
,
, and
. Consequently we may freely assume that
is sufficiently large in the cost analysis. From (3.7),
for all
we obtain
Since is increasing for all
, we deduce
The condition is therefore satisfied whenever
(3.10)
By Lemma 3.4, there exists such
that (3.10) is satisfied when
is
larger than
It follows that
(3.11)
From , we deduce that
By Proposition 3.6, step 2 takes time . The number of “#” in the
representation of
is
by
Lemma 2.5. Consequently the multi-remaindering of
in step 3 takes time
by Lemma 2.10. By Lemma 2.9 the multi-remaindering of
takes time
.
In total step 3 contributes to
Step 5 costs by Lemma 2.11. The
cost of step 6 is also dominated by
.
Let denote the cost function of Algorithm 3.3, for
being fixed. In other words,
the constants hidden in the “
”
of the rest of the proof do not depend on
but on
. Since
by (3.3), we have
.
Proposition 3.3 yields the cost of step 1:
By summing the costs of steps 2 to 6, we deduce that
(3.13)
Consequently, if , using the
bounds (3.11), (3.12), and
the cost of Algorithm 3.3 simplifies as follows:
Using (3.3) again gives ,
whence
Now assume by induction that holds for some
. Combining (3.11)
and (3.13) we deduce that:
We claim that for all
which implies
and concludes the proof. The
latter claim is proved by induction on
.
It clearly holds for
. Assume
it holds for
. Then, using
, we verify that
which concludes the proof of the claim.
In order to complete Algorithm 3.3, we still have to
specify how to set the parameter in terms of
. It is natural to let
increase as a function of
. Yet we cannot take
arbitrarily large because the complexity in Proposition 3.7
involves a factor
. The key
idea here is to observe that, if
is very large,
namely when
, then we may use
the naive algorithm to evaluate
independently at
each
. This leads to the
following complexity bound.
Proof. If ,
then
is constant and we just copy the input. If
, then we expand
as a univariate polynomial in
and
recursively evaluate
at the point
. This yields a univariate polynomial
in
that we evaluate at
using Horner's method. Using induction over
, it is not hard to show that the algorithm
essentially performs
ring operations in
, which can be done in time
. We finally recall that
, by Lemmas 2.5 and 2.6.
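A minimal Python version of this naive evaluation procedure, with the recursive dense representation mimicked by nested lists, reads as follows; the names are ours.

```python
# Naive evaluation of Lemma 3.8: f is a polynomial in the last variable whose
# coefficients are polynomials in the remaining variables.

def naive_eval(f, point, r):
    """Evaluate f at point = (a_1, ..., a_n) modulo r."""
    if not point:                        # zero variables: f is a constant
        return f % r
    *head, last = point
    coeffs = [naive_eval(c, head, r) for c in f]   # evaluate in x_1, ..., x_{n-1}
    acc = 0
    for c in reversed(coeffs):           # Horner in the last variable
        acc = (acc * last + c) % r
    return acc

# Example: f = 1 + 2*x1 + x1*x2 over Z/7Z, stored as (1 + 2*x1) + x1 * x2.
f = [[1, 2], [0, 1]]
print(naive_eval(f, (3, 4), 7))   # (1 + 6 + 12) mod 7 = 5
```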
We are now ready to present the top level multi-point evaluation
procedure. Recall that the bit size of an integer
is given by
.
Algorithm
If , then compute the
output
using the univariate multi-point
evaluation algorithm.
Compute .
Compute the bit size of
and then the bit size
of
.
If , then compute the
output
using the naive algorithm.
Compute the bit size of
.
If or
,
then compute the output
using Algorithm
3.1.
Compute the output using Algorithm 3.3 with parameter
.
where is a function which tends to
when
tends to infinity.
Proof. When we arrive at step 5 with and
, the
inequality
holds, whence
. Consequently the assumption of Algorithm 3.3
is satisfied. This proves the correctness of the algorithm thanks to
Propositions 3.3 and 3.7.
If , then multi-point
evaluation costs
which is below the bound of the proposition. From now on we assume that
. Recall that
by (3.3).
The quantities may be obtained in time
by combining Lemmas 2.5, 2.7 and 2.8.
By the subproduct tree technique, we may compute
in time
, and
in time
. The
cost for summing
is
. We may also compute
in
time
and then easily deduce
as the bit size of
. Overall
the cost of step 2 is negligible.
Let denote the logarithm function in base
. The bit size
of
and then
may be
obtained in time
. We have
and
,
whence
The naive evaluation in step 3 costs by Lemma 3.8. So when
this cost drops to
.
From now on we may assume that .
If
is bounded, then so are all other parameters
by (3.3) and
, whence the execution takes time
. If
and
, then we have
so we may use Proposition 3.3 to bound the time of step 4 by
Let us now consider step 5, where we have and
, whence
, as previously mentioned. For our complexity
analysis, we may freely assume that
is
sufficiently large. In particular, by using (3.14),
inequality
implies
On the one hand implies
. On the other hand
implies
. Consequently we have
and deduce:
Therefore the cost of Algorithm 3.3 with parameter is
by Proposition 3.7 and equation (3.8).
By gathering costs of each step we thus obtain that Algorithm 3.4 takes time
for some universal constant .
Finally we set
(3.15)
to conclude the proof.
which tends to
when
tends to
infinity, such that the following assertion holds. Let
be of partial degree
in
for
and of total degree
, and let
be a sequence of points in
.
Then, we may compute
in time
Proof. We may compute with
and
in time
thanks to the lemmas from the end of section 2.
If
, then
and the result directly follows from Proposition 3.9.
Otherwise we apply the evaluation algorithm several times with sets of
evaluation points of cardinality at most
.
In this section and
represent integers and we study multi-point multivariate evaluation over
, where
is a monic polynomial in
of degree
. The approach developed in the next paragraphs
lifts this problem to an evaluation problem over
, so that fast univariate evaluation/interpolation
in
may be used.
We endow with the usual norm
:
Proof. The proof is done by induction on . The inequality is an equality for
. Then we verify that
holds for
.
of partial degree
in
for
and total degree
in
, and let
.
Assume that
and
have
their coefficients in
and degrees
in
. Then we
have
and
,
where
Proof. The degree bound is clear. Now consider the polynomials
From Lemma 4.1 we obtain
The conclusion follows by applying (3.4) and (3.5)
to the polynomial .
Algorithm
Compute , the bit
size
of
,
and the bit size
.
If , then use the
naive evaluation algorithm to compute and return
.
Call Algorithm 3.2 with and
to compute the minimal sequence of the
smallest prime numbers
such that
and
.
Let be the canonical preimage of
in
with degrees
in
and with integer
coefficients in
.
Let represent the similar preimage of
in
,
for
.
Compute and
, for
.
For from
to
do
Specialize for
.
Specialize for
.
For , evaluate
at
by using
the algorithm from Theorem 3.10.
Interpolate .
Use Chinese remaindering to recover .
Compute the remainders by and
of these values to obtain and return
.
and that
has partial
degree
in
for
and total degree
.
Then there exists a function
which tends to
when
tends to
infinity, such that the cost of Algorithm 4.1 is
Proof. The correctness follows from Lemma 4.2.
The quantities may be obtained in time
by Lemmas 2.5, 2.7 and 2.8. As in
the beginning of the proof of Proposition 3.9, the cost for
deducing and
is
negligible. The degree
of
may be obtained in time
.
Then, computing
requires time
. To obtain
we first
evaluate
in time
and
then
in time
.
Overall step 1 takes negligible time
(4.1)
If , then the naive algorithm
in step 2 runs in time
(4.2)
by adapting Lemma 3.8. Proposition 3.6 implies that step 3 takes time
(4.3)
and we have whence
,
for a universal constant
.
The cost of step 4 is obtained by adapting Lemmas 2.9 and
2.10 to :
(4.4)
Let us now examine the cost of step 5. For fixed , the specializations of
and
the
for
in steps 5.a and
5.b require time
by Lemmas 2.12 and 2.13. By Theorem 3.10, the evaluations in step 5.c take time
where (3.15) implies .
The cost of step 5.d is
by Lemma 2.14.
It follows that the total cost of step 5 is
The cost of step 6 is provided by Lemma 2.11, that is
(4.6)
Finally the cost of step 7 is
(4.7)
Summing all costs from (4.1)–(4.7), we obtain the total cost of the algorithm
We conclude that the function
satisfies the requirement of the proposition.
which tends to
when
tends to
infinity, such that the following holds. Let
with
monic of degree
, let
be of partial degree
in
for
and of total degree
,
and let
be a sequence of points in
. Then, we may compute
in time
Proof. If ,
then we use fast univariate multi-point evaluation. Otherwise we use
Proposition 4.3 in combination with
.
The first corollary is a complexity bound in terms of the partial degrees, while the second one concerns the total degree.
be a fixed real
value. Let
with
monic
of degree
, let
be of partial degree
in
for
,
and let
be a sequence of points in
. Then we may compute
in time
Proof. We apply Theorem 4.4 with .
be a fixed real
value. Let
with
monic
of degree
, let
be of total degree
,
and let
be a sequence of points in
. Then we may compute
in time
Proof. We apply Theorem 4.4 with and make use of the well known inequality
. If
,
then
Otherwise,
In both cases we thus have .
If the partial degrees are large with respect to the number of the
variables, then we may use Kronecker segmentation on
in order to decrease the dependency in
in the
complexity bounds from the two previous sections. We first analyze the
cost of Kronecker segmentation on Turing machines and then show how to
reduce the complexity of multi-point evaluation. Throughout this
section,
is an effective ring whose elements
occupy at most
cells on tapes and whose
arithmetic operations take softly linear time.
Let be integers
.
The Kronecker substitution map is the unique
-algebra morphism determined by
When restricted to the space of polynomials of partial degree in
, it becomes
an
-linear isomorphism onto
the space of polynomials in
of degree
. The Kronecker segmentation
associated to
transforms the univariate
polynomial
of degree
into the multivariate polynomial
,
so that
Algorithm
If , then return
.
For , call the
algorithm recursively on
and integers
to obtain
.
Return .
.
Proof. The correctness is clear. Let represent the cost function of the algorithm. Step 1 takes
linear time in the size of
,
that is
. Step 2 requires one
linear traversal and
recursive calls, whence
By induction over , it
follows that
.
The cost of the Kronecker substitution, stated in the next proposition, will be needed in section 7 only.
be of partial degree
in
for
. The Kronecker substitution
of
may be computed in time
.
Proof. The proof is done by induction on . We will require in addition that
is zero padded up to degree
(at the end, we may clearly remove trailing zeros in linear time). The
case
corresponds to a simple zero padding up to
degree
, which takes time
. If
, then we write
,
we recursively compute
for
, and concatenate the results onto the output
tape. The complexity bound is straightforward.
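For two variables, the substitution and the segmentation can be sketched as follows in Python, with coefficient lists in place of the Turing tape representation; the names are ours.

```python
# Kronecker substitution and segmentation for two variables: a polynomial of
# partial degrees < n1 in x1 and < n2 in x2 corresponds to a univariate
# polynomial of degree < n1*n2 via x1 -> X, x2 -> X^n1.

def kronecker_substitution(f, n1):
    """f[j] holds the coefficients in x1 of the x2^j part."""
    out = []
    for row in f:
        out.extend(row + [0] * (n1 - len(row)))   # pad each slice to length n1
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out

def kronecker_segmentation(g, n1, n2):
    """Inverse direction: cut g into n2 consecutive slices of length n1."""
    g = g + [0] * (n1 * n2 - len(g))
    return [g[j * n1:(j + 1) * n1] for j in range(n2)]

# Example: f = 1 + 2*x1 + 3*x2 + 4*x1*x2, with n1 = n2 = 2.
f = [[1, 2], [3, 4]]
g = kronecker_substitution(f, 2)         # 1 + 2*X + 3*X^2 + 4*X^3
print(g)                                 # [1, 2, 3, 4]
print(kronecker_segmentation(g, 2, 2))   # [[1, 2], [3, 4]]
```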
Only the univariate Kronecker segmentation is actually needed for the modular composition algorithm of the next section. In the rest of this section we introduce the multivariate segmentation and make use of it in order to speed up multi-point evaluation.
Now consider a multivariate polynomial of
partial degree
in
for
. For
, let
be integers such that
(5.1)
We introduce the multivariate Kronecker substitution map
This map restricts to an -linear
isomorphism between polynomials of partial degree
in
and polynomials in
of
partial degree
in
.
In this context, the Kronecker segmentation of
is defined as
.
be of partial degree
in
for
, and let
be integers
satisfying (5.1). The Kronecker segmentation
of
may be computed in time
where and
.
Proof. If ,
then we may use Proposition 5.1. Otherwise we consider
in the form
.
Let
represent the cost of the segmentation of
. By Lemma 2.5
the representation size of each
is
. If
,
then Proposition 5.1 gives
Otherwise, we simply have .
It follows that
.
In the rest of this section we explain how to decrease the cost of
multi-point evaluation using Kronecker segmentation of .
Recall that . Let
be an integer. For
with
, let
We thus have
which implies
whence
(5.3)
If , then we set
so that inequality (5.3) still holds. In
addition, if
, then
, whence
, and
(5.4)
Notice that we have . For all
we introduce
so and
hold. From
(5.5)
for , and
(5.6)
we deduce that
(5.7)
In a dual manner to the Kronecker substitution map (5.2)
associated to the we introduce the map
where . Letting
we thus have
In this way we reduce the multi-point evaluation in
variables and partial degree bounds
to
evaluation in
variables and partial degree
bounds
. Notice that this
segmentation generically builds a polynomial
of
total degree close to the sum of its partial degrees. The cardinality of
the support of
is the same as that of the support of
, but its number of
“#” symbols in the representation is larger. From (5.7)
we deduce that
(5.8)
The latter may be replaced by a smaller value
with
arbitrarily close
to
whenever
is
sufficiently large. The reduction process is summarized in the following
algorithm.
Algorithm
Compute and then
and
for
and
.
Build .
For all compute
.
Return and
.
Proof. The correctness is clear from the
definitions. The quantities may be obtained in
time
by Lemmas 2.5 and 2.7.
Then we use a binary search to determine
as the
first integer such that
in time
. By Proposition 5.3 the
segmentation in step 2 takes time
.
Using binary powering, step 3 involves
operations in
for each point
.
be a fixed real value and let
be a fixed integer. Let
with
monic of degree
, let
be of partial degree
in
for
, and let
be a
sequence of points in
.
Then, we may compute
in time
where .
Proof. We may freely assume that
is sufficiently large, so that the cost of multi-modular evaluation is
by Theorem 4.4. If ,
then we deduce the complexity bound
so we are done.
From now on we assume that . If
is bounded, then so is
, and we may appeal to the naive evaluation
algorithm; the conclusion follows by adapting Lemma 3.8 to
. We may thus further assume
that
is sufficiently large to satisfy
If , then
. Otherwise, (5.3), (5.4),
(5.5), and (5.6) imply
whence
For all we thus have
. It follows that
and
Using Proposition 5.4, we compute
and the
by Algorithm 5.2 in time
Then Theorem 4.4 ensures that the evaluation of at all the
takes time
(5.9)
where . Now we further assume
that
is sufficiently large such that
Then the cost (5.9) rewrites into
In the univariate case it is well known that a polynomial of degree may be evaluated at
points in
softly linear time. In the multivariate setting we wish to reach softly
linear time for the evaluation in degree
,
with
variables, at
points. Although such a complexity bound seems out of reach for the
present techniques, the aim of this section is to prove a slightly
weaker bound. We start with a simple lemma.
Proof. The bound follows by a direct computation.
be a fixed rational
value. Let
with
monic
of degree
, let
be of total degree
and let
be in
with
. Then we may compute
in time
.
Proof. If ,
then Lemma 3.8 ensures evaluation time
. From now on we may assume
.
First we examine the case when for a constant
to be fixed. Lemma 5.6 combined with
the fact that the function
is nondecreasing
yields
We fix the constant sufficiently small such that
. By Lemma 3.8,
the evaluations can be achieved in time
.
From now on we may assume that holds. If
is bounded, then so is
and Lemma
3.8 again leads to an evaluation time
. Consequently we may further assume that
is sufficiently large to satisfy
(5.10)
Setting , Theorem 5.5
provides us with the time complexity bound
For a universal constant ,
this bound simplifies to
Whenever the latter cost further simplifies to
We now consider the other case when ,
so we have
, hence
. In this case, Corollary 4.6
gives the cost
which may be bounded from above using (5.10).
In all cases with , we have
thus proved the complexity bound
Since , we have
, so the total running time is bounded by
. We conclude by applying this for
instead of
.
In this section, is an effective field,
is a monic polynomial in
of
degree
, and
are two polynomials in
. We
want to compute
. We first
describe and analyze the algorithm in the algebraic complexity model and
then return to Turing machines in the case when
is a finite field.
Let . This precise choice of
will be motivated below. We let
be an integer such that
(6.1)
In particular, we have
and
so that
tends to for large values of
. Now we define
(1 ⩽ i ⩽ n − 1)
so that . If
is sufficiently large, then
for all
. In addition, from
and
we deduce
(6.3)
We are now ready to state the modular composition algorithm.
Algorithm
.
,
,
,
, and
, where
is as
in (6.1).
Compute .
If one of the is less than
, then use the naive modular
composition algorithm.
Build .
Compute ,
,…,
.
Compute for
.
Evaluate at
for
.
Interpolate such that
for
.
Return .
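A much simplified Python sketch of this composition strategy is given below: the exponents of g are cut into n digits in base d1 (a univariate Kronecker segmentation), the powers of the argument are precomputed modulo the monic modulus, and the segmented expression is evaluated by a direct Horner scheme instead of the evaluation and interpolation of steps 5 and 6. All names are ours and the polynomial arithmetic is naive.

```python
# Simplified composition g(a) mod h over Z/pZ via base-d1 segmentation.

def pmul(u, v, p):
    w = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] = (w[i + j] + ui * vj) % p
    return w

def prem(u, h, p):                        # h is monic
    u = u[:]
    while len(u) >= len(h):
        q = u[-1] % p
        for i in range(len(h)):
            u[len(u) - len(h) + i] = (u[len(u) - len(h) + i] - q * h[i]) % p
        u.pop()
    return u or [0]

def padd(u, v, p):
    m = max(len(u), len(v))
    u, v = u + [0] * (m - len(u)), v + [0] * (m - len(v))
    return [(x + y) % p for x, y in zip(u, v)]

def compose_segmented(g, a, h, p, n):
    """g(a) mod h, with deg a < deg h, cutting the exponents of g in base d1."""
    d1 = 1
    while d1 ** n < len(g):
        d1 += 1
    powers = [a]                          # a^(d1^i) mod h for i = 0, ..., n-1
    for _ in range(n - 1):
        b = [1]
        for _ in range(d1):               # naive powering, for clarity only
            b = prem(pmul(b, powers[-1], p), h, p)
        powers.append(b)

    def horner(coeffs, i):                # evaluate a block of d1^i coefficients
        if i == 0:
            return [coeffs[0] % p] if coeffs else [0]
        chunk = d1 ** (i - 1)
        pieces = [horner(coeffs[k:k + chunk], i - 1)
                  for k in range(0, len(coeffs), chunk)]
        acc = [0]
        for piece in reversed(pieces):    # Horner in a^(d1^(i-1))
            acc = padd(prem(pmul(acc, powers[i - 1], p), h, p), piece, p)
        return acc

    return horner(g, n)

# Example over Z/101Z: g = X^5, a = X + 1, h = X^3 - 2, with n = 2 digits.
g, a, h = [0, 0, 0, 0, 0, 1], [1, 1], [(-2) % 101, 0, 0, 1]
print(compose_segmented(g, a, h, 101, 2))   # [21, 15, 12], i.e. 21 + 15X + 12X^2
```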
The following proposition summarizes the cost in terms of arithmetic
operations in .
operations in
plus the
multi-point evaluation of step 5.
Proof. If is sufficiently
large, then
for all
. Step 3 performs
operations in in view of (6.3).
Steps 4 and 6 respectively take time
and
. Step 7 takes
additional operations in
.
We now return to the Turing machine model.
be polynomials in
such that
is monic of degree
and
have degrees
.
We assume that
is given as
with
prime and
monic
of degree
. Then
may be computed in time
Proof. The integer can be
computed in time
. Without
loss of generality we may suppose
,
so that
. Since
, Lemmas 2.15, 2.16
and 2.17 allow us to routinely compute
in time
. We compute
as the largest integer such that
, in time
,
and deduce
with additional cost
.
The sum may be bounded from above as follows:
From (6.2) we obtain
Since we deduce that
It follows that
(6.4)
whence
(6.5)
We then obtain
and
which imply that . Combined
with (6.5), this yields
On the other hand, thanks to (6.4), we have
(6.6)
First we handle the case when .
We may generate
pairwise distinct
in time
and use Algorithm 6.1.
Step 5 takes time
by Theorem 4.4. This bound simplifies into
for some constant . Notice
that
so minimizes
.
Now
so step 5 takes
Step 2 of Algorithm 6.1 takes time
by Proposition 5.1 and (6.3). The cost for
other steps, as analyzed by Proposition 6.1, amounts to
operations in , which
simplifies into
. We are done
with the case
.
It remains to deal with the case .
Basically we construct a suitable algebraic extension
of
and run Algorithm 6.1 over
instead of
.
In fact we need
to hold. We compute the first
integer
such that
,
in time
, so we have
. We next compute the smallest
integer
that is coprime with
. Since
,
this takes time
. We proceed
with the construction of an irreducible polynomial
of degree
. Using [43,
Theorem 3.2], this can be done in time
This includes the computation of the irreducible factorization of : the primes
can be computed in time
by Lemma 3.5,
so the prime factors of
may be deduced in time
.
We let represent the monic part of the resultant
in
and we write
for the corresponding subresultant of degree
in
. It is
well known that
is irreducible and that
is invertible modulo
(see for
instance [43, Lemma 2.4]). Setting
, we then have the following isomorphism:
We identify to
.
We may obtain
and
in
time
(see [35, Corollary 31] for
fast algorithms, but it would be sufficient here to appeal to naive
methods). An element of
represented by
may be sent into
in time
. For the backward conversion we
simply replace
by
and
reduce modulo
and
,
which takes time
.
Consequently applying Algorithm 6.1 over
instead of
only involves an additional overhead
of
in the total complexity.
Remark , with a similar complexity result.
Remark at the end of
the proof can be handled more efficiently by constructing irreducible
polynomials by means of a faster, although probabilistic Las Vegas
algorithm; see [13, chapter 14] and [43] for
instance. It is also worthwhile to build an extension of
of smooth degree
with
, which allows the isomorphism (6.7) to
be computed more efficiently [27].
For the case when is a field of small
characteristic
, Kedlaya and
Umans also designed an algebraic algorithm for fast multi-point
evaluation [34, Theorem 6.3]. This algorithm turns out to
be somewhat more efficient than those from the previous sections. In the
present section, we adapt their techniques and prove a complexity bound
in terms of the total degree instead of the partial degrees. We also
refine their complexity estimates for multi-point evaluation and modular
composition.
The base field is written with
and
prime; we assume it is explicitly given as
with
monic and
irreducible of degree
. We
let
represent a constant such that two
matrices can be multiplied with
ring operations.
We begin with -th root
extractions in
. It is well
known that such root extractions reduce to linear algebra via
the so-called Petr–Berlekamp matrix of the isomorphism
of
in the canonical basis
. Once the inverse of this matrix is known, each
root extraction takes
ring operations in
, and
root
extractions take
ring operations in
. We could use this strategy in this section
but it would involve an extra factor
in our
complexity estimates. Instead, since we focus on the case when
remains small, it is worth using the following algorithm,
borrowed from [34, Theorem 6.1] and originating from [37].
Algorithm
Expand into
, where
.
Expand into
, where
.
Select such that
, compute
,
and return
.
ring operations in
plus
inversions in
.
Proof. The identity is
equivalent to
. For all
we have
and thus
. Since
is irreducible,
is invertible modulo
. We are done with the correctness.
The computations in steps 1 and 2 take
ring operations in
.
Step 3 requires
ring operations in
plus
inversions in
by using fast extended gcds.
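Algorithm 7.1 avoids linear algebra altogether; as an elementary alternative (not the algorithm above), p-th roots in a field of p^k elements can also be obtained by applying the inverse of the Frobenius map, namely x ↦ x^(p^(k−1)), as in the following Python sketch whose names are ours.

```python
# p-th root extraction by inverting the Frobenius map: in F_q with q = p^k,
# the p-th root of a is a^(p^(k-1)).  Elements are coefficient lists modulo a
# monic irreducible mu.

def fq_mul(a, b, mu, p):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % p
    while len(c) >= len(mu):
        q = c[-1] % p
        for i in range(len(mu)):
            c[len(c) - len(mu) + i] = (c[len(c) - len(mu) + i] - q * mu[i]) % p
        c.pop()
    return c + [0] * (len(mu) - 1 - len(c))

def fq_pow(a, e, mu, p):
    r = [1] + [0] * (len(mu) - 2)
    while e:
        if e & 1:
            r = fq_mul(r, a, mu, p)
        a = fq_mul(a, a, mu, p)
        e >>= 1
    return r

def pth_root(a, mu, p):
    k = len(mu) - 1                       # q = p^k
    return fq_pow(a, p ** (k - 1), mu, p)

# Example in F_27 = F_3[z]/(z^3 + 2z + 1): the cube root of a^3 is a.
mu, p = [1, 2, 0, 1], 3
a = [1, 1, 0]                             # a = 1 + z
assert pth_root(fq_pow(a, 3, mu, p), mu, p) == a
```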
The next algorithm, borrowed from [34, section 6], performs multivariate multi-point evaluations by reduction to the univariate case.
Algorithm
Compute the total degree of
.
Let be minimal such that
and
.
Build a monic irreducible polynomial of
degree
, and find a
primitive root
of unity of maximal order
in
.
Reinterpret as an element of
, let
, and reinterpret
as an element of
.
For from
to
do
Compute .
Interpolate such that
for
.
Let , compute
modulo
for
.
Return .
Proof. For and
we write
. Let
represent the endomorphism of
that raises coefficients to the
-th
power. Then
In particular has degree
whence
, by the assumption
. Now
Since has degree
,
we deduce that it coincides with
,
whence
. We are done with the
correctness.
Step 1 takes time
by Lemmas 2.6 and 2.8. Steps 2 and 4 take negligible time.
By construction, we have
(7.1)
In step 3, the construction of may be done
deterministically in time
by means of [43, Theorem 3.2] (this includes the
computation of the irreducible factorization of ).
Now [44, Theorem 1.1] provides us with a universal constant
such that there exists a monic polynomial
of degree
with
, for which
is a
primitive element. It thus suffices to enumerate all the monic
polynomials of increasing degrees
and to stop as
soon as one discovers a primitive one. The loop terminates after testing
candidates.
Verifying that a candidate polynomial is a
primitive element for the multiplicative group of
can be done as follows. We first compute the irreducible factorization
of
in time
(see for
instance [11, Theorem 1.1]). We next verify that
for each prime divisor of
in time
. Altogether, this
allows us to compute
deterministically in time
Now each step 5.a requires extractions of
-th roots in
. In total, they take time
by Proposition 7.1. The total cost of step 5.b is .
Since the partial degrees of are
and since
, the
polynomial
is the Kronecker substitution
, following the notation of section
5.1. The univariate polynomial
in
step 6 may be obtained in time
by Proposition 5.2, thanks to (7.1) and the
assumption that .
Then the simultaneous evaluation of at
points in
takes
operations in which corresponds to time
(since deg f∗ ⩽ d∗hn−1), by (7.1).
The final evaluations in step 7 take time .
Remark is not necessarily a field.
As an optimization, it is worth taking
and building
the extension
of degree
of
. Then we compute a
primitive root
in
of
order
. In practice, it is
also worth using faster probabilistic algorithms in step 3.
Let us now reanalyze the complexity of Algorithm 6.1 when
using Algorithm 7.2 for multi-point evaluations. For
simplicity we assume that is a fixed prime
number, so the constant hidden in the “
” below actually depends on
. This time, we let
be
an integer such that
(7.2)
be a fixed prime number. Let
be polynomials in
such that
is monic of degree
and
have degrees
.
We assume that
is given as
with
monic irreducible of degree
. Then
may be
computed in time
Proof. We adapt the proof of Theorem 6.2
and use (7.2) for the value of
instead of (6.1). Here, it is important for
to be fixed, in order to benefit from the same kind of
asymptotic expansions as in the case of Theorem 6.2:
and
In particular still tends to
for large values of
.
The main change is the use of Proposition 7.2 to obtain the following cost for step 5 of Algorithm 6.1:
By using (6.4), (6.5), (6.6), and
the fact that , the latter
cost is bounded by
We then need to upper bound the term
Now for sufficiently large ,
we have
It follows that
On the other hand, we have
which concludes the proof.
For small fixed values of ,
Theorem 7.4 therefore improves upon Theorem 6.2.
This happens roughly whenever
,
which rewrites into
. A more
precise complexity analysis in terms of the parameter
could be developed under the latter condition.
An important application of the present results concerns polynomial system solving, for which we prove new complexity bounds in [28]: the key algorithms are the Kronecker solver [14] and fast multivariate modular composition. For the latter problem, we mostly follow the strategy deployed in the proof of Proposition 5.7.
Besides technical adaptations to Turing machines and various refinements within the asymptotic complexity bounds, our main improvements upon Kedlaya and Umans' algorithms concern complexity analyses in terms of the total degree, and the way we appeal to the naive evaluation algorithm as a fallback in Theorem 3.10. In particular, our complexity bounds are quasi-optimal with respect to the bit size of the elements in the ground ring.
Another major motivation behind our work is to understand how relevant the new complexity bounds are for practical implementations. Unfortunately, the input sizes for which our optimized variants of Kedlaya and Umans' algorithms become faster than previous approaches still seem to be extremely large. It is instructive to discuss, even in very informal terms, the orders of magnitude of the cross-over points.
In the univariate case over
, fast algorithms for multi-point evaluation
allow for an average evaluation cost per point of
, by means of the classical subproduct tree technique [13, chapter 10], as soon as the number of points is of the order of the degree (see the sketch below). In the same spirit, in order to
minimize the average cost per evaluation point, Theorem 3.10
indicates that it is favorable to take
of the
order of
(which is larger than the cardinality of
the support of
).
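To make these orders of magnitude concrete, here is a pedagogical Python sketch of the remainder-tree evaluation scheme over ℤ/pℤ. It uses naive quadratic polynomial arithmetic and recomputes the subproducts at every level, so it only illustrates the structure; the quasi-linear average cost per point quoted above requires fast multiplication and division with remainder, together with a precomputed subproduct tree, as in [13, chapter 10].

```python
def poly_mul(a, b, p):
    """Multiply two polynomials given as coefficient lists (low degree first) modulo p."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    return res

def poly_rem(a, b, p):
    """Remainder of a modulo b over Z/pZ (p prime, so the leading coefficient is invertible)."""
    a = a[:]
    inv = pow(b[-1], p - 2, p)  # inverse of the leading coefficient of b
    while len(a) >= len(b) and any(a):
        c = (a[-1] * inv) % p
        shift = len(a) - len(b)
        for i, bi in enumerate(b):
            a[shift + i] = (a[shift + i] - c * bi) % p
        while a and a[-1] == 0:
            a.pop()
    return a or [0]

def multipoint_eval(f, points, p):
    """Evaluate f at all points by reducing f along a subproduct (remainder) tree."""
    if len(points) == 1:
        return [poly_rem(f, [-points[0] % p, 1], p)[0]]
    mid = len(points) // 2
    left, right = points[:mid], points[mid:]
    # Subproducts prod (x - a) over each half; a balanced tree would precompute these.
    m_left = [1]
    for a in left:
        m_left = poly_mul(m_left, [-a % p, 1], p)
    m_right = [1]
    for a in right:
        m_right = poly_mul(m_right, [-a % p, 1], p)
    return (multipoint_eval(poly_rem(f, m_left, p), left, p)
            + multipoint_eval(poly_rem(f, m_right, p), right, p))

# Example over Z/101Z: f = x^2 + 1 evaluated at the points 0, 1, 2, 3.
assert multipoint_eval([1, 0, 1], [0, 1, 2, 3], 101) == [1, 2, 5, 10]
```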
To simplify the discussion, we discard the factor
occurring in Theorem 3.10, and we focus on the case when
and
.
So when
,
the average cost per point roughly becomes
.
Recall that this average cost is
with the naive approach. The ratio between both bounds is therefore of
the order
.
In Table 8.1, we report the corresponding efficiency threshold orders.

Table 8.1. Efficiency threshold orders for the complexity bound of Theorem 3.10.
The above factor corresponds to the value
in Algorithm 3.3. In practice it turns
out to be more interesting to use smaller values for
. In fact, it is worth using Algorithm 3.3
with
whenever
,
since
and the cost given in Proposition 3.7 drops to
.
Table 8.2 reports the analogous efficiency threshold orders in this case.

Table 8.2. Efficiency threshold orders for the complexity bound of Proposition 3.7 with
In small positive characteristic ,
Proposition 7.2 seems more promising for practical
purposes. For
, the average
cost per evaluation point drops to
.
The efficiency ratio with respect to the naive algorithm is thus of the
order
. For instance, with
and
,
this ratio becomes
.
Consequently, Algorithm 7.2 might be relevant in practice
for input data of a few kilobytes. However, we are not aware of any such efficient implementation yet.
For the above reasons, faster software implementations of multivariate
multi-point evaluation and modular composition remain major challenges.
At the very least, we hope that our new detailed complexity bounds will stimulate
more theoretical and practical research in this area. For example, is it
possible to decrease the contribution of in
Theorem 3.10? Or, still in Theorem 3.10, could
one decrease the exponent
of
? Is it possible to improve upon the constant
in Theorem 6.2?
A. Arnold, M. Giesbrecht, and D. S. Roche. Faster sparse multivariate polynomial interpolation of straight-line programs. J. Symbolic Comput., 75:4–24, 2016.
J. Barkley Rosser and L. Schoenfeld. Approximate formulas for some functions of prime numbers. Ill. J. Math., 6:64–94, 1962.
D. J. Bernstein. Composing power series over a finite ring in essentially linear time. J. Symbolic Comput., 26(3):339–341, 1998.
L. I. Bluestein. A linear filtering approach to the computation of discrete Fourier transform. IEEE Transactions on Audio and Electroacoustics, 18(4):451–455, 1970.
A. Bostan, P. Gaudry, and É. Schost. Linear recurrences with polynomial coefficients and application to integer factorization and Cartier–Manin operator. SIAM J. Comput., 36:1777–1806, 2007.
A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.
A. Bostan and É. Schost. Polynomial evaluation and interpolation on special sets of points. J. Complexity, 21(4):420–446, 2005.
R. P. Brent. Fast multiple-precision evaluation of elementary functions. J. ACM, 23(2):242–251, 1976.
R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.
D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Infor., 28:693–701, 1991.
E. Costa and D. Harvey. Faster deterministic integer factorization. Math. Comp., 83(285):339–345, 2014.
P. Dusart. Estimates of some functions over primes without R.H. Technical report, ArXiv, 2010. https://arxiv.org/abs/1002.0442.
J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 3rd edition, 2013.
M. Giusti, G. Lecerf, and B. Salvy. A Gröbner free alternative for polynomial system solving. J. Complexity, 17(1):154–211, 2001.
D. Harvey and J. van der Hoeven. Faster integer and polynomial multiplication using cyclotomic coefficient rings. Technical report, ArXiv, 2017. http://arxiv.org/abs/1712.03693.
D. Harvey and J. van der Hoeven. Faster integer multiplication using plain vanilla FFT primes. Math. Comp., 88(315):501–514, 2019.
D. Harvey and J. van der Hoeven. Faster integer multiplication using short lattice vectors. In ANTS XIII. Proceedings of the Thirteenth Algorithmic Number Theory Symposium, volume 2 of The Open Book Series, pages 293–310. Mathematical Sciences Publishers, 2019.
D. Harvey, J. van der Hoeven, and G. Lecerf. Even faster integer multiplication. J. Complexity, 36:1–30, 2016.
D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.
J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34(6):479–542, 2002.
J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, Proceedings of the 2004 international symposium on Symbolic and algebraic computation, ISSAC '04, pages 290–296, New York, NY, USA, 2004. ACM.
J. van der Hoeven. Fast composition of numeric power series. Technical Report 2008-09, Université Paris-Sud, Orsay, France, 2008.
J. van der Hoeven. Faster Chinese remaindering. Technical report, CNRS & École polytechnique, 2016. http://hal.archives-ouvertes.fr/hal-01403810.
J. van der Hoeven and G. Lecerf. On the bit-complexity of sparse polynomial and series multiplication. J. Symbolic Comput., 50:227–254, 2013.
J. van der Hoeven and G. Lecerf. Composition modulo powers of polynomials. In M. Burr, editor, Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, pages 445–452, New York, NY, USA, 2017. ACM.
J. van der Hoeven and G. Lecerf. Modular composition via complex roots. Technical report, CNRS & École polytechnique, 2017. http://hal.archives-ouvertes.fr/hal-01455731.
J. van der Hoeven and G. Lecerf. Modular composition via factorization. J. Complexity, 48:36–68, 2018.
J. van der Hoeven and G. Lecerf. On the complexity exponent of polynomial system solving. https://hal.archives-ouvertes.fr/hal-01848572, 2018.
J. van der Hoeven and É. Schost. Multi-point evaluation in higher dimensions. Appl. Alg. Eng. Comm. Comp., 24(1):37–52, 2013.
Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.
F. Johansson. A fast algorithm for reversion of power series. Math. Comp., 84:475–484, 2015.
E. Kaltofen and V. Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 184–188, New York, NY, USA, 1997. ACM.
K. S. Kedlaya and C. Umans. Fast modular composition in any characteristic. In FOCS'08: IEEE Conference on Foundations of Computer Science, pages 146–155, Washington, DC, USA, 2008. IEEE Computer Society.
K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.
G. Lecerf. On the complexity of the Lickteig–Roy subresultant algorithm. J. Symbolic Comput., 92:243–268, 2019.
M. Nüsken and M. Ziegler. Fast multipoint evaluation of bivariate polynomials. In S. Albers and T. Radzik, editors, Algorithms – ESA 2004. 12th Annual European Symposium, Bergen, Norway, September 14-17, 2004, volume 3221 of Lect. Notes Comput. Sci., pages 544–555. Springer Berlin Heidelberg, 2004.
D. Panario and D. Thomson. Efficient pth root computations in finite fields of characteristic p. Des. Codes Cryptogr., 50(3):351–358, 2009.
C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J. Comput., 2(1):60–66, 1973.
P. Ritzmann. A fast numerical algorithm for the composition of power series with complex coefficients. Theoret. Comput. Sci., 44:1–16, 1986.
A. Schönhage. Schnelle Berechnung von Kettenbruchentwicklungen. Acta Informatica, 1(2):139–144, 1971.
A. Schönhage, A. F. W. Grotefeld, and E. Vetter. Fast algorithms: A multitape Turing machine implementation. B. I. Wissenschaftsverlag, Mannheim, 1994.
V. Shoup. New algorithms for finding irreducible polynomials over finite fields. Math. Comp., 54(189):435–447, 1990.
V. Shoup. Searching for primitive roots in finite fields. Math. Comp., 58:369–380, 1992.