Nowadays, asymptotically fast algorithms are widely used in computer algebra for computations in towers of algebraic field extensions of small height. Yet it is still unknown how to reach softly linear time for products and inversions in towers of arbitrary height. In this paper we design the first algorithm for general ground fields with a complexity exponent that can be made arbitrarily close to one from the asymptotic point of view. We deduce new faster algorithms for changes of tower representations, including the computation of primitive element representations in subquadratic time.
Let $\mathbb{A}$ be an effective commutative ring with unity. Here, effective means that elements of $\mathbb{A}$ can be represented by concrete data structures and that algorithms are available for the ring operations, including the zero test. Effective fields are defined in a similar way.
Given a monic polynomial $\mu_1 \in \mathbb{A}[x_1]$ of degree $d_1$, the quotient ring $\mathbb{A}_1 := \mathbb{A}[x_1]/(\mu_1(x_1))$ is an effective ring and fast algorithms exist for the arithmetic operations in $\mathbb{A}_1$. More precisely, counting in terms of operations in $\mathbb{A}$, additions in $\mathbb{A}_1$ require $d_1$ additions in $\mathbb{A}$, whereas multiplications can be done [15] in almost linear time $O(\mathsf{M}(d_1))$. Here $\mathsf{M}(n)$ is a standard notation for the cost of multiplying two univariate polynomials of degree $< n$: see section 2.2.
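To make these quotient ring operations concrete, here is a minimal Python sketch, assuming a small prime field GF(101) as a stand-in for a general effective ring and using schoolbook products rather than the asymptotically fast algorithms cited above; all names and parameters are illustrative.

p = 101                      # a small prime field GF(p), for illustration only

def poly_mul(f, g):
    """Schoolbook product of two polynomials over GF(p), coefficients low degree first."""
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] = (h[i + j] + fi * gj) % p
    return h

def poly_rem(f, mu):
    """Remainder of f modulo the monic polynomial mu, over GF(p)."""
    f = f[:]
    d = len(mu) - 1
    for i in range(len(f) - 1, d - 1, -1):
        c = f[i]
        if c:
            for j in range(d + 1):
                f[i - d + j] = (f[i - d + j] - c * mu[j]) % p
    return f[:d]

def mulmod(a, b, mu):
    """Product of a and b in the quotient ring GF(p)[x]/(mu)."""
    return poly_rem(poly_mul(a, b), mu)

# Example: GF(101)[x]/(x^2 + 1), i.e. adjoining a square root of -1 to GF(101).
mu = [1, 0, 1]               # x^2 + 1
print(mulmod([3, 5], [7, 2], mu))   # [11, 41], i.e. (3 + 5x)(7 + 2x) = 11 + 41x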
Given another monic polynomial $\mu_2 \in \mathbb{A}_1[x_2]$, we may build the effective ring $\mathbb{A}_2 := \mathbb{A}_1[x_2]/(\mu_2(x_2))$, and fast arithmetic operations in $\mathbb{A}_2$ are available for the same reason. Doing so inductively, we obtain a tower of extensions $(\mathbb{A}_i)_{0 \leq i \leq t}$ of $\mathbb{A}_0 := \mathbb{A}$, with $\mathbb{A}_i = \mathbb{A}_{i-1}[x_i]/(\mu_i(x_i))$ for $i = 1, \ldots, t$. Such a tower is written $(\mathbb{A}_i)_{i \leq t}$ and we call $t$ its height. Throughout this paper, we write $\alpha_i$ for the class of $x_i$ in $\mathbb{A}_i$, we let $d_i := \deg \mu_i$, and $d := d_1 \cdots d_t$. The $\mu_i$ are called the defining polynomials of the tower and $d$ its degree. Elements of $\mathbb{A}_i$ are naturally represented by univariate polynomials in $x_i$ over $\mathbb{A}_{i-1}$ of degree $< d_i$. If all the $\mathbb{A}_i$ are fields, then we write $\mathbb{K}_i$ instead of $\mathbb{A}_i$ and $\mathbb{K}$ instead of $\mathbb{A}$, for convenience.
Towers of this type naturally arise in several contexts of applied algebra, including cryptography (for instance in [2, 3, 18]), error correcting codes (for instance in [26]), and the resolution of differentially algebraic systems in the more general setting of triangular sets (for instance in [1, 12]).
By induction over $t$, basic arithmetic operations in $\mathbb{A}_t$ can be performed naively in time $C^t d^{1+o(1)}$, for some constant $C > 1$. But it is well known that, for a sufficiently large constant $K$, these operations can actually be carried out in time $O(K^t \mathsf{M}(d))$ by means of Kronecker substitution (see for instance [27, Chapter 8]). However, if many of the individual degrees $d_i$ are small, then $K^t$ can become as large as a fixed power of $d$, which leads to an overall complexity bound of the form $O(d^{\kappa})$ with $\kappa > 1$. The main aim of this paper is to prove the sharper bound $O(d^{1+\epsilon})$, for any fixed $\epsilon > 0$, in the case when $\mathbb{A}$ is a field and the $\mu_i$ are irreducible and separable. The top level complexity result is stated in Corollary 4.11.
In order to simplify the presentation of complexity bounds, we often use the soft-Oh notation: $f(n) \in \tilde{O}(g(n))$ means that $f(n) = g(n) (\log g(n))^{O(1)}$; see [27, Chapter 25, section 7] for technical details. The least integer larger or equal to $x$ is written $\lceil x \rceil$; the largest one smaller or equal to $x$ is written $\lfloor x \rfloor$. The $\mathbb{A}$-module of polynomials of degree $< n$ is written $\mathbb{A}[x]_{<n}$.
One essential tool for our new algorithms is modular composition. Before returning to our main problem, let us recall several useful existing results on this problem and various related topics.
Let $\mu \in \mathbb{K}[x]$ be as above and assume that $\mathbb{K}$ is a field, whence $\mathbb{K}[x]/(\mu(x))$ is a $\mathbb{K}$-vector space of dimension $d$. Given $f, g \in \mathbb{K}[x]$ of degrees $< d$, the computation of the remainder of the Euclidean division of $f(g(x))$ by $\mu(x)$ is called the problem of modular composition. It is equivalent to the problem of evaluating $f$ at the point $\alpha = g(x) \bmod \mu(x)$ of $\mathbb{K}[x]/(\mu(x))$. Currently, the best known solution to this problem is due to Brent and Kung [13, 56] and requires $O(d^{\omega_2/2})$ operations in $\mathbb{K}$. Here the constant $\omega_2$ denotes an exponent such that an $n \times n$ square matrix can be multiplied with an $n \times n^2$ rectangular matrix using $O(n^{\omega_2})$ operations in $\mathbb{K}$. Huang and Pan showed in [40] that one may take $\omega_2 < 3.34$. Brent and Kung's algorithm is based on the baby-step giant-step technique [50]; we will recall and generalize it in section 3.
For a fixed point $\alpha \in \mathbb{A} := \mathbb{K}[x]/(\mu(x))$, the evaluation map $\operatorname{Ev}_\alpha : \mathbb{K}[y]_{<d} \to \mathbb{A},\ f \mapsto f(\alpha)$ is $\mathbb{K}$-linear. The dual or transposed map is given by $\operatorname{Ev}_\alpha^* : \mathbb{A}^* \to (\mathbb{K}[y]_{<d})^*$ and sends a $\mathbb{K}$-linear functional $\ell : \mathbb{A} \to \mathbb{K}$ to the vector $(\ell(1), \ell(\alpha), \ldots, \ell(\alpha^{d-1}))$. The computation of this transposed map is called the problem of modular power projection. Using the transposition principle (see [10] and historical references therein), the costs of power projection and modular composition are essentially the same. In particular, the computation of the traces $\operatorname{Tr}(\alpha^i)$ for $i = 0, \ldots, d-1$ corresponds to one modular power projection.
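As an illustration only, the following naive Python sketch (reusing p and mulmod from the previous sketch, under the same toy assumptions) computes a power projection by explicitly walking through the powers of the point; the transposition principle discussed above obtains the same output for essentially the cost of one modular composition.

def power_projection(ell, a, mu, n):
    """Return ell(1), ell(a), ..., ell(a^(n-1)) for a linear functional ell on
    GF(p)[x]/(mu), given by its values on the basis 1, x, ..., x^(d-1)."""
    d = len(mu) - 1
    out, power = [], [1] + [0] * (d - 1)          # power = the element 1
    for _ in range(n):
        out.append(sum(li * ci for li, ci in zip(ell, power)) % p)
        power = mulmod(power, a, mu)              # next power of a
    return out

# Example: in GF(101)[x]/(x^2 + 1), project the powers of a = 3 + 5x onto the
# functional ell(c0 + c1*x) = c1.
print(power_projection([0, 1], [3, 5], [1, 0, 1], 4))   # [0, 5, 30, 10]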
Given an element $a \in \mathbb{A} = \mathbb{K}[x]/(\mu(x))$, the characteristic polynomial $\chi_a \in \mathbb{K}[z]$ of $a$ is the characteristic polynomial of the endomorphism of multiplication by $a$ in $\mathbb{A}$. If $\mathbb{K}$ is a field, then this polynomial can be computed in softly quadratic time by means of a resultant (see for instance [45, Corollary 29]). Shoup has also designed a practical randomized algorithm [56] to compute minimal polynomials for the case when $\mathbb{K}$ has sufficiently many elements.
The fastest known method to compute the characteristic polynomial $\chi_a$ of an element $a \in \mathbb{A}$ over $\mathbb{K}$ proceeds as follows: first compute the traces $\operatorname{Tr}(a^i)$ of the powers of $a$ for all $i \leq d$ using modular power projection, and then recover $\chi_a$ by integrating the differential equation
(1.1)  $\widehat{\chi}_a'(z) = -\left(\sum_{i \geq 0} \operatorname{Tr}(a^{i+1})\, z^i\right) \widehat{\chi}_a(z)$
where $\widehat{\chi}_a$ represents the reverse polynomial of $\chi_a$. This integration requires the inverses of $1, \ldots, d$ in $\mathbb{K}$ to be at our disposal. Historically, this method goes back to Le Verrier [43], and formula (1.1), often called the Newton–Girard formula, expresses the relationship between symmetric and power sum polynomials. The integration step takes softly linear time: the seminal contributions are due to Brent and Kung [13], generalizations can be found in [8], the first efficient extension to finite fields was designed in [9], and we refer the reader to [29, section 2] for a concise proof.
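To make the Newton–Girard identities behind (1.1) concrete, here is a small Python sketch that recovers the elementary symmetric functions, and hence the characteristic polynomial, from the power sums; it assumes a prime field whose characteristic exceeds the degree, and uses a quadratic loop instead of the softly linear power series integration discussed above.

q = 101                      # base field GF(q); we need the inverses of 1, ..., d

def char_poly_from_power_sums(psums):
    """Given p_1, ..., p_d (sums of the i-th powers of the roots), return the
    elementary symmetric functions e_0, ..., e_d via the Newton-Girard identities
    k*e_k = sum_{i=1..k} (-1)^(i-1) * e_{k-i} * p_i."""
    d = len(psums)
    e = [1] + [0] * d
    for k in range(1, d + 1):
        acc = 0
        for i in range(1, k + 1):
            sign = 1 if i % 2 == 1 else -1
            acc = (acc + sign * e[k - i] * psums[i - 1]) % q
        e[k] = acc * pow(k, q - 2, q) % q         # division by k in GF(q)
    return e

# The characteristic polynomial is then z^d - e_1 z^(d-1) + e_2 z^(d-2) - ...
# Example: roots {2, 3} give p_1 = 5, p_2 = 13, hence e_1 = 5 and e_2 = 6.
print(char_poly_from_power_sums([5, 13]))         # [1, 5, 6]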
Recall that an element $a \in \mathbb{K}_t$ is said to be primitive over $\mathbb{K}$ if $\mathbb{K}[a] = \mathbb{K}_t$. The primitive element theorem states that such an element always exists in the separable case: if $\mathbb{K}$ contains sufficiently many elements, then it suffices to take a sufficiently generic $\mathbb{K}$-linear combination of the $\alpha_1, \ldots, \alpha_t$. In this light, it may seem odd at first sight to work with towers of height $t > 1$, since an isomorphic tower of height one always exists. The problem is that both the computation of primitive elements and conversions between representations are usually expensive.
Concerning primitive elements, it is currently not known how to efficiently find one, together with its minimal polynomial $\chi$ and polynomials $g_1, \ldots, g_t$ such that $\alpha_i = g_i(a)$ for $i = 1, \ldots, t$. In fact, naive algorithms mostly boil down to linear algebra in dimension $d$, and thus feature at least quadratic costs. One may for instance appeal to the FGLM algorithm to change orderings of Gröbner bases [21, 22].
If a modular composition algorithm (technically, for the multivariate version of section 3.2) of softly linear complexity were known over any field, then primitive element representations of towers would be computable in softly linear expected time by a randomized algorithm; conversions between towers and their primitive element representations would also become possible in softly linear time. These aspects have been studied by Poteaux and Schost in [51, 52], where subquadratic costs are achieved for multivariate modular composition and its transpose, as well as for computing primitive representations of towers.
Let $\epsilon$ represent a fixed positive rational value. If $\mathbb{K} = \mathbb{F}_q$ is the finite field with $q$ elements, then a major result by Kedlaya and Umans [42] states that modular composition and power projection are possible in time $O(d^{1+\epsilon} \log^{1+o(1)} q)$. Under suitable conditions on $q$ and $d$, this in particular implies that characteristic polynomials can be obtained with expected bit complexity $O(d^{1+\epsilon} \log^{1+o(1)} q)$: see [29, sections 2.3–2.5] for details. Based on these results, Poteaux and Schost have derived [51] fast algorithms for multiplication, division, traces and primitive element representations for separable towers over finite fields. Unfortunately, very large constant factors are hidden in the complexities of all these algorithms, which currently prevent them from being of any practical relevance.
For some other particular cases of interest, fast algorithms also exist for modular composition and related problems. For fixed irreducible moduli of sufficiently smooth degree, practically faster algorithms for modular composition have been proposed in [39]. When $\mathbb{K}$ is a subfield of $\mathbb{C}$, we have also shown in [38] that modular composition can be achieved in quasi-linear time when computing with a sufficiently high numeric precision.
More direct approaches for algebraic tower arithmetic are usually based on computations modulo so-called "triangular sets". We may see the preimage of $\mu_i$ over $\mathbb{A}$ as a multivariate polynomial $T_i \in \mathbb{A}[x_1, \ldots, x_i]$ such that $T_i$ is monic in $x_i$, has degree $d_i$ in $x_i$, degrees $< d_j$ in $x_j$ for all $j < i$, and $\mathbb{A}_i = \mathbb{A}[x_1, \ldots, x_i]/(T_1, \ldots, T_i)$. The sequence $(T_1, \ldots, T_t)$ forms a special case of a triangular set. Triangular sets and decompositions have a long history that goes back to Ritt's contributions in differential algebra [53].
In this context of triangular sets, it was shown in [46] that two elements in $\mathbb{A}_t$ can be multiplied in time that is softly linear in $d$, up to a factor that grows exponentially in $t$. In [44, Proposition 2], Lebreton has proved a stronger bound of the same type. This result even applies if the ideal $(T_1, \ldots, T_t)$ is not radical, under the condition that $T_1, \ldots, T_t$ form a regular chain (in the sense of Kalkbrener [41]; see also [1] for generalities about triangular sets). For triangular sets of certain quite special types, we notice that even faster multiplication algorithms have been designed in [7]. When working over finite fields, practically efficient algorithms have also been proposed in [17], based on fast embeddings into towers of a specific type.
Another major occurrence of algebraic towers in the literature concerns "dynamic evaluation". This technique was introduced by Della Dora, Dicrescenzo and Duval [19, 20] as a way to compute with algebraic numbers without requiring algorithms for polynomial factorization. Basically, the idea is to compute in $\mathbb{A}_t$ as if it were a field, and cases are distinguished whenever the zero test of some element is requested: the "current tower" is then explicitly decomposed into a "direct product of two towers" such that the projections of this element in these two towers are respectively zero and invertible. The computations finally resume non-deterministically with each of the two new towers in the role of the current tower. For recent complexity results on this technique we refer the reader to [16].
The efficient computation with polynomials modulo zero-dimensional ideals defined by triangular sets is an important topic in applied algebra. Obtaining softly linear complexity bounds when working over a general field is an open problem. The previous best known bound due to Lebreton [44, section 3.2] is recalled in section 2.4.
A first technical contribution of ours concerns multivariate modular composition: in section 3.2 we design a new baby-step giant-step algorithm to evaluate a multivariate polynomial at a point of an algebra with the same kind of complexity as the Brent–Kung algorithm for the univariate case. It slightly improves upon the incremental method described in [52], which actually relies on the bivariate case. In fact, our method applies the baby-step giant-step paradigm over the whole representation of the algebra. Compared to [52, Lemma 4], the complexity bound given in Proposition 3.3 has no restrictions on the characteristic and is asymptotically smaller.
The main contribution of this paper is section 4, which is devoted to a new deterministic algorithm to multiply elements in towers of separable field extensions (all the $\mu_i$ are thus irreducible and separable). For the first time we achieve an asymptotic complexity with an exponent that becomes arbitrarily close to one. From the asymptotic complexity point of view, and in the specific case of towers of separable field extensions, this improves upon [44, 52], and also upon the randomized algorithms for finite fields from [51]. For specific towers over finite fields, some of the algorithms from [17] remain more efficient both in practice and in theory.
The main idea behind our algorithms is as follows: we use the primitive element theorem to replace a general tower of height $t$ by a new "accelerated" tower whose height we can control. There is a tradeoff between the cost of tower arithmetic (more expensive for larger heights) and primitive element computations (more expensive for smaller heights). By making a suitable compromise, we obtain our main complexity bound.
One shortcoming of our main result is that it only applies to towers of fields. Now if an algorithm is available for factoring univariate polynomials over $\mathbb{K}$ into irreducibles, then our results generalize to towers of separable integral extensions over $\mathbb{K}$, by "factoring" them into towers of separable field extensions. This reduction and the corresponding conversion algorithms (which generalize Chinese remaindering) are detailed in section 5. Some particular cases when the cost of factorization is often affordable are discussed in section 5.5.
Our final section 6 contains a general subquadratic Las Vegas complexity bound for computing primitive element representations of towers of separable field extensions, along with the related data for conversions: our algorithms directly benefit from the fast product, and they rely on power projections as in [11, 42, 51]. Compared to [52, Lemma 4], the bounds of section 6 are asymptotically smaller and come without restrictions on the characteristic. When working over a finite field, section 6 does not improve upon the asymptotic complexity bounds from [51]. Nevertheless, [51] relies on the Kedlaya–Umans algorithm for modular composition, so we expect our approach to behave better in practice.
Let us finally stress once more that we regard our contribution as a uniform way to prove almost optimal complexity bounds for tower arithmetic in the algebraic complexity model. It is well known that this model is reasonably close to bit complexity models when working over finite fields. Over other fields such as $\mathbb{Q}$, coefficient growth needs to be taken into account, which leads us beyond the scope of this paper. From a practical perspective, we also recall that efficient tower arithmetic has been developed over various specific kinds of base fields such as finite fields and subfields of $\mathbb{C}$: see [17, 38, 39] and section 5.5. It is true however that actual implementations of our own algorithms have fallen somewhat behind. We intend to address such implementation issues in the near future.
This section gathers basic prerequisites on complexity models and polynomial arithmetic. In particular we recall Lebreton's results on multiplication in towers [44]. Then we examine the costs of divisions.
Let $\mathbb{A}$ be an effective ring. Our complexity analyses concern the algebraic complexity model [14, Chapter 4] over $\mathbb{A}$, and more precisely straight-line programs and computation trees. In other words, running times of algorithms are measured in terms of the required number of ring operations in $\mathbb{A}$, and constants in $\mathbb{A}$ are thought to be freely at our disposal.
It is convenient to introduce abstract notations for the respective costs of a multiplication and of a division in $\mathbb{A}$ (whenever division is defined). For simplicity we always assume that multiplication is at least as expensive as addition, subtraction, and zero testing.
For randomized algorithms over a finite effective ring $\mathbb{A}$, we assume a special instruction that uniformly generates random elements in $\mathbb{A}$, with constant cost. For a given input, the cost of a randomized algorithm is thus a random variable. The expected cost for a given input size is defined as the maximum of the averages of these random variables over all possible inputs of that size.
Remark
Let $\mathbb{A}$ be an effective commutative ring with unity. We denote by $\mathsf{M}_{\mathbb{A}}(n)$ a cost function for multiplying two polynomials in $\mathbb{A}[x]$ of degree $< n$. For general $\mathbb{A}$, it has been shown in [15] that one may take
(2.1)  $\mathsf{M}_{\mathbb{A}}(n) = O(n \log n \log \log n)$.
For rings of positive characteristic, one has $\mathsf{M}_{\mathbb{A}}(n) = O(n \log n\, 4^{\log^* n})$ by [30], where $\log^*$ denotes the iterated logarithm. We make the following assumptions:
$\mathsf{M}_{\mathbb{A}}(n)/n$ is a non-decreasing function in $n$.
$\mathsf{M}_{\mathbb{A}}$ is sufficiently close to linear, in the sense that
(2.2)  $\mathsf{M}_{\mathbb{A}}(2n) \leq C\, \mathsf{M}_{\mathbb{A}}(n)$
holds whenever $n \geq 1$, for some universal constant $C$.
These assumptions hold for $\mathsf{M}_{\mathbb{A}}$ as in (2.1). Notice that (2.2) is also equivalent to $\mathsf{M}_{\mathbb{A}}(O(n)) = O(\mathsf{M}_{\mathbb{A}}(n))$. Applying equation (2.2) a finite number of times, it follows that $\mathsf{M}_{\mathbb{A}}(kn) = O(\mathsf{M}_{\mathbb{A}}(n))$ for any fixed positive integer $k$.
If $\mathbb{B}$ is an effective $\mathbb{A}$-algebra that is also a finite dimensional free $\mathbb{A}$-module, then we denote by $\mathsf{M}_{\mathbb{B}}(n)$ a cost function for multiplying two polynomials in $\mathbb{B}[x]$ of degree $< n$, in terms of the number of operations in $\mathbb{A}$, and we make the same assumptions as for $\mathsf{M}_{\mathbb{A}}$. Notice that we may always take $\mathsf{M}_{\mathbb{A}} \leq \mathsf{M}_{\mathbb{B}}$.
In particular, if $\mu$ is a monic polynomial in $\mathbb{A}[x]$ of degree $m$, then $\mathbb{B} = \mathbb{A}[x]/(\mu(x))$ is such an $\mathbb{A}$-algebra of dimension $m$; elements in $\mathbb{B}$ are represented by polynomials in $\mathbb{A}[x]$ of degree $< m$. In this case, multiplications in $\mathbb{B}$ take $O(\mathsf{M}_{\mathbb{A}}(m))$ operations in $\mathbb{A}$, and we may take $\mathsf{M}_{\mathbb{B}}(n) = O(\mathsf{M}_{\mathbb{A}}(m n))$ by means of Kronecker substitution [27, Chapter 8].
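The following Python sketch (reusing p, poly_mul and poly_rem from the first sketch) illustrates Kronecker substitution for multiplying two polynomials over such an algebra $\mathbb{B} = \mathbb{A}[x]/(\mu)$: the packing stride $2d - 1$ guarantees that the coefficients of the product cannot overlap. It is a toy version over a small prime field, not an optimized implementation.

def kronecker_mul(F, G, mu):
    """Multiply F, G in B[y] with B = GF(p)[x]/(mu), via Kronecker substitution:
    pack with stride 2d - 1, perform one univariate product over GF(p), then
    slice the result back and reduce each slice modulo mu."""
    d = len(mu) - 1
    step = 2 * d - 1                              # packing stride: no overlaps
    def pack(P):
        out = [0] * (len(P) * step)
        for i, coef in enumerate(P):
            for j, c in enumerate(coef):
                out[i * step + j] = c
        return out
    prod = poly_mul(pack(F), pack(G))
    prod += [0] * ((-len(prod)) % step)           # pad to a whole number of slices
    slices = [prod[k:k + step] for k in range(0, len(prod), step)]
    return [poly_rem(s, mu) for s in slices][:len(F) + len(G) - 1]

# Example in (GF(101)[x]/(x^2 + 1))[y]:
mu = [1, 0, 1]
F = [[1, 2], [3, 0]]          # (1 + 2x) + 3y
G = [[0, 1], [1, 1]]          # x + (1 + x)y
print(kronecker_mul(F, G, mu))   # [[99, 1], [100, 6], [3, 3]]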
If $\mathbb{B}$ is a finitely generated $\mathbb{A}$-algebra, then an element $a \in \mathbb{B}$ is said to be primitive if $\mathbb{A}[a] = \mathbb{B}$. In this case, we write $\chi_a \in \mathbb{A}[z]$ for the monic minimal polynomial of $a$, so the following isomorphism holds: $\mathbb{B} \cong \mathbb{A}[z]/(\chi_a(z))$.
Notice that such a primitive element representation does not necessarily exist: consider $\mathbb{A} = \mathbb{Q}$ and $\mathbb{B} = \mathbb{Q}[x_1, x_2]/(x_1^2, x_2^2)$, for which every element admits a minimal polynomial of degree at most three.
A primitive element representation of the tower $(\mathbb{K}_i)_{i \leq t}$ consists of the following data:
A primitive element $a_i$ of $\mathbb{K}_i$ over $\mathbb{K}$, for $i = 1, \ldots, t$.
The minimal polynomial $\chi_i$ of $a_i$ over $\mathbb{K}$, for $i = 1, \ldots, t$.
Polynomials $g_{i,j} \in \mathbb{K}[z]$ such that $\alpha_j = g_{i,j}(a_i)$, for $i = 1, \ldots, t$ and $j = 1, \ldots, i$.
These data induce the following isomorphisms for $i = 1, \ldots, t$: $\mathbb{K}_i \cong \mathbb{K}[z]/(\chi_i(z))$, with $a_i$ corresponding to the class of $z$.
The polynomials $g_{i,j}$ are called parametrizations of the $\alpha_j$ in terms of the $a_i$.
With the triangular set $(T_1, \ldots, T_t)$ as defined in the introduction, an element of $\mathbb{K}_i$ is represented by a polynomial in $\mathbb{K}[x_1, \ldots, x_i]$, defined modulo $(T_1, \ldots, T_i)$. So the conversion of such an element into an element of $\mathbb{K}[z]/(\chi_i(z))$ boils down to evaluating this polynomial at $(g_{i,1}(a_i), \ldots, g_{i,i}(a_i))$ modulo $\chi_i$. The backward conversion consists in evaluating a univariate polynomial at $a_i$ in the original representation of $\mathbb{K}_i$. Such conversions are the topic of section 3 below.
Remark
Under the assumptions of section 2.2, recall that there exists a constant $K$ such that $\mathsf{M}_{\mathbb{A}_i}(n) \leq K\, \mathsf{M}_{\mathbb{A}_{i-1}}(d_i n)$ for $i = 1, \ldots, t$, where $\mathsf{M}_{\mathbb{A}_i}$ represents a cost function for multiplying two polynomials in $\mathbb{A}_i[x]$ of degree $< n$, in terms of the number of operations in $\mathbb{A}$. When using this bound in an iterated fashion, we obtain $\mathsf{M}_{\mathbb{A}_t}(n) \leq K^t\, \mathsf{M}_{\mathbb{A}}(d\, n)$.
By taking care of the cost of polynomial divisions in terms of multiplications, one may prove the following sharper bounds:
Proof. Taking , Lebreton proved that ; see [44, section 3.2]. His proof was done for the particular cost function . For completeness, we briefly repeat this proof, for a general cost function such that is non-decreasing. We recall that two polynomials of partial degrees in the can be multiplied in time using Kronecker substitution recursively; see [27, Chapter 8] and [44, section 3.2].
We let represent the triangular set associated to the tower, as defined in section 1.3. We identify elements in the tower with their canonical representatives in . Let represent the reverse polynomial of in . Assume for now that we precomputed polynomials in , of partial degrees in the , such that
Let represent a cost function for reducing polynomials in of degrees in for modulo . The reduction of such a polynomial modulo is done recursively, as follows:
Let . If , then we reduce modulo and we are done.
Otherwise, let be the reverse of .
Reduce modulo , in time .
Compute , in time .
Reduce modulo , in time .
Let , compute , and then reduce modulo . It turns out that equals modulo ; see for instance [27, Chapter 9, section 1].
Altogether, the cost of this algorithm is therefore bounded by
for some universal constant . We deduce that .
Now let and be the respective representatives of two elements and to be multiplied in the tower. The product takes , and is then reduced modulo , whence
For the second bound of the proposition, we may multiply two polynomials in as in with operations in , and then reduce their product coefficientwise modulo with cost , whence .
It remains to observe that once are known, the precomputation of using Newton's method takes operations in ; see [27, section 9.1], for instance. Therefore, the complexity bounds obtained so far remain valid up to a constant factor when taking the cost for obtaining the into account. Finally, our assumption for implies , so (2.2) leads to the claimed bounds.
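To make the nested data layout of such towers concrete, here is a self-contained Python sketch of naive recursive arithmetic over a toy prime field: an element of $\mathbb{A}_i$ is a list of $d_i$ elements of $\mathbb{A}_{i-1}$, and products are reduced level by level against the monic defining polynomials. The quasi-linear bounds above additionally rely on Kronecker substitution and on the precomputed reciprocals of the reversed defining polynomials, which this sketch deliberately omits.

p = 101

def zero_elt(level, tower):
    """The zero element of A_level, where tower = [mu_1, ..., mu_t]."""
    if level == 0:
        return 0
    return [zero_elt(level - 1, tower) for _ in range(len(tower[level - 1]) - 1)]

def add(a, b, level):
    if level == 0:
        return (a + b) % p
    return [add(x, y, level - 1) for x, y in zip(a, b)]

def neg(a, level):
    if level == 0:
        return (-a) % p
    return [neg(x, level - 1) for x in a]

def mul(a, b, tower):
    """Product of a and b in A_level, where level = len(tower)."""
    level = len(tower)
    if level == 0:
        return (a * b) % p
    sub, mu = tower[:-1], tower[-1]
    d, zero = len(mu) - 1, zero_elt(level - 1, tower)
    # schoolbook product as polynomials in x_level over A_{level-1}
    h = [zero] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            h[i + j] = add(h[i + j], mul(ai, bj, sub), level - 1)
    # reduction modulo the monic defining polynomial mu = mu_level
    for i in range(len(h) - 1, d - 1, -1):
        c = h[i]
        for j in range(d):
            h[i - d + j] = add(h[i - d + j], neg(mul(c, mu[j], sub), level - 1), level - 1)
        h[i] = zero                               # leading coefficient of mu is 1
    return h[:d]

# Example tower: GF(101) < K_1 = GF(101)[x1]/(x1^2 + 1) < K_2 = K_1[x2]/(x2^2 - x1).
mu1 = [1, 0, 1]                                   # coefficients in GF(101)
one, nil = [1, 0], [0, 0]                         # elements of K_1
mu2 = [[0, 100], nil, one]                        # x2^2 - x1, coefficients in K_1
x2 = [nil, one]                                   # the class of x2 in K_2
print(mul(x2, x2, [mu1, mu2]))                    # [[0, 1], [0, 0]], i.e. x1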
Remark
Remark
In the remainder of this section, we assume that we work in a tower of fields $(\mathbb{K}_i)_{i \leq t}$ over $\mathbb{K}$. Given $f, g \in \mathbb{K}_i[x]$ such that $g$ is monic, it is well known [27, Chapter 9] that the quotient and the remainder of the Euclidean division of $f$ by $g$ can be computed in time $O(\mathsf{M}_{\mathbb{K}_i}(\deg f))$.
It is important for us that the gcd algorithm with input $f, g \in \mathbb{K}[x]$ returns the monic polynomial $h = \gcd(f, g)$ along with cofactors $u, v$ such that $u f + v g = h$. In this way, if $h = 1$ then $u$ is the inverse of $f$ modulo $g$ and $v$ is the inverse of $g$ modulo $f$. It is well known [27, Chapter 11, Algorithm 11.6] that this extended gcd can be performed in time $O(\mathsf{M}_{\mathbb{K}}(n) \log n)$, where $n$ bounds the degrees of $f$ and $g$.
In fact, when replacing all polynomial divisions by pseudo-divisions in [27, Chapter 11], we avoid all divisions in $\mathbb{K}$, but the computed gcd is not necessarily monic. Multiplying $h$, $u$ and $v$ with the inverse of the leading coefficient of $h$, we do obtain a monic gcd, at the expense of a single division in $\mathbb{K}$. Summarizing, this shows that the extended monic gcd can actually be computed in time $O(\mathsf{M}_{\mathbb{K}}(n) \log n)$, plus one division in $\mathbb{K}$.
Now consider an extension $\mathbb{L} = \mathbb{K}[x]/(\mu(x))$ of $\mathbb{K}$, where $\mu$ is a monic irreducible polynomial of degree $m$. Then the above bounds for division and gcd computations yield a cost of $O(\mathsf{M}_{\mathbb{K}}(m) \log m)$ operations in $\mathbb{K}$, plus one division in $\mathbb{K}$, for inverting a non-zero element of $\mathbb{L}$.
When using the univariate algorithm for inverting non-zero elements in in a recursive manner, we obtain the following complexity bound:
Proof. Let and be universal constants with for all and for all extensions of as previously considered. Then we have
and we conclude by induction.
Let us mention that inversion modulo a general triangular set (that does not determine a prime ideal) is more subtle; see various approaches in [16, 49].
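The following Python sketch implements this inversion for the univariate level $\mathbb{K}[x]/(\mu)$ over a toy prime field, via the classical extended Euclidean algorithm; in a tower, the field operations it performs would themselves be carried out recursively in the lower levels, as in the induction above, and a half-gcd algorithm would bring the cost down to the softly linear bound quoted above.

p = 101

def poly_divmod(f, g):
    """Quotient and remainder of f by g over GF(p); g must be non-zero with an
    exact leading coefficient (no trailing zero entries)."""
    f = f[:]
    inv_lead = pow(g[-1], p - 2, p)
    q = [0] * max(len(f) - len(g) + 1, 1)
    for i in range(len(f) - len(g), -1, -1):
        c = f[i + len(g) - 1] * inv_lead % p
        q[i] = c
        if c:
            for j, gj in enumerate(g):
                f[i + j] = (f[i + j] - c * gj) % p
    return q, f[:len(g) - 1]

def invmod(a, mu):
    """Inverse of a modulo mu over GF(p), assuming deg a < deg mu and gcd(a, mu) = 1."""
    r0, r1, u0, u1 = mu[:], a[:], [0], [1]
    while len(r1) > 1 and r1[-1] == 0:            # normalize the input
        r1.pop()
    while any(r1):
        q, r = poly_divmod(r0, r1)
        while len(r) > 1 and r[-1] == 0:
            r.pop()
        t = [0] * max(len(u0), len(q) + len(u1) - 1)   # t = u0 - q*u1
        for i, c in enumerate(u0):
            t[i] = c
        for i, qi in enumerate(q):
            for j, uj in enumerate(u1):
                t[i + j] = (t[i + j] - qi * uj) % p
        r0, r1, u0, u1 = r1, r, u1, t
    c_inv = pow(r0[0], p - 2, p)                  # r0 is the constant gcd
    inv = [c * c_inv % p for c in u0]
    return inv + [0] * (len(mu) - 1 - len(inv))

# Example: the inverse of x modulo x^2 + 1 over GF(101) is -x.
print(invmod([0, 1], [1, 0, 1]))                  # [0, 100]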
Throughout this section, $\mathbb{A}$ represents an effective ring and $\mathbb{B}$ is an effective $\mathbb{A}$-algebra with a given basis. Given a polynomial $F \in \mathbb{A}[y]$ and a point $\alpha \in \mathbb{B}$, we study how to compute $F(\alpha)$ efficiently. We first recall the well known baby-step giant-step algorithm in the univariate case. In section 4 below, these evaluation algorithms will be used for conversions between triangular and primitive representations of the same algebra.
Given a polynomial $F \in \mathbb{A}[y]$ of degree $< n$ and $\alpha \in \mathbb{B}$, how can we evaluate $F$ efficiently at $\alpha$? Horner's method uses $O(n)$ additions and multiplications in $\mathbb{B}$. For the convenience of the reader, we now recall a well known faster algorithm from [50] that relies on the baby-step giant-step technique.
Algorithm
Output: .
Proof. By construction, we have for , whence . This proves the correctness of the algorithm. Step 2a requires multiplications in , when computing the powers using . Step 2b takes assignments in , which come for free in our complexity model. The matrix multiplication in step 3 can be done in time . Step 5 involves multiplications and additions in , when using Horner's method. Altogether, this leads to the claimed running time.
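As a concrete illustration of the baby-step giant-step paradigm, here is a Python sketch of univariate modular composition over the toy prime field of the earlier sketches (it reuses p and mulmod); the matrix product of the algorithm is unrolled into explicit inner sums, so the sketch reproduces the method but not the precise complexity bound.

from math import isqrt

def compose_mod(F, a, mu):
    """Evaluate F (a univariate polynomial over GF(p), low degree first) at the
    point a of GF(p)[x]/(mu), using baby steps 1, a, ..., a^(m-1) and a Horner
    scheme in the giant step a^m, with m close to sqrt(deg F)."""
    d, n = len(mu) - 1, len(F)
    m = isqrt(n - 1) + 1 if n > 1 else 1
    powers = [[1] + [0] * (d - 1)]                # baby steps a^0, ..., a^(m-1)
    for _ in range(m):
        powers.append(mulmod(powers[-1], a, mu))
    giant = powers[m]                             # the giant step a^m
    blocks = []                                   # inner sums F_i(a), with deg F_i < m
    for i in range((n + m - 1) // m):
        acc = [0] * d
        for j in range(m):
            if i * m + j < n:
                c = F[i * m + j]
                acc = [(x + c * y) % p for x, y in zip(acc, powers[j])]
        blocks.append(acc)
    result = blocks[-1]                           # Horner scheme in a^m
    for blk in reversed(blocks[:-1]):
        result = [(x + y) % p for x, y in zip(mulmod(result, giant, mu), blk)]
    return result

# Example: F(y) = y^2 + 1 evaluated at a = x in GF(101)[x]/(x^2 + 1) gives 0.
print(compose_mod([1, 0, 1], [0, 1], [1, 0, 1]))  # [0, 0]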
Remark
Let us now study the evaluation of a multivariate polynomial $F \in \mathbb{A}[y_1, \ldots, y_t]$ of partial degree $< d_i$ in each $y_i$ at a point $(\alpha_1, \ldots, \alpha_t) \in \mathbb{B}^t$. Setting $n := d_1 \cdots d_t$ and writing $F_{i_1, \ldots, i_t}$ for the coefficient of the monomial $y_1^{i_1} \cdots y_t^{i_t}$ in $F$, we have the following generalization of Algorithm 3.1 to multivariate polynomials.
Algorithm
Output: .
Proof. This time, for all , we have
Plugging this expression into the formula of step 6 shows that the algorithm indeed returns the desired evaluation.
From we always have . If , then we obtain
and then
The lower bound on also implies
whence
Otherwise straightforwardly implies and . In all cases and are in . Consequently the complexity analysis is the same as for Algorithm 3.1. Of course, one has to carefully evaluate the power products in step 3b and the sum in step 6, in order to use only multiplications in .
Remark
Let $(\mathbb{K}_i)_{i \leq t}$ be a tower of separable algebraic field extensions of an effective base field $\mathbb{K} = \mathbb{K}_0$. Throughout this section, we assume that the extension degrees satisfy $d_i \geq 2$ for all $i$ (which is usually not restrictive, as explained in Remark 2.6), and that the multiplication cost function is as in Proposition 2.4. Our main objective is to design a fast algorithm for multiplying elements in the tower.
The basic idea is as follows. Let $\epsilon$ be an arbitrarily small positive constant. If at least half of the degrees $d_i$ are "sufficiently large", then the height $t$ automatically becomes "small enough" to ensure that the cost of Proposition 2.4 stays within $O(d^{1+\epsilon})$. Otherwise, there exist consecutive small degrees $d_{i+1}, \ldots, d_j$, and we replace the corresponding subtower of extensions between $\mathbb{K}_i$ and $\mathbb{K}_j$ by a primitive one $\mathbb{K}_i \subseteq \mathbb{K}_i[a_j] = \mathbb{K}_j$. We keep repeating these replacements until the height of the tower becomes "small enough". Some careful balancing is required here, since the computation of a primitive element for $\mathbb{K}_j$ over $\mathbb{K}_i$ can become expensive if the relative degree $d_{i+1} \cdots d_j$ gets "too large". The precise tradeoff will be made explicit at the end of the section.
In this section, given two effective rings and along with a natural way to convert elements in to elements in , we denote by the cost of such conversions. If we also have an algorithm for conversions in the opposite direction, then we write .
Assuming that $\mathbb{K}$ contains sufficiently many elements, the aim of this subsection is to construct a primitive element representation in the sense of Definition 2.2. We thus have to compute primitive elements $a_i$ over $\mathbb{K}$ such that $\mathbb{K}_i = \mathbb{K}[a_i]$ for $i = 1, \ldots, t$, together with the minimal polynomials $\chi_i$ of $a_i$ over $\mathbb{K}$. We also show how to convert efficiently between each of the two representations of $\mathbb{K}_i$.
Proof. The bound on the cost of conversions from to is a direct consequence of Proposition 3.3 by taking the assumption into account. For , the minimal polynomial of over may thus be converted into a minimal polynomial over in time . The computation of requires extra operations since for all .
For conversions from to , we consider a polynomial . We evaluate at in seen as with cost by Proposition 3.1. We thus obtain a polynomial such that .
We next need to convert the coefficients of from to . Using a straightforward induction this leads to the following cost:
Here we again use of the assumptions that and for all .
Remark
The next proposition provides us with complexity bounds for computing primitive tower representations by induction. We denote the cardinality of $\mathbb{K}$ by $|\mathbb{K}|$.
We can compute of the form with , and in time
Given and , we can compute with for in time
In addition, if then the computations in part a can be achieved with expected cost by means of a randomized Las Vegas algorithm.
Proof. Thanks to Proposition 4.1, modulo conversions of cost between and , we rewrite into such that , , . We thus have
Now let be a parameter that will be specified later, and consider . The characteristic polynomial of over equals to . Consequently the characteristic polynomial of over is -proportional to
The polynomial can be obtained as the preimage of whose computation costs by a standard “divide and conquer” approach; see [5, Lemma 7], for instance. Then the resultant may be computed in time by [45, Corollary 26].
Regarding as an indeterminate, the polynomial has degree . Therefore may be interpolated from values of . The total cost to obtain is thus
Geometrically speaking, the system admits the pairwise distinct solutions in , where denotes the algebraic closure of . The polynomial is separable if, and only if, its roots are pairwise distinct. There are at most values of for which this is not the case, so we simply need to try distinct values of until becomes separable. Testing the separability of for values of is achieved as follows.
We first evaluate for . Using fast multi-point evaluation, this takes time . We next test whether the discriminant of in vanishes, for ; this can be done in time . Doing this for packets of values, the overall computation can be done in time . This completes the proof of part a.
As to part b, for any root of the gcd of and has degree , so the specialization property of subresultants ensures that the subresultant in of degree of and has the form with coprime to . This subresultant can be computed in time again by [45, Corollary 26]. Since and have degrees , we obtain the polynomial modulo in with additional cost . Then from we deduce that . For , we then take . Proposition 3.1 implies that the cost of these modular compositions is bounded by . This completes the proof of part b.
If , then we use a Las Vegas algorithm for part a: we pick at random in a set of size until is separable. Each pick is successful with probability at least , so the expected cost is .
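The following sketch, in Python with sympy and over $\mathbb{Q}$ for readability, makes the resultant-based construction concrete in the simplest bivariate situation: it computes the characteristic polynomial of a candidate primitive element $\gamma = x_2 + \lambda x_1$ by eliminating $x_1$, and tests separability with a gcd. The actual algorithm works over $\mathbb{K}_{i-1}$ and interpolates in $\lambda$; the polynomials and the value of $\lambda$ below are merely examples.

from sympy import symbols, resultant, gcd, diff, expand

x1, x2, z = symbols('x1 x2 z')
lam = 1                                   # candidate value of lambda

mu1 = x1**2 - 2                           # K_1 = Q(sqrt(2))
mu2 = x2**2 - 3                           # K_2 = K_1(sqrt(3))

# Characteristic polynomial of gamma = x2 + lam*x1 over Q:
# substitute x2 = z - lam*x1 and eliminate x1 by a resultant.
chi = expand(resultant(mu1, mu2.subs(x2, z - lam*x1), x1))
print(chi)                                # z**4 - 10*z**2 + 1

# gamma is primitive iff chi is separable, i.e. gcd(chi, chi') = 1.
print(gcd(chi, diff(chi, z)))             # 1, so gamma = x2 + x1 is primitive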
Proof. This directly follows from the latter proposition, using that for all .
Remark
The main problem with the complexity bounds of Proposition 2.4 is that the height $t$ of the tower may get as large as $\log_2 d$. Now if $t$ indeed gets large, then many of the $d_i$ are necessarily small. Furthermore, as long as a product of the form $d_{i+1} \cdots d_j$ is reasonably small, the results from the previous subsection allow us to change the given representation of $\mathbb{K}_j$ over $\mathbb{K}_i$ into a more efficient primitive element representation. Repeating this operation as many times as necessary, we reduce the height of the tower and guarantee that all defining polynomials have sufficiently large degrees. This process is detailed in the next paragraphs.
A sequence of integers .
A tower such that for , represented by a sequence of defining polynomials , where and .
For , a primitive tower representation of seen as a tower over and ending with .
Denote . If and , then . If , then .
Proof. The construction of the sequence is done by induction. For with , assume that we have completed the construction up to height . We distinguish the following cases:
If then we set .
Otherwise, we take maximal such that . So either or .
Whenever we must have either or . Consequently the number of such indices is constrained by . It follows that . On the other hand the number of indices with is necessarily . It follows that . Once the indices are known, the existence of a -accelerated tower representation follows from Corollary 4.4.
Algorithm
Output. A -accelerated tower representation of .
In order to bound the execution time of Algorithm 4.1, we need to carefully analyze the cost of conversions between and for .
Proof. By Proposition 2.4, there exists a universal constant with
By Proposition 4.1, there also exists a universal constant such that conversions between and can be performed in time
for , and where we used the assumption that is non-decreasing in . If then such conversions are actually for free. Otherwise we have , whence
Now take and let us prove the lemma by induction over . For we have nothing to do, so we assume that . Using the induction hypothesis, conversions between and are performed in time
Consequently,
The lemma follows by induction.
In addition, if , then a randomized Las Vegas version of Algorithm 4.1 has expected cost .
Proof. The correctness of Algorithm 4.1 is ensured by Lemma 4.7. Let us analyze the cost of step 2a for a given . The necessary conversions for rewriting the defining polynomials of over can be done in time
by Lemma 4.8. If , then there is nothing to be done since . So assume that , whence . Then Proposition 2.7 gives
Consequently, in view of Corollary 4.4 and using assumption (2.2), the remainder of step 2a takes time
which dominates the cost of conversions since . The total cost for all is thus .
Finally, if , then the randomized variant from Corollary 4.4 leads to the claimed expected cost.
In addition, if a -accelerated tower representation of is known, then we have
Proof. The cost to obtain a -accelerated tower representation of is given in Proposition 4.9, namely .
Then, in order to multiply two elements in , we convert them into the accelerated representation, and multiply them with cost
by Proposition 2.4. We finally convert the product back into . By Lemma 4.8, each of these conversions can be done in time .
Proof. It is important to notice that constants hidden in the “” of Theorem 4.10 are independent of the value for , so we may freely set in terms of . Now taking balances the contributions of and . Indeed, since by Lemma 4.7, we have . We conclude by the latter theorem.
Remark
Up to now we have focused on towers of field extensions. In this section, we turn our attention to more general towers of separable integral ring extensions. Each extension ring is still assumed to be a product of separable field extensions over a common ground field , which will allow us to apply the theory from the previous sections.
We start with a description of the precise types of towers of rings that we will be able to handle. We next explain how such towers of rings can be “factored” into towers of fields and we analyze the cost of the corresponding conversions. Finally, we give a complexity bound for the computation of “tower factorizations”. We do not claim that our approach is efficient in all cases. Nevertheless it is expected to be so whenever the factorization process behaves as a pretreatment with negligible cost (or when factoring polynomials over is reasonably affordable; see section 5.5). See the conclusion of this paper for a discussion of alternative approaches.
Throughout this section, we still assume that the ground ring is a field, written $\mathbb{K}$, and we consider a tower $(\mathbb{A}_i)_{i \leq t}$ of integral ring extensions of $\mathbb{K}$. For $i = 1, \ldots, t$, we still let $\mu_i$ denote the defining polynomial of $\mathbb{A}_i$ over $\mathbb{A}_{i-1}$, we define $d_i$, $\alpha_i$, etc. as before, and we assume that $d_i \geq 2$. We say that $(\mathbb{A}_i)_{i \leq t}$ is a tower of separable integral ring extensions, or just separable in short, if
$\mathbb{A}_0$ is the field $\mathbb{K}$.
There exist polynomials $u_i$ and $v_i$ in $\mathbb{A}_{i-1}[x_i]$ such that $u_i \mu_i + v_i \mu_i' = 1$, for all $i = 1, \ldots, t$, where $\mu_i'$ denotes the derivative of $\mu_i$.
By induction over $i$, one may check that each $\mathbb{A}_i$ is isomorphic to a direct product of separable algebraic field extensions of $\mathbb{K}$. The projection of $\mu_i$ into each of these factors is separable in the usual sense, which means that the projection and its derivative are coprime, for all $i$ and all factors.
In terms of the triangular set $(T_1, \ldots, T_t)$ defined in the introduction, a separable tower corresponds to the situation where the ideal $(T_1, \ldots, T_t)$ is radical over the algebraic closure $\bar{\mathbb{K}}$ of $\mathbb{K}$. In particular, this means that this ideal, and therefore the tower, are uniquely determined by the variety
$V = \{ (\xi_1, \ldots, \xi_t) \in \bar{\mathbb{K}}^t : T_1(\xi_1) = \cdots = T_t(\xi_1, \ldots, \xi_t) = 0 \}.$
For each $i$, we notice that the variety attached to the subtower $(\mathbb{A}_j)_{j \leq i}$ is the projection of $V$ onto the first $i$ coordinates.
From a geometric point of view, one may regard computations with elements in $\mathbb{A}_t$ as computations with all the zeros in $V$ in parallel. Alternatively, we may regard them as computations with parameters $\alpha_1, \ldots, \alpha_t$ subject to the polynomial constraints $T_1 = \cdots = T_t = 0$.
Consider two separable towers and of the same height and over the same base field . Let and denote the respective defining polynomials for and . We say that is a factor of if . This is the case if and only if there exist natural projections (that naturally extend to projections in a coefficientwise manner) such that:
divides for .
sends an element represented by to , for .
We say that is irreducible if the are all fields or, equivalently, if the are all irreducible.
The notation “()” stands for the empty tuple. Let be index sets of -tuples of integers with and
for suitable integers and . Consider a family of factor towers of with the property that only depends on for all , and write . We say that such a family of factors forms a tree factorization of if the variety is partitioned into . We say that the factorization is irreducible, if all its factor towers are irreducible. If and , then we also write for the defining polynomial of over .
It is convenient to represent such a factorization by a labeled tree (see Figure 5.1 ): the nodes are identified with the index set , and each node is labeled with the algebraic extension . The parent of the node with is simply the node . Each individual factor corresponds to a path from the root to a leaf.
[Figure 5.1. The labeled tree attached to a tree factorization of a tower.]
Given and , let
Projecting the equality
on the first coordinates, we observe that the projection of , for given , is the same for all , and equals , whence . Consequently, forms a tree factorization of the subtower . From an algebraic point of view, this means that we have a natural isomorphism
For any , we denote by the natural projection of onto . Dually, the family forms a tree factorization of the tower , where
for . In particular, if , then this relation becomes
This corresponds to the factorization
where we recall that stands for the defining polynomial of for , whereas is the defining polynomial of .
If , then (5.1) reduces into the well known isomorphism
from the Chinese remainder theorem. We may thus view the isomorphism (5.1) as a generalized Chinese remainder theorem. The multi-modular representation of an element in is simply its image under this isomorphism. The aim of this section is to present efficient algorithms for conversions between the usual and the multi-modular representations.
For the usual Chinese remainder theorem, efficient algorithms are well known for carrying out the corresponding conversions [6, 23, 48]. These algorithms are based on the technique of so-called remainder trees, for which recent improvements can be found in [4, 10, 35]. In particular, if $n$ denotes the degree of the product of the moduli, then the conversions from (5.4) can be carried out in time $O(\mathsf{M}_{\mathbb{K}}(n) \log n)$; see for instance [27, Chapter 10]. These fast algorithms actually work in our context. This means that the isomorphism
from (5.2) and (5.3) can be computed with a similar complexity, whenever the inverses of the cofactors $\prod_{j \neq i} \nu_j$ modulo $\nu_i$ are precomputed for all $i$, where $\nu_1, \ldots, \nu_s$ denote the moduli. In fact, if $b_1, \ldots, b_s$ are elements of the respective factors with natural preimages $\hat{b}_1, \ldots, \hat{b}_s$, then the natural preimage of $(b_1, \ldots, b_s)$ can be computed as
(5.5)  $\sum_{i=1}^{s} c_i\, \hat{b}_i \prod_{j \neq i} \nu_j \bmod \nu_1 \cdots \nu_s$, where $c_i$ denotes the inverse of $\prod_{j \neq i} \nu_j$ modulo $\nu_i$,
using fast "linear combination for linear moduli" [27, Algorithm 10.9].
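The following Python sketch (reusing p, poly_mul, poly_rem and invmod from the earlier sketches) spells out this generalized Chinese remaindering in the basic case of coprime monic moduli over a prime field; it uses a direct, quadratic-time recombination, whereas the text reaches softly linear complexity with remainder trees and formula (5.5).

from functools import reduce

def crt_polys(residues, moduli):
    """Chinese remaindering for pairwise coprime monic moduli over GF(p):
    return the unique polynomial congruent to residues[i] modulo moduli[i]."""
    M = reduce(poly_mul, moduli)                  # product of all the moduli
    acc = [0] * (len(M) - 1)
    for a_i, m_i in zip(residues, moduli):
        # cofactor M_i = M / m_i, computed as the product of the other moduli
        M_i = reduce(poly_mul, [m for m in moduli if m is not m_i], [1])
        e_i = poly_mul(M_i, invmod(poly_rem(M_i, m_i), m_i))   # e_i = 1 mod m_i, 0 mod m_j
        term = poly_rem(poly_mul(a_i, e_i), M)
        acc = [(x + y) % p for x, y in zip(acc, term + [0] * len(acc))]
    return acc

# Example over GF(101): residues 5 modulo (x - 1) and 7 modulo (x - 2).
print(crt_polys([[5], [7]], [[100, 1], [99, 1]]))   # [3, 2], i.e. 3 + 2x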
The idea is to combine these “isomorphisms at nodes ” in order to compute the global isomorphism from (5.1). For the complexity analysis, it is convenient to introduce the following maximal normalized cost of multiplication in any of the factors of the tree factorization:
For the time being, we may use Proposition 2.4 as a complexity bound for . The conversion of elements in into elements in is called multi-modular reduction and we may use the following algorithm for this operation:
Algorithm
Output: .
Proof. The correctness is straightforward from the definitions. We perform step 4b using fast univariate multi-modular reduction. Let be a universal constant such that multi-modular reduction of a polynomial of degree over an arbitrary effective ring can be performed in time .
Let us show by induction over that the algorithm runs in time . For , the result is obvious, so assume that . By the induction hypothesis, step 3 runs in time . By the definition of , the contribution of step 4b to the complexity is bounded by
Adding up, it follows that the algorithm runs in time .
The opposite conversion of elements in into elements in is called Chinese remaindering; we may use the following algorithm for this operation:
Algorithm
Output: such that for all .
Assumption: modulo are precomputed for all , , and .
Proof. The proof is very similar to the one of Proposition 5.1. The polynomials modulo are actually used in the Chinese remaindering of step 2 via formula (5.5).
Factoring univariate polynomials over an arbitrary effective field is generally expensive or even impossible [24, 25]. Still, if an algorithm for this task is given, then it is interesting to study how to exploit it for the computation of an irreducible tree factorization of a given separable tower . Once such an irreducible tree factorization is known, arithmetic and other operations can potentially be sped up by switching to the multi-modular representation thanks to Algorithms 5.1 and 5.2.
For the complexity analysis, the function represents the time necessary to factor a polynomial in , and it is again convenient to introduce the following maximal normalized cost of factoring univariate polynomials over any of the fields in the factor towers:
Here we notice that each of the irreducible factors is a tower of separable fields, so we may directly use the accelerated arithmetic from section 4 for computations over any of the .
Algorithm
Output: the irreducible tree factorization of .
If , then return .
Recursively compute the irreducible tree factorization of .
Compute using Algorithm 5.1.
Return , where .
Proof. The cost of all factorizations in step 4a is bounded by
Together with recursive calls, the total cost of all factorizations is bounded by
Step 3 takes time , by Proposition 5.1, and the same step for the recursive calls amounts to a similar complexity. We conclude by adding up these contributions.
We are now ready to present a major corollary of this result; in combination with Algorithms 5.1 and 5.2, it reduces arithmetic in general separable towers to arithmetic in accelerated towers of fields.
Proof. This follows from Proposition 5.3 combined to Corollary 4.11, which yields .
For certain specific fields of coefficients , factorization of univariate polynomials over can be reasonably cheap. Let us briefly study two particular such cases.
Even if $\mathbb{K} = \mathbb{Q}$ or if $\mathbb{K}$ is an algebraic number field, it may be useful to temporarily convert coefficients in $\mathbb{K}$ into complex floating point numbers at a suitable bit precision, and to convert the results of computations back using rational reconstruction techniques [27, Chapter 5].
A convenient framework for analyzing the “ultimate complexity” of numeric algorithms was introduced in [38]. Concerning the factorization of complex polynomials of degree , it was shown therein that all roots could be computed at precision in time for “sufficiently large precisions” . The actual precision from which this bound becomes relevant highly depends on the location of the roots. As a rule of thumb, a precision is often required and sufficient, but we refer the reader to the seminal works of Schönhage, Pan and others for details [47, 54].
Since is algebraically closed, towers of separable algebraic extensions can be factored into towers of degree one. From the “ultimate bit complexity” point of view from [38], it follows that the ultimate bit complexity of the conversions in Propositions 5.1 and 5.2 becomes and so does the cost of factoring itself in Proposition 5.3. Using our rule of thumb, we expect that a precision of the order of should be sufficient in practice in order to observe these complexity bounds. Small values of are thus expected to be favourable for this type of coefficients (notice that this was the least favourable case for general coefficients!). A refined comparison between the bit complexities of this approach and the building of accelerated towers would require efforts that are beyond the scope of the present paper.
This section revisits the computation of traces and characteristic polynomials in towers of field extensions over a general field by using fast products. Our algorithms are rather different from those of [52]: we compute traces faster, but for characteristic polynomials and primitive elements the bottleneck lies in the baby-step giant-step method, so the complexity exponents in terms of $d$ turn out to be the same as in [52] (notice that we nevertheless gain a logarithmic factor). Until the end of this section, $(\mathbb{K}_i)_{i \leq t}$ is a tower of separable field extensions of degrees $d_1, \ldots, d_t$ and the multiplication cost function is as in Proposition 2.4.
Let us recall how to compute traces in a tower of separable field extensions. We let $\widehat{\mu}_i$ represent the reverse polynomial of $\mu_i$. Its constant coefficient is $1$ and the trace function of $\mathbb{K}_i$ over $\mathbb{K}_{i-1}$ may be computed as the first $d_i$ terms of the power series $\widehat{\mu_i'}(z)/\widehat{\mu}_i(z) = \operatorname{Tr}(1) + \operatorname{Tr}(\alpha_i)\, z + \operatorname{Tr}(\alpha_i^2)\, z^2 + \cdots$, where $\widehat{\mu_i'}$ denotes the reverse of the derivative of $\mu_i$.
In other words, the trace map can be regarded as a vector in $\mathbb{K}_{i-1}^{d_i}$ that can be computed in time $O(\mathsf{M}_{\mathbb{K}_{i-1}}(d_i))$: indeed this vector represents the matrix of the trace map with respect to the basis $1, \alpha_i, \ldots, \alpha_i^{d_i - 1}$. Given such a vector representation, the individual evaluation of the trace map at an element in $\mathbb{K}_i$ takes $O(d_i)$ operations in $\mathbb{K}_{i-1}$.
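The following self-contained Python sketch computes this trace vector from the power series expansion of the quotient of reversed polynomials, over a toy prime field; the term-by-term division is quadratic, whereas a Newton iteration yields the softly linear bound stated above.

p = 101

def trace_vector(mu):
    """Trace vector (Tr(1), Tr(x), ..., Tr(x^(d-1))) of GF(p)[x]/(mu), obtained as
    the first d coefficients of the power series rev(mu') / rev(mu)."""
    d = len(mu) - 1
    dmu = [(i * c) % p for i, c in enumerate(mu)][1:]       # the derivative mu'
    num = list(reversed(dmu))                                # rev(mu'), degree d-1
    den = list(reversed(mu))                                 # rev(mu), constant term 1 (mu monic)
    traces, state = [], num + [0] * d
    for i in range(d):                                       # term-by-term power series division
        c = state[i]                                         # valid since den[0] = 1
        traces.append(c)
        for j, dj in enumerate(den):
            if i + j < len(state):
                state[i + j] = (state[i + j] - c * dj) % p
    return traces

# Example: mu = x^2 + 1 over GF(101); the roots are +-i, so Tr(1) = 2 and Tr(x) = 0.
print(trace_vector([1, 0, 1]))        # [2, 0]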
More generally, is a -algebra for all , and the relative trace function satisfies
Proof. First we compute , as a vector in , for . This computation takes time . We regard each as a -linear map from to , whose evaluation takes time . Regarding as a vector , we apply the transpose of to in order to obtain a new vector that represents . This computation takes time thanks to the transposition principle (as explained in [10]). We next apply the transpose of to and obtain a vector that represents . Continuing this way, we obtain as a vector in time .
We may take advantage of accelerated towers to compute a primitive element over , together with its minimal polynomial and the parametrizations with for . We first consider the computation of characteristic polynomials.
Proof. Lemma 6.1 and Proposition 2.4 allow us to compute the representation of as a vector in in time
Using Lemma 4.8, we convert into an element of in time . By using the transposed product in the vector representation of the linear form can be computed in time . Then, as in section 1.2, the computation of
is achieved by transposing Algorithm 3.1, so Proposition 3.1 yields the cost
We finally use the Newton–Girard identity (1.1) to recover the characteristic polynomial of with further cost ; see for instance [29, section 2.4]. The conclusion follows.
By taking as in Corollary 4.11, the expected cost of Theorem 6.2 simplifies to , thanks to the assumption .
We are now in a position to design a randomized algorithm with subquadratic expected cost for computing a primitive element representation of over .
Taking as in Corollary 4.11, this cost further simplifies into .
Proof. The characteristic polynomial of has total degree in . So its discriminant is a polynomial in of total degree . By the Schwartz–Zippel lemma [55, 57], picking at random with the in a set of size leads to a primitive element with probability . Computing the characteristic polynomial of takes time by Theorem 6.2. We can verify that is a primitive element by checking that its characteristic polynomial is separable with further cost . Consequently the expected time to find a primitive element is .
Now assume that a primitive element has been found, and let us consider the computation of polynomials such that for . We follow Kronecker's classical deformation technique of the primitive element; see for instance [28, section 3.3]. For this purpose, we first introduce a new formal indeterminate with , so that is the ring of tangent numbers over . We aim at computing the characteristic polynomial of over . Then
Notice that is already available at this point. The computation of the vector representation of the linear form can be done in time by the transposition principle. Therefore is obtained by transposing Algorithm 3.1. In view of Proposition 3.1, this computation takes time
At this point, we have computed the traces for all . We deduce the characteristic polynomial of using Newton–Girard's formula (1.1), so that
It follows that . The total cost to obtain all the for is
Now let be an element of . It can be converted into an element of in time O() by Lemma 4.8, where has partial degree in for . The expression of in terms of is obtained as in time by Proposition 3.3. By successively taking for , the polynomials can all be obtained in time
Finally, thanks to , the assumption implies the simplified complexity bound.
We have proved nearly optimal complexity bounds for basic arithmetic operations in towers of fields of arbitrary height. Our results are based on the newly introduced concept of “accelerated” towers. Two major problems remain open:
Do there exist algorithms of quasi-linear complexity $\tilde{O}(d)$ instead of $O(d^{1+\epsilon})$?
It is well known that this is indeed the case for towers of bounded height : see Propositions 2.4 and 2.7. In section 5.5, we have singled out a few other situations when quasi-linear algorithms do exist. We refer to [7] for yet a few other examples.
Can one efficiently construct accelerated towers for general separable towers, without relying on univariate polynomial factorization as in section 5?
One obvious strategy is to use the dynamic evaluation paradigm [19, 20] that we already discussed in the introduction, while controlling complexity issues using techniques from [16]. We do not see any serious obstacle to a positive answer, but the technical details are beyond the scope of this paper. We intend to come back to this problem in a future work.
Another major challenge concerns efficient implementations. Asymptotically fast algorithms based on Proposition 2.4 obviously only become interesting when is in the range where evaluation-interpolation style algorithms are used for univariate arithmetic. For instance, FFT-based algorithms are typically used for .
For such sufficiently large degrees, we expect our acceleration techniques to be useful. For instance, assume that the two top degrees $d_{t-1}$ and $d_t$ are both small. Then conversions between $\mathbb{K}_t$ and an equivalent primitive representation over $\mathbb{K}_{t-2}$ can be done very fast. Conversions with respect to the lower part of the tower can be done coefficientwise, so they are also fast. Without much cost, this allows us to diminish the height by one and gain a constant factor when applying Proposition 2.4.
This discussion indicates that our techniques should be of practical interest for sufficiently large degrees and generic base fields $\mathbb{K}$. For specific base fields and specific types of towers, various alternative approaches have been proposed [17, 38, 39], and it would require a more significant implementation effort to figure out which approaches are best; we intend to address this issue in the near future.
Ph. Aubry, D. Lazard, and M. Moreno Maza. On the theories of triangular sets. J. Symbolic Comput., 28(1–2):105–124, 1999.
S. Baktir, J. Pelzl, T. Wollinger, B. Sunar, and C. Paar. Optimal tower fields for hyperelliptic curve cryptosystems. In Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004, volume 1, pages 522–526. IEEE, 2004.
N. Benger and M. Scott. Constructing tower extensions of finite fields for implementation of pairing-based cryptography. In M. Anwar Hasan and T. Helleseth, editors, Arithmetic of Finite Fields. Third International Workshop, WAIFI 2010. Istanbul, Turkey, June 27-30, 2010. Proceedings, volume 6087 of Lecture Notes in Comput. Sci., pages 180–195. Springer-Verlag Berlin Heidelberg, 2010.
D. Bernstein. Scaled remainder trees. Available from https://cr.yp.to/arith/scaledmod-20040820.pdf, 2004.
J. Berthomieu, G. Lecerf, and G. Quintin. Polynomial root finding over local rings and application to error correcting codes. Appl. Algebra Engrg. Comm. Comput., 24(6):413–443, 2013.
A. Borodin and R. T. Moenck. Fast modular transforms. J. Comput. System Sci., 8:366–386, 1974.
A. Bostan, M. F. I. Chowdhury, J. van der Hoeven, and É. Schost. Homotopy methods for multiplication modulo triangular sets. J. Symbolic Comput., 46(12):1378–1402, 2011.
A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, É. Schost, and A. Sedoglavic. Fast computation of power series solutions of systems of differential equations. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, pages 1012–1021, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.
A. Bostan, L. Gonzales-Vega, H. Perdry, and É. Schost. Complexity issues on Newton sums of polynomials. Distributed in the digital proceedings of MEGA'05, 2005.
A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.
A. Bostan, B. Salvy, and É. Schost. Fast algorithms for zero-dimensional polynomial systems using duality. Appl. Algebra Engrg. Comm. Comput., 14(4):239–272, 2003.
F. Boulier, D. Lazard, F. Ollivier, and M. Petitot. Computing representations for radicals of finitely generated differential ideals. Appl. Algebra Engrg. Comm. Comput., 20(1):73–121, 2009.
R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.
P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.
D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Inform., 28:693–701, 1991.
X. Dahan, M. Moreno Maza, É. Schost, and Yuzhen Xie. On the complexity of the D5 principle. In J.-G. Dumas, editor, Proceedings of Transgressive Computing 2006: a conference in honor of Jean Della Dora, pages 149–168. U. J. Fourier, Grenoble, France, 2006.
L. De Feo, J. Doliskani, and É. Schost. Fast algorithms for ℓ-adic towers over finite fields. In Proceedings of the 38th International Symposium on Symbolic and Algebraic Computation, ISSAC '13, pages 165–172, New York, NY, USA, 2013. ACM.
L. De Feo and É. Schost. Fast arithmetics in Artin–Schreier towers over finite fields. J. Symbolic Comput., 47(7):771–792, 2012.
J. Della Dora, C. Dicrescenzo, and D. Duval. About a new method for computing in algebraic number fields. In B. F. Caviness, editor, EUROCAL '85: European Conference on Computer Algebra Linz, Austria, April 1–3 1985 Proceedings Vol. 2: Research Contributions, pages 289–290. Springer Berlin Heidelberg, 1985.
D. Duval. Algebraic numbers: An example of dynamic evaluation. J. Symbolic Comput., 18(5):429–445, 1994.
J.-C. Faugère, P. Gaudry, L. Huot, and G. Renault. Sub-cubic change of ordering for Gröbner basis: a probabilistic approach. In K. Nabeshima, editor, Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 170–177, New York, NY, USA, 2014. ACM.
J.-C. Faugère, P. Gianni, D. Lazard, and T. Mora. Efficient computation of zero-dimensional Gröbner bases by change of ordering. J. Symbolic Comput., 16(4):329–344, 1993.
C. M. Fiduccia. Polynomial evaluation via the division algorithm: the fast Fourier transform revisited. In A. L. Rosenberg, editor, Fourth annual ACM symposium on theory of computing, pages 88–93, 1972.
A. Fröhlich and J. C. Shepherdson. On the factorisation of polynomials in a finite number of steps. Math. Z., 62:331–334, 1955.
A. Fröhlich and J. C. Shepherdson. Effective procedures in field theory. Philos. Trans. Roy. Soc. London. Ser. A., 248:407–432, 1956.
A. Garcia and H. Stichtenoth. A tower of Artin–Schreier extensions of function fields attaining the Drinfeld–Vladut bound. Invent. Math., 121(1):211–222, 1995.
J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 3rd edition, 2013.
M. Giusti, G. Lecerf, and B. Salvy. A Gröbner free alternative for polynomial system solving. J. Complexity, 17(1):154–211, 2001.
B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. Appl. Algebra Engrg. Comm. Comput., 27(3):237–257, 2016.
D. Harvey and J. van der Hoeven. Faster integer and polynomial multiplication using cyclotomic coefficient rings. Technical report, ArXiv, 2017. http://arxiv.org/abs/1712.03693.
D. Harvey and J. van der Hoeven. Faster integer multiplication using short lattice vectors. Technical report, ArXiv, 2018. http://arxiv.org/abs/1802.07932, to appear in Proc. ANTS 2018.
D. Harvey and J. van der Hoeven. Faster integer multiplication using plain vanilla FFT primes. Math. Comp., 88(315):501–514, 2019.
D. Harvey, J. van der Hoeven, and G. Lecerf. Even faster integer multiplication. J. Complexity, 36:1–30, 2016.
D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.
J. van der Hoeven. Faster Chinese remaindering. Technical report, CNRS & École polytechnique, 2016. http://hal.archives-ouvertes.fr/hal-01403810.
J. van der Hoeven and G. Lecerf. On the complexity of multivariate blockwise polynomial multiplication. In J. van der Hoeven and M. van Hoeij, editors, Proceedings of the 37th International Symposium on Symbolic and Algebraic Computation, ISSAC '12, pages 211–218. ACM, 2012.
J. van der Hoeven and G. Lecerf. On the bit-complexity of sparse polynomial and series multiplication. J. Symbolic Comput., 50:227–254, 2013.
J. van der Hoeven and G. Lecerf. Modular composition via complex roots. Technical report, CNRS & École polytechnique, 2017. http://hal.archives-ouvertes.fr/hal-01455731.
J. van der Hoeven and G. Lecerf. Modular composition via factorization. J. Complexity, 48:36–68, 2018.
Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.
M. Kalkbrener. A generalized Euclidean algorithm for computing triangular representations of algebraic varieties. J. Symbolic Comput., 15(2):143–167, 1993.
K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.
U. J. J. Le Verrier. Sur les variations séculaires des éléments elliptiques des sept planètes principales : Mercure, Venus, La Terre, Mars, Jupiter, Saturne et Uranus. J. Math. Pures Appli., 1:220–254, 1840.
R. Lebreton. Relaxed Hensel lifting of triangular sets. J. Symbolic Comput., 68(2):230–258, 2015.
G. Lecerf. On the complexity of the Lickteig–Roy subresultant algorithm. J. Symbolic Comput., 92:243–268, 2019.
X. Li, M. Moreno Maza, and É. Schost. Fast arithmetic for triangular sets: from theory to practice. J. Symbolic Comput., 44(7):891–907, 2009.
J. M. McNamee and V. Y. Pan. Numerical Methods for Roots of Polynomials, Part II, volume 16 of Studies in Computational Mathematics. Elsevier, 2013.
R. T. Moenck and A. Borodin. Fast modular transforms via division. In Thirteenth annual IEEE symposium on switching and automata theory, pages 90–96, Univ. Maryland, College Park, Md., 1972.
M. Moreno Maza, É. Schost, and P. Vrbik. Inversion modulo zero-dimensional regular chains. In V. P. Gerdt, W. Koepf, E. W. Mayr, and E. V. Vorozhtsov, editors, Computer Algebra in Scientific Computing. 14th International Workshop, CASC 2012. Maribor, Slovenia, September 2012. Proceedings, volume 7442 of Lect. Notes Comput. Sci., pages 224–235. Springer Berlin Heidelberg, 2012.
M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J. Comput., 2(1):60–66, 1973.
A. Poteaux and É. Schost. Modular composition modulo triangular sets and applications. Comput. Complex., 22(3):463–516, 2013.
A. Poteaux and É. Schost. On the complexity of computing with zero-dimensional triangular sets. J. Symbolic Comput., 50:110–138, 2013.
J. F. Ritt. Differential algebra. Amer. Math. Soc., New York, 1950.
A. Schönhage. The fundamental theorem of algebra in terms of computational complexity. Technical report, Math. Inst. Univ. of Tübingen, 1982.
J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. J. ACM, 27(4):701–717, 1980.
V. Shoup. Fast construction of irreducible polynomials over finite fields. J. Symbolic Comput., 17(5):371–391, 1994.
R. Zippel. Probabilistic algorithms for sparse polynomials. In Edward W. Ng, editor, Symbolic and Algebraic Computation. EUROSAM '79, An International Symposium on Symbolic and Algebraic Manipulation, Marseille, France, June 1979, number 72 in Lect. Notes Comput. Sci., pages 216–226. Springer Berlin Heidelberg, 1979.