Composition modulo powers of polynomials

Modular composition is the problem to compose two univariate polynomials modulo a third one. For polynomials with coefficients in a finite field, Kedlaya and Umans proved in 2008 that the theoretical bit complexity for performing this task could be made arbitrarily close to linear. Unfortunately, beyond its major theoretical impact, this result has not led to practically faster implementations yet. In this paper, we study the more specific case of composition modulo the power of a polynomial. First we extend previously known algorithms for power series composition to this context. We next present a fast direct reduction of our problem to power series composition.

Keywords

1.Introduction

Let

be an effective field and let

be polynomials in

of degrees

. The problem of modular composition is to compute

modulo

. This is a fundamental problem in complexity theory because of its applications to polynomial factorization [16-18].

From a theoretical point of view, Kedlaya and Umans achieved a major breakthrough by showing that modular compositions over a finite field

can be computed in time

. Unfortunately, the practical impact of these results has been limited so far. The best current implementations still admit nearly quadratic complexities in

. More precisely, over a field

, denoting by

the cost to multiply two polynomials in

of degrees

, Brent and Kung [3] gave an algorithm with cost

. It uses the baby-step giant-step technique due to Paterson–Stockmeyer [21] and even yields a sub-quadratic cost when using fast linear algebra (see [15, p. 185] and our Theorem 2 below).

In the special case when

, composition modulo

corresponds to the composition of truncated power series at order

. Better practical algorithms are known for this particular situation. Brent and Kung gave an algorithm [3] with complexity

under the conditions that

is invertible and that the characteristic is at least

. A variant proposed by van der Hoeven [11, Section 3.4.3] removes the condition on

. For fields of small positive characteristic, Bernstein [1] proposed an algorithm that is softly linear in the precision

but linear in the characteristic. Series with integer, rational or floating point coefficients can often be composed in quasi-linear time as well in suitable bit complexity models, as shown by Ritzmann [22]; see also [12].

In this paper, we focus on the intermediate case between power series and general modular compositions. More precisely, we assume

to be a power

of some polynomial

. Composition modulo

can also be regarded as a special “base” case of composition modulo a polynomial

that can be factored into

. Using Chinese remaindering, one may reduce composition modulo such

to compositions modulo powers of irreducible polynomials. In a separate paper [13], we exploit this observation for the design of fast practical (although not fully general) algorithms for modular composition over finite fields.

We develop two main approaches for fast composition modulo

. In section 3, we generalize Brent and Kung's and Bernstein's algorithms. We achieve a complexity exponent close to

, but only under the condition that

is much larger than

. In section 4, we directly reduce composition modulo

to truncated power series composition. Our main result is Theorem 11: one composition modulo

is essentially reduced to

compositions modulo

and one power series composition in

. An important ingredient of this reduction is a softly optimal algorithm for converting between

and

where

is sent to

. Potential applications of these conversions might also concern algebraic algorithms for plane curves such as integral basis computations and Puiseux expansions, which usually require several Taylor expansions at roots of discriminants.

Our paper is organized as follows. In section 2, we introduce basic notations and recall several well known complexity results. We also specify the complexity model that we work with. Further technical complexity results are gathered in the appendix. In sections 3 and 4, we present our main algorithms. For testing purposes, our algorithms are implemented in the Mathemagix language within the factorix library (www.mathemagix.org, revision 10538).

2.Preliminaries

2.1Costs of elementary operations

Recall that an effective ring is a ring

with unity whose elements can be represented on a computer and such that we have algorithms for performing the ring operations. Effective fields

and effective algebras over an effective ring are defined similarly.

Given an effective ring

, algebraic complexity models express running times in terms of the number of ring operations in

. Unless otherwise stated, we will analyze the costs of the algorithms in this paper in this way. More precisely, our results both apply for the straight-line program and computation tree models [4, Chapter 4].

We write

for a cost function such that two polynomials in

can be multiplied using

operations in

. It will be convenient to assume that

is a nondecreasing function in

. Notice that this assumption implies the super-additivity of

, namely

for all

and

The schoolbook algorithm allows us to take

. The fastest currently known algorithm [5] yields

. If

is a field of finite characteristic, then we even have

, where

[10].

The constant

represents a feasible exponent for the multiplication cost of matrices: two square matrices of size

can be multiplied using

operations in their coefficient ring. The constant

is defined similarly but for multiplying a

matrix by a

rectangular one. The best known bound

is due to Le Gall [19]. This naturally yields

. However the latter bound does not improve upon the earlier bound

due to Huang and Pan [14, Theorem 10.1].

2.2Problems related to modular composition

and

are polynomials in

, then

represents the remainder in the division of

. Let

be an effective ring and

be an effective

-algebra. We introduce the following cost functions:

: the cost of computing the power series composition , where .
: the cost of computing the power series composition in terms of operations in , where .
: the cost of computing the modular composition , where is a monic polynomial of degree and .
: the cost of computing the characteristic polynomial of modulo a monic polynomial of degree . This is the characteristic polynomial of the multiplication endomorphism by in .
: the cost of modular power projections, i.e. the cost to compute , where is monic of degree and is a linear form on .

Theorem 2. Let be a monic polynomial of degree over a ring and let . The composed polynomial may be computed with

Proof. The first bound is immediate. The proof of the second bound is detailed in [8, Section 12.2]. The bound

follows from

by taking

For a fixed monic polynomial

of degree

, the modular composition

is a linear operation in

, for

and

. The corresponding transposed application is precisely the operation of modular power projections: it corresponds to computing

, where

is a linear form on

. If a modular composition algorithm with cost

can be transposed in the sense of [2], then this leads to a power projection algorithm with cost

Theorem 3. Let be a monic polynomial of degree over a ring and let . The characteristic polynomial of modulo can be computed using

Proof. The first two bounds directly rely on the formula

by using [20, Corollary 29]. The third bound is obtained by computing

for

values of

, from which

is interpolated using

additional operations in

. We refer to Appendix A.1 for the fourth bound.

For the particular kind of moduli

that interest us in here, the problem of computing characteristic polynomials quite easily reduces to the case when

Proposition 4. Let be of degree in , let with , and let be in . Then the characteristic polynomial of modulo can be computed using operations in .

Proof. The characteristic polynomial

is simply the

-th power of the characteristic polynomial of

modulo

3.Extending known algorithms

From now on, we assume the modulus

to be the

-th power of a polynomial

of degree

. In this section we extend the two fast known algorithms for power series composition from Theorem 1 to composition modulo

. We directly state and prove the generalized algorithms. For more explanations we refer to the original papers [1, 3] and [11, Section 3.4.3].

3.1Extension of Bernstein's algorithm

has positive characteristic

, then Bernstein's algorithm [1] exploits the Frobenius map in order to recursively reduce the degrees of the polynomials to be composed by a factor

Input
Output: .
Assumptions: , , .

If , then return , computed by a general modular composition algorithm of cost .
Split into polynomials such that . Let and , be such that and .
Recursively compute for all .
Return .

In step 2 we need:

operations in

to compute all the necessary

-th powers,

for

, and

for the division to obtain

. In step 4, by [8, Exercise 9.16], the computation of

for

may be done in time

Let

be the time complexity for precision

and let

be such that

. By what precedes, we have

for all

and a sufficiently large constant

. Unrolling this inequality

times, we obtain

3.2Extension of Brent and Kung's algorithm

We now extend Brent and Kung's series composition algorithm to composition modulo

. This method uses Taylor series expansion in order to reduce the degrees of the polynomials to be composed. For a given polynomial

, we write

for its

-th derivative. Algorithm 2 is a sub-algorithm that is used when the arguments of the composition both have small degree.

Input
Output: .

If then return .
Split with and .
Recursively compute for ,
Return .

Input
Output: .
Assumption: has characteristic , where and ; , is separable and irreducible, .

Compute and .
If then compute the valuation of in and the modular inverse .
Compute using Algorithm 2.
If then compute for all from to . Otherwise, for from to , compute .
Return .

which makes use of

and

. If

then

holds for all

, so the algorithm is correct. Otherwise we may write

with

and

of degree

, which implies

. Since

and

separable, it follows that

, whence

is well defined.

Let us show that

, by induction on

. This is clear for

. Assume that the identity holds for some

. From

we see that the valuation of

is necessarily

and that

. The induction hypothesis is thus satisfied for

because

. We are done with the correctness.

The

-adic expansion of

takes

operations and dominates the cost of steps 1 and 2. Steps 4 and 5 amount to

operations. As to step 3, we enter Algorithm 2 with

for some constant

. Let

be the largest integer such that

. Unrolling

times the inequality for

and

, we obtain

Corollary 7. Let be of degree in , let be of degree , and let be in . If , then the modular composition may be computed using operations in .

Proof. If

, then we may use Algorithm 1, which requires

operations in

. Otherwise, we use Algorithm 3.

Notice that for fixed

, this corollary gives a cost

, independently of the characteristic.

4.Reduction to series composition

Recall that the modulus

is the

-th power of a polynomial

of degree

. From now on, we assume that

is separable. So far, our algorithms for composition modulo

relied on extensions of known algorithms dedicated to power series. In this section, we investigate the other natural approach: we design algorithms that reduce one composition modulo

to one series composition of order

over a suitable extension of the ground field. We set

and write

for the residue class of

. The associated trace function is denoted by

. This function naturally extends to a map

in a coefficientwise fashion.

4.1Case of small moduli

The first method we study is rather elementary. It works well for any characteristic, but it is only efficient when

is much smaller than

. We compute

for all roots

and recover

by Chinese remaindering. The simultaneous computation with all roots of

is emulated via standard algebraic techniques.

Since

is not assumed to be irreducible,

is not necessarily a field. We will compute inverses in

using the technique of dynamic evaluation, which is briefly recalled in Appendix A.2. For convenience of the reader, we use the auxiliary variable

for power series and keep the variable

for polynomials.

Input
Output: .
Assumptions: , is separable, .

Compute in .
Compute in .
Perform the power series composition

in .
Compute in .
With the algorithm underlying Corollary 20, compute the modular inverse .
Compute

and then .
Return .

Proof. For a root

in some algebraic closure

, we write

for the homomorphism from

that sends

. This homomorphism extends coefficientwise into a map

. One verifies that

equals

for all roots

. This proves the correctness of the algorithm thanks to the Chinese remainder theorem.

For the complexity analysis, step 3 amounts to

operations in

. In step 5, we use Corollary 20 to compute

with

operations in

. Steps 1, 2, 4, and 6 require

additional ring operations in

. In step 7, the row matrix of

may be computed with

operations in

by formula (20), so each trace takes

operations in

. Therefore step 7 amounts to

operations in

, which is negligible.

When

is much smaller than

, we may directly benefit from fast power series composition. But overall this does not improve much on Corollary 7. In the rest of this section, we aim at decreasing the dependency in

in complexity bounds.

4.2Fast reduction to series composition

In order to improve on Algorithm 4, we shall use a better way to expand

and

modulo

. For this purpose we introduce the

-algebra homomorphism

We regard

and

as conversions that we respectively call the untangling and tangling mappings.

Lemma 9. For all separable polynomial and all , the map is a -algebra isomorphism.

Proof.

is clearly a homomorphism between

-algebras of equal dimensions, so we need to prove its injectivity when

is separable. We consider the

-adic expansion

modulo

, where all the

have degrees

, and we assume that

. It is immediate that

. We may thus suppose, by induction on

, that

for all

. From

we obtain

. Since

is invertible, it follows that

, whence

. This completes the induction.

Input
Output: .
Assumptions: , is separable, .

Compute .
Let and compute the characteristic polynomial of modulo .
Compute .
Write , where is the class of in , and the have degrees . For each , compute and let .
Perform the power series composition in .
Return .

From now on, in cost analyses, the expression

(resp.

) represents a cost function for computing the direct image (resp. the preimage) of

. If good algorithms are known for

and

, for characteristic polynomials and for compositions modulo

, then we observe that Algorithm 5 boils down to one power series composition at precision

over

Proof. The homomorphism

that sends

is well defined, so applying it to

we obtain

. Therefore we have

, which ensures the correctness of the algorithm. The cost directly follows from the definitions.

The rest of this section is devoted to algorithms with softly linear costs for computing

and its inverse. More precisely, we will prove that we may take

and

to be

in the latter proposition. This leads to our main theorem:

Theorem 11. Let and , where , such that is separable of degree . Then, we can compute using

4.3Untangling in characteristic zero

We begin with the easiest situation when the characteristic

is zero or sufficiently large and achieve softly linear time for

Input
Output: .
Assumptions: , is separable, .

If then return , otherwise let .
Recursively apply the algorithm to , giving .
Recursively apply the algorithm to , giving .
Return .

Proposition 13. Algorithm 6 is correct and takes operations in . If or , then, given , one may compute using operations in .

Proof. The correctness follows from Lemma 12 which ensures that

for all

and

for all

. The cost analysis is straightforward. Finally, to deduce

, thanks to the assumption on the characteristic, we may simply use the Taylor expansion

4.4Untangling in positive characteristic

In order to handle the case

, we need to pay attention to vanishing coefficients when computing Taylor expansions using higher order derivatives. For this, we rely on the following adaptation of the Taylor expansion of

The polynomial

is called the

-th order Hasse derivative of

. The operator

is linear and we may compute it as follows: if

is a monomial and

, then

; otherwise

Now if the characteristic

divides the integer

, then it also divides

, and the lemma remains correct.

The next lemma expresses conditions to compute Hasse derivatives from ones of lower orders, in positive characteristic.

Lemma 15. Let be an integer power of , let be an integer such that , and let be two integers in . Then, for any polynomial , we may compute from using the formula

for all

. In positive characteristic, we claim that the ratios

have non-negative valuations in

Let us first consider the case when

. If

, with

, then

. If

, then

. This shows that

is well defined and may be computed in

whenever

is a multiple of

At a second stage, we make the induction hypothesis that

has been computed in

for

such that

. The value for

may then be obtained via

We are now in a position to adapt the “divide and conquer” Algorithm 6 to the case when

has small positive characteristic

Input
Output: , for all .
Assumption: has positive characteristic ; is a power of , ; ; .

If , then return .
Let .
Compute .
By using formula (1), compute from .
Recursively apply the algorithm to , , , , , and . This yields for all .
Recursively apply the algorithm to , , , , , , and . This yields , for all .
Return , for all .

Proof. The correctness follows from Lemmas 12 and 15; the cost analysis is straightforward.

Input
Output: .
Assumptions: has positive characteristic ; ; ; .

If , then use Algorithm 6 to compute and return .
Let be minimal with , let , and .
Compute for all using Algorithm 7 (called with , , and ).
If , then compute from with formula (1).
For all , use the algorithm recursively with in order to obtain .
If then use the algorithm recursively with in order to obtain .
Return .

Proof. Let us first examine the case when

. We need to show that

coincides with

for all

. From Lemma 14, we have

. Since

, it follows that

Now we suppose that

. We have

, so

is a power of

, we have

, and the assumption

is preserved through recursive calls. The conditions of Algorithm 7 are satisfied, whence the correctness of step 3 by Proposition 16. Step 4 is a consequence of Lemmas 12 and 15. We are done with the correctness.

Step 1 costs

by Proposition 13. Step 3 costs

by Proposition 16 and step 4 takes

operations in

. The total cost function

of the algorithm satisfies

4.5Tangling

The next algorithm is independent of the characteristic. It relies on the untangling algorithms from the previous subsections, by using the “divide and conquer” strategy. For a polynomial or series

and

, we denote

and

. For convenience, we assume that

is a nondecreasing function of

Input
Output: .

If , then let be the preimage of in and return .
Let and compute , .
Return .

Lemma 18. Algorithm 9 is correct and takes time . In particular, for any , we have

Kronecker substitution [8, Chapter 8, Section 4] allows us to multiply two series in

at precision

using

operations in

The computation of

requires one inversion in

that takes

operations in

, together with

operations for the series inversion. The other

amount to

more operations in

. The cost function

satisfies

for some constant

, which leads to

. The final bound of the lemma follows from Proposition 17.

Appendix A

A.1Trace and Newton–Girard identities

Let

be a monic polynomial in

of degree

, let

be a linear form on

, and let

. We study the modular power projection problem, which corresponds to computing

Let

represent

. If we take

, where

denotes the usual trace function of

over

, then the latter power projections write as

It is well known that these traces are related to the characteristic polynomial of

modulo

by the Newton–Girard identities:

where

is the reverse polynomial of

. If there exist given inverses of

, then it is possible to integrate the latter differential equation in

with

operations in

, which directly leads to an algorithm to compute

from the power sums of the roots of

(see for instance [9, Section 2] for details, which also cover the case when

is a finite field).

First, let us explain how to compute

efficiently. Recall that

. In the basis

, the linear form

may be computed as

which uses

operations in

. If

represents the multiplication endomorphism by

, then we can write

and therefore

. By transposing the modular multiplication by

with the techniques from [2, Sections 4 and 5], the computation of

requires

operations in

A.2Modular inversions

Let

be a field. Often in computer algebra, we are led to compute in monogen algebras such as

, where

is separable of degree

, but not necessarily irreducible. Ring operations in

require

operations in

. Algorithms such as the determinant are often slower over rings than their counterparts over fields. For this reason, we often like to conduct our computations as if

were a field. From the programming point of view, this is easy to achieve: whenever we need to invert an element

or test whether

, we factor

, where

and

is invertible modulo

. We then restart the relevant part of the computations with

and/or

in the role of

. This approach is known as the dynamic evaluation principle in computer algebra [6, 7]. We borrow the following complexity result for extended gcds from Dahan et al. [6]:

Proposition 19. Let be a separable polynomial of degree over and let and be univariate polynomials over of degrees . Using operations in , one can compute a factorization of and triples of polynomials respectively over , such that , is monic, generates the extension of the ideal (,) to , and the Bézout relation holds with and , for all .

Proof. This statement rephrases [6, Propositions 2.4 and 4.1], where we replace the assumption “

is square-free over a perfect field” by “

is separable”. In fact all arguments of [6] apply in this setting mutatis mutandis. An alternative point of view is to apply [6, Propositions 2.4 and 4.1] over the algebraic closure of

, while noticing that all actual computations are really done over

Corollary 20. Let be a separable polynomial of degree over and let and be univariate polynomials over of degrees , such that is monic and is invertible modulo . Then the inverse of modulo may be computed using operations in .

Proof. The triples

of the previous proposition satisfy

over

. We recover the inverse

modulo

from the

using the Chinese remainder theorem.

5.References

[1]: D. J. Bernstein. Composing power series over a finite ring in essentially linear time. J. Symbolic Comput., 26(3):339–341, 1998.
[2]: A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.
[3]: R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.
[4]: P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.
[5]: D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Infor., 28(7):693–701, 1991.
[6]: X. Dahan, M. Moreno Maza, É. Schost, and Yuzhen Xie. On the complexity of the D5 principle. In J.-G. Dumas, editor, Proceedings of Transgressive Computing 2006: a conference in honor of Jean Della Dora, pages 149–168. U. J. Fourier, Grenoble, France, 2006.
[7]: J. Della Dora, C. Dicrescenzo, and D. Duval. A new method for computing in algebraic number fields. In G. Goos and J. Hartmanis, editors, Eurocal'85 (2), volume 174 of Lect. Notes in Comp. Science, pages 321–326. Springer, 1985.
[8]: J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 2 edition, 2003.
[9]: B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. Appl. Alg. Eng. Comm. Comp., 27(3):237–257, 2016.
[10]: D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.
[11]: J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34(6):479–542, 2002.
[12]: J. van der Hoeven. Fast composition of numeric power series. Technical Report 2008-09, Université Paris-Sud, Orsay, France, 2008.
[13]: J. van der Hoeven and G. Lecerf. Modular composition via factorization. Technical report, CNRS & École polytechnique, 2017. http://hal.archives-ouvertes.fr/hal-01457074.
[14]: Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.
[15]: E. Kaltofen and V. Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 184–188, New York, NY, USA, 1997. ACM.
[16]: E. Kaltofen and V. Shoup. Subquadratic-time factoring of polynomials over finite fields. Math. Comp., 67(223):1179–1197, 1998.
[17]: K. S. Kedlaya and C. Umans. Fast modular composition in any characteristic. In FOCS'08: IEEE Conference on Foundations of Computer Science, pages 146–155, Washington, DC, USA, 2008. IEEE Computer Society.
[18]: K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.
[19]: F. Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 296–303, New York, NY, USA, 2014. ACM.
[20]: G. Lecerf. On the complexity of the Lickteig-Roy subresultant algorithm. Technical report, CNRS & École polytechnique, 2017. https://hal.archives-ouvertes.fr/hal-01450869.
[21]: M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J.Comput., 2(1):60–66, 1973.
[22]: P. Ritzmann. A fast numerical algorithm for the composition of power series with complex coefficients. Theoret. Comput. Sci., 44:1–16, 1986.

			(2)

Abstract

Keywords

1.Introduction

2.Preliminaries

2.1Costs of elementary operations

2.2Problems related to modular composition

3.Extending known algorithms

3.1Extension of Bernstein's algorithm

3.2Extension of Brent and Kung's algorithm

4.Reduction to series composition

4.1Case of small moduli

4.2Fast reduction to series composition

4.3Untangling in characteristic zero

4.4Untangling in positive characteristic

4.5Tangling

Appendix A

A.1Trace and Newton–Girard identities

A.2Modular inversions

5.References