The Frobenius FFT

Joris van der Hoeven^a, Robin Larrieu^b
CNRS, Laboratoire d'informatique, École polytechnique
91128 Palaiseau Cedex, France
a. Email: vdhoeven@lix.polytechnique.fr
b. Email: larrieu@lix.polytechnique.fr

November 3, 2023
Let $\mathbb{F}_q$ be the finite field with $q$ elements and let $\omega$ be a primitive $n$-th root of unity in an extension field $\mathbb{F}_{q^d}$ of $\mathbb{F}_q$. Given a polynomial $P \in \mathbb{F}_q[x]$ of degree less than $n$, we will show that its discrete Fourier transform $(P(1), P(\omega), \ldots, P(\omega^{n-1}))$ can, in many cases, be computed essentially $d$ times faster than the discrete Fourier transform of a polynomial of degree less than $n$ with coefficients in $\mathbb{F}_{q^d}$. This result is achieved by exploiting the symmetries provided by the Frobenius automorphism of $\mathbb{F}_{q^d}$ over $\mathbb{F}_q$.
FFT; finite field; complexity bound; Frobenius automorphism
Let $n \in \mathbb{N}$ and let $\omega$ be a primitive $n$-th root of unity in $\mathbb{C}$. The traditional algorithm for computing discrete Fourier transforms [7] takes a polynomial $P \in \mathbb{C}[x]$ of degree $< n$ on input and returns the vector $\hat{P} = (P(1), P(\omega), \ldots, P(\omega^{n-1}))$. If $P$ actually admits real coefficients, then we have $P(\omega^{n-i}) = \overline{P(\omega^i)}$ for all $i$, so roughly half of the output values are superfluous. Because of this symmetry, it is well known that such “real FFTs” can be computed roughly twice as fast as their complex counterparts [3, 25].
More generally, there exists an abundant literature on the computation of FFTs of polynomials with various types of symmetry. Crystallographic FFT algorithms date back to [26], with contributions as recent as [20], but they are dedicated to crystallographic symmetries. So-called lattice FFTs have been studied in a series of papers initiated by [11] and continued in [27, 4]. A more general framework due to [1] was recently revisited from the point of view of high-performance code generation [18]. Further symmetric FFT algorithms and their truncated versions were developed in [16].
In this paper, we focus on discrete Fourier transforms of polynomials with coefficients in the finite field $\mathbb{F}_q$ with $q$ elements. In general, primitive $n$-th roots of unity $\omega$ only exist in extension fields of $\mathbb{F}_q$, say $\omega \in \mathbb{F}_{q^d}$. This puts us in a similar situation as in the case of real FFTs: our primitive root of unity $\omega$ lies in an extension field of the coefficient field of the polynomial. This time, the degree of the extension is $d$ instead of $2$. As in the case of real FFTs, it is natural to search for algorithms that allow us to compute the DFT of a polynomial in $\mathbb{F}_q[x]$ approximately $d$ times faster than the DFT of a polynomial in $\mathbb{F}_{q^d}[x]$.
The main purpose of this paper is to show that such a speed-up by an approximate factor of $d$ can indeed be achieved. The idea is to use the Frobenius automorphism $\phi: a \mapsto a^q$ as the analogue of complex conjugation. If $\omega \in \mathbb{F}_{q^d}$, then we will exploit the fact that $\phi(P(\alpha)) = P(\phi(\alpha))$ for all $\alpha \in \mathbb{F}_{q^d}$ and $P \in \mathbb{F}_q[x]$. This means that it suffices to evaluate $P$ at only one element of each orbit for the action of the Frobenius automorphism on the cyclic group $\langle\omega\rangle$ generated by $\omega$. We will give an efficient algorithm for doing this, called the Frobenius FFT.
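To make the symmetry concrete, the following small Python sketch (our own toy code, not part of the original development) works over $\mathbb{F}_4 = \mathbb{F}_2[z]/(z^2+z+1)$ with $n = 3$ and $\omega = z$: for a polynomial $P \in \mathbb{F}_2[x]$ it checks that $P(\omega^2) = \phi(P(\omega))$, so only one evaluation per Frobenius orbit is needed.

# Minimal sketch (our own code): the Frobenius symmetry of the DFT over F_4.
# Elements of F_4 = F_2[z]/(z^2+z+1) are pairs (a, b) representing a + b*z.

def add(u, v):
    return (u[0] ^ v[0], u[1] ^ v[1])

def mul(u, v):
    # (a + b z)(c + d z) = ac + (ad + bc) z + bd z^2, with z^2 = z + 1
    a, b = u
    c, d = v
    bd = b & d
    return ((a & c) ^ bd, (a & d) ^ (b & c) ^ bd)

def frobenius(u):
    # phi(x) = x^2 in F_4
    return mul(u, u)

def evaluate(P, alpha):
    # Horner evaluation of P (list of F_2 coefficients, low degree first) at alpha in F_4
    acc = (0, 0)
    for c in reversed(P):
        acc = add(mul(acc, alpha), (c, 0))
    return acc

omega = (0, 1)                       # z is a primitive 3rd root of unity in F_4
P = [1, 0, 1]                        # P(x) = 1 + x^2 over F_2
dft = [evaluate(P, (1, 0)), evaluate(P, omega), evaluate(P, mul(omega, omega))]
print(dft)
# The orbit {omega, omega^2} under phi needs only one evaluation:
assert dft[2] == frobenius(dft[1])
# The orbit {1} is fixed, so P(1) lies in F_2:
assert dft[0][1] == 0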
Our main motivation behind the Frobenius FFT is the development of faster practical algorithms for polynomial multiplication over finite fields of small characteristic. In [12], we have shown how to reduce multiplication in $\mathbb{F}_2[x]$ to multiplication in $\mathbb{F}_{2^{60}}[x]$, while exploiting the fact that efficient DFTs are available over $\mathbb{F}_{2^{60}}$. Unfortunately, this reduction involves an overhead of a factor of two for zero padding. The Frobenius FFT can be thought of as a more direct way to use evaluation-interpolation strategies over $\mathbb{F}_{2^{60}}$ for polynomial multiplication over $\mathbb{F}_2$. We have not yet implemented the algorithms in the present paper, but the ultimate hope is to make such multiplications twice as efficient as in [12].
Let us briefly detail the structure of this paper. In section 2, we first recall some standard complexity results for finite field arithmetic. We pursue with a quick survey of the fast Fourier transform (FFT) in section 3. The modern formulation of this algorithm is due to Cooley and Tukey [7], but the algorithm was already known to Gauß [9].
In section 4, we introduce the discrete Frobenius Fourier transform (DFFT) and describe several basic algorithms to compute DFFTs of prime order. There are essentially two efficient ways to deal with composite orders. Both approaches are reminiscent of analogous strategies for real FFTs [25], but the technical details are more complicated.
The first strategy is to reduce the computation of $d$ discrete Fourier transforms of polynomials $P_1, \ldots, P_d \in \mathbb{F}_q[x]$ of degree $< n$ to a single discrete Fourier transform of a polynomial $P \in \mathbb{F}_{q^d}[x]$ of degree $< n$. This elegant reduction is the subject of section 5, but it is only applicable if we need to compute many transforms. Furthermore, at best, it only leads to a gain by a factor smaller than $d$, due to numerous applications of the Frobenius automorphism $\phi$.
The second strategy, to be detailed in section 6, is to carefully exploit the symmetries in the usual Cooley–Tukey FFT; we call the resulting algorithm the Frobenius FFT (FFFT). The complexity analysis is more intricate than for the algorithm from section 5 and the complete factor $d$ is only regained under certain conditions. Fortunately, these conditions are satisfied for the most interesting applications to polynomial multiplication in $\mathbb{F}_2[x]$, or when $n$ becomes very large with respect to $d$.
In this paper, we focus on direct DFFTs and FFFTs. Inverse FFFTs can be computed in a straightforward way by reversing the main algorithm from section 6. Inverting the various algorithms from section 4 for DFFTs of prime orders requires a bit more effort, but should not raise any serious difficulties. The same holds for the algorithms to compute multiple DFFTs from section 5.
Besides inverse transforms, it is possible to consider truncated Frobenius FFTs, following [14, 15, 21], but this goes beyond the scope of this paper. It might also be worth investigating the use of Good's algorithm [10] as an alternative to the Cooley–Tukey FFT, since the order $n$ usually admits many distinct prime divisors in our setting.
Throughout this paper and unless stated otherwise, we will assume the bit complexity model for Turing machines with a finite number of tapes [23]. Given a field $\mathbb{K}$, we will write $\mathbb{K}[x]_{<n}$ for the set of polynomials in $\mathbb{K}[x]$ of degree $< n$.
Let us write $q = p^k$ for some prime number $p$ and some integer $k \geqslant 1$. Elements in the finite field $\mathbb{F}_q$ are usually represented as elements of $\mathbb{F}_p[z]/(\mu(z))$, where $\mu \in \mathbb{F}_p[z]$ is a monic irreducible polynomial of degree $k$. We will denote by $\mathsf{M}_q(n)$ the cost to multiply two polynomials in $\mathbb{F}_q[x]_{<n}$. Multiplications in $\mathbb{F}_q$ can be reduced to a constant number of multiplications of polynomials in $\mathbb{F}_p[z]_{<k}$ via the precomputation of a “pre-inverse” of $\mu$. The multiplication of two polynomials in $\mathbb{F}_q[x]_{<n}$ can be reduced to the multiplication of two polynomials in $\mathbb{F}_p[z]_{<(2k-1)n}$ using Kronecker substitution [8, Corollary 8.27]. In other words, we have $\mathsf{M}_q(n) = O(\mathsf{M}_p(k n))$.
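As an illustration, here is a small Python sketch (our own code, with names of our choosing) of the Kronecker substitution just mentioned: two polynomials over $\mathbb{F}_4 = \mathbb{F}_2[z]/(z^2+z+1)$ are packed into bit strings over $\mathbb{F}_2$, multiplied there, and the product is unpacked and reduced modulo $z^2+z+1$.

# Minimal sketch (our own code): Kronecker substitution from F_4[x] down to F_2[z].
# An element of F_4 = F_2[z]/(z^2+z+1) is an integer 0..3 whose bits are the
# coefficients of 1 and z.

MOD = 0b111  # z^2 + z + 1

def f4_mul(a, b):
    """Carry-less multiplication followed by reduction modulo z^2+z+1."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if (r >> 2) & 1:
        r ^= MOD
    return r

def f2x_mul(a, b):
    """Multiplication in F_2[z], polynomials encoded as bit strings in an int."""
    r, i = 0, 0
    while b >> i:
        if (b >> i) & 1:
            r ^= a << i
        i += 1
    return r

def kronecker_mul(A, B):
    """Multiply A, B in F_4[x] (lists of F_4 elements, low degree first)."""
    # Pack: coefficient i goes to bit positions 3*i .. 3*i+1 (z-degrees stay < 3).
    packA = sum(c << (3 * i) for i, c in enumerate(A))
    packB = sum(c << (3 * i) for i, c in enumerate(B))
    packC = f2x_mul(packA, packB)
    # Unpack three bits per coefficient and reduce each chunk modulo z^2+z+1.
    C = []
    for i in range(len(A) + len(B) - 1):
        chunk = (packC >> (3 * i)) & 0b111
        if chunk & 0b100:
            chunk ^= MOD
        C.append(chunk)
    return C

def schoolbook_mul(A, B):
    C = [0] * (len(A) + len(B) - 1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            C[i + j] ^= f4_mul(a, b)
    return C

A = [1, 2, 3]          # 1 + z*x + (1+z)*x^2
B = [3, 0, 1, 2]
assert kronecker_mul(A, B) == schoolbook_mul(A, B)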
The best currently known bound for $\mathsf{M}_p(n)$ was established in [13]. It was shown there that

  $\mathsf{M}_p(n) = O\bigl(n \log p \, \log(n \log p)\, 8^{\log^*(n \log p)}\bigr)$   (2.1)

uniformly in $p$, where $\log^* x$ denotes the iterated logarithm, i.e. the least number of times the logarithm must be applied to $x$ in order to obtain a value at most $1$.
Sometimes, we rather need to perform $r$ multiplications $A_1 B_1, \ldots, A_r B_r$ at once, with $A_i, B_i \in \mathbb{F}_q[x]_{<n}$. The complexity of this operation will be denoted by $\mathsf{M}_q(n, r)$. Throughout this paper, it will be convenient to assume that $\mathsf{M}_q(n)/n$ is increasing in both $n$ and $q$. In other words, we assume that $\mathsf{M}_q(n)/n \leqslant \mathsf{M}_{q'}(n')/n'$ whenever $n \leqslant n'$ and $q \leqslant q'$.
The computation of the Frobenius automorphism $\phi: a \mapsto a^q$ in $\mathbb{F}_{q^d}$ requires $O(\log q)$ multiplications in $\mathbb{F}_{q^d}$ when using binary powering, so this operation has complexity $O(\mathsf{M}_q(d) \log q)$. This cost can be lowered by representing elements in $\mathbb{F}_{q^d}$ with respect to so-called normal bases [5]. However, multiplication in $\mathbb{F}_{q^d}$ becomes more expensive for this representation.
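A minimal Python sketch (our own; the modulus below is merely an example) of this binary-powering approach: elements of an extension $\mathbb{F}_p[z]/(\mu)$ are coefficient lists, and $a \mapsto a^q$ is computed with $O(\log q)$ modular multiplications.

# Minimal sketch (our own code): the Frobenius map a -> a^q in F_p[z]/(mu)
# via binary powering, i.e. O(log q) multiplications modulo mu.

def polymulmod(a, b, mu, p):
    """Multiply two coefficient lists modulo the monic polynomial mu, over F_p."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    m = len(mu) - 1
    for i in range(len(prod) - 1, m - 1, -1):       # reduce degree by degree
        c = prod[i]
        if c:
            for j in range(m + 1):
                prod[i - m + j] = (prod[i - m + j] - c * mu[j]) % p
    return prod[:m] + [0] * (m - min(m, len(prod)))

def frobenius(a, q, mu, p):
    """Compute a^q mod mu by square-and-multiply."""
    result = [1] + [0] * (len(mu) - 2)
    base = list(a)
    e = q
    while e:
        if e & 1:
            result = polymulmod(result, base, mu, p)
        base = polymulmod(base, base, mu, p)
        e >>= 1
    return result

# Example (assumed, for illustration): F_8 = F_2[z]/(z^3 + z + 1), q = 2.
p, q = 2, 2
mu = [1, 1, 0, 1]                 # z^3 + z + 1, low degree first
a = [0, 1, 0]                     # the element z
print(frobenius(a, q, mu, p))     # z^2, i.e. [0, 0, 1]
assert frobenius(frobenius(frobenius(a, q, mu, p), q, mu, p), q, mu, p) == a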
The discrete Frobenius Fourier transform potentially involves computations in all intermediate fields $\mathbb{F}_{q^e}$ where $e$ divides $d$. The necessary minimal polynomials $\mu_e$ for these fields and the corresponding pre-inverses can be kept in a table. The bitsize of this table is bounded by $O(\sigma(d) \log q)$, where $\sigma(d) = \sum_{e \mid d} e$. It is known [22] that $\sigma(d) = O(d \log \log d)$. Consequently, the bitsize of the table requires $O(d \log \log d \, \log q)$ storage.
Another important operation in finite fields is modular composition: given $f, g \in \mathbb{F}_q[x]_{<d}$ and a monic polynomial $h \in \mathbb{F}_q[x]$ of degree $d$, the aim is to compute the remainder $f(g) \operatorname{rem} h$ of the euclidean division of $f(g)$ by $h$. If $h$ is the minimal polynomial of the finite field $\mathbb{F}_{q^d}$ over $\mathbb{F}_q$, then this operation also corresponds to the evaluation of the polynomial $f$ at a point in $\mathbb{F}_{q^d}$. If $h$ and $\tilde{h}$ are two distinct monic irreducible polynomials in $\mathbb{F}_q[x]$ of degree $d$, then the conversion of an element in $\mathbb{F}_q[x]/(h)$ to an element in $\mathbb{F}_q[x]/(\tilde{h})$ also boils down to one modular composition of degree $d$ over $\mathbb{F}_q$.
We will denote by $\mathsf{C}_q(d)$ the cost of a modular composition as above. Theoretically speaking, Kedlaya and Umans proved that $\mathsf{C}_q(d) = (d \log q)^{1+o(1)}$ [19]. From a practical point of view, one rather has $\mathsf{C}_q(d) = O(d^{(\vartheta+1)/2})$ operations in $\mathbb{F}_q$, where $\vartheta$ denotes a feasible exponent for matrix multiplication. If $d$ is smooth and one needs to compute several modular compositions for the same modulus, then algorithms of quasi-linear complexity do exist [17].
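The following short Python sketch (our own code, not the fast algorithms cited above) shows the naive Horner variant of modular composition: it computes $f(g) \operatorname{rem} h$ with one polynomial multiplication modulo $h$ per coefficient of $f$, which already makes the conversions discussed above explicit.

# Minimal sketch (our own code): naive modular composition f(g) rem h over F_p,
# using Horner's rule, so deg(f) full multiplications modulo h.

def polyaddmod(a, b, p):
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p for i in range(n)]

def polymulrem(a, b, h, p):
    """(a*b) rem h for coefficient lists over F_p, h monic, low degree first."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    d = len(h) - 1
    for i in range(len(prod) - 1, d - 1, -1):
        c = prod[i]
        if c:
            for j in range(d + 1):
                prod[i - d + j] = (prod[i - d + j] - c * h[j]) % p
    return prod[:d]

def modular_composition(f, g, h, p):
    """Return f(g) rem h."""
    acc = [0]
    for c in reversed(f):
        acc = polyaddmod(polymulrem(acc, g, h, p), [c], p)
    return acc

# Toy check over F_5 with h = x^3 - x - 1 (written low degree first).
p = 5
h = [4, 4, 0, 1]          # -1 - x + x^3 over F_5
f = [2, 0, 0, 1]          # 2 + x^3
g = [1, 3]                # 1 + 3x
print(modular_composition(f, g, h, p))   # [0, 1, 2], i.e. x + 2x^2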
Given primitive elements $\alpha$ and $\beta$ with $\mathbb{F}_q[\alpha] = \mathbb{F}_q[\beta] = \mathbb{F}_{q^d}$, the paper [17] also contains efficient algorithms for conversions between the representations $\mathbb{F}_q[\alpha]$ and $\mathbb{F}_q[\beta]$ of $\mathbb{F}_{q^d}$. Using modular composition, we notice that this problem reduces in time $O(\mathsf{C}_q(d))$ to the case where we are free to choose primitive elements $\alpha$ and $\beta$ that suit us. We let $\mathsf{V}_q(d)$ be a cost function for conversions between $\mathbb{F}_q[\alpha]$ and $\mathbb{F}_q[\beta]$; in particular, we may always take $\mathsf{V}_q(d) = O(\mathsf{C}_q(d))$.
Let $\mathbb{K}$ be any field and let $\omega \in \mathbb{K}$ be a primitive $n$-th root of unity for some $n \in \mathbb{N}$. Now consider a polynomial $P = P_0 + P_1 x + \cdots + P_{n-1} x^{n-1} \in \mathbb{K}[x]_{<n}$. The discrete Fourier transform of $P$ with respect to $\omega$ is the vector

  $\hat{P} = \bigl(P(1), P(\omega), \ldots, P(\omega^{n-1})\bigr) \in \mathbb{K}^n.$
If $n$ is a small number, then $\hat{P}$ can be computed by evaluating $P(\omega^i)$ directly for $i = 0, \ldots, n-1$, using Horner's method for instance. If $n$ is a prime number, then Rader and Bluestein have given efficient methods for the computation of discrete Fourier transforms of order $n$; see [24, 6] and subsection 3.2 below.
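For completeness, a tiny Python sketch (our own, over the complex numbers) of this naive strategy; the same loop works verbatim over any ring that contains a primitive $n$-th root of unity.

# Minimal sketch (our own code): the naive length-n DFT, n evaluations by Horner.
import cmath

def naive_dft(P):
    n = len(P)
    omega = cmath.exp(2j * cmath.pi / n)       # primitive n-th root of unity in C
    out = []
    for i in range(n):
        x, acc = omega ** i, 0
        for c in reversed(P):                  # Horner evaluation of P at omega^i
            acc = acc * x + c
        out.append(acc)
    return out

print([round(v.real, 10) + round(v.imag, 10) * 1j for v in naive_dft([1, 2, 3, 4])])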
In subsection 3.4, we are interested in the case when $n$ is composite. In that case, one may compute $\hat{P}$ using the fast Fourier transform (or FFT). The modern formulation of this algorithm is due to Cooley and Tukey [7], but the algorithm was already known to Gauß [9]. The output of the FFT uses a mirrored indexation that we will introduce in subsection 3.3 below. We will denote by $\mathsf{F}_{\mathbb{K}}(n)$ the cost of computing an FFT of length $n$ in the field $\mathbb{K}$; for $\mathbb{K} = \mathbb{F}_q$ we abbreviate this into $\mathsf{F}_q(n)$, and we write $\mathsf{F}_q(n, r)$ for the cost of computing $r$ such FFTs at once.
Assume that we wish to compute a DFT of large prime length $n$. The multiplicative group $(\mathbb{Z}/n\mathbb{Z})^*$ is cyclic, of order $n - 1$. Let $g \in \{1, \ldots, n-1\}$ be such that its class modulo $n$ is a generator of this cyclic group. Given a polynomial $P \in \mathbb{K}[x]_{<n}$ and $j$ with $0 \leqslant j < n-1$, we have

  $\hat{P}_{g^{-j} \bmod n} \;=\; P(\omega^{g^{-j}}) \;=\; P_0 + \sum_{i=0}^{n-2} P_{g^i \bmod n}\, \omega^{g^{i-j}}.$

Setting

  $A = \sum_{i=0}^{n-2} P_{g^i \bmod n}\, x^i, \qquad B = \sum_{i=0}^{n-2} \omega^{g^{-i}} x^i, \qquad C = A B,$

it follows that $\hat{P}_{g^{-j} \bmod n} = P_0 + C_j$, when regarding $A$, $B$ and $C$ as elements of $\mathbb{K}[x]/(x^{n-1} - 1)$. In other words, we have essentially reduced the computation of a discrete Fourier transform of length $n$ to a so-called cyclic convolution of length $n - 1$.
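A small Python sketch (our own code, over the complex numbers for readability) of this reduction: for a prime $n$ it builds the two sequences above, computes their cyclic convolution of length $n-1$ naively, and recovers the DFT.

# Minimal sketch (our own code): Rader's reduction of a prime-length DFT to a
# cyclic convolution of length n-1, checked against the naive DFT.
import cmath

def primitive_root(n):
    """Smallest generator of (Z/nZ)^* for prime n (brute force)."""
    for g in range(2, n):
        if len({pow(g, e, n) for e in range(n - 1)}) == n - 1:
            return g
    raise ValueError("no generator found")

def rader_dft(P):
    n = len(P)
    omega = cmath.exp(2j * cmath.pi / n)
    g = primitive_root(n)
    ginv = pow(g, n - 2, n)                       # g^{-1} mod n
    a = [P[pow(g, i, n)] for i in range(n - 1)]   # a_i = P_{g^i}
    b = [omega ** pow(ginv, i, n) for i in range(n - 1)]
    m = n - 1
    # cyclic convolution c_j = sum_i a_i * b_{(j-i) mod m}; this is where a
    # length-(n-1) FFT would be used in practice
    c = [sum(a[i] * b[(j - i) % m] for i in range(m)) for j in range(m)]
    out = [0j] * n
    out[0] = sum(P)                               # P(1) handled separately
    for j in range(m):
        out[pow(ginv, j, n)] = P[0] + c[j]
    return out

def naive_dft(P):
    n = len(P)
    omega = cmath.exp(2j * cmath.pi / n)
    return [sum(P[i] * omega ** (i * k) for i in range(n)) for k in range(n)]

P = [1, 4, 2, 7, 3, 5, 6]                         # length n = 7, prime
assert all(abs(x - y) < 1e-8 for x, y in zip(rader_dft(P), naive_dft(P)))
print("Rader reduction matches the naive DFT")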
Let $n$ be a positive integer with a factorization $n = n_1 \cdots n_\ell$ such that $n_k \geqslant 2$ for each $k$. Then any index $i \in \{0, \ldots, n-1\}$ can uniquely be written in the form

  $i \;=\; i_1\, n_2 \cdots n_\ell + i_2\, n_3 \cdots n_\ell + \cdots + i_{\ell-1}\, n_\ell + i_\ell,$

where $0 \leqslant i_k < n_k$ for all $k$. We call $i_1, \ldots, i_\ell$ the digits of $i$ when written in the mixed base $\mathbf{n} = (n_1, \ldots, n_\ell)$. We also define the $\mathbf{n}$-mirror of $i$ by

  $[i]_{\mathbf{n}} \;=\; i_\ell\, n_1 \cdots n_{\ell-1} + i_{\ell-1}\, n_1 \cdots n_{\ell-2} + \cdots + i_2\, n_1 + i_1.$

Whenever $\mathbf{n}$ is clear from the context, we simply write $[i]$ instead of $[i]_{\mathbf{n}}$.
If $\mathbf{n} = (n_1)$ has length one, then we notice that $[i]_{\mathbf{n}} = i$. Setting $\bar{\mathbf{n}} = (n_\ell, \ldots, n_1)$, we also have $[[i]_{\mathbf{n}}]_{\bar{\mathbf{n}}} = i$. Finally, given $1 \leqslant m < \ell$, let $\mathbf{n}' = (n_1, \ldots, n_m)$, $\mathbf{n}'' = (n_{m+1}, \ldots, n_\ell)$, $n' = n_1 \cdots n_m$ and $n'' = n_{m+1} \cdots n_\ell$, so that $\mathbf{n} = (\mathbf{n}', \mathbf{n}'')$ and $n = n' n''$. Then it is not hard to see that for all $i' < n'$ and $i'' < n''$, we have

  $[i' n'' + i'']_{\mathbf{n}} \;=\; [i'']_{\mathbf{n}''}\, n' + [i']_{\mathbf{n}'}.$
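In code, the mixed-radix digits and the mirror are a few lines of Python (our own sketch); the assertion at the end checks the splitting identity above on one example.

# Minimal sketch (our own code): digits in a mixed base and the mirrored index.

def digits(i, base):
    """Digits (i_1, ..., i_l) of i in the mixed base (n_1, ..., n_l)."""
    out = []
    for nk in reversed(base):
        out.append(i % nk)
        i //= nk
    return list(reversed(out))            # most significant digit first

def mirror(i, base):
    """[i]_n: reverse the digits and read them in the reversed base."""
    d = digits(i, base)
    out = d[-1]
    for k in range(len(base) - 2, -1, -1):
        out = out * base[k] + d[k]
    return out

base = (2, 3, 4)                          # n = 24
n1, n2 = 2, 12                            # split n = n' * n'' with n' = 2
i1, i2 = 1, 7
i = i1 * n2 + i2
assert mirror(i, base) == mirror(i2, (3, 4)) * n1 + mirror(i1, (2,))
print([mirror(i, base) for i in range(24)])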
We are now in a position to recall how the FFT works. Let $n = n_1 \cdots n_\ell$, $\mathbf{n}$, $\mathbf{n}'$, $\mathbf{n}''$, $n'$ and $n''$ be as at the end of the previous subsection. We denote $\omega' = \omega^{n''}$ and notice that $\omega'$ is a primitive $n'$-th root of unity. Similarly, $\omega'' = \omega^{n'}$ is a primitive $n''$-th root of unity. Recall that $\hat{P}_{[i]} = P(\omega^i)$ for $0 \leqslant i < n$. For any $i' < n'$ and $i'' < n''$, we have $\omega^{(i' n'' + i'')(m n' + j)} = \omega^{(i' n'' + i'') j}\, (\omega'')^{i'' m}$ for all $j < n'$ and $m < n''$, so that

  $P(\omega^{i' n'' + i''}) \;=\; \sum_{j < n'} \omega^{(i' n'' + i'') j} \sum_{m < n''} P_{m n' + j}\, (\omega'')^{i'' m}. \qquad (3.1)$

In this formula, one recognizes two polynomial evaluations of orders $n'$ and $n''$.
More precisely, we may decompose the polynomial $P$ as

  $P \;=\; \sum_{j < n'} P_j(x^{n'})\, x^j,$

where $P_j = \sum_{m < n''} P_{m n' + j}\, x^m \in \mathbb{K}[x]_{<n''}$ for $j = 0, \ldots, n'-1$. This allows us to rewrite (3.1) as

  $P(\omega^{i' n'' + i''}) \;=\; \sum_{j < n'} \omega^{i'' j}\, (\omega')^{i' j}\, P_j\bigl((\omega'')^{i''}\bigr). \qquad (3.2)$

For each $i'' < n''$, we may next introduce the polynomial $Q_{i''} \in \mathbb{K}[x]_{<n'}$ by

  $Q_{i''} \;=\; \sum_{j < n'} P_j\bigl((\omega'')^{i''}\bigr)\, \omega^{i'' j}\, x^j.$

The relation (3.2) now further simplifies into

  $\hat{P}_{[i'']_{\mathbf{n}''}\, n' + [i']_{\mathbf{n}'}} \;=\; P(\omega^{i' n'' + i''}) \;=\; Q_{i''}\bigl((\omega')^{i'}\bigr) \;=\; (\hat{Q}_{i''})_{[i']_{\mathbf{n}'}}.$
This leads to the following recursive algorithm for the computation of discrete Fourier transforms.
Algorithm $\mathrm{FFT}(P, \omega, \mathbf{n})$.

if $\ell = 1$ then
  compute the answer using a direct algorithm and return it
Take $\ell', \ell'' \geqslant 1$ with $\ell' + \ell'' = \ell$, and set $\mathbf{n}' := (n_1, \ldots, n_{\ell'})$ and $\mathbf{n}'' := (n_{\ell'+1}, \ldots, n_\ell)$
Let $n' := n_1 \cdots n_{\ell'}$, $n'' := n_{\ell'+1} \cdots n_\ell$, $\omega' := \omega^{n''}$, $\omega'' := \omega^{n'}$
Decompose $P = \sum_{j < n'} P_j(x^{n'})\, x^j$ with $P_j \in \mathbb{K}[x]_{<n''}$
for $j = 0, \ldots, n' - 1$ do
  Compute $\hat{P}_j := \mathrm{FFT}(P_j, \omega'', \mathbf{n}'')$
for $i'' = 0, \ldots, n'' - 1$ do
  Let $v_j := (\hat{P}_j)_{[i'']_{\mathbf{n}''}}$ for $j = 0, \ldots, n'-1$
  Let $Q_{i''} := \sum_{j < n'} v_j\, \omega^{i'' j}\, x^j$
  Compute $\hat{Q}_{i''} := \mathrm{FFT}(Q_{i''}, \omega', \mathbf{n}')$
return the vector $\hat{P}$ with $\hat{P}_{[i'']_{\mathbf{n}''}\, n' + [i']_{\mathbf{n}'}} = (\hat{Q}_{i''})_{[i']_{\mathbf{n}'}}$ for all $i' < n'$ and $i'' < n''$
If the $n_k$ are all bounded, then this algorithm requires $O(n \log n)$ operations in $\mathbb{K}$. For practical purposes it is best though to pick $\ell'$ such that $n'$ and $n''$ are as close to $\sqrt{n}$ as possible. This approach is most “cache friendly” in the sense that it keeps data as close as possible in memory [2].
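For concreteness, here is a compact Python version (our own sketch, over the complex numbers and splitting off one radix at a time rather than balancing $n'$ and $n''$) of the recursive algorithm above, including the mirrored output order.

# Minimal sketch (our own code): recursive Cooley-Tukey FFT over C with mirrored
# output indexation, splitting off the first radix (l' = 1) at every step.
import cmath

def mirror(i, base):
    digs = []
    for nk in reversed(base):
        digs.append(i % nk); i //= nk         # digs = [i_l, ..., i_1]
    out = digs[0]
    for k in range(1, len(base)):
        out = out * base[len(base) - 1 - k] + digs[k]
    return out

def fft(P, omega, base):
    n = len(P)
    if len(base) == 1:                        # direct (naive) transform
        return [sum(P[m] * omega ** (m * i) for m in range(n)) for i in range(n)]
    n1 = base[0]                              # n' = n1, n'' = n / n1
    rest = base[1:]
    n2 = n // n1
    # inner transforms of the n1 strided slices P_j, each of length n2
    inner = [fft(P[j::n1], omega ** n1, rest) for j in range(n1)]
    out = [0j] * n
    for i2 in range(n2):                      # i2 plays the role of i''
        t = mirror(i2, rest)                  # position of P_j(omega''^{i''}) in inner[j]
        Q = [inner[j][t] * omega ** (i2 * j) for j in range(n1)]
        outer = fft(Q, omega ** n2, (n1,))
        for i1 in range(n1):
            out[t * n1 + i1] = outer[i1]
    return out

base = (2, 3, 4)                              # n = 24
n = 24
P = [complex((3 * k + 1) % 7) for k in range(n)]
omega = cmath.exp(2j * cmath.pi / n)
Phat = fft(P, omega, base)
for i in range(n):                            # check that Phat[[i]] = P(omega^i)
    direct = sum(P[m] * omega ** (m * i) for m in range(n))
    assert abs(Phat[mirror(i, base)] - direct) < 1e-6
print("mirrored Cooley-Tukey FFT agrees with direct evaluation")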
Let $\mathbb{F}_{q^d} \supseteq \mathbb{F}_q$ be an extension of finite fields. Let $\phi: \mathbb{F}_{q^d} \to \mathbb{F}_{q^d},\ a \mapsto a^q$ denote the Frobenius automorphism of $\mathbb{F}_{q^d}$ over $\mathbb{F}_q$. We recall that the group $\langle \phi \rangle$ generated by $\phi$ has size $d$ and that it coincides with the Galois group of $\mathbb{F}_{q^d}$ over $\mathbb{F}_q$.
Let $\omega \in \mathbb{F}_{q^d}$ be a primitive $n$-th root of unity with $n \mid q^d - 1$. Recall that $\phi(\omega) = \omega^q$ and that $\phi(P(\alpha)) = P(\phi(\alpha))$ for all $P \in \mathbb{F}_q[x]$ and $\alpha \in \mathbb{F}_{q^d}$. We will write $\Omega = \{1, \omega, \ldots, \omega^{n-1}\}$ for the cyclic group of size $n$ generated by $\omega$. The Frobenius automorphism $\phi$ naturally acts on $\Omega$ and we denote by $d_\alpha$ the order of an element $\alpha \in \Omega$ under this action. Notice that $d_\alpha \mid d$. A subset $\Sigma \subseteq \Omega$ is called a cross section if for every $\alpha \in \Omega$, there exists exactly one $\sigma \in \Sigma$ such that $\alpha = \phi^j(\sigma)$ for some $j \in \mathbb{N}$. We will denote this element by $\sigma_\alpha$. We will also denote the corresponding set of indices by $I = \{i \in \{0, \ldots, n-1\} : \omega^i \in \Sigma\}$.
Now consider a polynomial $P \in \mathbb{F}_q[x]_{<n}$. Recall that its discrete Fourier transform with respect to $\omega$ is defined to be the vector

  $\hat{P} = \bigl(P(1), P(\omega), \ldots, P(\omega^{n-1})\bigr) \in \mathbb{F}_{q^d}^{\,n}.$

Since $P$ admits coefficients in $\mathbb{F}_q$, we have $P(\phi(\alpha)) = \phi(P(\alpha))$ for all $\alpha \in \Omega$. For any $\alpha \in \Omega$, this means that we may retrieve $P(\alpha)$ from $P(\sigma_\alpha)$. Indeed, setting $\alpha = \phi^j(\sigma_\alpha)$ with $j \in \mathbb{N}$, we obtain $P(\alpha) = \phi^j(P(\sigma_\alpha))$. We call

  $\check{P} = \bigl(P(\sigma)\bigr)_{\sigma \in \Sigma}$

the discrete Frobenius Fourier transform of $P$ with respect to $\omega$ and the section $\Sigma$.
Let $\alpha \in \Omega$ and $e = d_\alpha$. Then $\phi^e$ leaves $\alpha$ invariant, whence $\phi^e(P(\alpha)) = P(\phi^e(\alpha)) = P(\alpha)$. Since $\mathbb{F}_{q^e}$ is the subfield of $\mathbb{F}_{q^d}$ that is fixed under the action of $\phi^e$, this yields

  $P(\alpha) \in \mathbb{F}_{q^{d_\alpha}}.$

It follows that the number of coefficients in $\mathbb{F}_q$ needed to represent $\check{P}$ is given by

  $\sum_{\sigma \in \Sigma} d_\sigma \;=\; n.$

In particular, the output and input size of the discrete Frobenius Fourier transform coincide.
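The orbit structure is easy to explore in Python; the following sketch (our own code) computes the orbits of $i \mapsto q i \bmod n$ on $\{0, \ldots, n-1\}$, picks the smallest element of each orbit as a cross section, and checks that the orbit sizes add up to $n$, which is the statement that input and output sizes coincide.

# Minimal sketch (our own code): Frobenius orbits on the exponents of omega.
# The action of phi on Omega corresponds to i -> q*i mod n on Z/nZ.

def frobenius_orbits(n, q):
    seen, orbits = set(), []
    for i in range(n):
        if i in seen:
            continue
        orbit, j = [], i
        while j not in seen:
            seen.add(j)
            orbit.append(j)
            j = (q * j) % n
        orbits.append(orbit)
    return orbits

q, n = 2, 15                      # n divides q^4 - 1, so d = 4 here
orbits = frobenius_orbits(n, q)
cross_section = [min(orbit) for orbit in orbits]
print(cross_section)              # [0, 1, 3, 5, 7]
print([len(orbit) for orbit in orbits])
assert sum(len(orbit) for orbit in orbits) == n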
In the Cooley–Tukey algorithm from section 3.4, the second wave of FFTs operates on “twiddled” polynomials $Q_{i''} = \sum_{j<n'} P_j((\omega'')^{i''})\, \omega^{i'' j} x^j$ instead of the polynomials $R_{i''} = \sum_{j<n'} P_j((\omega'')^{i''})\, x^j$. An alternative point of view is to directly consider the evaluation of $R_{i''}$ at the points $\omega^{i''} (\omega')^{i'}$ with $i' = 0, \ldots, n'-1$ and where $\omega' = \omega^{n''}$. We will call this operation a twiddled DFT.
The twiddled DFFT is defined in a similar manner. More precisely, assume that we are given a root of unity $\eta$ and a primitive $m$-th root of unity $\xi$ such that the set $\Omega_\eta = \{\eta, \eta\xi, \ldots, \eta\xi^{m-1}\}$ is stable under the action of $\phi$. Assume also that we have a cross section $\Sigma_\eta$ of $\Omega_\eta$ under this action and the corresponding set of indices $I_\eta = \{i : \eta\xi^i \in \Sigma_\eta\}$. Then the twiddled DFFT of a polynomial $Q \in \mathbb{F}_q[x]_{<m}$ computes the family

  $\bigl(Q(\eta\xi^i)\bigr)_{i \in I_\eta}.$

Again, we notice that

  $\sum_{i \in I_\eta} d_i \;=\; m, \qquad (4.2)$

where $d_i$ denotes the order of $\eta\xi^i$ under the action of $\phi$. We will denote by $\mathsf{E}_{q,d}(m)$ the complexity of computing a twiddled DFFT of this kind, and by $\mathsf{E}_{q,d}(m, r)$ that of computing $r$ such DFFTs at once.
If $m$ is a small number, then we recall from section 3.1 that it is most efficient to compute ordinary DFTs of order $m$ in a naive fashion, by evaluating $Q(\eta\xi^i)$ for $i = 0, \ldots, m-1$ using Horner's method. From an implementation point of view, one may do this using specialized codelets for various small $m$.

The same naive strategy can also be used for the computation of DFFTs and twiddled DFFTs. Given a cross section $\Sigma_\eta$ and the corresponding set of indices $I_\eta$, it suffices to evaluate $Q(\eta\xi^i)$ separately for each $i \in I_\eta$. Let $d_i$ be the order of $\eta\xi^i$ under the action of $\phi$. Then $\eta\xi^i$ actually belongs to $\mathbb{F}_{q^{d_i}}$, so the evaluation of $Q(\eta\xi^i)$ can be done in time $O(m\,\mathsf{M}_q(d_i))$. With the customary assumption that $\mathsf{M}_q(d)/d$ is increasing as a function of $d$, it follows using (4.2) that the complete twiddled DFFT can be computed in time

  $\sum_{i \in I_\eta} O\bigl(m\,\mathsf{M}_q(d_i)\bigr) \;=\; O\Bigl(\frac{m^2}{d}\,\mathsf{M}_q(d)\Bigr).$

Assuming that full DFTs of order $m$ over $\mathbb{F}_{q^d}$ are also computed using the naive strategy, this yields

  $\mathsf{E}_{q,d}(m) \;=\; O\bigl(\mathsf{F}_{q^d}(m)/d\bigr).$

In other words, for small $m$, we have achieved our goal to gain a factor $d$.
The next interesting case for study is when $m$ is prime and the action of $\phi$ on $\Omega_\eta$ is either transitive, or has only two orbits and one of them is trivial. If $\eta = 1$, then $1$ always forms an orbit of its own in $\Omega_1 = \{1, \xi, \ldots, \xi^{m-1}\}$, but we can have $\Sigma_1 = \{1, \xi\}$ when the remaining $m - 1$ elements form a single orbit. If $\eta \neq 1$, then it can happen that $\Sigma_\eta$ is a singleton. This happens for instance for a $4$-th primitive root of unity $\eta$, for $q = 3$, $m = 2$ and $\xi = -1$.
If $\Sigma_\eta$ is a singleton, say $\Sigma_\eta = \{\sigma\}$, then we necessarily have $d_\sigma = m$ and the DFFT reduces to a single evaluation $Q(\sigma)$ with $Q \in \mathbb{F}_q[x]_{<m}$ and $\sigma \in \mathbb{F}_{q^m}$. This is precisely the problem of modular composition that we encountered in section 2, whence $\mathsf{E}_{q,d}(m) = O(\mathsf{C}_q(m))$. The speed-up with respect to a full DFT is comprised between a constant factor and a factor of order $m$, depending on the efficiency of our algorithm for modular composition. Theoretically speaking, we recall that $\mathsf{C}_q(m) = (m \log q)^{1+o(1)}$, which allows us to gain a factor $m^{1-\varepsilon}$ for any $\varepsilon > 0$.
In a similar way, if $\eta = 1$ and $\Sigma_1 = \{1, \xi\}$, then the computation of one DFFT reduces to $m - 1$ additions (in order to compute $Q(1)$) and one composition modulo a polynomial of degree $m - 1$ (instead of $m$).
Remark 4.1. Assume that $m$ is prime, $\eta = 1$ and $\Sigma_1 = \{1, \xi\}$. If we are free to choose the representation of elements in $\mathbb{F}_{q^{m-1}}$, then the DFFT becomes particularly efficient if we take $\xi$ itself as the primitive element. In other words, if $\mu$ is the monic minimal polynomial of $\xi$ over $\mathbb{F}_q$ (so that $\deg \mu = m - 1$), then we represent elements of $\mathbb{F}_{q^{m-1}}$ as polynomials in $\xi$ modulo $\mu$. Given $Q \in \mathbb{F}_q[x]_{<m}$, just writing down $Q(\xi) = Q \operatorname{rem} \mu$ then yields the evaluation of $Q$ at $\xi$, whence the DFFT requires only $O(m)$ additions in $\mathbb{F}_q$.
For large prime orders $m$ and $\eta = 1$, let us now show how to adapt Rader reduction to the computation of DFFTs in favourable cases. With the notations from section 3.2 (applied with the prime $m$ in the role of $n$), we have $\phi\bigl(\hat{P}_{g^{-j} \bmod m}\bigr) = \hat{P}_{g^{e-j} \bmod m}$ for all $j$, where $e$ is such that $g^e \equiv q \pmod{m}$. The action of $\phi$ therefore induces an action $j \mapsto j - e$ on $\mathbb{Z}/(m-1)\mathbb{Z}$. Let us assume for simplicity that the order $r$ of the additive subgroup generated by $e$ in $\mathbb{Z}/(m-1)\mathbb{Z}$ is coprime with $s := (m-1)/r$. Then there exists an element $e'$ of order $s$ in $\mathbb{Z}/(m-1)\mathbb{Z}$ and $\mathbb{Z}/(m-1)\mathbb{Z} = \langle e \rangle \oplus \langle e' \rangle$. Consequently,

  $\mathbb{K}[x]/(x^{m-1} - 1) \;\cong\; \mathbb{K}[y, z]/(y^r - 1,\, z^s - 1),$

where the isomorphism sends $x^{u e + v e'}$ to $y^u z^v$.
We may thus regard the product $C = A B$ as a bivariate polynomial in $y$ and $z$. Let $C_0$ be the constant term in $y$ of this bivariate polynomial. From $C_0$, we can recover $\hat{P}_{g^{-j} \bmod m}$ for every $j \in \langle e' \rangle$. By construction, the $\omega^{g^{-j}}$ with $j \in \langle e' \rangle$ actually form a section of $\Omega \setminus \{1\}$ under the action of $\phi$. This reduces the computation of a DFFT for the section $\Sigma = \{1\} \cup \{\omega^{g^{-j}} : j \in \langle e' \rangle\}$ to the computation of $C_0$.
As to the computation of $C_0$, assume for simplicity that $\mathbb{F}_{q^d}$ contains a primitive $s$-th root of unity. Recall that $A$ has coefficients in $\mathbb{F}_q$ whereas $B$ admits coefficients in $\mathbb{F}_{q^d}$. Notice also that $B$ does not depend on the input $P$, whence it can be precomputed. We compute $C_0$ as follows in the FFT-model with respect to $z$. We first transform $A$ and $B$ into two vectors $\hat{A}$ and $\hat{B}$ of $s$ values each. The computation of $\hat{B}$ may again be regarded as a precomputation. The computation of the Fourier transform of $C_0$ now essentially comes down to $s$ modular compositions as in section 2. We finally transform the result back into $C_0$. Altogether, this yields a cost that is dominated by these $s$ modular compositions, together with the direct and inverse transforms of length $s$.
It is instructive to compare this with the cost of a full DFT of length $m$ over $\mathbb{F}_{q^d}$ using Rader's algorithm, which essentially amounts to one cyclic convolution of length $m - 1$ over $\mathbb{F}_{q^d}$. Depending on the efficiency of modular composition, the speed-up for DFFTs therefore varies between a constant factor and a factor of the order of $d$. Notice that the algorithm from the previous subsection becomes a special case with parameters $r = m - 1$ and $s = 1$.
Let $\mathbb{F}_{q^d}$ be an extension of $\mathbb{F}_q$ and assume that its elements are represented as polynomials in a primitive element $\alpha$ of degree $d$. Given polynomials $P_1, \ldots, P_d \in \mathbb{F}_q[x]_{<n}$, we may then form the polynomial

  $P \;=\; P_1 + P_2\, \alpha + \cdots + P_d\, \alpha^{d-1} \in \mathbb{F}_{q^d}[x]_{<n}.$

If $n \mid q - 1$ and $\omega$ is a primitive $n$-th root of unity in the base field $\mathbb{F}_q$, then we notice that

  $\hat{P} \;=\; \hat{P}_1 + \hat{P}_2\, \alpha + \cdots + \hat{P}_d\, \alpha^{d-1}. \qquad (5.1)$

In other words, the discrete Fourier transform operates coefficientwise with respect to the basis $1, \alpha, \ldots, \alpha^{d-1}$ of $\mathbb{F}_{q^d}$ over $\mathbb{F}_q$.
Recall that $\mathsf{F}_q(n, d)$ stands for the cost of computing $d$ Fourier transforms of length $n$ over $\mathbb{F}_q$. The map $(P_1, \ldots, P_d) \mapsto P$ and its inverse boil down to matrix transpositions in memory. On a Turing machine, they can therefore be computed in time $O(n d \log q \, \log \min(n, d))$. From (5.1), it follows that

  $\mathsf{F}_q(n, d) \;\leqslant\; \mathsf{F}_{q^d}(n) + O\bigl(n d \log q \, \log \min(n, d)\bigr). \qquad (5.2)$

This relation does not give us anything really new, since we made the assumption in section 2 that $\mathsf{M}_q(n)/n$ is increasing in both $n$ and $q$. Nevertheless, it provides us with an elegant and explicit algorithm in a special case.
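A quick numerical illustration (our own sketch, with parameters of our choosing) of the coefficientwise behaviour: here $q = 5$, $d = 2$, $n = 4$, $\omega = 2 \in \mathbb{F}_5$, and $\mathbb{F}_{25} = \mathbb{F}_5[\alpha]/(\alpha^2 - 2)$.

# Minimal sketch (our own code): with omega in the base field, the DFT over
# F_{q^d} acts coefficientwise on the base-field components.

p, n, omega = 5, 4, 2

def f25_mul(u, v):
    a, b = u
    c, d = v
    return ((a * c + 2 * b * d) % p, (a * d + b * c) % p)   # alpha^2 = 2

def f25_add(u, v):
    return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)

def dft_f25(P):
    """Naive DFT of a polynomial with coefficients in F_25 at 1, omega, ..., omega^{n-1}."""
    out = []
    for i in range(n):
        x = (pow(omega, i, p), 0)          # omega^i lies in the base field F_5
        acc = (0, 0)
        for c in reversed(P):
            acc = f25_add(f25_mul(acc, x), c)
        out.append(acc)
    return out

def dft_f5(P):
    return [sum(c * pow(omega, i * j, p) for j, c in enumerate(P)) % p for i in range(n)]

P1 = [1, 4, 0, 3]
P2 = [2, 2, 1, 0]
P = list(zip(P1, P2))                      # P = P1 + alpha * P2
assert dft_f25(P) == list(zip(dft_f5(P1), dft_f5(P2)))
print("DFT over F_25 operates coefficientwise on the F_5 components")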
Let us next consider an $n$-th root of unity $\eta$ and a primitive $m$-th root of unity $\xi$, as in section 4.2, such that the set of evaluation points $\eta\xi^i$ is stable under the action of $\phi$. Given $P_1, \ldots, P_d \in \mathbb{F}_q[x]_{<m}$ and $P = P_1 + P_2 \alpha + \cdots + P_d \alpha^{d-1}$, we have for all $i$,

  $\phi^{-j}\bigl(P(\phi^j(\eta\xi^i))\bigr) \;=\; \sum_{t=1}^{d} P_t(\eta\xi^i)\, \bigl(\phi^{-j}(\alpha)\bigr)^{t-1} \qquad (j = 0, \ldots, d-1).$

Abbreviating $\alpha_j = \phi^{-j}(\alpha)$ and setting

  $V \;=\; \bigl(\alpha_j^{\,t-1}\bigr)_{0 \leqslant j < d,\ 1 \leqslant t \leqslant d},$

it follows that

  $\bigl(\phi^{-j}(P(\phi^j(\eta\xi^i)))\bigr)_{0 \leqslant j < d} \;=\; V \cdot \bigl(P_t(\eta\xi^i)\bigr)_{1 \leqslant t \leqslant d}.$
Now given the twiddled discrete Fourier transform of $P$, we in particular know the values $P(\phi^j(\eta\xi^i))$ for $j = 0, \ldots, d-1$. Letting the Frobenius automorphism $\phi$ act on these values, we obtain the vector $w = \bigl(\phi^{-j}(P(\phi^j(\eta\xi^i)))\bigr)_{0 \leqslant j < d}$. Using one matrix-vector product $V^{-1} w$, we may then retrieve the values of the individual twiddled transforms $P_1(\eta\xi^i), \ldots, P_d(\eta\xi^i)$ at $\eta\xi^i$. Doing this for each $\eta\xi^i$ in a cross section of $\Omega_\eta$ under the action of $\phi$, this yields a way to retrieve the twiddled DFFTs of $P_1, \ldots, P_d$ from the twiddled DFT of $P$.
Recall that $\mathsf{E}_{q,d}(m, d)$ denotes the complexity of computing the twiddled DFFTs of $d$ elements of $\mathbb{F}_q[x]_{<m}$. Since $V$ is a Vandermonde matrix, the matrix-vector product $V^{-1} w$ can be computed in time $O(\mathsf{M}_{q^d}(d) \log d)$ using polynomial interpolation [8, Chapter 10]. Given the values on the orbit of $\eta\xi^i$, we may compute each individual vector $w$ in time $O(d\,\mathsf{M}_q(d) \log q)$ using the Frobenius automorphism $\phi$. Since there are at most $m$ orbits in $\Omega_\eta$ under the action of $\phi$, we may thus retrieve the twiddled DFFTs of $P_1, \ldots, P_d$ from the twiddled DFT of $P$ in time $O\bigl(m\,(d\,\mathsf{M}_q(d)\log q + \mathsf{M}_{q^d}(d)\log d)\bigr)$. In other words,

  $\mathsf{E}_{q,d}(m, d) \;\leqslant\; \mathsf{F}_{q^d}(m) + O\bigl(m\, d\, \mathsf{M}_q(d)\log q + m\, \mathsf{M}_{q^d}(d)\log d\bigr). \qquad (5.3)$
In the extreme case when $\eta, \xi \in \mathbb{F}_q$, every $\eta\xi^i$ has order one under the action of $\phi$ and the DFFT of each $P_t$ coincides with its ordinary DFT. In that case, the bound (5.3) is not very sharp, but (5.2) already provides us with a good alternative bound for this special case.
For $e$ with $e \mid d$ and $e < d$, let us now consider the case when $\eta\xi^i$ has order at most $e$ under the action of $\phi$ for all $i$, so that $P_t(\eta\xi^i) \in \mathbb{F}_{q^e}$ for all $t$ and $i$. Let $\beta$ be a primitive element in $\mathbb{F}_{q^e}$ over $\mathbb{F}_q$, so that $\mathbb{F}_{q^e} = \mathbb{F}_q[\beta]$ and $\beta$ has degree $e$. Given our polynomials $P_1, \ldots, P_e \in \mathbb{F}_q[x]_{<m}$, we may form

  $Q \;=\; P_1 + P_2\, \beta + \cdots + P_e\, \beta^{e-1} \in \mathbb{F}_{q^e}[x]_{<m}.$

Then we have

  $Q(\eta\xi^i) \;=\; P_1(\eta\xi^i) + P_2(\eta\xi^i)\, \beta + \cdots + P_e(\eta\xi^i)\, \beta^{e-1}.$

Moreover, we may compute the values $P_1(\eta\xi^i), \ldots, P_e(\eta\xi^i)$ from $Q(\eta\xi^i)$ for each $i$ in the cross section, using the algorithm from the previous subsection with $e$ in the role of $d$, but with the additional property that at least one $\eta\xi^i$ has maximal order $e$ under the action of $\phi$.
If $d$ is prime and $\eta = 1$, then the above discussion shows that, without loss of generality, we may assume that some $\xi^i$ has order $d$ under the action of $\phi$. This means that every $\xi^i$ either has maximal order $d$ or lies in $\mathbb{F}_q$, whereas $1$ always has order one. Hence, the overhead analysis of the previous subsection applies with $e = d$. Similarly, if $d$ is prime and $\eta \neq 1$, then we obtain a reduction to the case when $\eta\xi^i$ has maximal order $d$ for all $i$, whence the same analysis applies. In both cases, we obtain the following:
Theorem 5.1. If $d$ is prime, then the $d$ twiddled DFFTs of $P_1, \ldots, P_d \in \mathbb{F}_q[x]_{<m}$ can be computed in time $\mathsf{F}_{q^d}(m)$ plus the overhead of the $O(m d)$ Frobenius applications and $O(m)$ Vandermonde solves from (5.3).

Remark 5.2. Each application of $\phi$ requires $O(\log q)$ multiplications in $\mathbb{F}_{q^d}$, whence the overhead term in (5.3) is generally not negligible. When using the best known bound (2.1) for $\mathsf{M}_p$ and the induced bound for $\mathsf{M}_q(d)$, it follows for some constant $c$ that this overhead may amount to a constant fraction of $\mathsf{F}_{q^d}(m)$. In other words, we cannot hope to gain the full asymptotic factor of $d$ with respect to a full DFT.
If $d$ is not necessarily prime, then the technique from this section can still be used for every individual $i \in I_\eta$. However, the rewritings of elements in $\mathbb{F}_{q^{d_i}}$ as elements in $\mathbb{F}_q[x]/(\mu_{d_i})$ and back have to be done individually for each $i$ using modular compositions. Denoting by $d_i$ the order of $\eta\xi^i$ under $\phi$, it follows that the total cost of these conversions is bounded by $O\bigl(\sum_{i \in I_\eta} \mathsf{C}_q(d_i)\bigr)$. Notice that conversions of the same type correspond to modular compositions with the same modulus. If $d$ is smooth, then it follows that we may use the algorithms from [17] and keep the cost of the conversions quasi-linear.
Let $n = n_1 \cdots n_\ell$ and $\mathbf{n} = (n_1, \ldots, n_\ell)$ be as in section 3.3 and let $\omega \in \mathbb{F}_{q^d}$ be a primitive $n$-th root of unity. In this section we present an analogue of the FFT from section 3.4: the Frobenius FFT (or FFFT). This algorithm uses the same mirrored indexation as the Cooley–Tukey FFT: given $P \in \mathbb{F}_q[x]_{<n}$, it thus returns the family $(\hat{P}_{[i]})_{i \in I}$. We start with the description of the index set $I$ that corresponds to the so-called “privileged cross section” $\Sigma$ of $\Omega$.
Given $i \in \{0, \ldots, n-1\}$, consider the orbit of $\omega^i$ under the action of $\phi$; it consists of the $\omega^j$ with $j \in \{i, qi \bmod n, q^2 i \bmod n, \ldots\}$. We define

  $I \;=\; \bigl\{\, i : [i]_{\mathbf{n}} \leqslant [j]_{\mathbf{n}} \text{ whenever } \omega^j \text{ lies in the orbit of } \omega^i \,\bigr\}, \qquad \Sigma \;=\; \{\omega^i : i \in I\}.$

Then $\Sigma$ is a cross section of $\Omega$ under the action of $\phi$; we call it the privileged cross section for $\omega$ (and $\mathbf{n}$).
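The following Python sketch (our own code; it assumes the mirrored-minimal convention described above) computes the index set $I$ of the privileged cross section for given $q$ and $\mathbf{n}$.

# Minimal sketch (our own code): the index set I of the privileged cross section,
# taking in each Frobenius orbit the index with the smallest mirrored value.
from math import prod

def mirror(i, base):
    digs = []
    for nk in reversed(base):
        digs.append(i % nk); i //= nk      # digs = [i_l, ..., i_1]
    out = digs[0]
    for k in range(1, len(base)):
        out = out * base[len(base) - 1 - k] + digs[k]
    return out

def privileged_cross_section(q, base):
    n = prod(base)
    seen, I = set(), []
    for i in range(n):
        if i in seen:
            continue
        orbit, j = [], i
        while j not in seen:
            seen.add(j); orbit.append(j); j = (q * j) % n
        I.append(min(orbit, key=lambda t: mirror(t, base)))
    return sorted(I)

print(privileged_cross_section(2, (3, 5)))   # q = 2, n = 15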
Now let $\mathbf{n}'$, $\mathbf{n}''$, $n'$, $n''$, $\omega' = \omega^{n''}$ and $\omega'' = \omega^{n'}$ be as above. Then $\omega''$ is a primitive $n''$-th root of unity. Let $\Sigma''$ be the corresponding privileged cross section of $\Omega'' = \langle \omega'' \rangle$, with index set $I''$.

Let $i = i' n'' + i''$ and $j = j' n'' + j''$ with $i', j' < n'$ and $i'', j'' < n''$ be such that $\omega^j = \phi^t(\omega^i)$ for some $t \in \mathbb{N}$. Since $j \equiv q^t i \pmod{n}$ implies $j'' \equiv q^t i'' \pmod{n''}$, we have $(\omega'')^{j''} = \phi^t((\omega'')^{i''})$, and similarly $(\omega'')^{i''} = \phi^{d-t}((\omega'')^{j''})$; it follows that $(\omega'')^{i''}$ and $(\omega'')^{j''}$ lie in the same orbit under the action of $\phi$.
Inversely, given $i''$ and $j''$ with $(\omega'')^{j''} = \phi^t((\omega'')^{i''})$ for some $t$, we claim that there exists a $j' < n'$ such that $\phi^t(\omega^i) = \omega^{j' n'' + j''}$ for $i = i' n'' + i''$. Indeed, $(\omega'')^{j''} = \phi^t((\omega'')^{i''})$ implies that $q^t i'' \equiv j'' \pmod{n''}$, whence $\omega^{q^t i - j''}$ is an $n'$-th root of unity. This means that there exists a $j' < n'$ with $\omega^{q^t i - j''} = (\omega')^{j'} = \omega^{j' n''}$, i.e. $\phi^t(\omega^i) = \omega^{j' n'' + j''}$.
Now assume that $i = i' n'' + i''$ is such that $[i]_{\mathbf{n}}$ is minimal within the orbit of $i$, i.e. $i \in I$. Given $j''$ such that $(\omega'')^{j''} = \phi^t((\omega'')^{i''})$ for some $t$, the above claim shows that there exists a $j' < n'$ such that $\omega^{j' n'' + j''} = \phi^t(\omega^i)$. But then $[j' n'' + j'']_{\mathbf{n}} = [j'']_{\mathbf{n}''}\, n' + [j']_{\mathbf{n}'} \geqslant [i]_{\mathbf{n}} = [i'']_{\mathbf{n}''}\, n' + [i']_{\mathbf{n}'}$, whence $[j'']_{\mathbf{n}''} \geqslant [i'']_{\mathbf{n}''}$. This shows that $i \in I$ implies $i'' \in I''$.
Inversely, assume that $i''$ is minimal in the above sense, i.e. $i'' \in I''$. Then the above claim implies that, for any $i' < n'$ and any $\omega^j$ in the orbit of $\omega^{i' n'' + i''}$, we may write $j = j' n'' + j''$ with $(\omega'')^{j''}$ in the orbit of $(\omega'')^{i''}$. Without loss of generality we may assume that $j$ was chosen with $[j]_{\mathbf{n}}$ minimal while satisfying this property, i.e. $j \in I$. Given such a $j$, the previous paragraph shows that $j'' \in I''$; since $j''$ lies in the orbit of $i''$ and the minimal representative of an orbit is unique, we have $j'' = i''$. Consequently, $j = j' n'' + i''$ and $[j]_{\mathbf{n}} \leqslant [l]_{\mathbf{n}}$ whenever $\omega^l$ lies in the orbit of $\omega^{i' n'' + i''}$. In other words, $j \in I$. This shows that $i'' \in I''$ implies the existence of a $j' < n'$ with $j' n'' + i'' \in I$.
We are now in a position to adapt the Cooley–Tukey FFT. In the case when $n'$ or $n''$ is prime, we will use one of the algorithms from section 4 for the computation of twiddled DFFTs.
Algorithm $\mathrm{FFFT}(P, \omega, \mathbf{n})$.

if $\ell = 1$ then
  return the twiddled DFFT of $P$, computed with one of the algorithms from section 4
Take $\ell', \ell'' \geqslant 1$ with $\ell' + \ell'' = \ell$, and set $\mathbf{n}' := (n_1, \ldots, n_{\ell'})$ and $\mathbf{n}'' := (n_{\ell'+1}, \ldots, n_\ell)$
Let $n' := n_1 \cdots n_{\ell'}$, $n'' := n_{\ell'+1} \cdots n_\ell$, $\omega' := \omega^{n''}$, $\omega'' := \omega^{n'}$
Let $I''$ be the privileged index set for $\omega''$ and $\mathbf{n}''$
Decompose $P = \sum_{j < n'} P_j(x^{n'})\, x^j$ with $P_j \in \mathbb{F}_q[x]_{<n''}$
for $j = 0, \ldots, n' - 1$ do
  Compute the Frobenius FFT of $P_j$ with respect to $\omega''$, which yields the values $P_j\bigl((\omega'')^{i''}\bigr)$ for $i'' \in I''$
for $i'' \in I''$ do
  Let $Q_{i''} := \sum_{j < n'} P_j\bigl((\omega'')^{i''}\bigr)\, x^j$
  Notice that $Q_{i''} \in \mathbb{F}_{q^e}[x]_{<n'}$, where $e$ is the order of $(\omega'')^{i''}$ under the action of $\phi$
  Let $\eta := \omega^{i''}$
  if $e = d$ then
    Let $\xi := \omega'$
    Compute the full twiddled FFT of $Q_{i''}$ at the points $\eta\xi^{i'}$, $i' < n'$, over $\mathbb{F}_{q^d}$
  else
    Compute the twiddled DFFT of $Q_{i''}$ at the points $\eta (\omega')^{i'}$
return the family $(\hat{P}_{[i]})_{i \in I}$, read off from these transforms
For the usual FFT, it was possible to choose $\ell'$ in such a way that $n' \approx n'' \approx \sqrt{n}$, and we recall that this improves the cache efficiency of practical implementations. However, this optimization is more problematic in our setting since it would require the development of an efficient recursive algorithm for twiddled FFFTs. This is an interesting topic, but beyond the scope of the present paper.
Let us now perform the complexity analysis of $\mathrm{FFFT}$. For $s = 1, \ldots, \ell$, we first focus on all FFTs and twiddled DFFTs that are computed at “stage $s$” using a fallback algorithm. These FFTs and twiddled DFFTs are all of length $n_s$. Given $e \mid d$, let $N_{s,e}$ be the number of such transforms of a polynomial in $\mathbb{F}_{q^e}[x]_{<n_s}$ over $\mathbb{F}_{q^d}$. From (4.2), it follows that

  $\sum_{e \mid d} e\, N_{s,e} \;\leqslant\; n / n_s.$

Now a naive bound for the cost of an FFT or twiddled DFFT of a polynomial in $\mathbb{F}_{q^e}[x]_{<n_s}$ over $\mathbb{F}_{q^d}$ is

  $O\bigl(n_s^2\, \mathsf{M}_q(e)\bigr). \qquad (6.1)$

This means that the cost of all FFTs and twiddled DFFTs at stage $s$ is bounded by

  $O\Bigl(n_s^2 \sum_{e \mid d} N_{s,e}\, \mathsf{M}_q(e)\Bigr).$

Now $N_{s,e}$ can only be non zero if $e$ divides $d$, in which case $\mathsf{M}_q(e)/e \leqslant \mathsf{M}_q(d)/d$. Consequently, the cost at stage $s$ is bounded by

  $O\Bigl(n\, n_s\, \frac{\mathsf{M}_q(d)}{d}\Bigr).$

The total cost of all FFTs and twiddled DFFTs of prime length is therefore bounded by

  $O\Bigl(n\,(n_1 + \cdots + n_\ell)\, \frac{\mathsf{M}_q(d)}{d}\Bigr) \;=\; O\Bigl(\ell\, n\, p\, \frac{\mathsf{M}_q(d)}{d}\Bigr),$

where $p := \max(n_1, \ldots, n_\ell)$.
Strictly speaking, if $e$ is a proper divisor of $d$, then the output of a twiddled DFFT of a polynomial in $\mathbb{F}_{q^e}[x]_{<n_s}$ over $\mathbb{F}_{q^d}$ does not use the “standard” representation for the intermediate fields $\mathbb{F}_{q^{e'}}$, so the result must be converted. The cost of all such conversions at stage $s$ is bounded by the cost of the corresponding modular compositions, which we now estimate.
For a number $d$ with prime factorization $d = p_1^{a_1} \cdots p_r^{a_r}$, let $\lambda(d) := a_1 + \cdots + a_r$ be the number of prime factors of $d$, counted with multiplicities. We also denote by $\mathsf{K}(d)$ the total cost of the above conversions and notice that only intermediate fields $\mathbb{F}_{q^e}$ with $e \mid d$ occur. Let us show by induction over $\lambda(d)$ that $\mathsf{K}(d)$ remains quasi-linear in $n$ when $d$ is smooth. This is clear for $\lambda(d) = 0$, since $d = 1$ and no conversions are needed in that case. Assuming that the relation holds for a given value of $\lambda(d)$, we obtain the next case by adding the conversions that involve one more prime factor of $d$; these are modular compositions with a common modulus, which can be kept quasi-linear using the algorithms from [17]. This completes the induction and we conclude that the total conversion cost is bounded by a quasi-linear function of $n$ for fixed smooth $d$.
This concludes the proof of the following result:

Theorem 6.2. If the radices $n_1, \ldots, n_\ell$ are all prime, then the Frobenius FFT of a polynomial in $\mathbb{F}_q[x]_{<n}$ can be computed in time $O(\ell\, n\, p\, \mathsf{M}_q(d)/d)$ plus the above conversion cost, where $p := \max(n_1, \ldots, n_\ell)$.
Remark 6.3. If the $n_k$ are very small, then we have shown in section 4.3 that (6.1) can be further sharpened, since the naive evaluation strategy then gains the full factor $d$ with respect to full DFTs of the same length. As a consequence, the bound for the total cost of the FFFT becomes correspondingly smaller.
Remark 6.4. We may also treat the transforms whose coefficients already have maximal order $d$ apart in the bound for the total cost, thereby avoiding the corresponding overhead whenever possible. A similar analysis as for the cost of the conversions then yields an analogous refinement of the bound. Whenever we can manage to keep $p$ and the conversion costs small with respect to the cost of the transforms themselves, this means that we gain a factor $d$ with respect to a full DFT.
Let us now study the special case when $q = 2$ and $d = 60$. This case is useful for the multiplication of polynomials $A, B \in \mathbb{F}_2[x]$ with $\deg(A B) < n$. The idea is to compute the Frobenius FFTs of $A$ and $B$, to perform the pointwise multiplications $\hat{A}_{[i]} \hat{B}_{[i]}$ for $i$ in the privileged index set, and to retrieve the product $A B$ by doing one inverse Frobenius FFT.
Modulo zero padding, we have some flexibility in the choice of $n$, which should divide $2^{60} - 1$. In particular, for polynomials $A$ and $B$ of large degree, we may choose $n$ to be a multiple of $61$ and take $n_\ell = 61$ and $\ell'' = 1$ (instead of a balanced choice of $n'$ and $n''$) for the top call of the algorithm FFFT from section 6.2. By selecting the representation of $\mathbb{F}_{2^{60}}$ as explained in Remark 4.1, this means that $d_{i''} = 1$ for a single $i'' \in I''$ and $d_{i''} = 60$ for the remaining indices $i''$. In other words, one Frobenius FFT of length $n$ reduces to one Frobenius FFT of length $n/61$ and one full FFT over $\mathbb{F}_{2^{60}}$ of length $n/61$.
We choose the remaining $n_k$ to be small prime numbers, so as to minimize the cost of the full FFT over $\mathbb{F}_{2^{60}}$ of length $n/61$. For the remaining FFFT of length $n/61$, we may then apply Theorem 6.2 and Remark 6.3. From a practical point of view, the computation of this remaining FFFT should take about $c/60$ times the time of the computation of a full FFT of length $n/61$, for some small constant $c$.
We conclude that one entire FFFT of length $n$ can be performed approximately $60$ times faster than a full FFT of length $n$ over $\mathbb{F}_{2^{60}}$. We hope that this will lead to practical implementations for polynomial multiplication in $\mathbb{F}_2[x]$ that are approximately twice as efficient as the implementation from [12].
L. Auslander, J. R. Johnson, and R. W. Johnson. An equivariant Fast Fourier Transform algorithm. Drexel University Technical Report DU-MCS-96-02, 1996.
David H. Bailey. FFTs in external or hierarchical memory. In Proceedings of the 1989 ACM/IEEE Conference on Supercomputing, pages 234–242. ACM, 1989.
G.D. Bergland. A fast Fourier transform algorithm for real-valued series. Communications of the ACM, 11(10):703–710, 1968.
R. Bergmann. The fast Fourier transform and fast wavelet transform for patterns on the torus. Applied and Computational Harmonic Analysis, 2012. In Press.
Dieter Blessenohl. On the normal basis theorem. Note di Matematica, 27(1):5–10, 2007.
Leo I. Bluestein. A linear filtering approach to the computation of discrete Fourier transform. IEEE Transactions on Audio and Electroacoustics, 18(4):451–455, 1970.
J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Computat., 19:297–301, 1965.
J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, New York, NY, USA, 3rd edition, 2013.
C. F. Gauss. Nachlass: Theoria interpolationis methodo nova tractata. In Werke, volume 3, pages 265–330. Königliche Gesellschaft der Wissenschaften, Göttingen, 1866.
I. J. Good. The interaction algorithm and practical Fourier analysis. Journal of the Royal Statistical Society, Series B, 20(2):361–372, 1958.
A. Guessoum and R. Mersereau. Fast algorithms for the multidimensional discrete Fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(4):937–943, 1986.
D. Harvey, J. van der Hoeven, and G. Lecerf. Fast polynomial multiplication over $\mathbb{F}_{2^{60}}$. In Proc. ISSAC '16, pages 255–262, New York, NY, USA, 2016. ACM.
D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.
J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, Proc. ISSAC 2004, pages 290–296, Univ. of Cantabria, Santander, Spain, July 4–7 2004.
J. van der Hoeven. Notes on the Truncated Fourier Transform. Technical Report 2005-5, Université Paris-Sud, Orsay, France, 2005.
J. van der Hoeven, R. Lebreton, and É. Schost. Structured FFT and TFT: symmetric and lattice polynomials. In Proc. ISSAC '13, pages 355–362, Boston, USA, June 2013.
J. van der Hoeven and G. Lecerf. Modular composition via factorization. Technical report, HAL, 2017. http://hal.archives-ouvertes.fr/hal-01457074.
J. Johnson and X. Xu. Generating symmetric DFTs and equivariant FFT algorithms. In ISSAC'07, pages 195–202. ACM, 2007.
K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.
A. Kudlicki, M. Rowicka, and Z. Otwinowski. The crystallographic Fast Fourier Transform. Recursive symmetry reduction. Acta Cryst., A63:465–480, 2007.
R. Larrieu. The truncated Fourier transform for mixed radices. Technical report, HAL, 2016. http://hal.archives-ouvertes.fr/hal-01419442.
Jean-Louis Nicolas and Guy Robin. Highly composite numbers by Srinivasa Ramanujan. The Ramanujan Journal, 1(2):119–153, 1997.
C.H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
C.M. Rader. Discrete Fourier transforms when the number of data samples is prime. Proc. IEEE, 56:1107–1108, 1968.
H.V. Sorensen, D.L. Jones, M.T. Heideman, and C.S. Burrus. Real-valued fast Fourier transform algorithm. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(6):849–863, 1987.
L. F. Ten Eyck. Crystallographic Fast Fourier Transform. Acta Cryst., A29:183–191, 1973.
A. Vince and X. Zheng. Computing the Discrete Fourier Transform on a hexagonal lattice. Journal of Mathematical Imaging and Vision, 28:125–133, 2007.