Introduction to Field Theory, Iain T. Adamson, 1964, Oliver & Boyd, pp. 26–37.
§ 5. Polynomials. In elementary books on algebra, polynomials are usually defined to be “expressions of the form

f(x) ≡ a_{0} + a_{1}x + a_{2}x^{2} + … + a_{n}x^{n}”.
Let R be a commutative ring with identity element e. We denote by P(R) the set of infinite sequences (a_{0}, a_{1}, …, a_{n}, …) of elements of R, each of which has the property that only finitely many of the members a_{i} of the sequence are non-zero; thus for each sequence a = (a_{0}, a_{1}, a_{2}, …) in P(R) there is an integer N_{a} such that a_{i} = 0 for all integers i > N_{a}. It is important to be clear that two sequences are equal if and only if corresponding members are equal, i.e. if a = (a_{0}, a_{1}, a_{2}, …) and b = (b_{0}, b_{1}, b_{2}, …) then a = b if and only if a_{i} = b_{i} (i = 0, 1, 2, …).
We introduce an operation of addition in P(R) by setting
(a_{0}, a_{1}, a_{2}, …) + (b_{0}, b_{1}, b_{2}, …) = (a_{0} + b_{0}, a_{1} + b_{1}, a_{2} + b_{2}, …).
Next we introduce an operation of multiplication in P(R) by setting
(a_{0}, a_{1}, a_{2}, …)(b_{0}, b_{1}, b_{2}, …) = (c_{0}, c_{1}, c_{2}, …),

where c_{n} = ∑_{i=0}^{n} a_{i}b_{n−i} (n = 0, 1, 2, …).
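The two operations can be sketched in a few lines of Python (a hypothetical helper, not from the text): a sequence with only finitely many non-zero members is stored as a finite coefficient list, trailing zeros trimmed so that equality of sequences is equality of lists.

```python
# A minimal sketch of the ring P(R): the sequence (a0, a1, a2, ...) is
# stored as the finite list [a0, a1, ...] with trailing zeros removed.

def trim(f):
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def poly_add(f, g):
    """Member-by-member addition: (a0 + b0, a1 + b1, ...)."""
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
                 for i in range(n)])

def poly_mul(f, g):
    """c_n = sum over i of a_i * b_(n-i), as in the definition above."""
    c = [0] * (len(f) + len(g) - 1 or 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            c[i + j] += x * y
    return trim(c)

# (1 + X)(1 + X) = 1 + 2X + X^2:
# poly_mul([1, 1], [1, 1]) == [1, 2, 1]
```

Trimming is what makes two lists equal exactly when the corresponding infinite sequences are equal member by member.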
Consider now the mapping κ of R into P(R) defined by setting κ(a_{0}) = (a_{0}, 0, 0, …) for all elements a_{0} of R. It is easy to see that κ is a monomorphism; we call it the canonical monomorphism of R into P(R). Then R and its image κ(R) under κ are isomorphic; they differ, of course, in the nature of their elements, but have exactly the same structure. We frequently find it convenient to blur the distinction between R and κ(R) and to use the same symbol a_{0} both for an element of R and for its image under κ in P(R); when we do this, we say that we are identifying R with its image under κ and regarding R as a subring of the ring P(R). It will be found in practice that very little confusion is likely to arise from this identification procedure; but any confusion which does arise can be resolved by a return to the strictly logical notation.
We now introduce a name for the special sequence (0, e, 0, 0, …) in P(R): we call it X. By induction we can prove at once that, for every positive integer n, X^{n} is the sequence (c_{0}, c_{1}, c_{2}, …) for which c_{n} = e and c_{i} = 0 whenever i ≠ n. Then if f = (a_{0}, a_{1}, …, a_{N}, 0, 0, …) is any sequence in P(R), with a_{n} = 0 for all integers n > N, we have
f = (a_{0}, 0, 0, …) + (0, a_{1}, 0, …) + … + (0, …, 0, a_{N}, 0, …)
  = κ(a_{0}) + κ(a_{1})X + … + κ(a_{N})X^{N}.   (5.1)

Identifying R with κ(R) as above, we write this simply as

f = a_{0} + a_{1}X + … + a_{N}X^{N}.

We call the elements of P(R) polynomials with coefficients in R.
Let f be a non-zero polynomial with coefficients in R: say f = (a_{0}, a_{1}, a_{2}, …). We define the degree of f to be the greatest integer n such that a_{n} is non-zero; we denote the degree of f by ∂f. The polynomials of degree zero are precisely the non-zero elements of the subring κ(R); we call them constant polynomials or simply constants. Polynomials of degree 1 are also called linear polynomials. It is convenient to define the degree of the zero polynomial z = κ(0) to be −∞, with the usual understanding that for every integer n ≧ 0 we have n > −∞ and −∞ + n = −∞. We deduce immediately from the definitions of addition and multiplication that if f and g are polynomials with coefficients in R, then

∂(f + g) ≦ max(∂f, ∂g) and ∂(fg) = ∂f + ∂g,

the second equality holding whenever R has no zero divisors (in particular whenever R is a field), since the leading coefficient of fg is then the non-zero product of the leading coefficients of f and g; in a general commutative ring only ∂(fg) ≦ ∂f + ∂g is guaranteed.
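A small computation illustrates the degree rules, including how the product formula can collapse in a ring with zero divisors; the helper names here are invented for illustration.

```python
# Illustration of the degree rules.  Coefficient lists as before;
# poly_mul_mod computes the convolution product in the ring Z_m.

NEG_INF = float('-inf')             # plays the role of the degree -infinity

def degree(f):
    """Greatest n with a_n non-zero, or -infinity for the zero polynomial."""
    for n in range(len(f) - 1, -1, -1):
        if f[n] != 0:
            return n
    return NEG_INF

def poly_mul_mod(f, g, m):
    c = [0] * (len(f) + len(g) - 1 or 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            c[i + j] = (c[i + j] + x * y) % m
    return c

# With no zero divisors (take m huge to mimic Z) the degrees add:
#   deg((1 + 2X)(3 + 4X^2)) = 1 + 2 = 3.
# In Z_8 the equality fails: (2X)(4X) = 8X^2 = 0, of degree -infinity.
```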
If f = (a_{0}, a_{1}, …, a_{N}, …) is a non-zero polynomial with degree N, we call a_{N} the leading coefficient of f; this name perhaps appears more reasonable when we express f in the form f = a_{N}X^{N} + … + a_{1}X + a_{0}. If the leading coefficient of f is the identity e of R, we say that f is a monic polynomial, and we drop the leading coefficient, writing simply f = X^{N} + … + a_{1}X + a_{0}.
We now concentrate our attention on polynomials with coefficients in a field. So let F be a field; a polynomial f with coefficients in F is said to be divisible by another such polynomial d, and d is said to be a factor of f, if there exists a polynomial q such that f = qd. In this situation we say also that f is a multiple of d. The polynomial f is said to be irreducible if it has no factor d such that 0 < ∂d < ∂f; thus the only factors of an irreducible polynomial f are the constant polynomials and the products of f by the constant polynomials.
We now state without proof two theorems to which we shall have constant recourse in the next chapter. Proofs of these results may be found in Turnbull, Theory of Equations, § 17; there the coefficients of the polynomials considered are described as “constants”, but if we interpret this to mean “elements of the field F” we can extract the following statements.
Theorem 5.1. Let f be any polynomial and let d be a non-zero polynomial with coefficients in F. Then there exist unique polynomials q and r with coefficients in F such that f = qd+r and ∂r<∂d.
Theorem 5.2. Let f and g be any two non-zero polynomials with coefficients in F. Then there exists a unique monic polynomial h with coefficients in F such that (1) h is a factor of both f and g; (2) if k is any polynomial which is a factor of both f and g, then k is a factor of h. Further, there exist polynomials a and b with coefficients in F such that h = af+bg.
The polynomials q and r in Theorem 5.1 are called respectively the quotient and remainder when f is divided by d. The unique polynomial h described in Theorem 5.2 is called the highest common factor or greatest common divisor of f and g. If the highest common factor of f and g is the constant polynomial e (to be quite precise we should call it κ(e)), we say that f and g are relatively prime.
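Both theorems are effective, and the computations can be sketched over the field ℤ_p; the sketch below (with p = 5 chosen arbitrarily, and all function names invented for illustration) carries out the division of Theorem 5.1 and the extended Euclidean algorithm behind Theorem 5.2.

```python
# Division with remainder (Theorem 5.1) and a monic gcd h = a*f + b*g
# (Theorem 5.2) for polynomials over Z_p.  Coefficient lists [a0, a1, ...].

P = 5

def trim(f):
    """Reduce coefficients mod P and drop trailing zeros."""
    f = [x % P for x in f]
    while f and f[-1] == 0:
        f.pop()
    return f

def add(f, g):
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
                 for i in range(n)])

def sub(f, g):
    return add(f, [-x for x in g])

def mul(f, g):
    c = [0] * (len(f) + len(g) - 1 or 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            c[i + j] += x * y
    return trim(c)

def divmod_poly(f, d):
    """Theorem 5.1: return (q, r) with f = q*d + r and deg r < deg d."""
    f, d = trim(f), trim(d)
    assert d, "division by the zero polynomial"
    q = [0] * max(len(f) - len(d) + 1, 1)
    inv = pow(d[-1], P - 2, P)      # inverse of the leading coefficient
    while len(f) >= len(d):
        k = len(f) - len(d)
        c = f[-1] * inv % P
        q[k] = c
        f = trim([f[i] - (c * d[i - k] if i >= k else 0)
                  for i in range(len(f))])
    return trim(q), f

def ext_gcd(f, g):
    """Theorem 5.2: monic h with h = a*f + b*g; returns (h, a, b)."""
    r0, r1 = trim(f), trim(g)
    a0, a1, b0, b1 = [1], [], [], [1]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        a0, a1 = a1, sub(a0, mul(q, a1))
        b0, b1 = b1, sub(b0, mul(q, b1))
    inv = pow(r0[-1], P - 2, P)     # make the gcd monic
    return (trim([x * inv for x in r0]),
            trim([x * inv for x in a0]),
            trim([x * inv for x in b0]))
```

For f = (X+1)(X+2) and g = (X+1)(X+3) over ℤ_5, `ext_gcd` returns the monic highest common factor X + 1 together with the coefficients a, b of Theorem 5.2.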
We now introduce a very important mapping of the polynomial ring P(R) into itself; this is the mapping D defined by setting
Df = D(a_{0} + a_{1}X + a_{2}X^{2} + … + a_{n}X^{n})
   = a_{1} + 2a_{2}X + 3a_{3}X^{2} + … + na_{n}X^{n−1}.

We call Df the derivative of f. It is easily verified from the definitions that

D(f + g) = D(f) + D(g) and D(fg) = (Df)g + f(Dg).
It is, of course, quite impracticable to define derivatives of polynomials with coefficients in a general field F by the familiar type of limiting process used in the calculus; for one reason, we have not defined polynomials as functions; for another, in a general field we do not have any notion of “limit”. But the definition we have given makes sense in any field F, since the coefficients 2a_{2}, 3a_{3}, … are integral multiples of elements of F and hence are well-determined elements of F.
The next result is also an immediate consequence of the definition.
Theorem 5.3. Let f = a_{0} + a_{1}X + … + a_{n}X^{n} be a polynomial in P(F). If F has characteristic zero then Df = z (the zero polynomial) if and only if f is either zero or a constant polynomial, i.e., if and only if a_{1} = a_{2} = … = a_{n} = 0. If F has non-zero characteristic p then Df = z if and only if a_{k} = 0 for all integers k not divisible by p.
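Theorem 5.3 is easy to check numerically; the sketch below (a hypothetical helper, not from the text) computes the formal derivative over ℤ_p and shows that D annihilates X^5 over ℤ_5.

```python
# The formal derivative D over Z_p, illustrating Theorem 5.3: Df = 0
# exactly when a_k = 0 for every k not divisible by p.

def formal_derivative(f, p):
    """D(a0 + a1 X + ... + an X^n) = a1 + 2 a2 X + ... + n an X^(n-1), mod p."""
    d = [k * f[k] % p for k in range(1, len(f))]
    while d and d[-1] == 0:
        d.pop()
    return d

p = 5
x_to_p = [0] * p + [1]              # the non-zero polynomial X^5
# formal_derivative(x_to_p, p) is the zero polynomial [], since 5 = 0
# in Z_5, even though x_to_p itself is not zero.
```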
Let R be any commutative ring containing the field F, and let α be any element of R. An element β of R which can be expressed (not necessarily uniquely) in the form β = a_{0} + a_{1}α + a_{2}α^{2} + … + a_{n}α^{n}, where a_{0}, a_{1}, …, a_{n} are elements of F and n is a non-negative integer, is called a polynomial in α with coefficients in F. The set of all such elements is a subring of R, which we denote by F[α]. If κ denotes, as before, the canonical monomorphism of F into P(F), then equation (5.1) shows that P(F) = κ(F)[X] and so, identifying F and κ(F), we have P(F) = F[X] which is a standard notation for the polynomial ring with coefficients in F.
Returning to the general case, we define a mapping σ_{α} of the polynomial ring P(F) into R by setting
σ_{α}(f) = σ_{α}(a_{0} + a_{1}X + … + a_{n}X^{n}) = a_{0} + a_{1}α + … + a_{n}α^{n}.
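As code, σ_{α} is simply evaluation of a coefficient list at α, conveniently done by Horner's rule; the sketch below is illustrative (integer coefficients stand in for a general commutative ring containing F), and the names are invented.

```python
# sigma_alpha as code: evaluation of a coefficient list at alpha.

def sigma(f, alpha):
    """a0 + a1*alpha + ... + an*alpha**n, via Horner's rule."""
    acc = 0
    for a in reversed(f):
        acc = acc * alpha + a
    return acc

def mul(f, g):
    """Convolution product, for checking that sigma respects products."""
    c = [0] * (len(f) + len(g) - 1 or 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            c[i + j] += x * y
    return c

# The multiplicative half of Theorem 5.4, on a sample:
# sigma(mul(f, g), alpha) == sigma(f, alpha) * sigma(g, alpha)
```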
Theorem 5.4. If R is a commutative ring containing the field F and α is any element of R then the mapping σ_{α} is an epimorphism of P(F) onto F[α].
Proof. Let f = a_{0} + a_{1}X + a_{2}X^{2} + … and g = b_{0} + b_{1}X + b_{2}X^{2} + … be any two polynomials in P(F). Then
f + g = (a_{0} + b_{0}) + (a_{1} + b_{1})X + (a_{2} + b_{2})X^{2} + …

and

fg = a_{0}b_{0} + (a_{0}b_{1} + a_{1}b_{0})X + (a_{0}b_{2} + a_{1}b_{1} + a_{2}b_{0})X^{2} + ….
Hence

σ_{α}(f) + σ_{α}(g) = (a_{0} + a_{1}α + a_{2}α^{2} + …) + (b_{0} + b_{1}α + b_{2}α^{2} + …)
  = (a_{0} + b_{0}) + (a_{1} + b_{1})α + (a_{2} + b_{2})α^{2} + …
  = σ_{α}(f + g)

and

σ_{α}(f)σ_{α}(g) = (a_{0} + a_{1}α + a_{2}α^{2} + …)(b_{0} + b_{1}α + b_{2}α^{2} + …)
  = a_{0}b_{0} + (a_{0}b_{1}α + a_{1}αb_{0}) + (a_{0}b_{2}α^{2} + a_{1}αb_{1}α + a_{2}α^{2}b_{0}) + …   (5.2)
  = a_{0}b_{0} + (a_{0}b_{1} + a_{1}b_{0})α + (a_{0}b_{2} + a_{1}b_{1} + a_{2}b_{0})α^{2} + …   (5.3)
  = σ_{α}(fg),

the passage from (5.2) to (5.3) depending on the commutativity of R. Thus σ_{α} is a homomorphism; and since every element of F[α] has, by definition, the form a_{0} + a_{1}α + … + a_{n}α^{n} = σ_{α}(a_{0} + a_{1}X + … + a_{n}X^{n}), the mapping σ_{α} maps P(F) onto F[α], as required.
If f is a polynomial in P(F) and σ_{α}(f) = 0, then we say that α is a root of f in R. As in elementary algebra, there is an intimate connexion between the roots of f in the field F itself and the linear factors of f in P(F). If α is any element of F we denote the linear polynomial X − α by l_{α}.
Theorem 5.5. If α is an element of F and f is a polynomial in P(F), then α is a root of f in F if and only if l_{α} is a factor of f in P(F).
Proof. According to Theorem 5.1 there exist polynomials q and r in P(F) such that f = ql_{α} + r and ∂r < ∂l_{α} = 1. Thus r has the form κ(a) where a is an element of F, possibly zero. Then
σ_{α}(f) = σ_{α}(ql_{α} + r) = σ_{α}(q)σ_{α}(l_{α}) + σ_{α}(r) = σ_{α}(r) = a,

since σ_{α}(l_{α}) = α − α = 0. Hence α is a root of f if and only if a = 0, i.e. if and only if r is the zero polynomial and f = ql_{α}, as required.
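Theorem 5.5 has a computational form: synthetic division of f by l_{α} = X − α leaves a constant remainder equal to σ_{α}(f), so α is a root exactly when that remainder vanishes. A hypothetical sketch, over the integers for simplicity:

```python
# Synthetic division: f = q*(X - alpha) + r with r the constant f(alpha).

def divide_by_linear(f, alpha):
    """Return (q, r) with f = q*(X - alpha) + r and r a constant."""
    q = [0] * (len(f) - 1)
    r = f[-1]
    for k in range(len(f) - 2, -1, -1):
        q[k] = r
        r = f[k] + alpha * r
    return q, r

# f = X^2 - 3X + 2 = (X - 1)(X - 2):
# divide_by_linear([2, -3, 1], 1) == ([-2, 1], 0)   (remainder 0: 1 is a root)
# divide_by_linear([2, -3, 1], 5) leaves remainder 12 = f(5).
```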
We now contend that if two fields are isomorphic then the rings of polynomials with coefficients in those fields are also isomorphic. This result is an easy consequence of the following theorem.
Theorem 5.6. Let τ be a monomorphism of a field F_{1} into a field F_{2}; let κ_{1}, κ_{2} be the canonical monomorphisms of F_{1} into P(F_{1}), F_{2} into P(F_{2}) respectively. Then there exists a monomorphism τ_{P} of P(F_{1}) into P(F_{2}) such that for every element a of F_{1} we have τ_{P}(κ_{1}(a)) = κ_{2}(τ(a)).
Proof. The mapping τ_{P} of P(F_{1}) into P(F_{2}) defined by setting
τ_{P}(f) = τ_{P}(a_{0} + a_{1}X + … + a_{n}X^{n}) = τ(a_{0}) + τ(a_{1})X + … + τ(a_{n})X^{n}

is easily seen to be a monomorphism with the required property.
The condition that τ_{P}(κ_{1}(a)) = κ_{2}(τ(a)) for every element a in F_{1} is sometimes described by saying that the diagram in fig. 1 is commutative; for the condition asserts that if we start with any element a of F_{1} and “transport it” to P(F_{2}) by either of the routes indicated in the diagram (by applying first κ_{1} and then τ_{P}, or by applying first τ and then κ_{2}) we obtain the same result.
If we identify F_{1} and F_{2} with their images under the canonical monomorphisms and regard them as subfields of P(F_{1}) and P(F_{2}) respectively, then the condition on τ_{P} is simply that τ_{P}(a) = τ(a) for every element a of F_{1}, i.e. that τ_{P} shall act like τ on the elements of F_{1}. For this reason we call τ_{P} the canonical extension of τ to P(F_{1}).
§ 6. Higher polynomial rings; rational functions. Let R be any commutative ring with identity e. We define inductively a family of “higher polynomial rings” with coefficients in R, as follows: P_{1}(R) is simply the polynomial ring P(R) as we defined it in the last section; then for n>1, we set P_{n}(R) = P(P_{n−1}(R)). We call P_{n}(R) the nth order polynomial ring with coefficients in R.
In order to achieve some insight into the structure of these rings we shall examine P_{2}(R) = P(P(R)). Let κ_{1} and κ_{2} be the canonical monomorphisms of R into P(R) and of P(R) into P_{2}(R) respectively; the mapping κ of R into P_{2}(R) defined by setting κ(a) = κ_{2}(κ_{1}(a)) for all elements a of R is clearly also a monomorphism. If we denote by X the element (0, e, 0, …) of P(R), then, as we saw in § 5, every element b of P(R) can be expressed in the form
b = κ_{1}(a_{0}) + κ_{1}(a_{1})X + … + κ_{1}(a_{m})X^{m}.
Similarly, denoting by X_{2} the sequence (z, e′, z, …) in P_{2}(R) = P(P(R)), where z and e′ are the zero and identity elements of P(R), every element p of P_{2}(R) can be expressed in the form

p = κ_{2}(b_{0}) + κ_{2}(b_{1})X_{2} + … + κ_{2}(b_{n})X_{2}^{n},

where b_{0}, b_{1}, …, b_{n} are elements of P(R). Writing each b_{j} in the form

b_{j} = κ_{1}(a_{0j}) + κ_{1}(a_{1j})X + … + κ_{1}(a_{m_j j})X^{m_j} (j = 0, 1, …, n)

and setting X_{1} = κ_{2}(X), we obtain

p = κ(a_{00}) + κ(a_{10})X_{1} + … + κ(a_{m_0 0})X_{1}^{m_0}
  + (κ(a_{01}) + κ(a_{11})X_{1} + … + κ(a_{m_1 1})X_{1}^{m_1})X_{2}
  + …
  + (κ(a_{0n}) + κ(a_{1n})X_{1} + … + κ(a_{m_n n})X_{1}^{m_n})X_{2}^{n}
  = ∑_{j=0}^{n} ∑_{i=0}^{m_j} κ(a_{ij})X_{1}^{i}X_{2}^{j}.
Identifying R with its image under κ, we may therefore write

p = ∑_{i=0}^{∞} ∑_{j=0}^{∞} a_{ij}X_{1}^{i}X_{2}^{j},

where only finitely many of the coefficients a_{ij} are non-zero.
A similar discussion shows that, by suitable choice of elements X_{1}, …, X_{n} and the usual identification procedure, every element of P_{n}(R) can be expressed in the form

∑_{i_1=0}^{∞} … ∑_{i_n=0}^{∞} a_{i_1…i_n}X_{1}^{i_1}…X_{n}^{i_n},

where only finitely many of the coefficients a_{i_1…i_n} are non-zero.
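One concrete reading of this display (an illustrative encoding, not from the text) represents an element of P_{n}(R) as a finite map from exponent tuples (i_1, …, i_n) to non-zero coefficients; addition and multiplication then mirror the double sum directly.

```python
# Elements of P_n(R) as dicts {exponent tuple: non-zero coefficient}.

from collections import defaultdict

def m_add(p, q):
    out = defaultdict(int)
    for e, c in list(p.items()) + list(q.items()):
        out[e] += c
    return {e: c for e, c in out.items() if c != 0}

def m_mul(p, q):
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(i + j for i, j in zip(e1, e2))   # exponents add
            out[e] += c1 * c2
    return {e: c for e, c in out.items() if c != 0}

# In P_2(Z): (X1 + X2)^2 = X1^2 + 2 X1 X2 + X2^2.
f = {(1, 0): 1, (0, 1): 1}
```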
Let now S be any commutative ring containing R and let α = (α_{1}, …, α_{n}) be an ordered n-tuple of elements of S. We may then define a mapping σ_{α} of P_{n}(R) into S by setting
σ_{α}(∑ a_{i_1…i_n}X_{1}^{i_1}…X_{n}^{i_n}) = ∑ a_{i_1…i_n}α_{1}^{i_1}…α_{n}^{i_n}.
pp. 39–40.
5. Let F be a field. For each polynomial f in P(F) we may define a mapping f* of F into itself by setting f*(α) = σ_{α}(f) for every element α of F. Show that f* is not in general a homomorphism.
6. Let F = Z_{p} and let f be the polynomial X^{p} − X in P(F). Show that while f is not the zero polynomial the mapping f* of F into itself determined by f as in Example 5 is the zero mapping, i.e., f*(α) = 0 for every element α of F.
8. Let F be a field. Show that the only elements of the polynomial ring P(F) which have multiplicative inverses are the constant polynomials.
9. Let F be a field, κ the canonical monomorphism of F into the polynomial ring P(F). If ϕ is an automorphism of P(F) show that there is an automorphism ϕ_{1} of F such that ϕ(κ(a)) = κ(ϕ_{1}(a)) for all elements a of F and that there are elements c, d of F (c ≠ 0) such that ϕ(X) = cX + d.
10. If f is any polynomial of degree n with coefficients in a field F of characteristic zero with identity element e, prove that (in the notation of example 5)
f = ∑_{r=0}^{n} (r!e)^{−1}(D^{r}f)*(0)X^{r}.
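This Taylor-type formula can be checked numerically over the characteristic-zero field ℚ using exact rational arithmetic; the sketch below is a verification on one sample polynomial, not a proof.

```python
# Checking Example 10 over Q: the coefficient a_r equals (D^r f)*(0) / r!.

from fractions import Fraction
from math import factorial

def derivative(f):
    """The formal derivative D on a coefficient list."""
    return [k * f[k] for k in range(1, len(f))]

f = [Fraction(3), Fraction(-1, 2), Fraction(0), Fraction(7)]  # 3 - X/2 + 7X^3

recovered, g = [], f
for r in range(len(f)):
    # (D^r f)*(0) is the constant term of the r-th derivative
    recovered.append(g[0] / factorial(r) if g else Fraction(0))
    g = derivative(g)

# recovered agrees with f coefficient by coefficient
```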
† Turnbull, Theory of Equations, Chapter VI.
Galois Theory, Joseph Rotman, 1990, Springer-Verlag, pp. 2–3.
3. If R is a ring, define a polynomial f(x) with coefficients in R (briefly, a polynomial over R) to be a sequence f = (r_{0}, r_{1}, …, r_{n}, 0, 0, …) with r_{i} ∈ R for all i and r_{i} = 0 for all i > n. If g(x) = (s_{0}, s_{1}, …, s_{m}, 0, 0, …) is another polynomial over R, it follows that f(x) = g(x) if and only if r_{i} = s_{i} for all i. Denote the set of all such polynomials by R[x], and define addition and multiplication on R[x] as follows: (r_{0}, r_{1}, …, r_{i}, …) + (s_{0}, s_{1}, …, s_{i}, …) = (r_{0} + s_{0}, r_{1} + s_{1}, …, r_{i} + s_{i}, …) and (r_{0}, r_{1}, …, r_{i}, …)(s_{0}, s_{1}, …, s_{j}, …) = (t_{0}, t_{1}, …, t_{k}, …), where t_{0} = r_{0}s_{0}, t_{1} = r_{0}s_{1} + r_{1}s_{0}, and, in general, t_{k} = ∑ r_{i}s_{j}, the summation being over all i, j with i + j = k. Let (1, 0, 0, …) be abbreviated to 1 (there are now two meanings for this symbol). It is routine but tedious to verify that R[x] is a ring, the polynomial ring over R.
What is the significance of the letter x in the notation f(x)? Let x denote the specific element of R[x]: x = (0, 1, 0, 0, …). It is easy to prove that x^{2} = (0, 0, 1, 0, 0, …) and, by induction, that x^{i} is the sequence having 0 everywhere except for 1 in the ith spot. The reader may now prove (thereby recapturing the usual notation) that f(x) = (r_{0}, r_{1}, …, r_{n}, 0, 0, …) = r_{0} + r_{1}x + … + r_{n}x^{n} = ∑ r_{i}x^{i} (r_{0} = r_{0}1 if we identify r_{0} with (r_{0}, 0, 0, …) in R[x]). Notice that x is an honest element of a ring and not a variable; its role as a variable, however, is given in Exercise 18.
We remind the reader of the usual vocabulary associated with f(x) = r_{0} + r_{1}x + … + r_{n}x^{n}. The leading coefficient of f(x) is r_{n}, where n is the largest integer (if any) with r_{n} ≠ 0; n is called the degree of f(x) and is denoted by ∂f; every polynomial except 0 = (0, 0, …) has a degree.
A monic polynomial is one whose leading coefficient is 1. The constant term of f(x) is r_{0}; a constant (polynomial) is either the zero polynomial 0 or a polynomial of degree 0; linear, quadratic, cubic, quartic (or biquadratic), and quintic polynomials have degrees, respectively, 1, 2, 3, 4, and 5.
Recall from linear algebra that a linear homogeneous system over a field with r equations and n unknowns has a nontrivial solution if r < n; if r = n, one must examine a determinant.

If f(x) = (x − α_{1})…(x − α_{n}) = ∑ r_{i}x^{i}, then it is easy to see, by induction on n, that
r_{n−1} = −∑_{i} α_{i}
r_{n−2} = ∑_{i<j} α_{i}α_{j}
r_{n−3} = −∑_{i<j<k} α_{i}α_{j}α_{k}
  ⋮
r_{0} = (−1)^{n}α_{1}…α_{n}.
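These signed elementary symmetric functions are easy to check by expanding the product one linear factor at a time; the helper names below are invented for illustration.

```python
# Expand (x - a1)...(x - an) and compare with the elementary symmetric sums.

from itertools import combinations
from math import prod

def coeffs_from_roots(roots):
    """Return [r0, ..., rn] for (x - roots[0]) ... (x - roots[-1])."""
    c = [1]
    for a in roots:
        # multiply the current polynomial by (x - a)
        c = [(c[k - 1] if k > 0 else 0) - a * (c[k] if k < len(c) else 0)
             for k in range(len(c) + 1)]
    return c

def elementary_symmetric(roots, k):
    return sum(prod(t) for t in combinations(roots, k))

# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6:
# coeffs_from_roots([1, 2, 3]) == [-6, 11, -6, 1]
```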
p. 8.
18. If a ∈ R, define e_{a} : R[x] → R by f(x) = ∑ r_{i}x^{i} ↦ ∑ r_{i}a^{i} (denote this element of R by f(a)); prove that e_{a} is a ring map (it is called evaluation at a). If f(a) = 0, then a is called a root of f(x).
(This exercise allows one to regard x as a variable ranging over R; that is, each polynomial f(x) ∈ R[x] determines a function R → R. But look at the next exercise.)
19. Give an example of distinct polynomials f(x), g(x) ∈ ℤ_{p}[x] with f(a) = g(a) for all a ∈ ℤ_{p}.
(Distinct polynomials (not all coefficients are the same) may determine the same function; this is one reason for our defining polynomials in such a formal way. Indeed, if F is any finite field (there are such fields other than ℤ_{p}), there are only finitely many functions F → F but there are infinitely many polynomials. We shall see after Theorem 11 that no such example exists if ℤ_{p} is replaced by any infinite field.)
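The standard example behind Exercise 19 is the non-zero polynomial x^p − x, which vanishes at every element of ℤ_p by Fermat's little theorem; an illustrative computation:

```python
# Over Z_7, x^7 - x is non-zero as a polynomial yet zero as a function.

p = 7
f = [0, -1] + [0] * (p - 2) + [1]        # the coefficient list of x^7 - x

def evaluate(f, a, p):
    return sum(c * pow(a, k, p) for k, c in enumerate(f)) % p

values = [evaluate(f, a, p) for a in range(p)]
# values == [0, 0, 0, 0, 0, 0, 0], although f has non-zero coefficients
```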
p. 17.
Theorem 11. If F is a field and f(x) ∈ F[x] has degree n, then F contains at most n roots of f(x).
This last theorem is false for arbitrary rings R; for example, x^{2} − 1 has four roots in ℤ_{8}.
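The counterexample can be checked directly:

```python
# x^2 - 1 has four roots in Z_8, exceeding its degree 2.

roots = [a for a in range(8) if (a * a - 1) % 8 == 0]
# roots == [1, 3, 5, 7]
```

The failure traces back to zero divisors in ℤ_8: (a − 1)(a + 1) can vanish without either factor vanishing.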
Recall that every polynomial f(x) ∈ F[x] determines a function F → F, namely, a ↦ f(a). In Exercise 19, however, we saw that distinct polynomials in ℤ_{p}[x] may determine the same function. This pathology vanishes when the coefficient field is infinite. Let F be an infinite field and let f(x) ≠ g(x) in F[x] satisfy f(a) = g(a) for all a ∈ F. Then h(x) = f(x)−g(x) is not the zero polynomial; hence it has a degree, say, n. But each of the infinitely many elements a ∈ F is a root of h(x), and this contradicts Theorem 11.