Introduction to Field Theory, Iain T. Adamson, 1964, Oliver & Boyd, pp. 26–37.

§ 5. Polynomials. In elementary books on algebra, polynomials are usually defined to be “expressions of the form
f(x) ≡ a0 + a1x + a2x^2 + … + anx^n
where a0, …, an are numbers”. This definition is open to at least two objections. First, it gives no indication as to the logical status of the symbol x—and to say that x is a “variable” or an “indeterminate” simply begs further questions. Secondly, when one comes to define addition and multiplication of polynomials, it is difficult to avoid the feeling that these operations are at least partially defined already—for the polynomials are written with addition signs between their terms, and these terms contain powers of x. To avoid these objections we proceed in what may appear to be a more abstract way; in reality we are simply exploiting the well-known fact that in elementary algebra the powers of x act essentially only as “place-holders” while the coefficients are the really important constituents. These remarks may already have suggested to the sophisticated reader that we might define polynomials to be simply finite sequences of coefficients such as (a0, a1, …, an). For technical reasons—in fact, to enable us to deal conveniently with polynomials of different degrees, which would correspond to sequences of different lengths—we prefer to deal with “essentially finite” infinite sequences. We now proceed with the formal development.

Let R be a commutative ring with identity element e. We denote by P(R) the set of infinite sequences (a0, a1, …, an, …) of elements of R, each of which has the property that only finitely many of the members ai of the sequence are non-zero; thus for each sequence a = (a0, a1, a2, …) in P(R) there is an integer Na such that ai = 0 for all integers i > Na. It is important to be clear that two sequences are equal if and only if corresponding members are equal, i.e. if a = (a0, a1, a2, …) and b = (b0, b1, b2, …) then a = b if and only if ai = bi (i = 0, 1, 2, …).

We introduce an operation of addition in P(R) by setting
(a0, a1, a2, …)+(b0, b1, b2, …) = (a0 + b0, a1 + b1, a2 + b2, …).
It is easily verified that under this law of composition P(R) forms an abelian group. The zero element is clearly the sequence z = (0, 0, 0, …) each of whose members is the zero of R; the additive inverse of (a0, a1, a2, …) is (− a0, − a1, − a2, …).

Next we introduce an operation of multiplication in P(R) by setting
(a0, a1, a2, …)(b0, b1, b2, …) = (c0, c1, c2, …),
where cn = ∑_{i=0}^{n} ai b_{n−i} (n = 0, 1, 2, …).
Then it is a routine matter to verify that this multiplication is associative and commutative, and also distributive with respect to the addition. That is to say, P(R) is a commutative ring under the laws of composition we have defined. Further, P(R) has an identity element, namely the sequence (e, 0, 0, …) where e is the identity element of R.
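The two laws of composition can be realised directly on coefficient sequences. The following Python fragment is an illustrative sketch only: it stores an “essentially finite” sequence as a finite list of coefficients, and takes R to be the ring of integers.

```python
def poly_add(a, b):
    # Componentwise addition of coefficient lists
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_mul(a, b):
    # Cauchy product: c_n = sum_{i=0}^{n} a_i * b_{n-i}
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c
```

For instance, poly_mul([1, 1], [1, -1]) computes (e + X)(e − X) and gives [1, 0, -1], i.e. e − X^2.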

Consider now the mapping κ of R into P(R) defined by setting κ(a0) = (a0, 0, 0, …) for all elements a0 of R. It is easy to see that κ is a monomorphism; we call it the canonical monomorphism of R into P(R). Then R and its image κ(R) under κ are isomorphic; they differ, of course, in the nature of their elements, but have exactly the same structure. We frequently find it convenient to blur the distinction between R and κ(R) and to use the same symbol a0 for both an element of R and for its image under κ in P(R); when we do this, we say that we are identifying R with its image under κ and regarding R as a subring of the ring P(R). It will be found in practice that very little confusion is likely to arise from this identification procedure; but any confusion which does arise can be resolved by a return to the strictly logical notation.

We now introduce a name for the special sequence (0, e, 0, 0, …) in P(R): we call it X. By induction we can prove at once that, for every positive integer n, X^n is the sequence (c0, c1, c2, …) for which cn = e and ci = 0 whenever i ≠ n. Then if f = (a0, a1, …, aN, 0, 0, …) is any sequence in P(R), with an = 0 for all integers n > N, we have
f = (a0, 0, 0, …) + (0, a1, 0, …) + … + (0, …, 0, aN, 0, …)
  = κ(a0) + κ(a1)X + … + κ(aN)X^N.    (5.1)
Carrying out the identification of R and κ(R) described in the last paragraph, we see that we have expressed f in the form
f = a0 + a1X + … + aNX^N.
This provides justification for calling the elements of P(R) polynomials and for describing P(R) as the ring of polynomials with coefficients in R.

Let f be a non-zero polynomial with coefficients in R: say f = (a0, a1, a2, …). We define the degree of f to be the greatest integer n such that an is non-zero; we denote the degree of f by ∂f. The polynomials of degree zero are precisely the non-zero elements of the subring κ(R); we call them constant polynomials or simply constants. Polynomials of degree 1 are also called linear polynomials. It is convenient to define the degree of the zero polynomial z = κ(0) to be −∞, with the usual understanding that for every integer n ≧ 0 we have n > −∞ and −∞ + n = −∞. We deduce immediately from the definitions of addition and multiplication that if f and g are polynomials with coefficients in R, then
∂(f + g) ≦ max(∂f, ∂g),
∂(fg) ≦ ∂f + ∂g,
with equality in the second relation whenever the product of the leading coefficients of f and g is non-zero (in particular, whenever R has no zero-divisors, and so always when R is a field); we notice, too, that if ∂f ≠ ∂g then we actually have ∂(f + g) = max(∂f, ∂g).
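The degree conventions, including ∂z = −∞, can be modelled directly. A minimal Python sketch (illustrative only), with float('-inf') standing for −∞:

```python
NEG_INF = float('-inf')  # the degree of the zero polynomial

def degree(f):
    # Greatest index carrying a non-zero coefficient; -infinity for z
    while f and f[-1] == 0:
        f = f[:-1]
    return len(f) - 1 if f else NEG_INF
```

With this convention the rule −∞ + n = −∞ holds automatically, since float('-inf') + n is again float('-inf').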

If f = (a0, a1, …, aN, …) is a non-zero polynomial with degree N, we call aN the leading coefficient of f; this name perhaps appears more reasonable when we express f in the form f = aNX^N + … + a1X + a0. If the leading coefficient of f is the identity e of R, we say that f is a monic polynomial, and we drop the leading coefficient, writing simply f = X^N + … + a1X + a0.

We now concentrate our attention on polynomials with coefficients in a field. So let F be a field; a polynomial f with coefficients in F is said to be divisible by another such polynomial d, and d is said to be a factor of f, if there exists a polynomial q such that f = qd. In this situation we say also that f is a multiple of d. The polynomial f is said to be irreducible if it has no factor d such that 0 < ∂d < ∂f; thus the only factors of an irreducible polynomial f are the constant polynomials and the products of f by the constant polynomials.

We now state without proof two theorems to which we shall have constant recourse in the next chapter. Proofs of these results may be found in Turnbull, Theory of Equations, § 17; there the coefficients of the polynomials considered are described as “constants”, but if we interpret this to mean “elements of the field F” we can extract the following statements.

Theorem 5.1. Let f be any polynomial and let d be a non-zero polynomial with coefficients in F. Then there exist unique polynomials q and r with coefficients in F such that f = qd + r and ∂r < ∂d.

Theorem 5.2. Let f and g be any two non-zero polynomials with coefficients in F. Then there exists a unique monic polynomial h with coefficients in F such that (1) h is a factor of both f and g; (2) if k is any polynomial which is a factor of both f and g, then k is a factor of h. Further, there exist polynomials a and b with coefficients in F such that h = af+bg.

The polynomials q and r in Theorem 5.1 are called respectively the quotient and remainder when f is divided by d. The unique polynomial h described in Theorem 5.2 is called the highest common factor or greatest common divisor of f and g. If the highest common factor of f and g is the constant polynomial e (to be quite precise we should call it κ(e)), we say that f and g are relatively prime.
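Theorems 5.1 and 5.2 are both effective: the quotient and remainder arise from the familiar long-division process, and the highest common factor from Euclid's algorithm. The following Python fragment is an illustrative sketch only, taking F to be the field of rationals via exact Fraction arithmetic and storing polynomials as coefficient lists (constant term first):

```python
from fractions import Fraction

def trim(f):
    # Discard trailing zero coefficients
    while f and f[-1] == 0:
        f.pop()
    return f

def divmod_poly(f, d):
    # Theorem 5.1: return (q, r) with f = q*d + r and deg r < deg d.
    # d must be a non-zero polynomial.
    f = trim([Fraction(c) for c in f])
    d = trim([Fraction(c) for c in d])
    q = [Fraction(0)] * max(len(f) - len(d) + 1, 1)
    r = f
    while len(r) >= len(d):
        coef = r[-1] / d[-1]
        k = len(r) - len(d)
        q[k] = coef
        for i, di in enumerate(d):
            r[i + k] -= coef * di
        trim(r)
    return q, r

def hcf(f, g):
    # Theorem 5.2: the monic highest common factor, by Euclid's algorithm
    f, g = trim([Fraction(c) for c in f]), trim([Fraction(c) for c in g])
    while g:
        f, g = g, divmod_poly(f, g)[1]
    return [c / f[-1] for c in f]
```

For example, dividing X^2 − 1 by X − 1 yields quotient X + 1 and zero remainder, and the highest common factor of X^2 − 1 and X^2 + 2X + 1 is the monic polynomial X + 1.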

We now introduce a very important mapping of the polynomial ring P(F) into itself; this is the mapping D defined by setting
Df = D(a0 + a1X + a2X^2 + … + anX^n)
   = a1 + 2a2X + 3a3X^2 + … + nanX^{n−1}
for every polynomial f = a0 + a1X + a2X^2 + … + anX^n in P(F). We naturally call the polynomial Df the derivative of the polynomial f. It is no surprise to learn that, if f and g are any two polynomials in P(F), then
D(f + g) = D(f) + D(g)   and   D(fg) = (Df)g + f(Dg).
Nor is it difficult to prove these results, which are purely formal consequences of the definition of D.

It is, of course, quite impracticable to define derivatives of polynomials with coefficients in a general field F by the familiar type of limiting process used in the calculus; for one reason, we have not defined polynomials as functions; for another, in a general field we do not have any notion of “limit”. But the definition we have given makes sense in any field F, since the coefficients 2a2, 3a3, … are integral multiples of elements of F and hence are well-determined elements of F.
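Since the derivative is pure coefficient manipulation, it is easily computed. A minimal Python sketch (illustrative only; coefficient lists over the integers):

```python
def deriv(f):
    # D(a0 + a1 X + ... + an X^n) = a1 + 2 a2 X + ... + n an X^(n-1)
    return [i * c for i, c in enumerate(f)][1:]
```

Both rules D(f + g) = Df + Dg and D(fg) = (Df)g + f(Dg) can then be checked mechanically on sample polynomials.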

The next result is also an immediate consequence of the definition.

Theorem 5.3. Let f = a0 + a1X + … + anXn be a polynomial in P(F). If F has characteristic zero then Df = z (the zero polynomial) if and only if f is either zero or a constant polynomial, i.e., if and only if a1 = a2 = … = an = 0. If F has non-zero characteristic p then Df = z if and only if ak = 0 for all integers k not divisible by p.
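Over a field of non-zero characteristic p the coefficient k·ak is computed mod p, which is what makes non-constant polynomials with zero derivative possible. A Python sketch (illustrative only, with the hypothetical choice p = 5):

```python
p = 5  # an illustrative prime; coefficients lie in Z_p

def deriv_mod_p(f, p):
    # Formal derivative with coefficients reduced mod p
    return [(i * c) % p for i, c in enumerate(f)][1:]

x_to_p = [0] * p + [1]  # the polynomial X^p
```

Here deriv_mod_p(x_to_p, p) consists entirely of zeros, although X^p is not a constant polynomial; by contrast the derivative of X^2 is the non-zero polynomial 2X.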

Let R be any commutative ring containing the field F, and let α be any element of R. An element β of R which can be expressed (not necessarily uniquely) in the form β = a0 + a1α + a2α^2 + … + anα^n, where a0, a1, …, an are elements of F and n is a non-negative integer, is called a polynomial in α with coefficients in F. The set of all such elements is a subring of R, which we denote by F[α]. If κ denotes, as before, the canonical monomorphism of F into P(F), then equation (5.1) shows that P(F) = κ(F)[X] and so, identifying F and κ(F), we have P(F) = F[X], which is a standard notation for the polynomial ring with coefficients in F.

Returning to the general case, we define a mapping σα of the polynomial ring P(F) into R by setting
σα(f) = σα(a0 + a1X + … + anX^n)
      = a0 + a1α + … + anα^n
for every polynomial f = a0 + a1X + … + anX^n in P(F). We call σα the operation of substituting α for X; clearly σα maps P(F) onto the subring F[α] of R. But we can say more than this, as follows.

Theorem 5.4. If R is a commutative ring containing the field F and α is any element of R then the mapping σα is an epimorphism of P(F) onto F[α].

Proof. Let f = a0 + a1X + a2X^2 + … and g = b0 + b1X + b2X^2 + … be any two polynomials in P(F). Then
f + g = (a0 + b0) + (a1 + b1)X + (a2 + b2)X^2 + …
fg = (a0b0) + (a0b1 + a1b0)X + (a0b2 + a1b1 + a2b0)X^2 + ….
We see at once that
σα(f) + σα(g) = (a0 + a1α + a2α^2 + …) + (b0 + b1α + b2α^2 + …)
              = (a0 + b0) + (a1 + b1)α + (a2 + b2)α^2 + …
              = σα(f + g).
We have also
σα(f)σα(g) = (a0 + a1α + a2α^2 + …)(b0 + b1α + b2α^2 + …)
           = a0b0 + (a0b1α + a1αb0) + (a0b2α^2 + a1αb1α + a2α^2b0) + …    (5.2)
           = a0b0 + (a0b1 + a1b0)α + (a0b2 + a1b1 + a2b0)α^2 + ….    (5.3)
Thus σα is a homomorphism. We remark that we have made essential use of the commutative property of R in passing from (5.2) to (5.3).

If f is a polynomial in P(F) and σα(f) = 0, then we say that α is a root of f in R. As in elementary algebra, there is an intimate connexion between the roots of f in the field F itself and the linear factors of f in P(F). If α is any element of F we denote the linear polynomial X − α by lα.

Theorem 5.5. If α is an element of F and f is a polynomial in P(F), then α is a root of f in F if and only if lα is a factor of f in P(F).

Proof. According to Theorem 5.1 there exist polynomials q and r in P(F) such that f = qlα + r and ∂r < ∂lα = 1. Thus r has the form κ(a) where a is an element of F, possibly zero. Then
σα(f) = σα(qlα + r) = σα(q)σα(lα) + σα(r) = σα(r) = a,
since σα(lα) = α − α = 0. Hence α is a root of f if and only if a = 0, i.e., if and only if r is the zero polynomial and so lα is a factor of f.
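The proof mirrors the classical remainder theorem, and the division by lα can be carried out by synthetic division. A brief Python sketch (illustrative only, over the integers, polynomials as coefficient lists with constant term first):

```python
def evaluate(f, alpha):
    # sigma_alpha(f): substitute alpha for X, by Horner's rule
    v = 0
    for c in reversed(f):
        v = v * alpha + c
    return v

def div_by_linear(f, alpha):
    # Divide f by l_alpha = X - alpha (synthetic division);
    # returns (quotient, remainder), the remainder being a constant
    q = [0] * (len(f) - 1)
    r = f[-1]
    for k in range(len(f) - 2, -1, -1):
        q[k] = r
        r = r * alpha + f[k]
    return q, r
```

The remainder returned always equals evaluate(f, alpha), so α is a root of f exactly when the remainder vanishes, as Theorem 5.5 asserts.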

We now contend that if two fields are isomorphic then the rings of polynomials with coefficients in those fields are also isomorphic. This result is an easy consequence of the following theorem.

Theorem 5.6. Let τ be a monomorphism of a field F1 into a field F2; let κ1, κ2 be the canonical monomorphisms of F1 into P(F1), F2 into P(F2) respectively. Then there exists a monomorphism τP of P(F1) into P(F2) such that for every element a of F1 we have τP(κ1(a)) = κ2(τ(a)).

Proof. The mapping τP of P(F1) into P(F2) defined by setting
τP(f)=τP(a0 + a1X + … + anXn)
=τ(a0) + τ(a1)X + … + τ(an)Xn
for every polynomial f = a0 + a1X + … + anXn is easily seen to satisfy our requirements.

The condition that τP(κ1(a)) = κ2(τ(a)) for every element a in F1 is sometimes described by saying that the diagram in fig. 1 is commutative; for the condition asserts that if we start with any element a of F1 and “transport it” to P(F2) by either of the routes indicated in the diagram—by applying first κ1 and then τP or by applying first τ and then κ2—we obtain the same result.

If we identify F1 and F2 with their images under the canonical monomorphisms and regard them as subfields of P(F1) and P(F2) respectively, then the condition on τP is simply that τP(a) = τ(a) for every element a of F1, i.e. that τP shall act like τ on the elements of F1. For this reason we call τP the canonical extension of τ to P(F1).

§ 6. Higher polynomial rings; rational functions. Let R be any commutative ring with identity e. We define inductively a family of “higher polynomial rings” with coefficients in R, as follows: P1(R) is simply the polynomial ring P(R) as we defined it in the last section; then for n>1, we set Pn(R) = P(Pn−1(R)). We call Pn(R) the nth order polynomial ring with coefficients in R.

In order to achieve some insight into the structure of these rings we shall examine P2(R) = P(P(R)). Let κ1 and κ2 be the canonical monomorphisms of R into P(R) and of P(R) into P2(R) respectively; the mapping κ of R into P2(R) defined by setting κ(a) = κ2(κ1(a)) for all elements a of R is clearly also a monomorphism. If we denote by X the element (0, e, 0, …) of P(R), then, as we saw in § 5, every element b of P(R) can be expressed in the form
b = κ1(a0) + κ1(a1)X + … + κ1(am)X^m
where a0, a1, … am are elements of R. Similarly, if we denote by X2 the element (κ1(0), κ1(e), κ1(0), …) of P2(R), we see that every element p of P2(R) can be expressed in the form
p = κ2(b0) + κ2(b1)X2 + … + κ2(bn)X2^n
where b0, b1, … bn are elements of P(R). If we write
bj = κ1(a0j) + κ1(a1j)X + … + κ1(a_{m_j j})X^{m_j}   (j = 0, 1, …, n)
and set X1 = κ2(X), then we see that p takes the form
p = κ(a00) + κ(a10)X1 + … + κ(a_{m_0 0})X1^{m_0}
  + (κ(a01) + κ(a11)X1 + … + κ(a_{m_1 1})X1^{m_1})X2
  + …
  + (κ(a0n) + κ(a1n)X1 + … + κ(a_{m_n n})X1^{m_n})X2^n
  = ∑_{j=0}^{n} ∑_{i=0}^{m_j} κ(aij)X1^i X2^j.
If we identify the original ring R with its image under the monomorphism κ, it appears that every element p of P2(R) can be expressed in the form
p = ∑_{i≥0} ∑_{j≥0} aij X1^i X2^j
where the coefficients aij belong to the ring R and only finitely many of them are non-zero.

A similar discussion shows that by suitable choice of elements X1, … Xn and the usual identification procedure, every element of Pn(R) can be expressed in the form
∑_{i_1} … ∑_{i_n} a_{i_1 … i_n} X1^{i_1} … Xn^{i_n}
where the coefficients a_{i_1 … i_n} are in R and only finitely many of them are non-zero.

Let now S be any commutative ring containing R and let α = (α1, …, αn) be an ordered n-tuple of elements of S. We may then define a mapping σα of Pn(R) into S by setting
σα(∑ a_{i_1 … i_n} X1^{i_1} … Xn^{i_n}) = ∑ a_{i_1 … i_n} α1^{i_1} … αn^{i_n}
for every element ∑ a_{i_1 … i_n} X1^{i_1} … Xn^{i_n} in Pn(R). As in Theorem 5.4 we may prove that σα is a homomorphism. The image of Pn(R) under the mapping σα is a subring of S which is often denoted by R[α] or R[α1, …, αn]. In particular, Pn(R) = R[X1, …, Xn].
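Concretely, an element of Pn(R) is determined by a finitely-supported family of coefficients indexed by exponent n-tuples, and the ring operations act on these families. A small Python sketch (illustrative only; R taken to be the integers, a polynomial stored as a dict from exponent tuples to non-zero coefficients):

```python
def add(p, q):
    # Sum of two multivariate polynomials over the integers
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def mul(p, q):
    # Product: exponent tuples add componentwise, coefficients multiply
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(i + j for i, j in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}
```

With n = 2, taking X1 = {(1, 0): 1} and X2 = {(0, 1): 1}, mul(X1, X2) represents the monomial X1X2, and (X1 + X2)^2 expands to X1^2 + 2X1X2 + X2^2.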

pp. 39–40.

5. Let F be a field. For each polynomial f in P(F) we may define a mapping f* of F into itself by setting f*(α) = σα(f) for every element α of F. Show that f* is not in general a homomorphism.

6. Let F = Zp and let f be the polynomial X^p − X in P(F). Show that while f is not the zero polynomial the mapping f* of F into itself determined by f as in Example 5 is the zero mapping, i.e., f*(α) = 0 for every element α of F.
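This exercise is a disguised form of Fermat's congruence a^p ≡ a (mod p), and can be confirmed by direct enumeration. A Python sketch (illustrative only, with the hypothetical choice p = 5):

```python
p = 5  # an illustrative prime

def evaluate_mod_p(f, alpha, p):
    # sigma_alpha over Z_p, by Horner's rule
    v = 0
    for c in reversed(f):
        v = (v * alpha + c) % p
    return v

f = [0, -1] + [0] * (p - 2) + [1]  # the polynomial X^p - X
```

Evaluating f at every element of Z_p gives 0 in each case, although f itself has non-zero coefficients.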

8. Let F be a field. Show that the only elements of the polynomial ring P(F) which have multiplicative inverses are the constant polynomials.

9. Let F be a field, κ the canonical monomorphism of F into the polynomial ring P(F). If ϕ is an automorphism of P(F) show that there is an automorphism ϕ1 of F such that ϕ(κ(a)) = κ(ϕ1(a)) for all elements a of F and that there are elements c, d of F (c ≠ 0) such that ϕ(X) = cX + d.

10. If f is any polynomial of degree n with coefficients in a field F of characteristic zero with identity element e, prove that (in the notation of example 5)
f = ∑_{r=0}^{n} (r!e)^{−1} (D^r f)*(0) X^r,
where the polynomials D^r f are defined inductively by setting D^0 f = f and D^r f = D(D^{r−1} f) for r ≧ 1.

Turnbull, Theory of Equations, Chapter VI.

Galois Theory, Joseph Rotman, 1990, Springer-Verlag, pp. 2–3.

3. If R is a ring, define a polynomial f(x) with coefficients in R (briefly, a polynomial over R) to be a sequence f = (r0, r1, …, rn, 0, 0, …) with ri ∈ R for all i and ri = 0 for all i > n. If g(x) = (s0, s1, …, sm, 0, 0, …) is another polynomial over R, it follows that f(x) = g(x) if and only if ri = si for all i. Denote the set of all such polynomials by R[x], and define addition and multiplication on R[x] as follows: (r0, r1, …, ri, …) + (s0, s1, …, si, …) = (r0 + s0, r1 + s1, …, ri + si, …) and (r0, r1, …, ri, …)(s0, s1, …, sj, …) = (t0, t1, …, tk, …), where t0 = r0s0, t1 = r0s1 + r1s0, and, in general, tk = ∑ risj, the summation being over all i, j with i + j = k. Let (1, 0, 0, …) be abbreviated to 1 (there are now two meanings for this symbol). It is routine but tedious to verify that R[x] is a ring, the polynomial ring over R.

What is the significance of the letter x in the notation f(x)? Let x denote the specific element of R[x]: x = (0, 1, 0, 0, …). It is easy to prove that x^2 = (0, 0, 1, 0, 0, …) and, by induction, that x^i is the sequence having 0 everywhere except for 1 in the ith spot. The reader may now prove (thereby recapturing the usual notation) that f(x) = (r0, r1, …, rn, 0, 0, …) = r0 + r1x + … + rnx^n = ∑ rix^i (r0 = r0·1 if we identify r0 with (r0, 0, 0, …) in R[x]). Notice that x is an honest element of a ring and not a variable; its role as a variable, however, is given in Exercise 18.

We remind the reader of the usual vocabulary associated with f(x) = r0 + r1x + … + rnx^n. The leading coefficient of f(x) is rn, where n is the largest integer (if any) with rn ≠ 0; n is called the degree of f(x) and is denoted by ∂f; every polynomial except 0 = (0, 0, …) has a degree.

A monic polynomial is one whose leading coefficient is 1. The constant term of f(x) is r0; a constant (polynomial) is either the zero polynomial 0 or a polynomial of degree 0; linear, quadratic, cubic, quartic (or biquadratic), and quintic polynomials have degrees, respectively, 1, 2, 3, 4, and 5.

Recall from linear algebra that a linear homogeneous system over a field with r equations and n unknowns has a nontrivial solution if r < n; if r = n, one must examine a determinant. If f(x) = (x − α1)…(x − αn) = ∑ rix^i, then it is easy to see, by induction on n, that
rn−1 = −∑ αi,
rn−2 = ∑_{i<j} αiαj,
rn−3 = −∑_{i<j<k} αiαjαk, ….
The problem of finding the roots αi of the polynomial f(x) from its coefficients ri is thus a question of solving a nonlinear system of n equations in n unknowns; we shall see that this problem is not “solvable by radicals” if n ≥ 5.
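The relations between roots and coefficients can be generated mechanically by expanding the product (x − α1)…(x − αn) one factor at a time. A Python sketch (illustrative only; coefficient lists with constant term first):

```python
def expand_roots(roots):
    # Coefficients of (x - a1)...(x - an), constant term first
    f = [1]
    for a in roots:
        shifted = [0] + f                   # f(x) * x
        scaled = [-a * c for c in f] + [0]  # f(x) * (-a)
        f = [u + v for u, v in zip(shifted, scaled)]
    return f
```

For roots 1, 2, 3 this yields x^3 − 6x^2 + 11x − 6, whose coefficients exhibit the elementary symmetric functions with alternating signs.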

p. 8

18. If a ∈ R, define ea : R[x] → R by f(x) = ∑ rix^i ↦ ∑ ria^i (denote this element of R by f(a)); prove that ea is a ring map (it is called evaluation at a). If f(a) = 0, then a is called a root of f(x).

(This exercise allows one to regard x as a variable ranging over R; that is, each polynomial f(x) ∈ R[x] determines a function R → R. But look at the next exercise.)

19. Give an example of distinct polynomials f(x), g(x) ∈ ℤp[x] with f(a) = g(a) for all a ∈ ℤp.

(Distinct polynomials (not all coefficients are the same) may determine the same function; this is one reason for our defining polynomials in such a formal way. Indeed, if F is any finite field (there are such other than ℤp), there are only finitely many functions FF but there are infinitely many polynomials. We shall see after Theorem 11 that this exercise is false if ℤp is replaced by any infinite field.)

p. 17.

Theorem 11. If F is a field and f(x) ∈ F[x] has degree n, then F contains at most n roots of f(x).

This last theorem is false for arbitrary rings R; for example, x2 − 1 has four roots in ℤ8.
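The failure over ℤ8 can be confirmed by exhaustive search; a one-line Python check (illustrative only):

```python
# All roots of x^2 - 1 in Z_8, by exhaustive search
roots = [a for a in range(8) if (a * a - 1) % 8 == 0]
```

This produces [1, 3, 5, 7]: four roots, although the polynomial has degree 2.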

Recall that every polynomial f(x) ∈ F[x] determines a function FF, namely, af(a). In Exercise 19, however, we saw that distinct polynomials in ℤp[x] may determine the same function. This pathology vanishes when the coefficient field is infinite. Let F be an infinite field and let f(x) ≠ g(x) in F[x] satisfy f(a) = g(a) for all aF. Then h(x) = f(x)−g(x) is not the zero polynomial; hence it has a degree, say, n. But each of the infinitely many elements aF is a root of h(x), and this contradicts Theorem 11.