Description

1. Simple answers are NOT sufficient; you need to show all the relevant work.

2. The exercises with a * require the use of a computer. (What software you use is up to you.)

3. More information and notes will be provided after I confirm you.

Please see the file I provided first, and make sure you can do it before you choose this work.

I will send you a message before I confirm you, so please check it.


MAT 3380: Intro. to Numerical Methods

Winter 2022

Diane Guignard

HOMEWORK 1


• This assignment is due Friday, January 21 on Brightspace. It covers Sections 3.1, 3.2 and part of 3.3.

• Simple answers are NOT sufficient; you need to show all the relevant work to receive full credit.


• Please write the following information at the top of the front page: your full name, the course number (MAT 3380) and the number of the assignment.

• The exercises with a * require the use of a computer. For Exercises 1 and 2, you can use the function bisection.m provided on Brightspace. Attach the code you wrote to your report.

Exercise 1*

Let f (x) = (x + 2)(x + 1)^2 x (x − 1)^3 (x − 2). To which zero of f does the bisection method converge when applied on the following intervals?

a. [−1.5, 2.5]

b. [−0.5, 2.4]

c. [−0.5, 3]

d. [−3, −0.5].

Exercise 2*

Use the bisection method on [1, 2] to find an approximation of √3 correct to within 10^(−4). Indicate which function f you used and report the values of x0, x1 and x2, the final output and the number of iterations.

Exercise 3*

Let f (x) = x^2 − 2x − 3. To find a root of f, the following three fixed point methods are proposed:

a. xk = 3/(xk−1 − 2)

b. xk = (xk−1)^2 − xk−1 − 3

c. xk = ((xk−1)^2 + 3)/(2 xk−1 − 2).

For each method, compute (if possible) the iterates x1 , x2 , x3 and x4 starting from x0 = 0.

Report the values you obtain in a table. Which methods seem to be appropriate? Among those,

which one seems to converge the fastest?

Exercise 4

Prove Corollary 3.4 of the lecture notes (see Section 3.3), namely the estimates

|xk − x∗| ≤ ρ^k max{x0 − a, b − x0},   k ≥ 0,

and

|xk − x∗| ≤ (ρ^k/(1 − ρ)) |x1 − x0|,   k ≥ 1.

Exercise 5

Consider the function

g(x) = (1/3) √(x + 8).

i) Show that g has a unique fixed point in [0, 1].

ii) Assuming that we start from x0 = 1/2, find a bound for the number of fixed point iterations needed to achieve 10^(−6) accuracy.


Downloaded 01/15/22 to 137.122.8.73 Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms


Chapter 3

Nonlinear Equations in One Variable

This chapter is concerned with finding solutions to the scalar, nonlinear equation

f (x) = 0,

where the variable x runs in an interval [a, b]. The topic provides us with an opportunity to discuss

various issues and concepts that arise in more general circumstances.

There are many canned routines that do the job of finding a solution to a nonlinear scalar

equation; the one in MATLAB is called fzero; type help fzero to see how this function is

used and what input parameters it requires.

Following an extended introduction to our topic in Section 3.1 are three sections, 3.2–3.4, each

devoted to a different method or a class of methods for solving our nonlinear equation. A closely

related problem is that of minimizing a function in one variable, and this is discussed in Section 3.5.

3.1 Solving nonlinear equations

Referring to our prototype problem introduced above, f (x) = 0, let us further assume that the function

f is continuous on the interval, denoted f ∈ C[a, b]. Throughout our discussion we denote a solution

of the equation (called root, or zero) by x ∗ .

Note: Why introduce nonlinear equations before introducing their easier comrades, the

linear ones?! Because one linear equation in one unknown is just too easy and not particularly illuminating, and systems of equations bring a wave of complications with them. We

have a keen interest in solving scalar nonlinear equations not only because such problems

arise frequently in many applications, but also because it is an opportunity to discuss various issues and concepts that arise in more general circumstances and highlight ideas that

are extensible well beyond just one equation in one unknown.

Example 3.1. Here are a few simple functions and their roots.

1. f (x) = x − 1, on the interval [a, b] = [0, 2].

Obviously there is one root for this linear equation: x ∗ = 1.



Figure 3.1. Graphs of three functions and their real roots (if there are any): (i) f (x) = sin(x) on [0, 4π], (ii) f (x) = x^3 − 30x^2 + 2552 on [0, 20], and (iii) f (x) = 10 cosh(x/4) − x on [−10, 10].

2. f (x) = sin(x).

Since sin(nπ ) = 0 for any integer n, we have

(a) On the interval [a, b] = [π/2, 3π/2] there is one root, x∗ = π.

(b) On the interval [a, b] = [0, 4π] there are five roots; see Figure 3.1.

3. f (x) = x^3 − 30x^2 + 2552, 0 ≤ x ≤ 20.

In general, a cubic equation with complex coefficients has three complex roots. But if the

polynomial coefficients are real and x is restricted to be real and lie in a specific interval, then

there is no general a priori rule as to how many (real) roots to expect. A rough plot of this

function on the interval [0, 20] is given in Figure 3.1. Based on the plot we suspect there is

precisely one root x ∗ in this interval.

4. f (x) = 10 cosh(x/4) − x, −∞ < x < ∞, where the function cosh is defined by cosh(t) = (e^t + e^(−t))/2.
Not every equation has a solution. This one has no real roots.
Figure 3.1 indicates that the function φ(x) = 10 cosh(x/4) − x has a minimum. To find the minimum we differentiate and equate to 0. Let f (x) = φ′(x) = 2.5 sinh(x/4) − 1 (where, analogously to cosh, we define sinh(t) = (e^t − e^(−t))/2). This function should have a zero in the interval [0, 4]. For more on finding a function's minimum, see Section 3.5.
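As a quick check (ours, not from the text), this particular zero can even be written in closed form: setting 2.5 sinh(x/4) = 1 gives x = 4 asinh(0.4). A short Python verification:

```python
import math

# phi'(x) = 2.5*sinh(x/4) - 1 vanishes where sinh(x/4) = 0.4,
# i.e., at x = 4*asinh(0.4), which indeed lies in [0, 4].
xmin = 4 * math.asinh(0.4)
check = 2.5 * math.sinh(xmin / 4) - 1  # residual, approximately 0
```

So the zero promised in [0, 4] is x ≈ 1.56, a handy reference value when testing a root finder on this function.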
Example 3.2. Here is the MATLAB script that generates Figure 3.1.
t = 0:.1:4*pi;
tt = sin(t);
ax = zeros(1,length(t));
xrt = 0:pi:4*pi;
yrt = zeros(1,5);
subplot(3,1,1)
plot(t,tt,'b',t,ax,'k',xrt,yrt,'rx');
xlabel('x')
ylabel('f(x)')
t = 0:.1:20;
tt = t.^3 - 30*t.^2 + 2552;
ax = zeros(1,length(t));
subplot(3,1,2)
plot(t,tt,'b',t,ax,'k',11.8615,0,'rx');
xlabel('x')
ylabel('f(x)')
t = -10:.1:10;
tt = 10 * cosh(t ./4) - t;
ax = zeros(1,length(t));
subplot(3,1,3)
plot(t,tt,'b',t,ax,'k');
xlabel('x')
ylabel('f(x)')
This script should not be too difficult to read like text. Note the use of array arguments, for
instance, in defining the array tt in terms of the array t.
Iterative methods for finding roots
It is not realistic to expect, except in special cases, to find a solution of a nonlinear equation by using
a closed form formula or a procedure that has a finite, small number of steps. Indeed, it is enough
to consider finding roots of polynomials to realize how rare closed formulas are: they practically
exist only for very low order polynomials. Thus, one has to resort to iterative methods: starting with
an initial iterate x0 , the method generates a sequence of iterates x1 , x2 , . . . , xk , . . . that (if all works
well) converge to a root of the given, continuous function. In general, our methods will require
rough knowledge of the root’s location. To find more than one root, we can fire up the same method
starting from different initial iterates x0 .
To find approximate locations of roots we can roughly plot the function, as done in Example 3.1. (Yes, it sounds a little naive, but sometimes complicated things can be made simple by one
good picture.) Alternatively, we can probe the function, i.e., evaluate it at several points, looking for
locations where f (x) changes sign.
If f (a) · f (b) < 0, i.e., f changes sign on the interval [a, b], then by the Intermediate Value Theorem given on page 10 there is a number c = x∗ in this interval for which f (c) = 0. To see
this intuitively, imagine trying to graph a scalar function from a positive to a negative value, without
lifting the pen (because the function is continuous): somewhere the curve has to cross the x-axis!
Such simple procedures for roughly locating roots are unfortunately not easy to generalize to
several equations in several unknowns.
Stopping an iterative procedure
Typically, an iterative procedure starts with an initial iterate x0 and yields a sequence of iterates
x1 , x2 , . . . , xk , . . . . Note that in general we do not expect the iterative procedure to produce the exact solution x∗. We would conclude that the sequence of iterates converges if the values of | f (xk )|
and/or of |xk − xk−1 | decrease towards 0 sufficiently fast as the iteration counter k increases.
Correspondingly, one or more of the following general criteria are used to terminate such an
iterative process successfully after n iterations:
|xn − xn−1 | < atol and/or
|xn − xn−1 | < rtol|xn | and/or
| f (xn )| < ftol,
where atol, rtol, and ftol are user-specified constants.
Usually, but not always, the second, relative criterion is more robust than the first, absolute
one. A favorite combination uses one tolerance value tol, which reads
|xn − xn−1 | < tol(1 + |xn |).
The third criterion is independent of the first two; it takes the function value into account. The
function f (x) may in general be very flat, or very steep, or neither, near its root.
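These criteria are straightforward to code. A minimal Python sketch (the function name and default tolerances are our choices, not from the text), combining the favorite mixed absolute/relative step test with the residual test:

```python
def converged(x_n, x_prev, f_n, tol=1e-8, ftol=1e-12):
    """Stopping tests for an iterative root finder.

    Combines the step criterion |x_n - x_prev| < tol*(1 + |x_n|)
    with the residual criterion |f(x_n)| < ftol.
    """
    small_step = abs(x_n - x_prev) < tol * (1 + abs(x_n))
    small_residual = abs(f_n) < ftol
    return small_step or small_residual
```

Either test alone can be fooled: the step test by an iteration that crawls, the residual test by a very flat or very steep f near the root, which is why implementations often check both.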
Desired properties of root finding algorithms
When assessing the qualities of a given root finding algorithm, a key property is its efficiency. In
determining an algorithm’s efficiency it is convenient to concentrate on the number of function
evaluations, i.e., the evaluations of f (x) at the iterates {xk }, required to achieve convergence to a
given accuracy. Other details of the algorithm, which may be considered as overhead, are then
generally neglected.5 Now, if the function f (x) is as simple to evaluate as those considered in
Example 3.1, then it is hard to understand why we concentrate on this aspect alone. But in these
circumstances any of the algorithms considered in this chapter is very fast indeed. What we really
keep in the back of our minds is a possibility that the evaluation of f (x) is rather costly. For instance,
think of simulating a space shuttle returning to earth, with x being a control parameter that affects
the distance f (x) of the shuttle’s landing spot from the location of the reception committee awaiting
this event. The calculation of f (x) for each value of x involves a precise simulation of the flight’s
trajectory for the given x, and it may then be very costly. An algorithm that does not require too
many such function evaluations is then sought.
Desirable qualities of a root finding algorithm are the following:
• Efficient—requires a small number of function evaluations.
• Robust—fails rarely, if ever. Announces failure if it does fail.
• Requires a minimal amount of additional data such as the function’s derivative.
• Requires f to satisfy only minimal smoothness properties.
• Generalizes easily and naturally to many equations in many unknowns.
No algorithm we are aware of satisfies all of these criteria. Moreover, which of these is more
important to honor depends on the application. So we study several possibilities in the following
sections.
5 An exception is the number of evaluations of the derivative f ′(x) required, for instance, by Newton's method. See
Section 3.3.
3.2 Bisection method
The method developed in this section is simple and safe and requires minimal assumptions on the
function f (x). However, it is also slow and hard to generalize to higher dimensions.
Suppose that for a given f (x) we know an interval [a, b] where f changes sign, i.e., f (a) ·
f (b) < 0. The Intermediate Value Theorem given on page 10 then assures us that there is a root x ∗
such that a ≤ x∗ ≤ b. Now evaluate f ( p), where p = (a + b)/2 is the midpoint, and check the sign of f (a) · f ( p). If it is negative, then the root is in [a, p] (by the same Intermediate Value Theorem), so we can set b ← p and repeat; else f (a) · f ( p) is positive, so the root must be in [ p, b], hence we can set a ← p and repeat. (Of course, if f (a) · f ( p) = 0 exactly, then p is the root and we are done.)

In each such iteration the interval [a, b] where x∗ is trapped shrinks by a factor of 2; at the kth step, the point xk is the midpoint p of the kth subinterval trapping the root. Thus, after a total of n iterations, |x∗ − xn | ≤ ((b − a)/2) · 2^(−n). Therefore, the algorithm is guaranteed to converge. Moreover, if required to satisfy

|x∗ − xn | ≤ atol

for a given absolute error tolerance atol > 0, we may determine the number of iterations n needed by demanding ((b − a)/2) · 2^(−n) ≤ atol. Multiplying both sides by 2^n /atol and taking log, we see that it takes

n = ⌈log2 ((b − a)/(2 atol))⌉

iterations to obtain a value p = xn that satisfies

|x∗ − p| ≤ atol.
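As a sanity check on this bound, here is a Python transcription of the procedure (a sketch; the names are ours), applied to f (x) = x^2 − 3 on [1, 2] with atol = 10^(−4):

```python
import math

def bisect(f, a, b, atol):
    """Bisection: return (p, n) with |p - root| <= atol after n iterations."""
    fa, fb = f(a), f(b)
    if a >= b or fa * fb >= 0 or atol <= 0:
        raise ValueError("need a < b, f(a)*f(b) < 0, and atol > 0")
    n = math.ceil(math.log2((b - a) / (2 * atol)))  # predicted iteration count
    for _ in range(n):
        p = (a + b) / 2
        fp = f(p)
        if fa * fp < 0:
            b = p
        else:
            a, fa = p, fp
    return (a + b) / 2, n

p, n = bisect(lambda x: x * x - 3, 1.0, 2.0, 1e-4)  # n = ceil(log2(5000)) = 13
```

With b − a = 1 and atol = 10^(−4), the formula predicts n = 13 iterations, and the returned midpoint indeed approximates √3 to within the tolerance.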

The following MATLAB function does the job:

function [p,n] = bisect(func,a,b,fa,fb,atol)

%

% function [p,n] = bisect(func,a,b,fa,fb,atol)

%

% Assuming fa = func(a), fb = func(b), and fa*fb < 0,
% there is a value root in (a,b) such that func(root) = 0.
% This function returns in p a value such that
%
| p - root | < atol
% and in n the number of iterations required.
% check input
if (a >= b) | (fa*fb >= 0) | (atol <= 0)
  disp('something wrong with the input: quitting');
  p = NaN; n = NaN;
  return
end
n = ceil(log2(b-a) - log2(2*atol));
for k = 1:n
  p = (a+b)/2;
  fp = func(p);
  if fa*fp < 0
    b = p; fb = fp;
  else
    a = p; fa = fp;
  end
end
p = (a+b)/2;

3.3 Fixed point iteration

Suppose that g ∈ C[a, b] satisfies g(a) > a and
g(b) < b. Then for the continuous function
φ(x) = g(x) − x
we have φ(a) > 0, φ(b) < 0. (Note that φ does not have to coincide with the function f that we have
started with; there are lots of φ’s for a given f .) Hence, by the Intermediate Value Theorem given
on page 10, just as before, there is a root a < x ∗ < b such that φ(x ∗ ) = 0. Thus, g(x ∗ ) = x ∗ , so x ∗
is a fixed point. We have established the existence of a root: f (x ∗ ) = 0.
Next, suppose that g is not only continuous but also differentiable and that there is a positive
number ρ < 1 such that
|g′(x)| ≤ ρ,
a < x < b.
Then the root x ∗ is unique in the interval [a, b], for if there is also y ∗ which satisfies y ∗ = g(y ∗ ),
then
|x∗ − y∗ | = |g(x∗ ) − g(y∗ )| = |g′(ξ )(x∗ − y∗ )| ≤ ρ|x∗ − y∗ |,
where ξ is an intermediate value between x ∗ and y ∗ . Obviously, this inequality can hold with ρ < 1
only if y ∗ = x ∗ .
We can summarize our findings so far as the Fixed Point Theorem.
Theorem: Fixed Point.
If g ∈ C[a, b], g(a) ≥ a and g(b) ≤ b, then there is a fixed point x ∗ in the interval [a, b].
If, in addition, the derivative g′ exists and there is a constant ρ < 1 such that the derivative satisfies
|g′(x)| ≤ ρ ∀ x ∈ (a, b),
then the fixed point x ∗ is unique in this interval.
Convergence of the fixed point iteration
Turning to the fixed point iteration and the third question on the preceding page, similar arguments
establish convergence (now that we know that there is a unique solution): under the same assumptions we have
|xk+1 − x ∗ | = |g(xk ) − g(x ∗ )| ≤ ρ|xk − x ∗ |.
This is a contraction by the factor ρ. So
|xk+1 − x ∗ | ≤ ρ|xk − x ∗ | ≤ ρ 2 |xk−1 − x ∗ | ≤ · · · ≤ ρ k+1 |x0 − x ∗ |.
Since ρ < 1, we have ρ k → 0 as k → ∞. This establishes convergence of the fixed point iteration.
Example 3.4. For the function g(x) = e^(−x) on [0.2, 1.2], Figure 3.2 shows the progression of iterates towards the fixed point x∗ satisfying x∗ = e^(−x∗).
Figure 3.2. Fixed point iteration for x = e^(−x), starting from x0 = 1. This yields x1 = e^(−1), x2 = e^(−e^(−1)), . . . . Convergence is apparent. Note the contraction effect.
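The iteration in Figure 3.2 takes only a few lines to reproduce. A Python sketch (function names ours), applied to g(x) = e^(−x) of Example 3.4:

```python
import math

def fixed_point(g, x0, tol=1e-10, maxit=200):
    """Iterate x_{k+1} = g(x_k); stop when successive iterates are close."""
    x = x0
    for k in range(1, maxit + 1):
        x_new = g(x)
        if abs(x_new - x) < tol * (1 + abs(x_new)):
            return x_new, k
        x = x_new
    raise RuntimeError("fixed point iteration did not converge")

xstar, k = fixed_point(lambda x: math.exp(-x), 1.0)  # x* satisfies x* = e^(-x*)
```

The returned value x∗ ≈ 0.567 satisfies x∗ = e^(−x∗) to within the tolerance, and the iteration count reflects the contraction factor ρ = |g′(x∗)| ≈ 0.57.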
To answer the question about the speed of convergence, it should be clear from the bounds
derived before Example 3.4 that the smaller ρ is, the faster the iteration converges. In case of the
bisection method, the speed of convergence does not depend on the function f (or any g, for that
matter), and it is in fact identical to the speed of convergence obtained with ρ ≡ 1/2. In contrast, for
the fixed point iteration, there is dependence on |g′(x)|. This, and the (negative) answer to our fifth
question on page 46, are demonstrated next.
Example 3.5. Consider the function
f (x) = α cosh(x/4) − x,
where α is a parameter. For α = 10 we saw in Example 3.1 that there is no root. But for α = 2 there
are actually two roots. Indeed, setting
g(x) = 2 cosh(x/4)
and plotting g(x) and x as functions of x, we obtain Figure 3.3, which suggests not only that there
are two fixed points (i.e., roots of f )—let’s call them x1∗ and x2∗ —but also that we can bracket them,
say, by
2 ≤ x1∗ ≤ 4, 8 ≤ x2∗ ≤ 10.
Next we apply our trusted routine bisect, introduced in the previous section, to f (x) with α = 2.
This yields
x1∗ ≈ 2.35755106, x2∗ ≈ 8.50719958,
correct to an absolute tolerance 1.e-8, in 27 iterations for each root. (You should be able to explain
why precisely the same number of iterations was required for each root.)
Figure 3.3. The functions x and 2 cosh(x/4) meet at two locations.
Now, it is very tempting, and rather natural here, to use the same function g defined above in
a fixed point iteration, thus defining
xk+1 = 2 cosh(xk /4).
For the first root, on the interval [2, 4] we have the conditions of the Fixed Point Theorem
holding. In fact, near x1∗ , |g′(x)| < 0.4, so we expect faster convergence than with the bisection
method (why?). Indeed, starting from x0 = 2 the method requires 16 iterations, and starting from
x0 = 4 it requires 18 iterations to get to within 1.e-8 of the root x1∗ .
For the second root, however, on the interval [8, 10] the conditions of the Fixed Point Theorem
do not hold! In particular, |g′(x2∗ )| > 1. Thus, a fixed point iteration using this g will not converge

to the root x2∗ . Indeed, starting with x0 = 10 the iteration diverges quickly, and we obtain overflow

after 3 iterations, whereas starting this fixed point iteration with x0 = 8 we do obtain convergence,

but to x1∗ rather than to x2∗ .

It is important to realize that the root x2∗ is there, even though the fixed point iteration with the

“natural g” does not converge to it. Not everything natural is good for you! There are other choices

of g for the purpose of fixed point iteration that perform better here.
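This contrast is easy to reproduce numerically. A Python sketch (names ours) that simply runs the iteration xk+1 = 2 cosh(xk /4) from two starting points:

```python
import math

def iterate(g, x0, nsteps):
    """Return the final iterate of x_{k+1} = g(x_k) after nsteps steps."""
    x = x0
    for _ in range(nsteps):
        x = g(x)
    return x

g = lambda x: 2 * math.cosh(x / 4)
from2 = iterate(g, 2.0, 60)  # converges to x1* ~ 2.3576
from8 = iterate(g, 8.0, 60)  # also ends up at x1*, not at x2*
```

Starting from x0 = 10 instead blows up within a few steps (in Python, math.cosh eventually raises OverflowError), matching the divergence described in the text.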

Note: A discussion of rates of convergence similar to that appearing here is also given in

Section 7.3. There it is more crucial, because here we will soon see methods that converge

faster than a rate of convergence can suitably quantify.

Rate of convergence

Suppose now that at a particular root of a given fixed point iteration, ρ = |g′(x∗ )| satisfies 0 < ρ < 1.
Starting with x0 sufficiently close to x∗ we can write xk − x∗ ≈ g′(x∗ )(xk−1 − x∗ ). Hence we get
|xk − x∗ | ≈ ρ|xk−1 − x∗ | ≈ · · · ≈ ρ^k |x0 − x∗ |.
To quantify the speed of convergence in terms of ρ we can ask, how many iterations does it take to
reduce the error by a fixed factor, say, 10?
To answer this we set |xk − x ∗ | ≈ 0.1|x0 − x ∗ |, obtaining
ρ k ≈ 0.1.
Taking log10 of both sides yields k log10 ρ ≈ −1. Let us define the rate of convergence as
rate = − log10 ρ.
(3.1)
Then it takes about k = 1/rate iterations to reduce the error by more than an order of magnitude. Obviously, the smaller ρ the higher the convergence rate and correspondingly, the smaller the
number of iterations required to achieve the same error reduction effect.
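Definition (3.1) in code form (a tiny Python sketch; the names are ours):

```python
import math

def convergence_rate(rho):
    """rate = -log10(rho) for a linearly convergent iteration, 0 < rho < 1."""
    return -math.log10(rho)

# Iterations needed to cut the error by (at least) a factor of 10:
k_bisection = math.ceil(1 / convergence_rate(0.5))
```

For ρ = 0.5 this gives rate ≈ 0.301 and k = 4, the bisection values worked out in Example 3.6.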
Example 3.6. The bisection method is not exactly a fixed point iteration, but it corresponds to an
iterative method of a similar sort with ρ = 0.5. Thus its convergence rate according to (3.1) is
rate = − log10 0.5 ≈ 0.301.
This yields k = 4, and indeed the error reduction factor with this many bisections is 16, which is
more than 10.
For the root x1∗ of Example 3.5 we have ρ ≈ 0.3121 and
rate = − log10 0.3121 ≈ 0.506.
Here it takes only two iterations to roughly reduce the error by a factor of 10 if starting close to x ∗ .
For the root x2∗ of Example 3.5 we obtain from (3.1) a negative rate of convergence, as is
appropriate for a case where the method does not converge.
Of course, it makes no sense to calculate ρ or the convergence rate so precisely as in Example 3.6: we did this only so that you can easily follow the algebra with a calculator. Indeed,
such accurate estimation of ρ would require knowing the solution first! Moreover, there is nothing
magical about an error reduction by a particularly precise factor. The rate of convergence should
be considered as a rough indicator only. It is of importance, however, to note that the rate becomes
negative for ρ > 1, indicating no convergence of the fixed point iteration, and that it becomes infinite

for ρ = 0, indicating that the error reduction per iteration is faster than by any constant factor. We

encounter such methods next.

Specific exercises for this section: Exercises 3–4.

3.4 Newton’s method and variants

The bisection method requires the function f to be merely continuous (which is good) and makes

no further use of further information on f such as availability of its derivatives (which causes it

to be painfully slow at times). At the other end of the scale is Newton’s method, which requires

more knowledge and smoothness of the function f but which converges much faster in appropriate

circumstances.

Newton’s method

Newton’s method is the most basic fast method for root finding. The principle we use below to

derive it can be directly extended to more general problems.

Assume that the first and second derivatives of f exist and are continuous: f ∈ C 2 [a, b].

Assume also that f can be evaluated with sufficient ease. Let xk be a current iterate. By Taylor’s

expansion on page 5 we can write

f (x) = f (xk ) + f ′(xk )(x − xk ) + f ″(ξ(x))(x − xk )^2 /2,

where ξ(x) is some (unknown) point between x and xk .

Now, set x = x∗ , for which f (x∗ ) = 0. If f were linear, i.e., f ″ ≡ 0, then we could find the root by solving 0 = f (xk ) + f ′(xk )(x∗ − xk ), yielding x∗ = xk − f (xk )/ f ′(xk ). For a nonlinear function we therefore define the next iterate by the same formula, which gives

xk+1 = xk − f (xk )/ f ′(xk ),   k = 0, 1, 2, . . . .

This corresponds to neglecting the term f ″(ξ(x∗ ))(x∗ − xk )^2 /2 when defining the next iterate; however, if xk is already close to x∗ , then (x∗ − xk )^2 is very small, so we would expect xk+1 to be much closer to x∗ than xk is.

A geometric interpretation of Newton’s method is that xk+1 is the x-intercept of the tangent

line to f at xk ; see Figure 3.4.



Figure 3.4. Newton’s method: the next iterate is the x-intercept of the tangent line to f at

the current iterate.

Algorithm: Newton’s Method.

Given a scalar differentiable function in one variable, f (x):

1. Start from an initial guess x0 .

2. For k = 0, 1, 2, . . . , set

xk+1 = xk − f (xk )/ f ′(xk ),

until xk+1 satisfies termination criteria.

Example 3.7. Consider the same function as in Example 3.5, given by

f (x) = 2 cosh(x/4) − x.

The Newton iteration here is

xk+1 = xk − (2 cosh(xk /4) − xk )/(0.5 sinh(xk /4) − 1).

We use the same absolute tolerance of 1.e-8 and the same four initial iterates as in Example 3.5.

• Starting from x0 = 2 requires 4 iterations to reach x1∗ to within the given tolerance.

• Starting from x0 = 4 requires 5 iterations to reach x1∗ to within the given tolerance.

• Starting from x0 = 8 requires 5 iterations to reach x2∗ to within the given tolerance.

• Starting from x0 = 10 requires 6 iterations to reach x2∗ to within the given tolerance.
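These runs can be replicated with a few lines of Python (function names ours; iteration counts may differ slightly from the text's depending on the exact stopping test):

```python
import math

def newton(f, fprime, x0, atol=1e-8, maxit=100):
    """Newton's method: x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for k in range(1, maxit + 1):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < atol:
            return x_new, k
        x = x_new
    raise RuntimeError("Newton's method did not converge")

f = lambda x: 2 * math.cosh(x / 4) - x
fp = lambda x: 0.5 * math.sinh(x / 4) - 1
root2, k2 = newton(f, fp, 8.0)  # lands on x2* ~ 8.5072 in a handful of steps
```

Starting from x0 = 8 the sketch reaches x2∗ ≈ 8.50719958 in only a few iterations, in line with the quadratic convergence discussed below.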


The values of f (xk ), starting from x0 = 8, are displayed below:

k        0         1        2        3        4         5
f (xk )  −4.76e-1  8.43e-2  1.56e-3  5.65e-7  7.28e-14  1.78e-15

This table suggests (indirectly) that the number of significant digits essentially doubles at each iteration. Of course, when roundoff level is reached, no meaningful improvement can be obtained upon

heaping more floating point operations, and the improvement from the 4th to the 5th iteration in this

example is minimal.

Speed of convergence

How fast does a nonlinear iteration converge, if it does? Let us assume that the method indeed

converges and that xk is already “close enough” to the root x ∗ . We define convergence orders as

follows. The method is said to be

• linearly convergent if there is a constant ρ < 1 such that
|xk+1 − x ∗ | ≤ ρ|xk − x ∗ |
for all k sufficiently large;
• quadratically convergent if there is a constant M such that
|xk+1 − x ∗ | ≤ M|xk − x ∗ |2
for all k sufficiently large;
• superlinearly convergent if there is a sequence of constants ρk → 0 such that
|xk+1 − x ∗ | ≤ ρk |xk − x ∗ |
for all k sufficiently large.
Note that for quadratic convergence we can set ρk = M|xk − x∗ | → 0 as k → ∞; thus the quadratic order
implies superlinear order. Superlinear convergence may require less than this but more than linear
convergence.
Higher convergence orders (e.g., cubic) may be defined similarly to the quadratic order. However, normally there is really no practical reason to consider such higher convergence orders: methods of this sort would typically require more work per iteration, and roundoff error sets in quickly
even for quadratically convergent methods, as we have seen in Example 3.7.
For Newton’s method we have already noted that in the derivation a quadratic term was
dropped. Indeed, it turns out that if f ∈ C 2 [a, b] and there is a root x∗ in [a, b] such that f (x∗ ) = 0, f ′(x∗ ) ≠ 0, then there is a number δ such that, starting with x0 from anywhere in the neighborhood
[x ∗ − δ, x ∗ + δ], Newton’s method converges quadratically. See Exercise 9.
How is the general fixed point iteration xk+1 = g(xk ), considered in the previous section,
related to Newton’s method? We have seen that the speed of convergence may strongly depend on
the choice of g (recall Example 3.5, for instance). It is not difficult to see (Exercise 3) that if, in
addition to the assumptions that guarantee convergence in the Fixed Point Theorem, also g′(x∗ ) ≠ 0, then the method converges linearly. In this case the size of |g′(x∗ )| matters and this is quantified by the rate of convergence, defined in Section 3.3. The method may converge faster than linearly if g′(x∗ ) = 0. This is precisely the case with Newton's method. Here g(x) = x − f (x)/ f ′(x), and
hence g′(x) = 1 − [( f ′(x))^2 − f (x) f ″(x)]/( f ′(x))^2 = f (x) f ″(x)/( f ′(x))^2 . Now, since x = x∗ is the root of f it follows immediately that g′(x∗ ) = 0.
Newton’s method does have two disadvantages: (i) the need to know not only that the derivative f exists but also how to evaluate it, and (ii) the local nature of the method’s convergence. The
first of these two aspects is addressed (or, rather, circumvented) in what follows.
Secant method
The requirement that not only the function f but also its derivative f ′ be supplied by the user of Newton's method is at times simple to satisfy, as in our Example 3.7. But at other instances it can be a drag. An extra effort is required by the user, and occasionally the evaluation of f ′ can be much more costly than the evaluation of f at a given argument value. Moreover, sometimes the derivative
is simply not available, for example, in certain experimental setups where it is possible to evaluate
only the function itself.
The secant method is a variant of Newton's method, where f ′(xk ) is replaced by its finite difference approximation based on the evaluated function values at xk and at the previous iterate xk−1 . Assuming convergence, observe that near the root

f ′(xk ) ≈ ( f (xk ) − f (xk−1 ))/(xk − xk−1 ).
(A similar formula was used for different purposes in Example 1.2.) Substitution of this approximation into the formula for Newton's method yields the secant method, also referred to as quasi-Newton, given by

xk+1 = xk − f (xk )(xk − xk−1 )/( f (xk ) − f (xk−1 )),   k = 1, 2, . . . .
Algorithm: Secant Method.
Given a scalar differentiable function in one variable, f (x):
1. Start from two initial guesses x0 and x1 .
2. For k = 1, 2, . . . , set
xk+1 = xk − f (xk )(xk − xk−1 )/( f (xk ) − f (xk−1 ))
until xk+1 satisfies termination criteria.
It is worthwhile to compare at this point the Secant algorithm to the Newton algorithm on
page 51.
Example 3.8. Consider again the same function as in Example 3.5, given by
f (x) = 2 cosh(x/4) − x.
We apply the secant method with the same stopping criterion and the same tolerance as in Examples
3.5 and 3.7. Note that the secant method requires two initial iterates, x0 and x1 .
• Starting from x0 = 2 and x1 = 4 requires 7 iterations to reach x1∗ to within the given tolerance.
• Starting from x0 = 10 and x1 = 8 requires 7 iterations to reach x2∗ to within the given tolerance.
The values of f (xk ), starting from x0 = 10 and x1 = 8, are displayed below:

k        0     1         2         3        4         5         6
f (xk )  2.26  −4.76e-1  −1.64e-1  2.45e-2  −9.93e-4  −5.62e-6  1.30e-9
This table suggests that the number of significant digits increases more rapidly as we get nearer
to the root (i.e., better than a linear order), but not as fast as with Newton’s method. Thus we observe
indeed a demonstration of superlinear convergence.
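A Python sketch of the secant iteration (names ours), rerunning the case x0 = 10, x1 = 8:

```python
import math

def secant(f, x0, x1, atol=1e-8, maxit=100):
    """Secant method: Newton with f'(x_k) replaced by a finite difference."""
    f0, f1 = f(x0), f(x1)
    for k in range(1, maxit + 1):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < atol:
            return x2, k
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    raise RuntimeError("secant method did not converge")

f = lambda x: 2 * math.cosh(x / 4) - x
root, k = secant(f, 10.0, 8.0)  # converges to x2* ~ 8.5072
```

Note that, unlike the fixed point iteration with the "natural g" of Example 3.5, the secant method started near x2∗ does converge to it.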
Theorem: Convergence of the Newton and Secant Methods.
If f ∈ C 2 [a, b] and there is a root x∗ in [a, b] such that f (x∗ ) = 0, f ′(x∗ ) ≠ 0, then there
is a number δ such that, starting with x0 (and also x1 in case of the secant method) from
anywhere in the neighborhood [x ∗ − δ, x ∗ + δ], Newton’s method converges quadratically
and the secant method converges superlinearly.
This theorem specifies the conditions under which Newton’s method is guaranteed to converge
quadratically, while the secant method is guaranteed to converge superlinearly.
Constructing secant-like modifications for Newton’s method for the case of systems of equations is more involved but possible and practically useful; we will discuss this in Section 9.2.
The case of a multiple root
The fast local convergence rate of both the Newton and secant methods is predicated on the assumption that f′(x*) ≠ 0. What if f′(x*) = 0? This is the case of a multiple root. In general, if f(x) can be written as

f(x) = (x − x*)^m q(x),

where q(x*) ≠ 0, then x* is a root of multiplicity m. If m > 1, then obviously f′(x*) = 0. As it turns out, Newton's method may still converge, but only at a linear (rather than quadratic) rate.

Example 3.9. For the monomial f(x) = x^m, m > 1, we get

f(x)/f′(x) = x/m.

Newton's method reads

x_{k+1} = x_k − x_k/m = ((m − 1)/m) x_k.

Since the root is x* = 0, we can somewhat trivially write

|x_{k+1} − x*| = ((m − 1)/m)|x_k − x*|,

so the method is linearly convergent, with the contraction factor ρ = (m − 1)/m. The corresponding rate of convergence is rate = −log10((m − 1)/m), and it is not difficult to calculate these values and see that they deteriorate as m grows. If f′′(x*) ≠ 0, i.e., m = 2, then ρ = 0.5. See Example 3.6.
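The linear error reduction by the factor (m − 1)/m can be checked directly; the following sketch (Python, with the illustrative values m = 3 and x_0 = 1) runs Newton's method on f(x) = x^m:

```python
def newton_monomial(m, x0, steps):
    """Newton's method for f(x) = x^m; the root x* = 0 has multiplicity m."""
    x = x0
    for _ in range(steps):
        f, fp = x**m, m * x**(m - 1)
        x = x - f / fp  # equals ((m - 1)/m) * x
    return x

m, x0 = 3, 1.0
x5, x6 = newton_monomial(m, x0, 5), newton_monomial(m, x0, 6)
ratio = x6 / x5  # successive-error ratio |x_{k+1} - 0| / |x_k - 0|
```

The ratio comes out as (m − 1)/m = 2/3 at every step, confirming linear rather than quadratic convergence.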
Globalizing Newton’s method

Both Newton's method and the secant method are guaranteed to converge under appropriate conditions only locally: there is the assumption in the theorem statement that the initial iterate x_0 is already "close enough" to an isolated root. This is in contrast to the situation for the bisection method, which is guaranteed to converge provided only that a bracketing interval is found on which f changes sign. The δ-neighborhood for which Newton's method is guaranteed to converge may be large for some applications, as is the case for Example 3.7, but it may be small for others, and it is more involved to assess ahead of time which is the case for a given problem.

A natural idea, then, is to combine two methods into a hybrid one. We start with a rough plot or probe of the given function f in order to bracket roots. Next, starting from a given bracket with f(x_0) · f(x_1) < 0, we initiate a Newton or a secant method, monitoring the iterative process by requiring sufficient improvement at each iteration. This "sufficient improvement" can, for instance, be that |f(x_{k+1})| < 0.5|f(x_k)|, or the bisection guaranteed improvement ratio |x_{k+1} − x_k| < 0.5|x_k − x_{k−1}|. If a Newton or secant iteration is applied but there is no sufficient improvement, then we conclude that the current iterate must be far from where the superior fire power of the fast method can be utilized, and we therefore return to x_0 and x_1, apply a few bisection iterations to obtain new x_0 and x_1 which bracket the root x* more tightly, and restart the Newton or secant routine.

This hybrid strategy yields a robust solver for scalar nonlinear equations; see Exercise 13.
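A possible sketch of such a hybrid solver (Python; the |f|-halving test and the three bisection steps follow the description above, while the specific tolerances and test function are illustrative assumptions):

```python
import math

def hybrid_root(f, df, a, b, atol=1e-10, max_iter=200):
    """Bisection-safeguarded Newton; assumes f(a) * f(b) < 0."""
    x0, x1 = a, b
    xk = 0.5 * (x0 + x1)
    for _ in range(max_iter):
        fk = f(xk)
        if abs(fk) < atol:
            return xk
        x_new = xk - fk / df(xk)  # Newton step
        # "sufficient improvement": |f| at least halves and we stay inside the bracket
        if x0 < x_new < x1 and abs(f(x_new)) < 0.5 * abs(fk):
            xk = x_new
        else:
            # a few bisection steps to tighten the bracket, then restart Newton
            for _ in range(3):
                mid = 0.5 * (x0 + x1)
                if f(x0) * f(mid) <= 0:
                    x1 = mid
                else:
                    x0 = mid
            xk = 0.5 * (x0 + x1)
    raise RuntimeError("no convergence")

# Bracketing the larger root of Example 3.5, f(x) = 2 cosh(x/4) - x on [5, 10]:
f  = lambda x: 2.0 * math.cosh(x / 4.0) - x
df = lambda x: 0.5 * math.sinh(x / 4.0) - 1.0
r = hybrid_root(f, df, 5.0, 10.0)
```

The robust production versions of this idea (e.g., Dekker- and Brent-style solvers) refine the same bracket-plus-fast-method interplay.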
Convergence and roundoff errors
Thus far in this chapter we have scarcely noticed the possible existence of roundoff errors. But rest
assured that roundoff errors are there, as promised in Chapter 2. They are simply dominated by
the convergence errors, so long as the termination criteria of our iterative methods are well above
rounding unit, and so long as the problem is well-conditioned. In the last entry of the table in
Example 3.7, the convergence error is so small that roundoff error is no longer negligible. The result
is certainly not bad in terms of error magnitude, but the quadratic convergence pattern (which does
not hold for roundoff errors, barring a miracle) is destroyed.
Our essential question here is, given that the error tolerances are well above rounding unit in
magnitude, do we ever have to worry about roundoff errors when solving a nonlinear equation? The
answer is that we might, if the problem is such that roundoff error accumulates significantly.
When could ill-conditioning occur for the problem of finding roots of a scalar nonlinear function? This can happen if the function f(x) is very flat near its root, because small changes to f may then significantly affect the precise location of the root. Exercise 2 illustrates this point. The general case is that of a multiple root.
Specific exercises for this section: Exercises 5–12.
3.5 Minimizing a function in one variable
A major source of applications giving rise to root finding is optimization. In its one-variable version
we are required to find an argument x = x̂ that minimizes a given objective function φ(x).6 For a
simple instance, we have already briefly examined in Example 3.1 the minimization of the function
φ(x) = 10 cosh(x/4) − x
over the real line.
6 We arbitrarily concentrate on finding minimum rather than maximum points. If a given problem is naturally
formulated as finding a maximum of a function ψ(x), say, then define φ(x) = −ψ(x) and consider minimizing φ.
Another, general, instance is obtained from any root finding or fixed point problem, g(x) = x, by setting

φ(x) = [g(x) − x]^2.

But here we concentrate on cases where the given objective is not necessarily that of root finding. Note that in general φ(x*) ≠ 0.
Conditions for a minimum point

Assume that φ ∈ C^2[a, b], i.e., that φ and its first two derivatives are continuous on a given interval [a, b]. Denote

f(x) = φ′(x).

An argument x* satisfying a < x* < b is called a critical point if

f(x*) = 0.

Now, for a parameter h small enough so that x* + h ∈ [a, b] we can expand in the Taylor series on page 5 and write

φ(x* + h) = φ(x*) + hφ′(x*) + (h^2/2)φ′′(x*) + · · ·
          = φ(x*) + (h^2/2)[φ′′(x*) + O(h)],

since φ′(x*) = 0 at a critical point. (See the discussion on page 7 regarding order O notation.) Since |h| can be taken arbitrarily small, it is now clear that at a critical point:

• If φ′′(x*) > 0, then x̂ = x* is a local minimizer of φ(x). This means that φ attains a minimum at x̂ = x* in some neighborhood which includes x*.
• If φ′′(x*) < 0, then x* is a local maximizer of φ(x). This means that φ attains a maximum at x̂ = x* in some neighborhood which includes x*.
• If φ′′(x*) = 0, then further investigation at x* is required.
Algorithms for function minimization

It is not difficult to see that if φ(x) attains a minimum (or maximum) at a point x̂, then this point must be critical, i.e., f(x̂) = 0. Thus, the problem of finding all minima of a given function φ(x) can be solved by finding all the critical roots^7 and then checking for each whether it is a minimum by examining the sign of the second derivative of φ.
Example 3.10. For the function φ(x) = 10 cosh(x/4) − x we have

φ′(x) = f(x) = (10/4) sinh(x/4) − 1,
φ′′(x) = f′(x) = (10/16) cosh(x/4).

^7 Note that for quadratic convergence of Newton's method, φ(x) must have three continuous derivatives.
Since φ′′(x) > 0 for all x, we can have only one minimum point. (Think of what the curve would have to look like if we had more than one minimum.) Applying any of the algorithms described in Sections 3.2–3.4, we obtain the unique minimum point as the root of f(x), given by

x̂ ≈ 1.5601412791.

At this point, φ(x̂) ≈ 9.21018833518615.
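This computation can be reproduced by applying Newton's method to f = φ′ (a Python sketch; the starting point x_0 = 0 and the fixed iteration count are illustrative choices):

```python
import math

phi = lambda x: 10.0 * math.cosh(x / 4.0) - x   # objective
f   = lambda x: 2.5 * math.sinh(x / 4.0) - 1.0  # phi'(x)
fp  = lambda x: 0.625 * math.cosh(x / 4.0)      # phi''(x) > 0 everywhere

x = 0.0
for _ in range(20):
    x -= f(x) / fp(x)  # Newton step applied to phi'

# x now approximates the unique minimizer x_hat = 1.5601412791...
```

Since φ′′ > 0 for all x, any critical point found this way is automatically the minimizer.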

Example 3.11. For an example with many critical points, consider φ(x) = cos(x). The critical points are where f(x) = φ′(x) = −sin(x) = 0; thus

x* = jπ,  j = 0, ±1, ±2, ....

Since

φ′′(x*) = −cos(jπ) = (−1)^{j+1},  j = 0, ±1, ±2, ...,

we have minimum points where j is odd, i.e., at every second critical point: the local minimizers are

x̂ = (2l + 1)π,  l = 0, ±1, ±2, ....

The other critical points are maximum points.

For simple problems this could mark the end of this section. However, for problems where function evaluations are really dear we note that there are two imperfections in the simple procedure outlined above:

• We are investing a considerable effort into finding all critical points, even those that end up being local maxima. Is there a way to avoid finding such unwanted points?
• No advantage is taken of the fact that we are minimizing φ(x). Can't we gauge how good an iterate x_k is by checking φ(x_k)?

These objections become more of an issue when considering the minimization of functions of several variables. Nonetheless, let us proceed to quickly outline some general ideas with Newton's method on page 51 for the scalar case.

The kth iteration of Newton's method applied to f(x) = φ′(x) can be written here as

x_{k+1} = x_k + α_k ξ_k,  where ξ_k = −φ′(x_k)/φ′′(x_k),  α_k = 1.

Now, if φ′′(x_k) > 0 and the step size α_k > 0 is small enough, then

φ(x_{k+1}) ≈ φ(x_k) + α_k ξ_k φ′(x_k) = φ(x_k) − α_k φ′(x_k)^2/φ′′(x_k) < φ(x_k).

Therefore, we have a decrease in φ, i.e., a step in the right direction.

Thus, it is possible to design a procedure where we insist that φ(x) decrease in each iteration. This would prevent convergence to critical points which are local maxima rather than minima. In such a procedure we monitor φ′′(x_k), replacing it by max{φ′′(x_k), ε}, where ε > 0 is a parameter chosen not large. We then select a step size α_k by trying α_k = 1 first and decreasing it if necessary until a sufficient decrease is obtained in φ(x_{k+1}) relative to φ(x_k). We shall have more to say about this sort of procedure when considering the minimization of functions in several variables.
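A sketch of such a safeguarded procedure (Python; the choices ε = 0.1, the halving of α_k, and the starting point are illustrative assumptions, not prescribed by the text):

```python
import math

def guarded_newton_min(phi, dphi, d2phi, x0, tol=1e-10, eps=0.1, max_iter=100):
    """Newton on phi' with the two safeguards outlined above:
    phi'' is replaced by max(phi'', eps), and the step size alpha
    is halved until phi actually decreases."""
    x = x0
    for _ in range(max_iter):
        g = dphi(x)
        if abs(g) < tol:
            return x
        xi = -g / max(d2phi(x), eps)  # always a descent direction
        alpha = 1.0
        while phi(x + alpha * xi) >= phi(x) and alpha > 1e-12:
            alpha *= 0.5
        x += alpha * xi
    return x

# phi(x) = cos(x), started at x0 = 0.5 where phi'' < 0: plain Newton on phi'
# would be attracted to the maximum at x = 0, but the safeguarded iteration
# moves away and lands on a minimizer (an odd multiple of pi).
xm = guarded_newton_min(math.cos,
                        lambda x: -math.sin(x),
                        lambda x: -math.cos(x),
                        x0=0.5)
```

Insisting on a decrease in φ is what steers the iteration away from the maximum, exactly as argued above.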

Local and global minimum

The procedures outlined above find local minima. The problem of finding a global minimizer, i.e., a point x̂ such that φ(x̂) ≤ φ(x) for all x ∈ [a, b], is significantly more difficult in general. In the worst situations, where not much is known about the objective function φ, a suitable algorithm may involve a complete enumeration, i.e., finding all local minima and then comparing them. At the other end of the scale, if the objective function is known to be convex, which means that

φ(θx + (1 − θ)y) ≤ θφ(x) + (1 − θ)φ(y)  ∀ 0 ≤ θ ≤ 1

for any two points x, y ∈ [a, b], then there is a unique minimum and the problems of finding local and global minima coincide. Such is the instance considered in Example 3.10, but not that considered in Example 3.11.

Specific exercises for this section: Exercises 21–23.

3.6 Exercises

0. Review questions
(a) What is a nonlinear equation?
(b) Is the bisection method (i) efficient? (ii) robust? Does it (iii) require a minimal amount of additional knowledge? (iv) require f to satisfy only minimum smoothness properties? (v) generalize easily to several functions in several variables?
(c) Answer similar questions for the Newton and secant methods.
(d) State at least one advantage and one disadvantage of the recursive implementation of the bisection method over the iterative nonrecursive implementation.
(e) In what way is the fixed point iteration a family of methods, rather than just one method like bisection or secant?
(f) What is the basic condition for convergence of the fixed point iteration, and how does the speed of convergence relate to the derivative of the iteration function g?
(g) Suppose a given fixed point iteration does not converge: does this mean that there is no root in the relevant interval? Answer a similar question for the Newton and secant methods.
(h) State at least two advantages and two disadvantages of Newton's method.
(i) What are order of convergence and rate of convergence, and how do they relate?
(j) State at least one advantage and one disadvantage of the secant method over Newton's.
(k) In what situation does Newton's method converge only linearly?
(l) Explain the role that roundoff errors play in the convergence of solvers for nonlinear equations, and explain their relationship with convergence errors.
(m) State a similarity and a difference between the problem of minimizing a function φ(x) and that of solving the nonlinear equation φ′(x) = 0.
(n) State what a convex function is, and explain what happens if an objective function is convex.

1. Apply the bisection routine bisect to find the root of the function

f(x) = √x − 1.1

starting from the interval [0, 2] (that is, a = 0 and b = 2), with atol = 1.e-8.


(a) How many iterations are required? Does the iteration count match the expectations, based on our convergence analysis?
(b) What is the resulting absolute error? Could this absolute error be predicted by our convergence analysis?

2. Consider the polynomial function^8

f(x) = (x − 2)^9
     = x^9 − 18x^8 + 144x^7 − 672x^6 + 2016x^5 − 4032x^4 + 5376x^3 − 4608x^2 + 2304x − 512.

(a) Write a MATLAB script which evaluates this function at 161 equidistant points in the interval [1.92, 2.08] using two methods:
   i. Apply nested evaluation (cf. Example 1.4) for evaluating the polynomial in the expanded form x^9 − 18x^8 + · · · .
   ii. Calculate (x − 2)^9 directly.
   Plot the results in two separate figures.
(b) Explain the difference between the two graphs.
(c) Suppose you were to apply the bisection routine from Section 3.2 to find a root of this function, starting from the interval [1.92, 2.08] and using the nested evaluation method, to an absolute tolerance 10^-6. Without computing anything, select the correct outcome:
   i. The routine will terminate with a root p satisfying |p − 2| ≤ 10^-6.
   ii. The routine will terminate with a root p not satisfying |p − 2| ≤ 10^-6.
   iii. The routine will not find a root.
   Justify your choice in one short sentence.

3. Consider the fixed point iteration x_{k+1} = g(x_k), k = 0, 1, ..., and let all the assumptions of the Fixed Point Theorem hold. Use a Taylor series expansion to show that the order of convergence depends on how many of the derivatives of g vanish at x = x*. Use your result to state how fast (at least) a fixed point iteration is expected to converge if g′(x*) = · · · = g^(r)(x*) = 0, where the integer r ≥ 1 is given.

4. Consider the function g(x) = x^2 + 3/16.
(a) This function has two fixed points. What are they?
(b) Consider the fixed point iteration x_{k+1} = g(x_k) for this g. For which of the points you have found in (a) can you be sure that the iterations will converge to that fixed point? Briefly justify your answer. You may assume that the initial guess is sufficiently close to the fixed point.
(c) For the point or points you found in (b), roughly how many iterations will be required to reduce the convergence error by a factor of 10?

5. Write a MATLAB script for computing the cube root of a number, x = a^{1/3}, with only basic arithmetic operations using Newton's method, by finding a root of the function f(x) = x^3 − a. Run your program for a = 0, 2, 10. For each of these cases, start with an initial guess reasonably close to the solution. As a stopping criterion, require the function value whose root you are searching to be smaller than 10^-8. Print out the values of x_k and f(x_k) in each iteration. Comment on the convergence rates and explain how they match your expectations.

^8 This beautiful example is inspired by Demmel [21]. We gave a modified version of it as an exam question once. Unfortunately, not everyone thought it was beautiful.


6.
(a) Derive a third order method for solving f(x) = 0 in a way similar to the derivation of Newton's method, using evaluations of f(x_n), f′(x_n), and f′′(x_n). The following remarks may be helpful in constructing the algorithm:
   • Use the Taylor expansion with three terms plus a remainder term.
   • Show that in the course of derivation a quadratic equation arises, and therefore two distinct schemes can be derived.
(b) Show that the order of convergence (under the appropriate conditions) is cubic.
(c) Estimate the number of iterations and the cost needed to reduce the initial error by a factor of 10^m.
(d) Write a script for solving the problem of Exercise 5. To guarantee that your program does not generate complex roots, make sure to start sufficiently close to a real root.
(e) Can you speculate what makes this method less popular than Newton's method, despite its cubic convergence? Give two reasons.

7. Consider Steffensen's method

x_{k+1} = x_k − f(x_k)/g(x_k),  k = 0, 1, ...,

where

g(x) = [f(x + f(x)) − f(x)]/f(x).

(a) Show that in general the method converges quadratically to a root of f(x).
(b) Compare the method's efficiency to the efficiency of the secant method.

8. It is known that the order of convergence of the secant method is p = (1 + √5)/2 = 1.618... and that of Newton's method is p = 2. Suppose that evaluating f′ costs approximately α times the cost of evaluating f. Determine approximately for what values of α Newton's method is more efficient (in terms of number of function evaluations) than the secant method. You may neglect the asymptotic error constants in your calculations. Assume that both methods are starting with initial guesses of a similar quality.

9. This exercise essentially utilizes various forms of Taylor's expansion and relies on expertise in calculus.
(a) Prove that if f ∈ C^2[a, b] and there is a root x* in [a, b] such that f(x*) = 0, f′(x*) ≠ 0, then there is a number δ such that, starting with x_0 from anywhere in the neighborhood [x* − δ, x* + δ], Newton's method converges quadratically.
(b) This is more challenging: prove the same conclusions under the same assumptions, except that now f is only assumed to have a first Lipschitz continuous derivative. Thus, while f′′ may not exist everywhere in [a, b], there is a constant γ such that for any x, y ∈ [a, b]

|f′(x) − f′(y)| ≤ γ|x − y|.

10. The function

f(x) = (x − 1)^2 e^x

has a double root at x = 1.


(a) Derive Newton's iteration for this function. Show that the iteration is well defined so long as x_k ≠ −1 and that the convergence rate is expected to be similar to that of the bisection method (and certainly not quadratic).
(b) Implement Newton's method and observe its performance starting from x_0 = 2.
(c) How easy would it be to apply the bisection method? Explain.

11. In general, when a smooth function f(x) has a multiple root at x*, the function

ψ(x) = f(x)/f′(x)

has a simple root there.
(a) Show that a Newton method for ψ(x) yields the iteration

x_{k+1} = x_k − f(x_k) f′(x_k) / ([f′(x_k)]^2 − f(x_k) f′′(x_k)).

(b) Give one potential advantage and two potential disadvantages for this iteration as compared to Newton's method for f(x).
(c) Try this method on the problem of Exercise 10. What are your observations?

12. Given a > 0, we wish to compute x = ln a using addition, subtraction, multiplication, division, and the exponential function e^x.
(a) Suggest an iterative formula based on Newton's method, and write it in a way suitable for numerical computation.
(b) Show that your formula converges quadratically.
(c) Write down an iterative formula based on the secant method.
(d) State which of the secant and Newton's methods is expected to perform better in this case in terms of overall number of exponential function evaluations. Assume a fair comparison, i.e., the same floating point system, "same quality" initial guesses, and identical convergence criteria.

13. Write a MATLAB program to find all the roots of a given, twice continuously differentiable, function f ∈ C^2[a, b].
Your program should first probe the function f(x) on the given interval to find out where it changes sign. (Thus, the program has, in addition to f itself, four other input arguments: a, b, the number nprobe of equidistant values between a and b at which f is probed, and a tolerance tol.)
For each subinterval [a_i, b_i] over which the function changes sign, your program should then find a root as follows. Use either Newton's method or the secant method to find the root, monitoring the decrease in |f(x_k)|. If an iterate is reached for which there is no sufficient decrease (e.g., if |f(x_k)| ≥ 0.5|f(x_{k−1})|), then revert back to [a_i, b_i], apply three bisection steps, and restart the Newton or secant method.
The ith root is deemed "found" as x_k if both

|x_k − x_{k−1}| < tol(1 + |x_k|)  and  |f(x_k)| < tol

hold.
Verify your program by finding the two roots (given in Example 3.5) of the function

f(x) = 2 cosh(x/4) − x,

starting your search with [a, b] = [0, 10] and nprobe = 10. Then use your program to answer other questions in this section where relevant.

14. Find all the roots of the function

f(x) = sin(x)/x for x ≠ 0,  f(0) = 1,

in the interval [−10, 10] for tol = 10^-7.
[This function is special enough to have a name. It is called the sinc function.]
15. For x > 0 consider the equation

x + ln x = 0.

It is a reformulation of the equation of Example 3.4.
(a) Show analytically that there is exactly one root, 0 < x* < ∞.
(b) Plot a graph of the function on the interval [0.1, 1].
(c) As you can see from the graph, the root is between 0.5 and 0.6. Write MATLAB routines for finding the root, using the following:
   i. The bisection method, with the initial interval [0.5, 0.6]. Explain why this choice of the initial interval is valid.
   ii. A linearly convergent fixed point iteration, with x_0 = 0.5. Show that the conditions of the Fixed Point Theorem (for the function g you have selected) are satisfied.
   iii. Newton's method, with x_0 = 0.5.
   iv. The secant method, with x_0 = 0.5 and x_1 = 0.6.
For each of the methods:
   • Use |x_k − x_{k−1}| < 10^-10 as a convergence criterion.
   • Print out the iterates and show the progress in the number of correct decimal digits throughout the iteration.
   • Explain the convergence behavior and how it matches theoretical expectations.
16. We saw in Example 3.5 that the function

f(x) = α cosh(x/4) − x

has two roots for α = 2 and none for α = 10. Is there an α for which there is precisely one root? If yes, then find such an α and the corresponding root; if not, then justify.

17. The derivative of the sinc function is given by

f(x) = (x cos(x) − sin(x))/x^2.

(a) Show that near x = 0, this function can be approximated by

f(x) ≈ −x/3.

The error in this approximation gets smaller as x approaches 0.
(b) Find all the roots of f in the interval [−10, 10] for tol = 10^-8.
Figure 3.5. Graph of an anonymous function; see Exercise 18.
18. Consider finding the root of a given nonlinear function f(x), known to exist in a given interval [a, b], using one of the following three methods: bisection, Newton, and secant. For each of the following instances, one of these methods has a distinct advantage over the other two. Match problems and methods and justify briefly.
(a) f(x) = x − 1 on the interval [0, 2.5].
(b) f(x) is given in Figure 3.5 on [0, 4].
(c) f ∈ C^5[0.1, 0.2], the derivatives of f are all bounded in magnitude by 1, and f(x) is hard to specify explicitly or evaluate.

19. You are required by a computer manufacturer to write a library function for a given floating point system to find the cube root y^{1/3} of any given positive number y. Any such relevant floating point number can be represented as y = a × 2^e, where a is a normalized fraction (0.5 ≤ a < 1) and e is an integer exponent. This library function must be very efficient and it should always work. For efficiency purposes it makes sense to store some useful constants ahead of computation time, e.g., the constants 2^{1/3}, 2^{2/3}, and 1/3, should these prove useful.
(a) Show how y^{1/3} can be obtained, once a^{1/3} has been calculated for the corresponding fraction, in at most five additional flops.
(b) Derive the corresponding Newton iteration. What is the flop count per iteration?
(c) How would you choose an initial approximation? Roughly how many iterations are needed? (The machine rounding unit is 2^-52.)
[You might find this exercise challenging.]
20. Suppose that the division button of your calculator has stopped working, and you have addition, subtraction, and multiplication only. Given a real number b ≠ 0, suggest a quadratically convergent iterative formula to compute 1/b, correct to a user-specified tolerance. Write a MATLAB routine that implements your algorithm, using |x_k − x_{k−1}| < 10^-10 as a convergence criterion, and apply your algorithm to b = π (that is, we compute 1/π), with two different initial guesses: (a) x_0 = 1; and (b) x_0 = 0.1.
Explain your results.
21. Write a MATLAB program to minimize a smooth, scalar function in one variable φ(x) over a given interval. The specifications are similar to those in Exercise 13. Your program should first find and display all critical points, i.e., zeros of φ′(x). Then it should determine which of these correspond to a local minimum by checking the sign of φ′′, and display the local minimum points. Finally, it should determine the global minimum point(s), value, and arguments by minimizing over the local minimum values.

22. Apply your program from Exercise 21 to find the (unique) global minimum of the function φ(x) = cos(x) + x/10 over the interval [−10, 10]. What are x̂ and φ(x̂)?

23. Use your program from Exercise 21 to minimize
(a) φ(x) = sin(x) and
(b) φ(x) = −sinc(x) = −sin(x)/x
over the interval [−10, 10] with tol = 10^-8.
You might find that determining the global minimum of the second function is significantly more challenging, even though the global minimum for the first one is not unique. Explain the source of difficulties and how you have addressed them.
[Exercises 17 and 14 are relevant here.]
3.7 Additional notes
Just about any introductory text on numerical methods or numerical analysis covers the topics introduced in this chapter, although the goals and the manner of these treatments may vary significantly.
For different flavors, see Heath [39], Cheney and Kincaid [12], and Burden and Faires [11], as well
as the classics Conte and de Boor [13] and Dahlquist and Björck [15].
For us this chapter is mainly an opportunity to introduce and demonstrate, within a simple
context, several basic notions and issues arising also in much more complex numerical computations.
We have thus avoided methods that do not seem to shed much light on other problems as well.
There is a (somewhat more complicated) analogue to the bisection method for function minimization. This classical algorithm is called golden section search and is described, together with
extensions, in Press et al. [59]; alternatively, see the crisper Heath [39].
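For reference, the basic golden section search fits in a few lines (a Python sketch under the assumption that φ is unimodal on the bracket; the tolerance is an illustrative choice):

```python
import math

def golden_section_min(phi, a, b, tol=1e-8):
    """Golden section search: shrink [a, b] by the constant factor
    (sqrt(5) - 1)/2 per step, reusing one interior evaluation each time.
    Assumes phi is unimodal on [a, b]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    x1 = b - invphi * (b - a)
    x2 = a + invphi * (b - a)
    f1, f2 = phi(x1), phi(x2)
    while b - a > tol:
        if f1 < f2:          # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - invphi * (b - a)
            f1 = phi(x1)
        else:                # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (b - a)
            f2 = phi(x2)
    return 0.5 * (a + b)

# Example 3.10's phi(x) = 10 cosh(x/4) - x is unimodal on [0, 10]:
xm = golden_section_min(lambda x: 10.0 * math.cosh(x / 4.0) - x, 0.0, 10.0)
```

Like bisection, the method needs no derivatives and shrinks the bracket by a fixed factor per function evaluation.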