🧭 Guided Exercise: Newton–Raphson Step for a Quadratic Function#

Let’s consider the function:

\[ f(x) = (x - 2)^2 - 2 = x^2 - 4x + 2 \]

This function has a stationary point at \( x_s = 2 \). In the language of geometry optimization, this would be an equilibrium geometry.

We’ll start at an initial guess \( x_0 = 0 \) and see how the Newton–Raphson method finds the optimal step toward the stationary point.


✏️ Step 1: Define the function and its derivatives#

Using the expressions given:

\[ f(x) = x^2 - 4x + 2, \quad f'(x) = 2x - 4, \quad f''(x) = 2 \]

👉 Your task: Complete the functions below.

# TODO: Define f(x), f'(x), and f''(x)

def f(x):
    """Return f(x) = x^2 - 4x + 2"""
    f_x = ### complete with appropriate code
    return f_x

def fprime(x):
    """Return the first derivative f'(x) = 2x - 4"""
    fp_x = ### complete with appropriate code
    return fp_x 

def fdoubleprime(x):
    """Return the second derivative f''(x) = 2"""
    fpp_x = ### complete with appropriate code
    return fpp_x

🔍 Step 2: Evaluate at the initial point#

We’ll start from \( x_0 = 0 \).

👉 Your task: Evaluate \( f(x_0) \), \( f'(x_0) \), and \( f''(x_0) \).

x0 = 1.0

# TODO: Compute f(x0), f'(x0), f''(x0)
f_x0 = # complete with appropriate function call
fp_x0 = # complete with appropriate function call
fpp_x0 =  # complete with appropriate function call

print(f"f({x0}) = {f_x0}")
print(f"f'({x0}) = {fp_x0}")
print(f"f''({x0}) = {fpp_x0}")

🧮 Step 3: Find the optimal step#

From the Taylor expansion, we know the Newton–Raphson step is:

\[ \Delta x = -\frac{f'(x_0)}{f''(x_0)} \]

👉 Your task: Compute \(\Delta x \) and the updated point \( x_1 = x_0 + \Delta x \). Do this by hand first and then complete the following code to do the computation with python. Compare your answers.

# TODO: Compute the Newton–Raphson step
dx = -fp_x0 / fpp_x0
x1 = x0 + dx

print(f"Newton-Raphson step Δx = {dx}")
print(f"Updated point x1 = {x1}")

✅ Question: Does \( x_1 \) match the true stationary point \( x_s = 2 \)?


💡 Discussion#

Because our function is perfectly quadratic, the Newton–Raphson step finds the stationary point in a single iteration.

In later examples, we’ll see what happens when \( f(x) \) isn’t purely quadratic.

🎨 Step 4: Visualizing the Newton–Raphson Step#

Now that we’ve computed the step \( \Delta x \), let’s visualize what’s happening.

We’ll plot the function \( f(x) = x^2 - 4x + 2 \), mark the initial point \( x_0 \), and show the tangent line scaled by the inverse of the second derivative at that point.

The Newton–Raphson step moves from \( x_0 \) to the point where this scaled tangent crosses the \( x \)-axix. For this quadratic case, this lands exactly on the stationary point \( x_s = 2 \).


👉 Run the cell below to see this plot.

import numpy as np
import matplotlib.pyplot as plt

# Define range of x values for plotting
x = np.linspace(-1, 4, 200)

# Compute function and tangent line values
y = f(x)

# Tangent line at x0:
# f(x_tangent) = f(x0) + f'(x0)*(x_tangent - x0)
y_tangent = f_x0 + fp_x0 / fpp_x0 * (x - x0)

# TODO: Create the plot showing f(x), tangent line, and key points
plt.figure(figsize=(6, 4))

# Plot the function
plt.plot(x, y, label=r"$f(x) = x^2 - 4x + 2$")

# Plot the tangent line
plt.plot(x, y_tangent, '--', label="Tangent scaled by 1 / f''(x_0) at $x_0$")

# Mark the initial point
plt.scatter(x0, f_x0, color='red', zorder=3, label=r"$x_0$")

# Mark the stationary point
plt.scatter(x1, f(x1), color='green', zorder=3, label=r"$x_1$ (stationary point)")

plt.xlabel("x")
plt.ylabel("f(x)")
plt.title("Newton–Raphson Step for a Quadratic Function")
plt.legend()
plt.grid(True)
plt.show()

🚀 Guided Exercise: Newton–Raphson Method for a Non-Quadratic Function#

In the previous example, we saw that Newton–Raphson reached the stationary point of a quadratic function in one step — because the function’s curvature was constant.

Now, let’s consider a function that is not exactly quadratic:

\[ f(x) = x^3 + x^2 - 5 \]

The stationary points are at \( x_s = 0 \) and \( x_s = -\tfrac{2}{3} \),
but imagine we don’t know these ahead of time.

We’ll start from an initial guess \( x_0 = 1 \) and try to find the stationary point iteratively.

# TODO: Define the new function and its derivatives

def f(x):
    """Return f(x) = x^3 + x^2 - 5"""
    return x**3 + x**2 - 5

def fprime(x):
    """Return f'(x) = 3x^2 + 2x"""
    return 3*x**2 + 2*x

def fdoubleprime(x):
    """Return f''(x) = 6x + 2"""
    return 6*x + 2

🔁 Step 1: Compute the Newton–Raphson update#

The update rule for finding stationary points is:

\[ x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)} \]

👉 Your task: Implement one iteration of the Newton–Raphson step starting from \( x_0 = 1 \).

x0 = 1.0

# Compute one iteration
x1 = # complete code with appropriate update

print(f"x0 = {x0:.4f}")
print(f"x1 = {x1:.4f}")

✅ Question:
How close is your updated value \( x_1\) to one of the true stationary points (\( x_s = 0 \) or \( x_s = -2/3 \))?

You should notice that one iteration isn’t enough this time because the function is not quadratic, so the curvature changes with \( x \).

🔄 Step 2: Iterate until convergence#

👉 Your task: Write a small loop to perform several Newton–Raphson iterations.

Stop when the change between iterations \(|x_{n+1} - x_n|\) becomes very small (e.g. less than \(10^{-6}\)).

# TODO: Implement an iterative Newton–Raphson loop

x = 1.0
tol = 1e-6
max_iter = 20

for i in range(max_iter):
    x_new = # complete with appropriate update to the variable x_new
    print(f"Iteration {i:2d}: x = {x:.6f}")
    if abs(x_new - x) < tol:
        print(f"Converged to stationary point at x = {x_new:.6f}")
        break
    x = x_new

🧭 Guided Exercise: Newton–Raphson Method in Two Dimensions#

In one dimension, we wrote the Newton–Raphson update for finding a stationary point as

\[ x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}. \]

In two dimensions (and higher), this generalizes to:

\[ \mathbf{x}_{n+1} = \mathbf{x}_n - \mathbf{H}^{-1}(\mathbf{x}_n) \, \nabla f(\mathbf{x}_n) \]

where

  • \( \nabla f(\mathbf{x}) \) is the gradient vector, and

  • \( \mathbf{H}(\mathbf{x}) \) is the Hessian matrix (matrix of second derivatives).

We’ll begin with a perfectly quadratic function where Newton’s method finds the minimum in a single step.

✏️ Step 1: Perfectly Quadratic 2D Function#

Let’s consider

\( f(x, y) = (x - 1)^2 + 2(y - 2)^2. \)

This function has its stationary point (and minimum) at \( (x_s, y_s) = (1, 2) \).

import numpy as np

# Define the function, gradient, and Hessian
def f(x, y):
    fxy = ### Complete with 2D function computing (x-1)^2 + 2(y-2)^2
    return fxy

def grad_f(x, y):
    """Gradient vector ∇f = [∂f/∂x, ∂f/∂y]"""
    dfdx = ### complete with partial first derivative of f(x,y) wrt x
    dfdy = ### complete with partial first derivative of f(x,y) wrt y
    return np.array([dfdx, dfdy])

def hessian_f(x, y):
    """Hessian matrix H = [[∂²f/∂x², ∂²f/∂x∂y], [∂²f/∂y∂x, ∂²f/∂y²]]"""
    d2fx2 = ### complete with ∂²f/∂x²
    d2fxy = ### complete with ∂²f/∂x∂y
    d2fyx = ### complete with ∂²f/∂y∂x
    d2fy2 = ### complete with ∂²f/∂y²
    
    return np.array([[d2fx2, d2fxy],
                     [d2fyx, d2fy2]])

🔁 Step 2: Perform one Newton–Raphson step#

Start from an initial point \(\mathbf{x}_0 = (0, 0) \) and compute one Newton update:

\[ \mathbf{x}_1 = \mathbf{x}_0 - \mathbf{H}^{-1}(\mathbf{x}_0) \nabla f(\mathbf{x}_0) \]
xy0 = np.array([0.0, 0.0])

# Compute gradient and Hessian
g = ## call function to compute grad of f(x,y) at point given by xy0 
H = ## call function to compute Hessian of f(x,y) at point given by xy0

# Newton update
x1 = xy0 - np.linalg.inv(H) @ g

print("x0 =", xy0)
print("Gradient =", g)
print("x1 =", x1)

✅ Question:
Does your single update step land exactly on the stationary point \( (1, 2) \)?
Why does it only take one step for this function?

🌊 Step 3: Newton–Raphson on a Non-Quadratic 2D Function#

Now let’s consider a non-quadratic function, where the curvature changes with position:

\[ f(x, y) = \sin(x) + \cos(y) \]

This function has stationary points where both partial derivatives vanish:

\[ \frac{\partial f}{\partial x} = \cos(x) = 0, \quad \frac{\partial f}{\partial y} = -\sin(y) = 0. \]

That means stationary points occur when \( x = \pi/2 + n\pi \) and \( y = n\pi \).

# Define the function, gradient, and Hessian for the sinusoidal surface
def f2(x, y):
    fxy = ### complete with f(x,y) = sin(x) + cos(y)
    return fxy

def grad_f2(x, y):
    dfdx = ### complete with dfdx
    dfdy = ### complete with dfdy
    return np.array([dfdx, dfdy])

def hessian_f2(x, y):
    d2fx2 = ### complete with ∂²f/∂x²
    d2fxy = ### complete with ∂²f/∂x∂y
    d2fyx = ### complete with ∂²f/∂y∂x
    d2fy2 = ### complete with ∂²f/∂y²
    
    return np.array([[d2fx2, d2fxy],
                     [d2fyx, d2fy2]])

🔄 Step 4: Iterate Newton–Raphson updates#

The next block will put these updated functions for \(f(x,y) = {\rm sin}(x) + {\rm cos}(y)\) starting from \( (x_0, y_0) = (2, 1) \) and apply a few Newton steps to find a nearby stationary point.

import numpy as np
x = np.array([2.0, 1.0])
tol = 1e-6
max_iter = 10

for i in range(max_iter):
    g = grad_f2(*x)
    H = hessian_f2(*x)
    step = np.linalg.solve(H, g)  # equivalent to inv(H) @ g but more stable
    x_new = x - step
    print(f"Iter {i:2d}: x = {x}, f(x) = {f2(*x):.4f}")
    if np.linalg.norm(x_new - x) < tol:
        print(f"Converged to stationary point: {x_new}")
        break
    x = x_new

✅ Questions:

  1. Which stationary point does the iteration converge to?

  2. What happens if you start from a different initial guess — e.g. \( (x_0, y_0) = (0, 0) \)?

  3. Does the algorithm always converge to a minimum, or can it find a maximum or saddle point?


💡 Discussion

In 2D, the Newton–Raphson method uses the Hessian matrix to locally approximate the curvature in all directions.
If the Hessian is positive definite, you’re near a minimum;
if it’s negative definite, you’re near a maximum;
and if it has both positive and negative eigenvalues, you’ve found a saddle point.


🎯 Challenge Extension

Implement a general function newton_2d(f, grad_f, hessian_f, x0, tol, max_iter)
that performs these updates automatically and returns:

  • the final point,

  • the number of iterations, and

  • the trajectory of points for visualization.

(You can, in principle, then plot the path of iterations on a contour plot of \( f(x, y) \)!)

⚙️ Step 5: Finite-Difference Gradients and Hessians#

So far, we’ve relied on analytical derivatives — but in many real problems,
we can’t easily compute \( \nabla f\) or \( \mathbf{H} \) analytically.

A common alternative is to use finite differences:

\[ \frac{\partial f}{\partial x_i} \approx \frac{f(x_i + h) - f(x_i - h)}{2h} \]

and

\[ \frac{\partial^2 f}{\partial x_i \partial x_j} \approx \frac{f(x_i + h, x_j + h) - f(x_i + h, x_j - h) - f(x_i - h, x_j + h) + f(x_i - h, x_j - h)}{4h^2}. \]

where \(h\) is some small step along the direction \(x_i\) or \(x_j\), as appropriate. This finite difference approximation typically becomes closer to the exact derivative as \(h\) approaches zero, so numerically we typically set \(h\) to some very small value like \(h = 0.001\) Angstroms for molecular geometries.

We’ll test this idea using the same function:

\[ f(x, y) = \sin(x) + \cos(y) \]

and compare our numerical gradients and Hessians to the analytical ones. Again, it will be adequate to set \(h\) to some very small value like \(h = 0.001\) for this comparison.

import numpy as np

def numerical_grad(f, x, h=1e-3):
    """
    Compute the numerical gradient (∇f) of a 2D function using central finite differences.

    Parameters
    ----------
    f : function
        A Python function that takes two arguments, e.g. f(x, y) = sin(x) + cos(y).
    x : array-like of shape (2,)
        The point [x, y] at which to evaluate the gradient.
    h : float, optional
        The finite difference step size (default is 1e-3).

    Returns
    -------
    grad : ndarray of shape (2,)
        The numerical gradient vector [∂f/∂x, ∂f/∂y].
    """

    # Initialize gradient vector with zeros (same dimension as x)
    grad = np.zeros_like(x)

    # Loop over each coordinate direction (x and y)
    for i in range(len(x)):
        # Create a step vector for perturbing one coordinate at a time
        step = np.zeros_like(x)
        step[i] = h

        # TODO: Use the central finite-difference formula
        ## fp = ### complete code here
        ## fm = ### complete code here
        # grad[i] = ### complete code here

    return grad


def numerical_hessian(f, x, h=1e-4):
    """
    Compute the numerical Hessian (matrix of second derivatives) of a 2D function 
    using central finite differences.

    Parameters
    ----------
    f : function
        A Python function that takes two arguments, e.g. f(x, y) = sin(x) + cos(y).
    x : array-like of shape (2,)
        The point [x, y] at which to evaluate the Hessian.
    h : float, optional
        The finite difference step size (default is 1e-4).

    Returns
    -------
    H : ndarray of shape (2, 2)
        The numerical Hessian matrix:
            [[∂²f/∂x², ∂²f/∂x∂y],
             [∂²f/∂y∂x, ∂²f/∂y²]]
    """

    # Initialize empty Hessian matrix
    n = len(x)
    H = np.zeros((n, n))

    # Double loop over indices (i, j)
    for i in range(n):
        for j in range(n):
            # Create small step vectors along the i-th and j-th directions
            ei = np.zeros_like(x)
            ej = np.zeros_like(x)
            ei[i] = h
            ej[j] = h

            # TODO: Use the 4-point central difference formula for second derivatives:
            # fpp = ### complete code here
            # fpm = ### complete code here
            # fmp = ### complete code here
            # fmm = ### complete code here
            #
            # H[i, j] = ### complete code here

    return H

✅ Your task:

  • Complete the formulas in each loop according to the provided comments.

  • Test your functions using f2(x, y) = np.sin(x) + np.cos(y) at a few points (e.g. [2, 1]).

  • Compare your numerical derivatives with the analytical ones to check accuracy.

💭 Hint:
Remember that the function f takes two separate arguments (x and y),
so when calling it inside your loop you’ll want to unpack the array x using *x.

⚛️ Step 7: Numerical Gradients from Quantum Chemistry Calculations#

So far, we’ve used simple mathematical functions like \( f(x, y) = \sin(x) + \cos(y) \) to test our numerical gradient and Hessian code.

Let’s now connect this idea to a real chemistry application: computing the potential energy surface of a molecule using ab initio methods.

In this example, we’ll use the PySCF package to compute the total electronic energy of CO as a function of the C–O bond length.

We’ll then see how we could use the same finite-difference formulas to estimate the numerical gradient (and eventually, the curvature) of that potential energy surface.

Installing pyscf#

You can install pyscf within this Jupypter session by running

!pip install pyscf

in the next cell. You may also install pyscf into your local conda environment, and once you do that, you can skip this step in the future.

!pip install pyscf

Running PySCF energy calculation#

The next cell will set up a template for the geometry of a simple diatomic molecule, CO, where the \(x\) value of the O atom is a variable. Note that since this is a diatomic, the internal geometry of the molecule depends only on the scalar distance between the C and O atoms (the bond length), so the energy can be seen as a 1 dimensional function of the bond length. Here, we will keep the \(x\), \(y\), and \(z\) coordinates of the C atom fixed at the origin, and the \(y\) and \(z\) coordinates of the O atom will be similarly fixed at the origin. Thus, the \(x\) coordinate of the O atom alone determines the bond length.

import numpy as np
from pyscf import gto, scf

# Conversion factor: Angstrom → Bohr (atomic units)
ang_to_au = 1.88973

# Template for the CO molecule
mol_template = """
C 0 0 0
O {} 0 0
"""

def compute_pyscf_energy(R_CO_angstrom):
    """
    Compute the RHF total energy (in Hartrees) for CO at a given C–O distance.

    Parameters
    ----------
    R_CO_angstrom : float
        The C–O bond length in Angstroms.

    Returns
    -------
    energy : float
        The Hartree–Fock total energy (in Hartree atomic units).
    """
    mol = gto.M(
        atom=mol_template.format(R_CO_angstrom),
        basis='cc-pvdz',
        verbose=0,
        spin=0
    )
    mf = scf.RHF(mol)
    mf.kernel()
    return mf.e_tot

⚙️ Step 1: Potential Energy Scan#

dx = 0.01  # step size in Angstroms
x = np.arange(0.8, 1.6, dx)

hf_energies = [compute_pyscf_energy(R) for R in x]

print("First few RHF energies (Hartree):")
print(hf_energies[:5])


# plot of energies vs bond length
plt.plot(x, hf_energies, label="RHF Energy")
plt.xlabel("Bond length (Angstroms)")
plt.ylabel("Energy (Hartree)")
plt.legend()
plt.show()

✅ Question:
What trend do you see in the energy values as the bond length changes?
Where roughly is the equilibrium bond length?

🧮 Step 2: Numerical Gradient from Energy Differences#

Now let’s numerically differentiate this energy function to estimate the force on the atoms, similar to what we did before but now we will call the compute_pyscf_energy() function!.

def numerical_gradient_energy(f, R, h=0.01):
    """
    Compute numerical derivative of energy with respect to bond length (force),
    using a central finite difference formula.

    Parameters
    ----------
    f : callable
        A function that returns the molecular energy at a given bond length.
    R : float
        Current bond length in Angstroms.
    h : float, optional
        Step size in Angstroms (default 0.01 Å).

    Returns
    -------
    dE_dR : float
        Numerical derivative of energy with respect to R (Hartree per Bohr)
        
        
    Note: PySCF and our compute_pyscf_energy function will take the geometry information in Angstroms, 
    and will return the energy in atomic units.  It is good practice to keep the gradient and Hessian in a common unit
    system, so we will want to convert the step length h from Angstroms to Bohr (where Bohr is the atomic unit of length)
    for the gradient and Hessian calculations.
    The conversion factor ang_to_au = 1.88973


    """

    # TODO: Use the central finite-difference formula
    E_plus = ### complete code here
    E_minus = ### complete code here

    # get h in atomic units for the denominator of the finite difference calculation
    ang_to_au = 1.88973
    h_au = h * ang_to_au
    dE_dR = ### complete code here
    return dE_dR

# Try evaluating at R = 1.2 Å
R0 = 1.2
grad_R = numerical_gradient_energy(compute_pyscf_energy, R0)
print(f"Numerical dE/dR at R = {R0:.2f} Å = {grad_R:.6f} Hartree/Bohr")

✅ Questions:

  1. At what bond length does your numerical gradient change sign?
    (That’s the equilibrium bond length!)

  2. How many energy evaluations does it take to compute this gradient?

  3. What would happen if you needed to compute gradients for all atomic coordinates in a polyatomic molecule?

🧮 Step 8: 1D Newton–Raphson Geometry Optimization of CO#

Now that we can compute the molecular energy \( E(R) \) and its numerical derivative \( \frac{dE}{dR} \), we can perform a simple geometry optimization.

For a diatomic molecule, this means finding the bond length \( R \) that minimizes the total energy.

We’ll use the 1D Newton–Raphson method, applied to the derivative of the potential energy: \( R_{n+1} = R_n - \frac{(dE/dR)}{(d^2E/dR^2)} \)

Since we don’t have an analytic expression for \( E(R) \), we’ll also approximate the second derivative (curvature) numerically using finite differences.

You will need to complete the second derivative function, but the Newton-Raphson optimization function that calls these functions is already complete for you below.

def numerical_second_derivative(f, R, h=0.01):
    """
    Compute the numerical second derivative of energy with respect to bond length
    using the central finite-difference formula.

    Parameters
    ----------
    f : callable
        Function returning molecular energy at a given bond length.
    R : float
        Current bond length (Angstroms).
    h : float, optional
        Step size for finite difference (default 0.01 Å).

    Returns
    -------
    d2E_dR2 : float
        Numerical second derivative (Hartree per Bohr²).

    Note Again: PySCF and our compute_pyscf_energy function will take the geometry information in Angstroms, 
    and will return the energy in atomic units.  It is good practice to keep the gradient and Hessian in a common unit
    system, so we will want to convert the step length h from Angstroms to Bohr (where Bohr is the atomic unit of length)
    for the gradient and Hessian calculations.
    The conversion factor ang_to_au = 1.88973
    """
    E_plus = ## complete code here
    E_minus = ## complete code here
    E0 = ## complete code here
    
    # get h in atomic units for the denominator of the finite difference calculation
    ang_to_au = 1.88973
    h_au = h * ang_to_au
    
    d2E_dR2 = ## complete code here
    return d2E_dR2
def newton_optimize_bond(f, R0, tol=1e-5, max_iter=10, h=0.01):
    """
    Perform a 1D Newton–Raphson optimization of a bond length.

    Parameters
    ----------
    f : callable
        Function returning molecular energy at a given bond length.
    R0 : float
        Initial bond length (Angstroms).
    tol : float
        Convergence tolerance on bond length change (Å).
    max_iter : int
        Maximum number of iterations.
    h : float
        Step size for numerical derivatives.

    Returns
    -------
    R_opt : float
        Optimized bond length (Å).
    E_opt : float
        Energy at optimized geometry (Hartree).

    Note: PySCF and our compute_pyscf_energy function will take the geometry information in Angstroms, 
    and will return the energy in atomic units.  It is good practice to keep the gradient and Hessian in a common unit
    system, so we will want to convert the step length h from Angstroms to Bohr (where Bohr is the atomic unit of length)
    for the gradient and Hessian calculations.  This means tht when we compute step = -grad / hess, this step will be in Bohr.
    We need to convert this step back to Angstroms before adding it to R, this time using the inverse of the conversion from Angstromg -> Bohr
    ang_to_au = 1.88973 -> au_to_ang = 1/1.88973
    """
    # conversion from Bohr to Angstroms
    au_to_ang = 1/1.88973
    R = R0
    print(f"Starting optimization from R = {R0:.3f} Å")

    for i in range(max_iter):
        grad = numerical_gradient_energy(f, R, h)
        hess = numerical_second_derivative(f, R, h)

        step = -grad / hess
        step_Angs = step * au_to_ang
        R_new = R + step

        print(f"Iter {i:2d}: R = {R:.4f} Å, E = {f(R):.6f} Ha, dE/dR = {grad:.3e}, step = {step_Angs:.3e}")

        if abs(step) < tol:
            print(f"\n✅ Converged to R = {R_new:.4f} Å, E = {f(R_new):.6f} Ha")
            return R_new, f(R_new)

        R = R_new

    print("\n⚠️ Did not converge within max_iter")
    return R, f(R)
R_start = 1.4  # initial guess in Å
R_opt, E_opt = newton_optimize_bond(compute_pyscf_energy, R_start)
✅ **Questions to Explore**
1. How many iterations did it take to converge?
2. Try changing the initial guess (`R_start = 0.8` or `1.8`).  
   Does Newton–Raphson still converge to the same bond length?
3. Why does the second derivative (the curvature) need to be positive at the minimum?

🌱 Toward Quasi-Newton (BFGS)#

Quasi-Newton methods like BFGS avoid explicit construction of the Hessian matrix, and instead approximate the Hessian information using gradient information. One common quasi-Newton method is called the BFGS algorithm, which you can read about here and here.

Next Steps Try your hand at adding a function BFGS_Update that will take in position and gradient information and return an approximate Hessian matrix following the BFGS algorithm. Some questions to consider include:

  1. How many positions vectors are needed (do you just need the current position vector, the current and one past position vectors, more?)

  2. How many gradient vectors are needed (do you just need the current gradient vector, the current and one past gradient vectors, more?)

  3. Are you directly building an approximation to the Hessian, or to the inverse of the Hessian? Why?