Concept

The first part is the same as the Steepest Descent Method, but after finding the new xk+1, we calculate the next search direction sk+1 instead of taking the steepest descent direction -gk+1, by

sk+1 = -gk+1 + βk+1 sk

and

βk+1 = (gk+1^T gk+1) / (gk^T gk)

x0 = initial guess
g0 = \nabla f(x0)
s0 = -g0
for k = 0, 1, 2,...
	Choose αk to minimize f(xk + αk sk)
	xk+1 = xk + αk sk
	gk+1 = \nabla f(xk+1)
	βk+1 = (gk+1^T gk+1) / (gk^T gk)
	sk+1 = -gk+1 + βk+1 sk
end

The Conjugate Gradient Method differs from the Steepest Descent Method only in βk+1: if we constantly set βk+1 = 0, it becomes exactly the Steepest Descent Method.
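
To make the steps concrete, here is a minimal sketch (my own example, not part of the original note) of the algorithm above on the quadratic f(x) = 1/2 x^T A x - b^T x, where the exact line-search step is αk = -(gk^T sk)/(sk^T A sk); the matrix A and vector b are made-up test data.

import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])              # symmetric positive definite (made-up)
b = np.array([1.0, 2.0])

x = np.zeros(2)                         # x0 = initial guess
g = A @ x - b                           # g0 = \nabla f(x0)
s = -g                                  # s0 = -g0

for k in range(2):                      # a quadratic needs at most n = 2 steps
    alpha = -(g @ s) / (s @ (A @ s))    # exact minimizer of f(xk + α sk)
    x = x + alpha * s                   # xk+1 = xk + αk sk
    g_new = A @ x - b                   # gk+1 = \nabla f(xk+1)
    beta = (g_new @ g_new) / (g @ g)    # βk+1 = (gk+1^T gk+1) / (gk^T gk)
    s = -g_new + beta * s               # sk+1 = -gk+1 + βk+1 sk
    g = g_new

print(x)                                # ≈ [0.0909, 0.6364]
print(np.linalg.solve(A, b))            # the true minimizer A^{-1} b, same values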

Cost

Unlike Newton's method, this one doesn't need to compute the Hessian matrix nor solve a linear system with it; it only needs gradient evaluations and a line search.
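
As a rough illustration of the per-iteration cost difference (a sketch with hypothetical helper names, not code from this note): one Newton step forms the full n × n Hessian and solves a linear system with it, while one conjugate gradient direction update only uses gradients and O(n) vector operations.

import numpy as np

def newton_step(grad_f, hess_f, x):
    # one Newton step: form the n x n Hessian and solve H s = -g
    # (O(n^2) storage, O(n^3) work for a dense solve)
    g = grad_f(x)
    H = hess_f(x)
    return x + np.linalg.solve(H, -g)

def cg_direction_update(g, g_new, d):
    # one conjugate gradient direction update: two dot products and a
    # vector sum, O(n) work and no Hessian at all
    beta = (g_new @ g_new) / (g @ g)
    return -g_new + beta * d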

Code

import numpy as np
from typing import Callable, Tuple
 
def backtracking_line_search(
    f: Callable[[np.ndarray], float],
    grad_f: Callable[[np.ndarray], np.ndarray],
    x: np.ndarray,
    d: np.ndarray,              # search direction, usually -grad_f(x)
    alpha_init: float = 1.0,
    rho: float = 0.5,           # backtracking factor
    c: float = 1e-4,            # Armijo condition constant
    max_backtracks: int = 50    # safety cap so the loop always terminates
) -> float:
    alpha = alpha_init
    f_x = f(x)
    grad_dot_d = np.dot(grad_f(x), d)  # \nabla f(x)^T d

    # Armijo condition: f(x + α d) <= f(x) + c α \nabla f(x)^T d
    for _ in range(max_backtracks):
        if f(x + alpha * d) <= f_x + c * alpha * grad_dot_d:
            break
        alpha *= rho  # backtrack

    return alpha
 
def conjugate_gradient(
    f: Callable[[np.ndarray], float],
    grad_f: Callable[[np.ndarray], np.ndarray],
    x0: np.ndarray,
    max_iter: int = 1000,
    tol: float = 1e-6
) -> Tuple[np.ndarray, float, int]:
    x = x0.copy()
    n = len(x)

    g = grad_f(x)
    d = -g.copy()   # start from the steepest descent direction

    iterations = 0

    for i in range(max_iter):
        if np.linalg.norm(g) < tol:   # stop once the gradient is small enough
            break

        alpha = backtracking_line_search(f, grad_f, x, d)
        x_new = x + alpha * d                       # xk+1 = xk + αk dk
        g_new = grad_f(x_new)                       # gk+1 = \nabla f(xk+1)
        beta = np.dot(g_new, g_new) / np.dot(g, g)  # Fletcher-Reeves βk+1
        d = -g_new + beta * d                       # dk+1 = -gk+1 + βk+1 dk
        x = x_new
        g = g_new

        iterations += 1

        # reset to the steepest descent direction every n iterations
        if (i + 1) % n == 0:
            d = -g.copy()

    return x, f(x), iterations
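
A quick usage sketch (the test function below is my own, not from the original note): minimizing f(x) = (x0 - 3)^2 + 10 (x1 + 2)^2, whose minimizer is (3, -2).

def quad(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def quad_grad(x):
    return np.array([2.0 * (x[0] - 3.0), 20.0 * (x[1] + 2.0)])

x_opt, f_opt, iters = conjugate_gradient(quad, quad_grad, np.zeros(2))
print(x_opt, f_opt, iters)   # x_opt should end up close to [3, -2]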