
BFGS in Optimization

1. What

Broyden-Fletcher-Goldfarb-Shanno

  • Quasi-Newton optimization algorithm
  • Gradient-based: uses the first derivative of the objective function to improve the solution
  • The main difference between such algorithms is how they update the parameters (compared below)
  • Gradient descent relies on the first derivative alone
  • Newton-type methods also use the Hessian matrix; BFGS is the quasi-Newton variant
  • BFGS maintains an approximation of the Hessian instead of computing it exactly
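To make the last two bullets concrete, the three update rules can be put side by side (notation added here for illustration: step size $\alpha$, gradient $g_k = \nabla f(\theta_k)$, exact Hessian $H_k$, BFGS approximation $B_k$):
$$
\theta_{k+1} = \theta_k - \alpha\, g_k \qquad \text{(gradient descent: first derivative only)}
$$
$$
\theta_{k+1} = \theta_k - H_k^{-1} g_k \qquad \text{(Newton's method: exact Hessian)}
$$
$$
\theta_{k+1} = \theta_k - \alpha_k\, B_k^{-1} g_k \qquad \text{(BFGS: } B_k \approx H_k \text{, updated every iteration)}
$$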

2. How

To find the minimum, BFGS assumes the gradient is known or can be computed at every iterate. Writing $\Delta \theta_k = \theta_{k+1} - \theta_k$ for the parameter step and $\Delta g_k = g_{k+1} - g_k$ for the change in the gradient, the Hessian approximation is updated as
$$
B_{k+1} = B_k + \frac{\Delta g_k (\Delta g_k)^T}{(\Delta g_k)^T \Delta \theta_k} - \frac{B_k \Delta \theta_k (\Delta \theta_k)^T B_k}{(\Delta \theta_k)^T B_k \Delta \theta_k}
$$
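A quick sanity check on this formula (a standard property, spelled out here for clarity): multiplying through by $\Delta \theta_k$ shows the update satisfies the secant condition,
$$
B_{k+1}\,\Delta\theta_k = B_k \Delta\theta_k + \frac{\Delta g_k\,\big((\Delta g_k)^T \Delta\theta_k\big)}{(\Delta g_k)^T \Delta\theta_k} - \frac{B_k \Delta\theta_k\,\big((\Delta\theta_k)^T B_k \Delta\theta_k\big)}{(\Delta\theta_k)^T B_k \Delta\theta_k} = \Delta g_k,
$$
so $B_{k+1}$ reproduces the observed change in the gradient along the last step, just as the true Hessian would to first order. The denominator $(\Delta g_k)^T \Delta\theta_k$ must stay positive (the curvature condition) for $B_{k+1}$ to remain positive definite, which a suitable line search ensures.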

import numpy as np

def backtracking_line_search(f, grad_f, xk, pk, alpha=1.0, rho=0.5, c=1e-4):
    # shrink the step until the Armijo (sufficient decrease) condition holds
    fk, gk = f(xk), grad_f(xk)
    while f(xk + alpha * pk) > fk + c * alpha * np.dot(gk, pk):
        alpha *= rho
    return alpha

def bfgs(f, grad_f, x0, epsilon=1e-6, max_iter=1000):
    n = len(x0)
    Bk = np.eye(n)                    # initialize the Hessian approximation with the identity
    xk = np.asarray(x0, dtype=float)
    gk = grad_f(xk)
    for k in range(max_iter):
        if np.linalg.norm(gk) < epsilon:   # stop once the gradient is (nearly) zero
            break
        pk = -np.linalg.solve(Bk, gk)      # quasi-Newton direction: solve Bk pk = -gk
        alpha = backtracking_line_search(f, grad_f, xk, pk)
        xk_next = xk + alpha * pk
        sk = xk_next - xk                  # parameter step, Delta theta_k
        gk_next = grad_f(xk_next)
        yk = gk_next - gk                  # gradient change, Delta g_k
        Bk_sk = np.dot(Bk, sk)
        # BFGS update of the Hessian approximation (same formula as above)
        Bk = Bk + np.outer(yk, yk) / np.dot(yk, sk) - np.outer(Bk_sk, Bk_sk) / np.dot(sk, Bk_sk)
        xk, gk = xk_next, gk_next
    return xk
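
A minimal usage sketch (not part of the original code; the Rosenbrock function and the names rosenbrock/rosenbrock_grad are introduced here just as a common test problem):

def rosenbrock(x):
    # classic banana-shaped test function with minimum at (1, 1)
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    return np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0]**2),
        200.0 * (x[1] - x[0]**2),
    ])

x_star = bfgs(rosenbrock, rosenbrock_grad, np.array([-1.2, 1.0]))
print(x_star)  # expected to land near [1, 1]

In practice most implementations maintain an approximation of the inverse Hessian instead, so the search direction is a matrix-vector product rather than a linear solve; the version above keeps the Hessian form to match the update formula in section 2.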

3. Ref

Wikipedia

ChatGPT