Concept

Given data $(d_i, b_i)$, $i = 1, \dots, m$, define the residual function:

π‘Ÿπ‘–=π‘π‘–βˆ’π‘“(𝑑𝑖,π‘₯)

objective function:

πœ™(π‘₯)=12π‘Ÿπ‘‡(π‘₯)π‘Ÿ(π‘₯)

gradient vector:

$$\nabla \phi(x) = J^T(x)\, r(x)$$

where $J$ is the Jacobian matrix of $r$.
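The gradient formula can be checked numerically. The sketch below is only an illustration: the exponential model `f`, the data values, and the trial point `x` are all assumptions made up for the example and do not come from the text. It compares $J^T r$ against a central finite-difference gradient of $\phi$.

```python
import numpy as np

# Hypothetical model (an assumption for illustration): f(d, x) = x0 * exp(x1 * d)
def f(d, x):
    return x[0] * np.exp(x[1] * d)

def residual(x, d, b):
    # r_i(x) = b_i - f(d_i, x)
    return b - f(d, x)

def jacobian(x, d):
    # J[i, j] = d r_i / d x_j = -d f(d_i, x) / d x_j
    e = np.exp(x[1] * d)
    return np.column_stack((-e, -x[0] * d * e))

def objective(x, d, b):
    # phi(x) = 1/2 * r(x)^T r(x)
    r = residual(x, d, b)
    return 0.5 * (r @ r)

def gradient(x, d, b):
    # grad phi(x) = J(x)^T r(x)
    return jacobian(x, d).T @ residual(x, d, b)

# Made-up data and trial point
d = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([2.0, 0.7, 0.3, 0.1])
x = np.array([1.5, -0.8])

# Central finite differences of phi, one coordinate direction at a time
h = 1e-6
fd = np.array([
    (objective(x + h * e, d, b) - objective(x - h * e, d, b)) / (2 * h)
    for e in np.eye(len(x))
])

grad = gradient(x, d, b)
print("analytic gradient:   ", grad)
print("finite differences:  ", fd)
print("agree:", np.allclose(grad, fd, atol=1e-6))
```

Note the minus signs in `jacobian`: because $r = b - f$, the Jacobian of $r$ is the negative of the Jacobian of $f$.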
Hessian matrix:

π»πœ™=𝐽𝑇(π‘₯)𝐽(π‘₯)+βˆ‘π‘šπ‘–=1π‘Ÿπ‘–(π‘₯)𝐻𝑖(π‘₯)

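However, the sum is expensive to evaluate, because it needs the Hessian $H_i$ of every residual $r_i$. Each $H_i$ is multiplied by $r_i(x)$, which is small when the model fits the data well, so the whole sum can usually be ignored, leaving the approximation $H_\phi(x) \approx J^T(x)\, J(x)$.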
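Using $J^T J$ in place of the full Hessian in a Newton step gives the Gauss-Newton iteration: solve $J^T J\, s = -J^T r$ and update $x \leftarrow x + s$. The following is a minimal sketch under stated assumptions, not a robust solver: the exponential model, the noise-free synthetic data, the starting guess, and the fixed iteration count are all hypothetical choices, and there is no line search or damping.

```python
import numpy as np

# Hypothetical model (an assumption for illustration): f(d, x) = x0 * exp(x1 * d)
def residual(x, d, b):
    # r_i(x) = b_i - f(d_i, x)
    return b - x[0] * np.exp(x[1] * d)

def jacobian(x, d):
    # Jacobian of r, i.e. the negative of the Jacobian of f
    e = np.exp(x[1] * d)
    return np.column_stack((-e, -x[0] * d * e))

# Synthetic, noise-free data generated from assumed parameters (2, -1),
# so the residual at the solution is zero and the dropped sum vanishes there
d = np.array([0.0, 1.0, 2.0, 3.0])
x_true = np.array([2.0, -1.0])
b = x_true[0] * np.exp(x_true[1] * d)

x = np.array([1.5, -0.8])  # starting guess (assumed)
for k in range(10):
    r = residual(x, d, b)
    J = jacobian(x, d)
    # Gauss-Newton step: Hessian approximated by J^T J, gradient is J^T r
    s = np.linalg.solve(J.T @ J, -J.T @ r)
    x = x + s
    if np.linalg.norm(s) < 1e-12:
        break

print("estimated parameters:", x)
```

Forming $J^T J$ explicitly mirrors the formula above. In practice the same step is often computed as a linear least-squares solve, for example `np.linalg.lstsq(J, -r, rcond=None)`, which avoids squaring the condition number of $J$.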