In statistics and information geometry, a divergence or contrast function is a function that measures the "distance" from one probability distribution to another on a statistical manifold. A divergence is a weaker notion than a distance: it need not be symmetric (that is, in general the divergence from p to q is not equal to the divergence from q to p), and it need not satisfy the triangle inequality.
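Suppose S is a space of all probability distributions with common support. Then a divergence on S is a function D(· || ·): S×S → R satisfying
1. D(p || q) ≥ 0 for all p, q ∈ S,
2. D(p || q) = 0 if and only if p = q,
3. the matrix g(D), defined below from the second derivatives of D, is strictly positive definite everywhere on S.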
The dual divergence D* is defined as D*(p || q) = D(q || p).
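As an illustration of these definitions, the following minimal sketch uses the Kullback–Leibler divergence on discrete distributions, a standard example that is not named in the text above. The helper names `kl` and `dual` are hypothetical and chosen only for this example. The dual divergence swaps the two arguments, D*(p || q) = D(q || p), and the sketch checks non-negativity, vanishing at p = q, and the lack of symmetry on one pair of distributions.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions,
    given as equal-length sequences of strictly positive probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def dual(D):
    """Return the dual divergence D*, defined by D*(p || q) = D(q || p)."""
    return lambda p, q: D(q, p)

# Two distributions on the same two-point support (illustrative values).
p = [0.5, 0.5]
q = [0.9, 0.1]

kl_dual = dual(kl)

d_pq = kl(p, q)        # divergence from p to q
d_qp = kl(q, p)        # divergence from q to p, generally a different number
d_pp = kl(p, p)        # vanishes when both arguments coincide
d_dual_pq = kl_dual(p, q)  # equals d_qp by construction

print(d_pq, d_qp, d_pp, d_dual_pq)
```

For this pair the two directions give clearly different values, which is the asymmetry that distinguishes a divergence from a distance.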
Many properties of divergences can be derived if we restrict S to be a statistical manifold, meaning that it can be parametrized with a finite-dimensional coordinate system θ, so that for a distribution p ∈ S we can write p = p(θ).
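For a pair of points p, q ∈ S with coordinates θp and θq, denote the partial derivatives of D(p || q) as

$$
\begin{aligned}
D((\partial_i)_p \parallel q) &= \frac{\partial}{\partial\theta_p^i} D(p \parallel q), \\
D((\partial_i\partial_j)_p \parallel (\partial_k)_q) &= \frac{\partial}{\partial\theta_p^i}\frac{\partial}{\partial\theta_p^j}\frac{\partial}{\partial\theta_q^k} D(p \parallel q), \quad \text{and so on.}
\end{aligned}
$$

Now we restrict these functions to the diagonal p = q, and denote

$$
\begin{aligned}
D[\partial_i \parallel \cdot] &: p \mapsto D((\partial_i)_p \parallel p), \\
D[\partial_i \parallel \partial_j] &: p \mapsto D((\partial_i)_p \parallel (\partial_j)_p), \quad \text{and so on.}
\end{aligned}
$$

By definition, the function D(p || q) is minimized at p = q, and therefore

$$
\begin{aligned}
& D[\partial_i \parallel \cdot] = D[\cdot \parallel \partial_i] = 0, \\
& D[\partial_i\partial_j \parallel \cdot] = D[\cdot \parallel \partial_i\partial_j] = -D[\partial_i \parallel \partial_j] \equiv g_{ij}^{(D)},
\end{aligned}
$$

where the last equality follows from differentiating the identity D[∂i || ·] = 0 along the diagonal. The matrix g(D) is positive semi-definite, being the Hessian of D at a minimum; when it is strictly positive definite, it defines a unique Riemannian metric on the manifold S.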
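The construction of g(D) can be checked numerically on a one-parameter family. The sketch below is an illustration under stated assumptions, not part of the original text: it takes the Bernoulli family with the success probability θ as the coordinate, uses the Kullback–Leibler divergence as D, and estimates the derivatives by central finite differences. The function names are hypothetical. For this choice the induced metric is known to be the Fisher information, 1/(θ(1 − θ)), which gives a closed form to compare against.

```python
import math

def bernoulli_kl(tp, tq):
    """D(p || q) for Bernoulli distributions with success probabilities tp and tq."""
    return tp * math.log(tp / tq) + (1 - tp) * math.log((1 - tp) / (1 - tq))

def metric_mixed(D, theta, h=1e-4):
    """Estimate g = -(d/d theta_p)(d/d theta_q) D(p || q) at p = q = theta,
    using a central difference in each argument."""
    mixed = (D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4 * h * h)
    return -mixed

def metric_hessian(D, theta, h=1e-4):
    """Estimate g = (d/d theta_p)^2 D(p || q) at p = q = theta,
    using a central second difference in the first argument only."""
    return (D(theta + h, theta) - 2 * D(theta, theta) + D(theta - h, theta)) / (h * h)

theta = 0.3
g_mixed = metric_mixed(bernoulli_kl, theta)      # from -D[d_i || d_j]
g_hess = metric_hessian(bernoulli_kl, theta)     # from D[d_i d_j || .]
fisher = 1.0 / (theta * (1.0 - theta))           # closed-form Fisher information

print(g_mixed, g_hess, fisher)
```

Both estimates should agree with each other and with the closed form up to finite-difference error, reflecting the identity D[∂i∂j || ·] = −D[∂i || ∂j].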
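The divergence D(· || ·) also defines a unique torsion-free affine connection ∇(D) with coefficients

$$
\Gamma_{ij,k}^{(D)} = -D[\partial_i\partial_j \parallel \partial_k] = -\left.\frac{\partial}{\partial\theta_p^i}\frac{\partial}{\partial\theta_p^j}\frac{\partial}{\partial\theta_q^k} D(p \parallel q)\right|_{q=p},
$$

and the dual to this connection, ∇*, is generated by the dual divergence D*.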
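The connection coefficients can be estimated in the same finite-difference style. The sketch below again assumes the Bernoulli family with coordinate θ and the Kullback–Leibler divergence, neither of which comes from the text above, and the function names are hypothetical. Under these assumptions a direct calculation gives Γ = 0 for D in this coordinate, while the dual divergence D*(p || q) = D(q || p) gives Γ* = (2θ − 1)/(θ²(1 − θ)²); the code compares the numerical estimates with those values.

```python
import math

def bernoulli_kl(tp, tq):
    """D(p || q) for Bernoulli distributions with success probabilities tp and tq."""
    return tp * math.log(tp / tq) + (1 - tp) * math.log((1 - tp) / (1 - tq))

def dual(D):
    """Dual divergence: D*(p || q) = D(q || p)."""
    return lambda tp, tq: D(tq, tp)

def connection_coefficient(D, theta, h=1e-3):
    """Estimate Gamma_{11,1} = -(d/d theta_p)^2 (d/d theta_q) D(p || q) at p = q = theta.
    A central second difference in theta_p is nested inside a central
    first difference in theta_q."""
    def second_diff_in_p(tq):
        return (D(theta + h, tq) - 2 * D(theta, tq) + D(theta - h, tq)) / (h * h)
    return -(second_diff_in_p(theta + h) - second_diff_in_p(theta - h)) / (2 * h)

theta = 0.3
gamma = connection_coefficient(bernoulli_kl, theta)             # connection from D
gamma_dual = connection_coefficient(dual(bernoulli_kl), theta)  # connection from D*

# Hand-derived value for the dual connection in this example.
gamma_dual_exact = (2 * theta - 1) / (theta ** 2 * (1 - theta) ** 2)

print(gamma, gamma_dual, gamma_dual_exact)
```

The two coefficients differ, which shows on a small example that D and D* generate different connections even though they induce the same metric.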