A characterization of all single-integral, non-kernel divergence estimators.*

IEEE Transactions on Information Theory, 2019

with Ayanendranath Basu [Paper link]

Abstract

Divergence measures have long been used for a variety of purposes in information theory and statistics. In particular, density-based minimum divergence estimation is a popular tool in the statistical literature: given the sampled data and a parametric model, we estimate the model parameter by choosing the member of the model family that is closest to the data distribution in terms of the given divergence. In the absolutely continuous setup, where the distributions in the model family and the unknown data-generating distribution are assumed to have densities, kernel-based non-parametric smoothing is sometimes unavoidable in order to obtain an estimate of the true data density. The use of kernels (or other non-parametric smoothing techniques) makes the estimation process considerably more complex, since one must then impose the necessary conditions not just on the model but also on the kernel and its bandwidth. In higher dimensions, the efficiency of the kernel density estimator (KDE) often becomes too low for the minimum divergence procedure to be practically useful. It is therefore a significant advantage to have a divergence that allows minimum divergence estimation while bypassing non-parametric smoothing, and, for the same reason, characterizing the class of such divergences would be a notable achievement. In this work, we characterize the class of divergences whose construction bypasses the use of non-parametric smoothing, thereby providing a solution to this important problem.
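
As a concrete illustration (not drawn from the paper itself), the density power divergence of Basu et al. (1998) is perhaps the best-known divergence whose minimization requires no kernel smoothing: the empirical objective involves only the model density evaluated at the data points plus a single integral of a power of the model density. The following is a minimal Python sketch for a normal location-scale model; the function names, the choice of tuning parameter alpha = 0.5, and the simulated data are illustrative assumptions, not part of the paper.

```python
import numpy as np
from scipy import integrate, optimize, stats

def dpd_objective(theta, x, alpha=0.5):
    """Empirical density power divergence objective for a normal model.

    theta = (mu, log_sigma); alpha > 0 is the tuning parameter.
    Objective (up to a theta-free constant):
        int f_theta^(1+alpha) dx - (1 + 1/alpha) * mean(f_theta(X_i)^alpha)
    """
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    f = stats.norm(mu, sigma).pdf
    # Model-only term: single integral of a power of the model density.
    integral, _ = integrate.quad(lambda t: f(t) ** (1 + alpha), -np.inf, np.inf)
    # Data term: only the model density at the observations is needed,
    # so no kernel density estimate of the true density enters the objective.
    empirical = (1 + 1 / alpha) * np.mean(f(x) ** alpha)
    return integral - empirical

# Hypothetical usage: contaminated normal sample with a few outliers.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 95), rng.normal(10, 1, 5)])
res = optimize.minimize(dpd_objective, x0=[np.median(x), 0.0], args=(x,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)
```

Because the objective never touches a kernel density estimate, the regularity conditions needed for estimation concern only the model density, which is the practical advantage motivating the characterization in the paper.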