Acta Univ. Agric. Silvic. Mendelianae Brun. 2011, 59, 267-274

https://doi.org/10.11118/actaun201159020267
Published online 2014-07-07

Advanced empirical estimate of information value for credit scoring models

Martin Řezáč

Ústav matematiky a statistiky, Přírodovědecká fakulta, Masarykova univerzita, Kotlářská 2, 611 37 Brno, Česká republika

Credit scoring, it is a term for a wide spectrum of predictive models and their underlying techniques that aid financial institutions in granting credits. These methods decide who will get credit, how much credit they should get, and what further strategies will enhance the profitability of the borrowers to the lenders. Many statistical tools are avaiable for measuring quality, within the meaning of the predictive power, of credit scoring models. Because it is impossible to use a scoring model effectively without knowing how good it is, quality indexes like Gini, Kolmogorov-Smirnov statisic and Information value are used to assess quality of given credit scoring model.
The paper deals primarily with the Information value, sometimes called divergency. Commonly it is computed by discretisation of data into bins using deciles. One constraint is required to be met in this case. Number of cases have to be nonzero for all bins. If this constraint is not fulfilled there are some practical procedures for preserving finite results. As an alternative method to the empirical estimates one can use the kernel smoothing theory, which allows to estimate unknown densities and consequently, using some numerical method for integration, to estimate value of the Information value.
The main contribution of this paper is a proposal and description of the empirical estimate with supervised interval selection. This advanced estimate is based on requirement to have at least k, where k is a positive integer, observations of socres of both good and bad client in each considered interval. A simulation study shows that this estimate outperform both the empirical estimate using deciles and the kernel estimate. Furthermore it shows high dependency on choice of the parameter k. If we choose too small value, we get overestimated value of the Information value, and vice versa. Adjusted square root of number of bad clients seems to be a reasonable compromise.

References

10 live references