How do machine learning and non-traditional data affect credit scoring? New evidence from a Chinese fintech firm

BIS Working Papers  |  No 834  |  19 December 2019

Focus

Financial technology (fintech) is taking on an ever more important role in lending decisions. This paper compares the predictive power of credit scoring models based on machine learning techniques, as used by fintech companies, with that of traditional loss and default models typically used by banks.

Contribution

Using proprietary transaction-level data from a leading Chinese fintech company for the period between May and September 2017, we test the ability of different models to predict losses and defaults both in normal times and when the economy is subject to a shock. In particular, we analyse the case of an exogenous change in shadow banking regulation in China that caused lending to decline and credit conditions to deteriorate.

Findings

We find that the model based on machine learning and non-traditional data used by the fintech company is better able to predict losses and defaults than traditional models in the presence of a negative shock to the aggregate credit supply. One possible reason for this is that machine learning can better mine the non-linear relationship between variables in a period of stress. Finally, the comparative advantage of the model that uses the fintech credit scoring technique based on machine learning and big data tends to decline for borrowers with a longer credit history.
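The point about non-linear relationships can be illustrated with a minimal sketch. It uses hypothetical toy data, not the paper's data or models: a single borrower feature for which both very low and very high values are associated with default. A score that is monotone in the raw feature cannot rank such borrowers, whereas a score that captures the U-shape can. The feature values, the labels and the helper `auc` are all assumptions made for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: defaults (1) occur at both extremes of the feature.
feature = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
default = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Score that is monotone in the raw feature (no non-linear terms).
linear_score = feature

# Score that captures the U-shape: squared distance from the midpoint.
nonlinear_score = [(x - 0.5) ** 2 for x in feature]

auc_linear = auc(linear_score, default)
auc_nonlinear = auc(nonlinear_score, default)

print(f"AUC, monotone score:   {auc_linear:.2f}")
print(f"AUC, non-linear score: {auc_nonlinear:.2f}")
```

On this toy data the monotone score does no better than chance, while the non-linear score separates defaulters from non-defaulters perfectly. Real credit data are far noisier, so the gap would typically be much smaller.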


Abstract

This paper compares the predictive power of credit scoring models based on machine learning techniques with that of traditional loss and default models. Using proprietary transaction-level data from a leading fintech company in China for the period between May and September 2017, we test how well different models predict losses and defaults both in normal times and when the economy is subject to a shock. In particular, we analyse the case of an exogenous change in shadow banking regulation in China that caused lending to decline and credit conditions to deteriorate. We find that the model based on machine learning and non-traditional data is better able to predict losses and defaults than traditional models in the presence of a negative shock to the aggregate credit supply. One possible reason for this is that machine learning can better mine the non-linear relationship between variables in a period of stress. Finally, the comparative advantage of the model that uses the fintech credit scoring technique based on machine learning and big data tends to decline for borrowers with a longer credit history.

JEL codes: G17, G18, G23, G32

Keywords: fintech, credit scoring, non-traditional information, machine learning, credit risk