gingado: a machine learning library focused on economics and finance

BIS Working Papers | No 1122 | 08 September 2023

Summary

Focus

Machine learning (ML) has tremendous potential benefits for economic research and practice, but only trial and error reveals the best choice among the wide variety of algorithms and model parameters. For time series studies, testing at scale which economic series could be added to improve models is currently not a smooth process. Economists conducting cross-sectional causal studies could benefit from the ability to simulate large causal data sets to test which ML models identify complex, non-linear causality. Another challenge is that, in contrast to other areas where ML is applied, economists are not typically encouraged to document their models or consider their broader implications – an important point when ML is used by a range of users in the private and public sectors.

Contribution

I describe gingado, an easy-to-use open source ML library with five main contributions. First, it helps users automatically acquire hundreds of time series from a number of official statistical sources. Second, it offers automatic benchmark models based on random forests that perform well off the shelf in a variety of cases but can be easily customised by users. Third, gingado lets users flexibly simulate large and multidimensional panel data with linear and non-linear causal relationships in the data-generating processes. Fourth, it facilitates model documentation based on ML best practices, including ethical considerations about model usage when applicable. Fifth, it features auxiliary utilities for time series data analysis.
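To make the third contribution concrete, the following is a minimal sketch of what simulating panel data with a non-linear treatment effect can look like. It uses only the Python standard library and is not gingado's API: the function name `simulate_panel`, its parameters and the specific data-generating process (a unit fixed effect, a linear time trend, one covariate and a treatment effect that varies non-linearly with that covariate) are all assumptions chosen for illustration.

```python
import math
import random


def simulate_panel(n_units=50, n_periods=20, treat_share=0.5, seed=0):
    """Simulate a long-format panel with a non-linear treatment effect.

    Hypothetical illustration only; this is not gingado's simulation API.
    """
    rng = random.Random(seed)
    start = n_periods // 2  # treatment switches on halfway through the sample
    rows = []
    for unit in range(n_units):
        treated_unit = rng.random() < treat_share  # unit-level assignment
        unit_effect = rng.gauss(0.0, 1.0)          # unit fixed effect
        for period in range(n_periods):
            x = rng.gauss(0.0, 1.0)                # observed covariate
            treated = int(treated_unit and period >= start)
            # Non-linear, heterogeneous effect: between 1 and 2, depending on x.
            effect = treated * (1.0 + math.sin(x) ** 2)
            noise = rng.gauss(0.0, 0.1)
            y = unit_effect + 0.1 * period + 0.5 * x + effect + noise
            rows.append(
                {"unit": unit, "period": period, "x": x,
                 "treated": treated, "effect": effect, "y": y}
            )
    return rows


panel = simulate_panel(n_units=10, n_periods=6, seed=1)
```

Because the true `effect` is stored alongside each observation, a causal model fitted to `y`, `x` and `treated` can be checked against the known data-generating process, which is the kind of prototyping and benchmarking the simulation feature is meant to support.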

Findings

As more economists explore the practical applications of ML, they will benefit from a specialised toolset that builds on existing general-use libraries. gingado aims to fill that space by providing a set of tools that can be helpful for a variety of economic use cases, either independently or in combination. The library is compatible with widely used ML software, and users can either use it off the shelf or customise its tools. gingado is in active development, so new features are to be expected.


Abstract

gingado is an open source Python library that offers a variety of convenience functions and objects to support the use of machine learning in economics research. It is designed to be compatible with widely used machine learning libraries. gingado facilitates augmenting user datasets with relevant data directly obtained from official sources by leveraging the SDMX data and metadata sharing protocol. The library also offers a benchmarking object that creates a random forest with reasonably good out-of-the-box performance and, if provided with candidate models, retains the one with the best performance. gingado also includes methods to help with machine learning model documentation, including ethical considerations. Further, gingado provides flexible simulation of panel datasets with a variety of non-linear causal treatment effects, to support causal model prototyping and benchmarking. The library is under active development and new functionalities are periodically added or improved.
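The benchmarking idea in the abstract (start from a random forest, then keep whichever candidate model performs best) can be sketched with scikit-learn, one of the widely used libraries of the kind the abstract refers to. This is an illustration under stated assumptions, not gingado's benchmarking object: the helper `pick_best`, the synthetic data, the candidate list and the use of cross-validated R² as the comparison metric are all choices made for this example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score


def pick_best(X, y, candidates, cv=3):
    """Return the name, fitted model and scores of the best candidate.

    Hypothetical helper for illustration; candidates are compared on the
    mean cross-validated score (R^2 by default for regressors).
    """
    scores = {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    best_model = candidates[best_name].fit(X, y)  # refit the winner on all data
    return best_name, best_model, scores


# Synthetic regression data standing in for a user's dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

candidates = {
    # The random forest plays the role of the off-the-shelf benchmark.
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
}

best_name, best_model, scores = pick_best(X, y, candidates)
```

The random forest is always among the candidates, so the retained model is never worse than the benchmark on the chosen metric.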

JEL classification:  C87, C14, C82

Keywords: machine learning, open source, data access, documentation