Synthetic controls with machine learning: application on the effect of labour deregulation on worker productivity in Brazil

BIS Working Papers | No 1181
26 April 2024



Evaluation of economic policy requires careful econometric analysis. One technique is the estimation of a "synthetic control", ie a counterfactual that enables a "what if" analysis. Synthetic controls are widely used but require subjective choices, such as the criteria for selecting a comparison group and for evaluating the quality of the control. This technique can facilitate the evaluation of the effects of structural reforms on worker productivity, as in the case of the landmark 2017 labour deregulation in Brazil.


Well-selected machine learning algorithms can make synthetic controls more data-driven and flexible. Clustering algorithms select the comparison group based purely on data. Supervised models such as random forests take this group and estimate the synthetic control in a flexible way. And manifold learning methods show more objectively whether the control is indeed a good approximation to the original data. To illustrate the technique, an empirical application studies the outcomes for worker productivity of a structural labour deregulatory reform in Brazil. The results can inform similar efforts to boost productivity in other countries.
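The three steps described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the paper's implementation and not the gingado API: the panel is simulated, the treated unit is arbitrarily placed in column 0, no treatment effect is built in, and k-means and t-SNE are stand-ins chosen here for the clustering and manifold learning steps.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical panel: 40 periods x 30 units; column 0 is the treated unit
# and the (hypothetical) reform takes place at period t0.
n_periods, n_units, t0 = 40, 30, 30
common = 2.0 * np.sin(np.arange(n_periods) / 3.0)        # bounded common factor
loadings = np.where(np.arange(n_units) < 15, 1.0, -1.0)  # two groups of units
panel = np.outer(common, loadings) + rng.normal(scale=0.3, size=(n_periods, n_units))
pre, post = panel[:t0], panel[t0:]

# Step 1: donor pool selection. Cluster units on their pre-reform paths and
# keep the untreated units that share a cluster with the treated unit.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pre.T)
donors = [j for j in range(1, n_units) if labels[j] == labels[0]]

# Step 2: flexible weighting. Fit a random forest that maps donor outcomes
# to the treated unit's outcome, using pre-reform periods only.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(pre[:, donors], pre[:, 0])
synthetic_post = model.predict(post[:, donors])  # counterfactual path
effect = post[:, 0] - synthetic_post             # estimated effect per period

# Step 3: quality check. Embed all pre-reform paths plus the synthetic
# control's path in two dimensions and measure distances to the treated unit.
synthetic_pre = model.predict(pre[:, donors])
paths = np.vstack([pre.T, synthetic_pre])        # last row: synthetic control
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(paths)
distance_to_treated = np.linalg.norm(embedding - embedding[0], axis=1)

print("donor pool:", donors)
print("mean estimated effect:", effect.mean())
print("embedding distance, synthetic control:", distance_to_treated[-1])
```

In the embedding, a synthetic control that lies closer to the treated unit than most other units suggests a good approximation. Note that tree-based models do not extrapolate beyond the range of outcomes seen in training, which is why the simulated common factor is kept bounded.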


Combining specific machine learning techniques can make the estimation of policy effects more objective and data-driven. The open-source BIS machine learning library gingado offers a practical end-to-end implementation. As for the policy under evaluation, the analysis shows that the Brazilian labour deregulation of 2017 did not have a noticeable effect on worker productivity, even years after the reform was implemented. This underscores the challenge of durably raising productivity levels, even with structural reforms.


Synthetic control methods are a data-driven way to calculate counterfactuals from control individuals for the estimation of treatment effects in many settings of empirical importance. In canonical implementations, the counterfactual is a linear weighting of control individuals, and the key methodological steps of donor pool selection and covariate comparison between the treated entity and its synthetic control depend on some degree of subjective judgment. Thus, current methods may not perform best in settings with large datasets or when the best synthetic control is obtained by a non-linear combination of donor pool individuals. This paper proposes "machine controls": synthetic controls based on automated donor pool selection through clustering algorithms, supervised learning for flexible non-linear weighting of control entities, and manifold learning to confirm numerically whether the synthetic control indeed resembles the target unit. The machine controls method is demonstrated with the effect of the 2017 labour deregulation on worker productivity in Brazil. Contrary to policymaker expectations at the time the reform was enacted, there is no discernible effect on worker productivity. This result points to the deep challenges in increasing the level of productivity and, with it, economic welfare.
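The contrast between linear and non-linear weighting can be written compactly. The notation below is an assumption introduced for illustration (unit 1 is treated, units 2 to J+1 form the donor pool, T_0 is the last pre-treatment period, and D is the donor pool chosen by clustering); it follows the standard textbook formulation of synthetic controls rather than the paper's own symbols.

```latex
% Canonical synthetic control: a convex (linear) combination of donor outcomes
\hat{Y}^{N}_{1t} = \sum_{j=2}^{J+1} w_j \, Y_{jt},
\qquad w_j \ge 0, \quad \sum_{j=2}^{J+1} w_j = 1 .

% Machine control: a possibly non-linear function of donor outcomes,
% learned from pre-treatment periods t \le T_0
\hat{Y}^{N}_{1t} = \hat{f}\bigl( \{ Y_{jt} \}_{j \in \mathcal{D}} \bigr) .

% In both cases, the estimated treatment effect after the intervention is
\hat{\tau}_t = Y_{1t} - \hat{Y}^{N}_{1t}, \qquad t > T_0 .
```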

JEL classification: B41, C32, C54, E24, J50, J83, O47

Keywords: causal inference, synthetic controls, machine learning, labour reforms, productivity