Principal component analysis on an unbalanced panel with mixed frequencies

BIS Quarterly Review | March 2013
18 March 2013

(Extract from page 40 of BIS Quarterly Review, March 2013)

PCA requires that the input data series have certain properties. Variables must be stationary (ie without deterministic or stochastic trends), they should be of a comparable range of variation (ie have similar means and volatilities), and they should be defined over a common range of dates. Not all the original series we use (see Web Appendix for a list) fulfil these criteria. Most series are quarterly, but a few are observed only annually. Most series start in 1980 but some begin later. Finally, there is considerable variation across variables in terms of their units and amplitude. We deal with these problems through a series of adjustments that are fairly standard in the literature.

As a starting point, all series are checked for stationarity with a battery of unit root tests: the Phillips-Perron test and the Augmented Dickey-Fuller test in its autoregressive and trend-stationary variants. The lag length for the tests is chosen using the procedure suggested by Ng and Perron (1995) and the rule of thumb suggested by Schwert (1989). Variables that exhibit unit roots are differenced before entering the final set. All variables are then normalised by dividing by their standard deviation.
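The screening and normalisation steps can be sketched as follows. This is only an illustration under simplifying assumptions, not the procedure used for the article: it runs a plain Dickey-Fuller regression with a constant and no lag augmentation (so it ignores the Phillips-Perron test and the Ng-Perron and Schwert lag rules), and it compares the statistic with the asymptotic 5% critical value of about -2.86. The function names `df_tstat` and `prepare_panel` and the simulated data are hypothetical.

```python
import numpy as np

# Asymptotic 5% critical value for the Dickey-Fuller test with a constant.
DF_CRIT_5PCT = -2.86

def df_tstat(y):
    """t-statistic on rho in dy_t = alpha + rho * y_{t-1} + e_t (no lag augmentation)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def prepare_panel(panel):
    """Difference columns for which a unit root is not rejected, then scale to unit std."""
    columns, differenced = [], []
    for j in range(panel.shape[1]):
        y = panel[:, j]
        has_unit_root = df_tstat(y) > DF_CRIT_5PCT
        # Drop the first observation of every series so all columns keep the same length.
        columns.append(np.diff(y) if has_unit_root else y[1:])
        differenced.append(has_unit_root)
    out = np.column_stack(columns)
    return out / out.std(axis=0, ddof=1), differenced

# Simulated example: one stationary series and one random walk.
rng = np.random.default_rng(0)
T = 400
white_noise = rng.standard_normal(T)
random_walk = np.cumsum(rng.standard_normal(T))
prepared, differenced = prepare_panel(np.column_stack([white_noise, random_walk]))
print(differenced)
```

In applied work one would more likely rely on a tested implementation of the augmented tests with data-driven lag selection than on this hand-rolled regression.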

To fill in missing observations, which arise either because a series is annual or because a quarterly series has to be extended beyond its observed range, we apply the EM algorithm proposed by Stock and Watson (2002). The algorithm is embedded in the estimation of the principal components (PCs) and comprises two steps. The first step is a linear projection (regression) of each variable with missing observations on PCs estimated from the balanced panel of quarterly series observed over the entire sample period. In the second step, this projection is used to fill in the missing observations, and a new set of PCs is estimated from the complete and projected series. The procedure is repeated until it converges, ie until successive estimates of the PCs are sufficiently close. In our case, this occurred after four to five iterations. As prescribed in Stock and Watson (2002), the details of the algorithm differ slightly depending on whether the interpolated series is a stock or a flow variable, and whether it is in levels or first differences.
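A minimal sketch of this iteration is given below. It is a simplified variant written for illustration, not the exact Stock and Watson (2002) algorithm: it treats every missing value as simply unobserved, so it ignores the stock/flow and levels/differences adjustments mentioned above, and it checks convergence on the imputed values rather than on the PCs themselves, which sidesteps the sign indeterminacy of principal components. The names `principal_components` and `em_fill`, the tolerance and the simulated panel are all assumptions.

```python
import numpy as np

def principal_components(X, k):
    """First k principal components (T x k) of a complete T x N panel, via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def em_fill(X, k=2, tol=1e-8, max_iter=200):
    """Fill NaNs by alternating between projection on PCs and re-estimation of PCs.

    Requires at least k fully observed columns to start from.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete_cols = ~missing.any(axis=0)
    incomplete = np.where(~complete_cols)[0]
    filled = X.copy()
    # Initial PCs come from the balanced sub-panel of fully observed series.
    F = principal_components(X[:, complete_cols], k)
    for iteration in range(1, max_iter + 1):
        previous = filled.copy()
        Z = np.column_stack([np.ones(len(F)), F])
        for j in incomplete:
            obs = ~missing[:, j]
            # Step 1: regress the observed part of series j on the current PCs.
            beta, *_ = np.linalg.lstsq(Z[obs], X[obs, j], rcond=None)
            # Step 2: replace the missing entries with the fitted values.
            filled[~obs, j] = Z[~obs] @ beta
        # Re-estimate the PCs on the complete and projected series.
        F = principal_components(filled, k)
        if iteration > 1 and np.max(np.abs(filled - previous)) < tol:
            break
    return filled, F, iteration

# Simulated two-factor panel with one late-starting and one annual series.
rng = np.random.default_rng(1)
T, N, k = 80, 8, 2
factors = rng.standard_normal((T, k))
loadings = rng.standard_normal((N, k))
panel = factors @ loadings.T + 0.1 * rng.standard_normal((T, N))
panel[:20, 6] = np.nan            # series that starts five years late
annual = np.arange(T) % 4 != 3
panel[annual, 7] = np.nan         # annual series: only every fourth quarter observed
filled, pcs, n_iter = em_fill(panel, k=k)
print(n_iter, np.isnan(filled).any())
```

Observed values are never overwritten; only the entries that were missing in the input are updated from one iteration to the next.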

The final balanced panel of quarterly variables, together with its one-quarter lag, was used to compute the final set of factors that entered the forecasting exercise for the real variables. Stock and Watson (2002) argue that the inclusion of a one-period lag can go some way towards capturing the time dynamics of the financial variables in the estimated factors.
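The final step, stacking the balanced panel with its own one-quarter lag and extracting factors, might look like the sketch below. The number of factors, the re-standardisation before the decomposition and the names `add_one_period_lag` and `extract_factors` are illustrative assumptions; the simulated panel stands in for the actual data.

```python
import numpy as np

def add_one_period_lag(X):
    """Place each series next to its own one-period lag; the first row is lost."""
    return np.column_stack([X[1:], X[:-1]])

def extract_factors(X, k):
    """First k principal components of the standardised panel and their variance shares."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    shares = s**2 / np.sum(s**2)
    return U[:, :k] * s[:k], shares[:k]

# Simulated balanced quarterly panel: 120 quarters, 6 series, 2 common factors.
rng = np.random.default_rng(2)
T, N, k = 120, 6, 2
common = rng.standard_normal((T, 2))
panel = common @ rng.standard_normal((2, N)) + 0.5 * rng.standard_normal((T, N))

augmented = add_one_period_lag(panel)      # shape (T - 1, 2 * N)
factors, shares = extract_factors(augmented, k)
print(augmented.shape, factors.shape)
```

The lagged block doubles the number of columns, so each estimated factor is a linear combination of current and one-quarter-lagged values of the series.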