Description:
We examine U.S. publicly traded bank holding companies (BHCs) that failed during the 2007-2009 global financial crisis. Using consolidated BHC-level data and 10-K filings, we investigate the determinants of these failures with nonlinear machine learning (ML). The in-sample analysis shows that 90% of the banks that failed during 2007-2009 can be correctly classified. In addition, our sensitivity analysis for interpretable ML ranks net tone among the five most important features. However, the power of tone and text is less evident in predictive (out-of-sample) analysis. While nonlinear ML models such as random forests and support vector regressions benefit from textual data in forming predictions, linear models that rely on actuarial data attain similar or even better performance. Overall, our paper demonstrates that the least complex linear models use conventional financial ratios efficiently in predicting the failure of publicly traded banks, rendering more complex ML algorithms with 10-K textual data redundant.