Zachary David

Academic finance literature naively applying machine learning (ML) and artificial neural network (ANN) techniques to market price prediction is a dumb farce. While this probably won’t surprise anyone who has done a paper replication in the past 6+ years, despite all of the advancements in algorithms and hardware, and despite all of the new domains ANNs have conquered, financial academics still insist on throwing feces at the wall. In fact, their simian proclivities might be getting worse.

A typical situation goes something like this:

“We noticed [some machine learning technique] has had success in [something unrelated to finance]. So we take [a small/arbitrary set of securities] over [a small/arbitrary window of time] and apply a [random, obnoxiously large, or empirically unjustified feature space] to said technique to predict price movements of said securities. We show that under [our ad hoc and unreasonable assumptions] said technique can sometimes predict price movements. Publish me please.”[1]

Those familiar with the replication crisis and The Garden of Forking Paths should immediately spot the numerous potential “researcher degrees of freedom” that inevitably prove these results not robust. Indeed, all it takes to break most of these papers is adding a few similarly behaved securities or applying the methodology just a few months before or after the paper’s sample period. But these types of failures have been covered in the literature, so for that see Noah’s review of the spurious and the fleeting.
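For anyone who wants the forking-paths problem in runnable form, here is a toy sketch under loudly stated assumptions: the "returns" are pure Gaussian noise, each "specification" is just a random long/short signal standing in for one choice of features, securities, or sample window, and the names `hit_rate` and `forking_paths_demo` are hypothetical. It is not taken from any of the papers in question; it only illustrates why picking the best of many specifications in-sample says little about what happens once the window moves.

```python
import random


def hit_rate(signal, returns):
    """Fraction of periods where the signal's sign matches the return's sign."""
    hits = sum(1 for s, r in zip(signal, returns) if (s > 0) == (r > 0))
    return hits / len(returns)


def forking_paths_demo(n_specs=200, n_in=250, n_out=250, seed=0):
    """Pick the best of many random 'specifications' in-sample, then score it out-of-sample.

    Assumption: returns are pure noise, so there is nothing to predict by construction.
    """
    rng = random.Random(seed)
    n_total = n_in + n_out

    # Pure-noise "returns" covering the in-sample and out-of-sample windows.
    returns = [rng.gauss(0.0, 1.0) for _ in range(n_total)]

    # Each specification is a random +1/-1 signal: a stand-in for one
    # researcher choice (feature set, security universe, sample window, ...).
    specs = [[rng.choice((-1, 1)) for _ in range(n_total)] for _ in range(n_specs)]

    # Score every specification on the in-sample window and keep the winner.
    in_scores = [hit_rate(spec[:n_in], returns[:n_in]) for spec in specs]
    best = max(range(n_specs), key=in_scores.__getitem__)

    # Re-score the winner on data it was not selected on.
    out_score = hit_rate(specs[best][n_in:], returns[n_in:])
    return in_scores[best], out_score


if __name__ == "__main__":
    best_in, best_out = forking_paths_demo()
    print(f"best in-sample hit rate:      {best_in:.3f}")
    print(f"same spec, out-of-sample:     {best_out:.3f}")
```

With a couple hundred specifications to choose from, the in-sample winner will look better than a coin flip purely by selection, while its out-of-sample hit rate is just another coin-flip draw. Real papers have far more forks than this, and they rarely report how many were tried.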

Instead, for this post I’d like to focus on one example where I’ll fire off all the things that make papers like this hopelessly fucked before they even begin.[2]