Introductory statistics and econometrics teach us that correlation does not imply causation. When fitting models to abstract data, however, it is sometimes easy to lose sight of this fact. The Financial Times' Alphaville blog recently had an excellent post that illustrates this concept:
Just as correlation does not imply causation, this post should not imply usefulness. Consider it a bit of light frippery at the end of a rather challenging week for symmetrical trading.
Here’s how it works. We’ve drawn the graphs of a few securities into Google Correlate, which finds search terms whose popularity matches the given trend over time. It is, in short, an automatic logical fallacy generator.
For example, the S&P 500 since 2003 correlates best to searches for “Asian diner”.
Gold, perhaps unsurprisingly, is an excellent match for “hot girl”.
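To see why a search like this will always turn up a "great" match, here is a minimal Python sketch of the general idea. It is not Google Correlate's actual method, which I don't know in detail. It assumes the ranking is a plain Pearson correlation, and both the target series and the 1,000 candidate "search terms" (the names term_0, term_1, and so on are made up) are independent random walks with no relationship to one another at all.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(n, rng):
    """Cumulative sum of n independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def best_match(target, candidates):
    """Return (name, r) for the candidate most correlated with the target."""
    scored = ((name, pearson(target, series)) for name, series in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical data: a stand-in "security" and 1,000 unrelated "search terms".
rng = random.Random(0)
target = random_walk(100, rng)
candidates = {f"term_{i}": random_walk(100, rng) for i in range(1000)}

name, r = best_match(target, candidates)
print(name, round(r, 3))
```

Because trending series drift, and because we keep only the best of many candidates, the winning correlation tends to be high even though every series here is pure noise. That is the logical fallacy generator at work.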
In the comments section of this post, a reader asks:
Awesome. Is there a way to correlate with one month delayed market data, then use it with searches that spiked or died in the last month in order to make completely meaningless predictions based entirely on this kind of data mining?
All joking aside, Google search volume does appear to have some predictive power. I came across a paper titled Predicting the Present with Google Trends, by Choi and Varian, in which they demonstrate that Google search data can be used to predict the level of activity across a variety of industries. The authors do not assert that Google search volume predicts the future. Instead, they point out that most macroeconomic indicators are published with a time lag: data is released at regular intervals, typically at month-end or quarter-end. Google search volume, on the other hand, is real-time data, which the authors assert can help in predicting the current values of macroeconomic indicators.
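The idea of predicting the present can be sketched in a few lines of Python. This is only an illustration, not the authors' specification: it assumes a simple regression of the indicator on its own previous value plus the current period's search index, y_t = a + b*y_{t-1} + c*x_t, fitted by ordinary least squares. The function names and the model form are my own choices for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, targets):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def nowcast(indicator, searches, current_search):
    """Estimate the not-yet-released indicator value for the current period.

    indicator:      released values y_0 .. y_T
    searches:       search index x_0 .. x_T for the same periods
    current_search: search index for period T+1, already observable
    """
    rows = [[1.0, indicator[t - 1], searches[t]] for t in range(1, len(indicator))]
    a, b, c = ols(rows, indicator[1:])
    return a + b * indicator[-1] + c * current_search
```

In use, you would pass the released history of an indicator, the search index over the same periods, and the search index for the period whose official figure has not been published yet. Whether the search term adds anything beyond the indicator's own history is an empirical question, and given the Alphaville exercise above, one to answer out of sample rather than by admiring the fit.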