The Chicago Board Options Exchange Market Volatility Index (the “VIX”) is often referred to in the financial press as the *fear index*. Even casual observers of the VIX will notice that it increases (sometimes substantially) during times of extreme uncertainty and has a strong negative relationship with the overall market: the VIX tends to increase when the overall market decreases.

Over the past few months I have been getting acquainted with R, an open-source programming environment for statistical computing and graphics. It has a robust collection of user-written packages, including many related to algorithmic trading, and is used by a large number of amateur and professional traders.

Some readers may be interested in my personal background. My day job involves data analysis, modeling, and regression analysis using SAS and Stata (also programming environments for statistical computing). I come from an econometrics background (working under the guidance of economists), and this is the approach that I hope to take with my pathetic attempts at algorithmic trading. Progress on learning R has been slower than I expected due to a heavy workload at my day job and because I have started studying for the CFA Level 2 exam.

Algorithmic trading is a term so broad that it may help to define the scope of what I am hoping to accomplish. I am interested in identifying predictive indicators and building models that generate buy and sell decisions from end-of-day data, with holding periods ranging from several days to months or longer.

This definition matters because algorithmic trading can refer to a number of quite different things. Some algorithms focus on best execution (VWAP, for example). They seek the best price when a fund wishes to buy or sell a block of shares large enough that the trading itself will move the market. On the other side are algorithms designed to sniff out when a big player wishes to move a lot of shares; these are in constant battle with the best execution algorithms. Other algorithms focus on exploiting small inefficiencies in market microstructure. These can involve analysis of each individual tick of data (each tick being a single price change, down to the smallest increment by which a security’s price can move) and of the limit order book. Most algorithms also involve a degree of automation, whether by interfacing with a broker’s API or by some other means.

It is worth noting that I intend to trade manually on any buy or sell signals that my models generate. This way I can keep a discretionary component in my trading and focus on trading strategy instead of a broker’s API.

**The Data**

As this is my initial foray into the world of algorithmic trading and R, this post explores some of the most basic functionality of R using end-of-day prices of the VIX and SPY, an ETF that tracks the S&P 500. I use SPY instead of the actual S&P 500 index to preserve the applicability of any signals the model generates: SPY is a tradable security while the S&P 500 index is not. Nonetheless, I use the terms SPY and the S&P 500 interchangeably throughout this post and my code.

First, let’s see a plot of the data using the ggplot2 package:
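As a rough sketch of how a plot like this could be produced (this is not the code behind the original figure), the snippet below stacks two price series into one long data frame and draws one panel per series with ggplot2. The prices are simulated placeholders, and the names `prices`, `series`, and `close` are assumptions made for illustration; real end-of-day data for SPY and the VIX would replace them.

```r
# Simulated placeholder prices; swap in real end-of-day closes for SPY and VIX.
set.seed(42)
n <- 250
dates <- seq(as.Date("2012-01-02"), by = "day", length.out = n)
spy <- 130 * exp(cumsum(rnorm(n, mean = 0.0003, sd = 0.01)))
vix <- 20 * exp(cumsum(rnorm(n, mean = 0, sd = 0.05)))

# Long format: one row per (date, series) pair, which suits ggplot2 faceting.
prices <- data.frame(
  date   = rep(dates, times = 2),
  series = rep(c("SPY", "VIX"), each = n),
  close  = c(spy, vix)
)

# One panel per series, each with its own y-axis scale.
# The plot is only drawn if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(prices, ggplot2::aes(x = date, y = close)) +
    ggplot2::geom_line() +
    ggplot2::facet_wrap(~ series, ncol = 1, scales = "free_y") +
    ggplot2::labs(x = "Date", y = "Closing price")
  print(p)
}
```

Separate panels with free y-axes are used here because the two series sit on very different scales; plotting them on a shared axis would flatten one of them.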

Plotting the data is an essential starting point for any kind of statistical analysis. The genesis of any model is disciplined observation of the world around you, followed by an attempt to quantify and predict the phenomenon or relationship that you observe. In time series regression analysis, plotting is also essential for guiding model selection and for judging whether the numerous assumptions of regression analysis hold. This includes checking whether the data should be transformed in some way (natural log transformation, first differencing, or some kind of non-linear transformation), and whether any of the following problems are present: heteroskedasticity (some sub-populations of the data have a different variance from others), serial correlation (correlation of an observation with previous observations), a lack of covariance stationarity (a covariance stationary series has a constant mean and variance over time, with autocovariances that depend only on the lag), and problems with the integrity of the data (outliers and missing observations).
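A few of these checks can be sketched with base R alone. The example below runs on a simulated series `x` (an assumption standing in for real data) and is a starting point, not a complete diagnostic suite. It counts missing values, runs a Ljung-Box test for serial correlation, and compares the variance of the two halves of the sample as a crude look at heteroskedasticity.

```r
# Simulated placeholder series: a random walk, which is not covariance stationary.
set.seed(1)
x <- cumsum(rnorm(500))

# Data integrity: count missing observations.
n_missing <- sum(is.na(x))

# Serial correlation: Ljung-Box test on the levels and on the first differences.
lb_levels <- Box.test(x, lag = 10, type = "Ljung-Box")
lb_diffs  <- Box.test(diff(x), lag = 10, type = "Ljung-Box")

# Crude heteroskedasticity check: variance of the first half of the
# differenced series relative to the second half.
dx <- diff(x)
half <- length(dx) %/% 2
var_ratio <- var(dx[1:half]) / var(dx[(half + 1):length(dx)])

cat("Missing values:", n_missing, "\n")
cat("Ljung-Box p-value (levels):", lb_levels$p.value, "\n")
cat("Ljung-Box p-value (first differences):", lb_diffs$p.value, "\n")
cat("Variance ratio (first half / second half):", var_ratio, "\n")
```

For a random walk, the test on the levels should reject the hypothesis of no serial correlation decisively, while the first differences are independent noise by construction. That contrast is one reason first differencing appears in the list of candidate transformations.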

In this case, the transformation we use is the log return: the natural log of the ratio of each day’s price to the previous day’s price. For a discussion of why academics and practitioners of quantitative finance use log returns, I refer you to an excellent post by Quantivity.
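For concreteness, here is a minimal sketch of that transformation in base R. The vector `px` holds made-up prices for illustration only. Each log return is the difference of consecutive log prices, which equals the log of one plus the simple return.

```r
# Hypothetical closing prices, for illustration only.
px <- c(100, 102, 101, 105, 103)

# Log return: ln(P_t / P_{t-1}), computed as the first difference of log prices.
log_ret <- diff(log(px))

# Simple return: (P_t - P_{t-1}) / P_{t-1}.
simple_ret <- diff(px) / head(px, -1)

# The two are linked by log_ret = ln(1 + simple_ret).
stopifnot(isTRUE(all.equal(log_ret, log(1 + simple_ret))))

# Log returns add up across days: their sum is the log return
# over the whole period.
stopifnot(isTRUE(all.equal(sum(log_ret), log(px[length(px)] / px[1]))))

print(round(log_ret, 4))
```

The additivity shown in the second check is one practical convenience of log returns: a multi-day return is just the sum of the daily ones.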

Plots of the log returns seem much more appropriate for time series regression analysis:

I have been meaning to write more on this post, but my work schedule has been extremely demanding. I will continue in part two when I have a chance. In the meantime, please follow Curated Alpha via email, RSS, or Twitter.


This is a great change to the usual posts and I very much look forward to part 2. Will you be sharing any of the R source code? It would act as a primer to people new to the game (like me!).

Thanks. It’s all in the very preliminary stages. I’ll see about sharing my R code. Let me think of the best format to present it in. I signed up for GitHub, which I may start using if I can dedicate more time to this.

Interesting post, I also look forward to part 2 and hopefully seeing some R code! I got interested in finance only recently and really enjoy reading your blog, as it offers a refreshing alternative to the usual, mindless predictions devoid of any logical reasoning that many other sites offer … keep up the good work!

Thanks for your kind words. Perhaps I’ll write a post on some of the more finance-related R resources that I’ve used to get up to speed. I think that might be of more long-term use to readers instead of looking at my simple R code. I haven’t really written much code yet.

How is the log return defined? Obviously not log(x(n)-x(n-1)) as log() is not defined for negative numbers.

To answer my own question, I found it in the link provided above:

log(1+(x(n)-x(n-1))/x(n-1))

I think your calculation is correct. An alternative way of expressing it is ln(price at time t / price at time t-1), where ln is the natural log.
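Spelling the equivalence out, with $x_n$ the price at time $n$ and the simple return measured against the previous price $x_{n-1}$:

$$
\ln\!\left(1 + \frac{x_n - x_{n-1}}{x_{n-1}}\right)
= \ln\!\left(\frac{x_{n-1} + x_n - x_{n-1}}{x_{n-1}}\right)
= \ln\!\left(\frac{x_n}{x_{n-1}}\right)
= \ln x_n - \ln x_{n-1}
$$

Because prices are positive, the ratio $x_n / x_{n-1}$ is positive too, so the logarithm is defined even on days when the price falls.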
