Recently, a US-based fund, F-Squared Investments, was in the news because it “falsely advertised a successful seven-year track record for the investment strategy based on the actual performance of real investments for real clients”. Even the hypothetical results it provided via a back-test of the investment strategy were inflated due to an error in the way they were calculated.
Here is a quote from Fortune magazine:
They had developed the model’s hypothetical performance by applying buy and sell recommendations a week before the model would have actually made those suggestions, enabling the model to buy an ETF just before the price rose and sell just before it dropped. Because of the error, F-Squared’s calculations showed that the strategy had returned 135% during the period. In fact, the strategy’s hypothetical performance should have been 38%.
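To see how applying signals early inflates a back-test, here is a minimal sketch. It is not F-Squared's model: the weekly returns and the "stay invested after a positive week" rule are made up purely for illustration. The only point is the difference between applying a signal to the week after it is known and applying it to the week that produced it.

```python
# Made-up weekly returns, purely for illustration.
weekly_returns = [0.02, -0.01, 0.03, -0.02, 0.01]

# Hypothetical rule: be invested next week if this week's return was positive.
# signals[i] only becomes known at the END of week i.
signals = [1 if r > 0 else 0 for r in weekly_returns]


def growth(signals, returns):
    """Compound the returns of the weeks in which the signal says 'invested'."""
    value = 1.0
    for s, r in zip(signals, returns):
        value *= 1.0 + s * r
    return value


# Correct: the signal from week i is applied to week i + 1.
honest = growth(signals[:-1], weekly_returns[1:])

# Wrong: the signal from week i is applied to week i itself,
# i.e. one week before it could have been known.
peeking = growth(signals, weekly_returns)

print(f"honest: {honest:.4f}, peeking: {peeking:.4f}")
```

With these toy numbers the honest version loses money while the peeking version shows a gain, even though the rule is identical in both cases.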
Back-testing is a nice way to understand and test ideas. But the very fact that back-testing has been made easy by charting / statistical programs means that everyone now just writes a few lines of code and runs a back-test on the whole data set to see if the logic works. If it does, the next step is to implement it directly, without worrying about the nuances of the program or whether the back-test results are close to what will be achievable in real trading.
Let me describe a few basic ways in which back-test results can be very different from what you can expect in real trading.
Cost of Slippage / Commission: How much commission / slippage are you charging the system for every trade it carries out? If the system trades on a lower time frame, it's guaranteed that this number will be big, regardless of which low-cost broker you trade with.
There are basically two ways in which people place their orders based on signals. The first way is to enter the trade as soon as the bar is completed. The second way is to enter the trade only if the bar high / low is broken (high in case of Long / Cover and low in case of Sell / Short).
The first method is straightforward and simple. As soon as the bar completes, you take a trade. But how much of a difference can you expect there? A point or two will always be there. While that does not seem like a big number, think about how much of a difference you will see in reality if your system has an average slippage of even 1.5 to 2 points per trade and it makes, say, 300 trades per year.
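To put a number on that, here is a rough sketch of the arithmetic. The function name and the point_value parameter are my own and only there for illustration; with point_value left at 1 the result is in index points.

```python
def annual_slippage_cost(points_per_trade, trades_per_year, point_value=1.0):
    """Yearly slippage drag: points lost per trade times the number of trades.

    point_value converts points to currency (assumed 1.0, i.e. result in points).
    """
    return points_per_trade * trades_per_year * point_value


low = annual_slippage_cost(1.5, 300)
high = annual_slippage_cost(2.0, 300)
print(f"Slippage drag: {low:.0f} to {high:.0f} points per year")
```

That is 450 to 600 points a year that the system has to earn before it makes anything for you, and commissions come on top of it.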
The second method, where one places an order above the high / below the low of the bar, is even more prone to slippage. Since it is a stop order, the trigger price and the execution (limit) price will need to be at least 3 to 5 points apart to ensure a guaranteed fill.
Quite a few people I know add a filter of another few points just to ensure that they do not get an entry unless the bar high / low is conclusively crossed. That adds to the misery of the system, since every trade will carry even higher slippage than is generally assumed.
And last but not least, slippage is high when markets are volatile. So, if your system seems to have performed pretty well during the 2008 crisis, do add a note to yourself that liquidity in fast-moving markets can be very poor, and fills may not match expectations.
Cost of Rolls: Most guys who back-test use data from the running futures contract, since that is the most widely available data and also lets one test a long period in one go. But embedded in that data is the rollover cost.
Let's take a recent example. Assume you were long going into the December 2014 expiry. Once the December data series ends, fresh data from the January contract gets appended to the series. But the gap one sees is not a real gap: if you had rolled over paying 100 Rupees (during the closing 5 minutes of expiry day), this is seen as a profit in the back-test when in reality you never saw that 100 bucks 🙂
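Here is a small sketch of that effect for a long position held through a single roll. All the prices are made-up illustration values (not actual December 2014 quotes), and the function names are hypothetical. The naive number is what a back-test on the stitched series reports; the adjusted number measures the new contract from its own price at the time of the roll.

```python
# Each bar is (contract, close). Prices are made up for illustration.
stitched = [
    ("DEC", 8150.0),
    ("DEC", 8200.0),  # December contract's close on expiry day
    ("JAN", 8320.0),  # first bar taken from the January contract
]
# January contract's close on December's expiry day, i.e. the price you
# actually pay when you roll (100 points above December in this example).
jan_close_on_expiry = 8300.0


def naive_pnl(bars):
    """P&L of a long held throughout, as a back-test on stitched data sees it."""
    return bars[-1][1] - bars[0][1]


def roll_adjusted_pnl(bars, next_contract_close_on_roll):
    """Same long position, but the roll gap is not counted as profit.

    Handles a single roll only; a real series needs one roll price per expiry.
    """
    pnl = 0.0
    for (prev_contract, prev_close), (contract, close) in zip(bars, bars[1:]):
        if contract != prev_contract:
            # Rolled: measure the new contract from where it was bought.
            pnl += close - next_contract_close_on_roll
        else:
            pnl += close - prev_close
    return pnl


print(naive_pnl(stitched), roll_adjusted_pnl(stitched, jan_close_on_expiry))
```

The difference between the two figures is exactly the 100-point roll gap, which exists only in the back-test.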
To overcome this issue, I believe a lot of guys use Spot data for their testing. But since we cannot trade Spot and will end up trading the futures, how big will the difference between the two be? Even on an intra-day basis, we sometimes see enormous shifts in the premium. The other day, Nifty opened with a premium of around 58 points and slipped to 40 by the time the day ended. And yesterday, one saw it dip down to 20 before bouncing back to 30.
And finally, how much data are you testing? This is a very big challenge for many, since the larger the sample of data one uses, the worse the system results seem to get. So, to avoid that, should one use a smaller sample, one that is closer to the current market?
Well, if you want to fool yourself, go ahead. But if you don’t, test for as long a period as possible. The longer the period, the more robust the strategy will be (if it holds up, that is). Falls like the ones we saw in 2008 / 2000 are the kind of thing one should expect going forward as well. And if your system gets hung up by that kind of volatility, it's just a matter of time before it reduces your equity balance to zero.
Another way to measure system robustness is the number of trades you see in the back-test. The higher the number, the better. I remember reading somewhere that anything over 400 trades in a back-test signifies a good sample size.
And remember that even after accounting for everything that I have written about above, you will still need to beat the Buy & Hold returns by a handsome margin. Else, you are just wasting time on a venture that will get you sleepless nights without the benefit of a bounty at the end of the road.
Finally, all of the above will be useless if your system is based on some optimization. It will pass any and every hindsight test but will fail the moment you start trading it in a real environment. I personally abhor optimization, though even I fall prey to it (trading one time frame rather than another because historical records indicate that it is the better of the two). But if you understand why the signal is generated, and if you have enough out-of-sample test results, this is something that can be more or less overcome to an extent.
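A minimal sketch of what an out-of-sample check can look like: choose the parameter on the first part of the data only, then see how that choice does on the part it never saw. The prices, the 70 / 30 split and the simple momentum rule are all assumptions made up for this illustration, not a recommended system.

```python
def split_in_out(prices, in_sample_pct=70):
    """Split a price list into in-sample and out-of-sample parts (by percent)."""
    cut = len(prices) * in_sample_pct // 100
    return prices[:cut], prices[cut:]


def momentum_pnl(prices, lookback):
    """Hypothetical rule: hold one unit over the next bar whenever the
    current price is above the price `lookback` bars ago."""
    pnl = 0.0
    for i in range(lookback, len(prices) - 1):
        if prices[i] > prices[i - lookback]:
            pnl += prices[i + 1] - prices[i]
    return pnl


def pick_and_validate(prices, lookbacks):
    """Optimize the lookback in-sample only, then report it out-of-sample."""
    in_sample, out_sample = split_in_out(prices)
    best = max(lookbacks, key=lambda lb: momentum_pnl(in_sample, lb))
    return best, momentum_pnl(in_sample, best), momentum_pnl(out_sample, best)


# Made-up prices, purely for illustration.
prices = [100, 102, 101, 105, 107, 106, 110, 108, 111, 115,
          113, 112, 116, 118, 117, 121, 119, 123, 125, 124]
best, in_pnl, out_pnl = pick_and_validate(prices, lookbacks=(1, 2, 3))
print(f"best lookback: {best}, in-sample: {in_pnl}, out-of-sample: {out_pnl}")
```

If the out-of-sample number looks nothing like the in-sample one, the parameter was probably fitted to history rather than to anything real. A sample this small proves nothing either way; it only shows the mechanics.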