Selecting in-sample, paper, and out-of-sample periods – Commentary by Steve Ward

This topic is important whether you are optimizing rules or training and optimizing neural nets. It is relevant for NeuroShell Trader Pro, DayTrader Pro, or ChaosHunter. Next to choosing correct rules and/or net inputs, it could be the most important part of the art of building good trading models.

They say variety is the spice of life, but it is also the spice of modeling. Optimizers work by finding a formula (like a prediction formula or set of trading rules) that works best over a set of data usually called the “in-sample” or “training” data. What often happens is that your in-sample period is one long trend, usually upward, so the model winds up learning only about up-trends, which is great only for as long as the up-trend continues while you trade the model. Even if there are slight pullbacks in the in-sample up-trend, the optimizer will probably learn to ignore all but the largest of them, because more money can be made by just going long most of the time. That is, it takes the safe path of “when in doubt, go long”.

So suppose you are using daily data and, in addition to the recent slowly rising bull market, you decide to include the big crash of 2008. Now the flip side can come into play: because the crash was so sharp, the optimizer need only go short during that period to make a boatload of money. It makes so much money short in the bear market that the gains in the slow bull period add little to the bottom line. You might need to add the slow bull market before 2007 to help balance things out, because balance is important.
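One rough way to check balance before optimizing is to measure how much of the total price movement in a candidate in-sample period was upward. The sketch below is only an illustration: the function name `trend_balance`, the `up_share` measure, and the price lists are all made up for this example and are not part of NeuroShell or ChaosHunter.

```python
def trend_balance(closes):
    """Summarize how one-sided a candidate in-sample period is.

    closes: list of closing prices in time order (hypothetical input).
    Returns the share of total bar-to-bar movement that was upward,
    plus the net return over the whole period.
    """
    changes = [b - a for a, b in zip(closes, closes[1:])]
    up = sum(c for c in changes if c > 0)      # total upward movement
    down = -sum(c for c in changes if c < 0)   # total downward movement
    total = up + down
    up_share = up / total if total else 0.5    # 0.5 means balanced
    net_return = closes[-1] / closes[0] - 1.0
    return {"up_share": up_share, "net_return": net_return}


# Made-up price series, for illustration only.
steady_bull = [100, 101, 102, 101.5, 103, 104, 105]
bull_plus_crash = [100, 102, 104, 106, 90, 80, 85, 88, 92, 96, 100]

bull_stats = trend_balance(steady_bull)
mixed_stats = trend_balance(bull_plus_crash)
```

An `up_share` near 1.0 suggests the optimizer will see little besides an up-trend, while a value near 0.5 suggests up and down movement are roughly balanced. It is a crude measure and says nothing about whether the moves are tradable.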

This brings us to what I will call focus. The traditional wisdom is that the longer the in-sample period, the better. This “sledgehammer to kill a fly” wisdom grew out of the need to include variety and balance in your in-sample data, and the faulty (I believe) assumption that you can find a model that works well forever. But simply starting your in-sample data as far back as possible might be counterproductive. For one thing, you may get so much variety that the optimizer learns to detect only the largest moves correctly. For another, the variety that you capture and learn could be so old as to be useless tomorrow. It might be better to focus on a smaller part of that most recent slow bull market.

But won’t that cause overfitting? It could, but if you are careful and focus on a period of cyclic activity with good inputs/rules, the result might be that the model is better able to detect the onset of smaller activity and reversals. There is less hay to hide the needle. Yes, you might have to retrain or reoptimize as the markets shift.

Clearly, in view of the above, the particular issue you are trying to trade matters.

You could even isolate segments in the time series on which to train. The easiest way to do that is to use the Toggle Indicators in Advanced Indicator Set 3 in NeuroShell, and the new Time Flag series feature in ChaosHunter.
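The general idea of isolating segments can be sketched as a flag series that is 1 on bars you want the optimizer to see and 0 elsewhere. This is a generic illustration, not the actual behavior of the Toggle Indicators or the Time Flag feature; the function name `time_flags` and the dates are hypothetical.

```python
from datetime import date


def time_flags(dates, segments):
    """Return 1 for each date inside any (start, end) segment, else 0.

    dates: list of datetime.date values, one per bar (hypothetical input).
    segments: list of (start, end) date pairs, both ends inclusive.
    """
    return [
        1 if any(start <= d <= end for start, end in segments) else 0
        for d in dates
    ]


# Made-up bar dates and two made-up training segments.
bar_dates = [
    date(2008, 9, 1), date(2008, 10, 1), date(2009, 6, 1),
    date(2012, 3, 1), date(2013, 3, 1),
]
train_segments = [
    (date(2008, 9, 15), date(2008, 12, 31)),
    (date(2012, 1, 1), date(2012, 12, 31)),
]

flags = time_flags(bar_dates, train_segments)
```

Only the bars flagged 1 would then be used for training or optimization, which lets you combine, say, a sharp decline and a quiet rise without everything in between.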

If you are using the paper trading feature of NeuroShell, the paper trading period should be chosen by the same criterion as the in-sample period – diversity. To do otherwise is to skew the end model in some direction.

Out-of-sample periods are pretty important too. Most people try to use pretty long ones. The truth of the matter is that it is unrealistic to expect any model to work for long periods of time. How long? Most people would be happy with a day or so of profit from an intraday model, and a couple of weeks of profit from a daily model, before they reoptimize and/or retrain.
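The habit of using a short out-of-sample period and then reoptimizing resembles what is commonly called walk-forward testing. A minimal sketch of how such windows might be laid out over a series of bars follows; the function name `walk_forward_windows` and the window sizes are assumptions chosen only for illustration.

```python
def walk_forward_windows(n_bars, in_sample, out_of_sample):
    """Lay out rolling (in-sample, out-of-sample) index ranges.

    Each range is a (start, end) pair of bar indexes with end exclusive.
    After each out-of-sample stretch, the in-sample window slides forward
    by the same number of bars, modeling a reoptimization at that point.
    """
    windows = []
    start = 0
    while start + in_sample + out_of_sample <= n_bars:
        train = (start, start + in_sample)
        test = (start + in_sample, start + in_sample + out_of_sample)
        windows.append((train, test))
        start += out_of_sample  # slide forward, then reoptimize
    return windows


# Hypothetical sizes: 100 daily bars, 60 bars in-sample,
# 10 bars (about two trading weeks) out-of-sample.
schedule = walk_forward_windows(100, 60, 10)
```

With these made-up sizes the schedule contains four train/test pairs, each tested on only the 10 bars that immediately follow its in-sample window.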
