We begin the next phase of our robustness testing workflow with the reduced basket of 56 GBPJPY strategies generated using StrategyQuant’s genetic algorithm.
These 56 strategies managed to remain profitable when tested over an additional 5 forex markets. We will continue to shortlist robust strategies by conducting another difficult out-of-sample (OOS) test, in the form of walk-forward optimization.
In an ideal world, we would have infinite data to backtest our strategies on. More data means more trades and more market conditions; both of these greatly improve the reliability of backtests.
If you are looking to run an OOS test on the same market, the traditional approach is to allocate a portion of your data for development, leaving the remainder untouched for OOS testing. This creates a conundrum – more in-sample (IS) data is desirable because it trains the strategy to adapt to different market conditions, yet you want to maximize the amount of data available for OOS robustness testing.
Walk-forward optimization (WFO) helps address some of these issues. The concept of WFO is illustrated below, using 8 years of data.
In WFO, your backtest data is split into multiple in-sample and out-of-sample segments. The strategy is optimized over each in-sample segment (green), and the optimal parameters are then applied to the OOS segment that immediately follows (red).
This cycle of in-sample optimizing and OOS testing is repeated as you progress through your entire dataset, creating a string of OOS backtests.
There are two forms of walk-forward optimization: anchored and unanchored. In anchored WFO, the start date of every in-sample period is fixed, so your in-sample periods get longer as you progress through the test. In unanchored WFO (also called rolling or floating WFO), the start date of each subsequent in-sample period moves forward by the length of one OOS period, so every in-sample period has the same length.
Unanchored WFO is more suitable for short-term strategies or if your market has a changing ‘personality’ because it allows for quicker adaptation of your strategy to the prevailing conditions. I prefer using unanchored WFO for this reason, and this is what we will use from here on out.
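The difference between the two forms can be sketched in a few lines of Python. This is only an illustration and not how StrategyQuant builds its windows internally; the function name `walk_forward_windows` and the calendar years 2010 to 2017 (8 years, with a 3-year in-sample and 1-year OOS period) are assumptions chosen for the example.

```python
def walk_forward_windows(first_year, last_year, is_years, oos_years, anchored=False):
    """Return a list of ((is_start, is_end), (oos_start, oos_end)) year ranges.

    All ranges are inclusive. Hypothetical helper for illustration only.
    """
    windows = []
    is_start = first_year
    oos_start = first_year + is_years
    while oos_start + oos_years - 1 <= last_year:
        is_end = oos_start - 1
        oos_end = oos_start + oos_years - 1
        windows.append(((is_start, is_end), (oos_start, oos_end)))
        oos_start += oos_years
        if not anchored:
            # Unanchored: the in-sample start rolls forward by one OOS length.
            is_start += oos_years
    return windows


rolling = walk_forward_windows(2010, 2017, is_years=3, oos_years=1)
anchored = walk_forward_windows(2010, 2017, is_years=3, oos_years=1, anchored=True)

for (is_range, oos_range) in rolling:
    print("unanchored IS", is_range, "OOS", oos_range)
for (is_range, oos_range) in anchored:
    print("anchored   IS", is_range, "OOS", oos_range)
```

In the unanchored case every in-sample window stays 3 years long, while in the anchored case the last in-sample window has grown to cover 2010 through 2016.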
Let’s see how such periodic reoptimizations will occur in practice. Suppose you apply an unanchored WFO to a basic moving average crossover strategy. You decide that you will go long when the price closes above the moving average, and vice versa for shorts. After optimizing over IS 1 (2010-2012) above, you determine that the optimal moving average period is 50. You then test your MA(50) over the 2013 data (OOS 1).
Next, you optimize from 2011-2013 (IS 2) and you find that the optimal period is now 35. You then test your MA(35) over the 2014 data (OOS 2). This cycle repeats until the end of your data. If you stitch all your OOS segments together, you get a complete OOS backtest.
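To make the cycle concrete, here is a minimal Python sketch of an unanchored walk-forward on the moving average rule described above. Everything in it is an assumption made for illustration: the prices are a synthetic random walk (so no real edge should be expected), a year is taken as 250 bars, the candidate periods run from 10 to 100, and the helper names `sma`, `backtest` and `walk_forward` are hypothetical. It is not StrategyQuant's optimizer.

```python
import random


def sma(prices, period, i):
    """Simple moving average of the `period` closes ending at index i."""
    return sum(prices[i - period + 1 : i + 1]) / period


def backtest(prices, period, start, end):
    """Profit in price units of the long/short MA rule over bars [start, end).

    The position held during bar i is decided from the close of bar i - 1,
    so no future data is used. Bars without enough history are skipped.
    """
    profit = 0.0
    for i in range(max(start, period), end):
        signal = 1 if prices[i - 1] > sma(prices, period, i - 1) else -1
        profit += signal * (prices[i] - prices[i - 1])
    return profit


def walk_forward(prices, is_len, oos_len, periods):
    """Optimize the MA period on each IS window, then test it on the next OOS window."""
    results = []
    start = 0
    while start + is_len + oos_len <= len(prices):
        is_end = start + is_len
        best = max(periods, key=lambda p: backtest(prices, p, start, is_end))
        oos_profit = backtest(prices, best, is_end, is_end + oos_len)
        results.append((best, oos_profit))
        start += oos_len  # unanchored: roll the whole window forward
    return results


# Synthetic data: 8 "years" of 250 bars each (assumption, not market data).
random.seed(1)
prices = [100.0]
for _ in range(1999):
    prices.append(prices[-1] + random.gauss(0, 1))

periods = range(10, 101, 5)
results = walk_forward(prices, is_len=750, oos_len=250, periods=periods)
for run, (best, oos_profit) in enumerate(results, start=1):
    print(f"run {run}: MA({best}), OOS profit {oos_profit:.2f}")
```

Summing the OOS profits across the runs gives the stitched OOS result; the parameter chosen in each run never sees the data it is tested on.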
Now compare WFO with traditional out-of-sample testing, where we split the 8 years of data into two equal segments, shown at the bottom of the figure above. Notice that WFO produced a 5-year OOS test, whereas the traditional method only produced 4. The total length of your OOS tests can be even longer, depending on your WFO settings (covered below).
Thus by using the same amount of data in a cleverer way, WFO gives us the following advantages:
- It provides a strict robustness test by lengthening the out-of-sample period
- It allows your strategy to adapt to prevailing market conditions through periodic reoptimizations
Walk-forward Optimization Settings
Before we can set up a WFO, let’s go through some key settings.
Number of Runs
This is the number of times each optimization and OOS testing cycle is done. There are 5 runs in the illustration below.
Out-of-sample Percentage (OOS%)
For each run, this is the OOS data length in comparison to the IS data length. A 33% OOS% is used below.
These two settings can completely change the complexion of your WFO.
The WFO on top had 10 runs and a 10% OOS%, giving a complete OOS backtest starting from 2012. The bottom WFO had 30 runs and a 40% OOS%, giving a longer OOS test from 2005.
In general, the more runs you have, and the larger your OOS%, the longer your OOS test will be. More runs also mean more frequent reoptimizations, which adds to the workload if you manage a large basket of live strategies. I recommend the following ranges:
- Number of runs: 10-30
- OOS%: 10-40
With at least 10 runs, you can have some confidence that your results are not due to chance. The OOS% should not be too large, otherwise you will likely have too few trades in the optimization period for a reliable estimate of the optimal strategy parameters.
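The effect of these two settings on segment lengths can be worked out with some simple arithmetic. The sketch below assumes an unanchored WFO in which the OOS segments are contiguous and together with the first in-sample segment cover the whole dataset, and it takes OOS% as the OOS length relative to the IS length, as defined above. StrategyQuant's exact splitting may differ in detail; the function name `segment_lengths` is hypothetical.

```python
def segment_lengths(total_length, runs, oos_pct):
    """Return (is_length, oos_length) for an unanchored WFO.

    Assumes total_length = is_length + runs * oos_length and
    oos_length = is_length * oos_pct / 100 (OOS% relative to IS).
    """
    ratio = oos_pct / 100.0
    is_length = total_length / (1 + runs * ratio)
    oos_length = is_length * ratio
    return is_length, oos_length


# 8 years of data, 5 runs, OOS one third as long as IS.
is_years, oos_years = segment_lengths(8, runs=5, oos_pct=100 / 3)
print(round(is_years, 2), round(oos_years, 2))

# Share of the dataset covered by the stitched OOS test for two settings.
for runs, oos_pct in [(10, 10), (30, 40)]:
    is_len, oos_len = segment_lengths(1.0, runs, oos_pct)
    print(runs, oos_pct, round(runs * oos_len, 3))
```

Under these assumptions, 10 runs at a 10% OOS% leave half of the data for the stitched OOS test, while 30 runs at 40% leave 12/13 of it, at the cost of a much shorter in-sample window per run.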
Lastly, the choice of WFO settings should be in alignment with the premise of your strategy. For example, if you are developing a trend following strategy on the daily timeframe, it makes little sense to reoptimize on a monthly basis.
Walk-forward Optimization Performance Metrics
WFO metrics often utilize the concept of efficiency or stability. This refers to how well the OOS metrics compare to the IS metrics. An efficiency of 100% means the OOS results are as good as the IS results. The most common metrics are covered below.
Walk-forward Efficiency
Known as the net profit stability in StrategyQuant, this is the ratio of the annualized OOS profit to the annualized IS profit. Some performance deterioration should be expected when applying your strategy to OOS data. A minimum efficiency of 50% is often used as a passing criterion.
Percentage of Profitable Runs
This measures the consistency of your profits. 70% will be used for our test.
Distribution of Profits
Ideally you want an even profit distribution, meaning each run contributes roughly the same amount to your overall profits. If any of your runs has an abnormally large contribution, it could be due to an unlikely price shock or market event. For our test below, no run should contribute more than 50% of the overall profit.
In StrategyQuant, an additional metric called the robustness score is used. This is simply the percentage of criteria that have passed. If 2 of the above 3 criteria have passed, our robustness score will be 67%.
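As a rough sketch of how these criteria combine, the Python below computes the three metrics and the resulting robustness score from per-run results. The `Run` structure, the function name `evaluate_wfo` and the sample numbers are all made up for illustration, and the efficiency is approximated as the ratio of profit per day in OOS to profit per day in IS; StrategyQuant's own calculation may differ.

```python
from dataclasses import dataclass


@dataclass
class Run:
    is_profit: float   # net profit over the in-sample window
    is_days: int       # length of the in-sample window
    oos_profit: float  # net profit over the out-of-sample window
    oos_days: int      # length of the out-of-sample window


def evaluate_wfo(runs, min_efficiency=0.5, min_profitable=0.7, max_contribution=0.5):
    """Return the three WFO metrics and the share of criteria passed."""
    is_rate = sum(r.is_profit for r in runs) / sum(r.is_days for r in runs)
    oos_total = sum(r.oos_profit for r in runs)
    oos_rate = oos_total / sum(r.oos_days for r in runs)

    # Ratio of profit per day; identical to the ratio of annualized profits.
    efficiency = oos_rate / is_rate if is_rate > 0 else 0.0
    profitable = sum(r.oos_profit > 0 for r in runs) / len(runs)
    largest = max(r.oos_profit for r in runs)
    contribution = largest / oos_total if oos_total > 0 else float("inf")

    checks = [
        efficiency >= min_efficiency,
        profitable >= min_profitable,
        contribution <= max_contribution,
    ]
    return {
        "efficiency": efficiency,
        "profitable_runs": profitable,
        "max_contribution": contribution,
        "robustness_score": sum(checks) / len(checks),
    }


# Hypothetical results: one run dominates the overall profit.
sample = [
    Run(300, 750, 150, 250),
    Run(300, 750, 60, 250),
    Run(300, 750, 50, 250),
    Run(300, 750, -20, 250),
]
report = evaluate_wfo(sample)
print(report)
```

In this made-up sample the efficiency and the percentage of profitable runs pass, but the largest run contributes more than 50% of the profit, so two of the three criteria pass.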
Walk-forward Matrix (Cluster Walk-forward Optimization)
As with most robustness test settings, choosing an appropriate number of runs and OOS% is a non-trivial task, and the choice can often mean the difference between passing and failing the test.
Fortunately, StrategyQuant’s walk-forward matrix gives you a way to test multiple WFO settings and select the best one without compromising the integrity of the test. The walk-forward matrix (WFM) runs the WFO for multiple combinations of OOS% and number of runs, in essence allowing you to optimize these two settings.
Such optimization may be excessive to some; in general the more optimization you do, the less robust your strategy becomes. To circumvent this, the WFM provides an additional filter that looks at the stability of your walk-forward robustness scores across the entire spectrum of settings used.
Using the range of recommended WFO settings above, we can generate a matrix containing 20 different WFO settings. Each cell in the matrix corresponds to one particular WFO setting, and has its own robustness score.
The WFM test only passes if there is a stable region of WFO settings. The WFM filter is configured such that there must be a 9-cell (3×3) region in which at least 7 cells have a 100% robustness score. If this stable region is present, usually the cell in the middle will yield the optimal WFO settings.
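The stable-region check can be sketched as a sliding 3×3 window over a grid of pass/fail flags. The code below is an assumption about how such a filter could be written, not StrategyQuant's implementation; the function name `find_stable_regions` and the pass/fail grid are hypothetical, with rows standing for OOS% values of 10 to 40 and columns for 10 to 30 runs.

```python
def find_stable_regions(passed, size=3, min_passed=7):
    """Return [((center_row, center_col), pass_count), ...] for every
    size x size window containing at least `min_passed` passing cells."""
    rows, cols = len(passed), len(passed[0])
    regions = []
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            count = sum(
                passed[r + dr][c + dc]
                for dr in range(size)
                for dc in range(size)
            )
            if count >= min_passed:
                regions.append(((r + size // 2, c + size // 2), count))
    return regions


# Hypothetical grid: True means the cell reached a 100% robustness score.
oos_values = [10, 20, 30, 40]      # one row per OOS%
run_values = [10, 15, 20, 25, 30]  # one column per number of runs
passed = [
    [False, True,  True,  True,  False],
    [False, True,  True,  True,  True],
    [False, True,  True,  True,  False],
    [False, False, False, True,  False],
]

regions = find_stable_regions(passed)
wfm_passed = len(regions) > 0
if wfm_passed:
    # Take the center of the window with the most passing cells.
    (row, col), count = max(regions, key=lambda item: item[1])
    best_setting = (oos_values[row], run_values[col])
    print("WFM passed; center of best region:", best_setting, "cells passed:", count)
```

Picking the center of the strongest window mirrors the idea that the middle cell is surrounded by settings that also pass, rather than being an isolated success.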
Therefore, by examining your strategy’s stability when exposed to different WFO settings, the WFM provides another layer of robustness tests while helping you select optimal WFO settings.
Setting Up StrategyQuant’s Walk-forward Matrix
With all that said, we are finally ready to set up our WFM for the 56 surviving strategies. All the settings discussed above will be applied here.
Because optimization and OOS testing are repeated for every combination of settings, the WFM is a tremendously time-consuming process. To alleviate this, StrategyQuant has a simulated testing mode, whereby genetic optimization is used to estimate the results for different WFO settings. The time savings will usually be worth the loss in accuracy.
OOS% values will be optimized from 10-40, in steps of 10, while the number of runs will be optimized from 10-30, in steps of 5. This will give us a 20-cell matrix, with each cell corresponding to one WFO setting.
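For reference, the grid described above can be enumerated with a couple of lines of Python. This is only a sketch of the arithmetic behind the 20 cells; the variable names are made up and nothing here calls StrategyQuant.

```python
from itertools import product

oos_values = list(range(10, 41, 10))  # 10, 20, 30, 40
run_values = list(range(10, 31, 5))   # 10, 15, 20, 25, 30

# Each (OOS%, number of runs) pair is one cell of the walk-forward matrix.
grid = list(product(oos_values, run_values))
print(len(grid), "settings, first:", grid[0], "last:", grid[-1])
```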
To minimize the risk of curve fitting, only important parameters such as indicator periods and exit levels will be optimized.
For a particular WFO setting (number of runs, OOS%) to be considered successful, all three metrics (Net profit stability, % profitable runs, distribution of profits) must meet their respective thresholds. For the WFM to pass overall, there must be at least 7 successful cells inside a 9-cell region.
Walk-forward Matrix Results
14/56 strategies managed to pass our WFM test. Since WFO/WFM tests typically take the longest, it is advisable to keep them at the end of your workflow.
Let’s look at the detailed results for one strategy.
17/20 different WFO settings passed our criteria. The optimal WFO setting, 20 runs and a 20% OOS, is located right in the middle of a successful region. This setting yielded a 78% walk-forward efficiency, with 16/20 runs being profitable. Profit distribution was reasonably even with the most profitable run contributing only 16% of overall profit.
To have another perspective of the strategy’s robustness, let’s see how the walk-forward efficiency varied across the matrix.
The efficiencies ranged from 46-82%, with an average across the 20 settings of 66%. The strategy displays strong OOS performance across a wide range of test periods.
With this optimal WFO setting of 20 runs and 20% OOS, you will need to reoptimize the strategy every 246 days, using a history of 1218 days. After each optimization run, the optimal parameter set will be listed, and the 5 most recent sets are shown below.
The strategy contains three optimizable parameters:
- BarsValid: Number of bars the entry buy/sell stops will be valid for
- ExitAfterBars: Bar-based time stop
- StopLossCoef: Multiple for the ATR-based stop loss
The parameter stability over the years is another indication of the strategy’s robustness. If you were to start trading this strategy now, you would use the parameters in the bottom row of the table until February 2021, at which point you will reoptimize the strategy.
Of course, no result would be complete without an equity curve.
The blue curve shows our strategy’s WFO performance, while the grey curve indicates the strategy’s performance with the static parameter set derived from the genetic algorithm.
It is evident that this strategy benefits from the periodic reoptimizations that WFO requires. Perhaps this helped the strategy adapt to market changes over the 16 years. If you stitch together the OOS portions of the WFO (red bars at the bottom), you can get a complete OOS backtest from 2007 onwards.
The detailed performance metrics are shown below. Since we traded a fixed 0.01 lots throughout the backtest, the %-based metrics are irrelevant for now. A 1.61 profit factor and a return/max drawdown ratio of 13.4 for a single strategy are pretty decent.
If your strategy does not perform well using walk-forward optimization, it does not necessarily indicate a lack of robustness. Not all strategies benefit from periodic reoptimizations; sometimes a static parameter set may work better.
Some traders also believe that optimizing your strategy periodically is akin to chasing the market. If you suspect your strategy may perform better with static parameters, consider using the Optimization Profile/System Parameter Permutation robustness test in StrategyQuant.
So what is this strategy’s trading logic?
Apparently the strategy looks for a short-term pullback, as measured by Wilder’s Parabolic SAR. If this pullback occurs, a buy stop is placed at the previous week’s opening price. No action is taken if the current price is above the previous week’s opening price. The entry conditions on the short side are symmetrical.
Considering how easily satisfied the Parabolic SAR conditions are, I believe you can remove them with little consequence. This is thus essentially a weekly breakout trend following strategy.
Trade management is kept to a minimum, with only a bar-based time stop and an ATR-based stop loss.
Initially I found the strategy’s simplicity to be shocking. But this strategy was profitable on 6 different markets, and passed our WFM with strong displays of robustness. Perhaps this simplicity is no coincidence.
Walk-forward optimization (WFO) can be used to evaluate your strategy’s robustness, and to determine whether it would benefit from periodic reoptimizations.
The process involves breaking down your data into numerous segments, each containing an in-sample (IS) and out-of-sample (OOS) portion. For each segment, strategy optimizations are conducted over the IS portion, after which the optimal parameters are applied to the OOS portion.
Two important WFO settings are the number of runs, and the OOS%, which is the length of each OOS portion relative to its IS portion. StrategyQuant’s walk-forward matrix (WFM) allows you to run multiple WFOs, each using a different WFO setting.
A robust strategy should consistently demonstrate strong OOS performance across the data segments, and across a broad range of WFO settings.
Of the 56 strategies that underwent WFM, 14 passed our robustness criteria. One of these is a breakout trend strategy that enters on a stop order placed at the weekly open.
In the final article on robustness testing, we will use this strategy to explore StrategyQuant’s Monte Carlo capabilities.