Robustness is the measure of a strategy’s insensitivity to variations in price behaviour. A robust strategy will continue to perform well when market conditions change, or when it is traded on a different market.
Robustness is thus a central theme in the development of any trading strategy, automated or manual.
The Challenge of Creating Robust Strategies
Why do so many strategies fail in real time? Usually it’s because they have been excessively curve fit to market noise. Let me explain.
As traders, we try to exploit the predictive potential of price patterns that reflect some fundamental market behaviour. This could be an underlying trend due to central bank policies or economic reports.
The historical data that we use for development contains some of these predictive patterns, but is contaminated with a certain amount of noise. Noise is the random and erratic price movement that obscures underlying market direction. It is a result of the vast number of market participants, each transacting at different hours and with different agendas. Due to its random nature, the noise that is present in your backtest period will never be exactly repeated in the future.
Curve fitting is often described as the enemy of robustness. It is the process of creating trading rules that demonstrate profitability when applied to historical data. Because every market has its own unique characteristics, some degree of curve fitting is required when developing a strategy.
To be precise, curve fitting is not the problem; it is excessive curve fitting (overfitting) that destroys robustness. In a bid to create a fantastic looking backtest, it is easy to overfit your rules to a particular set of data, creating a strategy whose rules rely on random noise that has no predictive value.
This is why many complex, heavily optimized strategies deteriorate immediately when traded in real time.
Why Include Robustness Tests in Development?
You can minimize the risk of overfitting your strategies by:
- Having fewer rules in your strategies
- Backtesting your strategies over different market conditions
- Avoiding the simultaneous optimization of many parameters
In doing so, you will likely produce backtests with poorer risk-adjusted returns, but these backtests are more likely to represent future performance.
Unfortunately, even with the best of intentions, it is easy to subconsciously overfit a strategy. It could take months of unsuccessful forward testing or live trading before you conclude your strategy has no predictive value.
This is why you should include robustness tests in your development. They help you estimate a strategy’s performance when exposed to different price behaviours and input parameters. If these tests show minimal performance deterioration, you can be more confident that your strategy has not been overfit to past data, and is thus more likely to maintain its edge in live trading.
Robustness Testing in StrategyQuant
The robustness tests available in StrategyQuant are shown below:
You can create a strategy development workflow that includes any of these tests. To reduce overall computation time, StrategyQuant automatically runs the faster tests first, and discards a strategy as soon as it fails one of them.
Starting with our rich basket of GBPJPY strategies previously generated using StrategyQuant’s genetic algorithm, we will use the additional market backtests (out-of-sample testing) and the walk-forward matrix to shortlist a small basket of robust strategies.
Monte Carlo simulations will then be applied to this final basket to predict our worst-case maximum drawdowns in live trading. Knowledge of these drawdowns can be useful when determining our capital allocation to each system. This robustness testing workflow is shown below.
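To make the idea of a worst-case drawdown more concrete, here is a minimal Python sketch of one common Monte Carlo approach: shuffling the order of historical trades and measuring the maximum drawdown of each shuffled sequence. This is an illustration only, not StrategyQuant's implementation. The trade list, the number of runs and the 95th percentile cut-off are all assumptions chosen for the example.

```python
import random

def max_drawdown(pnl):
    """Largest peak-to-trough drop of the cumulative equity curve."""
    equity = peak = worst = 0.0
    for p in pnl:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def monte_carlo_drawdown(pnl, runs=1000, percentile=0.95, seed=0):
    """Shuffle the trade order `runs` times and return the drawdown
    found at the given percentile of the simulated distribution."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for _ in range(runs):
        shuffled = pnl[:]
        rng.shuffle(shuffled)
        results.append(max_drawdown(shuffled))
    results.sort()
    index = min(len(results) - 1, int(percentile * len(results)))
    return results[index]

# Made-up per-trade profits and losses, for illustration only.
trades = [120.0, -80.0, 45.0, -60.0, 200.0, -150.0, 30.0, -40.0, 90.0, -70.0]

historical_dd = max_drawdown(trades)   # drawdown in the original trade order
mc_dd = monte_carlo_drawdown(trades)   # 95th percentile across reshuffled orders
print(historical_dd, mc_dd)
```

Comparing the two figures shows how much deeper the drawdown could have been had the same trades arrived in a less favourable order, which is the kind of estimate that feeds into a capital allocation decision.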
This is the first of three articles that address robustness testing, and will focus on out-of-sample testing on additional forex markets. The second article discusses StrategyQuant’s walk-forward optimizer, while the final article will demonstrate the use of Monte Carlo simulations.
Out-of-sample Testing Using StrategyQuant
Out-of-sample (OOS) testing is probably the most common method of determining a strategy’s price sensitivity. It involves testing your strategy on data not previously used for development/optimization. There are numerous ways to obtain OOS data, each giving different levels of price variation.
For a more lenient robustness test, you can use StrategyQuant’s Monte Carlo simulator to introduce minor variations to your historical data and/or strategy parameters. This effectively ‘shakes’ your strategy’s fit to the market. If the performance deterioration is minor, you can be more confident that your strategy has not been overfit.
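As a rough illustration of what 'shaking' the fit can mean, the Python sketch below randomly nudges a set of strategy parameters and a short price series by a small percentage. It is not how StrategyQuant's Monte Carlo simulator works internally. The parameter names (`ma_period`, `stop_loss_pips`), the price values and the size of the variations are hypothetical.

```python
import random

def jitter_parameters(params, max_change=0.10, rng=None):
    """Return a copy of `params` with each value moved by up to +/- max_change."""
    rng = rng or random.Random()
    return {
        name: value * (1 + rng.uniform(-max_change, max_change))
        for name, value in params.items()
    }

def jitter_prices(prices, max_change=0.001, rng=None):
    """Return a copy of `prices` with each price moved by up to +/- max_change."""
    rng = rng or random.Random()
    return [p * (1 + rng.uniform(-max_change, max_change)) for p in prices]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical strategy parameters and closing prices.
base_params = {"ma_period": 50, "stop_loss_pips": 80}
closes = [140.25, 140.40, 140.10, 139.95, 140.60]

shaken_params = jitter_parameters(base_params, rng=rng)
shaken_closes = jitter_prices(closes, rng=rng)
print(shaken_params)
print(shaken_closes)
```

In practice you would rerun the backtest many times on such perturbed inputs and compare the spread of results with the original backtest. Integer parameters such as a moving average period would also need rounding before use.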
At the other end of the difficulty spectrum, you can test your strategy on a completely different market. If a strategy performs well on a variety of markets, it indicates an ability to perform well under different conditions.
Let’s do just that.
Out-of-sample Testing Setup
The choice of market is a sensitive issue that determines the difficulty of your test. Ideally, you want a strategy that performs well over a diverse range of uncorrelated markets, but such strategies are truly difficult to come by.
If you trade futures, a compromise would be to test the strategy on a basket of markets from the same category, such as meats, metals or currencies.
In the spot forex market, however, I have not found any satisfactory way to group similar currency pairs together. Even among all the USD-pairs, or the JPY-pairs, there exist significant differences in price behaviour.
Regardless, since we are starting with a huge basket of 9999 strategies, let’s just up the ante and select 5 fairly different markets. These will be EURUSD, GBPUSD, USDJPY, EURJPY and EURGBP, which together cover the four most liquid currencies (USD, EUR, JPY and GBP). Because these markets (except EURJPY, perhaps) are largely uncorrelated with GBPJPY, a strategy that was overfit to GBPJPY’s noise should deteriorate significantly on them. This test configuration is shown below.
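If you want to check how correlated two markets are before choosing them, one simple approach is to compute the Pearson correlation of their period-to-period returns. The Python sketch below does this with the standard library only. The two price series are made up for illustration and are not real GBPJPY or EURJPY data.

```python
import math

def returns(prices):
    """Simple period-to-period returns of a price series."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up closing prices for two hypothetical markets.
market_a = [140.0, 141.2, 140.5, 142.0, 141.1, 143.3]
market_b = [120.0, 120.9, 120.1, 121.0, 120.8, 122.0]

correlation = pearson(returns(market_a), returns(market_b))
print(round(correlation, 2))
```

A value close to 1 means the two markets tend to move together, so testing on the second one adds little new information. Values near 0 indicate a more demanding out-of-sample test.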
Each of the 5 additional markets above will be tested over roughly 17 years, from August 2003 to June 2020. The backtest period is split equally into two segments, OOS1 and OOS2.
A strategy will be considered sufficiently robust on a particular additional market if it generates a profit on both segments. To pass this robustness test, the strategy needs to demonstrate robustness on at least 4 of the 5 additional markets.
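The pass criterion above can be written as a short Python sketch. The function names and the profit figures are hypothetical and only serve to show the logic. In StrategyQuant itself the rule is set up through the test configuration rather than in code.

```python
def robust_on_market(oos1_profit, oos2_profit):
    """A market counts as robust if both OOS segments are profitable."""
    return oos1_profit > 0 and oos2_profit > 0

def passes_robustness_test(results, min_markets=4):
    """`results` maps each additional market to (OOS1 profit, OOS2 profit)."""
    robust = [
        market
        for market, (oos1, oos2) in results.items()
        if robust_on_market(oos1, oos2)
    ]
    return len(robust) >= min_markets

# Made-up net profits per segment for one strategy.
example = {
    "EURUSD": (1200.0, 450.0),
    "GBPUSD": (300.0, 90.0),
    "USDJPY": (800.0, -50.0),   # loses money on OOS2, so not robust here
    "EURJPY": (1500.0, 700.0),
    "EURGBP": (40.0, 10.0),
}

print(passes_robustness_test(example))
```

In this made-up case the strategy is profitable on both segments for 4 of the 5 markets, so it passes. Note that a large profit on one segment cannot compensate for a loss on the other.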
Out-of-sample Testing Results
The roughly 60,000 backtests were completed in under an hour. Only 56 of the 9999 strategies (about 0.56%) passed!
This could be due to the difficulty of the robustness test, or to a general lack of robustness among the strategies generated by the genetic algorithm. However, when configuring the strategy generation, we only selected strategies that performed reasonably well over an 8-year out-of-sample period on GBPJPY.
I therefore suspect that the high failure rate is mainly due to the different characters of the 5 additional markets.
Let’s look at one of the surviving strategies. For each strategy, StrategyQuant generates equity curves for the trading market (GBPJPY), each additional market, and a portfolio of all 6 markets.
The strategy managed to generate a profit on all 6 markets from 2003 to 2020.
In a way, this chart illustrates the effects of curve fitting. GBPJPY obviously performed the best, since the strategy was developed on GBPJPY.
EURJPY, which is somewhat correlated with GBPJPY, was the second best performer. At the other end of the spectrum, we have EURGBP, a traditionally ranging market, on which the strategy struggled to make a profit.
If none of your strategies pass your backtests on additional markets, a more lenient alternative would be to test on the same market, but using a different timeframe. Markets usually have very different price structures when viewed on different timeframes.
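To show what moving to a different timeframe involves, here is a minimal Python sketch that merges consecutive OHLC bars into larger ones, for example turning hourly bars into 2-hour bars. The bar values are made up, and the bars are grouped by count rather than aligned to clock boundaries, which a real data pipeline would normally do.

```python
def resample_bars(bars, factor):
    """Merge every `factor` consecutive OHLC bars into one larger bar.
    Trailing bars that do not fill a complete group are dropped."""
    merged = []
    for i in range(0, len(bars) - factor + 1, factor):
        chunk = bars[i:i + factor]
        merged.append({
            "open": chunk[0]["open"],                # first open of the group
            "high": max(b["high"] for b in chunk),   # highest high
            "low": min(b["low"] for b in chunk),     # lowest low
            "close": chunk[-1]["close"],             # last close of the group
        })
    return merged

# Made-up hourly bars.
hourly = [
    {"open": 140.0, "high": 140.5, "low": 139.8, "close": 140.3},
    {"open": 140.3, "high": 140.9, "low": 140.1, "close": 140.7},
    {"open": 140.7, "high": 141.0, "low": 140.2, "close": 140.4},
    {"open": 140.4, "high": 140.6, "low": 139.9, "close": 140.0},
    {"open": 140.0, "high": 140.2, "low": 139.7, "close": 139.9},
]

two_hour = resample_bars(hourly, 2)
print(two_hour)
```

The larger bars smooth out some of the intrabar noise and change the spacing of swings, which is why the same rules can behave quite differently on them.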
Robust strategies remain profitable under a variety of market conditions. We can improve a strategy’s robustness by implementing simple trading rules, since this lowers the risk of overfitting the strategy to market noise. Implementing robustness tests during development helps us eliminate overfit strategies before putting real money on the line.
Out-of-sample testing is a common method of evaluating robustness, and involves testing the strategy using previously unseen data. Our large basket of GBPJPY trend strategies was tested on 5 additional forex markets. Only 56 strategies remained profitable on 4 or more of them. These shortlisted strategies will subsequently be put through another difficult robustness test: walk-forward optimization.
Great work with this blog and articles. I try to use your advice. You do not mention anything about the ranking in further tests. I assume that it should be the same as when building strategies?