Facebook creative testing is a crucial and hotly debated topic among marketers. Understandably so: ad creative is one of the key levers that impact performance, and identifying high-performing creatives can be difficult, expensive, or both. To make things more complicated, creative testing can be done in a myriad of ways; for a platform as opaque as Facebook, there simply isn’t one optimal approach.
In this post, we’ll look at two different, and equally successful, approaches to creative testing – one led by us at Miri Growth, and the other by our friends at RocketShip HQ. We hope that by discussing the differing advantages and disadvantages of each method we can identify the main levers in the process, and in turn, inspire you to establish your own best practices.
RocketShip HQ: how we do it
Our testing methodology utilises Facebook’s Bayesian testing paradigm – chosen primarily because this is how we see Facebook allocate spends among competing creatives ‘in the wild’.
Typically, when we have multiple new concepts to test, we set up an ad set (which we label as ‘untested’) with only new ads that we want to test. This is done as a mobile app install (MAI) optimised ad set targeting a 1% lookalike of US purchasers (if the target CPA is low, we set this up as AEO). We establish this without a split test to let Facebook’s Bayesian testing operate as it would ‘in the wild’.
Our goal at this stage is to get a directional read on which of the new ads are potential winners. We don’t run ‘control’ ads in this ‘untested’ ad set, in order to avoid delivery being skewed toward established ads that are favoured only for their pre-existing histories with Facebook.
We run this ‘untested’ ad set at about $50-$100 a day, depending on CPIs, and usually within a couple of days we have a read on the winning creative(s). At this stage we compare each ad’s CPI and IPM against those of our proven, running control ad(s), as well as against the CPI and IPM benchmarks we’ve set based on our expectations for an MAI campaign.
We also assess whether there is cost-per-registration or CPA data (ideally CPP/ROAS data) that supports the CPI and IPM data we are seeing for each ad. With larger budgets, we often have multiple ‘untested’ ad sets running.
We take ‘early winner’ ads that have 20+ installs and hit benchmark CPIs/IPMs and add them to a couple of AEO/ROAS-optimised ad sets that contain past winners (or control ads, if you will). We then let these ‘early winners’ compete against the control ads, while pausing both winners and losers in the untested ad set and replacing them with new iterations and variants.
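For illustration, here’s a minimal sketch (in Python, not RocketShip HQ’s actual tooling) of how the ‘early winner’ rule above could be codified. The 20-install floor comes from the description above; the benchmark CPI/IPM values and field names are placeholder assumptions.

```python
# Minimal sketch of the 'early winner' promotion rule: graduate ads from the
# 'untested' ad set once they have 20+ installs and beat the CPI/IPM benchmarks.
# Benchmark values and field names are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class AdStats:
    name: str
    spend: float        # total spend in USD
    impressions: int
    installs: int

    @property
    def cpi(self) -> float:
        return self.spend / self.installs if self.installs else float("inf")

    @property
    def ipm(self) -> float:
        # installs per 1,000 impressions
        return 1000 * self.installs / self.impressions if self.impressions else 0.0


def early_winners(ads, cpi_benchmark=3.0, ipm_benchmark=5.0, min_installs=20):
    """Return ads ready to graduate into the AEO/ROAS 'control' ad sets."""
    return [
        ad for ad in ads
        if ad.installs >= min_installs
        and ad.cpi <= cpi_benchmark
        and ad.ipm >= ipm_benchmark
    ]


# Example read after a couple of days at $50-$100/day (made-up numbers):
untested = [
    AdStats("concept_a", spend=90.0, impressions=6_000, installs=35),
    AdStats("concept_b", spend=85.0, impressions=5_000, installs=12),
]
print([ad.name for ad in early_winners(untested)])  # -> ['concept_a']
```

In practice, the benchmarks would come from your own MAI expectations, and registration/CPA/ROAS signals would be layered on top of this first pass.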
In summary, our approach is:
- Set up ‘untested’ creatives in a separate ad set.
- Use MAI (if target CPA is low, use AEO).
- Use 1% lookalike audience if using MAI, or else use 5-10% lookalikes.
- Pit benchmark-beating top performers from the ‘untested’ ad set against ‘control’ ads, comparing CPI, CPA and IPM for comparable audiences.
Once the new ads outperform the control ads, we move them to ad sets outside the untested ones. This way, as ‘control’ ads start to see saturation, we have new ads to replace them – maintaining a regular pipeline of fresh creatives in our untested ad sets.
While this approach has the advantage of conserving budget by avoiding spend diversion to potential underperformers, there is a risk of surfacing false negatives or false positives – especially if Facebook directs spend toward ads that don’t perform or, conversely, diverts spend away from a potential high performer.
Miri Growth: how we do it
Zach and I began running creative tests four years ago while we were working at Peak Brain Training. The company had hired a motion graphics designer to deliver five new ads every week, and we were tasked with testing each one to determine the best-performing ad. It was around this time that Facebook unveiled its split testing feature, which subsequently influenced our strategy. Here’s what we did:
- Set up a split test using six ad sets, each with one ad; one set with the control and the other five with the new creatives
- Ensure each ad set targets the same audience – usually a 5-10% LAL audience in a core market
- Look at the performance of the creatives based on installs per mille of impressions (IPM, i.e. installs per 1,000 impressions) to avoid being influenced by CPMs
- Make sure the campaign is an MAI one, so CPIs stay under control
- Run tests with a $20 daily budget per ad set, or $120 a day
- Return to the test after one week and analyse the results, including uncertainty bounds, at a 90% significance level (a sketch of this comparison follows the list). We theorised this was enough, as it allowed us to run tests on a limited budget while still identifying ads that would substantially outperform, underperform or perform in line with the control ad
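As a rough illustration of that last step, here’s a minimal sketch of comparing a new creative’s IPM against the control at the 90% level. It assumes a simple two-sided two-proportion z-test on install rates – one way to get an IPM comparison with uncertainty bounds; the post doesn’t specify the exact test, and the numbers below are made up.

```python
# Compare a test creative's install rate (IPM / 1000) against the control
# using a two-sided two-proportion z-test. Each impression is treated as a
# Bernoulli trial for an install. Illustrative sketch, not production tooling.
from math import sqrt
from statistics import NormalDist


def ipm_z_test(installs_a, impressions_a, installs_b, impressions_b):
    """Return (z, p_value) for the difference in install rates."""
    p_a = installs_a / impressions_a
    p_b = installs_b / impressions_b
    p_pool = (installs_a + installs_b) / (impressions_a + impressions_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# One week of a $20/day test ad set vs. the control (made-up numbers):
z, p = ipm_z_test(installs_a=120, impressions_a=20_000,   # new creative, IPM = 6.0
                  installs_b=90,  impressions_b=20_000)   # control, IPM = 4.5
if p < 0.10:  # 90% significance threshold used above
    print(f"IPM difference is significant (z={z:.2f}, p={p:.3f})")
else:
    print(f"No significant difference (z={z:.2f}, p={p:.3f})")
```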
This set-up has proven so successful that, from Miri’s inception to today, it’s still how we run tests. Why? Its strength lies in the fact that it isn’t impacted by CPMs and ensures equal delivery to all creatives, allowing us to identify ads that are at least as good as the best-performing creatives, with statistically significant results.
However, recent changes in the landscape have led us to question our assumptions. The growth of Facebook Audience Network has drastically changed CPMs between placements. Because CPI is effectively CPM divided by IPM, delivery skewed towards high-CPM Audience Network inventory means it’s now possible to have ads with an excellent IPM but a very high CPI. On top of this, the existence of ad libraries and the advent of ‘fake’ ads (creatives that diverge from actual app content) have led to pervasive copycat behaviour – so much so that well-performing ads all look very similar to each other, which may affect which ads Facebook privileges. Finally, now that Value Optimisation is an option, MAI campaigns may no longer be the best way to run a test: when rolled out to Value Optimisation campaigns, winning ‘fake’ ads don’t always deliver great results. In other words, the combination of these factors – Audience Network, ads that diverge from app content and Value Optimisation – has prompted us to rethink our historical approach to testing.
When designing your creative testing framework, we recommend you look at the parameters below (a rough sketch of capturing them as an explicit test configuration follows the list):
- Target audience: do you want to target broadly or just your core audience?
- Bidding: MAI delivers lower CPIs, but results may not be replicable with Value Optimisation (VO)
- Platform: while similar ads tend to work on both Facebook and Instagram, Audience Network is a different environment with different performance. The answer depends on how much of your delivery relies on Audience Network.
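Here is one illustrative way to pin those three decisions down as an explicit configuration before launching a test. The field names and values are assumptions for the sake of example, not a Facebook API payload.

```python
# Illustrative creative-test configuration covering audience, bidding and
# placements. Values are placeholders, not recommendations.
creative_test_config = {
    "target_audience": "1% US purchaser lookalike",  # broad vs. core audience
    "bid_optimisation": "MAI",                       # lower CPIs, but may not replicate in VO
    "placements": ["facebook", "instagram"],         # drop Audience Network if your
                                                     # delivery rarely relies on it
}
print(creative_test_config)
```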
Regardless of the creative testing framework we design alongside our clients, we roll out winning ads to all existing ad sets once those ad sets’ performance drops below the target set by the UA team. In most cases, the new creatives pick up spend and allow for improved performance and scale.
Conclusion
While both of our tried-and-tested methods can be equally successful, split testing is the main point of difference between them. At Miri Growth, we prefer to guarantee equal delivery to all ads during testing so that every ad gets a fair read on its performance – though this doesn’t necessarily mean an ad will be picked up by live ad sets once it’s rolled out. RocketShip HQ, however, lets Facebook decide which ads receive delivery, on the assumption that it’s hard to outsmart the algorithm. This methodology is combined with benchmark CPIs and IPMs to uncover potential new winners, but it can lead to false negatives and positives.
What we’ve learned from this exercise is that there’s no single perfect approach; we recommend tailoring one to fit your specific needs. Just don’t forget to consider other important parameters such as target audience, bidding type and platform delivery.
Let us know if this has inspired you to develop your own best practices, or feel free to share any tips you think we’ve missed.
[This article has been co-authored with Shamanth Rao. Shamanth is the founder and CEO of the boutique user acquisition firm RocketShip HQ and host of the Mobile User Acquisition Show podcast. He’s managed user acquisition leading up to 3x exits (Bash Gaming sold for $170mm, PuzzleSocial sold to Zynga and FreshPlanet sold to GameLoft) – and has managed 8 figures in user acquisition spends.]