Jan 25, 2019 · Data Science
Data Analyst: Why we changed the A/B testing approach and how it improved our work
TALKS WITH EXPERTS
Welcome to the 13th and final episode of Talks with experts. This is the second time we're talking about data, but you know best how important this topic is in the F2P world. Last time we spoke about game analytics in general; today we will focus on our A/B testing method and the reasons why we switched from the Frequentist to the Bayesian approach. If you are hungry for more exact numbers, check out this fact-packed blog, or if you are more into watching than reading, visit our YouTube channel. But now, let's find out what our Senior Data Analyst Viktor has to tell you.
A while ago, you guys from our data team changed our A/B testing methodology and switched from the Frequentist approach to the Bayesian method. Why did you do it? What were your reasons?
We ditched the Frequentist approach because the Bayesian method is much more practical in business applications. The Frequentist approach may perform great in scientific research, but it has several issues that are just too impractical for business. We needed to get more flexible and quick in testing and in production. Moreover, a Frequentist test tells us nothing when the result is statistically insignificant. You cannot say that version B is better, but at the same time, you cannot say that both versions are the same, because that is not how the test works. You can only say that you don't have enough data to tell whether there is a difference. And that sucks when, as a data scientist, you have to communicate it to the producers. Imagine you've just spent two months testing, for example, some monetization feature, and in the end you have to tell them that you haven't found anything.

The second thing is that in order to set up a Frequentist test, you need a fixed sample size. This means you first have to estimate the effect of the feature you are testing, and from that estimate how many players you need in the test. But you launch the test precisely because you don't know how big the effect is in the first place! As a result, it happens quite often that you either underestimate or overestimate the effect, because basically you are just guessing. And this is definitely not cost-effective: you either end up with a test that is not significant, or you end up with a really big sample size and waste time on the test.
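To make the fixed-sample-size problem concrete, here is a minimal sketch (not Pixel Federation's actual tooling) of the classic two-proportion sample-size calculation a Frequentist test requires up front. The baseline and hoped-for conversion rates are illustrative assumptions:

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(p_a, p_b, alpha=0.05, power=0.8):
    """Players needed per group to detect a change from p_a to p_b
    with a two-sided z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_a - p_b) ** 2)


# Guessing the effect wrong changes the required test size drastically:
n_optimistic = required_sample_size(0.05, 0.06)    # hoped-for lift: +1 pp
n_realistic = required_sample_size(0.05, 0.055)    # actual lift: +0.5 pp
print(n_optimistic, n_realistic)
```

Halving the assumed effect roughly quadruples the players needed per group, which is exactly the under/overestimation trap described above.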
How can the Bayesian approach improve your work?
First of all, we always have a meaningful and informative result. The Bayesian methodology works with probabilities (for example, the probability that B is bigger than A) and the expected gain or loss associated with each version. We basically end up with three numbers that are easily interpretable to anyone without a statistical background. Moreover, we can even offer a simple risk evaluation of versions A and B. The other thing is that the sample size does not have to be fixed before the test, so we can stop the test anytime we want.
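A minimal sketch of how such summary numbers can be computed for a conversion metric, assuming Beta-Binomial posteriors with a flat Beta(1, 1) prior and Monte Carlo sampling (the conversion counts here are made up for illustration, not real game data):

```python
import numpy as np


def bayesian_ab_summary(conv_a, n_a, conv_b, n_b, samples=200_000, seed=0):
    """Posterior summary for two conversion rates under a Beta(1, 1) prior."""
    rng = np.random.default_rng(seed)
    post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, samples)
    post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, samples)
    return {
        # probability that B beats A
        "prob_b_better": np.mean(post_b > post_a),
        # expected loss (in conversion rate) if we ship B but A was better
        "expected_loss_b": np.mean(np.maximum(post_a - post_b, 0)),
        # expected loss if we keep A but B was better
        "expected_loss_a": np.mean(np.maximum(post_b - post_a, 0)),
    }


summary = bayesian_ab_summary(conv_a=500, n_a=10_000, conv_b=600, n_b=10_000)
print(summary)
```

The expected-loss numbers are what makes the risk conversation with producers easy: "if we ship B and we are wrong, we expect to lose at most this much conversion."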
The Bayesian approach also works much better with smaller sample sizes than the Frequentist method. In some situations, we can get a result with a sample size up to 60% smaller, which means significant time and money savings.
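The "stop anytime" property can be sketched as a sequential loop: check the posterior after each batch of players and stop once one version is a clear enough winner. The batch size, true rates, and 95% threshold below are illustrative assumptions, not production settings:

```python
import numpy as np


def run_sequential_test(p_true_a, p_true_b, batch=1_000, max_batches=20,
                        threshold=0.95, seed=42, posterior_draws=50_000):
    """Simulate players arriving in batches; stop once P(B > A) crosses
    the threshold in either direction (B clearly wins, or A does)."""
    rng = np.random.default_rng(seed)
    conv_a = conv_b = n = 0
    prob_b = 0.5
    for batches_used in range(1, max_batches + 1):
        # a new batch of players enters each variant
        conv_a += rng.binomial(batch, p_true_a)
        conv_b += rng.binomial(batch, p_true_b)
        n += batch
        # update the Beta(1, 1)-prior posteriors and the winning probability
        post_a = rng.beta(1 + conv_a, 1 + n - conv_a, posterior_draws)
        post_b = rng.beta(1 + conv_b, 1 + n - conv_b, posterior_draws)
        prob_b = np.mean(post_b > post_a)
        if prob_b > threshold or prob_b < 1 - threshold:
            break
    return batches_used, prob_b


batches_used, prob_b = run_sequential_test(p_true_a=0.05, p_true_b=0.07)
print(batches_used, prob_b)
```

With a real effect this large, the loop typically stops well before the maximum sample size a fixed-horizon test would have committed to. (Naive peeking like this does inflate error rates somewhat; in practice the stopping rule is usually based on the expected loss falling below a tolerable threshold.)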
So what has actually changed in Pixel Federation?
We have fewer inconclusive tests and we say "we do not know" less often. We always get at least some valuable information from the results. Just recently, during the soft launch of our new game, we were testing the completion rate of levels. This is something we would not have been able to do with the Frequentist approach because of the smaller sample size of a soft-launch player base. Unfortunately, we are still not using the full potential of the new A/B testing. We are not testing more frequently than a year ago, as it still takes a lot of time for our production team to prepare a test. In the near future, we want to introduce more practical tools to facilitate the process, so designers can set up a test without programmers having to rebuild the app. This way, we will be able to test more often and get even more results.
To conclude this interview, can you share some more tips and tricks regarding testing?
When planning a test, ask yourself, your team, and even other teams whether its result will actually affect any decision. It really makes no sense to run an A/B test just to confirm what you want to do anyway. Always test the most impactful feature first. And last but not least, think carefully about how you select the metric. Don't focus only on retention, as some features affect it only indirectly and it may take time for the change to manifest. It is often better to pick a metric that is directly connected to your feature, to the change you are making.
So that’s it, thank you for staying with us for this whole season 1 of Talks with experts. Do you have more questions? How does testing look in your company? Tell us about the challenges you’ve faced and let’s talk about the topic in our Facebook group called Free to Play Game Developers. Please feel free to invite your fellow game developers as well. :)