Probably the most important difference between games-as-a-service and traditional retail products is the role that live data capture plays in the ongoing design and development of the playing experience.
Effective data capture acts as a sensory system for our game, letting us see the behaviour it actually exhibits rather than what we expect it to do or what focus groups tell us it might do.
But can we always trust what the data tells us?
What Is Data Good For?
Back in the day, Zynga and Playfish made a big deal of how well they could tailor their experiences for players.
But as we now know, those incremental improvements didn’t stop those big social games from going the way of the dodo. With all the data gurus out there, surely these companies could have predicted what was needed and kept their games going forever?
The trouble is that data isn’t the same as insight and cannot replace creativity.
We also often mess up when interpreting data, particularly when we fail to understand the limitations of what we have captured. With games, we typically record that a trigger event has occurred, along with associated parameters such as date/time, scene, player status and so on.
Often the most important data point is missing; i.e. when a player decided to stop playing. We may be able to identify when the player disconnected from our server, but we can rarely capture the actual decision point. Instead we have to infer this from where the breadcrumb trail of data stops.
But we don’t want to do this for every single player in isolation. We want to be able to compare each player with others who have also stopped playing the game and find some common reason.
The trouble is that their reasons may not be the same. We don’t know if real-life got in the way, if the level was too difficult or if they just got bored. Perhaps something better came along. All we can do is try our best to find what we can change to improve our games’ overall performance.
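In practice, inferring a stopping point usually means finding each player's last recorded event and treating a long silence as churn. A minimal sketch of that idea, where the event rows and the seven-day inactivity cutoff are hypothetical choices, might look like this:

```python
# Sketch: inferring churn from where a player's event trail stops.
# The event rows and the 7-day inactivity cutoff are hypothetical.
from datetime import date, timedelta

events = [
    ("p1", date(2024, 3, 1)), ("p1", date(2024, 3, 2)),
    ("p2", date(2024, 3, 1)), ("p2", date(2024, 3, 20)),
]
today = date(2024, 3, 25)

# Find each player's most recent event
last_seen = {}
for player, day in events:
    if player not in last_seen or day > last_seen[player]:
        last_seen[player] = day

# No explicit "quit" event exists, so treat 7+ days of silence as churn
churned = {p for p, d in last_seen.items() if today - d > timedelta(days=7)}
print(churned)  # prints {'p1'}: p1's trail stopped on 2 March
```

The cutoff is a judgement call: too short and we count players on holiday as churned, too long and we react slowly to real problems.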
Data on What?
How we approach this comparison really matters. There is a Lore of Averages (bad pun intended): the way we calculate comparisons can drastically affect our interpretation.
What’s more important? The Median? The Mean? The Mode? Heaven forbid if we start adding up different averages!
Perhaps even the very idea of an ‘average’ might disguise vital information.
Let’s say our data shows that, on average, people spend 2.5 minutes playing our game. But when we break the data out into segments, we might find that 70 per cent of players spent less than 10 seconds before closing the app. What if the remaining cohort spent an average of 12-15 minutes per playing session? How relevant is the overall average then?
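The point is easy to demonstrate. In this sketch the session lengths are invented to match the split described above: most players bounce within seconds, a minority play for many minutes, and the overall mean describes neither group.

```python
# Sketch: how a single average can hide a bimodal player base.
# The session lengths are hypothetical, matching the 70% bounce /
# 30% engaged split described in the text.
from statistics import mean, median

# 7 players bounced within 10 seconds; 3 played 12-15 minutes
sessions_sec = [5, 6, 8, 4, 9, 7, 3, 720, 800, 900]

print(f"overall mean:   {mean(sessions_sec):.1f}s")   # ~246s: describes nobody
print(f"overall median: {median(sessions_sec):.1f}s") # 7.5s: the bouncers

bounced = [s for s in sessions_sec if s < 10]
engaged = [s for s in sessions_sec if s >= 10]
print(f"bounced cohort ({len(bounced)} players): mean {mean(bounced):.1f}s")
print(f"engaged cohort ({len(engaged)} players): mean {mean(engaged):.1f}s")
```

Here the mean of roughly four minutes matches neither the bouncers (seconds) nor the engaged players (over ten minutes); only segmenting the data reveals what is actually happening.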
As we have already said, there will also be missing data points. Some will be the result of a lost ‘end-point’, such as a forced disconnection; others may come from data corruption or failed data posts. How relevant might that data have been? Would it have been significant? Missing data can distort our interpretations as much as choosing the wrong calculation method.
An Uncertainty Principle
Things get worse when we realise we can’t know if our results are ‘good’ or not.
What is good for our game anyway? We need to have some kind of baseline to work with – to plot our progress against – otherwise we can’t appreciate how we are doing.
A/B Testing can help here. We offer two versions of our software to our players and see which produces the better result. If we are sensible, we will set up a control group, which shows us what happens when we make no changes, alongside a number of variants; whether we can do that will depend on the size of our audience.
We can then see which option people prefer. Is it the red or green button which makes people buy the most Gems?
Sadly this isn’t without its problems. The act of making a change might itself be the cause of any improvement in sales. A button change probably doesn’t address the underlying desire to buy or not, and different versions might appeal to different people.
Worse than this, we might not know whether the different reactions are statistically significant. How likely is it that the data we capture at any specific point is actually meaningful rather than just random noise? This will depend on the size of the sample group in each cohort and on whether the result is repeatable. Worse still, it’s hard to be certain whether the changes we see merely correlate with the improvement or actually cause it. And on top of that, the very act of changing and testing affects the outcome.
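One standard way to check whether an A/B difference is more than noise is a two-proportion z-test. The sketch below uses only the standard library and hypothetical conversion counts; a statistics package such as statsmodels would normally do this for you.

```python
# Sketch: two-proportion z-test for the red vs green button example.
# Conversion counts are hypothetical; stdlib only (math.erf gives
# the normal CDF needed for the p-value).
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Red button: 120 purchases from 2,000 players; green: 150 from 2,000
z, p = two_proportion_z(120, 2000, 150, 2000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers the green button converts 25 per cent better, yet the p-value sits just above the conventional 0.05 threshold: a result that looks like a clear win can still be within the range of chance for a sample this size.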
Tracking the Right Way
Having different sample groups of players is very useful, but for me the most important thing is to be able to test against a consistent audience. Players evolve as they play our game and their attitudes to playing, or indeed paying, change over time.
We need to take that into account, not just the versions of the game they are using. That’s why I always try to have a report showing the progress of players through a service based on the number of days they have been playing, rather than using actual dates.
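Building such a report means re-indexing each player's activity to "day N of play", where day 0 is their first session, so players who installed on different dates line up. A minimal sketch, with hypothetical session records:

```python
# Sketch: reporting by "day N of play" rather than calendar date,
# so players who installed on different dates can be compared.
# The session records and field names are hypothetical.
from collections import defaultdict
from datetime import date

sessions = [
    {"player": "p1", "date": date(2024, 3, 1)},
    {"player": "p1", "date": date(2024, 3, 3)},
    {"player": "p2", "date": date(2024, 3, 5)},
    {"player": "p2", "date": date(2024, 3, 7)},
]

# Day 0 for each player is their first recorded session
first_seen = {}
for s in sessions:
    d = first_seen.get(s["player"])
    if d is None or s["date"] < d:
        first_seen[s["player"]] = s["date"]

# Count sessions by day-of-play instead of by date
by_play_day = defaultdict(int)
for s in sessions:
    play_day = (s["date"] - first_seen[s["player"]]).days
    by_play_day[play_day] += 1

print(dict(by_play_day))  # prints {0: 2, 2: 2}
```

A calendar-date report would scatter these four sessions across a week; indexed by days played, it is immediately visible that both players came back on day 2.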
There are similar techniques we can use to mitigate the other problems I’ve mentioned, by taking a sound statistical approach and, ideally, with the support of a genius data analyst. However, no data analyst can account for external factors or predict what players might do in future.
Know Your Limits
The data we capture only tells us what did happen – not what might happen when we make a change. It can’t predict the arrival of a rival game which may lure our players away, or the reaction they might have to a new feature. We have to show some vision and use that to gauge what’s important for our game.
But without that data, our attempts to come up with new ideas will be like stumbling in the dark. We might get lucky, but there is a risk we will fall and break something. Believe in your data, make sure you understand its limitations and use what it can provide to inform your decisions – but don’t trust it blindly.