Game Data Mining: Fundamentals
Anders Drachen
Anders Drachen, Ph.D. is a veteran Data Scientist, Game Analytics consultant and Professor at the DC Labs, University of York (UK).
Game data mining is the umbrella term for the methods employed when working with game telemetry data. In this article, we describe what telemetry, metrics and data mining are, and introduce why data mining is useful in game development.
Game data mining is how we work with telemetry data. Without it, the knowledge that can be obtained from game telemetry is limited to simple aggregates (e.g. average playtime). If you want in-depth knowledge about player behavior, game data mining covers the techniques that can be employed towards that goal.
Game telemetry and game metrics
To begin with, it is necessary to introduce the term game telemetry, as this source of data is the basis for most (but not all) kinds of quantitative analytics in game development. Game telemetry is data logged from clients or servers about how players play games, or conversely about how the game client itself responds to player behavior. Another important source of telemetry data is server logs, e.g. from load testing. Which features, or variables, are interesting to log and analyze varies from game to game, but all of it is game telemetry.
Telemetry is the raw logged data, for example “playtime” for a specific session. To derive meaning from it, telemetry data is often converted into game metrics, for example “average playtime per player” or “daily active users”.
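As a minimal sketch of this conversion, the Python below turns a handful of raw session records into the two metrics just mentioned. The record layout (`player_id`, `day`, `playtime_s`) and the function names are assumptions made for illustration, not a format any particular game or analytics tool uses. "Average playtime per player" is taken here to mean the mean of each player's total playtime; other definitions are equally reasonable.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw telemetry: one record per play session.
sessions = [
    {"player_id": "p1", "day": date(2024, 1, 1), "playtime_s": 600},
    {"player_id": "p1", "day": date(2024, 1, 2), "playtime_s": 300},
    {"player_id": "p2", "day": date(2024, 1, 1), "playtime_s": 900},
]


def average_playtime_per_player(records):
    """Metric: mean of each player's total playtime, in seconds."""
    totals = defaultdict(int)
    for record in records:
        totals[record["player_id"]] += record["playtime_s"]
    # Guard against an empty log so we never divide by zero.
    return sum(totals.values()) / len(totals) if totals else 0.0


def daily_active_users(records):
    """Metric: number of distinct players seen on each day."""
    players_by_day = defaultdict(set)
    for record in records:
        players_by_day[record["day"]].add(record["player_id"])
    return {day: len(players) for day, players in players_by_day.items()}


avg_playtime = average_playtime_per_player(sessions)
dau = daily_active_users(sessions)
```

The session records are the telemetry; `avg_playtime` and `dau` are the metrics, which is the form an analyst or designer would normally look at.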
Game metrics are interpretable measures of something, whereas telemetry is the raw data that we work with. The methods used to derive meaning from telemetry or metrics vary, but can generally be referred to as game data mining. Game data mining is about exploring the properties of, and finding patterns in, game telemetry datasets. Applied right, game data mining is a powerful tool covering a range of scenarios, from behavior analysis of individual players and how they give rise to patterns, to interpretation of larger scale structures like guilds in massively multiplayer online games.
We will be writing a lot more about different techniques and examples of analyses, but for now, suffice it to say that game data mining is generally focused on finding patterns in the data, e.g. figuring out which players are about to leave the game, or how to increase virality in a social online game.
Using data mining on game telemetry data, we can find the weak spots in a game's design, figure out how to convert non-paying users to paying users, discover geographical patterns in our player community, discover gold farmers in an MMORPG, optimize quests and find the most motivating achievements. We can explore how people play a game, how they spend their time when playing and how much time they spend playing, predict when they will stop playing or what they will do while playing, and identify which assets are not getting used. We can also develop better AI-controlled opponents, make games that adapt to the player, explore and take advantage of social grouping, and much, much more.
The rise of big game data
Recent years have seen an exponential increase in the availability of various forms of business intelligence data for game development (and any other ICT-related field). Games can generate massive amounts of user telemetry data, as well as data on production and performance, forming a potentially very valuable source of business intelligence applicable at all levels of game development.
“The data revolution” calls for analysis methods that scale to massive data sizes, and which provide interesting, useful and intuitively accessible results. Unfortunately, when datasets become large, many traditional methods used on small datasets break down, which has led to an increased focus on developing algorithms for large scale data mining over the past decade.
In games, some of the typical problems include multi-dimensionality (e.g. tracking the purchases of 750 different in-game items), time-dependency (e.g. if one player finds a bug that permits gold duplication in an MMOG, the news will spread fast and we need to react fast) and the focus on user experience (e.g. evaluating whether specific player behaviors lead to good or bad user experiences).
Computer games range from relatively simple applications to sophisticated information systems, but common to all of them is that they need to keep track of the actions of players and calculate a response to them. There is a wide variety of ways that game telemetry data can be employed to assist a variety of stakeholders during and following the development process: not only for analyzing and tuning games, figuring out and correcting problems and generally learning about effective game design, but also to guide marketing, strategic decision making, technical development, customer support, etc.
However, it is generally far from obvious how to employ the analysis: what data should we record, how can we analyze it, and how should it be presented to facilitate the effective transformation of raw data into knowledge that is fully integrated into the organization?
New methods
There is a wealth of information hidden in game metrics data, but not all of it is readily available, and some of it is very hard to discover without the proper expert knowledge (or even with it). The challenge the game industry faces in taking advantage of game telemetry data mirrors the general challenge of working with large-scale data. Simply retrieving information from databases, irrespective of the field of application, is no longer enough to guide decision-making. Instead, new methods have emerged to help analysts and decision makers obtain the information they need to make better decisions. What is needed is automatic summarization of data, extraction of the essence of the stored information, and discovery of patterns in raw data. When datasets become very large and complex, many traditional methodologies and algorithms used on smaller datasets break down (we can consider any dataset that does not fit into the memory of a high-end PC as large-scale, i.e. several GB and beyond). Knowing the right techniques and algorithms is therefore vitally important when performing more than the most basic of analyses.
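To make the memory point concrete, here is a small Python sketch of one simple approach: summarizing a log as a stream, one row at a time, instead of loading it all at once. It is not one of the specialised large-scale algorithms alluded to above, only an illustration of the idea. The CSV layout (`player_id`, `playtime_s`) and the function name `summarize_playtime` are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict


def summarize_playtime(lines):
    """Stream CSV rows and keep only one running summary per player.

    `lines` can be any iterable of text lines, such as an open file,
    so the full log never has to be held in memory at once.
    """
    summary = defaultdict(lambda: {"sessions": 0, "playtime_s": 0})
    for row in csv.DictReader(lines):
        entry = summary[row["player_id"]]
        entry["sessions"] += 1
        entry["playtime_s"] += int(row["playtime_s"])
    return dict(summary)


# Tiny in-memory stand-in for what would normally be a very large log file.
sample = io.StringIO("player_id,playtime_s\np1,600\np2,900\np1,300\n")
result = summarize_playtime(sample)
```

With a real file, the call would look like `with open("telemetry.csv", newline="") as f: summarize_playtime(f)`, where the filename is hypothetical. Memory use grows with the number of distinct players rather than the number of logged events, which is what lets this kind of summarization cope with logs larger than RAM.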