· 3 min read
Frequent Itemset and Association Rule Mining
Anders Drachen
Anders Drachen, Ph.D. is a veteran Data Scientist, Game Analytics consultant and Professor at the DC Labs, University of York (UK).
Sequence analysis is used in a lot of different areas, and is also highly useful in games for finding behavioural patterns that lead to particular behaviours, for example a player quitting a game. Here is how it works.
In frequent itemset mining, the base data takes the form of sets of instances (also called transactions) that each has a number of features (also called items). For example, a dataset of the items players bought in a social online game might contain 4 transactions as follows:
{Sword of Grungni, Shirt of Awesomeness, Pretty Pet}
{Shirt of Awesomeness, Pretty Pet, Healing Potion}
{Sword of Grungni, Healing Potion}
{Shirt of Awesomeness, Fancy Hat, Pretty Pet}
The task for the frequent itemset mining algorithm is then to find all common sets of items, defined as those itemsets that have at least a minimum support (exists at least a minimum amount of times). If the support is set to 3, the following 1-itemsets (sets of only one item) can be found in the dataset described above: {Sword of Grungni} and {Pretty Pet}.
It is also possible to find one 2-itemset: {Shirt of Awesomeness, Pretty Pet}, as three of the transactions contain both Shirt of Awesomeness and Pretty Pet.
Other itemsets of the same lengths are considered non-frequent as they recur less than three times.
The original algorithm for mining frequent itemsets, which was published in 1993 by Agrawal and is still frequently used. This algorithm functions by first scanning the database to find all frequent 1-itemsets, then proceeding to find all frequent 2-itemsets, then 3-itemsets etc. At each iteration, candidate itemsets of length n are generated by joining frequent itemsets of length n – 1; the frequency of each candidate itemset is evaluated before being added to the set of frequent itemsets.
There exist several alternatives to this algorithm, e.g. the FP-growth algorithm, which finds frequent itemsets through building prefix trees.
Once a set of frequent itemsets has been found, association rules can be generated. Association rules are of the form A→B, and could be read as “A implies B”. Each association rule has support (how common the precondition is in the dataset), confidence (how often the precondition leads to the consequence in the dataset) and lift (how much more common the consequence is in instances covered by the rule compared to the whole dataset). From the dataset and frequent itemsets above, the association rule Shirt of Awesomeness → Pretty Pet can be derived with support 3 and confidence 1, whereas the rule Shirt of Awesomeness → Pretty Pet and Sword of Grungni only has a confidence of only 1=3 and so would most likely not be selected as a useful association rule.
Frequent itemset mining can be used in several different ways for understanding game data. One way is to find patterns among players. If a database is organized so that each instance describes a separate player and the (binary or ordinal) attributes of each instance describe the player’s playing style (e.g. {violent, speedrunner, cleared_level_3, dies_from_falling}), frequent itemset mining can be used to find playing style characteristics that frequently co-occur.