Improving funnels processing from ~20 mins to seconds
Editor’s note: we recently shared details of a major GameAnalytics update, which includes a substantial overhaul of our product and infrastructure. Check it out for more context. At GameAnalytics we have been using Apache Druid and Imply (Imply.io) for over four years to power our analytical backend, which lets us deliver a responsive frontend experience for our users. That responsiveness comes from the low-latency querying Druid enables through the use of approximation algorithms. Recently we had the opportunity to present the work behind our Druid implementation at the BigDataLDN 2022 event, where we covered our new funnels feature and its use of the Theta Sketch algorithm from the Apache DataSketches library (https://datasketches.apache.org/). Here’s a recording of the talk. Enjoy!
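To give a feel for why a set-oriented sketch suits funnels, here is a minimal illustration rather than the implementation from the talk. It assumes a funnel can be modelled as successive intersections of the sets of users who reached each step, and it uses exact Python sets as a stand-in for Theta sketches, which support the same set operations approximately. The function name `funnel_counts` and the user IDs are hypothetical, and the sketch ignores event ordering.

```python
def funnel_counts(steps):
    """Return how many users remain after each funnel step.

    `steps` is an ordered list of sets of user IDs, one set per step.
    Exact sets are used here; a Theta sketch would give an approximate
    count for the same intersections using far less memory.
    """
    remaining = None
    counts = []
    for users in steps:
        # A user stays in the funnel only if seen in every step so far.
        remaining = set(users) if remaining is None else remaining & users
        counts.append(len(remaining))
    return counts


# Hypothetical data: users who triggered each step's event.
steps = [
    {"u1", "u2", "u3", "u4"},   # e.g. started the tutorial
    {"u2", "u3", "u4", "u9"},   # e.g. finished the tutorial
    {"u3", "u9"},               # e.g. made a purchase
]
counts = funnel_counts(steps)
print(counts)  # [4, 3, 1]
```

Note that "u9" appears in later steps but never counts, because it was absent from the first step.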
Our Approach to Open Source – From SDKs to Data Libraries
Three ways to reduce the costs of your HTTP(S) API on AWS
Here at GameAnalytics, we receive, store and process game events from 1.2 billion monthly players in nearly 90,000 games. These events all pass through a system we call the data collection API, which forwards the events to other internal systems so that we eventually end up with statistics and graphs on a dashboard, displaying user activity, game revenue, and more. The data collection API is fairly simple in principle: games send events to us as JSON objects through HTTP POST requests, and we send a short response and take the event from there. Clients either use one of our SDKs or invoke our REST API directly. We get approximately five billion requests per day, each typically containing two or three events for a total of a few kilobytes. The response is a simple HTTP 200 “OK” response with a small...
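As a rough back-of-the-envelope check on the figures in this excerpt, the daily request volume can be converted into an average per-second rate. This is a sketch under the assumption that traffic is spread evenly over the day, which real traffic is not, so peak rates would be higher. The variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

requests_per_day = 5_000_000_000      # "approximately five billion" per the excerpt
events_per_request = (2, 3)           # "two or three events" per request

# Average request rate, assuming a perfectly even spread over the day.
avg_requests_per_second = requests_per_day / SECONDS_PER_DAY

# Corresponding range of average event rates.
avg_events_per_second = tuple(
    round(avg_requests_per_second * n) for n in events_per_request
)

print(round(avg_requests_per_second))  # 57870
print(avg_events_per_second)           # (115741, 173611)
```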
Blue-Green Deployments on Terraform (For 850 Million Monthly Active Players)
The data collection API is one of the most critical and highly loaded services in GameAnalytics’ backend infrastructure. It currently receives and stores raw game events from 850+ million unique monthly players in 70,000 games. An outage of the service at that scale would lead to irreversible data loss and thousands of sad customers. In this blog post, we will discuss how we improved our infrastructure deployment practices by using the “Blue-Green deployment approach” powered by Terraform – a recipe that helps us achieve 100% uptime for our Data Collectors while continuously delivering new releases. Data Collectors: Collectors are responsible for receiving high volumes of raw game events from players around the globe and storing them for subsequent processing by our analytics systems. It’s a REST service that, on busy days, handles up to 4.5 million HTTP requests per...
About Druid: Our New Backend Technology
By now, many of you will be aware that GameAnalytics is about to undergo a major technical overhaul. For the past 18 months, a small team of engineers and I have been rebuilding (and migrating) our existing infrastructure to an innovative new solution. That solution is called Druid, and it’s essentially a high-performance data store. In this post, my aim is to give a more technical overview of the new systems and solutions that we’re putting in place. I hope this gives all users of our service more context about the change, why it was necessary, and the benefits it will bring. For more information about the upcoming changes, you can check out this Beta announcement post. What is Druid? Druid is an open source data store designed for real-time exploratory analytics...