Use Designers in Data Analysis

The last time inflation was unusually high, the economist Robert Lucas wrote a paper criticizing the practice of evaluating policy by running statistical analyses on historical aggregates. It is now known as the Lucas Critique (not to be confused with the more common Lucas Critique concerning the Star Wars prequels). The paper (“Econometric Policy Evaluation: A Critique”) does get technical, but the intuitions are straightforward.

Historical data, and specifically aggregated data, reflect people’s individual decisions under a given set of policies. The analyst is attempting to figure out what would happen if those policies changed. The problem is that the change is being evaluated on the basis of people’s decisions under the old policy, so the historical data are likely a poor indicator of how people will act after the change.

We can illustrate this with a gaming example. Suppose we have a prior release on Steam that has been in the market for a few years and enjoys a modest but steady stream of sales every week. Would it make sense to self-finance our next game by doubling the price and investing the windfall into production? Certainly not. Price is one of the factors that goes into a customer’s decision to buy the game, so changing the price should be expected to change the decision.
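To make the intuition concrete, here is a minimal sketch assuming a constant-elasticity demand curve. The base price, weekly volume, and elasticity are all invented for illustration, not estimates from real Steam data; the point is only that revenue projections which hold demand fixed at the old price are doing exactly what Lucas warned against.

```python
# Minimal sketch: why doubling the price need not double revenue.
# Assumes a constant-elasticity demand curve with made-up numbers;
# nothing here is estimated from real Steam data.

def weekly_units(price, base_price=20.0, base_units=100, elasticity=-1.5):
    """Units sold per week under a constant price elasticity of demand."""
    return base_units * (price / base_price) ** elasticity

for price in (20.0, 40.0):
    units = weekly_units(price)
    print(f"price ${price:.0f}: {units:.0f} units/week -> revenue ${price * units:,.0f}/week")
```

Under these assumed numbers, doubling the price from $20 to $40 cuts weekly units by enough that revenue falls from $2,000 to roughly $1,414. The naive projection (revenue doubles) comes from treating behaviour under the old price as invariant to the change.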

This conclusion does not require any training in statistics, and yet we often fail to carry this intuition into similar situations. While we’re moving further away from the specific problem Lucas was dealing with, inference on aggregates is quite common in gaming, especially when it comes to measures like follows, the Boxleiter number, or wishlist conversion ratios. Goodhart’s Law (named for another economist, in this case one writing about monetary policy a year before Lucas published his critique) often comes up in these cases, usually reformulated as “When a measure becomes a target, it ceases to be a good measure.” Both are apt observations on the misuse of data, but Goodhart (as commonly stated) tends to be used to dismiss analysis out of hand, while Lucas offers ideas about why things have gone wrong.
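For readers unfamiliar with these measures, a Boxleiter-style estimate is a one-line calculation: infer lifetime sales from a game’s Steam review count via a historical multiplier. The review count and multiplier below are hypothetical; the multiplier in particular is a ballpark figure that has drifted over the years, which is rather the point.

```python
# Hedged sketch of a Boxleiter-style estimate: sales inferred from review
# counts via a historical multiplier. Both inputs are invented for
# illustration; the multiplier is not a constant of nature, and treating
# it as one is exactly the kind of inference on aggregates at issue here.

review_count = 850       # hypothetical game's Steam review count
reviews_to_sales = 40    # assumed historical reviews-to-sales multiplier

estimated_sales = review_count * reviews_to_sales
print(f"estimated lifetime sales: ~{estimated_sales:,}")
```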

Analysts are not necessarily wasting their time by reporting or using these measures, but it is incumbent on them to make sure they are used appropriately. A common mistake with wishlists is to take the conversion rate as some immutable feature of Steam and then seek to maximize wishlist counts. The previously observed relationship between wishlists and sales reflects developers’ efforts to maximize sales, and so it is perfectly reasonable to use a wishlist as an indication of a customer’s interest. Developers cannot observe sales until the game is released, so wishlists can provide a useful proxy. However, different outreach strategies can alter the relationship between wishlists and the intent to buy, such as a campaign that frames wishlisting (as opposed to buying) as the way to support the developer. Two marketing approaches intended to maximize sales can be compared and evaluated on the basis of wishlists, provided that the underlying relationship between sales and wishlists remains undisturbed.
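A small sketch of what that comparison looks like in practice, with every figure invented for illustration. The projection step is only valid under the assumption stated above: neither campaign disturbs the wishlist-to-sale relationship.

```python
# Hedged sketch: comparing two outreach campaigns on wishlists added.
# The projected-sales column is only meaningful if neither campaign
# changes the wishlist->sale relationship. All figures are invented.

campaigns = {
    "campaign_a": {"impressions": 50_000, "wishlists_added": 1_200},
    "campaign_b": {"impressions": 50_000, "wishlists_added": 1_450},
}

assumed_conversion_rate = 0.15  # historical wishlist->sale rate; valid only
                                # while buyer intent behind a wishlist is intact

for name, stats in campaigns.items():
    projected_sales = stats["wishlists_added"] * assumed_conversion_rate
    print(f"{name}: {stats['wishlists_added']:,} wishlists, "
          f"~{projected_sales:.0f} projected sales")
```

The ranking of the two campaigns is informative; the absolute sales projection is the fragile part, because it leans on the historical conversion rate surviving the change.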

So far this amounts to practicing good analytical hygiene, but there is a more subtle point to be made here. Advanced statistical learning methods are easily accessible and can be applied by anyone with data. This is a good thing, because accessible software has been indispensable for turning data into insight. Unfortunately, ease of use can translate into unfamiliarity with the underlying assumptions of the tools being used. Despite the success of these tools, Lucas’ critique is not about a particular estimation method but rather about whether what we are estimating carries any information about our proposed changes. An analyst could access Valve’s entire database and warm an orphanage for winter while training some impossibly complex neural network, and still derive an estimate of wishlist conversion rates that will not survive a campaign that shifts the focus onto the wishlist instead of the sale.

The critique does not necessarily tell us what to do, but it should inspire action. Some of the problems facing games analysis can be solved by better research design. A more interesting answer is to follow the response to Lucas’ original critique and attempt to model the specific behaviours underlying the aggregate behaviour, as sketched below. This is hard work, and it does not lend itself to off-the-shelf solutions. The benefits in terms of insight more than repay the effort, but it is difficult to motivate people to do something difficult and unfamiliar.
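Here is a toy version of what such a model might look like, with all parameters invented. Each player has a fixed latent interest in the game; purchase behaviour never changes. The only thing the “policy” changes is how easy wishlisting feels, and yet the aggregate conversion rate shifts anyway, which is the Lucas point in miniature.

```python
# Toy structural model: individual purchase intent is held fixed, and only
# the policy (how wishlisting is framed) changes. The aggregate wishlist
# conversion rate shifts regardless. All parameters are invented.
import random

random.seed(1)
players = [random.random() for _ in range(100_000)]  # latent interest in [0, 1)

BUY_THRESHOLD = 0.8  # interest needed to actually buy; unchanged by policy

def conversion_rate(wishlist_threshold):
    """Aggregate wishlist->sale rate given how easy wishlisting feels."""
    wishlisted = [p for p in players if p > wishlist_threshold]
    bought = [p for p in wishlisted if p > BUY_THRESHOLD]
    return len(bought) / len(wishlisted)

# Old policy: players mostly wishlist when they are already close to buying.
print(f"old policy conversion: {conversion_rate(0.6):.2f}")
# New policy: "wishlist to support us" lowers the bar to wishlist, so the
# historical conversion rate no longer applies, even though no one's
# willingness to buy has changed.
print(f"new policy conversion: {conversion_rate(0.2):.2f}")
```

Under these made-up thresholds, conversion falls from roughly 0.50 to 0.25 without a single purchase decision changing. An analyst who modelled only the aggregate rate would read this as a collapse in interest; the structural model shows it is an artifact of the policy itself.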

But this approach is more familiar and simpler than it first appears. Gaming is in an advantageous position compared to economics. Economists struggle to find measures of relevant factors, while game studios can record nearly every factor they could want for their models. Running experiments in gaming is comparatively inexpensive, and the policies are often clearly defined by the designers themselves. Designers may not be accustomed to formally modelling player behaviour, but they likely have some idea of what is going on in the player’s head. More impactful analysis has less to do with keeping up to date on the latest machine learning algorithms and more to do with buying a designer a coffee and talking about player psychology.

With greater access to data from a virtual environment and to the people who are able to change it, the analyst’s job can be focused on formalizing the stated intentions of the designer, and then transforming the resulting analysis into insight for the team. And if you happen to be a designer without a data-savvy friend or coworker, I’m always on the lookout for interesting data to look at.
