In a nutshell: MaxDiff reveals how customers prioritize a large set of options without overwhelming them with the task of evaluating everything at once. The index score estimates how likely an option is to be chosen as “best” when compared against a random selection of competing options. Scores are normalized so that the average option equals 100. An option with an index score of 200 is expected to be chosen as best roughly twice as often as an option with a score of 100. We compute index scores using a state-of-the-art Hierarchical Bayes analysis.
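The index score definition can be made concrete in a few lines. This is a minimal sketch under stated assumptions, not necessarily how the score is computed in production:

- The model has already produced one utility per option on a logit scale.
- Each screen shows four options drawn uniformly at random.
- Picks follow a multinomial logit rule.

The function name `index_scores` and the utility values are hypothetical.

```python
from itertools import combinations
from math import exp


def index_scores(utilities: dict[str, float], set_size: int = 4) -> dict[str, float]:
    """Hypothetical helper: turn per-option logit utilities into index scores.

    Assumes a multinomial logit choice rule and screens of `set_size` options
    drawn uniformly at random from all options.
    """
    options = list(utilities)
    best_prob = {}
    for option in options:
        competitors = [o for o in options if o != option]
        probs = []
        # Every screen containing `option` is equally likely under uniform sampling.
        for rivals in combinations(competitors, set_size - 1):
            weight = exp(utilities[option])
            total = weight + sum(exp(utilities[r]) for r in rivals)
            probs.append(weight / total)  # P(option is picked as best on this screen)
        best_prob[option] = sum(probs) / len(probs)
    mean_prob = sum(best_prob.values()) / len(best_prob)
    # Rescale so the average option scores 100; ratios between options are preserved.
    return {o: 100 * p / mean_prob for o, p in best_prob.items()}


# Hypothetical utilities for the eight desserts used in the example later in this article.
utilities = {
    "Chocolate Cake": 2.0, "Ice Cream": 1.4, "Cheesecake": 0.6, "Brownies": 0.5,
    "Cookies": -0.2, "Apple Pie": -0.4, "Cupcakes": -1.2, "Donuts": -1.8,
}
for option, score in sorted(index_scores(utilities).items(), key=lambda kv: -kv[1]):
    print(f"{option:15s} {score:6.1f}")
```

Each score is proportional to a best-pick probability. This is why a score of 200 corresponds to being picked as best twice as often as a score of 100.

Summarizing many respondents is a further step, and how to do it is an assumption here. One option is to average each respondent's best-pick probabilities before rescaling.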
Imagine you have 20 product feature ideas but only the capacity to build three of them this quarter. Your first thought may be to ask respondents to rate each option on a 1-to-5 Likert scale and pick the top three by average rating. This is conceptually simple, but it will most likely produce a large number of uninformative ties. Respondents tend to put most reasonable options at the top of the scale, especially in Western cultures, where it is polite to agree. To get something informative, we need to force respondents to make tough choices: instead of asking respondents how much they like each option, we ask them to rank options against each other. However, asking respondents to rank 20 options at once is likely to overwhelm them. This is where MaxDiff comes in.

How MaxDiff works

Instead of ranking all the options at once, your participants repeatedly select the best and worst option from random subsets of, for example, four options. Each selection is simple for the respondent on its own. In aggregate, however, the selections let us reconstruct how much each respondent likes each option relative to the others. Let’s go through an example. Suppose you want to know how people rank eight dessert options: Apple Pie, Chocolate Cake, Ice Cream, Cheesecake, Brownies, Donuts, Cupcakes, and Cookies. Each participant sees a series of six random subsets and selects the best and worst option from each one. For one participant, this may look as follows:
Step | Options shown | Selection
1 | Chocolate Cake, Ice Cream, Apple Pie, Donuts | Best: Chocolate Cake, Worst: Donuts
2 | Chocolate Cake, Cheesecake, Brownies, Cupcakes | Best: Chocolate Cake, Worst: Cupcakes
3 | Chocolate Cake, Cookies, Apple Pie, Donuts | Best: Chocolate Cake, Worst: Donuts
4 | Ice Cream, Cheesecake, Brownies, Cookies | Best: Ice Cream, Worst: Cookies
5 | Ice Cream, Cheesecake, Apple Pie, Cupcakes | Best: Ice Cream, Worst: Cupcakes
6 | Brownies, Cookies, Apple Pie, Donuts | Best: Brownies, Worst: Donuts
We may find a ranking consistent with the selections we observe, for example:
Chocolate Cake > Ice Cream > Cheesecake > Brownies > Cookies > Apple Pie > Cupcakes > Donuts
Notice, however, that we cannot tell for sure whether this particular respondent prefers Cheesecake to Brownies or the other way around. The two appeared together on screens 2 and 4, but neither was picked as best or worst on those screens. The selections indicate the underlying preferences but do not uniquely determine them. Therefore, we infer how respondents rank each option through a statistical model that also draws on similarities between respondents to better estimate what each respondent thinks about each option.
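A brute-force check shows how much ambiguity remains. The sketch enumerates every ordering of the eight desserts and keeps those that agree with every screen. An ordering agrees when each best pick ranks above the other options shown with it, and each worst pick ranks below them. This only illustrates the ambiguity and is not part of the statistical model. The names `SCREENS` and `is_consistent` are hypothetical.

```python
from itertools import permutations

# The six screens from the table above: (options shown, best pick, worst pick).
SCREENS = [
    (("Chocolate Cake", "Ice Cream", "Apple Pie", "Donuts"), "Chocolate Cake", "Donuts"),
    (("Chocolate Cake", "Cheesecake", "Brownies", "Cupcakes"), "Chocolate Cake", "Cupcakes"),
    (("Chocolate Cake", "Cookies", "Apple Pie", "Donuts"), "Chocolate Cake", "Donuts"),
    (("Ice Cream", "Cheesecake", "Brownies", "Cookies"), "Ice Cream", "Cookies"),
    (("Ice Cream", "Cheesecake", "Apple Pie", "Cupcakes"), "Ice Cream", "Cupcakes"),
    (("Brownies", "Cookies", "Apple Pie", "Donuts"), "Brownies", "Donuts"),
]
DESSERTS = sorted({option for shown, _, _ in SCREENS for option in shown})


def is_consistent(ranking, screens=SCREENS):
    """True if every best pick outranks, and every worst pick is outranked by, the options shown with it."""
    position = {option: i for i, option in enumerate(ranking)}  # 0 = most preferred
    for shown, best, worst in screens:
        positions = [position[o] for o in shown]
        if position[best] != min(positions) or position[worst] != max(positions):
            return False
    return True


consistent = [r for r in permutations(DESSERTS) if is_consistent(r)]
print(len(consistent), "orderings fit every screen")

article_ranking = ("Chocolate Cake", "Ice Cream", "Cheesecake", "Brownies",
                   "Cookies", "Apple Pie", "Cupcakes", "Donuts")
swapped = ("Chocolate Cake", "Ice Cream", "Brownies", "Cheesecake",
           "Cookies", "Apple Pie", "Cupcakes", "Donuts")
print(is_consistent(article_ranking), is_consistent(swapped))  # both fit the data
```

With these six screens, 13 of the 40,320 possible orderings fit. All of them put Chocolate Cake first and Ice Cream second. Cheesecake vs. Brownies, Cookies vs. Apple Pie, and Cupcakes vs. Donuts remain undetermined.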

Inferring the underlying rankings

After observing all selections, we compute, for each participant and option, how likely that participant is to pick the option as best on a random screen of other options. In the example above, the respondent chose Chocolate Cake as best on all three screens where it appeared. Therefore, we expect Chocolate Cake to perform well against random competitors. How much participants like an option is quantified through the so-called index score. For each option, it estimates how likely a random participant is to select that option as best against a random subset of other options. The score is calibrated so that the average option has an index score of 100. If option A has double the index score of option B, a participant is twice as likely to pick A as best on a random subset as to pick B. We estimate index scores with a state-of-the-art statistical model of how respondents make selections on a screen: Hierarchical Bayes. This model accounts for various factors, such as:
  • Correlations between options: Suppose that, unlike in the example above, there is no general consensus. Our model may still find that respondents who like Chocolate Cake also tend to like Brownies. If the respondent in question liked Chocolate Cake, our model will infer that they probably rank Brownies highly as well.
  • The ordering between options (transitivity): If we know Johnny likes Donuts more than Cookies, and Brownies more than Donuts, then Johnny probably also likes Brownies more than Cookies, even if he was never asked to choose between the two.
  • Correlations between respondents: Suppose that, for a particular respondent, we have no evidence on whether they prefer Brownies or Cookies. If the general consensus is that Cookies are preferred to Brownies, our model will infer that this respondent probably follows the trend.
  • Accidental misclicks: Sometimes respondents make mistakes, and our model is robust to them. If, for example, a respondent consistently picks Chocolate Cake as best but on one screen picks it as worst, our model will infer that this was probably a misclick.
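Robustness to misclicks comes from modeling choices probabilistically rather than as hard constraints. The sketch below assumes a sequential best-worst logit, a common building block for MaxDiff models; the exact likelihood in our Hierarchical Bayes model may differ. Under this model:

- The best pick is a softmax over the utilities of the options shown.
- The worst pick is a softmax over the negated utilities of the remaining options.

The function name and utility values are hypothetical. The sketch covers only one respondent's screen-level likelihood, not the hierarchical layer that pools information across respondents.

```python
from math import exp, log


def best_worst_probability(utilities, shown, best, worst):
    """P(best, worst) on one screen under a sequential best-worst logit (assumed model)."""
    p_best = exp(utilities[best]) / sum(exp(utilities[o]) for o in shown)
    remaining = [o for o in shown if o != best]
    p_worst = exp(-utilities[worst]) / sum(exp(-utilities[o]) for o in remaining)
    return p_best * p_worst


# Hypothetical utilities for a respondent who clearly prefers Chocolate Cake.
utilities = {"Chocolate Cake": 2.0, "Ice Cream": 1.2, "Apple Pie": -0.3, "Donuts": -1.5}
shown = list(utilities)

expected = best_worst_probability(utilities, shown, "Chocolate Cake", "Donuts")
misclick = best_worst_probability(utilities, shown, "Ice Cream", "Chocolate Cake")

# A hard ranking rule would give the misclick probability 0 (log-likelihood -inf);
# the logit gives it a small positive probability, so one slip only lowers the fit.
print(f"expected pick: {expected:.3f}, log = {log(expected):.2f}")
print(f"misclick:      {misclick:.4f}, log = {log(misclick):.2f}")
```

A full Hierarchical Bayes model adds one more ingredient: each respondent's utilities are drawn from a shared population distribution. That is what lets the model borrow strength across respondents, as described in the list above.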

What should I use MaxDiff for?

MaxDiff shines whenever you need to prioritize among a long list of options. In the real world:
  • Marketers cannot launch 10 campaigns at once
  • Engineers cannot build the entire feature roadmap in one sprint
  • Designers cannot highlight everything on prime real estate
Example use cases include:
  • Prioritizing a feature roadmap by identifying which new capabilities are most likely to 1) drive net new app downloads, or 2) encourage existing customers to re-up their subscription
  • Prioritizing which messaging themes should be featured first on a LinkedIn campaign
  • Prioritizing CMF design, guiding which color smart speaker to launch 1st, 2nd, and 3rd, and the impact each additional color has on your customer’s likelihood to purchase your speaker vs. a competitor’s
  • Prioritizing which features belong in the free vs. pro vs. enterprise subscription tier: placing the right features in the free tier to attract new users, while putting the most valuable, potentially more niche features behind a paywall to maximize monetization
  • Prioritizing which customer frustrations and pain points generate the most angst and risk of churn, guiding engineering to focus on addressing the biggest risks
MaxDiff should be avoided when:
  • You need absolute scoring instead of relative rankings: Use matrix questions instead; MaxDiff only ranks options relative to each other.
  • The number of options is small: With five or fewer options, a simple ranking question is more effective.