
Sample Sizes: Beyond the Magic Numbers

The idea that you only need five users is one of the most famous, and most misunderstood, heuristics in UX research. Here is what the numbers actually mean and when they apply.

Marc Busch
Updated March 11, 2024
8 min read

Summary

The 'rule of five' applies specifically to finding common usability problems, not to measuring overall UX, validating market need, or statistical generalization. For qualitative research, aim for 10-30 participants per segment; for quantitative, 30-200+ depending on precision needs. The critical question is not 'how many?' but 'how many of whom?': sample size requirements multiply with each distinct user segment.

One question I frequently hear is: "How many users do we need to test/interview/survey?"

The answer is fundamental to the value of our work because the ultimate goal of most research is generalization. We study a small, manageable group of people (a sample) to draw reasonably confident conclusions about a much larger group (the population), such as our customer segments or our entire user base.

The Famous "Rule of Five"

For decades, the conversation around sample size in UX research has been dominated by a single number: five. The idea that you only need to test with five users [1] is one of the most famous, and most misunderstood, heuristics in our field.

While it was instrumental in making research feel accessible, it is critical to understand its limitations.

Why It Works for Usability

Usability issues often violate near-universal cognitive principles, like the need for clear feedback or consistent design patterns. Because these principles are shared by most people, the same problems surface for many users, allowing a small sample to be surprisingly effective at finding them.
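This can be made concrete with a simple probability model: if each user independently encounters a given problem with probability p, the chance that a study of n users surfaces it at least once is 1 - (1 - p)^n. A minimal sketch, using the average discovery rate of roughly 0.31 that Nielsen and Landauer report [1]:

```python
# Chance that at least one of n users encounters a problem, assuming
# each user independently hits it with probability p. Nielsen and
# Landauer [1] report an average discovery rate of about p = 0.31
# for common usability problems.
def discovery_probability(n: int, p: float = 0.31) -> float:
    return 1 - (1 - p) ** n

for n in (1, 3, 5, 10):
    print(f"n = {n:2d}: {discovery_probability(n):.0%}")
# n =  1: 31%
# n =  3: 67%
# n =  5: 84%
# n = 10: 98%
```

At five users, roughly 84% of such common problems surface at least once, which is why the heuristic works so well for its intended purpose.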

Why It Does Not Generalize

Broader questions of usefulness or desirability depend on highly variable personal and cultural experiences. To understand these, a larger and more diverse sample is required.

Practical Sample Size Guidelines

Based on experience, here are rules of thumb for common study types:

For Qualitative Methods (Interviews, UX Tests)

Where the focus is on understanding issues:

  • Minimum: n = 10 per target group
  • Ideal range: n = 15-30 per target group

For Quantitative Methods (Surveys)

Where the aim is statistical confidence:

  • Minimum: n = 30 per target group
  • Ideal range: n = 50-200 per target group

When the Heuristics Fall Short

These numbers are starting points, not magic. The optimal sample size is less about a single number and more about the confidence you need to make a good decision.

Your Users Are Very Diverse

The n = 15 or n = 30 rules apply to each distinct user segment. If your product serves both "casual hobbyists" and "expert professionals," and you need to understand both, you must recruit a sample for each.

Three key segments means 45-90 total participants, not 15-30 overall. A small, mixed sample often masks the real issues affecting a specific group.

You Are Measuring Tricky Metrics

Some metrics, like time-on-task or money spent, can be skewed by a few extreme outliers. A single slow user can dramatically inflate the average. In these cases, the median is more reliable, and a larger sample helps ensure stability.
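A quick illustration with hypothetical task times shows how a single outlier distorts the mean while leaving the median nearly untouched:

```python
import statistics

# Hypothetical time-on-task data (seconds): nine typical users and
# one extreme outlier who got distracted mid-task.
times = [42, 38, 45, 40, 37, 44, 41, 39, 43, 300]

print(statistics.mean(times))    # 66.9 -- inflated by the outlier
print(statistics.median(times))  # 41.5 -- barely affected
```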

You Are Looking for a Small Improvement

Spotting a subtle change, like a 5-point increase in a satisfaction score or a 5% improvement in task success, requires more statistical power. A small sample may not reliably detect a small effect.

You Need to Find Rare Problems

If you are hunting for a critical but rare issue affecting only 5% of users, your odds of seeing it even once in a 10-person study are less than 50/50. Even with 30 participants, there is roughly a 20% chance you will miss it entirely.
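You can verify these odds with the same at-least-once formula from above, plugging in p = 0.05:

```python
# Same at-least-once formula, now for a rare problem (p = 0.05).
p = 0.05
for n in (10, 30):
    seen = 1 - (1 - p) ** n
    print(f"n = {n}: seen {seen:.0%} of the time, missed {1 - seen:.0%}")
# n = 10: seen 40% of the time, missed 60%
# n = 30: seen 79% of the time, missed 21%
```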

You Need High Precision

A survey result is never a single, perfect number; it is a number with a "plus or minus" range (margin of error):

Sample Size | Margin of Error (95% confidence)
n = 30      | ±18%
n = 100     | ±10%
n = 400     | ±5%

Values assume maximum variance (p = 0.5) at 95% confidence, using Cochran's formula for proportions.

With 30 participants, a result of 50% is really "50% ±18%": the true value could be anywhere from 32% to 68%.
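The table values follow directly from Cochran's formula. A small sketch, assuming maximum variance (p = 0.5) and z = 1.96 for 95% confidence:

```python
import math

# Margin of error for a proportion via Cochran's formula:
# MoE = z * sqrt(p * (1 - p) / n), with maximum variance p = 0.5
# and z = 1.96 for 95% confidence.
def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    return z * math.sqrt(p * (1 - p) / n)

for n in (30, 100, 400):
    print(f"n = {n:3d}: ±{margin_of_error(n):.0%}")
# n =  30: ±18%
# n = 100: ±10%
# n = 400: ±5%
```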

The Critical Question: How Many of Whom?

The most important question is not "how many?" but "how many of whom?"

If your product serves different types of users, you must test with each type. Whether you define these as personas or simply as distinct segments, the rules are the same: each segment needs its own sample.

The Saturation Principle

For qualitative research, the goal is to reach saturation, the point where you are no longer hearing new information. [4] When the eighth user in a row points out the same confusing button, you have likely reached saturation for that issue within that segment.

Using smaller samples works best when testing with a single, homogeneous group. The variance (degree of difference from one person to the next) is low, so patterns repeat quickly.
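If you want to track saturation explicitly, one approach is to log the themes coded after each session and stop recruiting once several consecutive sessions add nothing new. The sketch below is illustrative, not a standard procedure; the three-session threshold is an assumption you should tune to your study:

```python
# Minimal saturation tracker: stop once `patience` consecutive
# sessions surface no new themes. The threshold of 3 is an
# illustrative assumption, not a rule.
def reached_saturation(sessions: list[set[str]], patience: int = 3) -> bool:
    """sessions: themes coded per interview, in chronological order."""
    seen: set[str] = set()
    streak = 0
    for themes in sessions:
        new_themes = themes - seen
        seen |= themes
        streak = 0 if new_themes else streak + 1
        if streak >= patience:
            return True
    return False

sessions = [{"nav", "pricing"}, {"nav", "search"}, {"search"}, {"nav"}, {"pricing"}]
print(reached_saturation(sessions))  # True: last three sessions added nothing new
```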

Managing Stakeholder Expectations

In the real world, researchers are often caught in a difficult position. We conduct a qualitative study with a small sample (say, n = 8), but stakeholders want to generalize. They hear "three out of eight users were confused" and immediately want to report that "nearly 40% of our users will be confused."

This is where your role becomes critical. Frame findings not as statistical inference but as logical inference: say "three of eight participants were confused by this flow, which points to a design problem worth fixing," not "nearly 40% of our users will be confused."

This ability to translate analytical output into responsible, strategic communication is a skill that becomes even more valuable as AI handles more raw analysis.

How to Calculate Your Own Number

Instead of memorizing heuristics, ask yourself these three questions before every study:

Question 1: "What is the smallest change that matters?"

This is your Minimum Detectable Effect (MDE). It determines everything.

  • If you need to spot a 1% conversion lift: You need a massive sample (thousands).
  • If you only care about major usability blockers (30%+ failure rate): A small sample of 5-10 is fine.

The bigger the change you care about, the fewer participants you need to see it reliably.
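To turn an MDE into a concrete number, a power analysis does the work. Here is a sketch using statsmodels (one tool among many; the 50% baseline success rate, 80% power, and alpha of 0.05 are illustrative assumptions):

```python
import math

# Sketch: participants needed per group to detect a given lift in a
# success rate at 80% power, alpha = 0.05. The 50% baseline and the
# lifts are illustrative assumptions.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

power = NormalIndPower()
for lift in (0.01, 0.05, 0.30):
    es = proportion_effectsize(0.50 + lift, 0.50)  # Cohen's h
    n = power.solve_power(effect_size=es, power=0.8, alpha=0.05)
    print(f"{lift:.0%} lift: ~{math.ceil(n)} per group")
# Roughly: 1% lift -> ~19.6k per group; 5% -> ~780; 30% -> ~19.
```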

Question 2: "How diverse is the audience?"

The "n=5" rule applies to one homogeneous segment. The moment you have distinct user types, you multiply.

Segments                                   | Minimum Sample
1 (e.g., "All users")                      | 5-10
2 (e.g., "Experts" + "Novices")            | 10-20
3 (e.g., "Admin" + "Manager" + "End User") | 15-30

These ranges assume tactical usability testing aimed at discovering common problems (p ≈ 0.30). For strategic or generative research with different saturation targets, see the methodology details below.

If your sample mixes segments without separating them, you will see contradictory findings and miss segment-specific patterns entirely.

Question 3: "Are you measuring time?"

Time-on-task data is notoriously noisy. One slow user (distracted, confused, or just methodical) can ruin the average and make your data meaningless.

  • For metrics (where you need stable averages): Aim for n=30+ to stabilize the data.
  • For insights (where you are looking for patterns, not precise measurements): n=5-10 is often enough.
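For time data specifically, Sauro and Lewis [2] suggest the geometric mean as a better estimate of the typical task time in small, skewed samples. A quick comparison on hypothetical times:

```python
import statistics

# Task times are right-skewed; for small samples the geometric mean
# often estimates the typical time better than the arithmetic mean.
# Hypothetical data with one slow participant.
times = [35, 40, 42, 48, 180]

print(round(statistics.mean(times), 1))            # 69.0
print(round(statistics.geometric_mean(times), 1))  # 55.1
```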

Guardrails for Smarter Decisions

Instead of relying only on rules of thumb, use these guardrails:

Start with the decision: Ask "What is the smallest change that would actually make us do something different?" This is your Minimum Detectable Effect. If the change you care about is small, you need a larger sample.

Quantify your uncertainty: Get comfortable with "plus or minus." Always report a confidence interval to show the range of uncertainty around your metrics.
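For proportions such as task success, a Wilson interval is a common way to report that range. A minimal sketch using statsmodels (assumed available; the counts are hypothetical):

```python
# 95% Wilson interval for "18 of 30 participants completed the task".
from statsmodels.stats.proportion import proportion_confint

low, high = proportion_confint(count=18, nobs=30, alpha=0.05, method="wilson")
print(f"Success rate: 60% (95% CI {low:.0%} to {high:.0%})")
# Success rate: 60% (95% CI 42% to 75%)
```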

Choose the right study design: within-subjects designs are more statistically powerful and require fewer participants than between-subjects designs.

When stakes are high, check your power: for critical decisions, such as major redesigns or pricing changes, do a power analysis. It ensures you do not miss a real finding or act on a statistical fluke.

What This Means for Practice

The rise of automation has dramatically lowered the cost of processing larger datasets. The old constraints are no longer as binding. We now have an opportunity to reinvest resources into recruiting larger, more method-appropriate samples.

The goal is not to create a new magic number, but to move beyond the old debate and invest in sample sizes that give us confidence in our findings.

The Math Behind the Numbers

This article gives you the strategic framework for sample size decisions. If you want to see the actual formulas, assumptions, and literature behind these numbers, read Sample Size Formulas Explained: The Math Behind the Numbers. You can also try our interactive Sample Size Calculator to run the numbers for your specific study.

Sample size is not a bureaucratic requirement. It is the foundation of whether your findings can be trusted.

References

  1. Jakob Nielsen & Thomas K. Landauer (1993). "A Mathematical Model of the Finding of Usability Problems". Proceedings of ACM INTERCHI'93.
  2. Jeff Sauro & James R. Lewis (2016). "Quantifying the User Experience: Practical Statistics for User Research". Morgan Kaufmann.
  3. Andy Field et al. (2012). "Discovering Statistics Using R". SAGE Publications.
  4. Greg Guest et al. (2006). "How Many Interviews Are Enough? An Experiment with Data Saturation and Variability". Field Methods.
