Summary
UX benchmarking answers three questions: where are we now (baseline), did we improve (pre/post tracking), and how do we compare to competitors? Use standardized metrics such as SUS, with n=30+ participants per segment for stable means. Critical trap: never compare a live site against a prototype; differences in fidelity and technical friction skew the data. Compare apples to apples.
"The redesign looks great" is not evidence. "SUS improved from 62 to 78" is evidence.
UX benchmarking transforms subjective opinions about design quality into objective measurements you can track over time, compare across competitors, and use to calculate ROI.
The Three Goals of Benchmarking
Every benchmarking study answers one of three questions:
| Goal | Question | Use Case |
|---|---|---|
| Benchmark | "Where are we now?" | Establishing a baseline before changes |
| Track | "Did we get better?" | Measuring pre/post redesign impact |
| Compare | "Are we better than them?" | Competitive analysis |
Goal 1: Benchmark (Baseline)
Before you can measure improvement, you need to know where you started.
When to use:
- Before a major redesign initiative
- When taking over a new product
- At regular intervals (quarterly, annually) for trending
What you get:
- A quantified starting point
- Objective evidence of current state
- Ammunition for securing redesign budget
Goal 2: Track (Pre/Post)
The most powerful use of benchmarking: proving that your work made a measurable difference.
When to use:
- After a significant redesign ships
- To validate that fixes actually improved the experience
- For quarterly/annual progress reporting
What you get:
- Evidence of improvement (or regression)
- ROI calculation inputs
- Credibility for future initiatives
Goal 3: Compare (Competitive)
How does your experience stack up against alternatives?
When to use:
- Competitive intelligence gathering
- Identifying industry best practices
- Setting realistic improvement targets
What you get:
- Relative positioning in the market
- Specific areas where competitors excel
- Evidence for competitive differentiation strategy
The Study Design
Method: Unmoderated Remote Testing
For benchmarking at scale, unmoderated remote testing is typically the right choice:
| Factor | Moderated | Unmoderated |
|---|---|---|
| Sample size | 5-12 (expensive) | 30-100+ (scalable) |
| Cost per participant | High | Low |
| Depth of insight | Deep qualitative | Quantitative metrics |
| Geographic reach | Limited | Global |
| Scheduling | Complex | Participants self-schedule |
Sample Size: n=30+ Per Segment
Sample size determines how stable your metrics are:
| Sample Size | What You Get | Use Case |
|---|---|---|
| n=5 | Insights, not metrics | Qualitative usability testing |
| n=12 | Rough directional signal | Early-stage evaluation |
| n=30 | Stable mean, narrow confidence interval | Benchmarking single segment |
| n=50+ | High precision | When small differences matter |
The Math:
With n=30, a typical SUS study has a 95% confidence interval of approximately ±6 points. This means if your measured SUS is 72, the true score is likely between 66 and 78.
With n=12, that interval might be ±10 points—too wide to detect meaningful differences.
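A quick way to sanity-check these intervals is sketched below. It assumes a standard deviation of roughly 17-18 points, a figure often cited for SUS studies; once you have data, use your own sample's standard deviation instead.

```python
import math

def sus_margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of the 95% confidence interval for a mean SUS score."""
    return z * sd / math.sqrt(n)

# Assumed standard deviation of 17.7 points, a typical value for SUS data.
for n in (12, 30, 50):
    print(f"n={n:>2}: ±{sus_margin_of_error(17.7, n):.1f} points")
# n=12: ±10.0 points
# n=30: ±6.3 points
# n=50: ±4.9 points
```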
Segmentation
If your product serves distinct user groups, benchmark each separately:
| Segment | Why Separate |
|---|---|
| New vs. Returning users | Learnability vs. efficiency |
| Free vs. Paid users | Different feature access |
| Mobile vs. Desktop | Different interaction patterns |
| Power users vs. Casual | Different mental models |
Each segment needs n=30+ for stable metrics. A study with n=30 total across 3 segments (n=10 each) produces unreliable segment-level comparisons.
The Metric: System Usability Scale (SUS)
The System Usability Scale is the industry standard for measuring perceived usability. It is fast, reliable, and benchmarkable.
Why SUS?
| Advantage | Explanation |
|---|---|
| Standardized | Same 10 questions everywhere, enabling comparison |
| Benchmarkable | Decades of data establish what scores mean |
| Quick | 10 questions, under 2 minutes to complete |
| Reliable | High internal consistency across contexts |
| Technology-agnostic | Works for websites, apps, hardware, anything |
Interpreting SUS Scores
| Score | Grade | Interpretation |
|---|---|---|
| 80+ | A | Excellent—users love it |
| 70-79 | B | Good—above average |
| 68-69 | C | Average (68 is the industry midpoint) |
| 50-67 | D | Below average—needs work |
| <50 | F | Poor—significant usability problems |
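Scoring SUS is mechanical: odd-numbered items (positively worded) contribute (response - 1), even-numbered items (negatively worded) contribute (5 - response), and the sum is multiplied by 2.5 to map onto a 0-100 scale. A minimal scoring sketch:

```python
def sus_score(responses: list[int]) -> float:
    """Score one participant's SUS questionnaire.

    `responses` holds the 10 answers in order, each on a 1-5 scale
    (1 = strongly disagree, 5 = strongly agree).
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs 10 responses, each between 1 and 5")
    # Odd items (1, 3, 5, 7, 9) are positively worded: contribution = response - 1.
    # Even items (2, 4, 6, 8, 10) are negatively worded: contribution = 5 - response.
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5  # rescale 0-40 to 0-100

print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0
```

The benchmark statistic is then the mean of these per-participant scores.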
Complementary Metrics
SUS measures overall perceived usability. For a complete picture, add:
| Metric | What It Measures | When to Add |
|---|---|---|
| Task Success Rate | Can users complete key tasks? | Always |
| Time on Task | How efficiently can they complete tasks? | When speed matters |
| SEQ | Per-task difficulty rating | When task-level insight needed |
| NPS | Likelihood to recommend | When loyalty/advocacy matters |
| CSAT | Satisfaction with specific interaction | For transactional experiences |
The Trap: Comparing Apples to Oranges
This is where benchmarking studies go wrong.
The Fidelity Problem
Never compare a live site with a Figma prototype.
| Live Site | Prototype |
|---|---|
| Real load times | Instant transitions |
| Actual data | Placeholder content |
| Full functionality | Partial flows only |
| Real errors and edge cases | Happy path only |
| Authentication, sessions | None |
The Solution: Compare Apples to Apples
| Comparison Type | Valid Approach |
|---|---|
| Pre/Post Redesign | Both must be live, or both must be same-fidelity prototype |
| Competitor Analysis | All must be live production sites |
| Concept Testing | All concepts at same prototype fidelity |
Other Comparison Traps
| Trap | Problem | Fix |
|---|---|---|
| Different task sets | Cannot compare if tasks differ | Use identical task scenarios |
| Different user segments | Novices vs. experts skews results | Recruit same profile for all conditions |
| Different time periods | Seasonal effects, market changes | Run conditions simultaneously when possible |
| Different devices | Mobile vs. desktop not comparable | Control for device type |
Running a Benchmark Study
Step-by-Step Process
1. Define Success Metrics
Before recruiting, decide exactly what you are measuring:
- Primary metric (usually SUS)
- Secondary metrics (task success, time, SEQ)
- Target score (if tracking improvement)
2. Design Task Scenarios
Create realistic tasks that cover key user journeys:
| Task | Coverage | Success Criterion |
|---|---|---|
| "Find the pricing for the Pro plan" | Discovery, navigation | Correct answer given |
| "Add a new team member to your account" | Core workflow | Task completed |
| "Cancel your subscription" | Support flow | Reached confirmation |
3. Build the Test
Using an unmoderated testing platform:
- Welcome and consent
- Screening questions (if needed)
- Task scenarios with success measures
- Post-task questions (SEQ for each task)
- Post-study questionnaire (SUS, open-ended)
- Thank you and compensation
4. Recruit Participants
- n=30+ per segment
- Match your actual user profile
- Screen out irrelevant populations
- Consider over-recruiting by 15-20% for dropouts
5. Analyze and Report
| Metric | Report |
|---|---|
| SUS | Mean, 95% CI, comparison to benchmark/target |
| Task Success | Percentage per task, overall rate |
| Time on Task | Median (means are skewed by outliers) |
| SEQ | Mean per task, identify problem tasks |
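A minimal analysis sketch along these lines; the data here are placeholders standing in for an export from your testing platform:

```python
import statistics as stats

# Placeholder results: one SUS score per participant, plus per-task
# success flags and completion times in seconds.
sus_scores = [72.5, 65.0, 80.0, 77.5, 60.0, 85.0]  # n=30+ in a real study
task_success = {"find_pricing": [True, True, False, True, True, True]}
task_times = {"find_pricing": [42, 55, 180, 61, 48, 71]}

mean_sus = stats.mean(sus_scores)
ci_half = 1.96 * stats.stdev(sus_scores) / len(sus_scores) ** 0.5
print(f"SUS: {mean_sus:.1f} (95% CI ±{ci_half:.1f})")

for task, outcomes in task_success.items():
    success_rate = 100 * sum(outcomes) / len(outcomes)
    median_time = stats.median(task_times[task])  # median, since means are skewed by outliers
    print(f"{task}: {success_rate:.0f}% success, median {median_time}s on task")
```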
6. Track Over Time
Maintain a benchmark history so that each new round can be compared against earlier rounds and the original baseline.
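One lightweight way to keep that history; the cadence, fields, and values below are illustrative only:

```python
# Illustrative benchmark history: one entry per study round.
history = [
    {"date": "2024-Q1", "sus": 62.0, "task_success": 0.71},
    {"date": "2024-Q3", "sus": 70.5, "task_success": 0.79},
    {"date": "2025-Q1", "sus": 78.0, "task_success": 0.86},
]

baseline, latest = history[0], history[-1]
print(f"SUS {baseline['sus']} -> {latest['sus']} "
      f"({latest['sus'] - baseline['sus']:+.1f} points since {baseline['date']})")
```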
Calculating ROI
Benchmarking provides the inputs for calculating research ROI:
The Formula
ROI = (Value of Improvement - Cost of Research) / Cost of Research
Example Calculation
| Factor | Value |
|---|---|
| Baseline conversion rate | 2.0% |
| Post-redesign conversion rate | 2.4% |
| Monthly visitors | 100,000 |
| Average order value | €50 |
| Research + redesign cost | €25,000 |
Monthly revenue lift:
- Before: 100,000 × 2.0% × €50 = €100,000
- After: 100,000 × 2.4% × €50 = €120,000
- Lift: €20,000/month
ROI (first year):
- Annual lift: €240,000
- Cost: €25,000
- ROI: (€240,000 - €25,000) / €25,000 = 860%
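The same arithmetic as a small reusable sketch; the values mirror the table above, and the function name is purely illustrative:

```python
def research_roi(visitors: int, baseline_cr: float, new_cr: float,
                 order_value: float, cost: float, months: int = 12) -> float:
    """Return ROI as a fraction, e.g. 8.6 means 860%."""
    monthly_lift = visitors * (new_cr - baseline_cr) * order_value
    return (monthly_lift * months - cost) / cost

roi = research_roi(visitors=100_000, baseline_cr=0.020, new_cr=0.024,
                   order_value=50, cost=25_000)
print(f"{roi:.0%}")  # 860%
```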
What This Means for Practice
Benchmarking transforms UX from opinion to evidence.
- Establish baselines before any major initiative—you cannot prove improvement without a starting point
- Use n=30+ per segment for stable metrics; n=5 is for insights, not measurement
- Standardize on SUS for comparability across time and competitors
- Compare apples to apples—never benchmark live sites against prototypes
- Track over time to demonstrate cumulative impact
- Calculate ROI to secure future investment
The goal is not to produce impressive numbers. It is to produce defensible evidence that your work made a measurable difference.