Summary
LLMs are concept-transformation engines, not knowledge generators. They excel at restructuring and categorizing information you provide but struggle with novel insights and deep contextual understanding. Research shows AI can identify 77% of usability issues experts agree with, but misses 60% of unique human-identified problems. The researcher's value shifts toward strategic framing, critical validation, and influential communication.
The biggest mistake I see teams make with AI is treating it like a magic black box. They throw unstructured data in and expect coherent, reliable insights to come out. This is dangerous. We are currently spending more time figuring out how to define reliable workflows than we are saving time by using them.
Large Language Models like ChatGPT, Claude, and Gemini have become ubiquitous, and with them comes a torrent of hype, fear, and misunderstanding. It is crucial to curb the hype train with a dose of reality.
The Current State
For many agencies, vendors, and individual researchers, we are currently in an investment phase. This is not a failure of the technology; it is a natural stage in its adoption. We are not just using a tool; we are inventing the processes for that tool.
On one side, you have evangelists who claim AI will automate the entire research process, making human researchers obsolete. On the other, you have skeptics who dismiss the technology as a flawed, biased, and unreliable gimmick.
As is often the case, the reality is more nuanced.
What LLMs Actually Are
The "T" in GPT stands for Transformer [2]. This is not just a technical term; it is the most useful description of its core function.
An LLM is not a knowledge-generation machine; it is a concept-transformation engine. It is exceptionally good at taking information in one format and structuring or rephrasing it into another. It is less a generator of new facts and more a manipulator of existing concepts.
How They Work
At its core, an LLM is an advanced prediction machine. It analyzes vast amounts of text to learn statistical relationships between words. When you give it a prompt, it does not "think" or "know" the answer like a human does; it calculates the most probable next word (or "token") based on patterns it has learned.
This predictive nature is why models can "hallucinate". They are designed to generate plausible-sounding text, not to state verified facts.
Understanding this is the key to using an LLM effectively: treat it as a powerful autocomplete, search, and connector for concepts, not as a perfectly factual database.
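The prediction principle can be illustrated with a deliberately tiny sketch: count which word follows which in a corpus, then "predict" the most frequent successor. Real LLMs use neural networks over subword tokens and billions of parameters, not word counts; this toy bigram model only demonstrates the statistical idea of next-token prediction.

```python
from collections import Counter, defaultdict

# Toy bigram model: for each word, count which words follow it.
corpus = "the user clicked the button and the user left".split()

successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(word):
    """Return the most frequently observed word after `word`, or None."""
    counts = successors[word]
    return counts.most_common(1)[0][0] if counts else None

# "user" follows "the" twice, "button" once, so "user" wins.
print(predict_next("the"))
```

Note that the model confidently outputs the most *probable* continuation whether or not it is *true* for any particular context, which is exactly the mechanism behind hallucination.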
What the Research Shows
One study [1] provides a clear picture of the current state:
When experienced UX professionals evaluated usability issues suggested by an LLM:
- They agreed with 77% of the issues the AI found
- But the AI missed around 60% of the unique problems that human experts identified
This is not a failure of the technology. It is a clear signal of its proper role: assistant, not replacement.
AI excels at identifying common, pattern-based issues because it has been trained on vast amounts of data reflecting these known problems. It is less effective at:
- Uncovering novel issues
- Understanding deep contextual nuance
- Identifying subtle emotional reactions that a human observer would catch
The Evolving Role of the Researcher
These limitations highlight the evolution of the researcher's role. The value of the human researcher shifts away from the tedious work of raw analysis and toward the strategic work that AI cannot do:
Strategic Framing
Asking the right questions and designing sound research in the first place. AI cannot tell you what questions matter; that requires understanding the business context and user landscape.
Critical Validation
Questioning the AI's output, spotting its biases, and separating signal from noise. The AI produces drafts; you produce judgment.
Foundational models are often trained to be helpful and agreeable, a trait known as "sycophancy". To get objective results, you must turn an agreeable assistant into a critical sparring partner.
Influential Communication
Translating findings into clear, actionable recommendations that drive business decisions. The political and organizational skill of getting insights implemented remains distinctly human.
Best Use Cases for LLMs in Research
Based on experience, here are the tasks where LLMs provide the most reliable value:
| Task | Why It Works |
|---|---|
| Tagging and Thematic Analysis | Systematically categorizing qualitative data based on a taxonomy you provide |
| Generative Ideation | Exploring ideas for target groups, segments, or research questions based on a brief |
| Instrument Stress-Testing | Reviewing interview guides or survey questions for structural issues |
| Code Generation | Writing Python or R scripts for quantitative analysis |
| Translation and Localization | Initial translations for cross-cultural research (with human review) |
| Communication Polish | Feedback on reports and clearer ways to present findings |
| Efficiency Gains | Reducing time on repetitive tasks (see ROI of UX Research) |
Practical Workflow: Thematic Analysis with an LLM
Here is a concrete, step-by-step workflow for one of the most common AI-assisted research tasks: thematic analysis of qualitative data.
Step 1: Prepare Tidy Data
The biggest mistake is feeding unstructured transcripts into an LLM. Instead, use "Tidy Data" principles. Create a simple table where every row is a participant quote and columns represent metadata (participant ID, task context, timestamp). Anonymize all PII (Personally Identifiable Information) before upload.
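One way to sketch this preparation step: build the quote table in code and scrub PII before anything leaves your machine. The column names, sample quotes, and the email regex below are illustrative assumptions, not a standard; a real pipeline would also handle names, phone numbers, and account IDs.

```python
import csv
import io
import re

# Illustrative PII scrubber: replaces email addresses with a placeholder.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub(text):
    """Mask email addresses. Extend with rules for names, phones, etc."""
    return EMAIL.sub("[EMAIL]", text)

# One row per participant quote; columns are metadata (Tidy Data).
quotes = [
    ("P01", "checkout", "00:04:12", "I gave up and emailed help@shop.example instead."),
    ("P02", "checkout", "00:02:55", "The coupon field was impossible to find."),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["participant_id", "task", "timestamp", "quote"])
for pid, task, ts, quote in quotes:
    writer.writerow([pid, task, ts, scrub(quote)])

tidy_csv = buffer.getvalue()
print(tidy_csv)
```

The resulting CSV is what you paste or upload, never the raw transcript.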
Step 2: Engineer a Structured Prompt
Do not ask the AI to "find insights." Give it a mechanical task with explicit constraints:
- Role: "You are a meticulous UX Researcher."
- Task: "Categorize each user quote based on the taxonomy provided below."
- Taxonomy: Provide strict definitions (e.g., "Usability," "Feature Request," "Trust/Security").
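The three parts above can be assembled into a single structured prompt. The taxonomy definitions and the tab-separated output format here are illustrative assumptions; the point is that role, task, taxonomy, and data are explicit and machine-checkable, not "find insights."

```python
# Hypothetical taxonomy with strict, mutually exclusive definitions.
TAXONOMY = {
    "Usability": "Friction completing a task (confusion, errors, dead ends).",
    "Feature Request": "An explicit ask for new or changed functionality.",
    "Trust/Security": "Concerns about data, payment, or credibility.",
}

def build_prompt(rows):
    """Assemble a role + task + taxonomy + data prompt for tagging."""
    taxonomy_lines = "\n".join(f"- {k}: {v}" for k, v in TAXONOMY.items())
    data_lines = "\n".join(f"{pid}\t{quote}" for pid, quote in rows)
    return (
        "You are a meticulous UX Researcher.\n"
        "Categorize each user quote using ONLY the taxonomy below. "
        "Return one line per quote: participant_id<TAB>category.\n\n"
        f"Taxonomy:\n{taxonomy_lines}\n\n"
        f"Quotes:\n{data_lines}"
    )

prompt = build_prompt([("P01", "I couldn't find the coupon field.")])
print(prompt)
```

Constraining the output format also makes the response trivially parseable, which matters for the next step.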
Step 3: The Committee of Raters
To increase reliability, use multiple models (e.g., GPT-4 and Claude) as a "Committee of Raters." Feed them the same data and prompt.
- Where they agree, you have high confidence.
- Where they disagree, you have a signal for nuance that requires human review.
This approach mirrors traditional inter-rater reliability practices in qualitative research, using AI disagreement as a flag for human attention rather than a failure.
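The comparison itself is mechanical once both models have returned tags. In this sketch the model outputs are hard-coded stand-ins for real API responses; agreements become high-confidence labels, and disagreements go on a review list for a human.

```python
# Stand-in outputs from two models tagging the same three quotes.
tags_model_a = {"P01": "Usability", "P02": "Feature Request", "P03": "Usability"}
tags_model_b = {"P01": "Usability", "P02": "Trust/Security", "P03": "Usability"}

agreed, review = {}, []
for quote_id in sorted(tags_model_a):
    a, b = tags_model_a[quote_id], tags_model_b[quote_id]
    if a == b:
        agreed[quote_id] = a              # high confidence: keep the label
    else:
        review.append((quote_id, a, b))   # disagreement: human review

print("agreed:", agreed)
print("needs review:", review)
```

Disagreement here is not an error condition; it is the signal you were looking for.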
Step 4: Human Validation (The Nuance Check)
The AI sees text; you saw the session. Perform a "Nuance Check" on the output:
- Sarcasm: Did the user say "Great job" with an eye-roll? AI will tag that as "Positive Sentiment." You must correct it.
- Silence: Did the user hesitate before clicking? AI cannot see silence.
- Context: Did the user's frustration stem from the interface or from an unrelated interruption during the session?
What This Means for Practice
The goal is not to replace your judgment with AI but to use AI to amplify your judgment. The most effective researchers will be those who:
- Understand what LLMs are actually good at (transformation, not generation)
- Provide structured inputs that play to those strengths
- Maintain rigorous human oversight of all outputs
- Focus their own energy on the strategic work AI cannot do
This is not about learning a specific tool; tools will change. It is about learning a way of thinking about human-AI partnership that will outlast any particular model or platform.
References
- [1]
- [2] Vaswani, A., et al. (2017). "Attention Is All You Need". Advances in Neural Information Processing Systems.