Ethics of AI in QA: GDPR, hallucinations, transparency
AI in QA is not only a technical problem — it is also a governance problem. Once your team starts using Claude Code or GitHub Copilot, you are sending data to external servers. In regulated industries (banking, healthcare, insurance) that is a question for legal and compliance, not one a senior QA engineer can settle alone.
This article covers the 3 main risks and how to mitigate them.
Risk 1: Production data leakage
Scenario: a tester pastes a real dataset into a Claude Code prompt so the AI can analyse 'why this user can't complete checkout'. The Anthropic log now contains the name, email and address of a real customer.
Mitigation:
- Anthropic / OpenAI enterprise plans guarantee that data is not used for training, but it is still stored in logs.
- Self-hosted alternative — LLMs like Llama or Mistral running on-premise. Expensive, but for highly regulated data it is the only option.
- Data sanitization layer — before anything is sent to the AI, run `anonymize.py`, which replaces PII using regexes or an ML classifier. We use Microsoft Presidio (a minimal sketch follows this list).
- Prompt guidelines — the team maintains an internal 'what never to paste into AI' doc, plus staff training.
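To make the sanitization layer concrete, here is a minimal sketch built on Microsoft Presidio. The entity list and the example prompt are illustrative; tune them to your own data.

```python
# Minimal PII sanitization sketch built on Microsoft Presidio.
# Requires the presidio-analyzer and presidio-anonymizer packages;
# the entity list and the sample prompt below are illustrative.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def sanitize(text: str) -> str:
    """Replace detected PII with placeholders before the text reaches an external LLM."""
    results = analyzer.analyze(
        text=text,
        entities=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "LOCATION"],
        language="en",
    )
    return anonymizer.anonymize(text=text, analyzer_results=results).text

prompt = "Why can't Jane Novak (jane.novak@example.com) complete checkout?"
print(sanitize(prompt))  # PII is replaced with <PERSON>, <EMAIL_ADDRESS>, ...
```

The default anonymizer substitutes generic placeholders like `<PERSON>`; if the AI needs to tell users apart, swap in a pseudonymization operator instead.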
Risk 2: Hallucinations
AI produces output that looks authoritative but is false. In a test it can look like this:
```js
// AI-hallucinated selector: this element does not exist
cy.get('[data-qa="premium-badge"]').should('be.visible');

// AI-hallucinated API response
cy.intercept('GET', '/api/users', { total: 42, users: [/* made up */] });
```
The test passes in the mocked environment but fails against the real application. Worse, it often does not fail at all: it simply no longer covers what it was supposed to cover.
Mitigation:
- Always run the test in a real staging environment before merging.
- Human review — AI does not replace review, it complements it.
- Negative review — for important tests, ask a second AI (a different model) what could be wrong with them; disagreement is a red flag. A sketch of this step follows the list.
- Coverage verification — if a test passes but a bug in the same area still escapes, the test did not actually cover that scenario.
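A minimal sketch of the negative-review step, assuming the primary assistant is Claude Code and the second opinion comes from a different vendor via the official OpenAI Python SDK. The model name, prompt wording and file path are placeholders.

```python
# Negative review: send an AI-written test to a second, different model
# and ask it what might be hallucinated. Assumes the openai package and an
# OPENAI_API_KEY in the environment; model name and prompt are illustrative.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def negative_review(test_path: str) -> str:
    test_code = Path(test_path).read_text()
    prompt = (
        "You are reviewing a Cypress test that was written with AI assistance. "
        "List every selector, intercepted route, and assertion that could be "
        "hallucinated, i.e. not backed by the real application. Be skeptical.\n\n"
        + test_code
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; use whichever second model your team trusts
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(negative_review("cypress/e2e/checkout.cy.js"))  # hypothetical path
```

If the two models disagree about what the test actually verifies, a human reviewer settles it before merge.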
Risk 3: Non-auditability
The regulator asks: 'Why did this test fail at 14:32 and get approved at 14:35?' If your QA workflow was 'AI edited it and I clicked merge', you don't have a good answer.
Mitigation:
- Versioned prompts — every AI prompt that edited a test or code is versioned in git.
- AI-generated commit messages separated from human commit messages. In the PR template: '[AI-assisted]' tag.
- Audit log — every AI tool with an enterprise plan (Claude Enterprise, Copilot Business) offers an audit log API. Download it daily and retain it for 2 years (a download sketch follows this list).
- Human-in-the-loop sign-off — for regulated changes (e.g. tests touching financial calculations) a senior engineer sign-off is required.
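To make the daily download concrete, here is a minimal sketch that archives an organization's audit log via GitHub's REST endpoint, which is available on enterprise plans. The org name, token variable and archive path are placeholders; an export from Claude Enterprise would follow the same pattern.

```python
# Daily audit log archival sketch, assuming GitHub's organization audit log
# REST endpoint (enterprise plans only). Org name, token variable, and
# archive directory are placeholders.
import datetime
import json
import os
import pathlib

import requests

ORG = "your-org"                          # placeholder
ARCHIVE_DIR = pathlib.Path("audit-logs")  # retain contents for at least 2 years

def download_audit_log() -> list:
    events = []
    url = f"https://api.github.com/orgs/{ORG}/audit-log"
    headers = {
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    }
    params = {"per_page": 100}  # add a phrase filter (e.g. Copilot events) as needed
    while url:
        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()
        events.extend(response.json())
        url = response.links.get("next", {}).get("url")  # follow pagination
        params = None  # the "next" URL already carries its query string
    return events

if __name__ == "__main__":
    ARCHIVE_DIR.mkdir(exist_ok=True)
    today = datetime.date.today().isoformat()
    (ARCHIVE_DIR / f"{today}.json").write_text(json.dumps(download_audit_log(), indent=2))
```

Schedule it from CI or cron; the important part is that the archive is append-only and outlives the vendor's own retention window.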
Regulated industries — specific requirements
| Industry | Key regulation | Impact on AI in QA |
|---|---|---|
| Banking | PSD2, DORA, GDPR | Self-hosted LLM or enterprise contract with EU data residency |
| Healthcare | HIPAA, GDPR, EHDS | PHI must not reach external AI; rigorous anonymization pipeline |
| Insurance | Solvency II, GDPR | AI for regression OK; AI for risk-engine logic problematic |
| Education (children's data) | COPPA equivalent, GDPR | Never feed real children's data to AI |
Practical checklist for a QA lead
- Do we have a signed DPA (data processing agreement) with the AI vendor? (Anthropic, OpenAI, GitHub — yes, you get one after Enterprise signup.)
- Are we using an enterprise plan with the 'do not train' flag?
- Do we have a prompt guideline document and has the team been trained?
- Is a PII anonymization layer deployed before AI calls?
- Is the audit log archived for at least 2 years?
- Has the security team approved the AI workflow in a written statement?
Conclusion
AI ethics in QA is not a limitation — it is a prerequisite for use. Teams that skip it accumulate debt that shows up in an audit. Five days invested in governance setup saves months of panicking a year later.
Want the same approach at your company? Get in touch and we'll arrange a 30-minute discovery call.