Governance

Ethics of AI in QA: GDPR, hallucinations, transparency

AI in QA is not only a technical problem — it is also a governance problem. Once your team starts using Claude Code or GitHub Copilot, you are sending data to external servers. In regulated industries (banking, healthcare, insurance) that is a question for legal and compliance, not for a senior QA engineer.

This article covers the three main risks and how to mitigate each of them.

Risk 1: Production data leakage

Scenario: a tester pastes a real dataset into a Claude Code prompt so the AI can analyse 'why this user can't complete checkout'. Anthropic's logs now contain the name, email and address of a real customer.

Mitigation:

  • Enterprise plan — Anthropic and OpenAI enterprise plans guarantee that your data is not used for training. It is, however, still stored in logs.
  • Self-hosted alternative — LLMs such as Llama or Mistral running on-premise. Expensive, but for highly regulated data it is the only way.
  • Data sanitization layer — before anything is sent to the AI, run anonymize.py, which replaces PII using regexes or an ML classifier. We use Microsoft Presidio.
  • Prompt guidelines — the team keeps an internal doc of 'what never to paste into AI', backed by staff training.
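
The regex variant of such a sanitization layer can be sketched in a few lines. This is a minimal illustration only — the patterns and the placeholder format are assumptions, and a production pipeline would use an ML-based recognizer such as Microsoft Presidio rather than hand-rolled regexes:

```python
import re

# Illustrative PII patterns only — a real deployment needs far broader coverage
# (names, addresses, national IDs) and an ML classifier for free-form text.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def sanitize(text: str) -> str:
    """Replace matched PII with a typed placeholder before the text leaves the machine."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(sanitize("Customer jane.doe@example.com cannot complete checkout, phone +421 903 123 456"))
```

The point is the placement, not the patterns: the sanitizer runs as the last step before any prompt leaves the local environment, so nothing downstream (tool, log, vendor) ever sees the raw values.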

Risk 2: Hallucinations

AI produces output that looks authoritative but is false. In a test it can look like this:

// AI-hallucinated selector — this element does not exist
cy.get('[data-qa="premium-badge"]').should('be.visible');

// AI-hallucinated API response
cy.intercept('GET', '/api/users', { total: 42, users: [/* made up */] });

The test passes in a mocked environment but fails in production. Worse: the test does not even fail, it just doesn't cover what it was supposed to cover.

Mitigation:

  • Always run the test in a real staging environment before merging.
  • Human review — AI does not replace review, it complements it.
  • Negative review — for important tests, ask a second AI (different model) 'what could be wrong'. Disagreement = red flag.
  • Coverage verification — if a test passes but a bug in the covered area still escapes, the test did not actually cover that scenario.
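
A cheap automated guard against hallucinated selectors is to cross-check the selectors a test references against the application source before merge. A hedged sketch, assuming the team's data-qa attribute convention and treating file contents as plain strings — the helper name is hypothetical, not an existing tool:

```python
import re

# Selectors following the data-qa convention, e.g. [data-qa="premium-badge"].
SELECTOR_RE = re.compile(r'data-qa="([\w-]+)"')

def hallucinated_selectors(test_source: str, app_source: str) -> set:
    """Return selectors referenced in tests that never appear in the app markup."""
    used = set(SELECTOR_RE.findall(test_source))
    defined = set(SELECTOR_RE.findall(app_source))
    return used - defined

test_code = 'cy.get(\'[data-qa="premium-badge"]\').should("be.visible");'
app_markup = '<header><span data-qa="user-badge">Pro</span></header>'
print(hallucinated_selectors(test_code, app_markup))  # prints {'premium-badge'}
```

Run as a pre-merge CI step over the test and template directories, a non-empty result is an immediate red flag — exactly the failure mode the premium-badge example above demonstrates.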

Risk 3: Non-auditability

The regulator asks: 'Why did this test fail at 14:32 and get approved at 14:35?' If your QA workflow was 'AI edited it and I clicked merge', you don't have a good answer.

Mitigation:

  • Versioned prompts — every AI prompt that edited a test or code is versioned in git.
  • Separate commits — AI-generated commits are kept apart from human commits, and the PR template carries an '[AI-assisted]' tag.
  • Audit log — every AI tool with an enterprise plan (Claude Enterprise, Copilot Business) offers an audit log API. Download daily, retain for 2 years.
  • Human-in-the-loop sign-off — for regulated changes (e.g. tests touching financial calculations) a senior engineer sign-off is required.
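
The versioned-prompt and audit-log bullets can be combined into a simple local record written alongside each AI-assisted change. A sketch assuming a JSON Lines format; every field name and value here is illustrative, not a vendor schema:

```python
import datetime
import hashlib
import io
import json

def audit_record(prompt: str, model: str, author: str, pr: str) -> dict:
    """Build one audit entry: who used which model, on which PR, with which prompt."""
    return {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model": model,
        "author": author,
        "pr": pr,
        "tag": "AI-assisted",
    }

def append_record(stream, record: dict) -> None:
    """Append one JSON Lines entry; in production the stream is an append-only file."""
    stream.write(json.dumps(record, sort_keys=True) + "\n")

log = io.StringIO()  # stand-in for the archived log file
append_record(log, audit_record("Stabilize checkout regression test", "claude-sonnet", "qa.lead", "PR-1234"))
```

Hashing the prompt rather than storing it verbatim keeps the audit trail useful (it proves which versioned prompt was used) without duplicating potentially sensitive prompt content into a second store.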

Regulated industries — specific requirements

Industry                    | Key regulation         | Impact on AI in QA
Banking                     | PSD2, DORA, GDPR       | Self-hosted LLM or enterprise contract with EU data residency
Healthcare                  | HIPAA, GDPR, EHDS      | PHI must not reach external AI; rigorous anonymization pipeline
Insurance                   | Solvency II, GDPR      | AI for regression OK; AI for risk-engine logic problematic
Education (children's data) | COPPA equivalent, GDPR | Never feed real children's data to AI

Practical checklist for a QA lead

  1. Do we have a signed DPA (data processing agreement) with the AI vendor? (Anthropic, OpenAI, GitHub — yes, you get one after Enterprise signup.)
  2. Are we using an enterprise plan with the 'do not train' flag?
  3. Do we have a prompt guideline document and has the team been trained?
  4. Is a PII anonymization layer deployed before AI calls?
  5. Is the audit log archived for at least 2 years?
  6. Has the security team approved the AI workflow in a written statement?
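
Parts of this checklist can be enforced mechanically rather than by convention. A hedged sketch of a CI gate that blocks merging an '[AI-assisted]' PR without senior sign-off — the tag comes from the mitigations above, while the function and data shapes are assumptions for illustration:

```python
def merge_allowed(pr_title: str, approvers: list, seniors: set) -> bool:
    """Allow merge unless the PR is AI-assisted and lacks a senior approver."""
    if "[AI-assisted]" not in pr_title:
        return True  # human-only change: normal review rules apply
    return any(a in seniors for a in approvers)

# AI-assisted PR approved only by a non-senior reviewer: blocked.
print(merge_allowed("[AI-assisted] Fix checkout regression test", ["m.kral"], {"j.novak"}))
```

Wired into the merge pipeline, this turns the human-in-the-loop sign-off from a habit into a guarantee the auditor can verify.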

Conclusion

AI ethics in QA is not a limitation — it is a prerequisite for use. Teams that skip it accumulate debt that surfaces in the next audit. Five days invested in governance setup saves months of panic a year later.


Want the same approach at your company? Get in touch — we'll arrange a 30-minute discovery call.