
AI-powered test data generation — from fixtures to E2E scenarios

Where do you find real user data for testing that isn't a GDPR nightmare? Nowhere. You either anonymise production data (expensive and laborious) or you generate synthetic data. AI has made the second option practical.

This article covers 3 levels of generation: fixtures, factory patterns, and E2E data scenarios.

Level 1: AI-generated fixtures

A fixture is a static JSON file in cypress/fixtures/ that a test loads. AI can generate consistent, schema-aware fixtures in seconds.

> Generate cypress/fixtures/users/premium-users.json
containing 20 users with the following schema:

{
  id: UUID v4,
  email: unique address on a test domain (e.g. example.sk),
  name: { first, last } (Slovak names),
  subscription: { plan: "premium", validUntil: ISO date in future },
  preferences: { language: "sk" | "en", marketing: boolean }
}

Spread the validity dates between 30 and 400 days from today.

Claude generates valid JSON you can use in a test immediately: cy.fixture('users/premium-users.json').
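AI output is fast, but nothing guarantees it matches the schema, so it is worth checking a generated fixture before committing it. Below is a minimal sketch, assuming the fixture is a plain JSON array shaped like the prompt above; `PremiumUser` and `validatePremiumUsers` are hypothetical names. It reports malformed UUIDs, duplicate ids or emails, and validUntil dates outside the requested 30 to 400 day window.

```typescript
// Hypothetical validator for AI-generated premium-user fixtures.
interface PremiumUser {
  id: string;
  email: string;
  name: { first: string; last: string };
  subscription: { plan: string; validUntil: string };
  preferences: { language: string; marketing: boolean };
}

// Version-4 UUID: third group starts with 4, fourth group with 8, 9, a or b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns human-readable problems; an empty array means the fixture passes.
export function validatePremiumUsers(users: PremiumUser[], now: Date = new Date()): string[] {
  const errors: string[] = [];
  const seenIds = new Set<string>();
  const seenEmails = new Set<string>();

  users.forEach((u, i) => {
    if (!UUID_V4.test(u.id)) errors.push(`#${i}: id is not a UUID v4`);
    if (seenIds.has(u.id)) errors.push(`#${i}: duplicate id`);
    seenIds.add(u.id);

    // Emails are compared case-insensitively.
    const email = u.email.toLowerCase();
    if (seenEmails.has(email)) errors.push(`#${i}: duplicate email`);
    seenEmails.add(email);

    if (u.subscription.plan !== 'premium') errors.push(`#${i}: plan is not premium`);

    const validUntil = Date.parse(u.subscription.validUntil);
    const daysAhead = (validUntil - now.getTime()) / DAY_MS;
    if (Number.isNaN(validUntil) || daysAhead < 30 || daysAhead > 400) {
      errors.push(`#${i}: validUntil is not 30 to 400 days ahead`);
    }

    if (!['sk', 'en'].includes(u.preferences.language)) {
      errors.push(`#${i}: unsupported language`);
    }
  });

  return errors;
}
```

Run it from a unit test or a pre-commit script and fail when the array is non-empty. Because the fixture is static, its dates age: a later run will flag entries that have drifted below 30 days, which is a useful reminder to regenerate the file.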

Level 2: Factory patterns

A factory builds a user with realistic defaults and lets each test override individual fields. Use @faker-js/faker for the data and AI to write the factory methods:

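// cypress/support/factories/userFactory.ts
import { faker } from '@faker-js/faker/locale/sk';

export interface User {
  id: string;
  email: string;
  firstName: string;
  lastName: string;
  phone: string;
  address: { street: string; city: string; zip: string };
  subscription?: { plan: string; validUntil: string };
}

export const userFactory = {
  // Base user with Slovak-locale data; any top-level field can be overridden.
  build(overrides: Partial<User> = {}): User {
    return {
      id: faker.string.uuid(),
      email: faker.internet.email({ provider: 'example.sk' }),
      firstName: faker.person.firstName(),
      lastName: faker.person.lastName(),
      // replaceSymbols turns each '#' into a random digit; the format-string
      // argument of phone.number() is deprecated in newer faker releases.
      phone: faker.helpers.replaceSymbols('+421 9## ### ###'),
      address: {
        street: faker.location.streetAddress(),
        city: faker.location.city(),
        zip: faker.location.zipCode(),
      },
      ...overrides,
    };
  },

  // Premium user whose subscription expires within the next year.
  buildPremium(overrides: Partial<User> = {}): User {
    return this.build({
      subscription: {
        plan: 'premium',
        validUntil: faker.date.future({ years: 1 }).toISOString(),
      },
      ...overrides,
    });
  },
};

// Use in test:
const user = userFactory.build({ email: 'test@example.sk' });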

Once the factory is in the repository, the AI recognises userFactory.buildPremium() when writing new test scenarios and uses it without being asked.
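One caveat: the factory applies overrides with a shallow spread (`...overrides`), so overriding a nested object such as `address: { city: 'Košice' }` replaces the whole address and drops the street and zip. If your tests need partial nested overrides, a small deep-merge helper avoids that. Below is a dependency-free sketch; `DeepPartial` and `mergeOverrides` are hypothetical names, and arrays are deliberately replaced rather than merged.

```typescript
// Recursively optional version of T, so a test can override one nested field.
type DeepPartial<T> = {
  [K in keyof T]?: T[K] extends object ? DeepPartial<T[K]> : T[K];
};

// Only plain objects are merged; arrays, Dates and class instances are replaced.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (typeof value !== 'object' || value === null) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Returns a new object; neither base nor overrides is mutated.
export function mergeOverrides<T extends object>(base: T, overrides: DeepPartial<T>): T {
  const result = { ...base } as unknown as Record<string, unknown>;
  for (const [key, value] of Object.entries(overrides)) {
    if (value === undefined) continue; // an explicit undefined keeps the default
    const current = result[key];
    result[key] =
      isPlainObject(current) && isPlainObject(value)
        ? mergeOverrides(current, value) // recurse into nested objects
        : value; // primitives and arrays replace the default
  }
  return result as unknown as T;
}
```

Used with the factory, `mergeOverrides(userFactory.build(), { address: { city: 'Košice' } })` keeps the generated street and zip and changes only the city.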

Level 3: E2E data scenarios (more advanced)

A complex E2E test needs consistent data across three or more entities: a user plus their orders, payment history and refund scenarios. Writing this by hand takes hours; AI generates it from a high-level prompt:

> Generate seed data for the E2E scenario "refund process":

The scenario needs:
- 1 user (premium plan, valid payment method)
- 3 orders from the last 30 days (different statuses: delivered,
  in-transit, cancelled)
- 1 order with a damaged product, eligible for a refund
- An admin user with refund permissions

Output:
- cypress/fixtures/scenarios/refund.json: the data
- cypress/support/scenarios/refund.ts: a setup helper
  (API calls that seed the data)

The output is a complete setup that the test invokes through a custom command, cy.setupScenario('refund').
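The value of a generated scenario lies in the entities referencing each other consistently, and that is exactly what is easy to get subtly wrong. Below is a minimal sketch of an integrity check to run on the AI's refund.json before seeding. The type names, fields such as `damaged`, and the `'refund'` permission string are hypothetical, not a fixed schema; it also assumes every order, including the damaged one, falls within the last 30 days. The `cy.setupScenario` command itself is not shown.

```typescript
// Hypothetical shape of cypress/fixtures/scenarios/refund.json.
type OrderStatus = 'delivered' | 'in-transit' | 'cancelled';

interface ScenarioUser { id: string; role: 'customer' | 'admin'; permissions: string[] }
interface ScenarioOrder { id: string; userId: string; status: OrderStatus; createdAt: string; damaged: boolean }
interface RefundScenario { users: ScenarioUser[]; orders: ScenarioOrder[] }

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a list of problems; an empty list means the scenario is internally consistent.
export function checkRefundScenario(s: RefundScenario, now: Date = new Date()): string[] {
  const problems: string[] = [];
  const userIds = new Set(s.users.map((u) => u.id));

  for (const o of s.orders) {
    // Every order must belong to a user seeded by the same scenario.
    if (!userIds.has(o.userId)) problems.push(`order ${o.id}: unknown user ${o.userId}`);
    const ageDays = (now.getTime() - Date.parse(o.createdAt)) / DAY_MS;
    if (Number.isNaN(ageDays) || ageDays < 0 || ageDays > 30) {
      problems.push(`order ${o.id}: not within the last 30 days`);
    }
  }

  // The prompt asks for one order in each of these statuses.
  const statuses = new Set(s.orders.map((o) => o.status));
  for (const required of ['delivered', 'in-transit', 'cancelled'] as const) {
    if (!statuses.has(required)) problems.push(`missing an order with status ${required}`);
  }

  if (!s.orders.some((o) => o.damaged)) problems.push('no damaged order to refund');
  if (!s.users.some((u) => u.role === 'admin' && u.permissions.includes('refund'))) {
    problems.push('no admin with refund permission');
  }
  return problems;
}
```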

GDPR-safe patterns

This is especially critical in insurance, healthcare and banking:

  • Never use production data. Even anonymised data can be de-anonymised.
  • Locale-respecting data. Slovak names, Slovak addresses, Slovak birth numbers (with a valid check digit).
  • Synthetically generated birth numbers. faker.cz has a birth-number generator; for SK we build a helper that follows the YYMMDD/XXXX format with a valid check digit.
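  • Avoid real emails. Use non-deliverable domains such as example.sk or test.local; the names reserved for this purpose by RFC 2606 (example.com, or anything under .test) are the safest choice. If you need deliverability, Mailgun provides a sandbox domain.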
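The modern Slovak (and Czech) birth number has ten digits in the form YYMMDD/XXXX, women have 50 added to the month, and the whole ten-digit number must be divisible by 11. Below is a minimal sketch of such a helper, assuming births from 1954 onward; `generateBirthNumber` and `isValidBirthNumber` are hypothetical names, and legacy cases (nine-digit numbers from before 1954 and older numbers issued under a remainder-10 exception) are deliberately left out. A synthetic number can still coincide with a real one by chance, so it must never be linked to real personal data.

```typescript
// Hypothetical helper: synthetic birth number (rodné číslo), format YYMMDD/XXXX.
// Assumes the post-1954 ten-digit form, where the full number is divisible by 11.
export function generateBirthNumber(
  birthDate: Date,
  sex: 'M' | 'F',
  random: () => number = Math.random,
): string {
  const yy = birthDate.getUTCFullYear() % 100;
  const mm = birthDate.getUTCMonth() + 1 + (sex === 'F' ? 50 : 0); // women: month + 50
  const dd = birthDate.getUTCDate();
  const datePart = [yy, mm, dd].map((n) => String(n).padStart(2, '0')).join('');

  while (true) {
    const serial = String(Math.floor(random() * 1000)).padStart(3, '0');
    const firstNine = Number(datePart + serial);
    // Appending c = p % 11 makes 10p + c divisible by 11,
    // because 10p + c leaves the same remainder as 10p + p = 11p.
    const check = firstNine % 11;
    if (check === 10) continue; // no single digit works; draw another serial
    return `${datePart}/${serial}${check}`;
  }
}

// Accepts YYMMDD/XXXX or YYMMDDXXXX and checks divisibility by 11 only.
export function isValidBirthNumber(value: string): boolean {
  const match = /^(\d{6})\/?(\d{4})$/.exec(value);
  if (!match) return false;
  return Number(match[1] + match[2]) % 11 === 0;
}
```

Injecting `random` keeps the helper deterministic in unit tests, while the default `Math.random` is fine for seed data.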

Schema-aware AI: inspiration from OpenAPI

If you have an OpenAPI spec for your backend, AI can generate fixtures for every GET endpoint automatically:

> Look at api/openapi.yaml. For every GET endpoint,
generate 1 fixture file in cypress/fixtures/api/
with a realistic response that follows the schema.
Use the sk locale and GDPR-safe data.

Claude reads the YAML, understands the schemas, and generates consistent JSON responses. 20+ fixtures in 60 seconds.
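It is also worth checking that the AI really produced one fixture per GET endpoint. Below is a minimal sketch that works on an already-parsed spec object (parse the YAML with a library of your choice). The `fixtureNameFor` naming convention, where `/users/{id}` becomes `users_id.json`, is a hypothetical choice; adapt it to whatever file names your prompt asked for.

```typescript
// Only the part of an OpenAPI document this check needs.
type HttpMethod = 'get' | 'post' | 'put' | 'patch' | 'delete';
interface OpenApiDoc {
  paths: Record<string, Partial<Record<HttpMethod, unknown>>>;
}

// Hypothetical convention: '/users/{id}/orders' -> 'users_id_orders.json'.
export function fixtureNameFor(path: string): string {
  const slug = path
    .split('/')
    .filter((segment) => segment.length > 0)
    .map((segment) => segment.replace(/[{}]/g, '')) // drop path-parameter braces
    .join('_');
  return `${slug || 'root'}.json`;
}

// Returns the API paths that have a GET operation but no matching fixture file.
export function missingGetFixtures(doc: OpenApiDoc, existingFiles: string[]): string[] {
  const existing = new Set(existingFiles);
  return Object.entries(doc.paths)
    .filter(([path, operations]) => operations.get !== undefined && !existing.has(fixtureNameFor(path)))
    .map(([path]) => path);
}
```

In a Node script you would pass it the parsed api/openapi.yaml and the result of `fs.readdirSync('cypress/fixtures/api')`, then fail the check if the returned list is not empty.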

Multi-tenant patterns

SaaS apps need to isolate data between tenants. The factory should be tenant-aware:

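import { faker } from '@faker-js/faker/locale/sk';

export const orderFactory = {
  // Prefixing the id with the tenant keeps every record traceable to its tenant.
  buildForTenant(tenantId: string, overrides = {}) {
    return {
      id: `${tenantId}-${faker.string.uuid()}`,
      tenantId,
      // ... other fields
      ...overrides,
    };
  },
};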

When you ask an AI generator for E2E test data, always give it the tenant context, for example: "seed data for tenant 'acme-corp' so that it is isolated from other tests". The AI respects it.
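Two habits keep parallel tests from seeing each other's data: give every test run its own tenant id, and assert that a generated seed never leaks records into another tenant. Below is a minimal sketch that reuses the `${tenantId}-...` id convention from orderFactory; `uniqueTenantId` and `assertSingleTenant` are hypothetical helpers, and the random suffix is meant only for test isolation, not for security.

```typescript
interface TenantScoped { id: string; tenantId: string }

// Derives a per-run tenant such as 'acme-corp-k3f9x2' so reruns and parallel workers don't collide.
export function uniqueTenantId(base: string, random: () => number = Math.random): string {
  const suffix = random().toString(36).slice(2, 8).padEnd(6, '0');
  return `${base}-${suffix}`;
}

// Throws if any record belongs to another tenant or its id lacks the tenant prefix.
export function assertSingleTenant(tenantId: string, records: TenantScoped[]): void {
  for (const r of records) {
    if (r.tenantId !== tenantId || !r.id.startsWith(`${tenantId}-`)) {
      throw new Error(`record ${r.id} leaks outside tenant ${tenantId}`);
    }
  }
}
```

Calling `assertSingleTenant` on the AI's output before seeding turns "AI respects it" from an expectation into a checked guarantee.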

Practical ROI

  • Time to write a seed file: down from 45 minutes on average to 4 minutes.
  • Fixture consistency between tests: higher, because the AI follows the project's conventions.
  • GDPR compliance: we do not touch production data.

Want the same approach at your company? Get in touch and we will set up a 30-minute discovery call.