AI-powered test data generation — from fixtures to E2E scenarios
Where do you find real user data for testing that isn't a GDPR nightmare? Nowhere. You either have it anonymised (expensive, laborious) or you generate it. AI has made synthetic data practical to use.
This article covers 3 levels of generation: fixtures, factory patterns, and E2E data scenarios.
Level 1: AI-generated fixtures
A fixture is a static JSON file in cypress/fixtures/ that a test loads. AI can generate consistent, schema-aware fixtures in seconds.
> Generate cypress/fixtures/users/premium-users.json
containing 20 users with the following schema:
{
  id: UUID v4,
  email: unique gmail address,
  name: { first, last } (Slovak names),
  subscription: { plan: "premium", validUntil: ISO date in the future },
  preferences: { language: "sk" | "en", marketing: boolean }
}
Spread the validity dates from 30 to 400 days from today.
Claude generates valid JSON you can use in a test immediately: cy.fixture('users/premium-users.json').
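Generated JSON is only useful if it actually matches what the prompt asked for, so it can be worth checking mechanically. The following is a minimal sketch, not part of any library: `PremiumUser` and `validatePremiumUsers` are hypothetical names, and the rules mirror the prompt above (UUID v4 ids, unique emails, validUntil 30 to 400 days ahead).

```typescript
// Hypothetical shape of one record in premium-users.json, taken from the prompt above.
interface PremiumUser {
  id: string;
  email: string;
  name: { first: string; last: string };
  subscription: { plan: "premium"; validUntil: string };
  preferences: { language: "sk" | "en"; marketing: boolean };
}

const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a list of human-readable problems; an empty list means the fixture passed.
function validatePremiumUsers(users: PremiumUser[], now: Date = new Date()): string[] {
  const problems: string[] = [];
  const seenEmails: { [email: string]: boolean } = {};

  users.forEach((user, index) => {
    if (!UUID_V4.test(user.id)) {
      problems.push("#" + index + ": id is not a UUID v4");
    }

    // Compare emails case-insensitively so "A@x" and "a@x" count as duplicates.
    const email = user.email.toLowerCase();
    if (seenEmails[email]) {
      problems.push("#" + index + ": duplicate email " + email);
    }
    seenEmails[email] = true;

    const daysAhead = (Date.parse(user.subscription.validUntil) - now.getTime()) / DAY_MS;
    // An unparseable date yields NaN, which also fails this range check.
    if (!(daysAhead >= 30 && daysAhead <= 400)) {
      problems.push("#" + index + ": validUntil is not 30-400 days ahead");
    }
  });

  return problems;
}
```

Because the dates are relative to the day of generation, a static fixture drifts out of the 30 to 400 day window over time. Running a check like this in CI is one way to notice when the fixture needs regenerating.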
Level 2: Factory patterns
A factory builds a fresh user at test time and accepts overrides for individual fields. Use @faker-js/faker for the random values and let AI write the factory methods:
// cypress/support/factories/userFactory.ts
import { faker } from '@faker-js/faker/locale/sk';

export const userFactory = {
  // Builds a user with random Slovak-locale data; overrides win over generated fields.
  build(overrides = {}) {
    return {
      id: faker.string.uuid(),
      email: faker.internet.email({ provider: 'example.sk' }),
      firstName: faker.person.firstName(),
      lastName: faker.person.lastName(),
      // replaceSymbols swaps each '#' for a random digit; the string-format
      // argument of faker.phone.number() is deprecated in v8 and gone in v9.
      phone: faker.helpers.replaceSymbols('+421 9## ### ###'),
      address: {
        street: faker.location.streetAddress(),
        city: faker.location.city(),
        zip: faker.location.zipCode(),
      },
      ...overrides,
    };
  },

  // Same as build(), plus a premium subscription valid for up to one year.
  buildPremium(overrides = {}) {
    return this.build({
      subscription: {
        plan: 'premium',
        validUntil: faker.date.future({ years: 1 }).toISOString(),
      },
      ...overrides,
    });
  },
};

// Use in test:
const user = userFactory.build({ email: 'test@example.sk' });
Once the factory is in the codebase, AI picks up userFactory.buildPremium() and uses it in new test scenarios without being told to.
Level 3: E2E data scenarios (more advanced)
A complex E2E test needs consistent data across 3+ entities: user + their orders + payment history + refund scenarios. Writing it by hand = hours. AI generates it from a high-level prompt:
> Generate seed data for the E2E scenario "refund process".
The scenario needs:
- 1 user (premium plan, valid payment method)
- 3 orders from the last 30 days (different statuses: delivered, in-transit, cancelled)
- 1 order with a damaged product, suitable for a refund
- an admin user with refund permissions
Output:
- cypress/fixtures/scenarios/refund.json: the data
- cypress/support/scenarios/refund.ts: a setup helper (API calls for seeding)
The output is a complete setup you invoke in the test via cy.setupScenario('refund').
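To make the idea of a setup helper concrete, here is a rough sketch of the part that does not depend on Cypress: turning the scenario data into an ordered list of seed calls. Everything named here is an assumption for illustration, including the `RefundScenario` shape and the `/test-api/users` and `/test-api/orders` endpoints; a real backend will differ.

```typescript
// Hypothetical shape of cypress/fixtures/scenarios/refund.json.
interface RefundScenario {
  customer: { email: string; plan: string };
  admin: { email: string; permissions: string[] };
  orders: { ref: string; status: string; damaged: boolean }[];
}

// One API call the setup helper would make against a (hypothetical) seeding API.
interface SeedCall {
  path: string;
  body: { [key: string]: unknown };
}

// Users come first because every order references its owner by email.
function planRefundSeed(scenario: RefundScenario): SeedCall[] {
  const userCalls: SeedCall[] = [
    {
      path: "/test-api/users",
      body: { email: scenario.customer.email, plan: scenario.customer.plan },
    },
    {
      path: "/test-api/users",
      body: { email: scenario.admin.email, permissions: scenario.admin.permissions },
    },
  ];

  const orderCalls: SeedCall[] = scenario.orders.map((order) => ({
    path: "/test-api/orders",
    body: {
      ref: order.ref,
      status: order.status,
      damaged: order.damaged,
      customerEmail: scenario.customer.email,
    },
  }));

  return userCalls.concat(orderCalls);
}
```

Inside Cypress, a custom command registered with Cypress.Commands.add('setupScenario', ...) could load the fixture with cy.fixture and send each planned call with cy.request('POST', call.path, call.body). Keeping the planning step a pure function makes the seeding order easy to unit-test.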
GDPR-safe patterns
Especially critical in insurance, healthcare, banking:
- Never use production data. Even anonymised data can be de-anonymised.
- Locale-respecting data. Slovak name, Slovak address, Slovak birth number (with valid check digit).
- Birth numbers generated synthetically. faker.cz has a birth-number generator; for SK we build a helper that respects the RRMMDD/XXXX format with a valid check digit.
- Avoid real emails. Use domains like example.sk or test.local. If you need deliverability, Mailgun has a sandbox domain.
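A birth-number helper of the kind mentioned in the list could look roughly like the sketch below. It assumes the commonly described rules for 10-digit numbers issued from 1954 onwards: the month is increased by 50 for women, and the whole 10-digit number must be divisible by 11. The function names are hypothetical, and the rules should be checked against the official specification before relying on them.

```typescript
type Sex = "male" | "female";

function twoDigits(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Builds a synthetic birth number in RRMMDD/XXXX form.
// Assumption: 10-digit format (1954+), number divisible by 11.
// The calendar date itself is not validated here.
function syntheticBirthNumber(
  year: number,
  month: number,
  day: number,
  sex: Sex,
  serial: number
): string {
  if (year < 1954) {
    throw new Error("This sketch only covers 10-digit numbers (1954 onwards)");
  }
  const datePart =
    twoDigits(year % 100) + twoDigits(sex === "female" ? month + 50 : month) + twoDigits(day);

  let s = serial % 1000;
  let firstNine = Number(datePart) * 1000 + s;
  // A remainder of 10 cannot be expressed as a single check digit,
  // so move on to the next serial; its remainder is always different.
  if (firstNine % 11 === 10) {
    s = (s + 1) % 1000;
    firstNine = Number(datePart) * 1000 + s;
  }
  const checkDigit = firstNine % 11;
  return datePart + "/" + ("00" + s).slice(-3) + checkDigit;
}

// Checks the format and the divisible-by-11 rule only.
function isValidBirthNumber(value: string): boolean {
  if (!/^\d{6}\/\d{4}$/.test(value)) return false;
  return Number(value.replace("/", "")) % 11 === 0;
}
```

Passing the serial in explicitly keeps the helper deterministic, which suits fixtures; a factory could feed it a random three-digit value instead.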
Schema-aware AI: inspiration from OpenAPI
If you have the OpenAPI spec of your backend, AI can automatically generate fixtures for every endpoint:
> Look at api/openapi.yaml. For every GET endpoint, generate 1 fixture file in cypress/fixtures/api/ with a realistic response that follows the schema. Use the sk locale and GDPR-safe data.
Claude reads the YAML, understands the schemas, and generates consistent JSON responses. 20+ fixtures in 60 seconds.
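Whether a generated fixture really follows the schema can be verified rather than trusted. The sketch below is a deliberately tiny checker for a small subset of schema keywords (type, properties, required, items, enum); `MiniSchema` and `schemaProblems` are hypothetical names. A full JSON Schema validator such as Ajv is the more realistic choice for a real project.

```typescript
// Minimal subset of a JSON Schema / OpenAPI schema object; real specs use many more keywords.
interface MiniSchema {
  type: "object" | "array" | "string" | "number" | "boolean";
  properties?: { [key: string]: MiniSchema };
  required?: string[];
  items?: MiniSchema;
  enum?: string[];
}

// Returns one message per mismatch; an empty list means the value fits the schema subset.
function schemaProblems(value: unknown, schema: MiniSchema, path: string = "$"): string[] {
  if (schema.type === "array") {
    if (!Array.isArray(value)) return [path + ": expected array"];
    const itemSchema = schema.items;
    if (!itemSchema) return [];
    let problems: string[] = [];
    value.forEach((item, i) => {
      problems = problems.concat(schemaProblems(item, itemSchema, path + "[" + i + "]"));
    });
    return problems;
  }

  if (schema.type === "object") {
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      return [path + ": expected object"];
    }
    const record = value as { [key: string]: unknown };
    let problems: string[] = [];
    for (const key of schema.required || []) {
      if (!(key in record)) problems.push(path + "." + key + ": missing required property");
    }
    const props = schema.properties || {};
    for (const key of Object.keys(props)) {
      if (key in record) {
        problems = problems.concat(schemaProblems(record[key], props[key], path + "." + key));
      }
    }
    return problems;
  }

  // Primitive types: compare against the JavaScript typeof result.
  if (typeof value !== schema.type) return [path + ": expected " + schema.type];
  if (schema.enum && schema.enum.indexOf(value as string) === -1) {
    return [path + ": value not in enum"];
  }
  return [];
}
```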
Multi-tenant patterns
SaaS apps need to isolate data between tenants. The factory should be tenant-aware:
export const orderFactory = {
  buildForTenant(tenantId: string, overrides = {}) {
    return {
      // Prefixing the id with the tenant makes cross-tenant leaks easy to spot.
      id: `${tenantId}-${faker.string.uuid()}`,
      tenantId,
      // ... other fields
      ...overrides,
    };
  },
};
For an E2E test whose data comes from an AI generator, always give the tenant context in the prompt, for example: "seed data for tenant 'acme-corp' so it is isolated from other tests". AI respects it.
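Isolation is easier to trust when a test can assert it. As a small sketch, assuming records carry the `tenantId` field and the tenant-prefixed `id` produced by buildForTenant, a hypothetical helper can list every record that does not belong to the expected tenant:

```typescript
interface TenantRecord {
  id: string;
  tenantId: string;
}

// Returns the records that leak from another tenant: either the tenantId
// field differs or the id lacks the "<tenantId>-" prefix.
function findTenantLeaks<T extends TenantRecord>(records: T[], tenantId: string): T[] {
  const prefix = tenantId + "-";
  return records.filter(
    (record) => record.tenantId !== tenantId || record.id.slice(0, prefix.length) !== prefix
  );
}
```

A test can call this on the seeded or fetched data and fail when the returned list is not empty.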
Practical ROI
- Time to write a seed file: down from an average of 45 min to 4 min.
- Fixture consistency between tests: higher (AI follows convention).
- GDPR compliance: we do not touch production data.
Want the same approach at your company? Get in touch and we'll arrange a 30-minute discovery call.