Performance

AI-assisted performance testing with Locust

Classic load tests have a problem: they simulate a naive user. One login per second, one click, one POST. Reality is different — users open 5 tabs, abandon checkout for 3 minutes, come back, try 2 payments. AI can model this behaviour more realistically than a hand-written Locust script.

This article shows how to build an AI-powered Locust setup with a realistic user-behaviour model.

What Locust is and why not JMeter

Locust is a Python-based framework for distributed load testing. Compared to JMeter:

  • You write tests in Python (not by clicking in a UI).
  • Easier to distribute across workers.
  • Native support for complex user behaviour (task sets, weights).
  • Live web UI with real-time metrics.

For AI-integrated scenarios Locust is the better choice: the whole pipeline stays in one Python stack, so generated scripts and analysis code plug in directly.

Traditional vs. realistic Locust script

Classic — all users do the same thing:

from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def browse(self):
        self.client.get("/")
        self.client.get("/products")
        self.client.get("/products/42")

Reality is that 60% of users are 'window shoppers', 25% are thinking about buying and 15% actually buy. AI-modelled version:

from locust import HttpUser, TaskSet, task, between
import random

class WindowShopper(TaskSet):
    @task(10)
    def browse_homepage(self): self.client.get("/")
    @task(8)
    def view_category(self): self.client.get(f"/category/{random.randint(1,20)}")
    @task(3)
    def view_product(self): self.client.get(f"/products/{random.randint(1,500)}")
    @task(1)
    def abandon(self): self.interrupt()

class ActiveBuyer(TaskSet):
    @task(5)
    def add_to_cart(self):
        self.client.post("/cart/add", json={"product_id": random.randint(1,500)})
    @task(3)
    def view_cart(self): self.client.get("/cart")
    @task(2)
    def checkout(self):
        self.client.post("/checkout", json={"payment_method": "card"})

class FreshBrowser(TaskSet):
    # The 25% who are considering a purchase: inspect products, peek at the cart
    @task(6)
    def view_product(self): self.client.get(f"/products/{random.randint(1,500)}")
    @task(2)
    def peek_cart(self): self.client.get("/cart")
    @task(1)
    def leave(self): self.interrupt()

class RealisticUser(HttpUser):
    wait_time = between(2, 8)
    tasks = {WindowShopper: 60, ActiveBuyer: 15, FreshBrowser: 25}

The 60/25/15 split of the weights mirrors the real conversion funnel described above.
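Locust normalizes these weights into selection probabilities. A quick standalone sanity check of the split (plain Python, no Locust needed; the simulated draw is illustrative):

```python
# Sanity-check how task weights translate into selection probabilities.
import random

weights = {"WindowShopper": 60, "ActiveBuyer": 15, "FreshBrowser": 25}
total = sum(weights.values())
shares = {name: w / total for name, w in weights.items()}
print(shares)  # {'WindowShopper': 0.6, 'ActiveBuyer': 0.15, 'FreshBrowser': 0.25}

# Simulate 10 000 virtual users choosing a TaskSet by weight
random.seed(42)
picks = random.choices(list(weights), weights=list(weights.values()), k=10_000)
observed = {name: picks.count(name) / len(picks) for name in weights}
```

With enough virtual users the observed distribution converges on the configured funnel, which is why the weights, not the absolute numbers, are what matter.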

Where AI comes into play

You get this data not from your head, but from Google Analytics / Mixpanel events. AI processes it and generates a Locust script:

> Pull the last 7 days of events from cypress/data/ga-exports/.
Analyze the user flows and generate a Locust script that simulates
realistic behaviour, with weights matching the actual event
frequencies. Take the session-duration distribution into account.

Claude reads the CSV export from GA, counts transitions between pages and generates TaskSets with correct weights.
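The transition-counting step can be sketched with pandas. The column names (`session_id`, `page`) and the inline toy data are assumptions standing in for a real GA export, which you would load with `pd.read_csv(...)`:

```python
# Count page-to-page transitions per session; the counts become
# candidate @task weights. Toy data stands in for the real GA export.
import pandas as pd

events = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2, 3],
    "page":       ["/", "/products", "/cart", "/", "/products", "/"],
})

# Next page visited within the same session (NaN at session end)
events["next_page"] = events.groupby("session_id")["page"].shift(-1)

transitions = (
    events.dropna(subset=["next_page"])
          .groupby(["page", "next_page"])
          .size()
          .sort_values(ascending=False)
)
print(transitions)  # most frequent flows first
```

The relative frequencies of these transitions are exactly what the generated TaskSets encode as `@task` weights.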

Anomaly detection during the run

Every run produces time series of RPS, p95 latency and error rate. An AI anomaly detector (e.g. Prophet or a simple Isolation Forest) flags the outliers:

import pandas as pd
from sklearn.ensemble import IsolationForest

# Load metrics from the Locust stats history CSV
# (column names are simplified here; check the headers your Locust version writes)
df = pd.read_csv('locust_stats_history.csv')
features = df[['num_requests', 'avg_response_time', 'fail_ratio']]

model = IsolationForest(contamination=0.05)
df['anomaly'] = model.fit_predict(features)

anomalies = df[df['anomaly'] == -1]
if not anomalies.empty:
    print(f"⚠ Detected {len(anomalies)} anomalies:")
    print(anomalies[['timestamp', 'avg_response_time', 'fail_ratio']])

Anomalies usually coincide with deployment events or DB locking issues — which without AI would take hours of grepping logs.

CI integration

# Jenkinsfile stage
stage('Performance test') {
    steps {
        sh 'locust --headless -u 500 -r 20 -t 10m --html report.html'
        sh 'python3 analyze_anomalies.py'
    }
    post {
        always {
            archiveArtifacts 'report.html'
            slackSend(
                channel: '#perf-alerts',
                message: "Performance test: ${currentBuild.result}"
            )
        }
    }
}
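For the stage to actually gate the build, `analyze_anomalies.py` has to exit non-zero on failure. A minimal sketch using static budgets instead of the Isolation Forest (the thresholds and the simplified column names are assumptions to tune per service):

```python
# Hypothetical analyze_anomalies.py: fail the CI stage when the
# Locust metrics exceed a latency or error-rate budget.
import sys
import pandas as pd

P95_BUDGET_MS = 800     # assumed SLO, tune per service
MAX_FAIL_RATIO = 0.01   # at most 1% failed requests

def gate(df: pd.DataFrame) -> int:
    """Return a shell exit code: 0 = budgets held, 1 = budget exceeded."""
    over_latency = (df["avg_response_time"] > P95_BUDGET_MS).any()
    over_errors = (df["fail_ratio"] > MAX_FAIL_RATIO).any()
    return 1 if (over_latency or over_errors) else 0

if __name__ == "__main__":
    sys.exit(gate(pd.read_csv("locust_stats_history.csv")))
```

A non-zero exit makes the `sh` step fail, so Jenkins marks the build accordingly and the Slack alert carries a meaningful `currentBuild.result`.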

Real use case: school enrolment portal

A client in education has a portal where 50,000 parents sign in simultaneously in September. A classic load test with 500 stable users would not catch the spike. AI-modelled scenario:

  • Burst simulation — all at once at 07:00
  • 70% first login (cold cache), 30% session restart
  • Realistic thinking time (parents read forms 2–5 min)
  • Retry behaviour on 5xx responses
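The burst maps naturally onto Locust's LoadTestShape, whose `tick()` returns a `(user_count, spawn_rate)` tuple each second. A dependency-free sketch of that tick logic as a plain function (the ramp and hold numbers are illustrative, not the client's real figures):

```python
# Sketch of the 07:00 burst profile; this function body is what you'd
# put inside a locust.LoadTestShape.tick() implementation.
def burst_profile(run_time: float):
    """Return (user_count, spawn_rate) for a given second, or None to stop."""
    if run_time < 120:                  # everyone piles in within 2 minutes
        return (int(50_000 * run_time / 120), 500)
    if run_time < 600:                  # hold the plateau
        return (50_000, 500)
    return None                         # end of test
```

Returning `None` is how a LoadTestShape signals Locust to stop the run, which keeps the burst-plus-plateau scenario self-terminating in CI.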

Outcome: we uncovered a connection pool limit that in production would cause a crash 5 minutes after launch. Fixed before go-live.

When you don't need AI with Locust

  • A simple baseline test ('can the server handle 100 RPS?'): plain Locust without AI is enough.
  • A small application with a simple flow (login plus one operation).
  • When you have no data on real behaviour yet; start by collecting analytics first.

Want the same approach at your company? Get in touch and we'll set up a 30-minute discovery call.