
7 Security Holes We Found by Auditing Our Own AI Service

8 min read
security · ai · launch

Before launching Haamu.ai, we did something most AI startups skip: we ran adversarial security agents against our own service. Not a checklist audit — actual AI agents role-playing as penetration testers, white-hat hackers, and privacy researchers, each trying to break our product in different ways.

They found seven real vulnerabilities. Some were embarrassing. Some were critical. All of them are common in AI-powered web services. Here's what we found and the patterns behind each one, so you can check your own deployments.


1. PII leaking through error responses

Severity: Critical

When a user submits invalid input to an API, many frameworks helpfully echo the input back in the error message. FastAPI and Pydantic do this by default — the 422 validation error includes an input field containing exactly what the user sent.

For most APIs, that's fine. For a privacy service that handles sensitive text? It means PII could appear in error logs, monitoring dashboards, and HTTP responses even when the request fails. A user who submits text that's too long, or picks an invalid mode, gets their private data reflected right back in cleartext.

Fix: Override the default validation error handler to strip the input field from all 422 responses. Return the error type and location, but never the data itself.
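Here's a minimal sketch of such a handler for FastAPI with Pydantic v2 (the exact fields you keep are up to you; we drop "ctx" too, since it can also carry fragments of user data):

```python
from fastapi import FastAPI, Request, status
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse

app = FastAPI()

@app.exception_handler(RequestValidationError)
async def validation_error_handler(request: Request, exc: RequestValidationError):
    # Keep the error type, location, and message, but drop the "input"
    # echo ("ctx" can also contain pieces of what the user sent).
    safe_errors = [
        {k: v for k, v in err.items() if k not in ("input", "ctx")}
        for err in exc.errors()
    ]
    return JSONResponse(
        status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
        content={"detail": safe_errors},
    )
```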

2. PII bypass via invisible Unicode characters

Severity: Critical

Zero-width characters are Unicode code points that render as nothing — no visible output, no width, no space. But they're real characters that break pattern matching. Insert a zero-width space into an email address (john​@example.com) and most regex-based PII detectors won't match it. The text looks identical to a human reader, but the detector sees gibberish.

There are at least 15 invisible Unicode characters that can be used this way: zero-width spaces, joiners, direction marks, word joiners, soft hyphens, the BOM character, and various script-specific fillers.

Fix: Strip all known invisible characters before PII analysis. Also normalize common email obfuscation patterns — (at) to @, (dot) to ., and HTML entities like &#64; back to their real characters.
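A sketch of that normalization pass (the character set and patterns below are an illustrative subset, not our full list):

```python
import re

# Illustrative subset of invisible code points: zero-width space, non-joiner,
# joiner, word joiner, soft hyphen, direction marks, and the BOM character.
INVISIBLE_CHARS = re.compile("[\u200b\u200c\u200d\u2060\u00ad\u200e\u200f\ufeff]")

# Common obfuscation spellings rewritten before detection.
OBFUSCATIONS = [
    (re.compile(r"\s*\(at\)\s*", re.IGNORECASE), "@"),
    (re.compile(r"\s*\(dot\)\s*", re.IGNORECASE), "."),
    (re.compile(r"&#64;|&commat;"), "@"),
]

def normalize_for_pii_scan(text: str) -> str:
    """Strip invisible characters and undo common obfuscation so that
    regex-based PII detectors see the same text a human reader sees."""
    text = INVISIBLE_CHARS.sub("", text)
    for pattern, replacement in OBFUSCATIONS:
        text = pattern.sub(replacement, text)
    return text
```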

3. All users sharing one identity behind a reverse proxy

Severity: Critical

This is the most common deployment mistake in AI services. Your rate limiter uses request.client.host to identify users. Locally, that's the client's IP. In production behind Cloudflare, Nginx, or any reverse proxy? That's the proxy's internal IP — the same for every user on earth.

Result: either all users share one rate limit pool (one heavy user blocks everyone), or your rate limiter is meaningless because the "per-user" limit is actually per-server.

Fix: Use the CF-Connecting-IP header (if behind Cloudflare) or the first entry in X-Forwarded-For. Add --proxy-headers to your uvicorn startup. Never trust request.client.host in a proxied deployment.
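A sketch of that lookup, assuming a FastAPI Request and a proxy that actually sets these headers:

```python
from fastapi import Request

def get_client_ip(request: Request) -> str:
    """Resolve the real client IP behind a reverse proxy.

    Only trust these headers if your proxy sets or sanitizes them --
    a client connecting directly can forge X-Forwarded-For otherwise.
    """
    # Cloudflare sets a single, authoritative value here.
    cf_ip = request.headers.get("CF-Connecting-IP")
    if cf_ip:
        return cf_ip.strip()

    # First hop of X-Forwarded-For, assuming the proxy controls the header.
    forwarded = request.headers.get("X-Forwarded-For")
    if forwarded:
        return forwarded.split(",")[0].strip()

    # No proxy (local development): the socket peer really is the client.
    return request.client.host if request.client else "unknown"
```

With --proxy-headers enabled (and --forwarded-allow-ips set to your proxy's address), uvicorn will also rewrite request.client from these headers itself; the explicit lookup above just makes the precedence visible in application code.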

4. API documentation exposed in production

Severity: High

FastAPI ships with Swagger UI at /docs, ReDoc at /redoc, and the full OpenAPI schema at /openapi.json — all enabled by default. These are invaluable during development and a gift to attackers in production.

The OpenAPI schema reveals every endpoint, every parameter name, every enum value, and every response schema. An attacker doesn't need to guess — your API is fully documented and interactive.

Fix: Conditionally disable docs based on an environment flag: docs_url="/docs" if debug else None. Do the same for redoc_url and openapi_url.
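In code, that looks roughly like this (APP_DEBUG is a hypothetical flag name; use whatever your config layer provides):

```python
import os
from fastapi import FastAPI

# Hypothetical flag -- however you distinguish dev from production.
DEBUG = os.getenv("APP_DEBUG", "false").lower() == "true"

app = FastAPI(
    docs_url="/docs" if DEBUG else None,             # Swagger UI
    redoc_url="/redoc" if DEBUG else None,           # ReDoc
    openapi_url="/openapi.json" if DEBUG else None,  # raw OpenAPI schema
)
```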

5. HSTS header not working behind a proxy

Severity: Medium

You added Strict-Transport-Security headers — good. But you gated it behind if request.url.scheme == "https". Behind a reverse proxy, the connection between proxy and app server is plain HTTP. The scheme is always http. Your HSTS header never gets sent.

Fix: Also check the X-Forwarded-Proto header, which the proxy sets to https when the original client connection was encrypted.
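A sketch as FastAPI middleware (the max-age value is a placeholder; your header may also include preload or other directives):

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def hsts_header(request: Request, call_next):
    response = await call_next(request)
    # Behind the proxy the app only ever sees plain HTTP, so also trust
    # the scheme the proxy reports for the original client connection.
    proxied_https = request.headers.get("X-Forwarded-Proto", "") == "https"
    if request.url.scheme == "https" or proxied_https:
        response.headers["Strict-Transport-Security"] = (
            "max-age=31536000; includeSubDomains"
        )
    return response
```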

6. LLM cost exposure without a ceiling

Severity: High

If your AI service calls a paid API (Claude, GPT-4, etc.) on every request, you need layers of cost protection. Per-IP rate limits help against individual abuse, but an attacker with access to a proxy rotation service can use thousands of IPs. At even $0.05 per request, 10,000 IPs making 10 requests each is $5,000/day.

What we implemented: a daily word quota per IP (product limit), a per-IP LLM call counter (cost limit), a per-minute request rate limit (burst protection), and a reduced max_tokens on LLM responses (output cost cap). What we still need: a global server-wide daily spending cap that disables LLM features entirely if total cost exceeds a threshold.

Fix: Layer your defenses. Per-IP limits, per-minute burst limits, output token caps, and a global kill switch. Set spending alerts on your API provider dashboards. No single layer is enough.
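We haven't built the global cap yet, but a minimal version could look like the sketch below: in-memory, single worker, with a placeholder threshold and whatever per-call cost estimate you already track.

```python
from datetime import datetime, timezone

DAILY_SPEND_CAP_USD = 50.0  # placeholder threshold
_spend = {"day": None, "total_usd": 0.0}

def record_llm_cost(estimated_cost_usd: float) -> None:
    """Accumulate today's estimated LLM spend (resets at UTC midnight)."""
    today = datetime.now(timezone.utc).date()
    if _spend["day"] != today:
        _spend["day"] = today
        _spend["total_usd"] = 0.0
    _spend["total_usd"] += estimated_cost_usd

def llm_disabled_by_budget() -> bool:
    """Global kill switch: True once today's spend crosses the cap."""
    today = datetime.now(timezone.utc).date()
    return _spend["day"] == today and _spend["total_usd"] >= DAILY_SPEND_CAP_USD
```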

7. In-memory rate limits that don't survive restarts

Severity: Medium

Storing rate limit counters in a Python dictionary is simple and fast. It also means every server restart resets all limits. Every deployment gives every user a fresh quota. If you deploy 10 times a day (common with CI/CD), your daily limits are effectively 10x what you intended.

Worse: if you run multiple workers (uvicorn --workers 4), each worker has its own independent counters. Four workers means 4x the allowed requests.

Fix: For MVP, run a single worker and accept the restart reset. For production scale, move rate limit state to Redis or a shared SQLite database in WAL mode.
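For the Redis route, a fixed-window counter is enough to survive restarts and multiple workers. A sketch using the redis-py client (key name and limits are illustrative):

```python
import redis

r = redis.Redis()  # shared by every worker and every deploy

def allow_request(client_ip: str, limit: int = 60, window_seconds: int = 60) -> bool:
    """Fixed-window rate limit whose state lives outside the app process."""
    key = f"ratelimit:{client_ip}"
    count = r.incr(key)
    if count == 1:
        # First request in this window: start the expiry clock.
        r.expire(key, window_seconds)
    return count <= limit
```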


The meta-lesson: AI agents are good at finding these

We didn't find these vulnerabilities through manual code review. We spawned specialized AI agents — one playing a penetration tester, one a cost-focused attacker, one a privacy auditor, one a PII bypass specialist — and let them probe the running service and review the source code simultaneously.

The agents found issues that a human reviewer might have caught individually, but the speed and thoroughness of parallel automated testing are hard to match. The zero-width Unicode bypass, for example, is the kind of thing that slips past most security reviews because it requires thinking about text encoding at a level most developers don't reach.

If you're building an AI-powered service, consider running this kind of adversarial audit before launch. The cost of finding these issues in staging is a few API calls. The cost of finding them in production is your users' trust.


Security checklist for AI web services

Based on our audit, here's what to verify before launching any AI-powered service:

- Error responses never echo user input back, including framework validation errors
- Invisible Unicode characters and obfuscation patterns are normalized before PII detection
- Rate limiting and quotas key on the real client IP, not the proxy's internal IP
- Interactive API docs and the OpenAPI schema are disabled in production
- HSTS and other security headers are actually sent behind the reverse proxy
- LLM costs are capped per IP, per minute, per response, and with a global kill switch
- Rate limit state survives restarts and is shared across workers

Every item on this list is something we either got wrong initially or had to deliberately engineer. None of them are exotic attacks — they're the basics that get overlooked when you're focused on making the product work.