An AI coding assistant once confidently wrote me a call to array.findLast() in a context where it didn't exist, imported a function from a library that had renamed it two versions ago, and accessed a property on a Drizzle row that the schema had never defined. All three looked completely plausible. All three would have shipped if I had been reviewing by eye. None of them survived tsc. That is the entire thesis of this post: AI-generated code fails in a specific, recognisable way (confident hallucination of APIs that don't exist), and TypeScript's strict mode is the cheapest, most reliable detector of that failure I have found.

I write a lot of code with AI assistance now, including on adatepe.dev, which is TypeScript strict end to end with around 1,280 Bun tests. The volume only works because the machine, not me, catches the machine's mistakes. Here is the guardrail stack that makes generated code something I can actually trust in production.

An AI assistant wrote me this. Three things are wrong. How many survive a strict tsc run?

const data: any = await res.json();
if (data.status === true) doThing(data.findLast());

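Every one of them does. With data typed as any, tsc passes the snippet untouched, because any tells the compiler to stop checking. Catching mistakes like that is not discipline I apply by hand on every line. It is a stack of machines that each refuse to let a specific class of mistake through, and on this site the numbers behind it look like this: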

~1,280 Bun tests on this site

0 uses of any or @ts-ignore allowed

100% of boundaries Zod-parsed

4 gates before "done"

Strict mode is non-negotiable, and `any` is the hole in the net

Hallucinated APIs are, almost by definition, type errors. The method that doesn't exist, the renamed import, the property that isn't on the type: strict TypeScript rejects all of them before they run. This only works if the type system is actually engaged, which means strict mode on and, critically, no escape hatches that let the model's guess slip through unchecked.

Coming from a BMW engineering mindset and an LMU M.Sc. CS background, I treat my tsconfig like a tolerance spec: the tighter the flags, the fewer defects make it off the line. These are the compiler options I lean on most to catch the mistakes AI assistants love to make.

The tsconfig flags that catch AI mistakes

{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "exactOptionalPropertyTypes": true,
    "noImplicitOverride": true
  }
}

strict turns on a bundle of checks at once (noImplicitAny, strictNullChecks, and more). It is the baseline every project should start from, but on its own it still lets real bugs through.
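The other three flags each close a narrower gap that strict leaves open. The sketch below is illustrative only: the names (firstTag, RetryOptions, withDefaults, Logger, PrefixLogger) are hypothetical, and the code is written the way those flags force you to write it, with comments noting what each flag would reject.

```typescript
// Compiles under the tsconfig above; comments describe what each flag rejects.

// noUncheckedIndexedAccess: indexing an array yields T | undefined,
// so a guessed "there is always a first element" has to be handled.
function firstTag(tags: string[]): string {
  const first = tags[0]; // string | undefined with the flag on
  return first ?? "untagged"; // returning `first` alone would not compile
}

// exactOptionalPropertyTypes: an optional property may be omitted,
// but it may not be explicitly set to undefined.
interface RetryOptions {
  retries?: number;
}

function withDefaults(opts: RetryOptions): Required<RetryOptions> {
  // `const bad: RetryOptions = { retries: undefined }` would be rejected.
  return { retries: opts.retries ?? 3 };
}

// noImplicitOverride: every override must say `override`, so if the base
// method is renamed or removed, the subclass stops compiling instead of
// silently keeping an orphaned method.
class Logger {
  log(message: string): string {
    return `[log] ${message}`;
  }
}

class PrefixLogger extends Logger {
  override log(message: string): string {
    return super.log(`app: ${message}`);
  }
}
```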

Each of those flags has paid for itself by turning a plausible-looking generated diff into a red build. With the baseline set, the next job is to close the escape hatches that would let the model's guesses slip through anyway.

The two escape hatches that matter are any and @ts-ignore, and my repo bans both. The moment a value is any, every hallucinated method call on it becomes legal again, because any tells the compiler to stop checking. An AI that writes const data: any = await fetch(...) and then accesses data.thisFieldIsImaginary has just smuggled a hallucination past your strongest defence. So I lint it out.

{
  "rules": {
    "@typescript-eslint/no-explicit-any": "error",
    "@typescript-eslint/no-unsafe-assignment": "error",
    "@typescript-eslint/ban-ts-comment": "error"
  }
}

When the assistant reaches for any to make an error go away (and it will, because suppressing the error is the locally easy move), the lint rule fails the build. That forces the real fix: a proper type, or a validated parse. The discipline I follow and enforce in CI is simple to state and hard to bypass: never any, never @ts-ignore, never --no-verify. If the pipeline is red, the cause gets fixed, not silenced.

The compiler can't see the network, so validate at the boundary

Strict types catch hallucinated APIs in code you wrote. They are powerless at the edges where data enters at runtime: an HTTP response, a database row, a parsed file, and especially a language model's output. TypeScript types are erased at runtime. A type annotation on a fetch result is a promise the compiler trusts and never verifies, which means a hallucinated assumption about an API's response shape will type-check perfectly and explode in production.
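A few lines make the erasure concrete. JSON.parse returns any, so the annotation below is accepted without a single check. This is exactly the assignment @typescript-eslint/no-unsafe-assignment flags. User is a hypothetical type for illustration.

```typescript
interface User {
  id: number;
}

// JSON.parse returns `any`, so this annotation is trusted, never verified.
const user: User = JSON.parse('{"id":"42"}');

// The type says number; the runtime value is the string "42".
const actualType = typeof user.id;

// Arithmetic the compiler allows on a "number" is really string concatenation.
const next = user.id + 1; // "421" at runtime, not 43
```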

The fix is to validate every boundary with a runtime schema. I use Zod, and the rule is that nothing crosses from "external" to "internal" without a parse.

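import { z } from "zod";

// The response shape we expect; anything else is rejected at the edge.
const ApiResponse = z.object({
  id: z.string(),
  status: z.enum(["pending", "done"]),
});

// Minimal stand-in for the project's boundary error; keeps Zod's issues attached.
class BoundaryError extends Error {
  constructor(message: string, readonly issues: z.ZodError) {
    super(message);
  }
}

// Illustrative wrapper: every response goes through the schema before use.
async function readStatus(res: Response): Promise<z.infer<typeof ApiResponse>> {
  const parsed = ApiResponse.safeParse(await res.json());
  if (!parsed.success) {
    throw new BoundaryError("unexpected API shape", parsed.error);
  }
  // parsed.data is now genuinely typed AND genuinely verified
  return parsed.data;
}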

Here's the same hallucinated-API moment with no guardrail in the way:

const data: any = await res.json();

// model guessed this field exists. it doesn't.
if (data.status === true) {        // API returns a string enum
  doThing(data.findLast());        // method that doesn't exist here
}
// tsc: passes. ships. explodes 3 layers downstream
//   → "undefined is not a function" in production

Strict types reject hallucinated APIs at compile time; Zod rejects wrong runtime shapes at the edge. The build says no before users do.

This catches a whole second class of AI mistakes: the ones where the model assumed a response shape that is almost right. It guessed status is a boolean when the API returns a string enum. It assumed a field is always present when it's optional. The types alone would have believed the guess. The schema doesn't: it checks at runtime and fails loudly with a useful error instead of an "undefined is not a function" three layers downstream. Schema-validate the boundaries, and let strict types carry everything inside.
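For comparison, here is a sketch of the guarded side of the snippet above. To keep it self-contained it swaps the Zod schema for a hand-written type guard that does the same job. The names (ApiStatus, isApiStatus, handleStatus) and the optional note field are hypothetical.

```typescript
// The shape the API actually returns (hypothetical example).
interface ApiStatus {
  id: string;
  status: "pending" | "done"; // a string enum, not a boolean
  note?: string; // optional: may be absent
}

// Hand-written stand-in for a Zod schema: verifies the runtime shape.
function isApiStatus(raw: unknown): raw is ApiStatus {
  return (
    typeof raw === "object" &&
    raw !== null &&
    "id" in raw &&
    typeof raw.id === "string" &&
    "status" in raw &&
    (raw.status === "pending" || raw.status === "done") &&
    (!("note" in raw) || typeof raw.note === "string")
  );
}

function handleStatus(raw: unknown): string {
  if (!isApiStatus(raw)) {
    throw new Error("unexpected API shape"); // fail loudly at the edge
  }
  // `raw.status === true` would not compile here:
  // "pending" | "done" has no overlap with boolean.
  if (raw.status === "done") {
    return raw.note ?? `job ${raw.id} done`; // optional field handled, not assumed
  }
  return `job ${raw.id} pending`;
}
```

With Zod, isApiStatus collapses to a schema plus safeParse, and the inferred type replaces the hand-written interface.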

Tests catch the logic the types can't

Types prove the shape is right. They do not prove the logic is right, and AI-generated code is very good at producing something that type-checks and is also subtly wrong: an off-by-one, an inverted condition, a timezone assumption. That is what the test suite is for, and it is why adatepe.dev has roughly 1,280 of them.

I don't write all of those by hand, and I don't trust the ones the AI writes without reading them, because an assistant will cheerfully write a test that asserts the buggy behaviour it just produced. The pattern that works for me is to write the assertion myself, the thing I actually want to be true, and let the assistant fill in the scaffolding around it. The test encodes my intent; the implementation has to satisfy it. When generated code changes behaviour I didn't ask it to change, a test that used to pass goes red, and that red is the whole point.

The leverage with Bun's test runner is speed. A suite that runs in seconds is a suite I'll actually run on every change, and a guardrail you skip because it's slow is not a guardrail. Fast tests are how generated code stays honest at volume.

The gotcha I learned the expensive way is that a passing suite proves nothing if the tests were generated alongside the code they cover. Early on, I let an assistant write both an email-beacon handler and its tests in the same pass. Every test was green. Every test also asserted the broken behaviour: the model encoded its own wrong assumption into the implementation and the assertion at once, so they agreed with each other and disagreed with reality. The bug, a Resend call whose error field was never inspected, shipped silently and cost us a chunk of a deploy window before the missing emails surfaced in production.

That incident is why I now treat AI-written assertions as untrusted input in exactly the same way I treat a network response: read before believing. The cheap rule that came out of it is that the assertion, the line that says what must be true, is mine to write, and only the scaffolding around it is the assistant's to fill. The split sounds pedantic until it saves you a postmortem.

One observation from running this for months: of the roughly 1,280 tests on the site, the ones that have actually caught a regression were almost all assertions I wrote by hand against intent, not the ones generated wholesale. If you want the failure mode in full, I wrote up how the same trust gap plays out at the data edge in building AI features in the App Router, where the model's output is the input you can least afford to believe.
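Here is a sketch of the shape of that bug and of a hand-written assertion that would have caught it. It is not the site's code. SendResult is a hypothetical type modelled on the { data, error } object that Resend's Node SDK resolves with instead of throwing. sendBeacon, sendBeaconBuggy, and failingSend are also illustrative. The check uses console.assert rather than bun:test so the sketch runs anywhere.

```typescript
// Hypothetical result shape, modelled on a { data, error } style email API.
interface SendResult {
  data: { id: string } | null;
  error: { message: string } | null;
}

type SendEmail = (to: string, subject: string) => Promise<SendResult>;

// The bug: nothing threw, so the call is assumed to have succeeded.
async function sendBeaconBuggy(send: SendEmail, to: string): Promise<boolean> {
  await send(to, "beacon");
  return true; // error field never inspected
}

// The fix: the error field is checked, so a failed send is reported.
async function sendBeacon(send: SendEmail, to: string): Promise<boolean> {
  const { error } = await send(to, "beacon");
  return error === null;
}

// Scaffolding (fine to delegate): a fake sender that always fails.
const failingSend: SendEmail = async () => ({
  data: null,
  error: { message: "rate limited" },
});

// Intent (written by hand): a failed send must never be reported as success.
void sendBeacon(failingSend, "a@example.com").then((ok) => {
  console.assert(ok === false, "failed send reported as success");
});
```

Run against sendBeaconBuggy, the same hand-written assertion goes red. That red is the signal a co-generated test would never have produced.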

The guardrail the machine can't be: reading the diff

I want to be honest about the limit of all this automation, because pretending the gate catches everything is its own kind of danger. Strict types, schemas, and tests catch the mechanical failures: the hallucinated API, the wrong shape, the broken logic a test pins down. What they do not catch is code that is correct and also wrong: code that compiles, passes, and does a thing you never actually wanted. The model will sometimes solve a slightly different problem than the one you asked about, and do it flawlessly. No compiler flags that.

So the last guardrail is not a tool but a habit: I read the diff, every time, before I trust it. Not line by line like a paranoid auditor, but enough to answer one question. Did it do what I meant, or did it do something that merely passes my checks? That distinction is the whole job once the mechanical failures are automated away. A green pipeline tells me the change is valid and internally consistent. It cannot tell me the change is the one I wanted, because my intent lives in my head, not in the test suite.

This is exactly why I keep tasks small and diffs reviewable. A forty-line change I can read in a minute and confirm matches my intent. A six-hundred-line change I cannot, so I either trust it blindly or spend an hour I did not save. Small scope is not just kinder to the gate's feedback; it is what makes the human review step cheap enough that I will actually do it. The moment a diff is too big to read, the model's confidence quietly becomes my only signal, and the model's confidence is exactly the thing I built this whole stack to stop relying on.

The automation and the reading are not redundant; they cover different failures. The machine catches what I would miss from fatigue or volume. I catch what the machine cannot see: the gap between correct and intended. Drop either half and the generated code gets more dangerous, not less.

How much do you trust AI-generated TypeScript before it runs? I used to trust it only after reading every line myself, and that stopped scaling the moment the assistant could write faster than I could read.

Why the type checker is the cheapest reviewer you have

A human reviewer is expensive. They get tired, they skim, and at the volume an assistant produces they simply cannot read every line with full attention. The type checker has none of those limits. It reads every token, never gets bored, and runs in seconds on every save. When I let tsc be the first reviewer, it does the mechanical pass I would otherwise do by eye, the hallucinated method, the renamed import, the property that was never on the type, and it does that pass for free, every time, before I look at anything.

That is what makes strict guardrails the thing that lets me trust generated TypeScript without reading every line. The trust is not in the model, it is in the wall the model has to clear. If strict mode is on, any and @ts-ignore are banned, and the boundaries are parsed, then a green compile means every mechanical failure mode is already closed. I get to spend my limited human attention on the one question the compiler cannot answer, did this do what I meant, instead of burning it on typo-hunting the machine is better at anyway.

The one escape hatch I allow: unknown

Banning any and @ts-ignore does not mean I pretend every type is knowable up front. Sometimes a value genuinely has no type yet: a JSON blob that just came off the wire, the argument of a catch that could be anything someone decided to throw. For those cases I do keep one escape hatch, and I reach for it deliberately: unknown.

The difference between any and unknown is the whole point. any switches the checker off for that value. Every property access, every method call, every assignment becomes legal again, which is exactly the gap a hallucinated API slips through. unknown does the opposite. It keeps the checker on and refuses to let you touch the value until you have proven what it is. You cannot read a field, call a method, or pass it anywhere typed until you narrow it with a real check. The compiler demands evidence, and the model has to supply it.

function parseUser(raw: unknown): { id: number } {
  if (
    typeof raw === "object" &&
    raw !== null &&
    "id" in raw &&
    typeof raw.id === "number"
  ) {
    return { id: raw.id };
  }
  throw new Error("unexpected user payload");
}

With any, raw.id would compile no matter what, and so would raw.idd, raw.nope.deeper, and every other typo the assistant invents. With unknown, none of that survives. The narrowing block is the proof, and once it passes, the type that flows out the other side is real, not asserted.

So my rule is not "never admit you don't know the type." It is "admit it honestly, with unknown, and let the compiler make you earn the access." That is the one hatch that keeps the net intact instead of cutting a hole in it.
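The catch-clause case works the same way. Under strict (via useUnknownInCatchVariables, TypeScript 4.4 and later), a catch variable is typed unknown rather than any, so it has to be narrowed before use. This is a minimal sketch. describeError and parseOrDescribe are hypothetical helpers.

```typescript
// Under strict, `err` in a catch clause is `unknown`, not `any`.
function describeError(err: unknown): string {
  if (err instanceof Error) {
    return err.message; // narrowed to Error, so .message is known to exist
  }
  if (typeof err === "string") {
    return err; // someone threw a bare string
  }
  return "unknown error"; // anything else: don't guess its shape
}

function parseOrDescribe(json: string): string {
  try {
    JSON.parse(json);
    return "ok";
  } catch (err) {
    // `err.message` without narrowing would not compile here.
    return describeError(err);
  }
}
```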

Make the gate a wall, not a suggestion

A guardrail that you can choose to ignore is decoration. The thing that makes all of the above real is that they run as a single ordered pipeline and the work isn't done until every step is green. On my repo that's one command: format, then tsc --noEmit, then lint, then a format check, and all of it has to pass, not just the parts touching my change.

bun run verify   # prettier --write → tsc --noEmit → eslint → prettier --check

Here is what that one command actually looks like when a hallucinated API slips into a generated diff and the gate catches it on my machine before it ever reaches a pull request.

[Recording: the verify gate catching a hallucinated API]

The first run is red because tsc rejected the imaginary method, the middle run proves the real fix still satisfies the test that pins my intent, and the third run is the all-green wall that means the change is safe to commit. The rule I hold to is that partial green is not green. If the pipeline surfaces a type error or a lint warning anywhere, even somewhere the AI's change didn't directly touch, it gets fixed before the task is called done. This matters more with generated code than with hand-written code, because the assistant produces more of it, faster, and any gap in the gate is a gap that fills up quickly. The gate is also where the no-any and no-@ts-ignore rules get teeth: they're lint errors, lint is in the pipeline, the pipeline is the wall.
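The "partial green is not green" rule is simple enough to state as code. This is a hypothetical sketch of the ordering logic only, not how bun run verify is implemented: each gate is a named check, the gates run in order, and the result is green only if every one of them passes.

```typescript
// Hypothetical model of an ordered verify pipeline.
interface Gate {
  name: string;
  run: () => boolean; // true means this gate is green
}

interface VerifyResult {
  green: boolean;
  failedAt: string | null;
}

// Runs gates in order and stops at the first red one, like a chain of `&&`.
function verify(gates: readonly Gate[]): VerifyResult {
  for (const gate of gates) {
    if (!gate.run()) {
      return { green: false, failedAt: gate.name };
    }
  }
  return { green: true, failedAt: null };
}

// Usage: a hallucinated method turns the type-check gate red.
const gates: Gate[] = [
  { name: "prettier --write", run: () => true },
  { name: "tsc --noEmit", run: () => false },
  { name: "eslint", run: () => true },
  { name: "prettier --check", run: () => true },
];
```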


The whole stack is four layers, each catching a failure the others cannot:

Strict types catch the hallucinated API: the method that does not exist, the renamed import, the property that was never on the type. This only works with strict mode on and no escape hatches, because the moment a value is any, every hallucinated access on it becomes legal again. That is why I lint out any and @ts-ignore entirely.

Zod schemas at the boundary catch wrong assumptions about the shape of runtime data.

Tests catch the wrong logic that type-checks anyway.

The verify gate makes sure none of it is optional.

The guardrails I add beyond the type checker

The type checker is the floor, not the whole building. Once strict mode is doing its job, I keep finding failure modes that slip past it cleanly, and each one taught me to add another guardrail the compiler simply cannot be.

The first is lint rules that catch what types cannot express. Types tell you a value is the right shape; they say nothing about whether you used it sensibly. A floating promise nobody awaited, a switch that quietly stopped being exhaustive, a console.log left in a request handler, a dependency array that lies to React: none of those are type errors, but all of them are bugs an AI assistant produces happily because they compile. Rules like no-floating-promises and switch-exhaustiveness-check encode the judgement a tired reviewer would apply, except they apply it on every save and never get tired. The assistant cannot charm its way past a lint rule the way it can past a human skim.
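Two of those patterns can be sketched in a few lines. The names (JobStatus, assertNever, label, recordAudit, finishJob) are hypothetical. The switch ends in a never check, so adding a new status variant without handling it breaks the build at this switch. That is the same gap switch-exhaustiveness-check flags at lint time. The audit promise is awaited rather than left floating, which is what no-floating-promises insists on.

```typescript
type JobStatus = "pending" | "running" | "done";

// If a new JobStatus is added and not handled below, `status` is no
// longer `never` in the default branch and this call stops compiling.
function assertNever(value: never): never {
  throw new Error(`unhandled value: ${String(value)}`);
}

function label(status: JobStatus): string {
  switch (status) {
    case "pending":
      return "waiting";
    case "running":
      return "in progress";
    case "done":
      return "finished";
    default:
      return assertNever(status);
  }
}

// A stand-in async side effect.
function recordAudit(log: string[], entry: string): Promise<void> {
  log.push(entry);
  return Promise.resolve();
}

// Calling recordAudit without await would be flagged by no-floating-promises;
// awaiting it means a rejection surfaces here instead of vanishing.
async function finishJob(log: string[]): Promise<string> {
  await recordAudit(log, "job finished");
  return label("done");
}
```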

The second is runtime validation at the boundary, for the reason covered earlier: types are erased the moment the code runs, so an annotation on a fetch result or a model's JSON output is never verified. I parse every untrusted input with Zod before it crosses into typed code, which is exactly the discipline I lean on when building AI features in the App Router, where a model's output is the least trustworthy input in the whole system. The schema fails loudly at the edge with a real error instead of an "undefined is not a function" three layers downstream.

The third is a CI gate that blocks the merge, not just a check I run locally when I remember to. A guardrail I can forget is decoration. The same ordered pipeline that runs on my machine runs on every pull request, and a red pipeline blocks the merge button, no override. That is what turns the no-any rule, the Zod boundaries, and the test suite from good intentions into a wall the generated code physically cannot get around.

Types alone are not enough for AI-generated code precisely because the model's mistakes cluster in the gaps types leave open: correct shapes used wrongly, untrusted data assumed trustworthy, and checks that only run when a human remembers to run them. Each guardrail closes one of those gaps. I think of it as defence in depth: no single layer is trusted to catch everything, and that is the point.

Why this lets me move fast, not slow

People assume strict types and a hard gate slow you down. With AI in the loop it's the opposite. The reason I can accept a large generated diff and trust it is that I have a deterministic machine standing between the model's confidence and my production branch. The compiler catches the hallucinated APIs, the schemas catch the bad assumptions about runtime data, the tests catch the wrong logic, and the gate makes sure none of it is optional. Without that stack I'd have to read every line as if it were trying to deceive me, which, in a sense, it is, because the model has no idea when it's wrong.

The guardrails turn "I hope this is right" into "the build says this is right," and that's the only version of AI-assisted development I'm willing to ship. The stack runs on everything at /#projects, and I keep writing up what the compiler catches on /blog. Strict mode isn't bureaucracy. With a language model writing half your code, it's the safety net you cannot work without.


Wherever your net has a hole, the fix is the same one I trust: let the compiler catch what review misses. The cheapest first win depends on where your repo actually stands today. If TypeScript strict mode is not yet on, with any and @ts-ignore banned in lint, that is the layer to add first.

If you want to trade guardrail notes, reach out.

Built by Alperen. I ship AI-assisted code to production, behind a wall this strict. Full-stack engineer, M.Sc. CS at LMU Munich. See what the compiler lets through in my work, or get in touch.
