Can AI be your security guard?
I’ve seen a lot of articles online about AI and coding. Everyone seems to have a strong opinion, especially on LinkedIn, from the die-hard optimists to the deeply skeptical. But opinions don’t tell the whole story. I went looking for the data, and the research paints a complicated, sometimes contradictory picture.
We all know AI coding assistants like Claude, ChatGPT, and GitHub Copilot are fast. They have completely changed how developers work. But this new speed has a trade-off, and it’s a big one: security. It seems we’ve stumbled into an "AI Paradox," where the tools that accelerate our work are also the ones introducing serious security flaws.
This raises a simple question. If an AI is clever enough to write our code, shouldn’t it be clever enough to secure it? The research, so far, points to a clear "no." It turns out that AI tools, on their own, are quite bad at spotting the very vulnerabilities they help create. The real path forward seems to be a more balanced approach, one that combines the contextual strengths of AI with the proven reliability of traditional security tools.
The generated code is a problem
The first thing that stands out from the research is a bit of a shock: the code AI generates is often dangerously insecure.
A major study tested how well AIs could build real-world applications, and the results were not encouraging. The best models struggled to produce code that was both working and secure, managing it only about 37% of the time. Even when the code functioned as intended, it was still exploitable in about half of the cases. On top of the security gaps, the best model only wrote functionally correct code 62% of the time to begin with. For those of us in the security field, this is what we call job security.
So if we can’t trust the AI to write secure code, can we at least trust it to find the flaws?
Using AI as a security guard doesn’t work
The next logical step was to use AI as an automated code reviewer. The idea is simple. If AI can write code, it should be able to read it and find mistakes. But in practice, these tools are not yet the reliable watchdogs we need them to be.
Take GitHub Copilot. One study of its code review feature found it "surprisingly limited" at spotting common security vulnerabilities. It missed textbook flaws like SQL injection and cross-site scripting. Instead of flagging these critical risks, Copilot often got distracted by minor issues like coding style or typos. It’s reassuring to know that for now, the AI is more interested in correcting grammar than preventing a full-scale data breach.
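To make "textbook flaw" concrete, here’s a small, made-up Flask handler with a reflected cross-site scripting bug, the kind of thing a security review is supposed to catch. The route and parameter names are invented for illustration.

    # Hypothetical Flask handler with a textbook reflected XSS flaw.
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/greet")
    def greet():
        name = request.args.get("name", "")
        # Untrusted input is echoed straight into the HTML response without
        # escaping, so ?name=<script>alert(1)</script> runs in the browser.
        # This is the finding a security-focused review should surface before
        # it starts commenting on naming style or typos.
        return f"<h1>Hello, {name}!</h1>"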
Other AI tools like Anthropic’s Claude Code and OpenAI’s Codex had a different problem: they were too noisy. These tools generated false positives 86% and 82% of the time, respectively. That’s a lot of wasted time for developers chasing ghosts. To be fair, the AIs did show a talent for finding "big picture" logical flaws that traditional tools sometimes miss, but the noise level is a major issue.
Why AI fails at security analysis
So why does AI struggle so much with security? It comes down to a few key technical reasons. Unlike traditional security tools, which are built on the deterministic logic of program analysis, LLMs work by matching patterns. That makes them a poor fit for the precision that security requires.
The single biggest weakness is that an AI can’t follow the path of data through an application. Finding major flaws like SQL injection requires tracking user input from the "source" all the way to a dangerous "sink." AIs are terrible at this. The data shows Claude only found 5% of SQL injection flaws, and Codex found zero.
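To see what that means in practice, here’s a small, made-up Python example. No single line looks obviously dangerous; the SQL injection only shows up when you follow the tainted value from its source, through a helper, to the sink.

    def get_user_id(request_args):
        # "Source": untrusted input arrives from the request.
        return request_args.get("id", "")

    def build_query(user_id):
        # The tainted value travels through an innocent-looking helper.
        return f"SELECT * FROM users WHERE id = {user_id}"

    def fetch_user(conn, request_args):
        # "Sink": the tainted string is executed as SQL. An input like
        # "1 OR 1=1" dumps the whole table. conn is assumed to be a DB-API
        # connection, e.g. from sqlite3.connect(...).
        query = build_query(get_user_id(request_args))
        return conn.execute(query).fetchall()

    def fetch_user_safe(conn, request_args):
        # The fix: a parameterized query keeps data out of the SQL text.
        return conn.execute(
            "SELECT * FROM users WHERE id = ?",
            (get_user_id(request_args),),
        ).fetchall()

Traditional static analysis engines are built to model exactly this kind of flow. An LLM reading the file as a stream of tokens has to reconstruct it from patterns, and the numbers above suggest it usually doesn’t.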
They are also inconsistent. Researchers point to context compaction and the resulting "context rot" as a major reason. When an AI analyzes a large codebase, it has to compress the information to fit its context window. During this "compaction," important details can get lost, which leads to "context rot": the AI can no longer accurately retrieve what it once knew. The result is that running the exact same scan on the same code can give you completely different results each time. For a security audit, that lack of repeatability is a deal-breaker.
Is the hybrid approach a way forward?
This doesn’t mean AI is useless for security. It just means we’ve been using it wrong. The consensus is now moving toward Cyber Reasoning Systems (CRSes): smarter pipelines that combine multiple analysis techniques instead of relying on a single model.
In these hybrid systems, AI isn’t the star of the show. It’s a specialist that assists a team of more reliable, traditional tools. The winning system of a recent AI Cyber Challenge, ATLANTIS, used this approach perfectly. It relied on a foundation of deterministic tools such as CodeQL for the heavy lifting. The AI’s job was to act as a "tool-user", generating high-quality tests that other, more precise tools could run.
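The division of labour can be sketched in a few lines of Python. To be clear, this is the general pattern rather than ATLANTIS itself, and the scanner, LLM client, and harness objects are placeholders I’ve invented for illustration.

    # A minimal sketch of the hybrid pattern, not ATLANTIS itself. The scanner,
    # LLM client, and test harness passed in here are hypothetical placeholders.
    def hybrid_scan(codebase, static_scanner, llm, test_harness):
        findings = []
        # 1. Deterministic tools (a CodeQL-style analyzer, for example) do the
        #    heavy lifting: they enumerate candidate source-to-sink paths with
        #    repeatable results.
        for candidate in static_scanner.find_candidate_paths(codebase):
            # 2. The AI acts as a "tool-user": it reads the candidate's context
            #    and proposes concrete test inputs that might reach the sink.
            proposed_inputs = llm.propose_test_inputs(
                candidate.source_snippet, candidate.sink_snippet
            )
            # 3. A precise, deterministic harness has the final say, so the
            #    LLM's false positives are filtered out before anyone sees them.
            for test_input in proposed_inputs:
                if test_harness.triggers_vulnerability(candidate, test_input):
                    findings.append((candidate, test_input))
                    break
        return findings

The AI never gets to declare a vulnerability on its own; it just feeds better inputs to the tools that can.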
AI excels at the last mile
AI isn’t a silver bullet that can find a security flaw from start to finish. However, it’s incredibly good at solving the "last mile" problem.
This happens when a traditional tool finds a vulnerable spot in the code but can’t quite craft the complex input needed to prove it’s exploitable. This is where an AI shines. It can analyze the context, the code, and the location, and intelligently generate the precise payload needed to trigger the vulnerability. In one study, this "last mile" approach had an impressive 81.3% success rate.
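Roughly, that last-mile step looks like the sketch below. Again, the finding object, LLM client, and sandbox are invented placeholders, not the pipeline from the study.

    def prove_exploitable(finding, llm, sandbox):
        # A deterministic analyzer has already pinpointed the suspicious spot;
        # the LLM's only job is to craft an input that proves it is exploitable.
        prompt = (
            f"A static analyzer reports a possible {finding.vuln_class} "
            f"at {finding.location}.\n"
            f"Relevant code:\n{finding.code_snippet}\n"
            "Return one concrete input that reaches this sink and triggers the flaw."
        )
        payload = llm.complete(prompt)
        # The verdict still comes from a deterministic, sandboxed re-run, so a
        # hallucinated payload simply fails instead of becoming a false alarm.
        return payload if sandbox.reproduces(finding, payload) else None

The key design choice is that the AI only proposes; a deterministic re-run decides.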
The future of automated security isn’t about getting a simple AI assistant to do everything. It’s about adopting smarter systems that use AI’s reasoning to solve the hardest parts of the problem, all within a framework of reliable, traditional tools.