How Agentic Pentest Works at DNA
From the multi-agent fleet to the DNA LLM Gateway and privacy-first routing: the technical detail of how DNA operates.
The Shift to Agentic
The traditional pentest that DNA has run for nearly two decades places multiple domain-specialist senior experts at the center. Each line (Network, Red Team, Application, Smart Contract, AI Security) has its own specialist, who reads the code, forms hypotheses, writes exploits, and writes the report. The "AI-Powered" generation that followed added an AI assistant alongside the specialist: a little faster, but still a one-person sequential flow. Agentic Pentest Workflows are different in kind. Dozens of agents run in parallel, each on one narrow question. Results are deduplicated and adversarially verified. The HPTSA industry benchmark records multi-agent architectures outperforming single-agent ones by 4.3x. The domain-specialist senior expert shifts from "the one doing the work" to "the one orchestrating the fleet". Their judgement shapes the decision at every stage, not only at the end.
Anti-Noise Engineering
The first question a security team asks when they hear "hundreds of agents in parallel" is: doesn't that mean hundreds of times the noise, hundreds of false positives? The answer lives in the design of the refutation pipeline. A candidate signal the fleet raises is never trusted on first pass. It goes to an independent agent running a different model (routed via the DNA LLM Gateway) with a prompt written to refute by default. The second agent's job is to disprove the signal. Out of 350+ candidate signals the fleet raises, the refutation pipeline lets ~30 through. The domain-specialist senior expert does the final fine-pick and ships 12-18 actionable findings. Refutation is the heavy filter. Human triage is the precise pick. Multi-agent doesn't add noise. Multi-agent culls it.
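The filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DNA's implementation: the `Signal` shape, the `Refuter` callable, and the rule that a signal with no independent refuter is dropped are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Signal:
    """A candidate finding raised by a hunting agent (hypothetical shape)."""
    target: str
    attack_class: str
    evidence: str
    raised_by_model: str


# A refuter returns True when it manages to disprove the signal.
Refuter = Callable[[Signal], bool]


def deduplicate(signals: Iterable[Signal]) -> list[Signal]:
    """Collapse signals that point at the same target and attack class."""
    seen: set[tuple[str, str]] = set()
    unique: list[Signal] = []
    for s in signals:
        key = (s.target, s.attack_class)
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def refute_pass(signals: Iterable[Signal], refuters: dict[str, Refuter]) -> list[Signal]:
    """Keep only signals that no independent, different-model refuter could disprove."""
    survivors: list[Signal] = []
    for s in deduplicate(signals):
        # Only refuters backed by a model other than the one that raised the signal count.
        independent = [r for model, r in refuters.items() if model != s.raised_by_model]
        if not independent:
            continue  # refute by default: no independent check means no trust
        if not any(refute(s) for refute in independent):
            survivors.append(s)
    return survivors
```

In this sketch the survivors of `refute_pass` are what would reach the senior expert for the final pick; the function never promotes a signal on its own.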
Inside the Gateway
The DNA LLM Gateway is not a proxy. It is three stacked layers, each with one responsibility and an independently testable boundary. Every prompt from the agent fleet passes through all three before it ever reaches a vendor LLM.
Privacy Masker
First layer. Detects and masks PII, credentials, and secrets by policy. Output is a masked prompt plus a token map that stays inside DNA infrastructure.
Router
Second layer. Reads task type, picks the best-fit frontier model. Code reasoning routes to Claude, tooling to GPT, long-context to Gemini. Every routing decision is logged.
Audit Logger
Third layer. Every prompt, every response, every masking decision, every routing decision is logged with a timestamp. Clients can request an audit log report at any point.
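As a rough sketch of how three single-responsibility layers could compose, the Python below chains a masker, a router, and an audit log around one model call. Everything here is an assumption for illustration: the `SECRET_RE` pattern, the `ROUTES` keys, the `AuditLog` class, and the `call_model` callback are hypothetical names, and the routing table only mirrors the task-type examples given in the text.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative routing table based on the task-type examples in the text.
ROUTES = {"code_reasoning": "claude", "tooling": "gpt", "long_context": "gemini"}

# Hypothetical secret pattern; a real masking policy would cover far more.
SECRET_RE = re.compile(r"(?i)(password|api_key)=\S+")


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, kind: str, detail: str) -> None:
        # Every entry carries a UTC timestamp, the event kind, and a detail string.
        self.entries.append((datetime.now(timezone.utc).isoformat(), kind, detail))


def mask(prompt: str, log: AuditLog) -> str:
    """Layer 1: redact secrets before anything else sees the prompt."""
    masked, count = SECRET_RE.subn(lambda m: f"{m.group(1)}=[REDACTED]", prompt)
    log.record("masking", f"{count} secret(s) redacted")
    return masked


def route(task_type: str, log: AuditLog) -> str:
    """Layer 2: pick a model by task type; unknown types raise KeyError."""
    model = ROUTES[task_type]
    log.record("routing", f"{task_type} -> {model}")
    return model


def send(prompt: str, task_type: str, log: AuditLog, call_model) -> str:
    """Run a prompt through masking, routing, and logging, then call the model."""
    masked = mask(prompt, log)
    model = route(task_type, log)
    log.record("prompt", masked)  # Layer 3: only the masked prompt is logged
    response = call_model(model, masked)
    log.record("response", response)
    return response
```

Because each layer is a plain function with its own inputs and outputs, each one can be tested in isolation, which is the property the text attributes to the layered design.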
Privacy by Data Type
DNA does not mask in bulk. Each sensitive data type has its own mechanism. Clients can request an audit log report to verify that raw data never left DNA infrastructure unmasked. This is not a claim. It is an engagement deliverable.
Tokenized. Reversed inside DNA when the response returns.
Fully redacted. The LLM never gets the chance to learn or leak a credential.
Hash-substituted for sensitive repos. Threshold set by the client at the scoping call.
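The three mechanisms named above (reversible tokenization, one-way redaction, hash substitution) behave differently, and a short Python sketch makes the contrast concrete. The regular expressions, the `<PII_n>` placeholder format, the choice of email addresses as the tokenized example, and the `id_` prefix are assumptions made for illustration; the source does not specify any of them.

```python
import hashlib
import re

# Hypothetical patterns; real detection policy would be much broader.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CRED_RE = re.compile(r"(?i)\b(password|token)=\S+")


def tokenize(text: str, token_map: dict[str, str]) -> str:
    """Swap each match for a placeholder; the map stays local so it can be reversed."""
    def repl(match: re.Match) -> str:
        value = match.group(0)
        for token, original in token_map.items():
            if original == value:
                return token  # reuse the placeholder for a repeated value
        token = f"<PII_{len(token_map)}>"
        token_map[token] = value
        return token
    return EMAIL_RE.sub(repl, text)


def detokenize(text: str, token_map: dict[str, str]) -> str:
    """Restore original values in a response, using the locally held map."""
    for token, original in token_map.items():
        text = text.replace(token, original)
    return text


def redact(text: str) -> str:
    """One-way: nothing is kept that could restore the credential."""
    return CRED_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def hash_substitute(identifier: str) -> str:
    """Replace a sensitive identifier with a stable, opaque name."""
    return "id_" + hashlib.sha256(identifier.encode()).hexdigest()[:12]
```

The design point is that only tokenization keeps a reverse mapping. Redaction discards the value outright, and hash substitution yields the same opaque name for the same input, so references stay consistent across prompts without exposing the original.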
Fleet Telemetry
An engagement is not a single event. It is an orchestrated flow. At any moment, hundreds of agents may be scanning, hunting, refuting, or waiting on expert sign-off. This is a look inside the fleet at work.
The Senior Expert Decision Surface
Agents do not sign off findings. Agents do not set scope. Agents do not talk to the client. In every engagement there are six decisions that only the domain-specialist senior expert makes. This is not a "human in the loop" buzzword. These are six concrete signatures, every one verifiable in the audit log.
Scope ceiling
What is in-scope, what is not. The fleet operates inside the boundary and never crosses it.
Attack-class priority
Out of dozens of possible attack classes, the expert picks the seven worth running first against this target.
Exploit-chain sign-off
Agents propose chains. The expert decides whether the chain is realistic enough to spend exploit time on.
False-positive override
Agents confirm a finding is real. The expert reviews the business context and sometimes overrides the finding because they know a compensating control already exists.
Business-impact translation
SQL injection on a test endpoint is not SQL injection on the production payment endpoint. The expert is the one who translates a finding into business risk.
Sole client communication
The client never receives a message from an agent. Every communication goes through the senior expert.
Every finding that leaves DNA carries a human signature. Verifiable in the audit log.
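One way to picture "verifiable in the audit log" is a release gate that refuses to ship any finding lacking a human sign-off entry. The Python below is a hedged sketch only: the `SignOff` record, its field names, and the `unsigned_findings` helper are hypothetical and do not describe DNA's actual log schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SignOff:
    """One audit-log entry recording a decision on a finding (hypothetical schema)."""
    finding_id: str
    decision: str          # e.g. "exploit_chain_signoff" (illustrative label)
    signer: str
    signer_is_human: bool


def unsigned_findings(finding_ids: Iterable[str], audit_log: Iterable[SignOff]) -> list[str]:
    """Return the findings that have no human sign-off in the audit log."""
    signed = {entry.finding_id for entry in audit_log if entry.signer_is_human}
    return [fid for fid in finding_ids if fid not in signed]
```

A report would be releasable in this sketch only when `unsigned_findings` returns an empty list; an entry signed by an agent does not count.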
Ready for an engagement?
DNA's domain-specialist senior expert scopes an engagement around your target.
Contact us to start an engagement