HOW IT WORKS

How Agentic Pentest Works at DNA

From the multi-agent fleet to the DNA LLM Gateway and privacy-first routing: the technical detail of how DNA operates.

The Shift to Agentic

The traditional pentest that DNA has run for nearly two decades places domain-specialist senior experts at the center. Each line (Network, Red Team, Application, Smart Contract, AI Security) has its own specialist, who reads the code, forms hypotheses, writes exploits, and writes the report. The "AI-Powered" generation that followed added an AI assistant alongside the specialist: a little faster, but still a one-person sequential flow. Agentic Pentest Workflows are different in kind. Dozens of agents run in parallel, each on a narrow question. Results are deduplicated and adversarially verified. The HPTSA industry benchmark records multi-agent architectures outperforming single-agent ones by 4.3x. The domain-specialist senior expert shifts from "the one doing the work" to "the one orchestrating the fleet". Their judgement shapes the decision at every stage, not only at the end.

Anti-Noise Engineering

The first question a security team asks when they hear "hundreds of agents in parallel" is: doesn't that mean hundreds of times the noise, hundreds of false positives? The answer lives in the refutation pipeline design. A candidate signal the fleet raises is never trusted on first pass. It goes to an independent agent running a different model (routed via the DNA LLM Gateway), with a prompt written to refute by default. The second agent's job is to disprove the signal. Out of 350+ candidate signals the fleet raises, the refutation pipeline lets ~30 through. The domain-specialist senior expert does the final fine-pick, shipping 12-18 actionable findings. Refutation is the heavy filter. Human triage is the precise pick. Multi-agent doesn't add noise. Multi-agent culls it.
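The refute-by-default step can be pictured as a small filter. The following is a minimal sketch, not DNA's implementation: the `Signal` fields, the `refute_pipeline` name, and the idea of passing refuters as plain callables keyed by model name are all assumptions made for illustration. The one rule it encodes comes from the text: a signal is only kept if an agent on a different model tried to disprove it and failed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Signal:
    """A candidate finding raised by the fleet (hypothetical shape)."""
    signal_id: str
    raised_by_model: str  # model that produced the candidate
    claim: str


# A refuter returns True when it managed to disprove the signal.
Refuter = Callable[[Signal], bool]


def refute_pipeline(signals: Iterable[Signal],
                    refuters: Dict[str, Refuter]) -> List[Signal]:
    """Keep only signals that an independent model failed to refute."""
    survivors = []
    for signal in signals:
        # Independence: never let the raising model check its own work.
        independent = [m for m in sorted(refuters)
                       if m != signal.raised_by_model]
        if not independent:
            # No independent model available: the signal is not trusted.
            continue
        refuted = refuters[independent[0]](signal)
        if not refuted:
            survivors.append(signal)
    return survivors
```

In this sketch a signal with no independent refuter is dropped, not passed through, which matches the "never trusted on first pass" stance.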

350+

Candidate signals raised by the fleet

~30

Survive refutation (~91% culled)

12-18

Actionable, shipped to client
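The funnel figures above can be checked with a few lines of arithmetic. This sketch treats 350 and 30 as point values, which is an assumption: the page gives them as "350+" and "~30", so the real rates will vary by engagement. The `cull_rate` helper is a hypothetical name.

```python
def cull_rate(raised: int, survived: int) -> float:
    """Fraction of raised signals removed by a filtering stage."""
    return 1 - survived / raised


raised, survived = 350, 30  # point values assumed from "350+" and "~30"

print(f"Culled by refutation: {cull_rate(raised, survived):.1%}")

# Expert triage then ships between 12 and 18 of the survivors.
for shipped in (12, 18):
    print(f"{shipped} shipped = {shipped / survived:.0%} of survivors, "
          f"{shipped / raised:.1%} of everything raised")
```

With these point values, refutation removes about 91% of signals, and the shipped findings are 40% to 60% of the survivors.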

Inside the Gateway

The DNA LLM Gateway is not a proxy. It is three stacked layers, each with one responsibility and an independently testable boundary. Every prompt from the agent fleet passes through all three before it ever reaches a vendor LLM.

FLEET

100+ agents

Privacy Masker

First layer. Detects and masks PII, credentials, and secrets by policy. Output is a masked prompt plus a token map that stays inside DNA infrastructure.

Router

Second layer. Reads task type, picks the best-fit frontier model. Code reasoning routes to Claude, tooling to GPT, long-context to Gemini. Every routing decision is logged.

Audit Logger

Third layer. Every prompt, every response, every masking decision, every routing decision is logged with a timestamp. Clients can request an audit log report at any point.

VENDORS

Claude

Anthropic

GPT

OpenAI

Gemini

Google
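The three layers can be sketched as one function that a prompt must pass through before any vendor call. This is an illustration under stated assumptions, not the gateway's real interface: `gateway_send`, `AuditLog`, the `ROUTES` table keys, and the log entry fields are hypothetical names. The masker and the vendor client are passed in as callables so that the layer boundaries stay separately testable, as the text describes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical routing table mirroring the task-type examples in the text.
ROUTES = {"code_reasoning": "claude", "tooling": "gpt", "long_context": "gemini"}


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, event: str, **details) -> None:
        # Every entry carries a timestamp.
        self.entries.append({"ts": time.time(), "event": event, **details})


def route(task_type: str) -> str:
    """Layer 2: pick a vendor model for the task type."""
    if task_type not in ROUTES:
        raise ValueError(f"no route for task type {task_type!r}")
    return ROUTES[task_type]


def gateway_send(prompt: str,
                 task_type: str,
                 mask: Callable[[str], Tuple[str, Dict[str, str]]],
                 call_vendor: Callable[[str, str], str],
                 log: AuditLog) -> Tuple[str, Dict[str, str]]:
    masked, token_map = mask(prompt)   # layer 1: privacy masker
    vendor = route(task_type)          # layer 2: router
    # Layer 3: audit logger. Only the count of masked values is logged;
    # the token map itself is never written to the log or sent out.
    log.record("masking", tokens_masked=len(token_map))
    log.record("routing", task_type=task_type, vendor=vendor)
    log.record("prompt", vendor=vendor, prompt=masked)
    response = call_vendor(vendor, masked)  # the vendor sees masked text only
    log.record("response", vendor=vendor, response=response)
    return response, token_map
```

The token map is returned to the caller, not to the vendor, so any reversal of masked values happens on the caller's side.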

Privacy by Data Type

DNA does not mask in bulk. Each sensitive data type has its own mechanism. Clients can request an audit log report to verify that raw data never left DNA infrastructure unmasked. This is not a claim. It is an engagement deliverable.

01 PII

RAW

alice@bigcorp.com

MASKED

USER_3F7B

Tokenized. Reversed inside DNA when the response returns.

02 Credentials / Secrets

RAW

AKIA1234567890ABCDEF

MASKED

[REDACTED:AWS_KEY]

Fully redacted. The LLM never gets the chance to learn or leak a credential.

03 Source code

RAW

function login(user, pwd) { …

MASKED

HASH_E3D4A1

Hash-substituted for sensitive repos. Threshold set by the client at the scoping call.
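The three mechanisms differ mainly in whether they can be reversed. The sketch below illustrates that difference and makes several assumptions not stated on this page: the regular expressions (emails only for PII, the `AKIA` access key ID pattern only for credentials), the use of truncated SHA-256 for hash substitution, and all function names are hypothetical. A real masker would cover far more patterns and policies.

```python
import hashlib
import re
import secrets
from typing import Dict

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def mask_pii(text: str, token_map: Dict[str, str]) -> str:
    """Reversible: swap each email for a random token kept in token_map."""
    reverse = {raw: tok for tok, raw in token_map.items()}

    def repl(match: re.Match) -> str:
        raw = match.group(0)
        if raw not in reverse:
            tok = f"USER_{secrets.token_hex(2).upper()}"
            while tok in token_map:  # avoid reusing a token for another value
                tok = f"USER_{secrets.token_hex(2).upper()}"
            token_map[tok] = raw
            reverse[raw] = tok
        return reverse[raw]

    return EMAIL_RE.sub(repl, text)


def unmask_pii(text: str, token_map: Dict[str, str]) -> str:
    """Reverse the tokenization once the response is back inside."""
    for tok, raw in token_map.items():
        text = text.replace(tok, raw)
    return text


def redact_credentials(text: str) -> str:
    """Irreversible: no map is kept, so the key cannot be recovered."""
    return AWS_KEY_RE.sub("[REDACTED:AWS_KEY]", text)


def hash_substitute(source: str) -> str:
    """One-way placeholder for a code fragment (truncated SHA-256, assumed)."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return f"HASH_{digest[:6].upper()}"
```

Only the PII path keeps a map, which is why only PII can be restored in the response; redaction and hash substitution deliberately leave nothing to restore.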

Fleet Telemetry

An engagement is not a single event. It is an orchestrated flow. At any moment, hundreds of agents may be scanning, hunting, refuting, or waiting on expert sign-off. This is a look inside the fleet at work.

The Senior Expert Decision Surface

Agents do not sign off findings. Agents do not set scope. Agents do not talk to the client. In every engagement there are six decisions that only the domain-specialist senior expert makes. This is not a "human in the loop" buzzword. These are six concrete signatures, every one verifiable in the audit log.

Scope ceiling

What is in-scope, what is not. The fleet operates inside the boundary and never crosses it.

Attack-class priority

Out of dozens of possible attack classes, the expert picks the seven worth running first against this target.

Exploit-chain sign-off

Agents propose chains. The expert decides whether the chain is realistic enough to spend exploit time on.

False-positive override

Agents confirm a finding is real. The expert reviews business context and sometimes overrides because they know a compensating control already exists.

Business-impact translation

SQL injection on a test endpoint is not SQL injection on the production payment endpoint. The expert is the one who translates a finding into business risk.

Sole client communication

The client never receives a message from an agent. Every communication goes through the senior expert.

Every finding that leaves DNA carries a human signature. Verifiable in the audit log.
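One way to read "verifiable in the audit log" is as a check that can be run over the log itself. The sketch below is hypothetical: the page does not describe the log schema, so the `Signature` fields and the decision identifiers are assumed names for the six decisions listed above. It reports which of the six have no human signature on record.

```python
from dataclasses import dataclass
from typing import Iterable, Set

# Assumed identifiers for the six expert-only decisions listed above.
SENIOR_DECISIONS = frozenset({
    "scope_ceiling",
    "attack_class_priority",
    "exploit_chain_signoff",
    "false_positive_override",
    "business_impact_translation",
    "client_communication",
})


@dataclass(frozen=True)
class Signature:
    """One sign-off entry in the audit log (hypothetical shape)."""
    decision: str
    signer: str
    signer_is_human: bool


def missing_signatures(log: Iterable[Signature]) -> Set[str]:
    """Return the expert-only decisions with no human signature on record."""
    signed = {
        entry.decision
        for entry in log
        if entry.signer_is_human and entry.decision in SENIOR_DECISIONS
    }
    return set(SENIOR_DECISIONS) - signed
```

An entry signed by an agent does not count toward the set, so an empty result means every one of the six decisions has at least one human signer.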

Ready for an engagement?

DNA's domain-specialist senior expert scopes an engagement around your target.