CREST-Certified AI and LLM Penetration Testing for UK Businesses
AI penetration testing for production LLM-powered applications, RAG pipelines, agent systems, and ML models. Aligned to the OWASP Top 10 for LLM Applications and NCSC AI Cyber Security Code of Practice. Manual exploitation of prompt injection, jailbreak, training-data poisoning, model theft, and excessive agency.
AI security is an emerging attack surface. We test it like a real attacker.
Modern AI applications — production chatbots, RAG-augmented assistants, autonomous agents, fine-tuned LLMs — introduce attack surfaces unfamiliar to traditional pen testing. Prompt injection bypasses guardrails. Training-data poisoning persists across model retraining cycles. RAG retrievers leak confidential data through clever query manipulation. Agentic systems with tool access can be coerced into destructive actions. Standard SAST / DAST tools cannot find these.
Our AI penetration testing is aligned to the OWASP Top 10 for LLM Applications, the NCSC AI Cyber Security Code of Practice, and the EU AI Act security obligations. Manual prompt-injection chains, jailbreak validation, training-data probe attacks, RAG retrieval manipulation, and agentic-system tool abuse — all delivered by CREST-certified testers with practical AI security experience.
OWASP LLM TOP 10 (2025)
What We Test in AI / LLM Penetration Testing
Aligned to OWASP Top 10 for LLM Applications (2025). Manual exploitation across all 10 risk categories.
Prompt Injection
Direct and indirect prompt injection attacks that manipulate the LLM into ignoring its system prompt, leaking confidential information, or executing unauthorised tool calls.
Sensitive Information Disclosure
Unintended exposure of training data, system prompt, RAG corpus, or PII via cleverly-constructed queries. Membership inference attacks against model weights.
Supply Chain Vulnerabilities
Compromised model weights, tampered fine-tuning datasets, malicious LoRA adapters, vulnerable open-source models from HuggingFace, supply-chain attack via embedding model.
Data & Model Poisoning
Training-data poisoning persists across retraining cycles. Backdoor injection in fine-tuning data. Adversarial examples that bypass safety filters.
Improper Output Handling
LLM-generated content (HTML, SQL, shell commands, code) used downstream without validation — leading to XSS, SQL injection, command injection, deserialization attacks.
Excessive Agency
Agentic systems with overly-broad tool permissions. LLM coerced via prompt injection into destructive actions (sending emails, transferring funds, deleting data).
System Prompt Leakage
Extraction of the system prompt revealing business logic, security instructions, or proprietary IP. Direct extraction via prompt manipulation or side-channel inference.
Vector & Embedding Weaknesses
RAG retrieval poisoning, embedding-collision attacks, vector store enumeration, cross-tenant retrieval leakage in shared embedding databases.
Misinformation
Reliance on hallucinated content for production decisions. Lack of human-in-the-loop on high-stakes outputs. False citation generation.
Unbounded Consumption
Cost-amplification attacks: prompt amplification, recursive agent loops, expensive context-window abuse — denial-of-wallet for LLM-powered SaaS.
FOUR-PHASE METHODOLOGY
AI / LLM Penetration Testing — From Architecture Review to Exploit
AI security testing requires both classical pen-test rigour and AI-specific expertise. We deliver both.
Architecture & Threat Model
OWASP LLM Top 10 Coverage
Agentic System Exploitation
Report & Retest
Verified Accreditations Auditors Accept
Every accreditation independently issued by a recognised UK certification body. Click CREST to verify our membership.
COMPLIANCE READY
AI Security Reports Mapped to Every Framework
Findings tagged to OWASP LLM IDs and the AI-specific compliance frameworks UK regulators are establishing.
OWASP LLM Top 10 (2025)
Full coverage of LLM01-LLM10. Each finding pre-mapped to the specific OWASP LLM ID for audit submission.
NCSC AI Code of Practice
UK government’s AI Cyber Security Code of Practice — secure by design, ongoing security, supply chain, transparent reporting.
EU AI Act
High-risk AI system security obligations — Article 15 (cyber security), Article 16 (provider obligations), conformity assessment evidence.
ISO 42001 + ISO 27001
ISO/IEC 42001 (AI management system) plus ISO 27001 A.5.30 (ICT readiness) — combined evidence for AI-driven products.
FCA / PRA AI Risk
UK financial regulator AI risk frameworks (FCA AI Discussion Paper, PRA SS1/23 third-party risk) — AI testing evidence for regulated firms.
NIS2 + DORA
AI-driven essential services / financial entities — AI testing supports operational resilience evidence under NIS2 and DORA.
TRANSPARENT PRICING
Transparent AI / LLM Penetration Testing Pricing
All tiers cover the OWASP LLM Top 10. Price varies by AI system complexity — agent capabilities, RAG breadth, fine-tuning scope.
Depends on AI system complexity
Single LLM-powered chatbot, basic RAG (≤100 documents), no agent tools. Typically 5-7 day engagement.
- ✓Free retests included
- ✓Free rescheduling
- ✓No cancellation fees
- ✓24-hour scope to active testing
- ✓Live findings to client portal
- ✓Executive + technical report
- ✓60-min walkthrough call
- ✓Letter of attestation
Depends on AI system complexity
Multi-tool agent system, complex RAG pipeline, fine-tuned model, multi-tenant. Typically 8-12 day engagement.
- ✓Free retests included
- ✓Free rescheduling
- ✓No cancellation fees
- ✓24-hour scope to active testing
- ✓Live findings to client portal
- ✓Executive + technical report
- ✓60-min walkthrough call
- ✓Letter of attestation
Depends on AI system complexity
Production AI platform, multi-agent orchestration, regulated AI use case (FCA, NHS), custom-trained models. Typically 12-18 day engagement.
- ✓Free retests included
- ✓Free rescheduling
- ✓No cancellation fees
- ✓24-hour scope to active testing
- ✓Live findings to client portal
- ✓Executive + technical report
- ✓60-min walkthrough call
- ✓Letter of attestation
AI Penetration Testing for Your Sector
AI use cases vary by sector. We test the AI controls your industry’s regulators are establishing.
Fintech
FCA-regulated firms, Open Banking, payment APIs, PCI scoping.
SaaS
Multi-tenant isolation, SSO/SAML/OIDC, customer-data perimeter, SOC 2 evidence.
Healthcare
NHS DSPT, NHS DTAC, EHR integration, telehealth, patient-data PII.
Insurance
FCA / PRA Operational Resilience, claims data, broker integrations, cyber underwriting evidence.
Law
Privileged-data confidentiality, partner-tier scrutiny, SRA Cyber Standard alignment.
Public Sector
CCS / G-Cloud framework, NCSC-aligned, SC-cleared testers available.
What You Actually Get
Five things that distinguish our service from automated scans and box-tick competitors.
What You Get From AI Penetration Testing
OWASP LLM Top 10 Aligned
NCSC AI Code Aligned
Practical AI Security Experience
UK CREST + IASME + ISO 27001 + ISO 9001
Frequently Asked
What is AI / LLM penetration testing?
AI penetration testing is the manual security assessment of AI-powered applications — chatbots, RAG systems, agentic platforms, fine-tuned models. Aligned to the OWASP Top 10 for LLM Applications and the NCSC AI Cyber Security Code of Practice. Tests prompt injection, jailbreak, training-data poisoning, model theft, RAG poisoning, and excessive agency in agent systems.
How is AI testing different from regular pen testing?
Regular pen testing covers OWASP Top 10 (web), MASVS (mobile), API Top 10, infrastructure CVEs. AI testing covers OWASP LLM Top 10 — prompt injection, jailbreak, RAG manipulation, training-data poisoning, model theft, agent tool abuse. Different attack surface, different exploitation techniques. Both should be tested for AI-powered applications.
How long does AI penetration testing take?
Single chatbot or RAG system: 5-7 working days. Agent system or fine-tuned model: 8-12 days. Enterprise AI platform with regulated use case: 12-18 days. Test duration is determined during scoping based on system complexity, agent capabilities, and RAG breadth.
How much does AI penetration testing cost in the UK?
Chatbot / basic RAG: £6,000-£12,000. Agent system / complex RAG: £12,000-£25,000. Enterprise AI platform: £25,000+. All quotes are fixed-price after scoping. UK day rates for CREST + AI specialist testers are £1,200-£2,000 per day.
Do you test against the OWASP LLM Top 10?
Yes. Every AI engagement covers all 10 categories of the OWASP Top 10 for LLM Applications (2025 edition). Findings tagged to specific OWASP LLM IDs (LLM01:2025 Prompt Injection, LLM02:2025 Sensitive Information Disclosure, etc.) for audit submission.
Do you test prompt injection attacks?
Yes. Prompt injection (LLM01:2025) is the #1 attack against production LLM applications. We test direct prompt injection (user input attempting to override the system prompt), indirect prompt injection (malicious content in RAG documents or web pages the agent processes), and stored prompt injection (malicious content persisted in vector stores).
Can you test agentic AI systems?
Yes. Agentic system testing is a major focus. We test tool-permission abuse chains, where prompt injection coerces the LLM agent into using its authorised tools (email send, file write, database query, financial transaction) for unauthorised purposes. This is OWASP LLM06:2025 (Excessive Agency).
Do you test RAG systems and vector databases?
Yes. RAG testing covers OWASP LLM08:2025 (Vector & Embedding Weaknesses) — retrieval poisoning attacks, embedding-collision attacks, cross-tenant retrieval leakage in shared vector databases (Pinecone, Weaviate, Qdrant, pgvector), and confidential-data exfiltration via clever query construction.
Do you test against the NCSC AI Code of Practice?
Yes. The UK government’s AI Cyber Security Code of Practice provides emerging baseline AI security obligations. Our testing is aligned to its principles — secure by design, secure development, secure deployment, secure operation, ongoing security, security reviews.
Can you test fine-tuned models for backdoors?
Yes. Backdoor detection in fine-tuned models is an advanced AI testing capability. We test for trigger phrases, adversarial examples that bypass safety filters, and supply-chain compromise through tainted fine-tuning datasets or malicious LoRA adapters.
Are your testers UK-based and what AI experience do they have?
All AI testers are vetted UK or international engineers with hands-on AI security experience. Relevant background: practical LLM application development, OWASP LLM Top 10 contributor experience, AI red teaming community participation, plus traditional pen-test certifications (CREST CRT, OSCP).
Do you sign NDAs?
Yes. Standard NDA before any technical detail is shared. AI engagements often involve highly proprietary system prompts, training data, and model weights — we operate under custom MSAs that include AI-specific data handling and IP clauses.
Book an AI Penetration Test Scoping Call
30 minutes with a CREST + AI specialist tester. Fixed-price quote within 24 hours. No sales pipeline.







