AI Supply ChainAI SecurityGRCAI GovernanceDefinitive Guide
Securing the AI Ecosystem: Supply Chain Flow, Risks, Dependencies, Best Practices & the Road Ahead
JK
Juno David K
Strategic Delivery Leader · Information Security & AI Governance Practitioner
PublishedApril 2026
Reading Time30 min read
Layers Covered8 Supply Chain Layers
Why the AI Supply Chain Demands a New Security Map
The phrase "software supply chain security" entered the mainstream consciousness after SolarWinds in 2020. It became operational reality after Log4j in 2021. It became a systemic emergency in 2025 and 2026, with wave after wave of coordinated attacks targeting package registries, CI/CD pipelines, and developer tooling. But even as organisations scrambled to respond to those incidents, a parallel and more complex supply chain was growing underneath them: the AI supply chain.
The AI supply chain is not simply a new version of the software supply chain. It is categorically more complex, more interdependent, and more vulnerable to attack vectors that have no equivalent in traditional software security. A conventional application depends on libraries and infrastructure. An AI system depends on those things plus training data whose integrity determines the model's behaviour, model weights that can be tampered with invisibly, foundation model providers whose security posture you inherit entirely, orchestration frameworks with their own dependency trees, agent protocols connecting your AI to external services with varying trust levels, and a runtime environment where the boundary between instructions and data — between what the system is told to do and what it might be manipulated into doing — is permeable by design.
This guide is a practitioner's complete map of that supply chain. Not a policy document. Not a compliance checklist. A working framework for understanding every layer of the AI ecosystem, the specific risk profile of each layer, how they cascade into each other, what controls actually reduce risk at each layer, where current frameworks fall short, and what the industry must build next.
🗺️
How to Use This Guide
This article is designed to be read sequentially for a complete picture, or referenced by layer for specific guidance. Security practitioners will find the most operational value in Sections 3, 6, 7, and 9. GRC and governance leaders should focus on Sections 4, 8, 10, and 11. Executives and CISOs looking for a strategic summary should start with Sections 1, 5, and 11.
The AI Supply Chain Is Not One Thing — It Is Eight Layers
One of the most common errors in AI security discussions is treating the AI supply chain as if it were a single entity with a single set of controls. It is not. It is a stack of eight interdependent layers, each with its own threat actors, attack vectors, and defensive requirements. Weakness at any layer propagates upward through the entire stack. The NSA's March 2026 guidance on AI and ML supply chain risks formalises this layered model — framing AI as a supply chain in its own right, where data, models, software, infrastructure, hardware, and third-party services are interconnected components that influence confidentiality, integrity, and availability across the entire digital operation.
Understanding the map is the prerequisite for everything that follows. Here are the eight layers of the AI supply chain, their components, and their inherent risk level.
📊
Layer 1 — Data Foundation
Raw Data, Training Data & Fine-Tuning Data
Web-scraped corpora, licensed datasets, proprietary business data, fine-tuning datasets, RAG knowledge bases, synthetic data pipelines. The source of truth for model behaviour — and the earliest point at which an adversary can permanently alter that behaviour without touching code.
Critical
🧠
Layer 2 — Foundation Models
Model Weights, Checkpoints & Model Artefacts
Base models from providers (OpenAI, Anthropic, Google, Meta, Mistral), open-weight models from Hugging Face (2.5M+ public models as of Feb 2026), fine-tuned derivatives, GGUF quantised files, embedded model objects. Model artefacts can contain executable code that runs on load — the pickle serialisation problem is largely unresolved at scale.
The open-source Python and JavaScript frameworks that provide the building blocks for AI applications. Distributed via PyPI and npm — the same registries targeted in the LiteLLM and Axios attacks. Subject to all traditional software supply chain risks plus AI-specific deserialization vulnerabilities (LangGrinch / CVE-2025-68664).
Critical
⚙️
Layer 4 — AI Tools & Orchestration
LiteLLM, LangGraph, Flowise, Dify, PromptFlow, AI Gateways
The middleware layer that routes requests between applications and AI providers, manages context, orchestrates multi-agent workflows, and aggregates credentials for multiple model providers. LLM gateways like LiteLLM are credential aggregators by design — a single compromise yields API keys for every connected provider simultaneously.
Critical
🔌
Layer 5 — Agent Protocols & MCP
Model Context Protocol, A2A, Agent-to-Agent Communication, Tool Integrations
The connective tissue of agentic AI systems. MCP servers connect AI agents to external tools, databases, APIs, and file systems. Over 8,000 MCP servers were publicly exposed as of January 2026. A compromised MCP server can poison the context shared by every agent that connects to it, without any interaction with the model's weights.
Critical
📱
Layer 6 — AI Applications & Products
RAG Systems, AI Copilots, Chatbots, Autonomous Agents, AI-Embedded SaaS
The end-user-facing layer where AI capabilities are assembled and deployed. Inherits all risks from every layer below it. Subject to its own attack surface: prompt injection from user inputs, RAG knowledge base poisoning, output manipulation, jailbreaks, and the expanding attack surface of agentic systems with real-world action capabilities.
The automation layer that builds, tests, and deploys AI systems. As the TeamPCP campaign demonstrated, CI/CD pipelines are privileged attack surfaces — they have access to publishing credentials for package registries, deployment keys for production environments, and the credentials of every service the pipeline touches. Security tools in the pipeline are themselves supply chain components.
Critical
☁️
Layer 8 — Infrastructure & Cloud
Compute (GPU/TPU), Storage, Networking, Cloud AI Services, Hardware Supply Chain
The physical and virtual substrate on which all AI computation runs. AI workloads introduce specific infrastructure risks: GPU memory side-channel attacks, training environment isolation failures, model storage exposure (Anthropic's Cloudflare R2 bucket was accessible after the Claude Code leak), and the geopolitical dimension of hardware supply chains (chip export controls, embedded hardware backdoors in adversarial supply chains).
High
⚠️
The Cascade Principle
A compromise at any layer propagates to every layer above it. Poisoned training data produces a compromised model. A compromised model infects every application built on it. A compromised framework or tool taints every application using it. A compromised CI/CD pipeline can corrupt every layer simultaneously. This is not a metaphor — it is the literal attack pattern demonstrated by the Trivy → LiteLLM → downstream organisations cascade in March 2026. Securing the AI supply chain requires treating every layer as a potential attack entry point, not just the application layer that users interact with.
Layer-by-Layer Risk Catalogue
Each layer carries a distinct risk profile. The following analysis details the primary threat categories, real-world attack examples, and the specific characteristics that make each layer uniquely vulnerable. Where traditional software security models break down for AI, this is called out explicitly.
L1 — Data
Training Data & RAG Knowledge Base Risks
Primary Threats
Data poisoning — injecting malicious or biased samples into training, fine-tuning, or RAG data to alter model behaviour. Research in 2025 demonstrated that as few as 250 poisoned documents injected into a RAG knowledge base can implant persistent backdoors that activate under specific trigger phrases while leaving general performance unchanged — making detection extremely difficult without dedicated testing. Model inversion and membership inference — reverse-engineering training data from model outputs, potentially reconstructing PII, proprietary content, or sensitive training samples that were never intended to be accessible. Bias injection — systematically skewing training distributions to produce outputs that serve an adversary's agenda while appearing statistically normal.
What Makes This Uniquely Dangerous
Data poisoning attacks happen before the model is built. By the time a model is trained, evaluated, and deployed, the attack is complete and the effects are baked into the model's weights. Unlike a software vulnerability that can be patched, a poisoned model may need to be entirely retrained from clean data — a process that costs millions of dollars and months of time for large foundation models. The Virus Infection Attack (VIA) research from 2025 demonstrated an even more alarming variant: poisoned content that propagates through synthetic data pipelines across multiple model generations, silently amplifying the attack beyond its source.
Data PoisoningModel InversionBias InjectionMembership InferenceRAG Poisoning
L2 — Models
Model Weight Tampering, Serialisation Attacks & Backdoors
Primary Threats
Serialisation attacks — Python's pickle format, widely used for model serialisation, allows arbitrary code execution when a file is loaded. Research confirmed in 2025 that malicious model files continue to be uploaded to Hugging Face (which hosts over 2.5 million public models) without being flagged by standard security scanning. Loading a compromised model is equivalent to executing arbitrary code — the attack is triggered by the act of using the model. Backdoor poisoning — hidden triggers embedded in model weights that cause targeted misclassification or malicious behaviour when a specific input pattern is present, while performing normally on all other inputs. Model weight tampering — post-training modification of weights that degrades performance in targeted ways, biases outputs toward specific conclusions, or introduces latent vulnerabilities.
The Rug Pull Problem
Model registries like Hugging Face operate on implicit trust — a published model is assumed to be what its description claims. Rug pull attacks replace a legitimate model that has accumulated downloads and trust with a malicious version, exploiting the reputation the original model built. This mirrors the compromised npm maintainer account pattern seen in Axios. At 2.5 million+ public models and growing, the scale of potential exposure from a single well-timed rug pull on a widely-downloaded model is significant. LangGrinch (CVE-2025-68664) demonstrated a related vector: a serialisation injection vulnerability in LangChain Core that allows arbitrary code execution through crafted metadata in model artefacts.
Package registry attacks — the LiteLLM (PyPI) and Axios (npm) compromises are textbook examples. AI frameworks distribute via the same open registries as all other software, with the same trust model (maintainer account credentials as the sole gate) and the same attack patterns (compromised credentials, malicious dependencies, typosquatting). The attack surface is compounded because AI framework packages are installed in development environments that have access to API keys, cloud credentials, and model weights. Transitive dependency risk — LangChain alone has over 250 direct and transitive dependencies. A compromise anywhere in that tree can reach the application layer.
AI-Specific Amplification
A compromised AI framework package is significantly more dangerous than a compromised general-purpose library because of what it has access to. In a typical AI developer environment, the framework process has access to: LLM API keys (OpenAI, Anthropic, etc.), cloud provider credentials, training datasets, model artefacts, fine-tuning pipelines, and any sensitive data being passed to or from models. The LiteLLM attack exploited exactly this: because LiteLLM acts as a centralised gateway for multiple AI providers, a single compromise yielded the API credentials for every provider the victim used simultaneously.
LLM gateways (LiteLLM, LangFuse, PromptLayer, custom proxies) are architectural credential aggregators. An organisation using a centralised gateway to route requests to OpenAI, Anthropic, Google Gemini, Azure OpenAI, and Cohere stores credentials for all five providers in a single system. Compromising that system yields a credential harvest worth orders of magnitude more than any individual provider key. Beyond credentials, gateways typically log all model inputs and outputs — including any sensitive data passed to models — creating a second exfiltration surface. Context and memory stores used by orchestration frameworks (LangGraph state, AutoGen agent memory, vector databases) add a third surface: persistent data structures that can be poisoned to alter agent behaviour across sessions.
The Agentic Blast Radius
Agentic AI systems — those with the ability to take autonomous actions: browsing the web, executing code, writing files, calling APIs, sending emails — amplify the consequences of any compromise exponentially. A traditional application vulnerability limits the attacker to what the application can do. A compromised AI agent is limited only by what the agent has been permitted to do — and in many early deployments, agent permissions were granted broadly. A single poisoned agent, operating at machine speed, can cause damage equivalent to a human attacker with months of access in a matter of minutes. The Barracuda Security report from late 2025 identified tool misuse and privilege escalation as the most common agentic AI incidents, with 520 documented cases in 2025 alone.
Prompt injection — malicious instructions embedded in untrusted inputs (user messages, retrieved documents, web content, emails, files) that override the model's intended instructions. OWASP LLM Top 10 ranks this as LLM01 — the number one risk for language model applications. In agentic systems, prompt injection can cause an agent to execute unintended actions, exfiltrate data through legitimate-looking API calls, or forward stolen information via normal communication channels. GitHub's MCP server was exploited in a documented case where a malicious issue injected hidden instructions that hijacked an agent and triggered data exfiltration from private repositories. RAG knowledge base poisoning — injecting malicious documents into the retrieval index so that the model retrieves and acts on attacker-controlled content, even while behaving normally on all other queries.
The Lethal Trifecta
Security researcher Simon Willison identified a specific configuration pattern in June 2025 that maximises prompt injection risk: the simultaneous presence of (1) private data the agent has access to, (2) untrusted content the agent processes (web pages, documents, emails), and (3) external communication capabilities (email sending, API calls, webhooks). When all three are present, a single crafted document in the agent's processing path can exfiltrate all private data it has access to through what appear to be legitimate outbound communications. Most production agentic AI deployments have all three components. Most do not have controls to prevent this specific attack pattern.
CI/CD pipelines are the most dangerous layer in the AI supply chain from an attacker's perspective — not because they are uniquely vulnerable, but because of what they have access to. A compromised CI/CD pipeline can: exfiltrate all secrets accessible to the runner (publishing tokens, cloud credentials, API keys, deploy keys, model registry credentials); inject malicious code into build artefacts that will be deployed to production; replace legitimate model versions with backdoored alternatives; poison training data pipelines before data reaches the training job; and publish compromised packages to registries under the compromised organisation's trusted identity. The TeamPCP campaign's core insight — that security scanners in CI/CD pipelines are the most valuable targets because they have the broadest access — is the defining lesson of 2025–2026 supply chain security. The security tooling in your pipeline is itself a supply chain component that must be secured, version-pinned, hash-verified, and network-controlled with the same rigour as any other privileged system.
The eight layers described above do not exist in parallel — they form a directed dependency graph in which every higher layer is only as secure as the lower layers it depends on. Understanding this dependency topology is essential for prioritising where to invest security controls, and for understanding why the "build it and secure it at the application layer" approach that most organisations default to is fundamentally insufficient for AI systems.
Foundation models depend on: the integrity of their training data (L1), the security of the compute infrastructure used for training (L8), and the trustworthiness of the organisations that trained and published them. An organisation deploying GPT-4o or Claude Sonnet inherits the security posture of OpenAI and Anthropic respectively — including their supply chain security practices, their employee vetting, and their internal development security controls. This inherited trust is largely invisible in most organisations' risk registers.
AI frameworks depend on: the Python and JavaScript package ecosystem integrity (PyPI and npm), the security of their own development pipelines, and the trustworthiness of their transitive dependency trees. LangChain's 250+ dependencies mean that a compromise anywhere in that tree can reach every LangChain application. AutoGen, CrewAI, and the growing ecosystem of agentic frameworks share this exposure profile.
AI applications depend on: every layer below them simultaneously. A RAG-based enterprise chatbot depends on the integrity of its knowledge base (L1 risk), the trustworthiness of the foundation model it calls (L2 risk), the security of the LangChain or similar framework it uses (L3 risk), the security of the LiteLLM or similar gateway managing its API credentials (L4 risk), the security of every MCP server it connects to (L5 risk), and the integrity of its CI/CD pipeline (L7 risk). A security assessment of the application alone — without assessing the dependencies — is evaluating one node in an eight-layer graph while ignoring seven layers of potential entry points.
🔗
The Hugging Face Scale Problem
As of February 2026, Hugging Face hosts over 2.5 million public models. The platform's security scanning flags known malware signatures — but novel malware hiding in serialised model files routinely evades detection. Research from 2025 confirmed that malicious model files were being uploaded and downloaded without triggering security checks. For organisations fine-tuning or deploying community models from Hugging Face, each download is a potential supply chain compromise. At 2.5 million models and growing, manual review is impossible. The industry does not yet have a reliable, scalable solution to this problem.
Key Observations from 2025–2026 Incidents
The incidents of the past 18 months are not random events — they are data points in a pattern that reveals the systematic attack strategy of sophisticated threat actors against AI infrastructure. Taken together, they yield eight observations that should reshape how organisations think about AI security.
🎯
Observation 01
Security Tools Are the Primary Target, Not the Perimeter
The TeamPCP campaign's defining innovation was targeting Trivy — a security scanner — as the entry point, rather than attacking the production systems it was intended to protect. By compromising the tool used to find vulnerabilities, attackers gained access to the credentials of every organisation that ran that tool in CI/CD. This is not a one-time tactic. It is a structural insight: security tools in automated pipelines have privileged access to exactly the secrets that matter, and they are trusted implicitly.
🏛️
Observation 02
Nation-State Actors Are Active in Open-Source Ecosystems
Google Threat Intelligence's attribution of the Axios attack to UNC1069 — a North Korea-nexus threat actor — confirms that state-sponsored actors view npm and PyPI as active attack terrain. The the backdoor malware RAT deployed via the Axios compromise was a sophisticated, cross-platform tool designed for credential harvest and persistent access. The 56% year-over-year increase in AI-related incidents recorded in the 2025 Stanford AI Index is consistent with escalating nation-state interest in AI infrastructure.
🔁
Observation 03
Three-Hour Windows Are Sufficient for Mass Compromise
Both the Axios and LiteLLM attacks had exposure windows of approximately three hours before removal. Yet both caused mass credential compromise across thousands of environments. This is the result of automated, always-on CI/CD pipelines that continuously pull dependency updates. The "pull latest compatible version" default in package management means that the three-hour window between publication and removal is sufficient for global scale compromise. Periodic security scanning provides no meaningful protection in this threat model.
📦
Observation 04
AI-Specific Packages Are Higher-Value Targets Than Generic Libraries
The targeting of LiteLLM (an AI gateway) rather than a generic utility library reflects attacker understanding of the AI tooling landscape. An LLM gateway that aggregates credentials for multiple providers is worth far more to an attacker than a general-purpose HTTP client — even one with 100 million weekly downloads. The attack logic is clear: target the components that aggregate the most valuable credentials. AI tools have a disproportionate credential value relative to their download count.
🕸️
Observation 05
88% of AI Agent Deployments Have Experienced an Incident
Security firm Beam AI reported that 88% of organisations using AI agents had faced a confirmed or suspected security incident in the prior year. Cisco's State of AI Security 2026 found that 72% of security leaders are seeing unprecedented risk levels. Only 29% of organisations planning agentic AI deployment reported feeling prepared to secure it. The gap between deployment velocity and security readiness is the defining characteristic of the current AI security landscape.
💉
Observation 06
Data Poisoning Has Crossed from Research to Active Exploitation
The Australian Cyber Security Centre (ACSC) confirmed in its 2026 AI supply chain risk guidance that adversarial actors are actively poisoning AI training data — this is no longer a research concern. Joint reporting from intelligence agencies in 2025 confirmed multiple verified data poisoning incidents against production AI systems. The a published MCP attack framework research demonstrated 72% success rates attacking MCP servers with poisoned tool descriptions across 45 real servers tested — not a lab environment.
🌊
Observation 07
Attacks Are Designed to Cascade Across Ecosystem Layers
The Trivy → Checkmarx → LiteLLM cascade, and the concurrent Axios attack on the same ecosystem, demonstrate that sophisticated actors understand the dependency graph of the AI ecosystem and design attacks to propagate across it. A single entry point at a security tool (L7) yields credentials that enable attacks on an AI framework (L3), which enables attacks on downstream AI applications (L6). The cascade is architectural, not accidental.
🔍
Observation 08
Post-Incident Detection Methods Fail Against Purpose-Built Evasion
The Axios malware was designed to defeat post-incident investigation: after execution, the dropper deleted itself and replaced the package manifest with a clean version. Running npm audit or manually inspecting the installed package after the fact revealed nothing. The Axios compromise was detected by manual external observation, not by any automated security control. This reflects a broader pattern: supply chain malware is increasingly designed with forensic countermeasures that render traditional post-incident analysis ineffective.
56%
YoY increase in AI-related incidents (Stanford AI Index 2025)
88%
AI agent deployments with confirmed or suspected security incident
2.5M+
Public models on Hugging Face as of February 2026
72%
a published MCP attack framework attack success rate on real MCP servers tested
AI-Specific Risks With No Traditional Equivalent
Traditional software security frameworks were built on the assumption that a system's behaviour is determined by its code. In AI systems, behaviour is determined by both code and data — and the boundary between instructions and data is permeable. This fundamental difference creates seven categories of risk that have no direct equivalent in traditional software security and for which traditional controls are inadequate or entirely absent.
Risk Category 01
Data Poisoning — Attacks Before the System Exists
Traditional security protects systems that exist. Data poisoning attacks systems before they exist — corrupting the training data that will determine behaviour at a point when there is no system to defend yet. A poisoned dataset produces a compromised model. There is no patch for a poisoned model; the only remediation is retraining from clean data. For large foundation models, this means an adversary who successfully poisons training data can inflict a remediation cost of tens of millions of dollars that no security tool can prevent post-hoc.
Risk Category 02
Pickle Serialisation — Model Files as Executable Code
Python's pickle format — the dominant serialisation format for AI model weights — is fundamentally a code execution format, not a data format. Loading a pickle file executes arbitrary Python code. This is not a vulnerability; it is a design characteristic. Security scanning tools can detect known malicious pickle patterns but cannot detect novel payloads. With 2.5 million models on Hugging Face and no reliable automated scanning for novel pickle exploits, every model download from an untrusted source is a potential remote code execution event. The industry has not solved this problem — safer serialisation formats (safetensors) exist but adoption is incomplete.
Risk Category 03
Prompt Injection as Supply Chain — Runtime Poisoning via Context
Prompt injection is not just an application-layer concern — it is a supply chain attack that operates at runtime. Malicious content placed in any data source the AI agent accesses (web pages, documents, emails, database records, tool outputs) can poison the agent's context and redirect its behaviour. Unlike traditional code injection that targets a specific code path, prompt injection targets the model's instruction-following behaviour — which is intentional by design. This makes it structurally different from any injection vulnerability in traditional software.
Risk Category 04
Model Extraction — Your Training Data Is Not Confidential
Model inversion and membership inference attacks allow adversaries to reconstruct information about training data by querying a deployed model. A model trained on PII, trade secrets, proprietary research, or sensitive business communications may be queried in ways that reconstruct that information without ever accessing the training data or model weights directly. Regulatory frameworks including GDPR and the EU AI Act are beginning to impose requirements on training data governance — but most existing AI deployments were not designed with data extraction attacks in the threat model.
Risk Category 05
Memory Poisoning in Agents — Persistence Without Persistence Mechanisms
Agentic AI systems that maintain long-term memory (vector databases of past interactions, explicit memory stores, learned preferences) introduce a new attack class: memory poisoning. An attacker who can inject content into an agent's memory store can alter the agent's behaviour not just in the current session but in all future sessions — without modifying any code, configuration file, or model weight. The poisoned memory persists silently, activating when specific conditions are met. This creates attacker persistence in AI systems through a mechanism that traditional incident response playbooks are not designed to detect.
Risk Category 06
The Lethal Trifecta — Exfiltration Via Normal Operation
The "lethal trifecta" (Willison, 2025) describes a configuration where private data + untrusted content + external communication capabilities coexist in an AI agent. When all three are present, an attacker who controls any piece of untrusted content the agent processes can craft an instruction that causes the agent to exfiltrate private data through its own legitimate communication channels — email, Slack, webhooks, API calls. The exfiltration looks like normal agent activity. It triggers no anomaly detection. It produces no access log entry that distinguishes it from legitimate behaviour. Most enterprise agentic AI deployments have all three components of the trifecta without the controls to prevent this specific attack.
Traditional attackers are bounded by human speed. An agentic AI system operating at machine speed can be redirected to cause damage orders of magnitude faster than any human attacker. A compromised trading agent can execute thousands of trades in seconds. A compromised code review agent can approve malicious pull requests across hundreds of repositories simultaneously. A compromised email agent can exfiltrate an entire corporate mailbox in minutes. The blast radius of a compromised agent is determined by its access permissions — and early agentic AI deployments frequently granted permissions that were not yet understood to be dangerous.
Risk Category 08
Synthetic Data Propagation — Poison That Breeds
The Virus Infection Attack (VIA) demonstrated in 2025 that poisoned content can be designed to survive in synthetic data generation pipelines, propagating through model fine-tuning cycles and producing increasingly large quantities of poisoned synthetic data across model generations. Unlike a one-time data poisoning attack, a VIA-type attack is self-amplifying: the initial poisoning grows through the data lifecycle of any system that uses synthetic data generation to augment training sets. This is among the most alarming AI-specific risks identified in recent research because it exploits the increasingly common practice of using AI-generated data to train AI systems.
The MCP Risk Layer in Depth
The Model Context Protocol (MCP), introduced by Anthropic in late 2024 and rapidly adopted as the standard interface for connecting AI agents to external tools and data sources, has become the fastest-growing new attack surface in the AI ecosystem. As of January 2026, over 8,000 MCP servers were publicly exposed to the internet — many with default configurations that exposed full agent conversation histories, environment variables including API keys, and complete tool configurations. The a widely-adopted MCP-based tool incident in January 2026 demonstrated the consequences of default-insecure MCP configurations at scale, within 72 hours of the tool going viral.
Checkmarx's research identified eleven distinct risk categories specific to MCP's protocol architecture. Five are particularly critical for practitioners to understand:
Tool Poisoning at the Protocol Level
MCP tool descriptions — the natural language text that tells an AI agent what a tool does and when to use it — flow directly into the LLM's context window. An attacker who can control or modify a tool description can embed hidden instructions that alter agent behaviour without modifying any application code or model weights. a published MCP attack framework, a research framework testing this attack across 45 real MCP servers, achieved success rates of 72% — meaning three out of four real MCP server tools could be exploited to redirect agent behaviour through poisoned tool descriptions. This is supply chain poisoning at the protocol level: the attack vector is the metadata, not the code.
Context Pollution — Supply Chain via Shared Context
In multi-agent architectures, multiple agents may share a common MCP context store. A compromised or malicious MCP server can inject content into that shared context that propagates misinformation, backdoored configuration values, or hidden instructions to every agent consuming the shared context. This creates a category of attack that Checkmarx terms "supply chain via context" — where the contamination travels through the data layer of the agent ecosystem rather than through the code layer. One compromised MCP server can achieve multi-agent compromise without interacting with any individual agent's code or model.
Rug Pull Attacks on MCP Servers
MCP servers are typically referenced by clients as remote endpoints. A server that behaves legitimately at deployment time can be modified after it has been trusted and integrated — swapping its tools, outputs, or configurations for malicious versions once it has established trust. This pattern mirrors the npm maintainer account compromise in Axios: the trust was earned legitimately, then exploited. For organisations that have integrated third-party MCP servers into production agent workflows, this represents a persistent and largely unmonitored risk.
The Confused Deputy Problem
Agents operating through MCP inherit the trust permissions of the client application, not the trust level of the content they are processing. An agent with permission to write files can be instructed through a poisoned document to write malicious content — acting as a "confused deputy" that uses its legitimate permissions to perform illegitimate actions. This is structurally similar to CSRF in web security, but operating at the AI agent abstraction layer where the attack surface is defined by natural language instructions rather than HTTP requests.
Cross-Agent Compromise in Multi-Agent Systems
The arXiv paper "Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks" (February 2026) formalised an attack pattern where a compromised agent acts as a proxy for attacking downstream agents in a multi-agent pipeline. A compromised research agent can insert hidden instructions into outputs consumed by a downstream financial agent, which then executes unintended trades. A compromised data-retrieval agent can poison the context of a decision-making agent. The chain of trust between agents in multi-agent systems is currently poorly defined and largely unprotected.
🔌
The MCP Security Gap
MCP is approximately 18 months old as a widely-adopted protocol. The security community's understanding of its attack surface is still developing. The CoSAI MCP security framework, released in early 2026, provides the most comprehensive treatment currently available — covering 12 threat categories and controls including strong identity chains, zero-trust for AI agents, and sandboxed tool calls. Organisations deploying MCP in production should treat it as an immature security surface requiring active monitoring, not a solved problem.
AI BOM — The Missing Piece in Every Security Programme
A Software Bill of Materials (SBOM) is an inventory of every software component in an application — every library, framework, and transitive dependency, with version information and provenance. The US Executive Order on Cybersecurity (2021) mandated SBOM for federal software procurement. The EU Cyber Resilience Act will require SBOM for software sold into EU markets. SBOM is becoming the foundation of software supply chain security governance.
For AI systems, SBOM is necessary but insufficient. An SBOM for an AI application tells you which Python packages were installed and at which versions. It tells you nothing about which foundation model was used, which dataset it was trained on, which fine-tuning data was applied, which MCP servers it connects to, or what the provenance of the model weights is. This is the gap that the AI Bill of Materials (AI BOM) is designed to fill.
An AI BOM is a structured inventory of every AI-specific component in an AI system, extending the SBOM concept to cover the full eight-layer supply chain described in this article. The NSA's March 2026 guidance specifically recommends that organisations request AI BOMs from all AI vendors and service providers as a fundamental supply chain due-diligence requirement. The components of a complete AI BOM include:
Foundation model provenance: model name, version, provider, training data description, fine-tuning lineage, model card reference, and any known limitations or biases documented by the provider.
Training and fine-tuning dataset inventory: data sources, collection methodology, preprocessing steps, data lineage, last verification date, and any public disclosure of data poisoning incidents affecting the dataset.
Framework and library versions: the standard SBOM content — extended to include AI-specific libraries with their own supply chain risk profiles (LangChain, PyTorch, HuggingFace Transformers).
MCP server inventory: every MCP server the AI application connects to, their version, their author, their access scope, their last security review date, and their network exposure.
RAG and retrieval data sources: knowledge base contents, update frequency, access controls, and the provenance of ingested documents.
Third-party AI service dependencies: every external AI API, model inference endpoint, or AI-powered service that the application calls, with the same provenance requirements as foundation models.
Agent capability inventory: for agentic systems, a documented inventory of every action the agent can take, every external service it can communicate with, and the permission model governing those capabilities.
Building an AI BOM is not a one-time activity — it is a living governance artefact that must be updated whenever any component changes. The EU AI Act requires providers of high-risk AI systems to maintain technical documentation covering the supply chain of their AI applications. An AI BOM is the practical mechanism for meeting this obligation. Organisations that implement AI BOM practices now will be significantly ahead of the compliance curve when CRA and AI Act enforcement begins in 2027.
Best Practices — The Security Control Framework by Layer
The following control framework is organised by supply chain layer, providing specific, actionable controls for each level of the AI ecosystem. These are not theoretical recommendations — they are the controls that would have prevented or mitigated the major incidents of 2025 and 2026 had they been in place.
Layer 1 — Data Controls
Data 01
Data Provenance Tracking
Establish tamper-proof provenance records for all training, fine-tuning, and RAG data. Document source, collection methodology, preprocessing steps, and data lineage in your AI BOM. Implement cryptographic signing of dataset versions to detect tampering. Treat data provenance as a security-critical artefact equivalent to code signing.
Data 02
Data Integrity Monitoring
Implement statistical baseline profiling for training datasets. Use anomaly detection to identify distribution shifts that may indicate poisoning. Apply differential privacy techniques to reduce the effectiveness of membership inference attacks. For RAG systems, implement access controls and integrity checks on knowledge base content, treating each document as a potential injection vector.
Data 03
Adversarial Testing for Poisoning
Before deploying models trained or fine-tuned on external data, conduct adversarial testing specifically designed to detect backdoor triggers. Test model behaviour on edge cases and synthetic boundary conditions that a backdoor might be designed to activate. Include data poisoning scenarios in your AI red-teaming programme.
Layer 2 — Model Integrity Controls
Model 01
Verified Model Registry
Implement a private, organisation-controlled model registry for all models used in production. Only models that have passed security review, integrity verification, and adversarial testing should be promoted to the registry. Never load models directly from public registries (Hugging Face, etc.) into production without staging, scanning, and approval. Treat every public model as untrusted until proven otherwise.
Model 02
Abandon Pickle Where Possible
Migrate model serialisation from pickle format to safer alternatives — safetensors (Hugging Face's open-source format) provides pickle-equivalent functionality without arbitrary code execution on load. For models where safetensors is not available, implement sandbox execution environments for model loading that prevent network access and file system writes during the load process.
Model 03
Cryptographic Model Signing
Require cryptographic signatures for all model artefacts used in production. Verify signatures before loading. Maintain the complete signing chain from model trainer through registry to deployment, so that any tampering at any point in the chain is detectable. This mirrors code signing practices from the software supply chain and is now recommended explicitly by the NSA's 2026 AI supply chain guidance.
Layers 3 & 4 — Framework and Tooling Controls
Framework 01
Dependency Pinning and Hash Verification
Pin all framework and library dependencies to exact versions with cryptographic hash verification. Use pip install --require-hashes for Python and npm ci with package-lock.json for JavaScript. Remove semantic version operators (^, ~) from all production dependency files. Implement allowlisting for package registries in CI/CD network egress. These controls would have prevented both the Axios and LiteLLM compromises.
Framework 02
Real-Time SCA Monitoring
Deploy Software Composition Analysis tools that provide real-time alerts when a package in your dependency tree appears in a known-malicious registry entry — not just periodic scanning. The three-hour attack windows of 2026 require detection that operates in the same timeframe. Integrate SCA alerting into your security operations workflow with defined response procedures for dependency compromise alerts.
Framework 03
LLM Gateway Credential Isolation
Do not use centralised LLM gateways (LiteLLM, custom proxies) that aggregate credentials for multiple AI providers in a single system unless you can adequately secure that system. If you do use an aggregated gateway, treat it as a crown-jewel system with HSM-backed credential storage, network isolation, privileged access controls, and continuous security monitoring. The credential aggregation architecture is a feature with a corresponding security obligation.
Layer 5 — MCP and Agent Protocol Controls
MCP 01
MCP Server Allowlisting
Maintain an explicit allowlist of approved MCP servers for each agent deployment. Block all unapproved MCP connections at the network layer. Review and approve every MCP server before allowing it in production — including auditing its tool descriptions for hidden instruction injection. Treat MCP servers as privileged dependencies requiring the same security review as any other third-party service integration.
MCP 02
Minimal-Permission Agent Design
Apply strict least-privilege to every AI agent. Define precisely what actions each agent needs to perform its intended function and grant only those permissions. Specifically: do not grant write access when read access suffices; do not grant external communication when only internal operations are needed; do not allow file system access beyond explicitly needed paths. The agentic blast radius is directly proportional to the permissions granted.
MCP 03
Human-in-Loop for High-Impact Actions
Implement mandatory human approval checkpoints for any agent action with irreversible consequences: financial transactions, data deletion, external communications, code deployment, access control changes. The "wrong action that can't be undone" is the defining risk of agentic AI as described at RSAC 2026. No agent should be able to take irreversible high-impact actions without a human confirmation step in the workflow, regardless of how sophisticated the agent appears.
Migrate all package publishing and deployment workflows from long-lived stored credentials to OIDC Trusted Publishing. With OIDC, a PyPI or npm publishing token is never stored in CI/CD — it is minted fresh for each run from the OIDC identity of the pipeline. The LiteLLM attack succeeded because a long-lived PyPI token was accessible from the CI/CD runner. OIDC eliminates that attack vector entirely.
CICD 02
Security Tool Version Pinning
Pin every CI/CD security tool — vulnerability scanners, SAST tools, container image scanners, dependency checkers — to specific, hash-verified versions. Do not allow security tools to auto-update from external registries during CI/CD runs. The TeamPCP campaign succeeded precisely because Trivy was pulled without a pinned version. Your security toolchain must be treated with the same supply chain security rigour as your production code.
CICD 03
Network Egress Allowlisting for Build Environments
Implement strict outbound network allowlists for CI/CD runners. Build pipelines should only be permitted to communicate with explicitly approved domains — package registries, internal services, approved model endpoints. Both the Axios RAT and LiteLLM credential stealer communicated with external C2 domains ([C2 domain], [exfiltration endpoint]) that had no legitimate reason to be in a build pipeline's network path. An allowlist would have blocked exfiltration even after initial compromise.
Governance and GRC Controls
GRC 01
Implement and Maintain AI BOM
Build and maintain an AI Bill of Materials for every AI system in production. Include foundation model provenance, dataset lineage, framework versions, MCP server inventory, RAG data sources, and agent capability inventory. Integrate AI BOM generation into CI/CD. Monitor AI BOM components against real-time threat intelligence feeds. The AI BOM is both a security control and the foundation of EU AI Act compliance.
GRC 02
Extend ISMS Scope to AI Supply Chain
Under ISO 27001, explicitly scope your ISMS to include AI supply chain assets: model registries, training pipelines, fine-tuning datasets, LLM gateways, MCP servers, and AI-specific CI/CD workflows. Add AI supply chain risk scenarios to your risk register. Assess control effectiveness against the specific threat patterns documented in 2025–2026 incidents — not just generic software supply chain risks.
GRC 03
AI-Specific Incident Response Playbooks
Develop and test incident response playbooks for AI-specific scenarios: training data poisoning discovery, compromised model artefact, MCP server compromise, agent memory poisoning, and prompt injection-based data exfiltration. Current IR playbooks assume human attackers operating at human speed — agentic AI incidents can cause disproportionate damage in minutes. Test playbooks in tabletop exercises against these scenarios.
What Frameworks Cover and Where They Fall Short
Multiple established and emerging frameworks address AI security and supply chain risk. None is complete. Understanding what each framework covers — and where the gaps are — is essential for building a governance posture that goes beyond checkbox compliance.
NIST AI RMF
GOVERN · MAP · MEASURE · MANAGE
Supply ChainPartial
Data / ModelGood
MCP / AgenticNo Coverage
AI BOMNot Addressed
Excellent governance structure with well-defined functions. Does not address MCP security, agentic blast radius, memory poisoning, or AI BOM at technical depth. Compliance is achievable without implementing any of the AI-specific controls that would have mitigated the March 2026 incidents.
ISO 42001
AI Management System Standard
Supply ChainPartial
Data / ModelPartial
MCP / AgenticNo Coverage
AI BOMNot Addressed
Clause 6.1 risk management and Annex A controls provide a strong governance foundation for AI risk. Does not yet address MCP security, agentic AI risks, or AI BOM as specific control requirements. Certification is possible without addressing attack scenarios of the type seen in 2025–2026.
OWASP LLM Top 10
2025 Edition
Supply ChainGood
Data / ModelGood
MCP / AgenticEmerging
AI BOMNot Addressed
The most technically comprehensive current reference for AI application security. LLM01 (Prompt Injection) through LLM10 (Unbounded Consumption) cover most application-layer risks well. Limited supply chain depth at the framework level. Actively evolving for agentic AI — the GenAI Security Project adds MCP coverage.
NSA 2026 AI/ML Guidance
March 2026 Joint Publication
Supply ChainStrong
Data / ModelStrong
MCP / AgenticLimited
AI BOMRecommended
The most current authoritative guidance on AI supply chain security. Explicitly recommends AI BOM requests from all AI vendors. Strong on data integrity, model provenance, and software layers. Limited treatment of MCP and agentic AI risks. Excellent baseline for organisational AI supply chain governance programmes.
MITRE ATLAS
Adversarial Threat Landscape for AI Systems
Supply ChainStrong
Data / ModelStrong
MCP / AgenticEmerging
AI BOMNot Addressed
The AI-specific extension of MITRE ATT&CK. Excellent taxonomy of AI attack techniques including data poisoning, model extraction, evasion, and supply chain compromise. Invaluable for threat modelling and red-team exercises. Not yet comprehensive for MCP-specific attack patterns or multi-agent compromise scenarios.
EU AI Act
High-Risk AI Systems Framework
Supply ChainPartial
Data / ModelPartial
MCP / AgenticNo Coverage
AI BOMImplied
Article 11 technical documentation requirements imply AI BOM-like inventory for high-risk AI systems. Supply chain obligations are present but not technically prescriptive. Does not address MCP, agentic AI, or the specific attack patterns documented in 2025–2026. Enforcement begins 2026–2027.
CoSAI MCP Security Framework
2026 White Paper
Supply ChainPartial
Data / ModelLimited
MCP / AgenticStrong
AI BOMNot Addressed
The most comprehensive current treatment of MCP-specific security risks — twelve threat categories with controls including strong identity chains, zero-trust for AI agents, and sandboxed tool calls. Immature as a standard but rapidly developing. Essential reading for any organisation deploying MCP servers in production.
The most significant gap across all current frameworks is the absence of prescriptive guidance on agentic AI security at the operational level. Frameworks acknowledge that autonomous agents introduce new risks, but none provides the specific controls — agent permission models, memory integrity requirements, multi-agent trust boundaries, human-in-loop requirements for high-impact actions — that practitioners need to implement secure agentic AI systems. This is the primary area where new guidance is needed and where the industry must move quickly, given that agentic AI deployment is already outpacing the development of governance frameworks.
The Road Ahead — Seven Areas the Industry Must Advance
The controls and frameworks described in this article address the known threat landscape as it exists today. But the AI ecosystem is evolving faster than security frameworks, faster than regulatory guidance, and faster than most organisations' security programmes can adapt. The following seven areas represent where the industry must make concerted progress to avoid a widening gap between AI capability and AI security.
🏷️
Priority 01
AI BOM Standardisation — From Recommendation to Requirement
The AI BOM concept exists but lacks standardisation. Multiple organisations are developing competing specifications. The industry needs a single, well-specified AI BOM standard that is machine-readable, integrates with existing SBOM tooling, covers all eight supply chain layers described in this article, and is adopted by cloud providers and model registries as a default output. SPDX and CycloneDX (the dominant SBOM formats) have begun adding AI-specific fields — this work needs to be completed, published, and mandated. Without a standard, AI BOM adoption will remain inconsistent and the governance value will be limited.
🔌
Priority 02
MCP Security Standards and Protocol Hardening
MCP is becoming the standard protocol for AI agent tool integration, but it was designed for functionality before security. The protocol needs hardening at the specification level: mandatory authentication between MCP clients and servers, cryptographic verification of tool descriptions before loading, context integrity mechanisms to detect context pollution, and standardised permission scoping for tool access. The CoSAI framework provides a starting point. Anthropic and the MCP working group need to treat security as a protocol-level requirement for the next major MCP specification revision, not a recommendation for implementers.
🔏
Priority 03
Model Signing and Provenance Infrastructure at Ecosystem Scale
Code signing has been a software security standard for decades. Model signing — cryptographically binding a model artefact to a specific verified publisher and a specific training/fine-tuning lineage — barely exists as a widely-implemented practice. Hugging Face and other model registries need to implement verified publisher programmes, mandatory signing for model uploads, and signature verification as a default requirement for model downloads. Without this infrastructure, the 2.5 million models on Hugging Face are an effectively unsecured supply chain that every organisation consuming those models inherits.
🤖
Priority 04
Agentic AI Governance Standards — Permissions, Boundaries, Accountability
The industry is deploying autonomous agents at significant scale without agreed standards for what permissions agents should have, what actions require human approval, how agent identity should be established and verified, how agent actions should be logged for accountability, or what constitutes acceptable blast radius for a misconfigured agent. The work from Cisco (DefenseClaw), Microsoft (Entra ID for agents), and SentinelOne (Prompt AI Agent Security) provides commercial implementations — but the industry needs open, interoperable governance standards that any organisation can adopt, not proprietary frameworks from individual vendors.
🌊
Priority 05
Data Provenance Tooling — Making Lineage Visible and Verifiable
The fundamental enabler of data poisoning attacks is the absence of verifiable data provenance — organisations typically cannot trace where their training data came from, when it was collected, what preprocessing was applied, or whether any component of the collection or preprocessing pipeline was compromised. Google's AI security researchers have called for tamper-proof provenance records for datasets, equivalent to a git log for data. This requires both tooling (version-controlled, cryptographically signed dataset lineage records) and ecosystem cooperation (data providers publishing machine-readable provenance artefacts). Neither exists at scale today.
🚨
Priority 06
AI-Specific Incident Response — Playbooks, Tooling, and Exercises
The incident response industry has decades of experience with traditional security incidents: malware, data breaches, ransomware, DDoS. It has almost no experience with AI-specific incidents: training data poisoning discovery, compromised model detection, agent memory poisoning, multi-agent cascade compromise. No widely-adopted incident response framework currently addresses these scenarios. The industry needs published playbooks, tabletop exercise scenarios, and IR tooling specifically designed for AI incidents — including techniques for determining the scope of data poisoning, methods for assessing whether a model has been backdoored, and procedures for safely isolating a compromised agent without losing its context for forensic analysis.
⚖️
Priority 07
Regulatory Catch-Up on Agentic AI Autonomy and Accountability
Current AI regulation was designed for AI systems that generate outputs — text, images, recommendations — consumed by humans who make the final decision. Agentic AI systems that take autonomous actions without human review present a fundamentally different regulatory challenge: who is accountable when an AI agent causes harm at machine speed, before any human has the opportunity to intervene? The EU AI Act's high-risk category framework and the US Executive Order on AI both partially address this, but neither provides clear accountability frameworks for agentic AI decisions. As agentic AI deployment accelerates into financial services, healthcare, critical infrastructure, and legal workflows, this regulatory gap becomes an existential liability for the organisations deploying these systems.
🎯
The Strategic Summary for Security and GRC Leaders
The AI supply chain is an eight-layer dependency graph in which a failure at any layer propagates to every layer above it. The attack patterns of 2025–2026 have demonstrated that sophisticated threat actors understand this architecture and design attacks to cascade through it deliberately. Securing the AI supply chain requires treating every layer — not just the application layer — as a primary security domain: inventory, verify, control, and monitor every component from training data to deployed agent. Current compliance frameworks provide a governance foundation but are technically insufficient against the threat patterns now in production. Organisations that implement the controls described in this guide, maintain living AI BOMs, and actively track the evolution of agentic AI security will be in a fundamentally stronger position as both the threat landscape and the regulatory environment continue to evolve.
🍪 We use cookies & similar technologies
We use essential cookies for site functionality and optional analytics cookies to understand how visitors use this site. You can accept all, customise your preferences, or reject non-essential cookies.
Privacy Policy
Search junodavidk.com
🔍
No results found. Try different keywords.
Type at least 2 characters to search · Press Esc to close