LLM10: Model Theft
Model theft encompasses the unauthorized extraction, cloning, or reconstruction of proprietary LLM weights and capabilities. This includes model extraction attacks (querying the API systematically to reconstruct the model), stealing model files from insecure storage, reverse-engineering fine-tuning data, and side-channel attacks on inference servers. Model theft eliminates the competitive advantage of proprietary AI and enables attackers to study the model offline for vulnerabilities.
Model Extraction Attacks
In a model extraction attack, an adversary makes thousands of queries to a model API and uses the input-output pairs to train a clone model. Research has shown that with sufficient queries, attackers can create a distilled version of a proprietary model that reproduces 80-95% of its behavior. The cost of extraction is a fraction of the cost of training, making this a significant IP theft vector.
Insecure Model Storage
Model weights are often stored insecurely: public S3 buckets, unencrypted model registries, or embedded in container images pushed to public registries. A single misconfigured IAM policy can expose months of training work and millions of dollars in compute investment. Model weights should be treated with the same security as source code — or more, given the investment they represent.
Value of Stolen Models
A stolen model is valuable in multiple ways: (1) Direct commercial use — deploying the stolen model as a competing service. (2) Fine-tuning — using the stolen model as a base for specialized downstream tasks. (3) Vulnerability research — studying the model offline to find prompt injection techniques, biases, and safety bypasses that can be exploited against the original service.
⚔️ Attack Examples & Code Patterns
Model weights exposed in container image
Proprietary model weights accidentally included in a Docker image:
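An attacker who pulls a public image can simply enumerate its files and lift the weights out; the same scan doubles as a CI check before pushing. The sketch below (file names and extension list are illustrative) scans a flat tar archive — note that a real `docker save` archive nests per-layer tarballs, each of which would need the same scan:

```python
import io
import tarfile

# Extensions that typically indicate serialized model weights (assumed list).
WEIGHT_EXTENSIONS = (".safetensors", ".bin", ".pt", ".ckpt", ".gguf", ".onnx")

def find_weight_files(image_tar_bytes):
    """Scan a flat tar archive for model-weight files by extension."""
    hits = []
    with tarfile.open(fileobj=io.BytesIO(image_tar_bytes)) as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.lower().endswith(WEIGHT_EXTENSIONS):
                hits.append(member.name)
    return hits

# Build a toy image archive containing an accidentally baked-in weight file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("app/main.py", "app/model.safetensors"):
        data = b"dummy"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

print(find_weight_files(buf.getvalue()))  # → ['app/model.safetensors']
```

Running the scan in CI and failing the build on any hit prevents the weights from ever reaching a registry.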
API-based model extraction
Systematic querying to reconstruct a model:
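A minimal sketch of the idea, using a toy linear "victim" so the attack fits in a few lines — real extraction attacks instead collect thousands of input-output pairs from the API and train a student (distilled) network on them:

```python
def victim_model(x1, x2):
    """Stand-in for a proprietary model behind an API; the weights
    (3, -2) and bias (1) are the secret being stolen."""
    return 3.0 * x1 - 2.0 * x2 + 1.0

def extract_linear(query):
    """Recover a 2-feature linear model with three probe queries."""
    b = query(0.0, 0.0)        # probe the bias
    w1 = query(1.0, 0.0) - b   # probe the first weight
    w2 = query(0.0, 1.0) - b   # probe the second weight
    return lambda x1, x2: w1 * x1 + w2 * x2 + b

clone = extract_linear(victim_model)
# The clone now reproduces the victim's behavior on arbitrary inputs.
print(clone(5.0, -2.0), victim_model(5.0, -2.0))  # → 20.0 20.0
```

The query budget scales with model complexity, which is why rate limiting and query-pattern monitoring are the primary defenses.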
🔍 Detection Checklist
- ☐ Verify model weights are not included in container images
- ☐ Check access controls on model storage (S3, GCS, Hugging Face)
- ☐ Implement rate limiting on model inference API endpoints
- ☐ Monitor for systematic API query patterns (extraction attempts)
- ☐ Encrypt model weights at rest and in transit
- ☐ Apply model watermarking for theft detection
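The watermarking item above can be sketched as a simplified "green list" statistical watermark in the style of published LLM watermarking schemes; the vocabulary, thresholds, and hard green-token sampling here are toy assumptions:

```python
import hashlib
import random

# Toy vocabulary standing in for a real tokenizer's vocabulary.
VOCAB = [f"tok{i}" for i in range(1000)]

def green_list(prev_token, fraction=0.5):
    """Derive a pseudo-random 'green' half of the vocabulary from the
    previous token, as in green-list statistical watermarking."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = list(VOCAB)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def generate_watermarked(length=200, seed=0):
    """Toy 'model output' that always samples green tokens.
    A real model would softly bias logits instead of hard-forcing."""
    rng = random.Random(seed)
    tokens = ["tok0"]
    for _ in range(length):
        tokens.append(rng.choice(sorted(green_list(tokens[-1]))))
    return tokens

def green_fraction(tokens):
    """Detector: fraction of tokens drawn from the green list of their
    predecessor. ~0.5 for unwatermarked text, near 1.0 if watermarked."""
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:])
               if cur in green_list(prev))
    return hits / (len(tokens) - 1)

watermarked = generate_watermarked()
plain = ["tok0"] + random.Random(1).choices(VOCAB, k=200)
print(green_fraction(watermarked) > 0.9, green_fraction(plain) < 0.7)  # → True True
```

If a suspected clone's outputs show a statistically implausible green-token fraction, that is forensic evidence the clone was distilled from watermarked outputs.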
🛡️ Mitigation Strategy
Implement rate limiting and query monitoring on model APIs. Use watermarking techniques in model outputs. Encrypt model weights at rest and in transit. Apply access controls to model artifact storage. Monitor for unusual API query patterns indicative of extraction attempts.
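The rate-limiting advice above can be sketched as a per-API-key token bucket (class name and parameters are hypothetical, not tuned recommendations):

```python
import time

class TokenBucket:
    """Per-key token bucket: sustained `rate` requests/sec with bursts
    up to `capacity`."""

    def __init__(self, rate=5.0, capacity=20.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per API key; rejected calls are also the signal to log for
# extraction-pattern monitoring.
buckets = {}

def check_request(api_key):
    return buckets.setdefault(api_key, TokenBucket()).allow()

b = TokenBucket(rate=1.0, capacity=2.0)
print(b.allow(), b.allow(), b.allow())  # → True True False
```

Pairing the rejection log with anomaly detection (e.g., a single key sweeping the input space uniformly) is what turns rate limiting into extraction-attempt monitoring.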
How Precogs AI Protects You
Precogs AI detects insecure model storage configurations, missing API rate limits on inference endpoints, and exposure of model artifacts in CI/CD pipelines or container images.
How are proprietary AI models stolen?
Model theft occurs through API-based extraction (systematically querying to clone the model), insecure storage (model weights in public S3 buckets or Docker images), and insider access. A stolen model eliminates competitive advantage and lets attackers find exploits offline. Prevention requires rate limiting, access controls, encryption, and watermarking.
Protect Against LLM10: Model Theft
Precogs AI automatically detects LLM10: Model Theft vulnerabilities and generates AutoFix PRs.