Precogs Priority Overview: Intelligence for PII & Secret Detection
Precogs Priority: State of the Art PII & Secret Detection
99.2% precision. 98.3% recall. 0.002s per KB. Zero configuration.
Precogs Priority is powered by Adaptive Intelligence, a multi-layer detection architecture that combines pattern matching with context-aware machine learning and dynamically selects the optimal strategy for every content type. Traditional tools force a choice between speed and accuracy. Precogs Priority eliminates this trade-off, so you can secure production with both.
| Metric | Precogs Priority | Industry Average |
| Precision | 99.2% | 75-85% |
| Recall | 98.3% | 80-90% |
| Speed | 0.002s/KB | 0.5-2s/KB |
| False Positive Rate | 1-3% | 10-25% |
State of the Art: Where We Fit

Competitive Landscape
| Tool | Approach | Precision | Recall | Speed (per KB) | PII | Secrets |
| Precogs Priority | Adaptive Intelligence | 99.2% | 98.3% | 0.002s | ✅ | ✅ |
| TruffleHog v3 | Patterns + Verification | 95% | 88% | 0.05s | ❌ | ✅ |
| Gitleaks | Patterns | 92% | 85% | 0.01s | ❌ | ✅ |
| Microsoft Presidio | ML (spaCy) | 85% | 92% | 0.5s | ✅ | ⚠️ |
| AWS Macie | ML + Patterns | 90% | 90% | N/A | ✅ | ⚠️ |
| GitGuardian | Patterns + ML | 94% | 90% | SaaS | ⚠️ | ✅ |
Research Foundation
Our approach builds on peer-reviewed research:
- Adaptive Detection: Studies show multi-layer detection achieves 17% higher F1-score than pure ML (arXiv:2510.07551)
- Context-Aware Filtering: Reduces false positives by 60-80% vs pattern-only (Nature 2025)
- Entropy Thresholds: Optimized Shannon entropy cutoffs for secret detection with minimal noise
Overview
Our platform uniquely integrates three core technologies—Instant Pattern Recognition, Context-Aware Machine Learning, and High-Entropy Analysis—into a unified pipeline that achieves industry-leading precision (99.2%) and recall (98.3%).
Unlike single-method tools that sacrifice accuracy for speed or vice versa, Precogs Priority dynamically selects the optimal detection strategy based on content type, file format, and organizational requirements.
The Challenge
Modern organizations face an exponentially growing attack surface for sensitive data exposure:
| Challenge | Impact |
| Credential Leaks | 80% of breaches involve compromised credentials |
| PII Exposure | Average GDPR fine: €2.4M; HIPAA: $1.5M |
| False Positives | Security teams spend 25% of time on false alerts |
| Diverse Formats | Code, configs, documents, logs, images—all need scanning |
| Speed vs Accuracy | Traditional tools force a trade-off |
Precogs Priority solves these challenges with an intelligent, adaptive detection architecture.

Precogs Adaptive Intelligence: How it Works
Layer 1: Instant Pattern Discovery
Our pattern layer provides the foundation for fast, accurate detection of structured data.
Core Pattern Library (50+ Types)
- PII Patterns:
  - Personal identifiers: Names, emails, phone numbers (20+ country formats)
  - Government IDs: SSN, passport, driver's license, UK NINO, EU national IDs
  - Financial: Credit cards (with Luhn validation), bank accounts, IBAN, SWIFT
  - Healthcare: Patient IDs, medical record numbers, insurance identifiers
  - Technical: IP addresses (v4/v6), MAC addresses, device IDs
- Secret Patterns:
  - Cloud credentials: AWS, GCP, Azure (access keys, service accounts)
  - AI/ML platforms: OpenAI, Anthropic, HuggingFace, Replicate
  - Version control: GitHub, GitLab, Bitbucket tokens
  - Payment systems: Stripe, Square, PayPal, Braintree
  - Communication: Slack, Discord, Telegram, Twilio
  - Infrastructure: Database URLs, JWTs, private keys, SSH keys
Extended Pattern Database (761+ Patterns)
Our pattern database covers edge cases and emerging credential formats across:
- 100+ cloud services
- 50+ SaaS platforms
- Regional variations and legacy formats
- Custom enterprise patterns
Key Features
| Feature | Description |
| Format Validation | Luhn checksum for credit cards, phone number parsing |
| International Support | Phone numbers in 20+ country formats |
| JSON/YAML Aware | Correctly parses secrets in config file formats |
| Placeholder Filtering | Ignores "YOUR_API_KEY", "changeme", demo values |
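The format validation and placeholder filtering rows above can be illustrated with a minimal sketch. This is not the Precogs implementation: the function names, the minimum digit count, and the placeholder list (beyond "YOUR_API_KEY" and "changeme", which the table mentions) are assumptions made for illustration.

```python
# Hypothetical placeholder list; only the first two entries come from the table above.
PLACEHOLDERS = {"your_api_key", "changeme", "example", "dummy"}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:  # assumption: shorter runs are not treated as card numbers
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_placeholder(value: str) -> bool:
    """Return True for well-known dummy values that should not be reported."""
    return value.strip().strip("\"'").lower() in PLACEHOLDERS
```

A candidate that matches a card-number pattern but fails `luhn_valid` can be dropped before it reaches later layers, which is one plausible way a pattern layer keeps precision high without extra cost.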
Layer 2: Intelligent Context Validation
For unstructured text where patterns alone are insufficient, our ML layer provides context-aware detection.
Transformer-Based NER
We employ state-of-the-art transformer models trained on large-scale datasets for named entity recognition (NER). Our ML approach:
- Understands context: "Contact John at..." → John is a person name
- Handles variations: Nicknames, misspellings, unconventional formats
- Multi-language support: Recognizes entities across languages
- Multi-model selection: Automatically routes code to StarPII and documentation to Piiranha for optimal results
- Adaptive thresholds: Per-entity-type confidence tuning
ML Detection Modes
| Mode | Use Case | Speed | Accuracy |
| Disabled | Real-time scanning, structured data | 0.002s/KB | 99.2% precision |
| Enabled | Batch processing, documents, emails | 0.1s/KB | +16.7% recall |
What ML Adds
┌────────────────────────────────────────────────────────────────┐
│ ML DETECTION ADVANTAGES │
├────────────────────────────────────────────────────────────────┤
│ ✅ Names in prose: "Please forward to John Smith..." │
│ ✅ Addresses in text: "Located at 123 Main Street, Suite 5" │
│ ✅ Obfuscated emails: "john dot smith at company dot com" │
│ ✅ Context-aware secrets: Variable names indicating keys │
│ ✅ Non-standard formats: Obfuscated or encoded data │
└────────────────────────────────────────────────────────────────┘
Layer 3: High-Entropy Secret Detection
High-entropy strings often indicate randomly-generated secrets that don't match known patterns.
Shannon Entropy Detection
Our entropy analyzer calculates the randomness of strings to identify:
- API keys with non-standard formats
- Randomly generated passwords
- Encrypted tokens
- Base64-encoded secrets
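As a rough illustration of the entropy calculation described above, the sketch below computes Shannon entropy in bits per character and applies a cutoff. The threshold and minimum length are hypothetical values chosen for the example; the product's tuned cutoffs are not stated in this document.

```python
import math
from collections import Counter

# Hypothetical values for illustration only.
ENTROPY_THRESHOLD = 4.0  # bits per character
MIN_LENGTH = 20          # ignore short tokens


def shannon_entropy(s: str) -> float:
    """Shannon entropy of `s` in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_random(token: str) -> bool:
    """Flag tokens that are long enough and have high per-character entropy."""
    return len(token) >= MIN_LENGTH and shannon_entropy(token) >= ENTROPY_THRESHOLD
```

A string of one repeated character scores 0 bits, while a string of 32 distinct characters scores exactly 5 bits, so the cutoff separates repetitive text from random-looking tokens.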
Context-Aware Filtering
Not all high-entropy strings are secrets. Our analyzer filters:
| Filtered | Reason |
| Base64 image data | data:image/png;base64,... |
| SVG path coordinates | M 10 20 L 30 40 |
| CSS color codes | #ff5500, rgba(255,0,0,0.5) |
| Version strings | 1.2.3.4, v2.0.0-beta |
| UUIDs in expected contexts | Logging, tracing |
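The filter table above could be approximated with a few checks run before a high-entropy string is reported. This is a sketch under assumptions: the regular expressions, the `benign_reason` name, and the reason labels are illustrative, and UUID filtering is left out because it depends on surrounding context that this example does not model.

```python
import re
from typing import Optional

DATA_IMAGE = re.compile(r"^data:image/[a-z0-9.+-]+;base64,", re.IGNORECASE)
CSS_HEX = re.compile(r"^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$")
CSS_RGB = re.compile(r"^rgba?\(\s*[\d.\s,%/]+\)$", re.IGNORECASE)
VERSION = re.compile(r"^v?\d+(?:\.\d+){1,3}(?:-[0-9A-Za-z.]+)?$")
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
SVG_COMMANDS = set("MmLlHhVvCcSsQqTtAaZz")


def _is_svg_path(value: str) -> bool:
    """Simplified check: whitespace- or comma-separated commands and numbers."""
    tokens = value.replace(",", " ").split()
    if len(tokens) < 2 or tokens[0] not in {"M", "m"}:
        return False
    return all(t in SVG_COMMANDS or NUMBER.fullmatch(t) for t in tokens)


def benign_reason(value: str) -> Optional[str]:
    """Return a label if `value` looks like known benign data, else None."""
    if DATA_IMAGE.match(value):
        return "base64 image data"
    if _is_svg_path(value):
        return "svg path"
    if CSS_HEX.match(value) or CSS_RGB.match(value):
        return "css color"
    if VERSION.match(value):
        return "version string"
    return None
```

Returning the reason rather than a bare boolean makes it easier to audit why a candidate was suppressed.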
Layer 4: PrecisionShift™ Fusion & Validation
The final layer ensures high precision by validating and deduplicating findings.
False Positive Filtering (70+ Rules)
We maintain extensive filters for common false positives:
- Name Filters:
  - Job titles: "Admin", "Manager", "Director"
  - Department labels: "Patient Services", "Customer Support"
  - Documentation terms: "Example User", "Test Account"
  - Geographic names: City names, street types
- Date/Time Filters:
  - Log timestamps: 2024-01-15 10:30:00
  - ISO dates in code: datetime.now()
  - Version numbers: 1.2.3.4
- Technical Filters:
  - IP-like version strings
  - SVG coordinates and transforms
  - CSS values and properties
Intelligent Deduplication
When multiple detection layers find the same data:
Priority Order:
1. ML Detection (highest - context-aware)
2. Pattern Detection (high - precise format matching)
3. Entropy Detection (medium - catches unknowns)
Resolution:
- Same span, same type → Keep highest confidence
- Overlapping spans → Prefer more specific type
- Complementary detections → Merge and enhance
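One way to read the first two resolution rules is sketched below. The `Finding` structure, the half-open span convention, and the specificity table are assumptions introduced for the example. The third rule (merging complementary detections) is not implemented here, although non-overlapping findings are all kept.

```python
from dataclasses import dataclass

# Priority order from the text: ML > pattern > entropy.
LAYER_PRIORITY = {"ml": 3, "pattern": 2, "entropy": 1}
# Hypothetical specificity ranks: a named credential type beats a generic one.
SPECIFICITY = {"AWS_ACCESS_KEY": 3, "API_KEY": 2, "HIGH_ENTROPY_STRING": 1}


@dataclass(frozen=True)
class Finding:
    start: int  # inclusive offset
    end: int    # exclusive offset
    type: str
    layer: str
    confidence: float


def _rank(f: Finding) -> tuple:
    return (SPECIFICITY.get(f.type, 0), LAYER_PRIORITY.get(f.layer, 0), f.confidence)


def deduplicate(findings: list[Finding]) -> list[Finding]:
    # Rule 1: same span and same type -> keep the highest-confidence finding,
    # using layer priority only to break ties.
    best: dict[tuple, Finding] = {}
    for f in findings:
        key = (f.start, f.end, f.type)
        cur = best.get(key)
        if cur is None or (f.confidence, LAYER_PRIORITY.get(f.layer, 0)) > (
            cur.confidence, LAYER_PRIORITY.get(cur.layer, 0)
        ):
            best[key] = f
    # Rule 2: overlapping spans -> visit the most specific findings first and
    # drop any later finding that overlaps one already kept.
    kept: list[Finding] = []
    for f in sorted(best.values(), key=_rank, reverse=True):
        if all(f.end <= k.start or f.start >= k.end for k in kept):
            kept.append(f)
    return sorted(kept, key=lambda f: f.start)
```

The greedy pass is a deliberate simplification: it is easy to reason about, but a production system might resolve overlaps differently.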
Detection Capabilities
PII Detection (28+ Types)
| Category | Types |
| Personal | Name, Email, Phone, Address, Date of Birth |
| Government | SSN, Passport, Driver's License, National IDs |
| Financial | Credit Card, Bank Account, IBAN, Bitcoin |
| Healthcare | Patient ID, MRN, Insurance ID |
| Automotive | VIN, IMEI, ICCID, IMSI, EID, License Plates |
| Technical | IP Address, MAC Address, Device ID, Bluetooth ID |
Secret Detection (50+ Types)
| Category | Types |
| Cloud | AWS, GCP, Azure credentials |
| AI/ML | OpenAI, Anthropic, HuggingFace |
| DevOps | GitHub, GitLab, Docker, Kubernetes |
| Payment | Stripe, Square, PayPal |
| Communication | Slack, Discord, Twilio, SendGrid |
| Database | Connection strings, passwords |
| Crypto | Private keys, JWTs, SSH keys |
Compliance Framework Integration
Precogs Priority automatically maps findings to regulatory requirements:
Supported Frameworks
| Framework | Coverage |
| GDPR | EU personal data protection |
| HIPAA | US healthcare (18 PHI identifiers) |
| PCI-DSS | Payment card industry |
| SOX | Financial system controls |
| CCPA | California consumer privacy |
| FERPA | Education records |
| GLBA | Financial privacy |
Automated Mapping
JSON
{
  "finding": {
    "type": "CREDIT_CARD",
    "value": "4111-****-****-1111",
    "file": "payment_log.csv"
  },
  "compliance": {
    "pci_dss": {
      "applicable": true,
      "requirement": "3.4 - Render PAN unreadable",
      "action": "Tokenize or encrypt card data"
    },
    "gdpr": {
      "applicable": true,
      "category": "Financial data",
      "action": "Ensure lawful basis for processing"
    }
  }
}
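A mapping like the one above could be produced by a lookup keyed on finding type. The sketch below is an assumption about how such a table might be organized, not the product's actual rule engine. The only rule shown is the CREDIT_CARD entry taken from the example output, and `COMPLIANCE_RULES` and `map_finding` are hypothetical names.

```python
import json

# Rule table keyed by finding type. Only CREDIT_CARD is taken from the
# example output above; other types would be added in the same shape.
COMPLIANCE_RULES = {
    "CREDIT_CARD": {
        "pci_dss": {
            "applicable": True,
            "requirement": "3.4 - Render PAN unreadable",
            "action": "Tokenize or encrypt card data",
        },
        "gdpr": {
            "applicable": True,
            "category": "Financial data",
            "action": "Ensure lawful basis for processing",
        },
    },
}


def map_finding(finding: dict) -> dict:
    """Attach the compliance entries for the finding's type, if any."""
    return {
        "finding": finding,
        "compliance": COMPLIANCE_RULES.get(finding["type"], {}),
    }


if __name__ == "__main__":
    report = map_finding(
        {"type": "CREDIT_CARD", "value": "4111-****-****-1111", "file": "payment_log.csv"}
    )
    print(json.dumps(report, indent=2))
```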
Enterprise Features
Credential Enrichment
For detected cloud credentials, our enterprise module provides:
| Feature | Description |
| Identity Lookup | Who owns this credential? |
| Permission Analysis | What can it access? |
| Status Verification | Is it active or revoked? |
| Risk Scoring | 0-100 score with CRITICAL/HIGH/MEDIUM/LOW levels |
| Remediation Guidance | Specific steps to resolve |
Risk Scoring Factors
┌────────────────────────────────────────────────────────────────┐
│ RISK CALCULATION │
├────────────────────────────────────────────────────────────────┤
│ Base Score: Type-specific (AWS = 90, API key = 60) │
│ + Active Status: +20 if verified active │
│ + Admin Access: +30 if elevated permissions │
│ + Production Env: +20 if production indicators │
│ + File Location: +15 if in .env, .git, or config │
│ - Development Env: -10 if dev/test indicators │
│ - MFA Enabled: -10 if multi-factor auth present │
│ ─────────────────────────────────────────────────────────────│
│ Final Score: 0-100 → Risk Level (CRITICAL/HIGH/MEDIUM/LOW) │
└────────────────────────────────────────────────────────────────┘
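The calculation in the box can be sketched as follows. The adjustments and the two base scores come from the box. The default base score, the clamping to the 0-100 range (needed because the adjustments can push an AWS credential past 100), and the level cut-offs are assumptions for illustration.

```python
# Base scores from the box above; DEFAULT_BASE is a hypothetical fallback.
BASE_SCORES = {"aws": 90, "api_key": 60}
DEFAULT_BASE = 50


def risk_score(
    cred_type: str,
    *,
    active: bool = False,
    admin: bool = False,
    production: bool = False,
    sensitive_file: bool = False,  # found in .env, .git, or config
    development: bool = False,
    mfa: bool = False,
) -> int:
    score = BASE_SCORES.get(cred_type, DEFAULT_BASE)
    score += 20 if active else 0
    score += 30 if admin else 0
    score += 20 if production else 0
    score += 15 if sensitive_file else 0
    score -= 10 if development else 0
    score -= 10 if mfa else 0
    # Assumption: the final score is clamped to the documented 0-100 range.
    return max(0, min(100, score))


def risk_level(score: int) -> str:
    # Hypothetical cut-offs; the document does not state the real thresholds.
    if score >= 90:
        return "CRITICAL"
    if score >= 70:
        return "HIGH"
    if score >= 40:
        return "MEDIUM"
    return "LOW"
```

For example, an active AWS key with admin access sums to 140 before clamping and is reported as 100, while an API key in a development context with MFA lands at 40.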
Performance
Benchmarks
| Metric | Value |
| Precision | 99.2% (pattern mode), 95%+ (ML mode) |
| Recall | 98.3% (pattern mode), 99%+ (ML mode) |
| Speed (Pattern) | 0.002s per KB |
| Speed (ML) | 0.1s per KB |
| Large Repo (10K files) | 25s (pattern), 20min (ML) |
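The per-KB rates in the table allow a back-of-the-envelope scan-time estimate, sketched below. It assumes scan time scales linearly with content size, which the table does not state. The 12,500 KB repository size is inferred from the large-repo row (25 s divided by 0.002 s per KB), not a published figure.

```python
PATTERN_S_PER_KB = 0.002  # pattern mode, from the table above
ML_S_PER_KB = 0.1         # ML mode, from the table above


def estimate_scan_seconds(total_kb: float, ml_enabled: bool = False) -> float:
    """Estimate scan time, assuming it grows linearly with content size."""
    rate = ML_S_PER_KB if ml_enabled else PATTERN_S_PER_KB
    return total_kb * rate


if __name__ == "__main__":
    repo_kb = 12_500  # inferred size of the 10K-file repository
    print(f"pattern: {estimate_scan_seconds(repo_kb):.0f} s")
    print(f"ml: {estimate_scan_seconds(repo_kb, ml_enabled=True) / 60:.1f} min")
```

Under these assumptions the ML estimate comes to about 21 minutes, which is in line with the 20 minutes quoted in the large-repo row.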
Accuracy by Data Type
| Data Type | Precision | Recall |
| Structured forms | 99.5% | 99.0% |
| Email content | 97.8% | 96.5% |
| Medical records | 98.5% | 98.0% |
| Source code | 99.0% | 98.5% |
| Config files | 99.5% | 99.5% |
Deployment Options
- Web Application: Interactive scanning with real-time results, visualization, and export.
- Command Line Interface: Batch processing for CI/CD integration and automation.
- API Integration: RESTful endpoints for custom application integration.
- Cloud Deployment: AWS, GCP, Azure with auto-scaling and high availability.
Why Precogs Priority?
| Differentiator | Benefit |
| Adaptive Intelligence | Multi-layer protection without the performance tax |
| Enterprise Context | Zero-noise results that matter to your business |
| Enterprise Ready | Risk scoring, compliance mapping, remediation |
| Fast by Default | Pattern mode for real-time, ML for batch |
| International | 20+ phone formats, multi-language names |
| Medical PII | HIPAA-specific identifiers |
| AI/ML Coverage | OpenAI, Anthropic, emerging AI platforms |
Getting Started with Precogs Priority
Precogs Priority is a fully managed SaaS platform. You can start securing your repositories in three simple steps:
- Visit Precogs.ai: Explore our detection capabilities and enterprise features.
- Log in to the Precogs App: Securely sign in with your enterprise identity provider.
- Connect & Scan: Connect your GitHub, GitLab, or Bitbucket repositories. Our intelligence engine will automatically begin scanning your code and history for sensitive data.
Deployment Options:
For organizations with strict data residency requirements, we also offer Private Cloud and On-Premise deployments. Contact our team for more information.
📊 Summary
Precogs Priority is the standard for next-generation data protection:
✅ Adaptive Intelligence Engine for unmatched precision and recall
✅ Context-aware validation that understands file types and content structure
✅ Enterprise-grade risk scoring, compliance mapping, and remediation guidance
✅ Production-ready with 99.2% precision and 98.3% recall
✅ Flexible deployment via web UI, CLI, API, or cloud infrastructure
Whether you're securing source code, processing documents, or maintaining compliance, Precogs Priority provides the accuracy, speed, and intelligence your security program demands.
