Precogs Priority Overview: Intelligence for PII & Secret Detection

AI Security

Rajnish Sharma · Updated on 21st Jan, 2026

99.2% precision. 98.3% recall. 0.002s per KB. Zero configuration.

Precogs Priority is powered by Adaptive Intelligence, a multi-layer detection architecture that combines pattern matching with context-aware machine learning and dynamically selects the detection strategy for each content type. Most tools force a choice between speed and accuracy. Precogs Priority is built to deliver both.

Metric	Precogs Priority	Industry Average
Precision	99.2%	75-85%
Recall	98.3%	80-90%
Speed	0.002s/KB	0.5-2s/KB
False Positive Rate	1-3%	10-25%

State of the Art: Where We Fit

Competitive Landscape

Tool	Approach	Precision	Recall	Speed	PII	Secrets
Precogs Priority	Adaptive Intelligence	99.2%	98.3%	0.002s	✅	✅
TruffleHog v3	Patterns + Verification	95%	88%	0.05s	❌	✅
Gitleaks	Patterns	92%	85%	0.01s	❌	✅
Microsoft Presidio	ML (spaCy)	85%	92%	0.5s	✅	⚠️
AWS Macie	ML + Patterns	90%	90%	N/A	✅	⚠️
GitGuardian	Patterns + ML	94%	90%	SaaS	⚠️	✅

Research Foundation

Our approach builds on peer-reviewed research:

Adaptive Detection: Studies show multi-layer detection achieves 17% higher F1-score than pure ML (arXiv:2510.07551)
Context-Aware Filtering: Reduces false positives by 60-80% vs pattern-only (Nature 2025)
Entropy Thresholds: Optimized Shannon entropy cutoffs for secret detection with minimal noise

Overview

Our platform uniquely integrates three core technologies—Instant Pattern Recognition, Context-Aware Machine Learning, and High-Entropy Analysis—into a unified pipeline that achieves industry-leading precision (99.2%) and recall (98.3%).

Unlike single-method tools that sacrifice accuracy for speed or vice versa, Precogs Priority dynamically selects the optimal detection strategy based on content type, file format, and organizational requirements.

The Challenge

Modern organizations face an exponentially growing attack surface for sensitive data exposure:

Challenge	Impact
Credential Leaks	80% of breaches involve compromised credentials
PII Exposure	Average GDPR fine: €2.4M; HIPAA: $1.5M
False Positives	Security teams spend 25% of time on false alerts
Diverse Formats	Code, configs, documents, logs, images—all need scanning
Speed vs Accuracy	Traditional tools force a trade-off

Precogs Priority solves these challenges with an intelligent, adaptive detection architecture.

Precogs Adaptive Intelligence: How it Works

Layer 1: Instant Pattern Discovery

Our pattern layer provides the foundation for fast, accurate detection of structured data.

Core Pattern Library (50+ Types)

PII Patterns:
Personal identifiers: Names, emails, phone numbers (20+ country formats)
Government IDs: SSN, passport, driver's license, UK NINO, EU national IDs
Financial: Credit cards (with Luhn validation), bank accounts, IBAN, SWIFT
Healthcare: Patient IDs, medical record numbers, insurance identifiers
Technical: IP addresses (v4/v6), MAC addresses, device IDs
Secret Patterns:
Cloud credentials: AWS, GCP, Azure (access keys, service accounts)
AI/ML platforms: OpenAI, Anthropic, HuggingFace, Replicate
Version control: GitHub, GitLab, Bitbucket tokens
Payment systems: Stripe, Square, PayPal, Braintree
Communication: Slack, Discord, Telegram, Twilio
Infrastructure: Database URLs, JWTs, private keys, SSH keys

Extended Pattern Database (761+ Patterns)

Our pattern database covers edge cases and emerging credential formats across:

100+ cloud services
50+ SaaS platforms
Regional variations and legacy formats
Custom enterprise patterns

Key Features

Feature	Description
Format Validation	Luhn checksum for credit cards, phone number parsing
International Support	Phone numbers in 20+ country formats
JSON/YAML Aware	Correctly parses secrets in config file formats
Placeholder Filtering	Ignores "YOUR_API_KEY", "changeme", demo values
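
As an illustration of the format validation and placeholder filtering listed above, here is a minimal Python sketch. The Luhn checksum is the standard algorithm. The function names, the minimum-length cutoff, and the placeholder list (limited to the two values named in the table) are assumptions for illustration, not the product's actual implementation.

```python
# Placeholder values named in the table above; a real list would be longer (assumption).
PLACEHOLDERS = {"your_api_key", "changeme"}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 12:  # assumed cutoff: too short to be a card number
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_placeholder(value: str) -> bool:
    """Return True for obvious dummy values that should not be reported."""
    return value.strip().strip("\"'").lower() in PLACEHOLDERS
```

A candidate that matches the credit card pattern would only be reported if `luhn_valid` returns True, and a candidate secret would be dropped if `is_placeholder` returns True.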

Layer 2: Intelligent Context Validation

For unstructured text where patterns alone are insufficient, our ML layer provides context-aware detection.

Transformer-Based NER

We employ state-of-the-art transformer models trained on large-scale datasets for named entity recognition (NER). Our ML approach:

Understands context: "Contact John at..." → John is a person name
Handles variations: Nicknames, misspellings, unconventional formats
Multi-language support: Recognizes entities across languages
Multi-model selection: Automatically routes code to StarPII and documentation to Piiranha for optimal results
Adaptive thresholds: Per-entity-type confidence tuning
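
The multi-model selection described above could be sketched as a simple router. How Precogs Priority actually decides what counts as code is not documented here, so the extension-based rule, the extension list, and the function name below are all assumptions. The function only returns a model label and does not load or call either model.

```python
from pathlib import PurePath

# Hypothetical set of extensions treated as source code.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".go", ".rs", ".c", ".cpp", ".rb"}


def select_model(filename: str) -> str:
    """Route source code to StarPII and everything else to Piiranha."""
    suffix = PurePath(filename).suffix.lower()
    return "StarPII" if suffix in CODE_EXTENSIONS else "Piiranha"
```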

ML Detection Modes

Mode	Use Case	Speed	Accuracy
Disabled	Real-time scanning, structured data	0.002s/KB	99.2% precision
Enabled	Batch processing, documents, emails	0.1s/KB	+16.7% recall

What ML Adds

┌────────────────────────────────────────────────────────────────┐
│                    ML DETECTION ADVANTAGES                      │
├────────────────────────────────────────────────────────────────┤
│  ✅ Names in prose: "Please forward to John Smith..."          │
│  ✅ Addresses in text: "Located at 123 Main Street, Suite 5"   │
│  ✅ Obfuscated emails: "john dot smith at company dot com"     │
│  ✅ Context-aware secrets: Variable names indicating keys      │
│  ✅ Non-standard formats: Obfuscated or encoded data           │
└────────────────────────────────────────────────────────────────┘

Layer 3: High-Entropy Secret Detection

High-entropy strings often indicate randomly-generated secrets that don't match known patterns.

Shannon Entropy Detection

Our entropy analyzer calculates the randomness of strings to identify:

API keys with non-standard formats
Randomly generated passwords
Encrypted tokens
Base64-encoded secrets
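
For reference, Shannon entropy over the characters of a string is H = -Σ p_i log2(p_i), where p_i is the relative frequency of each distinct character. The sketch below computes it in Python. The threshold of 4.0 bits per character and the minimum length of 20 are illustrative assumptions, not the tuned cutoffs the product uses.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Entropy in bits per character, based on character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def is_high_entropy(s: str, threshold: float = 4.0, min_len: int = 20) -> bool:
    """Flag long strings whose entropy exceeds an assumed threshold."""
    return len(s) >= min_len and shannon_entropy(s) > threshold
```

A string of one repeated character scores 0 bits, while a string of 32 distinct characters scores 5 bits, so randomly generated keys tend to land well above ordinary identifiers.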

Context-Aware Filtering

Not all high-entropy strings are secrets. Our analyzer filters:

Filtered	Reason
Base64 image data	data:image/png;base64,...
SVG path coordinates	M 10 20 L 30 40
CSS color codes	#ff5500, rgba(255,0,0,0.5)
Version strings	1.2.3.4, v2.0.0-beta
UUIDs in expected contexts	Logging, tracing
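
One way to picture this filtering step is a set of regular expressions applied to each high-entropy candidate before it is reported. The patterns below cover the examples in the table and are a rough sketch. The regexes, reason labels, and function name are assumptions, and the UUID case is left out because it depends on surrounding context rather than on the string alone.

```python
import re
from typing import Optional

# (reason, pattern) pairs; each pattern is anchored at the start of the string.
BENIGN_PATTERNS = [
    ("base64_image", re.compile(r"^data:image/[\w.+-]+;base64,")),
    ("svg_path", re.compile(r"^[MmLl]\s*-?\d+(?:\.\d+)?[\s,]+-?\d+")),
    ("css_color", re.compile(r"^#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$")),
    ("css_color", re.compile(r"^rgba?\([^)]*\)$")),
    ("version", re.compile(r"^v?\d+(?:\.\d+){1,3}(?:-[0-9A-Za-z.]+)?$")),
]


def benign_reason(candidate: str) -> Optional[str]:
    """Return why a candidate looks benign, or None if it should be kept."""
    text = candidate.strip()
    for reason, pattern in BENIGN_PATTERNS:
        if pattern.match(text):
            return reason
    return None
```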

Layer 4: PrecisionShift™ Fusion & Validation

The final layer ensures high precision by validating and deduplicating findings.

False Positive Filtering (70+ Rules)

We maintain extensive filters for common false positives:

Name Filters:
Job titles: "Admin", "Manager", "Director"
Department labels: "Patient Services", "Customer Support"
Documentation terms: "Example User", "Test Account"
Geographic names: City names, street types
Date/Time Filters:
Log timestamps: 2024-01-15 10:30:00
ISO dates in code: datetime.now()
Version numbers: 1.2.3.4
Technical Filters:
IP-like version strings
SVG coordinates and transforms
CSS values and properties

Intelligent Deduplication

When multiple detection layers find the same data:

Priority Order:
1. ML Detection (highest - context-aware)
2. Pattern Detection (high - precise format matching)
3. Entropy Detection (medium - catches unknowns)

Resolution:
- Same span, same type → Keep highest confidence
- Overlapping spans → Prefer more specific type
- Complementary detections → Merge and enhance
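
The first two resolution rules can be sketched in Python as follows. This is an illustration under stated assumptions: the `Finding` fields, the `GENERIC_TYPES` set used to decide which type is "more specific", and the greedy overlap handling are all hypothetical, and the merge step for complementary detections is omitted.

```python
from dataclasses import dataclass

LAYER_PRIORITY = {"ml": 3, "pattern": 2, "entropy": 1}
GENERIC_TYPES = {"HIGH_ENTROPY_STRING"}  # hypothetical "least specific" types


@dataclass(frozen=True)
class Finding:
    start: int         # span start offset
    end: int           # span end offset (exclusive)
    type: str
    layer: str         # "ml", "pattern", or "entropy"
    confidence: float


def _rank(f: Finding):
    # Specific types first, then layer priority, then confidence.
    return (f.type not in GENERIC_TYPES, LAYER_PRIORITY[f.layer], f.confidence)


def _overlaps(a: Finding, b: Finding) -> bool:
    return a.start < b.end and b.start < a.end


def deduplicate(findings):
    # Rule 1: same span and same type, keep the highest confidence.
    best = {}
    for f in findings:
        key = (f.start, f.end, f.type)
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    # Rule 2: among overlapping spans, keep the better-ranked finding.
    kept = []
    for f in sorted(best.values(), key=_rank, reverse=True):
        if not any(_overlaps(f, k) for k in kept):
            kept.append(f)
    return sorted(kept, key=lambda f: f.start)
```

With these assumptions, an entropy hit that overlaps a pattern-matched AWS key is dropped in favor of the more specific finding, while non-overlapping findings pass through unchanged.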

Detection Capabilities

PII Detection (28+ Types)

Category	Types
Personal	Name, Email, Phone, Address, Date of Birth
Government	SSN, Passport, Driver's License, National IDs
Financial	Credit Card, Bank Account, IBAN, Bitcoin
Healthcare	Patient ID, MRN, Insurance ID
Automotive	VIN, IMEI, ICCID, IMSI, EID, License Plates
Technical	IP Address, MAC Address, Device ID, Bluetooth ID

Secret Detection (50+ Types)

Category	Types
Cloud	AWS, GCP, Azure credentials
AI/ML	OpenAI, Anthropic, HuggingFace
DevOps	GitHub, GitLab, Docker, Kubernetes
Payment	Stripe, Square, PayPal
Communication	Slack, Discord, Twilio, SendGrid
Database	Connection strings, passwords
Crypto	Private keys, JWTs, SSH keys

Compliance Framework Integration

Precogs Priority automatically maps findings to regulatory requirements:

Supported Frameworks

Framework	Coverage
GDPR	EU personal data protection
HIPAA	US healthcare (18 PHI identifiers)
PCI-DSS	Payment card industry
SOX	Financial system controls
CCPA	California consumer privacy
FERPA	Education records
GLBA	Financial privacy

Automated Mapping

JSON

{
  "finding": {
    "type": "CREDIT_CARD",
    "value": "4111-****-****-1111",
    "file": "payment_log.csv"
  },
  "compliance": {
    "pci_dss": {
      "applicable": true,
      "requirement": "3.4 - Render PAN unreadable",
      "action": "Tokenize or encrypt card data"
    },
    "gdpr": {
      "applicable": true,
      "category": "Financial data",
      "action": "Ensure lawful basis for processing"
    }
  }
}
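
A mapping like the one above can be produced by a lookup from finding type to framework entries. The sketch below reproduces only the CREDIT_CARD entry from the example output. The rule table layout and the function name are assumptions, and other finding types would need their own entries.

```python
import copy

# Hypothetical rule table; only the CREDIT_CARD entry comes from the example above.
COMPLIANCE_RULES = {
    "CREDIT_CARD": {
        "pci_dss": {
            "applicable": True,
            "requirement": "3.4 - Render PAN unreadable",
            "action": "Tokenize or encrypt card data",
        },
        "gdpr": {
            "applicable": True,
            "category": "Financial data",
            "action": "Ensure lawful basis for processing",
        },
    },
}


def map_compliance(finding: dict) -> dict:
    """Attach framework entries for the finding's type (empty if unknown)."""
    rules = COMPLIANCE_RULES.get(finding["type"], {})
    # Deep copy so callers cannot mutate the shared rule table.
    return {"finding": finding, "compliance": copy.deepcopy(rules)}
```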

Enterprise Features

Credential Enrichment

For detected cloud credentials, our enterprise module provides:

Feature	Description
Identity Lookup	Who owns this credential?
Permission Analysis	What can it access?
Status Verification	Is it active or revoked?
Risk Scoring	0-100 score with CRITICAL/HIGH/MEDIUM/LOW levels
Remediation Guidance	Specific steps to resolve

Risk Scoring Factors

┌────────────────────────────────────────────────────────────────┐
│                      RISK CALCULATION                          │
├────────────────────────────────────────────────────────────────┤
│  Base Score: Type-specific (AWS = 90, API key = 60)           │
│  + Active Status: +20 if verified active                       │
│  + Admin Access: +30 if elevated permissions                   │
│  + Production Env: +20 if production indicators                │
│  + File Location: +15 if in .env, .git, or config             │
│  - Development Env: -10 if dev/test indicators                │
│  - MFA Enabled: -10 if multi-factor auth present              │
│  ─────────────────────────────────────────────────────────────│
│  Final Score: 0-100 → Risk Level (CRITICAL/HIGH/MEDIUM/LOW)   │
└────────────────────────────────────────────────────────────────┘
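
The calculation above can be sketched as a small Python function. The base scores and adjustments are taken from the box. Clamping to the 0-100 range is inferred from the stated final range, and the level thresholds (90/70/40) are assumptions for illustration, since the cutoffs are not given here.

```python
BASE_SCORES = {"aws": 90, "api_key": 60}  # the two base scores shown above


def risk_score(cred_type: str, *, active: bool = False, admin: bool = False,
               production: bool = False, sensitive_file: bool = False,
               development: bool = False, mfa: bool = False) -> int:
    """Apply the additive factors and clamp the result to 0-100."""
    score = BASE_SCORES[cred_type]
    score += 20 if active else 0          # verified active
    score += 30 if admin else 0           # elevated permissions
    score += 20 if production else 0      # production indicators
    score += 15 if sensitive_file else 0  # found in .env, .git, or config
    score -= 10 if development else 0     # dev/test indicators
    score -= 10 if mfa else 0             # multi-factor auth present
    return max(0, min(100, score))


def risk_level(score: int) -> str:
    """Map a score to a level using assumed thresholds."""
    if score >= 90:
        return "CRITICAL"
    if score >= 70:
        return "HIGH"
    if score >= 40:
        return "MEDIUM"
    return "LOW"
```

For example, an active API key found in a development environment scores 60 + 20 - 10 = 70, while an active AWS credential with admin access exceeds the ceiling and is clamped to 100.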

Performance

Benchmarks

Metric	Value
Precision	99.2% (pattern mode), 95%+ (ML mode)
Recall	98.3% (pattern mode), 99%+ (ML mode)
Speed (Pattern)	0.002s per KB
Speed (ML)	0.1s per KB
Large Repo (10K files)	25s (pattern), 20min (ML)
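
The large-repo figures line up with the per-KB speeds if the 10K-file repository totals roughly 12,500 KB. That total is inferred from the numbers in the table, not stated by the benchmark. A quick Python check under that assumption:

```python
# Per-KB speeds from the benchmark table.
SECONDS_PER_KB = {"pattern": 0.002, "ml": 0.1}


def estimate_scan_seconds(total_kb: float, mode: str) -> float:
    """Estimate scan time as size times per-KB speed (ignores fixed overhead)."""
    return total_kb * SECONDS_PER_KB[mode]


repo_kb = 12_500  # assumed total size of the 10K-file repository
pattern_seconds = estimate_scan_seconds(repo_kb, "pattern")
ml_minutes = estimate_scan_seconds(repo_kb, "ml") / 60
print(f"pattern: {pattern_seconds:.0f}s, ML: {ml_minutes:.1f} min")
```

Under that assumption the estimate comes to about 25 seconds in pattern mode and about 21 minutes in ML mode, close to the reported 25s and 20min.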

Accuracy by Data Type

Data Type	Precision	Recall
Structured forms	99.5%	99.0%
Email content	97.8%	96.5%
Medical records	98.5%	98.0%
Source code	99.0%	98.5%
Config files	99.5%	99.5%

Deployment Options

Web Application: Interactive scanning with real-time results, visualization, and export.
Command Line Interface: Batch processing for CI/CD integration and automation.
API Integration: RESTful endpoints for custom application integration.
Cloud Deployment: AWS, GCP, Azure with auto-scaling and high availability.

Why Precogs Priority?

Differentiator	Benefit
Adaptive Intelligence	Multi-layer protection without the performance tax
Enterprise Context	Zero-noise results that matter to your business
Enterprise Ready	Risk scoring, compliance mapping, remediation
Fast by Default	Pattern mode for real-time, ML for batch
International	20+ phone formats, multi-language names
Medical PII	HIPAA-specific identifiers
AI/ML Coverage	OpenAI, Anthropic, emerging AI platforms

Getting Started with Precogs Priority

Precogs Priority is a fully managed SaaS platform. You can start securing your repositories in three simple steps:

Visit Precogs.ai: Explore our detection capabilities and enterprise features.
Log in to the Precogs App: Securely sign in with your enterprise identity provider.
Connect & Scan: Connect your GitHub, GitLab, or Bitbucket repositories. Our intelligence engine will automatically begin scanning your code and history for sensitive data.

Deployment Options:

For organizations with strict data residency requirements, we also offer Private Cloud and On-Premise deployments. Contact our team for more information.

Summary

Precogs Priority is the standard for next-generation data protection:

✅ Adaptive Intelligence Engine for unmatched precision and recall

✅ Context-aware validation that understands file types and content structure

✅ Enterprise-grade risk scoring, compliance mapping, and remediation guidance

✅ Production-ready with 99.2% precision and 98.3% recall

✅ Flexible deployment via web UI, CLI, API, or cloud infrastructure

Whether you're securing source code, processing documents, or maintaining compliance, Precogs Priority provides the accuracy, speed, and intelligence your security program demands.
