v1.0 Effective: 2026-04-06 · Classification: Public

VerityHelm Methodology

VerityHelm provides adversarial verification of compliance attestations using publicly available signals. We produce findings, not opinions. This document discloses the complete methodology used to generate findings reports, enabling any party to understand, reproduce, or challenge our results.

VerityHelm does not perform audits, issue attestations, or provide legal opinions. We cross-reference vendor compliance claims against publicly observable signals and report factual contradictions, gaps, and observations.

1. Data Sources

VerityHelm v1.0 queries the following public signal sources. Each source is documented with its access method, limitations, and reliability assessment.

Tier 1 Sources (API-accessible, high reliability)

Source | What It Provides | Access Method | Freshness
SEC/EDGAR | Public company filings: 10-K risk factors, 8-K cybersecurity incidents, auditor changes | REST API (JSON), free, 10 req/sec | Same-day
GitHub Security Advisories | Open-source vulnerability database with CVEs and severity scores | REST API, CC BY 4.0 licensed | Near-real-time
Certificate Transparency (crt.sh) | All TLS/SSL certificates issued for a domain — subdomain enumeration, infrastructure mapping | Web/JSON/PostgreSQL, free | Near-real-time
USPTO Trademarks | Trademark registration status, filing history, related entities | REST API with free API key, 60 req/min | Daily
PCAOB AuditorSearch | Registered audit firms, engagement history, inspection reports | Bulk CSV download (daily updates), free | Daily
Vendor Trust Pages | SOC 3 reports, ISO 27001 certificates, public compliance claims | Web scrape of vendor websites | Varies (annual cycle)

Tier 2 Sources (requires subscription or careful access)

Source | What It Provides | Access Method | Freshness
NASBA CPAverify | CPA license verification across 55 U.S. jurisdictions | Web interface, individual lookups | Varies by state
AICPA Peer Review | Audit firm quality oversight enrollment and results | Web interface, individual lookups | 1–3 year review cycles
HaveIBeenPwned | Domain breach exposure history | REST API, paid subscription for domain search | As breaches are verified
UKAS/ANAB Directories | ISO 27001 certifier accreditation status | Web search | As accreditations change
Court Records (PACER/RECAP) | Federal litigation history | PACER (paid, $0.10/page) and RECAP (free archive) | Same-day (PACER)

Tier 3 Sources (supplementary, used with caution)

Source | What It Provides | Access Method | Limitations
DNS/Subdomain History | Historical DNS records, infrastructure changes | SecurityTrails API (paid) or DNSdumpster | Limited free tier
State Corporation Filings | Entity registration, good standing, registered agent | Per-state web interfaces (fragmented) | No unified API; bot protection
Job Posting History | Security team maturity, technology stack signals | Wayback Machine CDX API (free) | Indirect signal; coverage gaps

Sources NOT Used

  • Paste sites (Pastebin, etc.): Corroborative only, never primary. We do not download or store credential content. If paste content contains PII, it is skipped entirely.
  • Social media (Twitter/X, LinkedIn posts): Not used as primary signals due to unreliability. May be used to corroborate findings from authoritative sources.
  • Confidential SOC 2 Type II reports: We do not access, request, or process confidential audit reports in v1.0. Engine inputs are limited to public signals (SOC 3 summaries, trust pages, public attestation claims).

2. Collection Method

2.1 Signal Collection

Each data source is queried using deterministic scripts (not AI/LLM agents). The collection process:

  1. Vendor identification: Vendor name → domain resolution (common patterns: vendor.com, vendor.io, etc.)
  2. Parallel signal collection: Each source is queried independently with the vendor's domain or name as input
  3. Structured output: Each source produces a JSON intermediate file with source name, URL, query timestamp (UTC), raw data retrieved, and error state
  4. Rate limit compliance: All queries respect published rate limits and robots.txt directives
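The four steps above can be sketched as a small collection wrapper. This is an illustrative sketch, not the production schema: the function names, the candidate-domain patterns, and the intermediate-file field names are assumptions chosen to mirror the description (source name, URL, UTC query timestamp, raw data, error state).

```python
from datetime import datetime, timezone

def candidate_domains(vendor_name):
    """Step 1: vendor name -> candidate domains (common patterns)."""
    slug = vendor_name.lower().replace(" ", "")
    return [f"{slug}.com", f"{slug}.io"]

def collect_signal(source_name, url, fetch):
    """Steps 2-3: query one source and wrap the result in the JSON
    intermediate format: source, URL, UTC timestamp, data, error."""
    record = {
        "source": source_name,
        "url": url,
        "queried_at": datetime.now(timezone.utc).isoformat(),
        "data": None,
        "error": None,
    }
    try:
        record["data"] = fetch(url)
    except Exception as exc:
        # Error state is recorded, not raised: a failed source must not
        # abort collection from the other sources (see Section 2.3).
        record["error"] = str(exc)
    return record
```

Each source produces one such record, so downstream steps can distinguish "source returned nothing" from "source was unavailable."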

2.2 Rate Limits Observed

Source | Published Limit | Our Policy
SEC/EDGAR | 10 req/sec | Max 5 req/sec (50% of limit)
GitHub Advisories | 60 req/hr (unauthenticated) | Max 30 req/hr
crt.sh | Fair use | Max 1 req per 5 sec
USPTO | 60 req/min | Max 30 req/min
CourtListener | Published limits | Respect published limits
All others | Fair use | Minimum 2 sec between requests
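A minimum-interval limiter is enough to enforce every policy in the table above (e.g. 0.2 s between requests for the 5 req/sec SEC/EDGAR policy, 5 s for crt.sh). The sketch below is illustrative; the class name and injectable clock are assumptions made so the behavior is testable without real sleeps.

```python
import time

class MinIntervalLimiter:
    """Block until at least `min_interval_s` has elapsed since the
    previous request to the same source."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # monotonic time of the previous request

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

One limiter instance per source keeps per-source pacing independent, so a slow source never throttles the others.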

2.3 Error Handling

If a source is unavailable:

  • The finding report notes the source as "unavailable at query time"
  • No findings are generated from unavailable sources
  • The overall analysis continues with available sources
  • The "Signal Freshness" section of the report reflects which sources returned data
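The unavailable-source policy can be expressed as a single filtering pass over the collected records. The field names below follow the illustrative intermediate format from Section 2.1 and are assumptions, not a published schema.

```python
def freshness_summary(records):
    """Split collected source records into those eligible to generate
    findings and those noted as unavailable at query time."""
    available = [r["source"] for r in records if r.get("error") is None]
    unavailable = [f'{r["source"]} (unavailable at query time)'
                   for r in records if r.get("error") is not None]
    return {"available": available, "unavailable": unavailable}
```

Only sources in the `available` list feed the cross-reference stage; the full summary is reproduced in the report's Signal Freshness section.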

3. Claim Extraction

3.1 Sources of Claims

Vendor compliance claims are extracted from:

  1. Trust pages: Vendor-operated security/trust/compliance pages
  2. SOC 3 reports: Publicly distributed audit summaries
  3. Company websites: Security sections, compliance sections
  4. Third-party trust centers: Vanta, SafeBase, Whistic, Drata hosted trust portals

3.2 Extraction Method

Claims are extracted using deterministic pattern matching:

  1. Certification identification: Regex patterns match certifications (SOC 2, ISO 27001, HIPAA, GDPR, PCI DSS, FedRAMP, CSA STAR, CCPA, SOX)
  2. Audit firm identification: Regex patterns match known audit firm names and common phrases ("audited by," "examined by," "certified by")
  3. Security claims: Pattern matching extracts statements about encryption, monitoring, testing, and other security practices
  4. No AI interpretation at this step: All extraction is regex-based. The patterns are versioned with this methodology document.
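The extraction steps above can be sketched with a small subset of patterns. These regexes are simplified illustrations, not the versioned production pattern set: the certification list is abridged and the auditor pattern handles only the common "audited/examined/certified by" phrasings.

```python
import re

# Abridged certification patterns (illustrative subset).
CERT_PATTERN = re.compile(
    r"\b(SOC 2(?: Type (?:I|II))?|ISO 27001|HIPAA|GDPR|PCI DSS|"
    r"FedRAMP|CSA STAR|CCPA|SOX)\b",
    re.IGNORECASE,
)

# Audit firm name following a known attribution phrase, up to
# sentence-ending punctuation.
AUDITOR_PATTERN = re.compile(
    r"\b(?:audited|examined|certified) by\s+([A-Z][\w&.,' -]{2,60}?)(?=[.;\n]|$)"
)

def extract_claims(text):
    """Deterministic, regex-only claim extraction (no AI interpretation)."""
    certs = sorted({m.group(1) for m in CERT_PATTERN.finditer(text)})
    auditors = [m.group(1).strip() for m in AUDITOR_PATTERN.finditer(text)]
    return {"certifications": certs, "auditors": auditors}
```

Because the patterns are pure regex, the same trust-page text always yields the same extracted claims, which is what makes the pipeline reproducible.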

3.3 What We Do NOT Extract

  • We do not extract claims from confidential SOC 2 Type II reports
  • We do not extract claims from NDA-gated trust portals (we only access publicly visible content)
  • We do not infer claims that are not explicitly stated on vendor pages

4. Cross-Reference Logic

4.1 Signal-to-Claim Matching

Cross-referencing uses deterministic rules that compare public signals against extracted claims:

Claim Type | Signal Source | Cross-Reference Rule
"Zero security incidents" | GitHub Advisories, HaveIBeenPwned, SEC 8-K | If high/critical advisories or breach records exist during the claimed audit period, flag as contradiction
"Continuous monitoring" | CT Logs (subdomain count) | If subdomain count exceeds 50, flag as gap — question whether monitoring covers all infrastructure
Certification claims | AICPA Peer Review, CPAverify | Verify audit firm enrollment in peer review program; verify CPA signatory license status
ISO 27001 claims | UKAS/ANAB Directories | Verify certifying body is accredited

4.2 Rule Types

  1. Contradiction: A public signal directly contradicts a vendor claim. Example: vendor claims zero incidents, but HaveIBeenPwned shows their domain in a breach database during the audit period.
  2. Gap: A public signal raises a question that the vendor claim does not address. Example: 200 subdomains discovered but monitoring claims don't specify scope.
  3. Observation: A public signal is notable but does not directly contradict or gap a specific claim. Example: SEC 8-K filings exist that may contain cybersecurity incident disclosures.
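The three rule outcomes can be sketched as small deterministic classifiers. The thresholds and signal shapes mirror the examples above (e.g. the 50-subdomain gap threshold); the function names are illustrative and the real rule set is larger.

```python
def classify_zero_incident_claim(breach_records_in_period):
    """'Zero security incidents' vs. breach/advisory records in the
    claimed audit period: any record is a direct contradiction."""
    return "contradiction" if breach_records_in_period else None

def classify_monitoring_claim(subdomain_count, threshold=50):
    """'Continuous monitoring' vs. CT-log footprint: a large footprint
    raises a question but does not directly contradict the claim."""
    return "gap" if subdomain_count > threshold else None

def classify_8k_filings(has_8k_filings):
    """SEC 8-K filings are notable but tied to no specific claim, so
    they are reported as observations."""
    return "observation" if has_8k_filings else None
```

A `None` return means no finding is generated, which is how the pipeline stays biased toward false negatives rather than false positives (Section 5.4).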

4.3 Temporal Matching

All cross-references are time-aware:

  • Signals are matched to the vendor's most recent audit period (if identifiable from SOC 3 or trust page)
  • If the audit period is not identifiable, signals from the most recent 12 months are used
  • The report notes when temporal alignment could not be verified
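The temporal rule above reduces to a window selection plus a containment check. The tuple format and the `verified` flag (which feeds the report's note about unverifiable temporal alignment) are assumptions for illustration.

```python
from datetime import date, timedelta

def matching_window(audit_period, today):
    """Return (start, end, verified): the audit period when it is
    identifiable, otherwise the trailing 12 months, unverified."""
    if audit_period is not None:
        start, end = audit_period
        return start, end, True
    return today - timedelta(days=365), today, False

def signal_in_window(signal_date, window):
    start, end, _verified = window
    return start <= signal_date <= end
```

Signals outside the window are excluded from contradiction and gap rules rather than reported with a caveat, again favoring false negatives.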

5. Contradiction Detection

5.1 What Constitutes a Contradiction

A finding is classified as a contradiction when ALL of the following are true:

  1. The vendor makes a specific, verifiable claim (e.g., "zero security incidents in the audit period")
  2. A public signal from an authoritative source directly conflicts with that claim (e.g., HIBP shows a breach record for the vendor's domain during the same period)
  3. The conflict is unambiguous — there is no reasonable interpretation that reconciles both the claim and the signal

5.2 What Constitutes a Gap

A finding is classified as a gap when:

  1. The vendor makes a broad claim (e.g., "continuous monitoring")
  2. Public signals suggest the claim may be incomplete but do not directly contradict it (e.g., large infrastructure footprint that may exceed monitoring coverage)

5.3 What Does NOT Constitute a Finding

  • A vendor not having a trust page (absence of evidence is not evidence of absence)
  • A vendor using a compliance automation platform (this is standard practice)
  • A vendor's audit firm not being in our "known" list (we flag for investigation, not as a finding)
  • Public signals that are ambiguous or could have multiple interpretations

5.4 False Positive Expectations

VerityHelm v1.0 is calibrated for a low false positive rate at the cost of a higher false negative rate. We prefer to miss findings rather than report incorrect ones. Expected rates:

  • False positive rate: <5% of reported findings
  • False negative rate: ~40–60% of actual issues (many compliance issues are not detectable from public signals)

6. Known Limitations

6.1 Coverage Limitations

  • Private companies: Limited SEC/EDGAR data. Analysis primarily relies on trust pages, CT logs, GitHub, and court records.
  • Non-US companies: NASBA CPAverify, PACER, and state filings are US-only. International coverage requires different signal sources.
  • Small/early-stage companies: May have minimal public signal footprint. Analysis may return few or no findings.
  • Vendors without trust pages: If no public compliance claims are found, cross-referencing is not possible.

6.2 Methodology Limitations

  • No access to confidential reports: SOC 2 Type II reports are not used in v1.0. This means we cannot verify specific control descriptions or test procedures.
  • Deterministic pattern matching: Regex-based claim extraction may miss non-standard phrasings. Complex or nuanced claims may not be extracted.
  • Temporal alignment: Audit period dates are not always publicly available, limiting precision of temporal cross-referencing.
  • Auditor quality assessment is indirect: We can verify peer review enrollment and CPA license status, but we cannot assess the quality of the audit work itself from public signals.

6.3 Categories of Vendors Poorly Served

  1. Private companies with minimal web presence
  2. Companies operating primarily outside the US
  3. Companies that do not publish any compliance information publicly
  4. Infrastructure-level vendors (IaaS, PaaS) whose compliance posture is documented in separate compliance portals with different URL patterns

6.4 What the Methodology Cannot Detect

  • Fabricated evidence within confidential audit reports (requires access to the report)
  • Auditor capture or independence issues (requires insight into auditor-client relationship economics)
  • Internal compliance program effectiveness (requires internal access)
  • Social engineering susceptibility (requires active testing, which we do not perform)
  • Accuracy of specific technical controls (requires technical assessment, which we do not perform)

7. Version History

Version | Date | Changes | Backward Compatible
v1.0 | 2026-04-06 | Initial release. 14 public signal sources. Deterministic pipeline. No scoring — findings only. | N/A (initial)

Planned for v1.1

  • Additional signal sources (paste-site corroborative signals, WHOIS history)
  • Improved temporal matching with audit period extraction from SOC 3 PDFs
  • Expanded audit firm database

Planned for v2.0 (post-legal review)

  • Optional Defensibility Score (0–100, weighted composite of findings)
  • SOC 2 Type II report metadata ingestion (control descriptions, audit firm, dates — not full report)
  • Continuous monitoring mode (weekly signal refresh)

8. Pipeline Architecture

INPUT: Vendor Name
│
├─ Step 1: Vendor Profile Assembly
│    └─ Queries: SEC/EDGAR, CT Logs, GitHub Advisories,
│       PCAOB, USPTO, Wayback Machine, CourtListener
│    └─ Output: 01-vendor-profile.json
│
├─ Step 2: Claim Extraction
│    └─ Scans: Trust pages, third-party trust centers,
│       SOC 3 download URLs
│    └─ Extracts: Certifications, audit firm, security claims
│    └─ Output: 02-claims.json
│
├─ Step 3: Cross-Reference Analysis
│    └─ Matches: Public signals against extracted claims
│    └─ Classifies: Contradiction / Gap / Observation
│    └─ Output: 03-cross-references.json
│
├─ Step 4: Fraud Pattern Match
│    └─ Checks: Auditor legitimacy, certification speed,
│       infrastructure scope, breach history
│    └─ Output: 04-pattern-matches.json
│
└─ Step 5: Report Generation
     └─ Assembles: All findings into structured report
     └─ Includes: Subject, methodology disclosure, findings,
        questions, signal freshness, summary
     └─ Output: findings-report.md

NOTE: Steps 1–4 are fully deterministic (scripts, regex, API queries). Step 5 assembles structured data into report format. LLM interpretation is available as an optional enhancement in future versions.
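The five-step chain can be sketched as a minimal driver in which each step receives all prior outputs and contributes one named intermediate. The step order and artifact names come from the pipeline description; the driver function itself and the `write` hook are illustrative assumptions.

```python
def run_pipeline(vendor_name, steps, write=None):
    """Chain deterministic steps: `steps` is a list of
    (artifact_name, step_fn) pairs, each step_fn mapping the dict of
    prior outputs to one new intermediate artifact."""
    outputs = {"vendor": vendor_name}
    for artifact_name, step_fn in steps:
        outputs[artifact_name] = step_fn(outputs)
        if write is not None:
            write(artifact_name, outputs[artifact_name])  # persist JSON/report
    return outputs
```

Because every step sees the accumulated outputs, the report-generation step can assemble the profile, claims, cross-references, and pattern matches without re-querying any source.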

This methodology document is versioned and published at verityhelm.com/methodology. Any changes result in a version increment documented in the Version History section. Findings reports reference the specific methodology version under which they were produced.