ARIA — HuntJacq Labs · Investor & Technical Briefing

Architectural position

ARIA sits above the SIEM layer. SIEMs detect events. ARIA decides what they mean.

Multi-alert correlation

Single-event pattern matching is the SIEM's job. ARIA's job is the cross-alert sequence: alert A + B + C within window W on the same host = breach. The result is higher fidelity than any single source can produce.
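To make the "A + B + C within window W on the same host" idea concrete, here is a minimal sketch. The alert fields (host, kind, ts) and the function name are assumptions for illustration, not ARIA's schema. It checks unordered co-occurrence only, so an ordered sequence such as brute-force followed by success would need an extra ordering check.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def correlate(alerts, required_kinds, window):
    """Return hosts where every kind in required_kinds fired within `window`."""
    by_host = defaultdict(list)
    for a in alerts:
        by_host[a["host"]].append(a)

    required = set(required_kinds)
    hits = set()
    for host, items in by_host.items():
        items.sort(key=lambda a: a["ts"])
        start = 0
        for end in range(len(items)):
            # Advance the left edge until the span fits inside the window.
            while items[end]["ts"] - items[start]["ts"] > window:
                start += 1
            kinds = {a["kind"] for a in items[start:end + 1]}
            if required <= kinds:
                hits.add(host)
                break
    return hits

t0 = datetime(2024, 1, 1, 12, 0)  # arbitrary timestamp for the sample data
sample = [
    {"host": "ws-01", "kind": "A", "ts": t0},
    {"host": "ws-01", "kind": "B", "ts": t0 + timedelta(minutes=2)},
    {"host": "ws-01", "kind": "C", "ts": t0 + timedelta(minutes=4)},
    {"host": "ws-02", "kind": "A", "ts": t0},
    {"host": "ws-02", "kind": "C", "ts": t0 + timedelta(hours=3)},
]
breached = correlate(sample, ["A", "B", "C"], timedelta(minutes=10))
```

In this sample only ws-01 qualifies: ws-02 never raises alert B, and its two alerts are three hours apart.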

Validate against tenant baseline

Is this normal noise for this tenant, or genuinely new? Per-tenant FP rates, UEBA baselines, asset criticality, and the deterministic rules library all feed the verdict.

AI investigator with evidence

A 9-agent pipeline produces a verdict, an evidence chain, and a recommended action. Explainable. Audit-logged. MITRE-mapped. Hard caps on tokens and iterations per agent.

What's new — April 24 release

A double-sprint that landed Sigma-at-scale, vector similarity, the conversational hunt agent, and a full L3 SOC-analyst UI audit remediation. Production-ready.

AI Hunt Agent (Sprint B · NEW)

Plain-English hunting in HuntWorkbench: "users who logged in from 2+ countries in the last 24h".
Plan + 6 read-only tools (search alerts · user activity · device history · similar investigations · IOCs · MITRE chain) + answer.
Hard caps: max_iters=10, token_budget=50K; force-summarize on exhaustion.
SSE-streamed plan/thought/tool cards; save-as-saved-search and save-as-rule in one click.
Cross-tenant access blocked at the runner; company_id injected, never LLM-supplied.
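The caps and the tenant injection above can be sketched as a small runner loop. This is an illustration under assumptions, not the shipped runner: `step` and `summarize` are hypothetical callables standing in for the LLM, and the tuple they return is invented for the example. Only the two limits (10 iterations, 50K tokens) come from the text.

```python
MAX_ITERS = 10         # max_iters from the text
TOKEN_BUDGET = 50_000  # token_budget from the text

def run_hunt(question, company_id, step, summarize):
    """Drive a plan/tool loop under hard caps.

    step(history) -> (tool_call_or_None, tokens_used, answer_or_None)
    summarize(history) -> forced summary when a cap is exhausted
    Both callables are hypothetical stand-ins for the model.
    """
    history, tokens = [question], 0
    for _ in range(MAX_ITERS):
        call, used, answer = step(history)
        tokens += used
        if tokens > TOKEN_BUDGET:
            break  # budget exhausted: fall through to force-summarize
        if answer is not None:
            return {"answer": answer, "forced": False, "tokens": tokens}
        # The runner overwrites whatever tenant id the model put in the call.
        call = {**(call or {}), "company_id": company_id}
        history.append(call)
    return {"answer": summarize(history), "forced": True, "tokens": tokens}
```

The point of the design is that the tenant id is set by the runner on every tool call, so a prompt-injected company_id never reaches a tool.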

pgvector similarity (Sprint A · NEW)

768-dim embeddings (nomic-embed-text via Ollama, on-prem) on every alert + investigation.
ivfflat cosine indexes; tenant-scoped /api/{investigations,alerts}/{id}/similar.
100% backfilled — investigations 208/208, alerts 12,132/12,132.
"Investigations like this one" feeds the AI investigator's RAG layer + the hunt agent's similarity tool.

Sigma library (Sprint 0 · NEW)

3,101 SigmaHQ community rules pre-loaded as global + disabled (per-tenant opt-in).
Severity mix: 70 critical · 1,400 high · 1,334 medium · 272 low · 25 informational.
89% MITRE-mapped (2,751 / 3,101); product / service / category filters in UI.
GitHub provenance link + YAML expander + DRL-1.1 attribution.
Quarterly re-import via migrate_sigma_bulk_import.py --upgrade <tag>.

URLhaus + ThreatFox live polling

26,122 URLs / 32,866 IOCs loaded; refreshed on the ARIA_URLHAUS_POLL_SECONDS cadence (default 900s).
IOC scorer with cloud-CDN allowlist (490 IOCs rescored — s3.amazonaws.com capped at 5).
Feed-health widget on Integrations · Test-All button on every feed.
Lifecycle states: active / stale / expired / suppressed / trusted.
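A minimal sketch of the cloud-CDN cap described above. The cap value of 5 and the s3.amazonaws.com entry come from the text; the helper names and the assumption that higher scores mean more malicious are illustrative only.

```python
from urllib.parse import urlparse

CDN_ALLOWLIST = {"s3.amazonaws.com"}  # one entry named in the text; the real list is longer
CDN_SCORE_CAP = 5                     # cap from the text

def host_of(ioc):
    """Extract a lowercase hostname from a URL or bare host IOC."""
    parsed = urlparse(ioc if "://" in ioc else "//" + ioc)
    return (parsed.hostname or "").lower()

def score_ioc(ioc, raw_score):
    """Cap the score when the IOC lives on an allowlisted shared-hosting domain."""
    host = host_of(ioc)
    on_cdn = any(host == d or host.endswith("." + d) for d in CDN_ALLOWLIST)
    return min(raw_score, CDN_SCORE_CAP) if on_cdn else raw_score
```

Matching on the exact domain or a dot-prefixed suffix keeps look-alike hosts such as s3.amazonaws.com.evil.example from inheriting the cap.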

MFA enrollment wizard

QR-code enrollment with backup codes; status chips on every user row.
Nag banner for unenrolled users · ARIA_MFA_REQUIRED=true environment variable for hard enforcement.
5-strike lockout policy; password policy enforcement; session management with one-click revoke.

LLM resilience — 100% local

Circuit breaker: qwen3.5 → llama3.1:8b → degraded queue. All on-prem Ollama.
No cloud failover. Anthropic / OpenAI provider entries shown but disabled by on-prem policy.
Per-agent token + iteration budgets enforced in the runner.
Live metrics panel at /admin/llm.
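The fallback chain above might look like the sketch below. The model order comes from the text; the failure threshold, the class name, and the `invoke` callable are assumptions. A production breaker would also need a timed half-open state to retry a tripped model, which is left out here.

```python
CHAIN = ["qwen3.5", "llama3.1:8b"]  # model order from the text
FAILURE_THRESHOLD = 3               # assumed; not stated in the source

class Breaker:
    def __init__(self):
        self.failures = {m: 0 for m in CHAIN}
        self.degraded_queue = []

    def call(self, prompt, invoke):
        """Try each model whose breaker is closed; queue the prompt if none answer.

        invoke(model, prompt) is a hypothetical callable wrapping the local runtime.
        """
        for model in CHAIN:
            if self.failures[model] >= FAILURE_THRESHOLD:
                continue  # breaker open: skip this model
            try:
                result = invoke(model, prompt)
            except Exception:
                self.failures[model] += 1
                continue
            self.failures[model] = 0  # success closes the breaker again
            return model, result
        self.degraded_queue.append(prompt)  # nothing answered: degrade, never go to cloud
        return None, None
```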

Tenant SLA dashboard

Per-tenant MTTD / MTTR vs targets (defaults: critical 5m / high 30m / medium 2h / low 24h).
SlaBadge component with green / amber / red bands; per-source breakout.
9-bucket SLA breach taxonomy with root-cause classification; replaces the catch-all "Other = 35" bucket.
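As a sketch of how the default targets could map to badge colours: the four targets are from the text, while the amber threshold (80% of target) and the function name are assumptions, since the briefing does not say where the bands sit.

```python
from datetime import timedelta

SLA_TARGETS = {  # defaults from the text
    "critical": timedelta(minutes=5),
    "high": timedelta(minutes=30),
    "medium": timedelta(hours=2),
    "low": timedelta(hours=24),
}
AMBER_FRACTION = 0.8  # assumed band boundary; not stated in the source

def sla_band(severity, elapsed):
    """Classify an elapsed detect/respond time against the severity target."""
    target = SLA_TARGETS[severity]
    if elapsed > target:
        return "red"
    if elapsed >= target * AMBER_FRACTION:
        return "amber"
    return "green"
```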

Playbook execution audit

playbook_runs table with full execution history, viewer, and investigation linkage.
Every run: actor, target, dry-run flag, result, duration. Searchable.
Foundation for upcoming auto-trigger from investigation verdict (Sprint N+2).

RBAC permissions matrix

Live matrix view of role × resource × action — Admin / Analyst / Viewer + custom roles.
Per-tenant overrides; effective-policy visualization.
API-key rotation with 24h dual-read window for zero-downtime rotation.
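A minimal sketch of a dual-read window, assuming the verifier keeps the previous key alongside the current one. The class and method names are hypothetical, and a real store would hold key hashes rather than raw keys.

```python
import hmac
from datetime import datetime, timedelta, timezone

DUAL_READ = timedelta(hours=24)  # window from the text

class ApiKeySlot:
    def __init__(self, key):
        self.current, self.previous, self.rotated_at = key, None, None

    def rotate(self, new_key, now):
        """Promote new_key; the old key stays readable for the dual-read window."""
        self.previous, self.current, self.rotated_at = self.current, new_key, now

    def accepts(self, presented, now):
        # Constant-time comparison avoids leaking key contents through timing.
        if hmac.compare_digest(presented, self.current):
            return True
        in_window = self.previous is not None and now - self.rotated_at < DUAL_READ
        return bool(in_window and hmac.compare_digest(presented, self.previous))
```

Clients can switch keys at any point in the 24 hours after rotation without a failed request, which is what makes the rotation zero-downtime.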

ClickHouse columnar (Sprint C slice)

Container aria-clickhouse live; ReplacingMergeTree alerts + investigations + materialized view.
Vector dual-write wired for high-signal alerts; PG remains source of truth.
Routes time_range > 24h queries to columnar (Sprint C2).
Replaces OpenSearch (removed Apr 23) — single search surface, fewer moving parts.

Pipeline architecture

Tenant resolution at the ingress door. Deterministic rules before the expensive pipeline. Canonical schema downstream, so new adapters can be added without touching downstream code.

Device profile anchoring

Before the investigator runs, the enricher does a (tenant, MAC → hostname → IP) lookup against the tenant-scoped devices table. A match returns a device_profile block that is rendered at the top of the LLM prompt. The LLM is told not to guess an OS or role that contradicts the block.
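The lookup order can be sketched as follows. The device fields (mac, hostname, ip, os, role) and both function names are assumptions for illustration; the real table schema is not given in this briefing.

```python
def find_device(devices, company_id, mac=None, hostname=None, ip=None):
    """Return the first tenant-scoped match, trying MAC, then hostname, then IP."""
    scoped = [d for d in devices if d["company_id"] == company_id]
    for field, value in (("mac", mac), ("hostname", hostname), ("ip", ip)):
        if value is None:
            continue
        for d in scoped:
            if (d.get(field) or "").lower() == value.lower():
                return d
    return None

def render_profile(device):
    """Render the block placed at the top of the prompt; empty when nothing matched."""
    if device is None:
        return ""
    return (
        "DEVICE PROFILE (authoritative; do not contradict)\n"
        f"hostname: {device['hostname']}\n"
        f"os: {device['os']}\n"
        f"role: {device['role']}\n"
    )
```

Filtering by tenant before matching matters because two tenants can legitimately share the same private IP or even the same MAC.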

Rules engine as the override

Analyst-authored auto_close / suppress rules fire before the canonical pipeline. Each rule has a natural key, a priority, an action, an optional suppression window, and immutable versioning with one-click rollback.
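A rough sketch of priority-ordered rule evaluation ahead of the pipeline. The Rule shape, the `matches` predicate, and the convention that a lower number means higher priority are all assumptions. The suppression window and versioning are left out because the briefing does not define their semantics.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    key: str                          # natural key
    priority: int                     # assumed: lower number wins
    action: str                       # "auto_close" or "suppress"
    matches: Callable[[dict], bool]   # hypothetical predicate over an alert

def first_action(alert, rules):
    """Return (rule key, action) for the highest-priority match, else None.

    None means no override applies and the alert continues into the pipeline.
    """
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(alert):
            return rule.key, rule.action
    return None
```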

Per-tenant workers

Queues are named aria.<concern>.<company_id>. The supervisor queries the tenant roster on startup and launches worker processes per tenant. Stage workers are shared across tenants; per-tenant consumers isolate customer-level failure modes.
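The naming scheme lends itself to a small helper. The concern names used in the usage note ("ingest", "notify") and both function names are made up for illustration; only the aria.<concern>.<company_id> pattern comes from the text.

```python
def queue_name(concern, company_id):
    """Build a per-tenant queue name following the aria.<concern>.<company_id> pattern."""
    if "." in concern or "." in str(company_id):
        raise ValueError("dots are reserved as the separator")
    return f"aria.{concern}.{company_id}"

def plan_consumers(roster, concerns):
    """Map each tenant on the roster to the queues its consumers should bind to."""
    return {cid: [queue_name(c, cid) for c in concerns] for cid in roster}
```

For example, plan_consumers(["acme"], ["ingest", "notify"]) yields one entry for acme with the queues aria.ingest.acme and aria.notify.acme, so a backlog in one tenant's queue cannot starve another tenant's consumer.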

Platform capabilities

What the platform does, by concern. Every surface tenant-safe by default, every destructive action audit-logged.

9-agent investigation pipeline

Triage → Threat Intel → Knowledge (RAG + pgvector) → Forensics → Investigator (local LLM) → Remediation (policy-gated) → Validation → Audit → Notify. Plus the conversational Hunt agent. Typical end-to-end: 60-90s. Per-agent token + iteration budgets.

Channel-based tenant isolation

Tenancy is an ingress property, never a payload property. Every data source carries a unique ingest token; the ingress API resolves X-Aria-Source-Token to (company_id, data_source_id) at the door before the body is touched.
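A minimal sketch of resolving tenancy from the header alone. The X-Aria-Source-Token header name is from the text; the token value, the in-memory table, and the choice to store SHA-256 digests instead of raw tokens are assumptions.

```python
import hashlib

def _digest(token):
    return hashlib.sha256(token.encode()).hexdigest()

# Hypothetical token table: digest -> (company_id, data_source_id).
SOURCES = {_digest("tok-demo-1"): ("acme", "ds-wazuh-01")}

def resolve_tenant(headers, body):
    """Resolve tenancy from the ingest token before the body is inspected."""
    token = headers.get("X-Aria-Source-Token")
    if not token:
        raise PermissionError("missing source token")
    match = SOURCES.get(_digest(token))
    if match is None:
        raise PermissionError("unknown source token")
    company_id, data_source_id = match
    # The body is passed through untouched; any company_id inside it is ignored.
    return {"company_id": company_id, "data_source_id": data_source_id, "body": body}
```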

Sigma library + per-tenant overrides

3,101 SigmaHQ community rules pre-loaded as global / disabled. Per-tenant opt-in without forking. Severity-banded (70 critical / 1,400 high / 1,334 medium / 272 low / 25 informational). 89% MITRE-mapped. GitHub provenance link + YAML expander on every row.

Conversational Hunt agent

HuntWorkbench → AI Hunt tab. Plain-English questions → plan + 6 read-only tools + answer. Hard caps: 10 iterations / 50K tokens. SSE-streamed plan/thought/tool cards. Save-as-saved-search and save-as-rule in one click.

pgvector similarity

768-dim embeddings on every alert + investigation, generated locally via nomic-embed-text. Cosine similarity via ivfflat indexes. Tenant-scoped /similar endpoint feeds RAG knowledge layer + hunt-agent similarity tool + investigator UI panel.
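For intuition, the tenant-scoped ranking can be sketched in plain Python. In the database the same ordering would come from pgvector's cosine distance operator (<=>) with a company_id filter; the row shape and function names here are assumptions, and the 3-dimensional vectors stand in for the 768-dimensional embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similar(target, rows, company_id, k=5):
    """Rank same-tenant rows by cosine similarity to `target`, excluding itself."""
    scoped = [r for r in rows
              if r["company_id"] == company_id and r["id"] != target["id"]]
    scoped.sort(key=lambda r: cosine(target["embedding"], r["embedding"]), reverse=True)
    return [r["id"] for r in scoped[:k]]

rows = [
    {"id": 1, "company_id": "t1", "embedding": [1.0, 0.0, 0.0]},
    {"id": 2, "company_id": "t1", "embedding": [0.9, 0.1, 0.0]},
    {"id": 3, "company_id": "t1", "embedding": [0.0, 1.0, 0.0]},
    {"id": 4, "company_id": "t2", "embedding": [1.0, 0.0, 0.0]},
]
ranked = similar(rows[0], rows, "t1")
```

Row 4 is an exact vector match but belongs to another tenant, so it never appears in the result.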

Threat intelligence

Live URLhaus polling (26K URLs / 33K IOCs). ThreatFox. IOC scorer with explainable components. Lifecycle states: active / stale / expired / suppressed / trusted. Cloud-CDN allowlist prevents s3.amazonaws.com-style FPs. Feed-health widget.

Tenant SLA dashboard

Per-tenant MTTD / MTTR vs targets — defaults critical 5m, high 30m, medium 2h, low 24h. Green / amber / red bands. 9-bucket SLA breach taxonomy with root-cause classification.

MFA, RBAC & session governance

MFA enrollment wizard with QR + backup codes. RBAC matrix: role × resource × action with custom roles. API-key rotation with 24h dual-read. 5-strike lockout. Active session list with one-click revoke.

LLM resilience — 100% local

Circuit breaker: qwen3.5 → llama3.1:8b → degraded queue, all local Ollama. Cloud providers shown but disabled by on-prem policy. Per-agent token + iter caps. No customer data ever leaves the host.

Native SIEM-style search

Field-aware DSL (severity:critical AND source:wazuh), visual filter builder, schema panel, cursor-paginated infinite scroll, JSONB SQL pushdown. Saved searches private / tenant / global. Any saved search → scheduled alert.
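A toy translation of the example query into a parameterized WHERE clause, to show the pushdown idea. It handles only AND-joined field:value terms; the field allowlist, the function name, and the %s placeholder style are assumptions, and the real DSL is presumably richer.

```python
ALLOWED_FIELDS = {"severity", "source"}  # assumed allowlist; the real schema is larger

def to_sql(query):
    """Translate 'field:value AND field:value' into (where_clause, params)."""
    clauses, params = [], []
    for term in query.split(" AND "):
        field, sep, value = term.strip().partition(":")
        if not sep or field not in ALLOWED_FIELDS or not value:
            raise ValueError(f"unsupported term: {term!r}")
        clauses.append(f"{field} = %s")  # value travels as a bound parameter
        params.append(value)
    return " AND ".join(clauses), params

where, params = to_sql("severity:critical AND source:wazuh")
```

Field names are checked against an allowlist and values are bound as parameters, so nothing the analyst types is interpolated into the SQL text.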

Detection Engineering

Per-rule TP/FP/escalation rate, noisy-rule ranking, explainable tuning recommendations, replay against historical alerts, MITRE coverage with gap detection, candidate-rule review queue, immutable rule versioning with one-click rollback.

Playbook execution audit

playbook_runs table with full execution history — actor, target, dry-run flag, result, duration. Linked to investigations. Foundation for verdict-triggered auto-execution (Sprint N+2).

Reporting

AI-generated narratives. Scheduled email + Slack delivery with PII guard on Slack — only summary KPIs ever leave. PDF export with per-tenant white-label. QoQ / YoY period comparison with graceful fallback.

Resilience & observability

Per-tenant worker pool. Graceful pipeline degradation. Postgres + OS backups + logrotate. Prometheus /metrics. SHA-256 hash-chained custody per investigation. Audit log on every authn / authz / config / policy / rollback / IOC override / session revoke / remediation event.
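The hash-chained custody idea can be illustrated in a few lines. This is a generic SHA-256 chain under assumed names and an assumed entry layout, not ARIA's stored format: each entry's hash covers the previous hash plus a canonical JSON encoding of the event.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain

def _hash(prev, event):
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + payload).encode()).hexdigest()

def append_entry(chain, event):
    """Append an event whose hash commits to everything before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "hash": _hash(prev, event)})

def verify(chain):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Changing any earlier event changes its hash, which no longer matches the prev value recorded in the next entry, so tampering is detectable without trusting the storage layer.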

Audit posture

Two independent audits this month. Every critical finding closed.

Security review · Apr 18-19

15 findings · 3 CRITICAL · 5 HIGH · 5 MEDIUM · 2 LOW

Status: 15 / 15 CLOSED

Auth + CORS + Fernet + ReDoS + file perms hardened
JWT secret rotated; existing tokens invalidated
Prometheus /metrics live
pg + OS backups + logrotate live
pytest regression suite (12 correctness classes)

UI audit · Apr 24 (L3 SOC analyst)

C+ "Do not deploy" → B- (reconciled) → A-

Status: A- · cleared for production

65 items DONE · 0 MISSING · 12 explicitly parked
All 10 reviewer top-criticals closed
MFA UX shipped (enrollment wizard + nag banner)
Per-tenant SLA dashboard
RBAC permissions matrix · LLM-only-local enforced

Tenant isolation tested · cross-tenant probes blocked · audit log immutable · LLM never reaches the internet · pgvector embeddings local · feeds polled outbound only.
Designed for regulated tenants. Built so your data — and your customers' data — stays on premises.

What's next

Correlation is the missing core. MSP polish closes the operating loops. Posture extends ARIA outward. Cold storage waits for customer ask.

Sprint N+1 — Correlation engine

5–6 days · the missing core

Multi-alert pattern detection: "alert A + B + C within window W on same host = breach".

Brute-force → success
Phishing → exec → outbound
Lateral movement
Account takeover
Data exfiltration

Sprint N+2 — MSP polish

5 days

Playbook auto-trigger from investigation verdict
Report templates: Executive / Incident / Metrics
Playbook versioning + dry-run
Asset grouping by business unit

Sprint M — External Posture

5–6 weeks

Tenant declares external assets. 15-day light scan → letter grade A/B/C/D/F across 6 categories:

Network exposure · TLS
DNS hygiene · Web hardening
Patching · PKI

Mini-SecurityScorecard, on-prem.

Cold storage — revive on customer ask

SSO / SAML
Vulnerability scanner integration
Cloud-source ingestion (CloudTrail · O365 · GWS)
Slack two-way
LLM fine-tuning on tenant data
Marketplace for community detections

On-prem first. Customer-driven. No cloud-LLM compromises.

Clean code. No internet needed to investigate your incidents.
