A multi-agent AI SOC platform with an MSSP operating layer. Tenancy is enforced at the ingress channel — every source ships with a per-channel token; payload content never decides the tenant. Raw events flow through a canonical parser (with cross-source delegation when one vendor wraps another) into enrichment, scoring and explainable triage, where an asset resolver anchors the investigator on ground-truth device profiles instead of guessing roles from device names. Deterministic analyst-authored rules short-circuit triage for benign chatter; the rest lands in a 7-stage LLM + rules pipeline with MITRE-mapped verdicts and policy-gated remediation. Native SIEM search, a Detection Engineering workbench, a Security Intelligence operating layer, and an Admin Control Plane round out the analyst UI. Tenant-safe by default; every destructive action audit-logged. Runs on-prem — nothing leaves the box by default.
3-minute 30-second captioned walkthrough of every module and new feature. Generated from a demo environment.
Eight-slide briefing for technical review — architecture, pipeline, AI, security, integrations, resilience, workflows, economics. Navigate with ← / → arrows inside the frame.
Thirty-six-slide auto-playing captioned walkthrough of the product, with keyboard navigation. Each slide explains what you're looking at and which feature it demonstrates.
Opens in a new tab. 7s per slide, ←/→ to navigate, P to pause.
Click any tile below to open full-size. 36 images at 1680×1050 @ 2× density.
A two-sprint cycle that rebuilt tenancy from the ingress up, made triage explainable, anchored the investigator on ground-truth device profiles, and hardened the MSSP control plane. Grouped by concern.
tenant_scope choke point applied to CMDB, Devices, Campaigns, Detection Rules, Saved Searches, Scheduled Reports.
score_reasons[] audit trail on every alert; triage_reason human-readable sentence; trigger_reason on every investigation.
device_profile (role / OS / manufacturer / confidence) to every alert before the LLM sees it.
000 (the manager itself) and GCs stale entries automatically.
auto_close / suppress actions short-circuit triage.
aria.<concern>.<company_id>; per-tenant processes alongside stage workers.
Intl.supportedValuesOf with curated fallback.
company_id on create ignored for non-admin users; forced to the caller's tenant.
company_id deliberately dropped.
aria-ingest service as a systemd unit: security boundary, separate blast radius.
/health, /healthz, /api/health all respond (prevents external-probe false alarms).
/metrics + pg + OS backups + logrotate + pytest regression suite.

Every event follows the same path. Tenant resolution happens at the ingress door; deterministic rules run before the expensive pipeline; every downstream stage reads a canonical schema, so adapters can be added without touching downstream code.
Before the investigator runs, the enricher does a (tenant, MAC → hostname → IP) lookup against the tenant-scoped devices table. A match returns a device_profile (role, OS, manufacturer, confidence, discovery source) that is rendered at the top of the LLM prompt as a DEVICE PROFILE block. The LLM is told not to guess an OS or role that the block contradicts. Discovery adapters populate the devices table via a 4-hour loop plus a one-shot run at onboarding time.
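The lookup order described above can be sketched roughly as follows. This is an illustration only: the in-memory DEVICES dict, the DeviceProfile class, and the function names are hypothetical stand-ins for the tenant-scoped devices table and the real enricher, and the exact text of the prompt block is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceProfile:
    role: str
    os: str
    manufacturer: str
    confidence: float
    discovery_source: str


# Hypothetical stand-in for the tenant-scoped devices table.
# Keys are (tenant, identifier kind, identifier value).
DEVICES = {
    ("acme", "mac", "aa:bb:cc:dd:ee:ff"): DeviceProfile(
        "printer", "embedded-linux", "HP", 0.9, "firewalla"
    ),
    ("acme", "hostname", "nas-01"): DeviceProfile(
        "nas", "dsm", "Synology", 0.8, "wazuh"
    ),
}


def resolve_device(tenant, mac=None, hostname=None, ip=None) -> Optional[DeviceProfile]:
    """Try MAC, then hostname, then IP, always within a single tenant."""
    for kind, value in (("mac", mac), ("hostname", hostname), ("ip", ip)):
        if value:
            profile = DEVICES.get((tenant, kind, value.lower()))
            if profile is not None:
                return profile
    return None


def render_profile_block(profile: Optional[DeviceProfile]) -> str:
    """Build the text placed at the top of the investigator prompt."""
    if profile is None:
        return "DEVICE PROFILE: unknown (no match in devices table)"
    return (
        "DEVICE PROFILE\n"
        f"role: {profile.role}\n"
        f"os: {profile.os}\n"
        f"manufacturer: {profile.manufacturer}\n"
        f"confidence: {profile.confidence}\n"
        f"discovery source: {profile.discovery_source}"
    )
```

Because the tenant is part of every key, an identical MAC address under another tenant never matches, which mirrors the tenant-scoped lookup the text describes.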
Analyst-authored auto_close / suppress rules are deterministic, not a model input. They fire before the canonical pipeline so they never contend with scoring. Each rule has a natural key, priority, action, optional suppression window, and is versioned with one-click rollback. Pre-seeded rules cover the long tail of benign SaaS chatter so the downstream pipeline only sees what actually needs thinking about.
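A minimal sketch of such a pre-pipeline rules pass might look like the following. The Rule fields follow the description above, but the matcher callables, the example rules, and the convention that a lower priority number is evaluated first are assumptions made for illustration. Suppression windows and versioning are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Rule:
    key: str                          # natural key
    priority: int                     # assumption: lower number runs first
    action: str                       # "auto_close" or "suppress"
    matches: Callable[[dict], bool]   # deterministic predicate, no model involved


# Hypothetical pre-seeded rules for benign chatter.
RULES: List[Rule] = [
    Rule("wazuh-keepalive", 20, "suppress",
         lambda e: e.get("rule_group") == "keepalive"),
    Rule("ntp-chatter", 10, "auto_close",
         lambda e: e.get("dst_port") == 123),
]


def pre_triage(event: dict, rules: List[Rule] = RULES) -> Optional[Rule]:
    """Return the first matching rule in priority order, or None.

    None means the event continues into the canonical pipeline.
    """
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(event):
            return rule
    return None
```

An event that matches no rule returns None and proceeds to scoring, so the rules layer never has to compete with the score.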
Queues are named aria.<concern>.<company_id>. The supervisor queries the tenant roster on startup and launches worker processes per tenant. Stage workers (7 of them: triage, intel, knowledge, forensics, investigator, remediation, validation) are shared across tenants; per-tenant consumers isolate customer-level failure modes. A single backend host comfortably runs 19 worker processes for three tenants, and the design scales horizontally past ten tenants.
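As a rough illustration of the naming scheme, a supervisor could derive its worker plan from the tenant roster as below. The per-tenant concern names ("ingest", "enrich") and the integer tenant ids are hypothetical; the source does not list which concerns get per-tenant queues.

```python
from typing import Iterable, List

# The seven shared stage workers named in the text.
STAGES = [
    "triage", "intel", "knowledge", "forensics",
    "investigator", "remediation", "validation",
]

# Hypothetical per-tenant concerns; the real list is not given in the source.
PER_TENANT_CONCERNS = ["ingest", "enrich"]


def queue_name(concern: str, company_id: int) -> str:
    """Build a queue name following aria.<concern>.<company_id>."""
    return f"aria.{concern}.{company_id}"


def plan_workers(tenant_ids: Iterable[int]) -> List[str]:
    """List the workers to launch: shared stages plus one consumer per tenant queue."""
    shared = [f"stage:{stage}" for stage in STAGES]
    per_tenant = [
        queue_name(concern, tenant_id)
        for tenant_id in tenant_ids
        for concern in PER_TENANT_CONCERNS
    ]
    return shared + per_tenant
```

Pausing or restarting one tenant then only touches the entries whose queue name ends in that tenant's id.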
What the platform does, organized by concern. Every surface is tenant-safe by default and audit-logged for destructive actions.
Seven stages per alert: Triage (MITRE mapping) → Threat Intel (VT/AbuseIPDB/TAXII/CVE) → Knowledge (RAG over history) → Forensics (timeline + campaign) → Investigator (LLM verdict) → Remediation (policy-gated) → Validation (post-action re-check). Learning runs async to generate Sigma rule candidates. Typical end-to-end: 60-90s.
Tenancy is an ingress property, never a payload property. Every data source on a tenant carries a unique ingest token; the ingress API resolves X-Aria-Source-Token to (company_id, data_source_id) at the door before the event body is touched. A dedicated ingest service (port 8001) forms the security boundary — compromising the analyst UI can't forge events, and no upstream relay can spoof another tenant by rewriting a field. Partner-portal wizard emits shipper-config snippets (Vector / Filebeat / Fluent Bit / curl) per source; token shown once.
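The resolution step could be sketched like this. The token index, the example tokens, and the choice to store SHA-256 digests rather than raw tokens are assumptions; only the X-Aria-Source-Token header and the (company_id, data_source_id) result come from the description above.

```python
import hashlib
from typing import Mapping, Optional, Tuple


def _digest(token: str) -> str:
    """Hash the token so the index never holds raw secrets (an assumption)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


# Hypothetical token index: digest -> (company_id, data_source_id).
TOKEN_INDEX = {
    _digest("tok-acme-firewall"): (1, 101),
    _digest("tok-globex-wazuh"): (2, 201),
}


def resolve_source(headers: Mapping[str, str]) -> Optional[Tuple[int, int]]:
    """Map the source token to a tenant and data source without reading the body."""
    token = headers.get("X-Aria-Source-Token", "")
    if not token:
        return None
    return TOKEN_INDEX.get(_digest(token))


def ingest(headers: Mapping[str, str], body: dict) -> dict:
    """Stamp tenancy from the channel; tenant-like fields in the body are not trusted."""
    resolved = resolve_source(headers)
    if resolved is None:
        raise PermissionError("unknown or missing source token")
    company_id, data_source_id = resolved
    return {"company_id": company_id, "data_source_id": data_source_id, "raw": body}
```

In this sketch a payload that claims another tenant is still attributed to the token's tenant; the claim survives only as inert data under "raw". A real HTTP framework would also treat header names case-insensitively.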
Raw vendor events are normalized into a CanonicalAlert schema, then enriched (IOC, asset, MITRE, rule-FP-rate, frequency, UEBA), scored (source confidence × derived confidence, combined via max + 0.3·min), and triaged into auto-closed / correlating / investigated / escalated bands with a per-tenant threshold (30-80). Every alert carries score_reasons[] (audit trail of what moved the number) and a human-readable triage_reason; every investigation carries a trigger_reason. No opaque "the model said so."
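One possible reading of the combination rule, max + 0.3·min, is sketched below. The 0-100 scale, the cap at 100, and the wording of the reason strings are assumptions; the source gives only the formula and the existence of score_reasons[].

```python
from typing import List, Tuple


def combine(source_conf: float, derived_conf: float) -> Tuple[float, List[str]]:
    """Combine two confidences as max + 0.3 * min and record why.

    Assumes both inputs are on a 0-100 scale and caps the result at 100.
    """
    hi = max(source_conf, derived_conf)
    lo = min(source_conf, derived_conf)
    raw = hi + 0.3 * lo
    score = min(100.0, raw)

    reasons = [
        f"stronger signal contributes {hi:g}",
        f"weaker signal contributes 0.3 x {lo:g}",
    ]
    if raw > 100.0:
        reasons.append("capped at 100")
    return score, reasons
```

With this rule the stronger signal sets the floor and the weaker one can only add to it, so a confident source is never diluted by a weak derived score.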
Real-world pipelines relay one vendor's events through another — for example, a firewall alarm arriving through a Wazuh manager as a wrapped payload. The Wazuh parser sniffs the wrapper (data.source == firewalla-msp or data._type starts with ALARM_), lazy-loads the Firewalla parser, and delegates normalization and scoring, then stamps the relay trail (agent id, rule id, manager) as tags for audit. Adding a new source = one parser file + one fixture.
A device resolver kills the hallucinate-the-OS-from-a-device-name class of false positives. Integration adapters (Firewalla, Wazuh) push discovered devices into a tenant-scoped devices table with role, OS, manufacturer, model, confidence, and discovery source; a periodic 4-hour loop refreshes it. At enrichment time, a (tenant, MAC → hostname → IP) lookup attaches a ground-truth device_profile to the alert. The investigator prompt renders it as a DEVICE PROFILE block above the free text, so the LLM is anchored on role before it reasons.
Analyst-authored rules are the override, not a suggestion. A rules layer runs before the canonical pipeline and can auto_close or suppress benign chatter (Apple/iCloud, Google/Microsoft/Zoom, NTP, DoH connectivity checks, Wazuh keep-alives, ad/tracker blocks), short-circuiting triage entirely. Rules are global or per-tenant and ship pre-seeded; analysts can add more without touching code. The pipeline downstream only sees what actually needs thinking about.
Any saved search can be promoted to an alert with a cron expression, timezone, hit threshold, and multi-recipient email list. Six presets cover the common cadences (5 min / 15 min / hourly / daily 9 AM / weekdays 9 AM / Monday 9 AM); arbitrary cron strings are supported. Ownership and admin RBAC match the delete rule: only the search owner or an admin can configure alerting. The backend croniter runner fires in the tenant's timezone, not the server's.
Queues are named aria.<concern>.<company_id>; a supervisor queries the tenant roster on startup and spawns dedicated worker processes per tenant alongside stage workers. Noisy tenants don't starve quiet ones, and a tenant-level pause / restart has a blast radius of one. Designed to scale horizontally to 10+ tenants without a container-per-tenant blowup.
Built-in search tab with field-aware DSL (severity:critical AND source:wazuh), visual filter builder, right-side schema panel, cursor-paginated infinite scroll, SQL pushdown into JSONB for sub-100ms first page, and saved searches with private / tenant / global visibility. Timechart and top-value aggregations share the same filter state. Any saved search can be promoted to a scheduled alert (cron + timezone + threshold + multi-recipient). This is the single search surface — no external index to operate, no split-brain with a parallel search engine.
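To illustrate how a field-aware query such as severity:critical AND source:wazuh could be pushed down into JSONB, here is a deliberately small sketch. It handles only AND-joined field:value terms; the alerts table, the payload column, the field allow-list, and the %s placeholder style are all assumptions rather than the product's actual schema.

```python
import re
from typing import List, Tuple

# Hypothetical allow-list: field names are interpolated into SQL,
# so only known-safe names are accepted.
ALLOWED_FIELDS = {"severity", "source", "status"}

_TERM = re.compile(r"^(\w+):(\S+)$")


def compile_query(dsl: str, company_id: int) -> Tuple[str, List[object]]:
    """Compile 'field:value AND field:value' into parameterized SQL.

    The tenant filter is always added server-side and never parsed from the DSL.
    """
    clauses = ["company_id = %s"]
    params: List[object] = [company_id]

    for term in re.split(r"\s+AND\s+", dsl.strip()):
        match = _TERM.match(term)
        if match is None or match.group(1) not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported term: {term!r}")
        field, value = match.group(1), match.group(2)
        # ->> extracts a JSONB field as text; the value stays a bound parameter.
        clauses.append(f"payload->>'{field}' = %s")
        params.append(value)

    sql = "SELECT id FROM alerts WHERE " + " AND ".join(clauses)
    return sql, params
```

Values are always bound as parameters, and field names reach the SQL string only after passing the allow-list, which keeps the DSL from becoming an injection path.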
Closes the loop: Alert → Verdict → Rule. Per-rule TP/FP/escalation rate, noisy-rule ranking by fp_rate × log(1+hits), explainable tuning recommendations with evidence, replay against historical alerts, MITRE coverage with gap detection, candidate-rule review queue (LearningAgent output), immutable rule versioning with one-click rollback, and Sigma import with structural validation.
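The noisy-rule ranking can be illustrated directly from the formula. The natural logarithm and the dict shape of each rule record are assumptions; the source does not state the log base.

```python
import math
from typing import Dict, List


def noise_score(fp_rate: float, hits: int) -> float:
    """fp_rate x log(1 + hits), using the natural log (an assumption)."""
    return fp_rate * math.log(1 + hits)


def rank_noisy(rules: List[Dict]) -> List[Dict]:
    """Sort rules from noisiest to quietest."""
    return sorted(
        rules,
        key=lambda r: noise_score(r["fp_rate"], r["hits"]),
        reverse=True,
    )


# Hypothetical rule statistics.
example = [
    {"id": "A", "fp_rate": 0.9, "hits": 5},
    {"id": "B", "fp_rate": 0.5, "hits": 1000},
    {"id": "C", "fp_rate": 0.0, "hits": 10000},
]
ranked_ids = [r["id"] for r in rank_noisy(example)]
```

The logarithm damps raw volume, so a rule with a moderate false-positive rate and many hits (B) outranks a rare rule that is almost always wrong (A), while a high-volume rule with no false positives (C) scores zero.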
IOCs move from "enrich and show" to scored + prioritized + correlated. Threat score 0-100 with explainable components (source confidence, sighting volume, confirmed TPs, freshness, lifecycle overrides). Lifecycle states: active / stale / expired / suppressed / trusted. Campaign severity + confidence (confirmed / probable / unknown). Feed sync health, hunt-suggestion generation, intel ⇄ rule coverage gaps, per-tenant threat landscape.
Single choke point for tenant isolation. Every query scoped server-side from the JWT; client-supplied tenant parameters are ignored. Cross-tenant admin access is explicit, role-gated, audit-logged, and surfaced via UI banner. Per-tenant white-label branding (logo, display name, primary color, report footer) flows into every report payload and PDF export. Per-tenant per-action policy overrides with effective-policy visualization. MSP-of-MSP tenant hierarchy via parent-child wiring.
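A choke point of this kind could be sketched as a single function that every query path calls. The claim names (sub, company_id, role), the "admin" role string, and the in-memory AUDIT_LOG are hypothetical; JWT signature verification is assumed to have happened already.

```python
from typing import List, Optional

# Hypothetical in-memory audit sink; a real system would persist this.
AUDIT_LOG: List[dict] = []


def tenant_scope(claims: dict, requested_company_id: Optional[int] = None) -> int:
    """Return the company_id a query must be filtered on.

    `claims` is assumed to be an already-verified JWT payload.
    """
    own = claims["company_id"]

    # No override requested, or the caller asked for their own tenant.
    if requested_company_id is None or requested_company_id == own:
        return own

    # Cross-tenant access is explicit, role-gated, and audit-logged.
    if claims.get("role") == "admin":
        AUDIT_LOG.append({
            "event": "cross_tenant_access",
            "actor": claims.get("sub"),
            "from": own,
            "to": requested_company_id,
        })
        return requested_company_id

    # Non-admin callers: the client-supplied tenant is silently ignored.
    return own
```

Routing every query through one function like this is what makes it practical to audit tenant isolation in one place.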
Not just settings — governance. Worker lifecycle (start / stop / restart / restart-all) with confirm dialogs + audit. Aggregated system alerts across 6 categories. Cost dashboard (tokens + USD per day, per tenant, top drivers). Active session list with one-click revoke. Runtime config editor with type validation. RabbitMQ queue depth. Policy editor showing effective policy per tenant.
KPI strip with click-through to filtered investigations. Per-tenant comparison with grades A-D. Alert-lifecycle funnel. SLA breach root-cause analysis (backlog / slow investigation / escalation delay). Data coverage per source. Automation coverage. Decision-engine metrics. Dashboard narrative that summarizes "what changed" vs the previous period, plus z-score alert-volume anomaly detection. Reports carry executive-ready narrative on every type; scheduled delivery via cron + email; PDF export with tenant branding; QoQ / YoY period comparison with graceful fallback when history is insufficient.
Evidence chain: SHA-256 hash-chained custody per investigation. Audit log covers every authentication, authorization, config change, policy override, cross-tenant access, rule rollback, IOC override, session revoke, and remediation action. Search telemetry for SLO tracking. Email delivery log — no silent failures. Per-agent tracing via Langfuse when configured. Compliance / Audit report type exports the audit trail with event-type summary and narrative.
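A hash-chained custody log can be sketched in a few lines. The entry layout, the all-zero genesis value, and the canonical JSON encoding are assumptions; the source states only that custody is SHA-256 hash-chained per investigation.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def _entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_evidence(chain: List[dict], payload: dict) -> None:
    """Append an entry whose hash commits to everything before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev,
        "hash": _entry_hash(prev, payload),
    })


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != _entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, changing an early entry invalidates every later link, so tampering is detectable without trusting any single record.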
Representative module captures from a demo environment. Click to enlarge.