Operator dashboard for the live Spaceduckling stack.

Incident mode active. Auto-refresh paused for operator triage.
Operator anomalies detected
  1. Waiting for live status snapshot…
Loading…
Registered: Loading live account metrics…
Certified: Loading hatch completion metrics…
Bonded: Loading bonded spaceduck metrics…
Galaxy: Loading live galaxy version…

Audit Activity · Last 24h

Loading…

Live 24h audit volume from POST /beak/audit, with explicit fallback when a category is not yet emitted by the backend.

Signups: Waiting for audit signal…
Certs: Waiting for audit signal…
Pecks: Waiting for audit signal…
Reports: Waiting for audit signal…
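The 24h count with an explicit fallback could be computed roughly as follows. This is a minimal sketch, not the dashboard's actual code: the category-to-event-type prefixes, the ISO 8601 timestamp format, and the fallback label are all assumptions, and a zero count is treated here as "not yet emitted" for simplicity.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from strip category to event_type prefixes.
CATEGORY_PREFIXES = {
    "Signups": ("duck.signup",),
    "Certs": ("duck.cert_issued", "duck.hatched"),
    "Pecks": ("duck.peck_",),
    "Reports": ("duck.report",),
}

FALLBACK = "not yet emitted"  # assumed fallback label


def audit_volume_24h(entries, now):
    """Count audit entries per category within the 24 hours before `now`."""
    cutoff = now - timedelta(hours=24)
    counts = {name: 0 for name in CATEGORY_PREFIXES}
    for entry in entries:
        # Assumes timestamps are ISO 8601 strings with an explicit UTC offset.
        stamp = datetime.fromisoformat(entry["timestamp"])
        if stamp < cutoff:
            continue
        for name, prefixes in CATEGORY_PREFIXES.items():
            if entry["event_type"].startswith(prefixes):
                counts[name] += 1
    # Categories with no events fall back to an explicit label.
    return {name: (n if n > 0 else FALLBACK) for name, n in counts.items()}
```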
Mission Control loading

Live system posture

Loading Beak system status…

Auth surface: Checking auth routes…
Runtime state: Checking lambda metadata…

Auth Status

Loading…

Live from GET /beak/system/status.

Backend: GET /beak/system/status · Fields: auth.cognito_pool, auth.ses_sender, auth.signout/signup/verify/phone_verify/forgot_reset
Cognito pool ID
SES sender
signout signup verify phone_verify forgot_reset
Health

Communication Status

Loading…

Live SES/SNS provider posture from GET /beak/system/status.

Backend: GET /beak/system/status · Fields: ses.sandbox, sns.sandbox
SES: Loading…
SNS: Loading…
Health

Sandbox Exit Readiness

Loading…

Live SES/SNS exit blockers from GET /beak/system/status — exact next operator action.

Backend: GET /beak/system/status · Fields: ses.sandbox, ses.daily_quota, sns.sandbox, sns.monthly_spend_limit
SES mode
SES daily quota
SNS mode
SNS spend limit
Daily Quota Usage
Next SES action: checking…
Next SNS action: checking…

Webhook Health

Operator-set

EventBridge wiring currently documented from deployed infra.

EventBridge bus: spaceduck-events
Status: Connected
Events: duck.hatched, duck.bonded, duck.pulse, duck.unpecked
Logs: → CloudWatch /spaceduck/events (90-day retention)
Health: Event bus connected

Runtime State

Loading…

Live Lambda metadata from system/status.lambda.

Backend: GET /beak/system/status · Fields: lambda.version, lambda.runtime, lambda.memory_mb, lambda.timeout_s, lambda.function, lambda.last_modified
Function
Runtime
Memory
Timeout
Region: us-east-1
Online since: last deploy
Current UTC
Uptime: Live
Concurrency
Health

Agent Fleet

Loading…

Live bonded agent telemetry from the production status route.

Backend: GET /beak/system/status · Fields: agents.alive, agents.slow, agents.dead, agents.total_bonded
Alive
Slow
Dead
Total bonded
Health

Database Posture

Loading…

Live DynamoDB table counts from the system status route.

Backend: GET /beak/system/status · Fields: database.* (eggs, ducklings, birth_certificates, …) + GET /beak/metrics for growth strip
Loading table posture…
Ducklings
Certs issued
Connections
Peck requests
Health

DynamoDB Billing

Loading…

Live DynamoDB billing modes from system/status.

Backend: GET /beak/system/status · Fields: database_billing.*
Loading billing modes…
Health

Pending Peck Requests

Loading…

Live from GET /beak/metrics.

Backend: GET /beak/metrics · Fields: spaceducks_bonded
Spaceducks bonded
History: Peck request history available via /beak/audit
Health

🔗 Peck Protocol

Loading…

Live Step Functions execution counts from peck-approval-workflow.

Backend: GET /beak/system/status · Fields: peck_protocol.running, peck_protocol.succeeded, peck_protocol.failed
🔄 Running
✅ Succeeded
❌ Failed

Lambda Cost Estimator

Loading…

Projected monthly cost from invocation count and duration.

Backend: POST /beak/audit (for invocation count), GET /beak/system/status (for memory)
Monthly Invocations
Average Duration: — ms
Configured Memory: — MB
Projected Cost: $— est.
Excludes free tier. Prices for us-east-1.
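The projection can be sketched as below. The two price constants are assumptions (x86 on-demand rates for us-east-1 as commonly published) and should be checked against current AWS pricing. The function name is illustrative, and the free tier is excluded as the card states.

```python
# Assumed us-east-1 on-demand rates for x86 Lambda; verify before relying on them.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20


def projected_monthly_cost(invocations, avg_duration_ms, memory_mb):
    """Estimate monthly Lambda cost in USD, excluding the free tier."""
    # Compute charge: invocations x seconds per invocation x configured GB.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    # Request charge: flat rate per million invocations.
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return round(compute_cost + request_cost, 2)
```

For example, one million invocations averaging 1000 ms at 1024 MB come to about 16.67 USD of compute plus 0.20 USD of requests.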

Peck Failure Analysis

Loading…

Analysis of failed peck requests from the Step Functions workflow, split into denied, expired, and callback-error buckets.

Backend: GET /beak/system/status · Fields: peck_protocol.failed, peck_protocol.succeeded, peck_protocol.failure_breakdown.*
Failed Pecks
Success Rate: —%
Failure Rate: —%
Denied: —%
Expired: —%
Callback errors: —%
View audit →
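A minimal sketch of how the rates and bucket shares on this card could be derived from the listed fields. The bucket key names inside failure_breakdown ("denied", "expired", "callback_error") are assumptions, since the source only lists failure_breakdown.*.

```python
def pct(part, whole):
    """Percentage rounded to one decimal, or None when there is no denominator."""
    return None if whole == 0 else round(100 * part / whole, 1)


def peck_failure_analysis(peck_protocol):
    failed = peck_protocol.get("failed", 0)
    succeeded = peck_protocol.get("succeeded", 0)
    breakdown = peck_protocol.get("failure_breakdown", {})
    finished = failed + succeeded  # running executions are ignored
    return {
        "failed": failed,
        "success_rate": pct(succeeded, finished),
        "failure_rate": pct(failed, finished),
        # Bucket shares are expressed as a share of failed pecks only.
        "denied": pct(breakdown.get("denied", 0), failed),
        "expired": pct(breakdown.get("expired", 0), failed),
        "callback_error": pct(breakdown.get("callback_error", 0), failed),
    }
```

Returning None instead of 0 when nothing has finished keeps the "—%" placeholder distinguishable from a real 0% rate.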

Alias Drift Investigation

Loading…

Direct prod alias investigation: function version, alias version, explicit parity verdict, and exact remediation command.

Prod alias target (get_alias)
Invoked version (context)
Reported lambda.version
Parity verdict
Status generated at
Status age
Freshness: Loading…
Remediation
Last checked: —
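The parity verdict could be derived from the three version readings roughly like this. The verdict labels and the tolerance for a leading "v" are assumptions made for illustration. The remediation command is deliberately not reproduced here.

```python
def parity_verdict(alias_target, invoked_version, reported_version):
    """Compare the alias target, the invoked version, and the reported lambda.version."""
    versions = (alias_target, invoked_version, reported_version)
    # Any missing reading means no verdict can be given yet.
    if any(v is None or v == "" for v in versions):
        return "UNKNOWN"
    # Treat "v39" and "39" as the same version (assumed formatting tolerance).
    normalized = {str(v).lstrip("v") for v in versions}
    return "IN PARITY" if len(normalized) == 1 else "DRIFT"
```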

Deploy Readiness Checklist

Loading…

Pre-promotion checklist comparing prod alias state, reported status version, sandbox posture, and the latest deploy log snapshot.

Prod alias parity: Waiting for status…
Status freshness: Waiting for generated timestamp…
Sandbox blockers: Waiting for provider posture…
Recent deploy log: Waiting for static log snapshot…
Recent deploy entries will appear here.
Last checked: —

Route Latency Strip

Loading…

Live fetch durations for the three operator-critical routes with green / amber / red thresholds.

GET /beak: Waiting for probe…
GET /beak/metrics: Waiting for probe…
GET /beak/system/status: Waiting for probe…
Last checked: —
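A sketch of the green / amber / red banding for a single probe. The red cut-off follows the 1000 ms figure used in the drill checklist. The 500 ms amber cut-off is a hypothetical value, because the source does not state it.

```python
def latency_band(latency_ms, ok=True, amber_ms=500, red_ms=1000):
    """Classify one route probe; failed or missing probes are always red."""
    if not ok or latency_ms is None or latency_ms > red_ms:
        return "red"
    if latency_ms > amber_ms:
        return "amber"
    return "green"
```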

Operator Watchlist

0 pinned

Pin up to 5 duckling IDs or cert IDs in localStorage for one-click support lookup.

No watchlist items pinned yet.

Incident Mode

Inactive

Freeze auto-refresh during active incidents, highlight critical cards, and stamp the UTC incident start time in localStorage.

Start time: —

Data Freshness Scoreboard

Loading…

Age of the last successful system/status, metrics, and audit fetch with stale thresholds.

System status: Waiting for fetch…
Metrics: Waiting for fetch…
Audit: Waiting for fetch…

Operator Handoff Bundle

Ready

Export a markdown handoff with build stamp, drift state, sandbox state, peck failure stats, and local shift notes.

Includes shift log localStorage notes automatically.

System Health Score

Loading…

Computed client-side from GET /beak/system/status.

Backend: GET /beak/system/status · Fields: agents.dead, agents.slow, database.ducklings + POST /beak/audit for recent event count
— / 100
Waiting for live data…

Cert Pipeline

Loading…

Live system throughput from eggs to bonded spaceducks.

Backend: GET /beak/system/status · Fields: database.eggs/ducklings/birth_certificates, agents.total_bonded + GET /beak/metrics for trend
Eggs
Ducklings
Certified
Spaceducks
Total certs issued: Waiting for metrics…
Last cert issued: Timestamp pending…
Trend: Weekly movement pending…

Cert Issuance Latency

Loading…

Last-issued age plus rolling 24h certificate volume from live metrics and audit signals.

Backend: GET /beak/metrics · Fields: last_cert_issued_at (+ fallback POST /beak/audit for duck.cert_issued / duck.hatched events)
Last issued age: Waiting for issuance telemetry…
Issued in last 24h: Waiting for audit volume…
Latest cert timestamp: Using /beak/metrics when present, then audit fallback.

Mission Control API

Loading…

Request health for the operator-facing APIs used by this page.

Backend: GET /beak/system/status (latency + fields), GET /beak/metrics, POST /beak/audit — all three fetched each refresh cycle
Status route: /beak/system/status
Metrics route: /beak/metrics
Audit route: /beak/audit
Observed latency
Health

Governance Actions

Manual approval required

Production changes move through the governance lane: open a request ID, record the target version cycle, and capture approval before any alias change. Frozen surfaces still require T-JOSH sign-off. See GOVERNANCE-LOG.md for the full audit trail.

Alert Thresholds

Reference

Static operator thresholds for quick review in the control surface.

API error rate: alert if > 5%
Lambda memory: alert if > 100 MB
Cert age: alert if > 60 days
Peck failures: alert if > 10 in 24h
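The thresholds above could be checked against observed values as in the sketch below. The dictionary keys are hypothetical names chosen for illustration, and every comparison is strict, matching the "alert if >" wording.

```python
# Static thresholds from the card; key names are illustrative only.
THRESHOLDS = {
    "api_error_rate_pct": 5,
    "lambda_memory_mb": 100,
    "cert_age_days": 60,
    "peck_failures_24h": 10,
}


def breached(observed):
    """Return the sorted names of thresholds that the observed values exceed."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if name in observed and observed[name] > limit
    )
```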

Failure Budget

Within budget

24h SLA targets — Galaxy 1.1

Auth flow (signup/signin): 99.9% target · est. OK
Cert issuance: 99.5% target · est. OK
Peck approval flow: 99.0% target · est. OK
SMS delivery: sandbox only · limited

Manual review — update after incidents

CloudWatch Alarms

No alarms

Configured alarm thresholds (checked manually via AWS Console)

Lambda errors > 5%: OK
Lambda duration > 5s: OK
API Gateway 5xx > 1%: OK
DynamoDB throttle: OK

Last verified: 2026-03-21 · Update manually after incidents

Recent Deploys

Static snapshot

Last 5 Lambda deployments

2026-03-21 05:40 UTC · v39 · prod
2026-03-21 05:33 UTC · v38 · prod
2026-03-21 05:31 UTC · v37 · prod
2026-03-21 05:24 UTC · v36 · prod
2026-03-21 05:18 UTC · v35 · prod

This card is a one-time static snapshot from coordination/DEPLOY-LOG.md and does not refresh on its own. Update it after each deploy.

Lambda Version History

Static snapshot

Last 5 prod promotions from coordination/DEPLOY-LOG.md.

v39 | 2026-03-21 05:40 UTC | DG-033 /beak/pageview route added
v38 | 2026-03-21 05:33 UTC | DC-038 lambda_alias_version + SD-034/035/038 prior batch merged
v37 | 2026-03-21 05:31 UTC | DC-038 lambda_alias_version in status response
v36 | 2026-03-21 05:24 UTC | SD-034 AgentMail cert email wired into /beak/cert/issue
v35 | 2026-03-21 05:18 UTC | birth certificate issuance now captures ip_address, user_agent, and country metadata at issuance time; prod alias promoted to v35

Auth Dependencies

Loading…

Mixed live + operator-confirmed auth provider posture.

Cognito
Turnstile: ✓ Production keys active
AgentMail: ✓ Inbox configured
SES

Surface Audit

Documented

DC-019 audit: nothing below is decorative.

Live sections: Metrics strip, hero summary, Auth Status, Runtime State, Agent Fleet, Database Posture, Pending Peck Requests, Mission Control API, Recent Events.
Static sections: Communication Status and Webhook Health are explicit operator-truth tiles until dedicated API fields exist. Quick links are utility navigation, not health tiles.
Removed: Decorative/dead-weight tiles from the previous layout were replaced by real data surfaces or explicit static operator state.

Recent Events

Loading…

Last five audit entries from POST /beak/audit.

Backend: POST /beak/audit (body: {}) · Fields: entries[].event_type, entries[].timestamp
Loading recent events…

Route Health

Checking

Live probe of key API routes

Backend: direct HTTP probes — GET /beak, GET /beak/metrics, GET /beak/system/status — independent of main refresh cycle
GET /beak
GET /beak/metrics
GET /beak/system/status
Last checked: —

Recent Activity

Loading…

Last five entries from GET /beak/system/status recent_audit when present.

Loading recent activity…

Cert Inventory

Loading…

Duckling and certificate state distribution computed from live database counts.

Backend: GET /beak/system/status · Fields: database.eggs, database.ducklings, database.birth_certificates, agents.total_bonded
Certs issued
Ducklings pending cert
Unverified eggs
Bonded (fully certified)
Completion rate: certs / ducklings
Pending backlog: ducklings awaiting cert
Pipeline yield: eggs → bonded ratio
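The three derived figures above could be computed from the status fields roughly as follows. Treating the backlog as ducklings minus certificates (floored at zero) is an assumption, as is the function name.

```python
def ratio(numerator, denominator):
    """Ratio rounded to three decimals, or None when the denominator is zero."""
    return None if denominator == 0 else round(numerator / denominator, 3)


def cert_inventory(status):
    db = status["database"]
    eggs = db["eggs"]
    ducklings = db["ducklings"]
    certs = db["birth_certificates"]
    bonded = status["agents"]["total_bonded"]
    return {
        "completion_rate": ratio(certs, ducklings),      # certs / ducklings
        "pending_backlog": max(ducklings - certs, 0),    # ducklings awaiting cert (assumed)
        "pipeline_yield": ratio(bonded, eggs),           # eggs -> bonded ratio
    }
```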

Operator Acknowledgements

0 / 4 acknowledged

Pre-shift checklist — tick each item before making production changes. State persists across refresh with UTC acknowledgement stamps until reset.

Confirmed current deploy matches expected version and behaviour. Acknowledged at: —
Previous known-good alias version identified and rollback command documented. Acknowledged at: —
CloudWatch alarms checked; no unexpected alerts present. Acknowledged at: —
Database table counts and cert pipeline values look expected. Acknowledged at: —

Persisted in localStorage only. For long-term records, log to GOVERNANCE-LOG.md.

Operator Notes

Shift Handoff

Standing shift context and next priorities. Source of truth: coordination/OPERATOR-NOTES.md.

Current Shift Notes
Galaxy 1.1 Beta is live. SES and SNS remain in sandbox mode — production exit requests are pending.
Lambda prod alias is at v30 ($LATEST drift expected; check drift alarm on load).
Cert pipeline: eggs → ducklings → certs → bonded. Monitor Cert Issuance Latency tile for any backlog.
Peck flow is the primary audit event source. Signups and certs do not yet emit audit events.
CloudWatch alarm review required before any production alias promotion.
Standing Operator Actions
1. SES: Request production access via AWS SES console → Sending Statistics → Request Production Access.
2. SNS: Request SMS sandbox exit via AWS SNS console → SMS Sandbox → Request exit.
3. After each deploy: update coordination/DEPLOY-LOG.md and record version + alias promotion.
4. After incidents: log to coordination/GOVERNANCE-LOG.md with request ID + approver.
Next Priorities
Wire signup/cert audit events in Lambda so the Audit Activity strip shows real counts.
Expose agents.stale_list in /beak/system/status for the Stale Agent Spotlight card.
Expose last_cert_issued_at in /beak/metrics to enable real cert latency display.
Source: coordination/OPERATOR-NOTES.md · Last updated: 2026-03-21 UTC · Batch DC-055–DC-057

Operator Shift Log

Ready

Local shift notes for handoff continuity. Saves the latest three UTC-stamped notes to your browser only.

Newest first · UTC timestamps

Last saved: never

Stale Agent Spotlight

Loading…

Dead or never-pulsed agent summary from GET /beak/system/status. Individual agent records are shown when the API exposes them; otherwise summary counts are rendered.

Backend: GET /beak/system/status · Fields: agents.dead, agents.alive, agents.slow, agents.total_bonded, agents.stale_list (when available)
Loading agent health…

⌨️ Operator Command Palette

One-click copyable diagnostics for rapid operator investigation — curl and aws CLI commands pre-filled for prod.

Commands pre-loaded · Click Copy to grab

📊 Metrics Delta Strip

Waiting…

Change since last refresh — green for growth, red for drops, grey for no change.

Last checked: —
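A minimal sketch of the delta colouring, assuming each refresh yields a flat mapping of metric name to number. Metrics with no previous value are skipped, which is an assumption about first-load behaviour.

```python
def metric_deltas(previous, current):
    """Map each metric to (delta, colour) relative to the previous refresh."""
    result = {}
    for name, new_value in current.items():
        old_value = previous.get(name)
        if old_value is None:
            continue  # nothing to compare against on first sight
        delta = new_value - old_value
        if delta > 0:
            colour = "green"
        elif delta < 0:
            colour = "red"
        else:
            colour = "grey"
        result[name] = (delta, colour)
    return result
```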

⚠️ Degraded-Service Impact Matrix

Loading…

Translates live sandbox state, dead-agent count, and peck failures into plain-English operator risk.

Last checked: —

🔗 Connection Pressure

Loading…

Connections per bonded agent, peck pending vs failed ratio, and overload thresholds.

Last checked: —

🔄 What Changed Since Last Refresh

Metrics that moved on the latest live poll with old → new values.

Waiting for first refresh comparison…

📋 Route Failure Journal

Last route probe failures persisted in localStorage — UTC stamp, endpoint, status code.

No route failures recorded yet.

🕐 Incident Timeline

Chronological handoff view combining deploy-log tail, recent audit events, and operator notes.

📦 Incident Handoff Pack v2

Exports diagnostics JSON, anomaly summary, operator notes, route health, and parity state as a downloadable Markdown + JSON pair.

🚨 Operator Drill Checklist

Incident triage, alias verification, route probe review, and rollback preparation steps.

Step 1 — Confirm live status: Run curl .../beak/system/status and verify that lambda.version matches the prod alias via aws lambda get-alias.
Step 2 — Check route health: Use the Route Latency Strip. Any route > 1000 ms or returning an error indicates degradation. Check the Route Failure Journal for recent failures.
Step 3 — Review anomaly summary: Check the Anomaly Banner and Impact Matrix. Classify as Degraded / Partial Outage / Full Outage.
Step 4 — Rollback readiness: Check the Deploy Readiness Checklist. If a rollback is needed, list versions with aws lambda list-versions-by-function, then use the Promote Alias command from the palette.
Step 5 — Handoff: Export Incident Handoff Pack v2 (Markdown + JSON) for shift handoff or the incident ticket. Save a Shift Log note with the current state.

📈 Route Latency History

Last few probe latencies per route — movement over time for degradation detection.

No latency history yet — waiting for route probes.

🕵️ Stale Data Detector

Checking…

Cards that have not refreshed within the last 120 seconds are flagged here so operators don't mistake cached UI for live truth.

Waiting for first refresh…
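The staleness check could look like the sketch below. It assumes each card records its last successful refresh as epoch seconds, or None if it has never refreshed. The card names are illustrative.

```python
STALE_AFTER_SECONDS = 120  # threshold stated on the card


def stale_cards(last_refreshed, now):
    """Return sorted names of cards whose last refresh is older than the threshold."""
    return sorted(
        card
        for card, refreshed_at in last_refreshed.items()
        if refreshed_at is None or now - refreshed_at > STALE_AFTER_SECONDS
    )
```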

🚧 Operational Blockers

Loading…

Precise operational constraints for SES, SNS, cert pipeline, and GitHub — not vague waiting states.

Last checked: —

🔄 Board / Runtime Reconciliation

Compare board task state against live runtime truth — exports a compact Markdown handoff for operator review.

Cold-start trend

Last 24h cold-starts

SES Exit Readiness

Checklist for leaving AWS SES sandbox

—/6
Open AWS SES Console →

SES sandbox exit is pending AWS review

DynamoDB Tables

Table health sourced from sd_dynamo_status localStorage key.

DB Write Activity

Last 10 audit log entries from sd_audit_cache localStorage. Auto-refreshes every 30s.

No recent DB activity

Lambda Memory

Local log

Estimated MB per invocation — last 10 data points from this session.

Latest: — MB ⚠ Amber threshold: 100 MB
Sourced from localStorage missionControlMemoryLog

Lambda Timeout Risk

Estimating…

Estimated p95 execution time from sd_latency_log vs the 30s Lambda limit.

p95 latency
Scale: 0s · 20s · 25s · 30s limit
Timeout risk: LOW — No latency data yet
Sourced from localStorage sd_latency_log · 30s Lambda limit
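One way to estimate the p95 and map it to a risk level is sketched here. It uses the nearest-rank percentile method and treats the 20s and 25s marks on the scale as band boundaries. Both choices, and the MEDIUM / HIGH labels, are assumptions. An empty log yields LOW, matching the card's default.

```python
LAMBDA_LIMIT_S = 30  # Lambda limit named on the card


def p95(samples_s):
    """Nearest-rank 95th percentile of latency samples in seconds."""
    if not samples_s:
        return None
    ordered = sorted(samples_s)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def timeout_risk(samples_s):
    """Map the p95 latency to a risk band against the 30s limit."""
    value = p95(samples_s)
    if value is None or value < 20:
        return "LOW"
    if value < 25:
        return "MEDIUM"
    return "HIGH"  # within 5s of, or beyond, LAMBDA_LIMIT_S
```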

Pipeline Yield

Loading…

Certified / total pipeline entries — from /beak/system/status.

— certified / — total · Trend: —
Snapshot: —

Peck Approval Timeline

Loading…

Last 5 peck outcomes (approved / denied / expired) from POST /beak/audit with UTC timestamps and outcome reason.

Loading peck outcomes…
Last checked: —
View full peck history →
Backend: POST /beak/audit (body: {}) · Filters: duck.peck_approved, duck.peck_denied, duck.peck_expired

Runtime Anomaly History

Last 10 detected anomalies (alias drift, dead fleet, peck failures) logged on each page load. Stored in missionControlAnomalyLog localStorage.

No anomalies logged yet.
Source: derived from live status + metrics each refresh cycle · key: missionControlAnomalyLog

Operator Todos

0 / 0 done

Persistent checklist for this operator session. Stored in localStorage.

Loading…

Recent Emails

Last 5 cert issuance and signup emails sent via SES. Email addresses are masked for privacy.

View full audit →