From 1ccf3ce7278335cc534572adf92672eee69f258b Mon Sep 17 00:00:00 2001
From: Pulse Agent
Date: Wed, 20 May 2026 11:00:36 -0300
Subject: [PATCH] docs(pulse-docs): add Docker Swarm runbook + recovery commands + session & docker checklists + state snapshot

---
 checklists/docker/DOCKER-CHECKLIST.md   |  5 ++++
 checklists/session/SESSION-CHECKLIST.md | 13 +++++++++
 runbook/DOCKER-SWARM-RUNBOOK.md         | 37 +++++++++++++++++++++++++
 runbook/RECOVERY-COMMANDS.md            | 20 +++++++++++++
 4 files changed, 75 insertions(+)
 create mode 100644 checklists/docker/DOCKER-CHECKLIST.md
 create mode 100644 checklists/session/SESSION-CHECKLIST.md
 create mode 100644 runbook/DOCKER-SWARM-RUNBOOK.md
 create mode 100644 runbook/RECOVERY-COMMANDS.md

diff --git a/checklists/docker/DOCKER-CHECKLIST.md b/checklists/docker/DOCKER-CHECKLIST.md
new file mode 100644
index 0000000..2be7073
--- /dev/null
+++ b/checklists/docker/DOCKER-CHECKLIST.md
@@ -0,0 +1,5 @@
+# Docker Checklist — Pulse Agent
+
+- [ ] `docker ps` — verify that ~19 containers are running
+- [ ] `docker ps -a -f status=exited --format '{{.Names}}'` — clean up orphans
+- [ ] `docker stack ps --no-trunc ` — tasks per stack
diff --git a/checklists/session/SESSION-CHECKLIST.md b/checklists/session/SESSION-CHECKLIST.md
new file mode 100644
index 0000000..9fe1775
--- /dev/null
+++ b/checklists/session/SESSION-CHECKLIST.md
@@ -0,0 +1,13 @@
+# Session Checklist — Pulse Agent Auto-Check
+
+## Start
+- [ ] Read MEMORY.md
+- [ ] Read SESSION-STATE.md
+- [ ] Read LEARNINGS.md | ERRORS.md | PATTERN_COUNTER.md
+- [ ] docker ps — services
+- [ ] df -h — disk
+- [ ] uptime — load
+
+## End
+- [ ] Update memory/.md
+- [ ] Read .learnings/LEARNINGS.md
diff --git a/runbook/DOCKER-SWARM-RUNBOOK.md b/runbook/DOCKER-SWARM-RUNBOOK.md
new file mode 100644
index 0000000..fda76ed
--- /dev/null
+++ b/runbook/DOCKER-SWARM-RUNBOOK.md
@@ -0,0 +1,37 @@
+# Docker Swarm Runbook — Pulse Agent
+
+_Updated: 2026-05-20 | Owner: Pulse Agent_
+
+## Services by Stack
+
+| Stack | Services | Status |
+|-------|----------|--------|
+| bot | beebot | 🟢 |
+| code | file (8dcode) | 🟢 |
+| database | mongos-master, dbadmin | 🟡 degraded |
+| design | penpot (7 containers) | 🟢 |
+| dock | portainer, agent | 🟡 |
+| git | gitea | 🟢 |
+| pro | leantime, leantime-db | 🟡 |
+| proxy | caddy (80/443) | 🟢 |
+
+## Quick recovery
+
+```bash
+# Detailed status
+docker stack ps --no-trunc --no-resolve
+
+# Force a restart
+docker service update --force _
+
+# Scale up, then back down (forces a new replica)
+docker service scale _=2
+docker service scale _=1
+
+# Clean up orphans (removes every exited container)
+docker ps -a -f 'status=exited' --format '{{.Names}}' | xargs -r docker rm -f
+```
+
+## Health check coverage
+- 3/19 containers have a health check defined
+- TODO: add one for bot_office, gitea, pro_leantime-db, and every service in the design stack
diff --git a/runbook/RECOVERY-COMMANDS.md b/runbook/RECOVERY-COMMANDS.md
new file mode 100644
index 0000000..3a2570f
--- /dev/null
+++ b/runbook/RECOVERY-COMMANDS.md
@@ -0,0 +1,20 @@
+# Recovery Commands — Docker Swarm
+
+## Emergency
+```bash
+docker node ls                  # check node health
+docker stack rm && sleep 3      # remove the problematic stack
+docker stack deploy -c .yml     # redeploy
+```
+
+## Specific service
+```bash
+docker service ps _                # list tasks
+docker service update --force _    # force a new task
+```
+
+## Health check manual
+```bash
+docker inspect --format '{{.State.Health.Status}}'
+# → healthy | unhealthy | starting |
+```
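
Review note: the runbook's "Health check coverage" figure (3/19) appears to be counted by hand. A minimal sketch of how it could be recomputed is shown here; the helper name `health_coverage` and the `none` marker are hypothetical and not part of this patch. It assumes one `name status` pair per line on stdin, with `none` standing in for containers that define no health check.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this patch): summarize health check coverage.
# Input: one "name status" pair per line; status "none" means no health check.
health_coverage() {
  awk '
    { total++ }
    $2 != "none" { covered++ }
    $2 == "none" { missing = missing " " $1 }
    END {
      printf "%d/%d containers with a health check defined\n", covered, total
      if (missing != "") printf "missing:%s\n", missing
    }
  '
}

# Possible usage on a node (assumes the docker CLI is available):
# docker ps -q | xargs -r docker inspect \
#   --format '{{.Name}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
#   | health_coverage
```

Keeping the parsing separate from the `docker inspect` call means the summary can be checked against sample input without a running Swarm.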