# Docker Swarm Runbook — Pulse Agent
_Updated: 2026-05-20 | Owner: Pulse Agent_
## 📋 Stack Inventory (8 active)
| Stack | Services | Status |
|-------|----------|--------|
| bot | beebot | 🟢 |
| code | file (8dcode) | 🟢 |
| database | mongos-master, dbadmin | 🟡 degraded |
| design | penpot (7 containers) | 🟢 |
| dock | portainer, agent | 🟡 |
| git | gitea | 🟢 |
| pro | leantime, leantime-db | 🟡 |
| proxy | caddy (80/443) | 🟢 |
## 🚨 Critical services and their risks
| Service | Risk | Recovery |
|---------|------|----------|
| `bot_office` | HIGH: OOM kill (exit 137), now UP but fragile | `docker service scale bot_office=2` |
| `database_mongos-master` | HIGH: 4 containers failed with exit 139 (SIGSEGV) | `docker service update --force database_mongos-master` |
| `pro_leantime` | HIGH: 4 containers unhealthy, exit 137 | `docker service update --force pro_leantime` |
| `dock_portainer` | MEDIUM: multiple Failed tasks | `docker service update --force dock_portainer` |
| `proxy_caddy` | MEDIUM: invalid mount path on old replicas | fix the mount in the compose file |
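
The exit codes in the table follow the usual shell convention that a status above 128 means "terminated by signal (status minus 128)". A minimal sketch for decoding them; `decode_exit` is a hypothetical helper name, not part of Docker:

```bash
# Decode a container exit status.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
decode_exit() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l <n> prints the signal name for number n, without the SIG prefix
    printf 'exit %s: killed by signal %s (SIG%s)\n' "$code" "$sig" "$(kill -l "$sig")"
  else
    printf 'exit %s: regular exit status\n' "$code"
  fi
}

decode_exit 137   # prints: exit 137: killed by signal 9 (SIGKILL)
decode_exit 139   # prints: exit 139: killed by signal 11 (SIGSEGV)
```

SIGKILL alone does not prove an OOM kill, since a stop timeout or a manual `docker kill` produces the same status. `docker inspect --format '{{.State.OOMKilled}}' <container>` reports whether the kernel OOM killer was involved.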
## 🔧 Quick recovery commands
```bash
# Detailed status
docker stack ps --no-trunc --no-resolve <stack>
# Force re-creation of the service's tasks
docker service update --force <stack>_<service>
# Scale up, then back down (forces a new replica)
docker service scale <stack>_<service>=2
docker service scale <stack>_<service>=1
# Clean up orphaned containers (-r: GNU xargs skips docker rm when the list is empty)
docker ps -a -f 'status=exited' --format '{{.Names}}' | xargs -r docker rm -f
docker ps -a -f 'status=dead' --format '{{.Names}}' | xargs -r docker rm -f
```
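
Before forcing an update, it can help to see which services are actually below their desired replica count. The sketch below is one possible way to do that; `degraded_services` is a hypothetical helper, and it assumes the `<running>/<desired>` shape that `docker service ls` prints in its Replicas column. The `demo_*` names are made-up sample input, not services from this swarm.

```bash
# Print the names of services whose running replica count is below the desired count.
# Input: one "<name> <running>/<desired>" line per service on stdin.
degraded_services() {
  awk '{
    split($2, r, "/")                    # r[1] = running, r[2] = desired
    if (r[1] + 0 < r[2] + 0) print $1    # "+ 0" forces a numeric comparison
  }'
}

# Against a live swarm (run on a manager node):
#   docker service ls --format '{{.Name}} {{.Replicas}}' | degraded_services

# Offline demonstration with made-up sample lines:
printf '%s\n' 'demo_web 1/1' 'demo_db 0/1' 'demo_cache 2/3' | degraded_services
# prints:
#   demo_db
#   demo_cache
```

The helper reads from stdin so the parsing can be checked without a Docker daemon; its output can then be reviewed by hand before running `docker service update --force` on each name.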
## 📊 Health check coverage
- **3/19** containers have a health check defined
- **TODO**: add health checks for bot_office, gitea, pro_leantime-db, and every service in the design stack
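
A coverage figure like the one above can be recomputed per node with a small helper. This is a sketch under assumptions: `healthcheck_coverage` is a hypothetical name, and it treats a container as covered when `.Config.Healthcheck` is set in `docker inspect`, which also counts a health check explicitly set to `NONE`. The `/demo_*` lines are made-up sample input.

```bash
# Summarise "<name> <yes|no>" lines on stdin as "<with health check>/<total>".
healthcheck_coverage() {
  awk '$2 == "yes" { covered++ } { total++ } END { printf "%d/%d\n", covered, total }'
}

# On each node (docker ps only lists that node's running containers):
#   docker ps -q | xargs -r docker inspect \
#     --format '{{.Name}} {{if .Config.Healthcheck}}yes{{else}}no{{end}}' \
#     | healthcheck_coverage

# Offline demonstration with made-up sample lines:
printf '%s\n' '/demo_a yes' '/demo_b no' '/demo_c no' | healthcheck_coverage
# prints: 1/3
```

For the TODO items, a health check can be declared in the stack's compose file or attached to a running service with `docker service update --health-cmd '<probe>' --health-interval 30s --health-retries 3 <stack>_<service>`, where the probe command depends on what each image ships.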