feat: skill expansion — browser, security, SQL, files (16 skills total)
Novas skills instaladas: - openclaw-agent-browser v1.0.0 CLI Chromium — navegação, login, screenshots, state - skill-security-audit v1.0.0 SAST scanning, prompt injection, secrets audit - sql-toolkit v1.0.0 PostgreSQL/MySQL/SQLite — schema, query, otimização - file v1.0.0 Organização de arquivos por contexto - file-summary v1.0.0 Extração e resumo de PDFs, Word, Excel Workspace expandido: - TOOLS.md: +Browser automation, Security audit, SQL, File management - AGENTS.md: +Linux Analyst section (comandos, logs, rede, scripts) + Full-stack strategy - MEMORY.md: 16 skills indexadas, stack map, comandos Linux ref - SESSION-STATE.md: atualizado com contexto completo - lock.json: sincronizado com 16 skills instaladas
This commit is contained in:
@@ -0,0 +1,144 @@
|
||||
---
|
||||
name: agent-browser
|
||||
description: Headless browser automation CLI for AI agents. Use when interacting with websites — navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, scraping, testing web apps, downloading files, or automating any browser task. Triggers on requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data", "test this web app", "login to a site", "monitor a page", or any task requiring programmatic web interaction.
|
||||
---
|
||||
|
||||
# Browser Automation with agent-browser
|
||||
|
||||
## Setup
|
||||
|
||||
Run `scripts/setup.sh` to install agent-browser and Chromium. Requires Node.js.
|
||||
|
||||
## Core Workflow
|
||||
|
||||
Every browser automation follows this pattern:
|
||||
|
||||
1. **Navigate**: `agent-browser open <url>`
|
||||
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
|
||||
3. **Interact**: Use refs to click, fill, select
|
||||
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
|
||||
|
||||
```bash
|
||||
agent-browser open https://example.com/form
|
||||
agent-browser snapshot -i
|
||||
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
|
||||
|
||||
agent-browser fill @e1 "user@example.com"
|
||||
agent-browser fill @e2 "password123"
|
||||
agent-browser click @e3
|
||||
agent-browser wait --load networkidle
|
||||
agent-browser snapshot -i # Check result
|
||||
```
|
||||
|
||||
## Command Chaining
|
||||
|
||||
Chain with `&&` when you don't need intermediate output:
|
||||
|
||||
```bash
|
||||
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
|
||||
```
|
||||
|
||||
Run separately when you need to parse output first (e.g., snapshot to discover refs).
|
||||
|
||||
## Essential Commands
|
||||
|
||||
```bash
|
||||
# Navigate
|
||||
agent-browser open <url>
|
||||
agent-browser close
|
||||
|
||||
# See the page (always do this first)
|
||||
agent-browser snapshot -i # Interactive elements with refs
|
||||
agent-browser snapshot -i -C # Include onclick divs
|
||||
|
||||
# Interact using @refs
|
||||
agent-browser click @e1
|
||||
agent-browser fill @e2 "text"
|
||||
agent-browser select @e1 "option"
|
||||
agent-browser press Enter
|
||||
agent-browser scroll down 500
|
||||
|
||||
# Get info
|
||||
agent-browser get text @e1
|
||||
agent-browser get url
|
||||
agent-browser get title
|
||||
|
||||
# Wait
|
||||
agent-browser wait @e1 # For element
|
||||
agent-browser wait --load networkidle # For network idle
|
||||
|
||||
# Capture
|
||||
agent-browser screenshot page.png
|
||||
agent-browser screenshot --full # Full page
|
||||
agent-browser pdf output.pdf
|
||||
```
|
||||
|
||||
For the full command reference, see `references/commands.md`.
|
||||
|
||||
## Ref Lifecycle (Important)
|
||||
|
||||
Refs (`@e1`, `@e2`) are invalidated when the page changes. Always re-snapshot after:
|
||||
- Clicking links/buttons that navigate
|
||||
- Form submissions
|
||||
- Dynamic content loading (dropdowns, modals)
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Form Submission
|
||||
```bash
|
||||
agent-browser open https://example.com/signup
|
||||
agent-browser snapshot -i
|
||||
agent-browser fill @e1 "Jane Doe"
|
||||
agent-browser fill @e2 "jane@example.com"
|
||||
agent-browser select @e3 "California"
|
||||
agent-browser click @e5
|
||||
agent-browser wait --load networkidle
|
||||
```
|
||||
|
||||
### Login with State Persistence
|
||||
```bash
|
||||
agent-browser open https://app.example.com/login
|
||||
agent-browser snapshot -i
|
||||
agent-browser fill @e1 "$USERNAME" && agent-browser fill @e2 "$PASSWORD"
|
||||
agent-browser click @e3
|
||||
agent-browser wait --url "**/dashboard"
|
||||
agent-browser state save auth.json
|
||||
|
||||
# Reuse later
|
||||
agent-browser state load auth.json
|
||||
agent-browser open https://app.example.com/dashboard
|
||||
```
|
||||
|
||||
### Data Extraction
|
||||
```bash
|
||||
agent-browser open https://example.com/products
|
||||
agent-browser snapshot -i
|
||||
agent-browser get text @e5
|
||||
agent-browser get text body > page.txt
|
||||
```
|
||||
|
||||
### Screenshot & Diff
|
||||
```bash
|
||||
agent-browser screenshot baseline.png
|
||||
# ... changes happen ...
|
||||
agent-browser diff screenshot --baseline baseline.png
|
||||
```
|
||||
|
||||
### Parallel Sessions
|
||||
```bash
|
||||
agent-browser --session site1 open https://site-a.com
|
||||
agent-browser --session site2 open https://site-b.com
|
||||
agent-browser session list
|
||||
```
|
||||
|
||||
## Security (Optional)
|
||||
|
||||
```bash
|
||||
export AGENT_BROWSER_CONTENT_BOUNDARIES=1 # Wrap output for AI safety
|
||||
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com" # Domain allowlist
|
||||
export AGENT_BROWSER_MAX_OUTPUT=50000 # Prevent context flooding
|
||||
```
|
||||
|
||||
## Cleanup
|
||||
|
||||
Always close sessions when done: `agent-browser close`
|
||||
Reference in New Issue
Block a user