Overview
Browser Agents let you automate real browsers using Playwright. Your action receives aBrowser, BrowserContext, and Page along with a high-level agent that can:
- Execute natural-language instructions with
agent.act(...) - Extract structured data with
agent.extract(label, schema)using Zod - Escalate to a human with the
human:request-helpaction
Anatomy of a Browser Agent action
Define a Zod schema for inputs, then export a default async function. You getbrowser, context, page, variables, agent, and an uploadFile helper.
Acting: agent.act
- Purpose: Execute multi-step, natural-language instructions that the agent translates into concrete Playwright interactions (clicks, typing, navigation).
- Input: A single instruction string. Be explicit about goals, key UI elements, and success conditions.
- Behavior: The agent plans and performs actions on the provided
page, emitting events as it goes.
- State end conditions (e.g., “until a price is visible”).
- Name critical UI elements if known (labels, roles, placeholders).
- Prefer short, sequential instructions over one very long paragraph.
Extracting: agent.extract
- Purpose: Read information from the current page state and return it as strongly-typed data.
- Input: A label (string) and a Zod schema describing the expected shape.
- Output: A value validated by Zod (throws if it cannot be validated).
- Define precise schemas – the stricter the schema, the more reliable the extraction.
- When ambiguous, add context in the label (e.g., “price in USD before tax”).
Human handoff: human:request-help
When automation can’t proceed (CAPTCHA, MFA, ambiguous choices) or requires approval, ask for help. Use the built-in action namehuman:request-help in your instruction.
- The agent creates a help request attached to the current run.
- The run is paused and visible in the dashboard for a human to review and resolve.
- Once resolved, the run resumes and your action continues.
- You can resolve a help request via API if you’re building your own reviewer tooling.
- Endpoint:
PATCH /api/v2/browser-agents/:slug/runs/:browserAgentRunId/help-requests/:helpRequestId/resolution - Body:
{ "resolution": "<the instruction or decision>" }
Uploading files from an agent
Use the injecteduploadFile helper when you need to attach artifacts (screenshots, downloads, CSVs) to a run.
Best practices
- Be explicit in
actinstructions about targets and success criteria. - Use
extractwith strict Zod schemas to validate results. - Escalate early with
human:request-helpwhen blocked or when approvals are required. - Return small, structured objects from your action; avoid large blobs unless uploaded via
uploadFile. - Keep actions idempotent where possible so retries are safe.