Skip to main content

Overview

Browser Agents let you automate real browsers using Playwright. Your action receives a Browser, BrowserContext, and Page along with a high-level agent that can:
  • Execute natural-language instructions with agent.act(...)
  • Extract structured data with agent.extract(label, schema) using Zod
  • Escalate to a human with the human:request-help action
This makes it easy to script multi-step interactions, read the page, and hand off when blocked or when approval is required.

Anatomy of a Browser Agent action

Define a Zod schema for inputs, then export a default async function. You get browser, context, page, variables, agent, and an uploadFile helper.
import type { Browser, BrowserContext, Page } from "playwright";
import { z } from "zod";
import type { BrowserAgent } from "magnitude-core";

export const variablesSchema = z.object({
  url: z.string().url(),
});

const action = async ({
  browser,
  page,
  context,
  variables,
  agent,
  uploadFile,
}: {
  browser: Browser;
  page: Page;
  context: BrowserContext;
  variables: z.infer<typeof variablesSchema>;
  agent: BrowserAgent;
  uploadFile: (args: {
    source:
      | { type: "buffer"; data: Buffer | Uint8Array | string }
      | { type: "path"; path: string };
    fileName?: string;
  }) => Promise<Record<string, unknown>>;
}) => {
  // Navigate (optional – you can also ask the agent to navigate)
  await page.goto(variables.url);

  // Instruct the agent to perform high-level actions
  await agent.act(
    'Search for "The Great Gatsby" and open the first result',
  );

  // Extract a typed value from the current page
  const price = await agent.extract("price", z.number());

  // Optionally upload a file (buffer or path)
  // const uploaded = await uploadFile({
  //   source: { type: "path", path: "/tmp/screenshot.png" },
  //   fileName: "screenshot.png",
  // });

  return { price };
};

export default action;

Acting: agent.act

  • Purpose: Execute multi-step, natural-language instructions that the agent translates into concrete Playwright interactions (clicks, typing, navigation).
  • Input: A single instruction string. Be explicit about goals, key UI elements, and success conditions.
  • Behavior: The agent plans and performs actions on the provided page, emitting events as it goes.
Examples:
await agent.act("Go to amazon.com, search for 'wireless mouse', and open the first result");
await agent.act("Log in using the credentials in the header menu, then verify the dashboard is visible");
Tips:
  • State end conditions (e.g., “until a price is visible”).
  • Name critical UI elements if known (labels, roles, placeholders).
  • Prefer short, sequential instructions over one very long paragraph.

Extracting: agent.extract

  • Purpose: Read information from the current page state and return it as strongly-typed data.
  • Input: A label (string) and a Zod schema describing the expected shape.
  • Output: A value validated by Zod (throws if it cannot be validated).
Examples:
// Single value
const price = await agent.extract("price", z.number());

// Structured object
const product = await agent.extract(
  "product",
  z.object({
    title: z.string(),
    price: z.number(),
    availability: z.string().optional(),
  }),
);
Tips:
  • Define precise schemas – the stricter the schema, the more reliable the extraction.
  • When ambiguous, add context in the label (e.g., “price in USD before tax”).

Human handoff: human:request-help

When automation can’t proceed (CAPTCHA, MFA, ambiguous choices) or requires approval, ask for help. Use the built-in action name human:request-help in your instruction.
await agent.act(
  'Use the human:request-help action to ask: "Which edition should I select? Hardcover, Paperback, or Kindle?"',
);
What happens:
  • The agent creates a help request attached to the current run.
  • The run is paused and visible in the dashboard for a human to review and resolve.
  • Once resolved, the run resumes and your action continues.
Programmatic resolution (optional):
  • You can resolve a help request via API if you’re building your own reviewer tooling.
  • Endpoint: PATCH /api/v2/browser-agents/:slug/runs/:browserAgentRunId/help-requests/:helpRequestId/resolution
  • Body: { "resolution": "<the instruction or decision>" }

Uploading files from an agent

Use the injected uploadFile helper when you need to attach artifacts (screenshots, downloads, CSVs) to a run.
// From a local path on the agent host
const file1 = await uploadFile({
  source: { type: "path", path: "/tmp/report.csv" },
  fileName: "report.csv",
});

// From an in-memory buffer (e.g., a generated PDF)
const file2 = await uploadFile({
  source: { type: "buffer", data: myBuffer },
  fileName: "invoice.pdf",
});
The return value includes identifiers and metadata you can persist or return from the action.

Best practices

  • Be explicit in act instructions about targets and success criteria.
  • Use extract with strict Zod schemas to validate results.
  • Escalate early with human:request-help when blocked or when approvals are required.
  • Return small, structured objects from your action; avoid large blobs unless uploaded via uploadFile.
  • Keep actions idempotent where possible so retries are safe.

Minimal example

import type { Browser, BrowserContext, Page } from "playwright";
import { z } from "zod";
import type { BrowserAgent } from "magnitude-core";

export const variablesSchema = z.object({
  url: z.string().url(),
});

const action = async ({ page, variables, agent }: {
  page: Page;
  variables: z.infer<typeof variablesSchema>;
  agent: BrowserAgent;
}) => {
  await page.goto(variables.url);
  await agent.act('Search for "The Great Gatsby" and open the first result');
  const price = await agent.extract("price", z.number());
  return { price };
};

export default action;
I