AI Agents

Give Your AI Agent Eyes

Vision models often understand a rendered page better than they understand its scraped text. One API call turns any URL into an image your agent can reason about.

Agents that work with the web keep hitting the same wall: HTML is noisy, semantic structure lies, and the thing that matters — what the page looks like — never reaches the model. Passing a rendered screenshot to a vision model (Claude, GPT-4o, Gemini) routinely beats text extraction for layout questions, design review, content verification, and 'is this page broken?' checks.

Screenshotty is agent-friendly by design: a single HTTP call, JSON responses with hosted image URLs, ad/banner blocking for clean inputs — and an MCP server so coding agents like Claude Code and Cursor can capture pages as a native tool.

Screenshot → vision model

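import base64

import anthropic
import requests

# Capture the page. This assumes the response body is the raw PNG image.
resp = requests.get(
    "https://api.screenshotty.link/api/v1/screenshot",
    params={"url": "https://example.com", "adblock": True},
    headers={"X-Api-Key": "YOUR_API_KEY"},
    timeout=60,
)
resp.raise_for_status()  # stop here rather than send an error body to the model
shot = resp.content

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            # The screenshot goes in as a base64-encoded image block.
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/png",
                "data": base64.b64encode(shot).decode()}},
            {"type": "text", "text": "Review this landing page's hierarchy and CTA clarity."},
        ],
    }],
)
print(message.content[0].text)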
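
The example above hard-codes image/png, so a JPEG capture or an error body would reach the model mislabeled. A minimal sketch of a guard that checks the leading bytes before building the image block follows. The helper name image_block is hypothetical and is not part of Screenshotty or the Anthropic SDK.

```python
import base64

# Magic-byte prefixes for two image types that vision APIs commonly accept.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def image_block(data: bytes) -> dict:
    """Build a base64 image content block, or raise if `data` is not an image."""
    for magic, media_type in _SIGNATURES.items():
        if data.startswith(magic):
            return {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": base64.b64encode(data).decode("ascii"),
                },
            }
    raise ValueError("response body is not a PNG or JPEG image")
```

Calling image_block(shot) in place of the inline dictionary keeps the media type honest and surfaces a bad capture before any tokens are spent.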

Everything You Need, Built In

MCP server

Install the Screenshotty MCP server and your agent gets take_screenshot as a tool — see the integration guide.

Vision-ready output

Clean captures (ads and consent banners stripped) make vision-model inputs less noisy.

Deterministic and fast

ready_event control means the agent sees the loaded page, not a loading spinner.
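
As a rough illustration, the request URL for a capture that waits on a ready event might be built as below. This is a sketch under assumptions: the source names ready_event but does not say how it is passed or which values it accepts, so the query-parameter form, the lowercase boolean, and the "load" value are all placeholders to check against the API reference.

```python
from urllib.parse import urlencode

API_URL = "https://api.screenshotty.link/api/v1/screenshot"


def capture_url(page_url: str, ready_event: str, adblock: bool = True) -> str:
    """Return the full GET URL for one capture request."""
    query = {
        "url": page_url,
        # Assumption: booleans are sent as lowercase "true"/"false".
        "adblock": "true" if adblock else "false",
        # Assumption: ready_event is passed as a query parameter.
        "ready_event": ready_event,
    }
    return f"{API_URL}?{urlencode(query)}"


# "load" is a placeholder value, not a documented option.
print(capture_url("https://example.com", ready_event="load"))
```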

Scales with the loop

Agents iterate; pay-as-you-go overage means an enthusiastic agent never hard-stops mid-task.

Frequently Asked Questions

How do I connect this to Claude Code or Cursor?

Install the Screenshotty MCP server (see /integrations/mcp) with your API key in the environment. The agent then has a take_screenshot tool it can call with a URL and options — no glue code.

Why not have the agent run its own headless browser?

Browser automation inside an agent loop is slow, memory-heavy, and flaky — and a misbehaving page can wedge the whole loop. A capture API keeps the agent's tool call simple, fast, and stateless.
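
For agent loops that do not use the MCP server, the same stateless shape can be sketched by hand. The tool definition below follows the name / description / input_schema layout used by the Anthropic Messages API; the schema fields and the handle_tool_call helper are illustrative assumptions, not the MCP server's actual definition.

```python
from typing import Callable

# Hand-written tool definition; the name mirrors the MCP tool for familiarity.
TAKE_SCREENSHOT_TOOL = {
    "name": "take_screenshot",
    "description": "Capture a rendered screenshot of a public web page.",
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Page to capture."},
        },
        "required": ["url"],
    },
}


def handle_tool_call(
    name: str, tool_input: dict, capture: Callable[[str], bytes]
) -> bytes:
    """Run one tool call. Nothing is kept between calls: no browser, no session."""
    if name != TAKE_SCREENSHOT_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    url = tool_input["url"]
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"refusing non-HTTP URL: {url}")
    return capture(url)
```

Here capture is whatever function performs the HTTP request. Injecting it keeps the handler easy to test with a stub and leaves retries and timeouts in one place.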

Is there a free tier to try this?

Yes. 100 screenshots per month free, no credit card required. Paid plans start at $9/month for 2,500 screenshots with $0.004 pay-as-you-go overage.
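
To budget for an agent loop, the figures above can be turned into a quick estimate. This sketch assumes the $0.004 overage is charged per screenshot beyond the 2,500 included, and that any usage over the free quota is billed on the $9 plan; neither detail is spelled out here, so treat the result as an approximation.

```python
from decimal import Decimal

FREE_QUOTA = 100                 # screenshots per month on the free tier
PLAN_PRICE = Decimal("9")        # USD per month for the entry paid plan
PLAN_QUOTA = 2500                # screenshots included in that plan
OVERAGE_RATE = Decimal("0.004")  # USD per extra screenshot (assumed unit)


def monthly_cost(screenshots: int) -> Decimal:
    """Estimated monthly bill in USD for a given capture volume."""
    if screenshots <= FREE_QUOTA:
        return Decimal("0")
    extra = max(0, screenshots - PLAN_QUOTA)
    return PLAN_PRICE + OVERAGE_RATE * extra


print(monthly_cost(4000))  # 9 + 1500 * 0.004
```

Under these assumptions, 4,000 captures in a month come to $9 plus 1,500 × $0.004, or $15 in total.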

Image or markdown — which input should I give the model?

Both have a place: screenshots for layout/visual questions, markdown extraction for long-text reasoning. Many pipelines send both — our text-extractor tool shows the markdown side.
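
A pipeline that sends both inputs can put them in a single user message. The sketch below builds that message body in the Anthropic content-block format; page_content is a hypothetical helper, and the markdown is assumed to have been extracted already by whatever tool you use.

```python
import base64


def page_content(png: bytes, markdown: str, question: str) -> list[dict]:
    """Combine a screenshot and extracted markdown into one user message body."""
    return [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.b64encode(png).decode("ascii"),
            },
        },
        # Label the extracted text so the model can tell it from the question.
        {"type": "text", "text": f"Extracted page text (markdown):\n\n{markdown}"},
        {"type": "text", "text": question},
    ]
```

The returned list can be passed as the "content" value of a user message, so the model sees the layout and the full text side by side.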

Start Capturing in Minutes

100 free screenshots per month. No credit card required.
