AI Crawler Tracking

Most AI and search crawlers do not execute JavaScript, so the browser tracking script cannot see them. Ship server-side request logs to Metrivo to get the full picture in the AI Crawlers dashboard.

Tracked crawlers

Metrivo classifies the following user agents. Items with any other User-Agent are dropped on ingest.

  • OAI-SearchBot
  • ChatGPT-User
  • GPTBot
  • Claude-SearchBot
  • Claude-User
  • ClaudeBot
  • PerplexityBot
  • Googlebot
  • Bingbot
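
Because unmatched items are dropped anyway, filtering on your side is optional, but it keeps batches small. The sketch below is one way to pre-filter. It assumes case-insensitive substring matching against the tokens listed above; Metrivo's actual classification rules are not documented here and may differ. The name match_tracked_crawler is hypothetical.

```python
from typing import Optional

# Tokens copied from the tracked-crawler list above.
TRACKED_TOKENS = (
    "OAI-SearchBot",
    "ChatGPT-User",
    "GPTBot",
    "Claude-SearchBot",
    "Claude-User",
    "ClaudeBot",
    "PerplexityBot",
    "Googlebot",
    "Bingbot",
)


def match_tracked_crawler(user_agent: str) -> Optional[str]:
    """Return the first tracked token found in the User-Agent, or None.

    Assumption: case-insensitive substring matching. The server-side
    classifier may apply different rules.
    """
    lowered = user_agent.lower()
    for token in TRACKED_TOKENS:
        if token.lower() in lowered:
            return token
    return None
```

Forward a hit only when the function returns a token. Always send the raw User-Agent header, not the matched token.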

1. Create an API key with crawlers:write scope

In Settings → API Keys, create a new key with the crawlers:write scope. If you can, scope the key to a single website to prevent accidental writes to other websites.

Each key is scoped to either a whole workspace or a single website, and that scope is checked when the key is validated. The endpoint rejects writes to any website outside the key's workspace.

2. POST request batches to /api/crawlers/log

Send up to 500 items per request. Each item is a single crawler hit.

curl -X POST https://www.metrivo.co/api/crawlers/log \
  -H "Authorization: Bearer <API_KEY_WITH_crawlers:write_SCOPE>" \
  -H "Content-Type: application/json" \
  -d '{
    "websiteId": "<WEBSITE_UUID>",
    "items": [
      {
        "userAgent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
        "path": "/pricing",
        "statusCode": 200,
        "accessState": "allowed",
        "occurredAt": "2026-05-21T14:32:18.000Z"
      }
    ]
  }'
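
If you forward more than 500 hits at a time, split them into batches before posting. The following is a minimal sketch using only the Python standard library. The endpoint URL, headers, and body shape come from the curl example; the helper names chunked and build_request are hypothetical, and websiteId is left out when no website ID is passed (valid only for website-scoped keys).

```python
import json
import urllib.request
from typing import Iterator, List, Optional

ENDPOINT = "https://www.metrivo.co/api/crawlers/log"  # from the curl example
MAX_ITEMS_PER_REQUEST = 500  # documented batch limit


def chunked(items: List[dict], size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_request(
    api_key: str, items: List[dict], website_id: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a single batch."""
    if len(items) > MAX_ITEMS_PER_REQUEST:
        raise ValueError("a batch may contain at most 500 items")
    payload = {"items": items}
    if website_id is not None:
        payload["websiteId"] = website_id
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send, loop over chunked(items) and pass each build_request(...) result to urllib.request.urlopen. urlopen raises urllib.error.HTTPError on 4xx and 5xx responses, so wrap the call if you want to retry or log failures.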

Required fields per item

  • userAgent — raw User-Agent header (max 2048 chars). Items not matching the tracked-bot list are silently dropped.

Optional fields

  • path — pathname only. If you send url instead, Metrivo extracts the path.
  • statusCode — HTTP status code returned to the crawler. If omitted or null, the dashboard shows "Unknown".
  • accessState — "allowed" or "blocked". Set "blocked" if your origin returned 403 or your WAF blocked the crawler. If you do not know the access state, omit the field; Metrivo shows "Unknown" rather than guessing.
  • occurredAt — ISO-8601 timestamp. Defaults to ingestion time.
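
The field rules above can be sketched as a small builder that sends only what you actually know. This is an illustration, not an official client: make_item is a hypothetical helper, and rejecting an over-long User-Agent (rather than truncating it) is an assumption, since the server's behavior past 2048 characters is not described here.

```python
from datetime import datetime, timezone
from typing import Optional

MAX_USER_AGENT_CHARS = 2048  # documented limit


def make_item(
    user_agent: str,
    path: Optional[str] = None,
    status_code: Optional[int] = None,
    access_state: Optional[str] = None,
    occurred_at: Optional[datetime] = None,
) -> dict:
    """Build one item, leaving out every optional field that is unknown."""
    if not user_agent:
        raise ValueError("userAgent is required")
    if len(user_agent) > MAX_USER_AGENT_CHARS:
        raise ValueError("userAgent exceeds 2048 characters")
    if access_state not in (None, "allowed", "blocked"):
        raise ValueError('accessState must be "allowed" or "blocked"')

    item = {"userAgent": user_agent}
    if path is not None:
        item["path"] = path
    if status_code is not None:
        item["statusCode"] = status_code
    if access_state is not None:
        item["accessState"] = access_state
    if occurred_at is not None:
        if occurred_at.tzinfo is None:
            raise ValueError("occurred_at must be timezone-aware")
        utc = occurred_at.astimezone(timezone.utc)
        # Same shape as the curl example: milliseconds and a trailing Z.
        stamp = utc.isoformat(timespec="milliseconds")
        item["occurredAt"] = stamp.replace("+00:00", "Z")
    return item
```

Omitting a key, instead of sending a placeholder value, is what lets the dashboard show "Unknown" for that hit.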

3. Privacy and tenant safety

  • Strip IPs, cookies, query strings, and any user-identifying headers before forwarding. Metrivo stores only the User-Agent, path, status code, access state, and timestamp.
  • The endpoint enforces that the API key's workspace owns the target websiteId. Cross-workspace writes are rejected with 403.
  • If the API key is scoped to a single website, the websiteId field is optional — the endpoint uses the key's scope.
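
A pre-send check can catch accidental leaks before a batch leaves your infrastructure. The sketch below is a hypothetical helper (privacy_issues is not part of any Metrivo SDK) that flags fields outside the five stored ones and paths that still carry a query string. As a deliberately strict choice it also flags url, because a full URL may include a query string; send path if you use this check.

```python
from typing import List

# The five fields the text above says Metrivo stores.
ALLOWED_FIELDS = {"userAgent", "path", "statusCode", "accessState", "occurredAt"}


def privacy_issues(item: dict) -> List[str]:
    """Return human-readable problems found in one outgoing item."""
    issues = []
    for key in sorted(set(item) - ALLOWED_FIELDS):
        issues.append(f"unexpected field: {key}")
    path = item.get("path")
    if isinstance(path, str) and ("?" in path or "#" in path):
        issues.append("path contains a query string or fragment")
    return issues
```

An empty list means the item contains nothing beyond the stored fields. Treat a non-empty list as a bug in your forwarder rather than something to strip silently.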

4. Integration examples

Forward from an Nginx / Cloudflare / Vercel access log, or from a small middleware in your app:

# Inside your access log post-processor (or a small forwarder):
# 1. Filter your access log for AI/search bot user agents.
# 2. Strip IPs, cookies, and query strings from each hit.
# 3. POST batches of up to 500 items to /api/crawlers/log.
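
As one possible starting point, the sketch below turns a single Nginx access log line into an item. It assumes the default "combined" log format and an English or C locale (needed to parse month names such as "May"); line_to_item and TRACKED are hypothetical names. The client IP and referer are matched but never copied, and the query string is removed from the path. accessState is set only for a 403, because a WAF block is not visible in an origin log.

```python
import re
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlsplit

# Nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Lowercased tokens from the tracked-crawler list (assumed substring match).
TRACKED = (
    "oai-searchbot", "chatgpt-user", "gptbot", "claude-searchbot",
    "claude-user", "claudebot", "perplexitybot", "googlebot", "bingbot",
)


def line_to_item(line: str) -> Optional[dict]:
    """Convert one combined-format log line to an item, or None to skip it."""
    match = COMBINED.match(line)
    if match is None:
        return None  # not in combined format, or a malformed request line
    user_agent = match["ua"]
    if not any(token in user_agent.lower() for token in TRACKED):
        return None  # not a tracked crawler
    when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    stamp = when.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    status = int(match["status"])
    item = {
        "userAgent": user_agent,
        "path": urlsplit(match["target"]).path,  # drops the query string
        "statusCode": status,
        "occurredAt": stamp.replace("+00:00", "Z"),
    }
    if status == 403:
        item["accessState"] = "blocked"
    return item
```

Collect the non-None results and post them in batches of up to 500. Cloudflare and Vercel logs use different formats, so only the parsing step would change there.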

5. What Metrivo will not claim

  • If you omit statusCode or accessState, the dashboard shows "Unknown". We never fabricate a value.
  • We do not infer that a missing log entry means a crawler did not visit. The log may simply not have been forwarded.
  • We do not deduplicate identical hits across multiple POSTs. Send each hit exactly once.
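
Since duplicates are not removed on ingest, a forwarder that re-reads the same log file needs to remember where it stopped. One common approach, sketched here under assumptions, is a byte-offset checkpoint: read_new_lines is a hypothetical helper that returns only complete lines written after the saved offset, and treats a file shorter than the offset as rotated.

```python
import os
from typing import List, Tuple


def read_new_lines(log_path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines written after `offset` and the offset to resume from."""
    if os.path.getsize(log_path) < offset:
        offset = 0  # file is shorter than the checkpoint: assume it was rotated
    lines = []
    with open(log_path, "rb") as handle:
        handle.seek(offset)
        for raw in handle:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next run
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            offset += len(raw)
    return lines, offset
```

Persist the returned offset only after the corresponding POST succeeds. A crash between the POST and the save can still resend a batch, so this reduces duplicates rather than guaranteeing exactly-once delivery.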