Bot detection

Server Source code Package

Annotates events with user.botScore (0-99, higher = more likely a bot) and user.agentScore (0-99, higher = more likely an AI agent). Optionally writes user.agentProduct (the matched UA substring). The transformer never drops events; destinations filter on these fields via mapping.

Installation

Configuration

This transformer uses the standard transformer config wrapper (consent, data, env, id, ...). For the shared fields see transformer configuration. Package-specific fields live under config.settings and are listed below.

Settings

Property (type): description

  • input (input): Input signal sources, resolved via getMappingValue against { event, ingest }. v1 only reads userAgent; other fields reserved for v1.1 header heuristics.
      • userAgent (any | array)
      • ip (any | array)
      • acceptLanguage (any | array)
      • acceptEncoding (any | array)
      • secFetchSite (any | array)
      • secFetchMode (any | array)
      • secFetchDest (any | array)
      • secFetchUser (any | array)
      • secChUa (any | array)
      • secChUaMobile (any | array)
      • secChUaPlatform (any | array)
  • output (output): Output paths for bot/agent annotations.
      • botScore (string): Path for bot score (0-99, higher = more bot). Default: "user.botScore". Use "ingest.*" to route to pipeline scratch instead of the event. Empty string or omit = skip.
      • agentScore (string): Path for AI agent score (0-99). v1 emits 0 (no match) or 95 (UA-map match). Default: "user.agentScore".
      • agentProduct (string): Path for matched UA substring (e.g. "ChatGPT-User"). Off by default; set a path to enable.
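
To make the output paths more concrete, here is a minimal sketch of one possible settings object together with a hypothetical helper, writeAnnotation, that writes a value to a dot-separated path. The helper, the Target and Roots types, and the rule that a leading "ingest" segment selects the pipeline scratch are assumptions made for illustration; they are not the package's actual implementation. Only the path strings and the "empty or omitted = skip" behavior come from the settings above.

```typescript
// Hypothetical shapes for illustration; the real package types may differ.
type Target = Record<string, unknown>;
type Roots = { event: Target; ingest: Target };

interface OutputSettings {
  botScore?: string;
  agentScore?: string;
  agentProduct?: string; // off unless a path is set
}

const settings: { input: { userAgent: string }; output: OutputSettings } = {
  input: { userAgent: "ingest.userAgent" }, // documented default input path
  output: {
    botScore: "user.botScore", // documented default
    agentScore: "ingest.agentScore", // routed to pipeline scratch, not the event
    // agentProduct omitted: the annotation is skipped
  },
};

// Assumption: a leading "ingest" segment targets the scratch object,
// any other path is written onto the event.
function writeAnnotation(roots: Roots, path: string | undefined, value: unknown): boolean {
  if (!path) return false; // empty string or omitted path: skip the write
  const segments = path.split(".");
  let node: Target = roots.event;
  if (segments[0] === "ingest") {
    node = roots.ingest;
    segments.shift();
  }
  const leaf = segments.pop();
  if (leaf === undefined) return false; // the path was just "ingest"
  for (const key of segments) {
    const next = node[key];
    if (typeof next !== "object" || next === null) node[key] = {};
    node = node[key] as Target;
  }
  node[leaf] = value;
  return true;
}

const roots: Roots = { event: {}, ingest: {} };
writeAnnotation(roots, settings.output.botScore, 80); // 80 is a made-up sample score
writeAnnotation(roots, settings.output.agentScore, 95);
writeAnnotation(roots, settings.output.agentProduct, "ChatGPT-User"); // skipped, no path set
```

In this sketch the event ends up with user.botScore, the scratch object holds agentScore, and the product name is not written anywhere because no path was configured for it.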

Mapping

This package does not define custom rule-level settings. For the standard rule fields (consent, condition, data, batch, name, policy) see mapping.

Examples

ChatGPT-User (user-action AI)

A real human directed an AI assistant to fetch this page. botScore is high but lower than for crawlers, and agentProduct lets destinations keep this traffic.

GPTBot training crawler

OpenAI training crawler. Both botScore and agentScore are high.

Human visitor (Chrome)

Modern Chrome UA. No bot or agent signals.

Source prerequisite

The transformer reads userAgent from ctx.ingest (default path ingest.userAgent). The upstream server source must populate it via config.ingest. Without it, every event scores 70 (missing-UA baseline).
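
The fallback described here can be sketched roughly as follows. The function names, the Ingest type, and the choice to treat a blank string as missing are assumptions for illustration, as is applying the baseline to the bot score; only the value 70 and the ingest.userAgent path come from the text above. The scoreUserAgent parameter stands in for the real detection layers, which are not reproduced.

```typescript
const MISSING_UA_BASELINE = 70; // documented score when no user agent is available

// Hypothetical shape of the ingest scratch object.
interface Ingest {
  userAgent?: unknown;
}

// Returns the user agent string, or undefined when the source did not populate it.
// Assumption: a blank string counts as missing.
function readUserAgent(ingest: Ingest | undefined): string | undefined {
  const ua = ingest?.userAgent;
  return typeof ua === "string" && ua.trim() !== "" ? ua : undefined;
}

// `scoreUserAgent` is a placeholder for the actual detection logic.
function botScoreFor(
  ingest: Ingest | undefined,
  scoreUserAgent: (ua: string) => number,
): number {
  const ua = readUserAgent(ingest);
  return ua === undefined ? MISSING_UA_BASELINE : scoreUserAgent(ua);
}
```

If every event in a pipeline reports the same score of 70, a missing ingest mapping on the source is the first thing to check.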

Detection layers (v1)

  • isbot: catches curl, wget, python-requests, headless Chrome defaults, well-known crawlers.
  • Curated AI agent map: vendor self-declared UAs across OpenAI (GPTBot, ChatGPT-User, ChatGPT-Agent, OAI-SearchBot), Anthropic (ClaudeBot, Claude-User, Claude-SearchBot, Claude-Code, legacy anthropic-ai), Perplexity, Mistral, Meta (Meta-ExternalAgent, Meta-ExternalFetcher), Google (Google-CloudVertexBot, Google-Extended), Apple (Applebot-Extended), Amazon (Amazonbot), DuckDuckGo (DuckAssistBot), ByteDance (Bytespider), Common Crawl (CCBot).
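
The second layer can be illustrated with a small substring lookup. This is a sketch under stated assumptions, not the package's code: the list below is only a subset of the names above, the case-insensitive comparison and first-match-wins ordering are guesses, and matchAgent and AgentMatch are hypothetical names. The 0 / 95 scores follow the documented v1 behavior. The isbot layer is not reproduced here.

```typescript
// A subset of the self-declared UA substrings listed above (illustrative only).
const AGENT_UA_SUBSTRINGS: readonly string[] = [
  "ChatGPT-User",
  "GPTBot",
  "OAI-SearchBot",
  "ClaudeBot",
  "Claude-User",
  "Bytespider",
  "CCBot",
];

interface AgentMatch {
  agentScore: number; // 0 (no match) or 95 (UA-map match) in v1
  agentProduct?: string; // the matched substring, when there is one
}

// Assumption: matching is case-insensitive and the first listed substring wins.
function matchAgent(userAgent: string): AgentMatch {
  const ua = userAgent.toLowerCase();
  const product = AGENT_UA_SUBSTRINGS.find((name) => ua.includes(name.toLowerCase()));
  return product === undefined
    ? { agentScore: 0 }
    : { agentScore: 95, agentProduct: product };
}

const chatGptUser = matchAgent(
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot",
);
const chrome = matchAgent(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
);
```

Here chatGptUser carries agentScore 95 with agentProduct "ChatGPT-User", while the Chrome user agent yields agentScore 0 and no product.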

Destination filtering recipes

Drop all bots: event.user.botScore > 50

Drop crawlers, keep user-action AI: event.user.botScore > 50 AND event.user.agentProduct NOT LIKE '%-User'

AI traffic report: event.user.agentScore > 50, grouped by event.user.agentProduct
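
For destinations that filter in code rather than in a query language, the three recipes could be expressed roughly like this. The ScoredUser type and function names are hypothetical, and treating a missing score as 0 and a missing agentProduct as "not a user-action agent" are assumptions; in SQL, a NULL agentProduct would make the NOT LIKE test evaluate to NULL, so that case deserves an explicit decision.

```typescript
// Hypothetical view of the annotated user fields on an event.
interface ScoredUser {
  botScore?: number;
  agentScore?: number;
  agentProduct?: string;
}

// Recipe 1: drop all bots.
const dropAllBots = (user: ScoredUser): boolean => (user.botScore ?? 0) > 50;

// Recipe 2: drop crawlers but keep user-action AI (products ending in "-User").
const dropCrawlersKeepUserAction = (user: ScoredUser): boolean =>
  (user.botScore ?? 0) > 50 && !(user.agentProduct ?? "").endsWith("-User");

// Recipe 3: count AI traffic per product.
function aiTrafficReport(users: ScoredUser[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const user of users) {
    if ((user.agentScore ?? 0) <= 50) continue;
    const key = user.agentProduct ?? "(unknown)";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Sample data; the botScore values are made up for the example.
const sample: ScoredUser[] = [
  { botScore: 80, agentScore: 95, agentProduct: "ChatGPT-User" },
  { botScore: 99, agentScore: 95, agentProduct: "GPTBot" },
  { botScore: 0, agentScore: 0 },
];
const kept = sample.filter((user) => !dropCrawlersKeepUserAction(user));
const report = aiTrafficReport(sample);
```

With this sample, kept retains the ChatGPT-User and human entries, and report counts one event each for ChatGPT-User and GPTBot.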

Not in v1

v1 does not include header consistency heuristics (Sec-Fetch / Sec-CH-UA / Accept-Language), ASN / datacenter-IP checks, reverse DNS verification, web-side runtime checks, behavioral signals, or TLS / JA4 fingerprinting. See the README's "Not in v1" section for the full roadmap.

Limits

v1 will not catch residential proxies combined with stealth Chrome, CAPTCHA-solver farms, or real-browser-as-a-service offerings. For that threat model, use a commercial vendor (Cloudflare Bot Management, DataDome, HUMAN).
