Bot detection
Annotates events with user.botScore (0-99, higher = more bot) and user.agentScore (0-99, higher = more AI agent). Optionally writes user.agentProduct (matched UA substring). Never drops events — destinations filter via mapping.
Installation
- Integrated
- Bundled
Configuration
This transformer uses the standard transformer config wrapper (consent, data, env, id, ...). For the shared fields see transformer configuration. Package-specific fields live under config.settings and are listed below.
Settings
| Property | Type | Description | More |
|---|---|---|---|
| input | input | Input signal sources, resolved via getMappingValue against { event, ingest }. v1 only reads userAgent; other fields reserved for v1.1 header heuristics. | |
| input.userAgent | any \| array | | |
| input.ip | any \| array | | |
| input.acceptLanguage | any \| array | | |
| input.acceptEncoding | any \| array | | |
| input.secFetchSite | any \| array | | |
| input.secFetchMode | any \| array | | |
| input.secFetchDest | any \| array | | |
| input.secFetchUser | any \| array | | |
| input.secChUa | any \| array | | |
| input.secChUaMobile | any \| array | | |
| input.secChUaPlatform | any \| array | | |
| output | output | Output paths for bot/agent annotations. | |
| output.botScore | string | Path for bot score (0-99, higher = more bot). Default: "user.botScore". Use "ingest.*" to route to pipeline scratch instead of the event. Empty string or omit = skip. | |
| output.agentScore | string | Path for AI agent score (0-99). v1 emits 0 (no match) or 95 (UA-map match). Default: "user.agentScore". | |
| output.agentProduct | string | Path for matched UA substring (e.g. "ChatGPT-User"). Off by default; set a path to enable. | |
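As a rough illustration of how these settings fit together, the sketch below mirrors the table as a TypeScript interface and fills in one possible settings object. The interface name BotDetectionSettings, the MappingValue alias, and the use of plain string paths as mapping values are assumptions for illustration, not the package's exported types. The property names and default paths come from this page.

```typescript
// Hypothetical types mirroring the Settings table; not the package's own exports.
type MappingValue = unknown; // the table lists these as "any | array"

interface BotDetectionSettings {
  input?: {
    userAgent?: MappingValue; // the only input read in v1
    ip?: MappingValue; // reserved for later versions
    acceptLanguage?: MappingValue; // reserved for later versions
    // ...remaining header inputs from the table, also reserved
  };
  output?: {
    botScore?: string; // default "user.botScore"
    agentScore?: string; // default "user.agentScore"
    agentProduct?: string; // off unless a path is set
  };
}

// Example values: defaults spelled out, plus agentProduct switched on.
// Assumes a string path is an accepted mapping value for input.userAgent.
const settings: BotDetectionSettings = {
  input: { userAgent: 'ingest.userAgent' },
  output: {
    botScore: 'user.botScore',
    agentScore: 'user.agentScore',
    agentProduct: 'user.agentProduct',
  },
};
```

Setting output.botScore to a path under "ingest." instead would keep the score out of the event and in pipeline scratch, as described in the table.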
Mapping
This package does not define custom rule-level settings. For the standard rule fields (consent, condition, data, batch, name, policy) see mapping.
Examples
ChatGPT-User (user-action AI)
A real human directed an AI to fetch this page. botScore is high but lower than for crawlers, and agentProduct lets destinations keep this traffic.
GPTBot training crawler
OpenAI training crawler. Both botScore and agentScore are high.
Human visitor (Chrome)
Modern Chrome UA. No bot or agent signals.
Source prerequisite
The transformer reads userAgent from ctx.ingest (default path ingest.userAgent). The upstream server source must populate it via config.ingest. Without it, every event gets a botScore of 70 (the missing-UA baseline).
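A quick way to confirm the prerequisite while debugging is to check whether the ingest data carries a usable user agent at all. The sketch below is a standalone helper, not part of the package: the Ingest interface, the hasUserAgent function, and the choice to treat an empty or whitespace-only string as missing are assumptions. Only the 70 baseline comes from this page.

```typescript
// Hypothetical shape of the ingest data; only `userAgent` is taken from the text.
interface Ingest {
  userAgent?: unknown;
}

// Baseline stated above for events without a user agent.
const MISSING_UA_BOT_SCORE = 70;

// True when ingest carries a non-empty user agent string.
// Treating blank strings as missing is an assumption of this sketch.
function hasUserAgent(ingest?: Ingest): boolean {
  const ua = ingest?.userAgent;
  return typeof ua === 'string' && ua.trim() !== '';
}

// Example: flag scores that merely reflect a missing user agent.
function isBaselineOnly(ingest: Ingest | undefined, botScore: number): boolean {
  return !hasUserAgent(ingest) && botScore === MISSING_UA_BOT_SCORE;
}
```

If every event in a destination shows the baseline score, the source's config.ingest is the first place to look.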
Detection layers (v1)
- isbot — catches curl, wget, python-requests, headless Chrome defaults, well-known crawlers.
- Curated AI agent map — vendor self-declared UAs across OpenAI (GPTBot, ChatGPT-User, ChatGPT-Agent, OAI-SearchBot), Anthropic (ClaudeBot, Claude-User, Claude-SearchBot, Claude-Code, legacy anthropic-ai), Perplexity, Mistral, Meta (Meta-ExternalAgent, Meta-ExternalFetcher), Google (Google-CloudVertexBot, Google-Extended), Apple (Applebot-Extended), Amazon (Amazonbot), DuckDuckGo (DuckAssistBot), ByteDance (Bytespider), Common Crawl (CCBot).
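The agent-map layer can be pictured as a substring lookup over the user agent. The sketch below is a simplified stand-in under stated assumptions: AGENT_SUBSTRINGS is a short excerpt of the names listed above rather than the real curated map, annotateAgent is a hypothetical function name, and case-insensitive matching is a guess. The 0 and 95 score values are the v1 outputs described in the settings table.

```typescript
// Excerpt of self-declared UA substrings from the list above (assumption:
// the real map is larger and may match differently).
const AGENT_SUBSTRINGS = [
  'GPTBot',
  'ChatGPT-User',
  'OAI-SearchBot',
  'ClaudeBot',
  'Claude-User',
  'CCBot',
] as const;

interface AgentAnnotation {
  agentScore: number; // 0 = no match, 95 = UA-map match (v1 values)
  agentProduct?: string; // matched substring, when there is one
}

// Returns the first listed substring found in the user agent, if any.
function annotateAgent(userAgent: string): AgentAnnotation {
  const ua = userAgent.toLowerCase();
  const match = AGENT_SUBSTRINGS.find((s) => ua.includes(s.toLowerCase()));
  return match ? { agentScore: 95, agentProduct: match } : { agentScore: 0 };
}

const crawler = annotateAgent('Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)');
// crawler is { agentScore: 95, agentProduct: 'GPTBot' }
```

The isbot layer is left out on purpose, since this page does not state the botScore values it produces.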
Destination filtering recipes
Drop all bots: event.user.botScore > 50
Drop crawlers, keep user-action AI: event.user.botScore > 50 AND event.user.agentProduct NOT LIKE '%-User'
AI traffic report: event.user.agentScore > 50, grouped by event.user.agentProduct
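For destinations that filter in code rather than in a query language, the same recipes can be written as predicates. This is a minimal sketch: the AnnotatedUser interface and all function names are hypothetical, a missing score is treated as 0, and a missing agentProduct is treated as "not user-action" (which differs from SQL, where NOT LIKE on NULL does not evaluate to true).

```typescript
// Hypothetical view of the annotated user object on an event.
interface AnnotatedUser {
  botScore?: number;
  agentScore?: number;
  agentProduct?: string;
}

// Recipe 1: drop all bots.
const dropAllBots = (u: AnnotatedUser): boolean => (u.botScore ?? 0) > 50;

// Recipe 2: drop crawlers, keep user-action AI (products ending in "-User").
const dropCrawlersKeepUserAction = (u: AnnotatedUser): boolean =>
  (u.botScore ?? 0) > 50 && !(u.agentProduct ?? '').endsWith('-User');

// Recipe 3: select AI traffic and count it per agentProduct.
function countAiTraffic(users: AnnotatedUser[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const u of users) {
    if ((u.agentScore ?? 0) <= 50) continue; // not AI traffic
    const key = u.agentProduct ?? 'unknown';
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```

Recipes 2 and 3 only work when output.agentProduct is enabled, since it is off by default.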
Not in v1
Header consistency heuristics (Sec-Fetch / Sec-CH-UA / Accept-Language), ASN / datacenter-IP, reverse DNS verification, web-side runtime checks, behavioral signals, TLS / JA4. See the README's "Not in v1" section for the full roadmap.
Limits
This transformer will not catch residential-proxy traffic combined with stealth Chrome, CAPTCHA-solver farms, or real-browser-as-a-service. For that threat model, use a commercial vendor (Cloudflare Bot Management, DataDome, HUMAN).