Google BigQuery

Server Source code Package

Server-side event streaming to Google BigQuery via the Storage Write API for low-latency analytics, machine learning workloads, and data warehousing. The @walkeros/server-destination-gcp package also ships destinationPubSub for publishing events to Pub/Sub topics; this page covers BigQuery only.

Where this fits

GCP BigQuery is a server destination in the walkerOS flow. It streams events to Google BigQuery for data warehousing, analytics dashboards, and machine learning workloads.

Installation

Install the @walkeros/server-destination-gcp package with your package manager.

Configuration

This destination uses the standard destination config wrapper (consent, data, env, id, ...). For the shared fields see destination configuration. Package-specific fields live under config.settings and are listed below.

Settings

| Property | Type | Description |
| --- | --- | --- |
| client | any | Google Cloud BigQuery client instance |
| projectId* | string | Google Cloud Project ID |
| datasetId | string | BigQuery dataset ID where events will be stored |
| tableId | string | BigQuery table ID for event storage |
| location | string | Geographic location for the BigQuery dataset |
| bigquery | any | Additional BigQuery client configuration options |

* Required fields
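
As a rough illustration of the table above, a destination config could look like the following sketch. The interfaces are local stand-ins written for this example, not the package's exported types, and all values are hypothetical.

```typescript
// Local sketch of the settings listed in the table above.
// Only projectId is required; everything else is optional.
interface BigQuerySettings {
  projectId: string;
  datasetId?: string;
  tableId?: string;
  location?: string;
  client?: unknown; // a pre-built BigQuery client instance
  bigquery?: Record<string, unknown>; // extra client options
}

// Package-specific fields live under config.settings.
interface BigQueryDestinationConfig {
  settings: BigQuerySettings;
}

// Hypothetical values for illustration only.
const bigQueryConfig: BigQueryDestinationConfig = {
  settings: {
    projectId: 'my-gcp-project',
    datasetId: 'walkerOS',
    tableId: 'events',
    location: 'EU',
  },
};
```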

Mapping

This package does not define custom rule-level settings. For the standard rule fields (consent, condition, data, batch, name, policy) see mapping.

Examples

Page view

A page view is appended as one row through the BigQuery Storage Write API JSONWriter. Nested objects/arrays in data, source, etc. are JSON-stringified by eventToRow.


Purchase

An order event is appended as a single row through JSONWriter.appendRows. The entire nested data object (including arrays like items) is JSON-stringified into the data column via eventToRow().
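
The stringification described above can be sketched as follows. The function eventToRowSketch is a hypothetical stand-in written for this page, and the order event is invented sample data; the package's real eventToRow may handle more cases.

```typescript
type Row = Record<string, string | number | boolean | null>;

// Illustrative stand-in: primitives pass through, objects and arrays
// are JSON-stringified, missing values become null.
function eventToRowSketch(event: Record<string, unknown>): Row {
  const row: Row = {};
  for (const key of Object.keys(event)) {
    const value = event[key];
    if (value === undefined || value === null) {
      row[key] = null;
    } else if (typeof value === 'object') {
      row[key] = JSON.stringify(value);
    } else if (
      typeof value === 'string' ||
      typeof value === 'number' ||
      typeof value === 'boolean'
    ) {
      row[key] = value;
    }
  }
  return row;
}

// Invented sample event with a nested items array.
const orderRow = eventToRowSketch({
  name: 'order complete',
  entity: 'order',
  action: 'complete',
  data: { id: 'ORD-1', total: 99.5, items: [{ sku: 'A1', quantity: 2 }] },
});
// orderRow.data is now a single JSON string containing the items array.
```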


Prerequisites

Setup lifecycle

Provision the dataset and table once per environment with the walkeros setup CLI command.

Output: a narrated setup: ok destination.bigquery line. Add --json to also emit a structured envelope reporting { datasetCreated, tableCreated } for jq piping. The command is idempotent and safe to re-run.

config.setup controls provisioning:

  • omitted or false: narrated skip, no provisioning. Operator runs setup explicitly to provision.
  • true: provision with the defaults below.
  • object matching the Setup interface: provision with the declared overrides.

See the Setup interface in the package for the full shape.
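
The three config.setup cases can be pictured with a small resolver. This is a sketch under assumptions: SetupSketch is not the package's Setup interface, and the names partitionField and clusteringFields are hypothetical. The default values come from the Defaults table on this page.

```typescript
// Hypothetical shape; the real Setup interface lives in the package.
interface SetupSketch {
  datasetId: string;
  tableId: string;
  location: string;
  storageBillingModel: 'PHYSICAL' | 'LOGICAL';
  partitionField: string; // assumed name
  clusteringFields: string[]; // assumed name
}

const SETUP_DEFAULTS: SetupSketch = {
  datasetId: 'walkerOS',
  tableId: 'events',
  location: 'EU',
  storageBillingModel: 'PHYSICAL',
  partitionField: 'timestamp',
  clusteringFields: ['name', 'entity', 'action'],
};

// Returns null when provisioning is skipped, otherwise the effective setup.
function resolveSetup(
  setup?: boolean | Partial<SetupSketch>,
): SetupSketch | null {
  if (setup === undefined || setup === false) return null; // skip
  if (setup === true) return { ...SETUP_DEFAULTS }; // defaults only
  return { ...SETUP_DEFAULTS, ...setup }; // defaults plus overrides
}
```

For example, resolveSetup({ location: 'US' }) keeps every default except the location.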

Defaults

| Field | Value |
| --- | --- |
| datasetId | walkerOS (note capital O, S) |
| tableId | events |
| location | EU |
| storageBillingModel | PHYSICAL (cheaper for compressible JSON) |
| Partitioning | Day partitioning on timestamp |
| Clustering | (name, entity, action) |

Cost optimization

Physical storage billing charges for the compressed size of the data. Day partitioning and the (name, entity, action) clustering reduce scan costs for typical analytics queries. Always include a timestamp filter so that queries scan only the matching day partitions.

Drift handling

If the existing table's partitioning, clustering, or schema differs from the declared configuration, setup logs WARN setup.drift {...} and continues. There is no auto-mutation. Migrations are an operator decision.
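
A drift check of this kind can be sketched as a pure comparison that only reports differences and never changes the table. The TableLayout type and detectDrift function are hypothetical illustrations, not the package's implementation, and schema comparison is left out for brevity.

```typescript
// Hypothetical summary of the layout properties that setup compares.
interface TableLayout {
  partitionField?: string;
  clusteringFields: string[];
}

// Returns the names of the properties that differ.
// An empty array means the existing table matches the declaration.
function detectDrift(declared: TableLayout, existing: TableLayout): string[] {
  const drift: string[] = [];
  if (declared.partitionField !== existing.partitionField) {
    drift.push('partitioning');
  }
  // Clustering order matters, so compare the ordered lists.
  if (
    declared.clusteringFields.join(',') !== existing.clusteringFields.join(',')
  ) {
    drift.push('clustering');
  }
  return drift;
}

const declaredLayout: TableLayout = {
  partitionField: 'timestamp',
  clusteringFields: ['name', 'entity', 'action'],
};
```

A caller would log a warning for a non-empty result and continue, leaving any migration to the operator.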

GCP setup

Enable BigQuery API

Enable the BigQuery API for your GCP project before running setup.

Create service accounts

The provisioning step (setup) and the runtime push path need different permissions. We recommend separating them.

Operator (setup) service account, used by walkeros setup:

  • bigquery.datasets.create
  • bigquery.tables.create
  • bigquery.datasets.get (for drift detection)
  • bigquery.tables.get

Runtime service account, used by the running flow:

  • bigquery.tables.updateData (Storage Write API append)

Authentication

For environments where you need explicit credentials (Docker containers, external platforms), create a service account key file.

Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the key file.

Caution: keep key files secure. Never commit them to version control or include them in public Docker images.

Inline credentials

Instead of relying on GOOGLE_APPLICATION_CREDENTIALS, you can pass auth options inline via settings.bigquery (for example keyFilename or credentials). These apply to both the control plane (setup, metadata) and the data plane (Storage Write API ingestion). A pre-built settings.client authenticates the control plane only; supply settings.bigquery for the data plane to use non-ADC credentials.
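
A minimal sketch of inline credentials, assuming the options under settings.bigquery are passed through to the Google client as described above. The interface is a local stand-in, and the project ID and key path are hypothetical.

```typescript
// Local sketch of the two auth options named above.
interface InlineAuthOptions {
  keyFilename?: string; // path to a service account key file
  credentials?: { client_email: string; private_key: string };
}

interface SettingsWithAuth {
  projectId: string;
  bigquery?: InlineAuthOptions;
}

// Hypothetical values: the runtime service account key is mounted
// into the container at this path.
const settingsWithKeyFile: SettingsWithAuth = {
  projectId: 'my-gcp-project',
  bigquery: { keyFilename: '/secrets/walkeros-runtime.json' },
};
```

Because settings.bigquery covers both planes, this form avoids the case where a pre-built settings.client authenticates setup but ingestion falls back to ADC.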

Environment variables

| Variable | Description | Default |
| --- | --- | --- |
| GCP_PROJECT_ID | Your GCP project ID | Required |
| BQ_DATASET | BigQuery dataset name | walkerOS |
| BQ_TABLE | BigQuery table name | events |
| BQ_LOCATION | BigQuery dataset location | EU |
| GOOGLE_APPLICATION_CREDENTIALS | Path to service account key | Required (unless using Workload Identity) |
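
One way to turn these variables into settings is sketched below. How the variables are actually wired into your flow config is an assumption here; settingsFromEnv is a hypothetical helper, and the fallbacks are the defaults from the table.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: builds settings from environment variables,
// falling back to the documented defaults.
function settingsFromEnv(env: Env) {
  const projectId = env.GCP_PROJECT_ID;
  if (!projectId) {
    throw new Error('GCP_PROJECT_ID is required');
  }
  return {
    projectId,
    datasetId: env.BQ_DATASET || 'walkerOS',
    tableId: env.BQ_TABLE || 'events',
    location: env.BQ_LOCATION || 'EU',
  };
}
```

In Node.js you would typically call settingsFromEnv(process.env). GOOGLE_APPLICATION_CREDENTIALS is read by the Google client libraries themselves, so it does not appear in the result.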

Storage Write API (data plane)

The destination uses BigQuery's Storage Write API for data ingestion. This replaces the legacy tabledata.insertAll path.

  • Cost: $25/TB after the 2 TiB/month free tier (vs ~$50/TB for the legacy path). Most low-volume deployments fit entirely in the free tier.
  • Batching: pushBatch is implemented. Set the collector's batch: <ms> mapping setting to flush all events in a window as a single appendRows call.
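
The cost figure above works out as simple arithmetic. The sketch below uses the numbers quoted on this page and, as a simplifying assumption, treats TB and TiB as the same unit; check current Google Cloud pricing before relying on it.

```typescript
// Figures quoted above; not fetched from any pricing API.
const FREE_TIER_TIB_PER_MONTH = 2;
const PRICE_USD_PER_TIB = 25;

// Rough monthly ingestion cost for the Storage Write API.
function estimateMonthlyIngestCostUsd(ingestedTib: number): number {
  const billableTib = Math.max(0, ingestedTib - FREE_TIER_TIB_PER_MONTH);
  return billableTib * PRICE_USD_PER_TIB;
}

// 0.5 TiB per month stays inside the free tier.
const lowVolumeCost = estimateMonthlyIngestCostUsd(0.5);
// 6 TiB per month leaves 4 TiB billable.
const highVolumeCost = estimateMonthlyIngestCostUsd(6);
```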
EXPERIMENTAL SDK

The upstream @google-cloud/bigquery-storage package marks itself as EXPERIMENTAL (subject to change). It is pinned at ^5.1.0.

Default table schema

The default 15-column schema follows the walkerOS Event v4 canonical order. Object fields use the native JSON BigQuery type. Only name is REQUIRED; all other columns are NULLABLE for resilience against partial events.

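| Column | Type | Mode |
| --- | --- | --- |
| name | STRING | REQUIRED |
| data | JSON | NULLABLE |
| context | JSON | NULLABLE |
| globals | JSON | NULLABLE |
| custom | JSON | NULLABLE |
| user | JSON | NULLABLE |
| nested | JSON | NULLABLE |
| consent | JSON | NULLABLE |
| id | STRING | NULLABLE |
| trigger | STRING | NULLABLE |
| entity | STRING | NULLABLE |
| action | STRING | NULLABLE |
| timestamp | TIMESTAMP | NULLABLE |
| timing | INT64 | NULLABLE |
| source | JSON | NULLABLE |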

There is no createdAt column. Use timestamp (event time) for partition filters.
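
For reference, the same 15 columns can be written as schema field definitions in the name/type/mode form that BigQuery table schemas use. This is a transcription of the table above for illustration, not code taken from the package.

```typescript
interface SchemaField {
  name: string;
  type: 'STRING' | 'JSON' | 'TIMESTAMP' | 'INT64';
  mode: 'REQUIRED' | 'NULLABLE';
}

// Columns in the order listed in the table above.
const defaultSchema: SchemaField[] = [
  { name: 'name', type: 'STRING', mode: 'REQUIRED' },
  { name: 'data', type: 'JSON', mode: 'NULLABLE' },
  { name: 'context', type: 'JSON', mode: 'NULLABLE' },
  { name: 'globals', type: 'JSON', mode: 'NULLABLE' },
  { name: 'custom', type: 'JSON', mode: 'NULLABLE' },
  { name: 'user', type: 'JSON', mode: 'NULLABLE' },
  { name: 'nested', type: 'JSON', mode: 'NULLABLE' },
  { name: 'consent', type: 'JSON', mode: 'NULLABLE' },
  { name: 'id', type: 'STRING', mode: 'NULLABLE' },
  { name: 'trigger', type: 'STRING', mode: 'NULLABLE' },
  { name: 'entity', type: 'STRING', mode: 'NULLABLE' },
  { name: 'action', type: 'STRING', mode: 'NULLABLE' },
  { name: 'timestamp', type: 'TIMESTAMP', mode: 'NULLABLE' },
  { name: 'timing', type: 'INT64', mode: 'NULLABLE' },
  { name: 'source', type: 'JSON', mode: 'NULLABLE' },
];
```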

Query optimization

Partitioning by day on timestamp and clustering on (name, entity, action) reduce scan costs for typical analytics queries. Always include a timestamp filter so that BigQuery can prune partitions outside the requested range.
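
As an illustration, the helper below builds a query that filters on timestamp before aggregating. The function name and the table path are hypothetical; the SQL uses standard BigQuery functions (TIMESTAMP_SUB, CURRENT_TIMESTAMP).

```typescript
// Builds a per-event-name count over the last N days.
// The timestamp predicate lets BigQuery skip older day partitions.
function buildRecentEventCountsQuery(table: string, days: number): string {
  if (days <= 0 || Math.floor(days) !== days) {
    throw new Error('days must be a positive integer');
  }
  return [
    'SELECT name, COUNT(*) AS events',
    'FROM `' + table + '`',
    'WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL ' +
      days +
      ' DAY)',
    'GROUP BY name',
    'ORDER BY events DESC',
  ].join('\n');
}

// Hypothetical project; dataset and table are the documented defaults.
const lastWeekQuery = buildRecentEventCountsQuery(
  'my-gcp-project.walkerOS.events',
  7,
);
```

Grouping or filtering on name, entity, or action additionally benefits from the clustering.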

Custom schema mapping

You can send a custom schema by using the data configuration to map specific fields. This is useful when you only need a subset of the event data.

Example: simple schema

This example sends only name, id, data, and timestamp.
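
A sketch of such a mapping follows. The data.map shape (output column mapped to event field) is an assumption about the walkerOS mapping syntax, so check the mapping documentation for the exact form. The applyMap helper is hypothetical and exists only to show which fields end up in the row.

```typescript
// Assumed config shape: data.map lists output column -> event field.
const simpleSchemaConfig = {
  settings: { projectId: 'my-gcp-project', tableId: 'events_simple' },
  data: {
    map: { name: 'name', id: 'id', data: 'data', timestamp: 'timestamp' },
  },
};

// Hypothetical helper: picks the mapped top-level fields from an event.
function applyMap(
  event: Record<string, unknown>,
  map: Record<string, string>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const column of Object.keys(map)) {
    out[column] = event[map[column]];
  }
  return out;
}

// Invented sample event; user and entity are dropped by the mapping.
const mappedRow = applyMap(
  {
    name: 'page view',
    id: 'evt-1',
    data: { title: 'Home' },
    timestamp: 1700000000000,
    entity: 'page',
    user: { session: 's1' },
  },
  simpleSchemaConfig.data.map,
);
```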

The corresponding table then needs only those four columns: name, id, data, and timestamp.

Cleanup

To remove BigQuery resources:

  • Delete the BigQuery dataset
  • Remove service account IAM bindings from the dataset
  • Delete the service account
  • Remove any downloaded key files

Pub/Sub

The same @walkeros/server-destination-gcp package also exports destinationPubSub for publishing events to a Pub/Sub topic. See the Pub/Sub destination page for full settings, mapping, ordering, attributes, setup, and authentication reference.
