Scaling AI for Retail: How Trigo Accelerates Builds with Bazel and BuildBuddy

About Trigo

Trigo’s AI-powered Loss Prevention solution connects to retailers’ CCTV and point-of-sale systems to detect and prevent losses from both accidental errors and theft. When a potential loss event occurs, Trigo provides real-time alerts with supporting video evidence, enabling shoppers to immediately correct the issue or allowing staff and security to intervene if needed.

Our Monorepo

Supporting these real-time capabilities is a complex engineering platform that powers both our research and production at scale.

Our core work happens in a single Bazel monorepo that has grown to roughly 150 shared libraries and nearly 50 production services. It spans research code, on-prem production systems running in stores, and cloud services that support and operate them. Research teams, platform engineers, and product teams all contribute to the same repository and rely on the same build, test, and dependency infrastructure.

That scale has a direct impact on our build infrastructure. We need one system that can handle a complex dependency graph for ML research, high-performance on-prem systems, and high-availability cloud services with the same consistency and speed, while still keeping local and CI build/test times short for our developers.

It also needs to handle multi-language workflows cleanly. In practice, that means that cases like Python code calling into Rust must build, test, shard, and execute remotely as smoothly as any single-language target.
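As a rough illustration of what such a cross-language edge can look like in Bazel, here is a minimal BUILD sketch, assuming rules_rust and rules_python are set up and the Rust crate is exposed as a Python extension module (all target and file names here are hypothetical, not Trigo's actual targets):

```python
# BUILD.bazel — hypothetical Python-calls-Rust wiring (assumes rules_rust + rules_python)
load("@rules_rust//rust:defs.bzl", "rust_shared_library")
load("@rules_python//python:defs.bzl", "py_library", "py_test")

# Rust core built as a shared library (e.g. a PyO3 extension module).
# "@crates//:pyo3" assumes a crate_universe-style external repo.
rust_shared_library(
    name = "fastmatch",
    srcs = ["src/lib.rs"],
    deps = ["@crates//:pyo3"],
)

# Python wrapper that loads the extension; Bazel tracks the
# cross-language dependency edge like any other.
py_library(
    name = "matcher",
    srcs = ["matcher.py"],
    data = [":fastmatch"],
)

# Tests on the mixed target shard and execute remotely
# the same way a pure-Python py_test would.
py_test(
    name = "matcher_test",
    srcs = ["matcher_test.py"],
    deps = [":matcher"],
    shard_count = 4,
)
```

Because the Rust library is just another node in the dependency graph, caching, sharding (`shard_count`), and remote execution apply to it with no special casing.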

Our Bazel Journey

Before reaching our current Bazel setup, we went through several build systems as the repository, and our expectations, grew.

We started with a plain Dockerfile and later moved to Earthly for better modularity. However, after a major caching bug caused builds to silently reuse stale artifacts and even poisoned our CI, we switched to Nix. Nix provided a strong cache across both CI and local builds, but we eventually moved on after noticing a hit to developer velocity. It’s powerful, but not for every team.

In the end, we landed on Bazel, which gives us hermetic builds and consistent caching while still using a familiar DSL (Starlark is a restricted dialect of Python).

How We Landed on BuildBuddy

Our CI evolved in lockstep with our build system. In the Dockerfile/Earthly days, CI was mostly about reliably building and shipping containers, so we ran self-hosted GitHub Actions runners on VMs and optimized around image layers and incremental rebuilds.

Once we committed to Bazel, CI shifted from “build an image” to “build and test a graph.” We ran Bazel on GitHub Actions with a GCS-backed remote cache to keep validation fast and reproducible.
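A setup like this can be expressed with a few lines of `.bazelrc`. The sketch below shows the shape of a GCS-backed HTTP cache configuration using Bazel's standard remote-caching flags; the bucket name is a placeholder, not Trigo's actual bucket:

```
# .bazelrc — sketch of a GCS-backed remote cache for CI
build --remote_cache=https://storage.googleapis.com/example-bazel-cache
build --google_default_credentials
build --remote_upload_local_results=true
```

With this in place, every CI runner reads from and writes to the same bucket, so a target built once on any host is a cache hit everywhere else.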

(Example of Trigo’s CI pipeline validating builds and tests using Bazel and BuildBuddy)

As the workload grew and we needed more predictable throughput, we moved to self-hosted GitHub ARC runners on GKE and adopted BuildBuddy for centralized caching, still fully self-hosted and deployed inside the same GKE cluster.

Running this stack at scale isn’t “set and forget.” It involves managing runner fleet capacity and scheduling, credentials and isolation, cache sizing and eviction, build artifact churn, upgrades, and the observability needed to debug flaky jobs or sudden cache-miss spikes. However, once in place, it provides a scalable CI setup where Bazel’s hermeticity remains intact while BuildBuddy makes caching a first-class part of the workflow.

BuildBuddy Cloud: Faster Builds, Happier Developers

Even with good caching, CI can still feel “fast, but not fast enough” when the bottleneck is Bazel startup and analysis on cold hosts. What pushed us toward a truly fast experience was leveraging two BuildBuddy Cloud capabilities that address this overhead directly.

BuildBuddy Workflows: Warm Bazel Hosts via Rolling microVM Snapshots

BuildBuddy Workflows keeps Bazel host runners warm by spinning them up from rolling microVM snapshots. In practice, this means the runner is already in a known-good state with the right dependencies and tooling, reducing environment setup and build startup time to milliseconds.
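Workflows are configured with a `buildbuddy.yaml` file checked into the repository. The sketch below shows the general shape of such a config (action names, branches, and targets here are illustrative, not Trigo's actual configuration):

```yaml
# buildbuddy.yaml — illustrative Workflows config
actions:
  - name: "Test all targets"
    triggers:
      push:
        branches: ["main"]
      pull_request:
        branches: ["*"]
    bazel_commands:
      - "test //..."
```

Because the runner resumes from a snapshot that already has the repository checked out and Bazel's analysis cache warm, the action skips straight to the incremental work for the change under test.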

BuildBuddy RBE: Remote Build Execution at (Effectively) Infinite Scale

BuildBuddy RBE provides Bazel Remote Build Execution on an elastically scaled, BuildBuddy-hosted cluster, co-located with cache servers. This setup reduces end-to-end build time by aggressively parallelizing workloads while minimizing latency between executors and cached artifacts.
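Pointing Bazel at BuildBuddy's executors is again a `.bazelrc` change. The flags below follow BuildBuddy's standard setup; the `--jobs` value is illustrative and should be tuned to the workload:

```
# .bazelrc — sketch of a BuildBuddy RBE configuration
build --bes_backend=grpcs://remote.buildbuddy.io
build --bes_results_url=https://app.buildbuddy.io/invocation/
build --remote_cache=grpcs://remote.buildbuddy.io
build --remote_executor=grpcs://remote.buildbuddy.io
build --jobs=200
```

Raising `--jobs` well above local core counts is what unlocks the parallelism: Bazel schedules hundreds of actions at once, and the co-located cache means each executor fetches inputs with minimal latency.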

Dramatic Build Performance Improvements

The impact was significant. End-to-end (E2E) tests, which previously took between 40 minutes and an hour, were reduced to approximately 10 minutes. 


This acceleration enables faster validation of complex changes and improves overall development velocity, critical when deploying systems that must operate reliably in live retail environments.

For day-to-day development workflows, the improvements are even more striking. Builds that previously took around 30 minutes can now complete in just a few seconds on a warm, cache-hit path. This shift has transformed the developer experience, making iteration cycles dramatically faster and more efficient.

To close the loop, Bazel’s hermeticity closely mirrors the principles behind Trigo’s Loss Prevention Vision AI platform. Just as Bazel ensures builds are reproducible and free from hidden state, our system creates a single, reliable source of truth for in-store activity—eliminating ambiguity around loss events through real-time, verifiable signals from CCTV and point-of-sale data. In both cases, hermeticity is about trust at scale: removing uncertainty, preventing silent failures, and ensuring every outcome is consistent, observable, and dependable.
