Bring Your Own Models
Run Secbez with any LLM — managed providers, OpenAI-compatible endpoints, or your own open-source serving stack.
Enterprise deployments are not tied to any single LLM provider. Secbez defines a structured contract for every reasoning step, and the routing layer can dispatch each step to whatever model you point it at.
Provider integrations
Secbez integrates with the major hosted LLM providers and with the common open-source serving stacks. Exact wiring (keys, endpoints, regional details) is scoped per engagement.
| Provider | Notes |
|---|---|
| OpenAI | Direct API or any OpenAI-compatible endpoint |
| Azure OpenAI | Per-deployment routing supported |
| Anthropic | Claude family |
| AWS Bedrock | Bedrock-hosted model families |
| Google Vertex AI | Vertex-hosted model families |
| vLLM | Open-source serving |
| Text Generation Inference (TGI) | Open-source serving |
| Ollama | Convenient for single-node and developer deployments |
| llama.cpp / llamafile | CPU and Apple Silicon |
| TensorRT-LLM (Triton) | NVIDIA-optimized |
| Custom HTTP | Any internal model gateway with a documented contract |
You can mix hosted models for one part of the pipeline with local open-source models for the rest. Fallback chains are also supported, so a primary model that times out can fall back to a secondary without interrupting the scan.
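As an illustration of mixed routing, the sketch below maps pipeline steps to ordered model chains, where the first entry is the primary and later entries are fallbacks. The step names, provider labels, model names, and the `chain_for` helper are all hypothetical. This is not the Secbez configuration format, which is scoped per engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTarget:
    provider: str  # illustrative label, e.g. "anthropic" or "vllm"
    model: str     # hypothetical model or deployment name


# Hypothetical routing table: each step lists its primary model first,
# followed by fallbacks in the order they should be tried.
ROUTES = {
    "triage": [
        ModelTarget("vllm", "local-small"),
        ModelTarget("ollama", "local-tiny"),
    ],
    "explain": [
        ModelTarget("anthropic", "hosted-large"),
        ModelTarget("vllm", "local-small"),
    ],
}

# Chain used for any step that has no explicit route.
DEFAULT_CHAIN = [ModelTarget("vllm", "local-small")]


def chain_for(step: str) -> list:
    """Return the ordered model chain for a step, or the default chain."""
    return list(ROUTES.get(step, DEFAULT_CHAIN))
```

Here `chain_for("explain")` would try a hosted model first and a local model second, while an unrouted step falls through to the default local chain.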
Choosing a model
Secbez's reasoning contracts are structured — models must reliably produce JSON in the requested schema. We don't publish a fixed model recommendation list, because the right choice depends on your region, compliance constraints, hardware, and cost profile.
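To make "reliably produce JSON in the requested schema" concrete, here is a minimal sketch of a strict adherence check using only the Python standard library. The schema fields (`rule_id`, `confirmed`, `explanation`) and the `parse_finding` helper are assumptions made for illustration, not the actual Secbez contract.

```python
import json
from typing import Optional

# Hypothetical contract: field name -> required Python type.
FINDING_SCHEMA = {"rule_id": str, "confirmed": bool, "explanation": str}


def parse_finding(raw: str) -> Optional[dict]:
    """Return the parsed object if it matches FINDING_SCHEMA exactly, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    # Require a JSON object with exactly the expected keys.
    if not isinstance(data, dict) or set(data) != set(FINDING_SCHEMA):
        return None
    # Require each field to have the expected type.
    for field, expected_type in FINDING_SCHEMA.items():
        if not isinstance(data[field], expected_type):
            return None
    return data
```

A model that wraps its answer in prose, drops a field, or returns a number where a string is expected fails this check, which is the kind of behavior a schema-adherence metric would count against it.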
What we do instead: as part of an Enterprise engagement, we work with your team to select models that fit your environment, validate them against the Secbez evals harness (precision, recall, JSON-mode adherence, cost-per-confirmed-finding), and configure routing accordingly.
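The four metrics named above can be computed from simple counts. The sketch below is one plausible way to do so and is not the Secbez evals harness itself. It assumes that a "confirmed finding" corresponds to a true positive, and the `EvalRun` fields and `summarize` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    true_positives: int          # findings the model flagged that were confirmed
    false_positives: int         # findings the model flagged that were not real
    false_negatives: int         # real findings the model missed
    schema_valid_responses: int  # responses that matched the requested schema
    total_responses: int
    total_cost_usd: float


def summarize(run: EvalRun) -> dict:
    """Compute precision, recall, JSON adherence, and cost per confirmed finding."""
    tp, fp, fn = run.true_positives, run.false_positives, run.false_negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    adherence = (
        run.schema_valid_responses / run.total_responses
        if run.total_responses
        else 0.0
    )
    # Assumption: confirmed findings == true positives; infinite if there are none.
    cost_per_confirmed = run.total_cost_usd / tp if tp else float("inf")
    return {
        "precision": precision,
        "recall": recall,
        "json_adherence": adherence,
        "cost_per_confirmed_finding": cost_per_confirmed,
    }
```

Comparing candidate models on these numbers side by side shows why raw call count can mislead: a cheaper model with low precision may still cost more per confirmed finding.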
Latency, throughput, and degradation
- The scan pipeline is concurrent. Per-call latency dominates end-to-end scan time only when a step sits on the critical path.
- The model gateway implements per-step timeouts and fallback chains. If your primary model is unavailable, a secondary picks up; if all are unavailable, the scan degrades to deterministic fallback explanations and the gate decision is unaffected.
- Cost per confirmed finding is the metric we recommend tracking — not raw call count.
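The timeout and degradation behavior described in the list above can be sketched as follows. This is a simplified illustration, not the actual gateway: it assumes each model client is a callable that enforces its own timeout and raises `TimeoutError` when it is exceeded, and the names `run_step`, `StepResult`, and `deterministic_explanation` are hypothetical.

```python
from typing import Callable, NamedTuple, Sequence

# Assumed client shape: (prompt, timeout_seconds) -> response text.
# Each client is assumed to raise TimeoutError if it exceeds the timeout.
ModelCall = Callable[[str, float], str]


class StepResult(NamedTuple):
    text: str
    degraded: bool  # True when no model answered and the fallback text was used


def deterministic_explanation(prompt: str) -> str:
    """Hypothetical non-LLM fallback text used when every model is unavailable."""
    return "Model explanation unavailable; refer to the rule documentation."


def run_step(prompt: str, chain: Sequence[ModelCall], timeout_s: float) -> StepResult:
    """Try each model in order; degrade to a deterministic explanation if all fail."""
    for call in chain:
        try:
            return StepResult(call(prompt, timeout_s), degraded=False)
        except (TimeoutError, ConnectionError):
            continue  # this model is slow or unreachable; try the next one
    # Only the explanation text degrades here. The gate decision is assumed
    # to be computed elsewhere and is not touched by this function.
    return StepResult(deterministic_explanation(prompt), degraded=True)
```

Returning a `degraded` flag rather than raising lets the scan continue and report which explanations came from the fallback path.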
See BYO GPU for compute guidance.