Kumo can run fully inside your Virtual Private Cloud (VPC/VNet) when you need tight control over identity, network paths, and dependencies. This option keeps the Kumo control plane, training stack, and storage in your cloud account while giving you choices about how much (if any) managed connectivity you allow back to Kumo. This deployment option is only available for Kumo Enterprise Deployments.
This page is written for heads of security and infrastructure who want to understand how the VPC deployment behaves technically—what runs in your boundary, what external services (if any) are required, and what your team must provision to install and operate it.
1. Architecture and isolation
Each customer receives a dedicated Kumo environment that lives inside their own cloud network:
- Control plane (UI, API, scheduler) and data plane services run in your Kubernetes cluster (EKS, AKS, or GKE) using customer-owned compute and storage.
- Model training and graph processing use autoscaled node pools: GPU nodes for training and high-memory nodes for the graph engine.
- Object storage in your account (S3/GCS/Blob) holds intermediate data, model artifacts, and application state.
- Container images can come from a Kumo-managed registry (via allowlisted pull paths or Private Link/Private Service Connect) or from a customer registry that you mirror. Air-gapped installs use an offline image bundle you import yourself.
No customer data or metadata leaves your environment unless you explicitly permit optional managed services (telemetry, orchestration).
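As a rough illustration of the training pool, an autoscaled GPU node group on EKS could be provisioned with Terraform along the following lines. This is a minimal sketch under assumed names: the variables, instance type, label, and pool sizes are placeholders rather than Kumo defaults, and equivalent constructs exist for AKS and GKE.
```hcl
# Illustrative only: an autoscaled GPU node group for training workloads.
# Cluster name, role, subnets, instance type, and label are placeholders.
resource "aws_eks_node_group" "kumo_gpu_training" {
  cluster_name    = var.cluster_name
  node_group_name = "kumo-gpu-training"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.private_subnet_ids

  ami_type       = "AL2_x86_64_GPU"
  instance_types = ["g5.2xlarge"]

  scaling_config {
    min_size     = 0   # scale to zero when no training jobs are running
    desired_size = 0
    max_size     = 8
  }

  # Keep non-training pods off the GPU nodes.
  taint {
    key    = "nvidia.com/gpu"
    value  = "true"
    effect = "NO_SCHEDULE"
  }

  labels = {
    "workload" = "kumo-training"  # placeholder label
  }
}
```
A high-memory node group for the graph engine would follow the same pattern with memory-optimized instance types.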
2. Identity and console access
Access to the Kumo console and APIs is integrated with your identity provider:
- SSO via SAML or OIDC is required; MFA and device posture are enforced by your IdP.
- Role-based access is managed in Kumo but sourced from your IdP groups; SCIM/JIT provisioning is available if you want automatic user lifecycle management.
- Administrative access for Kumo support can be provided through your VDI or vendor laptops during onboarding and optionally removed afterward.
3. Data flow and protection
The design principle is simple: your primary data stays in your platforms.
- Kumo connects to your warehouses/lakes (Snowflake, Databricks, BigQuery, S3, others) using least-privilege service accounts that you own. Connector guides specify the exact permissions.
- Training and scoring run inside your VPC/VNet; outputs are written back to destinations you configure (warehouse tables, object storage, or application sinks).
- All data in transit is protected with TLS. Secrets and model artifacts are encrypted at rest in your storage and KMS configuration. Direct file upload can be disabled if you want data to originate only from your own systems.
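For instance, a least-privilege role for an S3 source might be scoped to read a single bucket prefix, as in the sketch below. The bucket name, prefix, and actions are illustrative; the connector guides remain the authoritative source for the exact permissions each platform needs.
```hcl
# Illustrative only: read-only access to one source bucket and prefix.
data "aws_iam_policy_document" "kumo_s3_source_read" {
  statement {
    sid       = "ListSourceBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::example-source-bucket"]
  }

  statement {
    sid       = "ReadSourceObjects"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::example-source-bucket/raw/*"]
  }
}

resource "aws_iam_policy" "kumo_s3_source_read" {
  name   = "kumo-s3-source-read"
  policy = data.aws_iam_policy_document.kumo_s3_source_read.json
}
```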
4. Connectivity options (air-gapped to hybrid)
You can choose the egress model that matches your security posture:
- Fully air-gapped: No outbound internet access. You host the container registry, logging/metrics stack, and orchestration (e.g., Temporal) yourself. The only egress that may be required is to your authentication provider (e.g., Auth0, Okta, PingFed).
- Private Link / Private Service Connect: The Kumo control plane is hosted outside your VPC and connected via PrivateLink, while the data plane (Spark and GPU compute) remains within your VPC. This model is easier to maintain while keeping all data processing within your network boundary.
The connectivity choice does not change where data is processed; it only affects who hosts supporting services and how updates/telemetry flow.
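For the Private Link model on AWS, the connection to the hosted control plane is typically an interface endpoint inside your VPC. The sketch below is illustrative: the endpoint service name is supplied by Kumo during onboarding, and private DNS applies only if the endpoint service supports it.
```hcl
# Illustrative only: an interface endpoint toward the Kumo-hosted control plane.
# var.kumo_endpoint_service_name is provided by Kumo during onboarding.
resource "aws_vpc_endpoint" "kumo_control_plane" {
  vpc_id              = var.vpc_id
  service_name        = var.kumo_endpoint_service_name
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.kumo_endpoint.id]
  private_dns_enabled = true
}

# Only allow your private subnets to reach the endpoint over HTTPS.
resource "aws_security_group" "kumo_endpoint" {
  name   = "kumo-endpoint"
  vpc_id = var.vpc_id

  ingress {
    protocol    = "tcp"
    from_port   = 443
    to_port     = 443
    cidr_blocks = var.private_subnet_cidrs
  }
}
```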
5. Installation flow and customer responsibilities
The deployment follows an infrastructure-as-code approach (Terraform/Helm). Typical prerequisites:
- A Kubernetes cluster with autoscaling GPU and high-memory node pools, container runtime, and ingress controller. Admin access (or a delegated CI role) to install Kumo namespaces and controllers.
- An object storage bucket/prefix for intermediate data, artifacts, and application state, plus a KMS key for encryption (a provisioning sketch follows this list).
- A container registry path (customer- or Kumo-managed) reachable from the cluster based on your chosen connectivity model.
- Network rules to allow console access from your corporate network/VDI and to allow or block outbound paths per the chosen egress model.
- Service accounts and roles in each data platform you plan to connect (see connector guides for exact permissions).
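As a sketch of the storage prerequisite, the bucket and KMS key could be provisioned roughly as follows. The bucket name is a placeholder, and your organization's key policies and tagging standards would apply on top of this.
```hcl
# Illustrative only: customer-owned bucket and KMS key for Kumo state and artifacts.
resource "aws_kms_key" "kumo" {
  description         = "Encryption key for Kumo artifacts and application state"
  enable_key_rotation = true
}

resource "aws_s3_bucket" "kumo_state" {
  bucket = "example-kumo-state"  # placeholder name
}

# Encrypt everything in the bucket with the customer-managed key.
resource "aws_s3_bucket_server_side_encryption_configuration" "kumo_state" {
  bucket = aws_s3_bucket.kumo_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.kumo.arn
    }
  }
}

# Block any public access to the bucket.
resource "aws_s3_bucket_public_access_block" "kumo_state" {
  bucket                  = aws_s3_bucket.kumo_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```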
Install steps are typically:
- Provision the cluster/node pools, storage locations, and registry access.
- Install Kumo via Terraform/Helm into a dedicated namespace with autoscaling profiles for training and graph workloads.
- Configure SSO, network policies, and logging/metrics destinations (customer or Kumo-managed).
- Smoke test data ingestion → training → scoring with your data to confirm end-to-end paths.
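For the Helm install step above, a minimal Terraform sketch is shown below. The chart source, values file, and any autoscaling profile keys are placeholders; the actual chart location and values schema come from Kumo during onboarding.
```hcl
# Illustrative only: install the Kumo chart into a dedicated namespace.
# Chart source and values schema are provided by Kumo; paths here are placeholders.
resource "helm_release" "kumo" {
  name             = "kumo"
  namespace        = "kumo"
  create_namespace = true

  chart = var.kumo_chart_source  # e.g. a path in your mirrored or offline registry

  values = [
    file("${path.module}/kumo-values.yaml")  # SSO, node selectors, autoscaling profiles
  ]
}
```
In an air-gapped install, var.kumo_chart_source would point at the chart you imported from the offline bundle.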
6. Operations and lifecycle
- Upgrades are coordinated through your change management process; air-gapped installs require manual deployment of the offline image bundle, while Private Link/controlled egress installs can pull signed images from Kumo automatically.
- Observability can be pointed to your stack (Splunk, Datadog, CloudWatch/Log Analytics/Cloud Logging) or to Kumo-managed Grafana/Tempo endpoints via the approved connectivity model.
- Support can be delivered through VDI/vendor laptop access; no persistent Kumo access is required.
If you want to evaluate the Virtual Private Cloud deployment, your Kumo representative can map the connectivity model (air-gapped vs. Private Link) to your policies and share a tailored bill of materials for your cloud of choice.