Infrastructure Services¶
Self Managed Prefect requires the following infrastructure to be in place.
Load Balancer¶
A load balancer provides high traffic throughput, multi-zone availability, and SSL termination, and it integrates with an ingress controller.
Either a Layer 4 or a Layer 7 load balancer can be used. For Prefect to function, the load balancer must provide SSL termination and host-based path routing to the Traefik service mesh.
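As a rough illustration of host-based routing, the sketch below maps the Host header of a request to a backend. The hostnames and the backend names (`prefect-api-ingress`, `prefect-ui-ingress`) are hypothetical placeholders, not names taken from the Prefect installation; in a real deployment the load balancer and ingress controller perform this step.

```python
from typing import Optional

# Hypothetical mapping of public hostnames to ingress backends.
ROUTES = {
    "api.example.com": "prefect-api-ingress",
    "app.example.com": "prefect-ui-ingress",
}


def pick_backend(host_header: str) -> Optional[str]:
    """Return the backend for a request's Host header, or None if unknown."""
    # Drop an optional ":port" suffix and normalise case, since
    # hostnames are case-insensitive.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return ROUTES.get(hostname)
```

A request for an unknown host returns `None`, which a real router would typically turn into a 404 or a default backend.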
Bringing your own service mesh? That's OK, as long as you route API and UI traffic to the appropriate ingress resources with Traefik.
Why Traefik? What about Istio, or another service mesh? Given the complexity and nature of the offering, we understand that your environment might have different requirements. Traefik handles authorization and service routing with a much lighter footprint and does not require additional knowledge or support.
DNS¶
Two distinct Fully Qualified Domain Names are required for Self Managed Prefect.
api.<domain>
- Used for API interactions, such as those from Workers and clients.
app.<domain>
- How you can access the UI via the web.
These domains will be used for the Common Names and/or Subject Alternative Names of the SSL certificates, in addition to the CLOUD_UI_URL and CLOUD_API_URL values.
Using your existing DNS provider, create two DNS records, one each for the api and app sub-domains, that route traffic to the ingress.
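The two hostnames can be derived from a base domain and checked for resolution with a short script. This is a minimal sketch using only the Python standard library; the function names are illustrative and `example.com` is a placeholder for your own domain.

```python
import socket


def required_hostnames(domain: str) -> dict:
    """Build the api and app FQDNs from a base domain."""
    # Normalise case and drop surrounding whitespace or a trailing dot.
    domain = domain.strip().strip(".").lower()
    return {"api": f"api.{domain}", "app": f"app.{domain}"}


def resolves(hostname: str) -> bool:
    """Return True if the hostname currently resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, 443)) > 0
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when lookup fails.
        return False
```

Calling `resolves(fqdn)` for each value returned by `required_hostnames("example.com")` gives a quick confirmation that both records exist before certificates are requested.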
SSL Termination¶
SSL Certificates are required to properly secure the client interactions with the API, even in a self-managed scenario.
Many guides, Certificate Authorities, and vendors are available for issuing and signing certificates, so this section describes the requirements rather than a specific procedure.
Certificate Management¶
Certificates can be requested through a vendor such as DigiCert or Let's Encrypt.
The Prefect API requires a publicly signed SSL certificate for clients to trust it by default.
A self-signed certificate can be used; however, because the Prefect Client (the open-source Prefect package) ships with the certifi package, a self-signed certificate will not be trusted by default. To ensure maximum availability and support, a publicly signed certificate is recommended.
Certificate expiration periods should be set in accordance with internal security policies.
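To keep an eye on expiration, a check along the following lines can be scheduled. It is a sketch using only the Python standard library, with illustrative function names. Note that `ssl.create_default_context` trusts the system certificate store (or only the given `cafile`), whereas the Prefect Client uses certifi, so a successful check here does not guarantee that the client will trust the same certificate.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(hostname: str, port: int = 443,
                    cafile: Optional[str] = None) -> str:
    """Connect with certificate verification and return the notAfter field.

    Pass `cafile` to trust a private CA instead of the system defaults.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("api.example.com"))` returns the remaining lifetime in days, where `api.example.com` is a placeholder for your API hostname. The connection fails with `ssl.SSLCertVerificationError` if the certificate is not trusted.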
Kubernetes Cluster¶
Kubernetes is the core execution platform required for Prefect. The suggested node size is 8 vCPU and 16 GiB of memory per node, with a minimum of 3 nodes.
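Taken together, the suggested minimum works out to 24 vCPU and 48 GiB of memory across the cluster. The sketch below encodes that arithmetic and compares a candidate node pool against it; the `NodePool` class and `meets_suggestion` function are illustrative names, not part of any Prefect or Kubernetes tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    nodes: int
    vcpu_per_node: int
    memory_gib_per_node: int

    @property
    def total_vcpu(self) -> int:
        return self.nodes * self.vcpu_per_node

    @property
    def total_memory_gib(self) -> int:
        return self.nodes * self.memory_gib_per_node


# The suggested sizing from the text: 3 nodes of 8 vCPU / 16 GiB each.
SUGGESTED = NodePool(nodes=3, vcpu_per_node=8, memory_gib_per_node=16)


def meets_suggestion(pool: NodePool) -> bool:
    """True if the pool matches or exceeds the suggested per-node size and node count."""
    return (
        pool.nodes >= SUGGESTED.nodes
        and pool.vcpu_per_node >= SUGGESTED.vcpu_per_node
        and pool.memory_gib_per_node >= SUGGESTED.memory_gib_per_node
    )
```

The check compares per-node size and node count separately, so two very large nodes still fail it because the text asks for at least three nodes.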
Istio¶
Istio is responsible for managing the networking service mesh and auth routes so that traffic is routed properly. This includes traffic from the routed ingress to the nebula service, auth service, ui service, and server service.
This is configured and deployed as part of the Prefect installation through Helm. No additional configuration should be necessary.
Redis¶
Redis is used to manage arq, or background worker, tasks. Several instances are in place for the various services to use; they facilitate asynchronous execution of operations that are not time sensitive.
Cache¶
The cache table is used primarily for values that are naturally constant and that would incur a significant performance loss if retrieved from the database.
The auth service makes heavy use of the cache table to determine which users are actively logged in and which require re-authentication.
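The general pattern described here (serve a value from the cache, and fall back to the slower source only on a miss or after expiry) can be sketched as follows. This is an in-memory stand-in written for illustration, not the Redis-backed implementation; the class name, the time-to-live behaviour, and the `loader` callback are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a cache with per-entry expiry (not Redis itself)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, cached value)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], str]) -> str:
        """Return the cached value, calling `loader` only on a miss or expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader(key)  # e.g. a database query in a real service
        self._entries[key] = (now + self._ttl, value)
        return value
```

The injectable `clock` makes expiry testable without sleeping; a real service would rely on the expiry support of its cache backend instead.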
Work-Pools¶
A dedicated instance for push-pool background work. This instance manages background tasks on push work-pools, which separates core orchestration for all users from the work of those primarily using push work-pools.
Triggers¶
A dedicated instance responsible for Automation triggers and background tasks.
PostgreSQL¶
PostgreSQL (and the various cloud services that offer it) stores the state and metadata for the application. Three databases are in use:
Database | Purpose |
---|---|
Events | Logs and Actions |
Nebula | Authorization, RBAC, Workspaces, Accounts |
Server | Flows, Tasks, Deployments |
Events DB¶
Stores events data. The primary purpose is to provide a performant datastore for recent events data, with most queries returning in roughly 200 ms or less.
Nebula DB¶
The Nebula database stores information about users, accounts, teams, etc., which is used to control access to server data.
Both the nebula and auth services need access to the Nebula database.
Server DB¶
The server database stores information about flows, flow runs, task runs, etc.
The server database is accessed by the server service, as well as many other background services.
This data is separated into “workspaces” using a workspace_id column.
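Separating data by a workspace_id column means that every query filters on that column. The sketch below shows the idea with an in-memory SQLite database standing in for PostgreSQL; the `flow` table, its columns other than workspace_id, and the sample rows are hypothetical and do not reflect the actual Prefect schema.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical `flow` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flow ("
        "id INTEGER PRIMARY KEY, "
        "workspace_id TEXT NOT NULL, "
        "name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO flow (workspace_id, name) VALUES (?, ?)",
        [("ws-a", "etl"), ("ws-a", "report"), ("ws-b", "billing")],
    )
    return conn


def flows_for_workspace(conn: sqlite3.Connection, workspace_id: str) -> list:
    """Return flow names for one workspace, using a bound parameter."""
    rows = conn.execute(
        "SELECT name FROM flow WHERE workspace_id = ? ORDER BY name",
        (workspace_id,),
    )
    return [name for (name,) in rows]
```

Binding workspace_id as a parameter, rather than formatting it into the SQL string, keeps the filter safe against injection and ensures rows from other workspaces are never returned.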