Troubleshooting Runbook
Help - Something isn't working correctly!
The steps below are not comprehensive for every scenario, but they provide structure for gathering the information needed to triage an issue and engage support.
Building a Support Bundle
Gathering enough information about the environment is crucial. To that end, the following information is needed at a minimum:
- Currently deployed Server Image (like `2024.0712.153544-95268ef`)
- Currently deployed UI Image (like `2024.07.25.4c646abef-fbf3b78` or `2024.07.25.4c646abef-fbf3b78-subpath`)
- Current client version (like `2.19.x`)
  - This can differ between a local test environment, a build-time environment, and a runtime environment
- Redacted `values.yaml`
  - All sensitive items can be redacted; however, the general config is useful to understand the deployment configuration
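As a rough sketch, the image versions can be read from the cluster with `kubectl`, and `values.yaml` can be given a first redaction pass with `sed`. The function names `list_images` and `redact_values` are hypothetical, the `prefect` namespace is assumed from the log command used elsewhere in this runbook, and the key patterns (`password`, `secret`, `token`) are assumptions. Always review the redacted file by hand before sharing it.

```shell
# Print each Deployment in the namespace with its container image(s).
list_images() {
  ns="${1:-prefect}"
  kubectl get deployments -n "$ns" \
    -o custom-columns='NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image'
}

# First-pass redaction: mask scalar values whose key name contains
# password, secret, or token. Nested blocks, list items, and other
# sensitive keys are NOT handled, so review the output manually.
redact_values() {
  sed -E 's/^([[:space:]]*[A-Za-z0-9_.-]*([Pp]assword|[Ss]ecret|[Tt]oken)[A-Za-z0-9_.-]*:[[:space:]]*)[^[:space:]].*$/\1REDACTED/' "$1"
}
```

For example, `list_images prefect` lists the deployed images and `redact_values values.yaml > values.redacted.yaml` writes a redacted copy. The client version can be read separately in each environment with `prefect version`.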
UI Based Issues
If you're experiencing problematic behavior with the UI, please capture a .har session when loading the problematic page.
For best results, clear the session so there is no traffic, begin recording, then reload / navigate to the page.
This minimizes the number of network calls captured and makes the failing requests easy to spot.
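Once a capture is saved, one quick way to find the failing calls is to filter the HAR file, which is JSON, for error responses. This sketch assumes `jq` is installed; the function name `har_errors` and the file name `session.har` are placeholders.

```shell
# List requests in a HAR capture that failed: HTTP status >= 400,
# or status 0, which browsers often record for blocked or aborted requests.
har_errors() {
  jq -r '.log.entries[]
         | select(.response.status >= 400 or .response.status == 0)
         | "\(.response.status) \(.request.method) \(.request.url)"' "$1"
}
```

Running `har_errors session.har` prints one line per failing request with its status, method, and URL, which helps identify the backend service involved.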
Service Logs
While the UI might seem to be the problem, it is only a front end that makes API calls to the backend.
It is usually possible to tell which service is problematic from the request URI and the matching route in your Istio VirtualService.
For example, if logs are failing to load, we might look at the `logs` service.
If a user is unable to log in for any reason, we might consider the `auth` service logs.
You can review the various services and their purpose here
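To map a failing URI to a service, the routes can be read directly from the cluster. This is a minimal sketch that assumes the VirtualService lives in the `prefect` namespace (it may be deployed elsewhere) and uses a hypothetical helper name, `show_routes`.

```shell
# Dump the Istio VirtualService definitions so the http match rules
# can be compared against the failing request path.
show_routes() {
  ns="${1:-prefect}"
  kubectl get virtualservices.networking.istio.io -n "$ns" -o yaml
}
```

In the output, each `http` entry pairs `match` rules (such as a `uri` prefix) with a `route` whose `destination.host` names the backing service.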
First, review and save a copy of the logs from the currently running containers for that service.
As there can be multiple replicas running, either collect logs from every container for that service, or first determine which replica is reporting errors.
# #### will be randomized per container instance
kubectl logs -n prefect auth-#### > auth.log
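When several replicas are running, the collection can be looped over every matching pod. The sketch below assumes pod names begin with the service name followed by a hyphen, as in `auth-####` above; the function name `collect_logs` is hypothetical.

```shell
# Save logs from every pod whose name starts with "<service>-".
collect_logs() {
  service="$1"
  ns="${2:-prefect}"
  for pod in $(kubectl get pods -n "$ns" -o name | grep "^pod/${service}-"); do
    name="${pod#pod/}"
    # --all-containers includes sidecars; add --previous for a crashed container.
    kubectl logs -n "$ns" "$name" --all-containers > "${name}.log"
  done
}
```

For example, `collect_logs auth` writes one `auth-<suffix>.log` file per replica into the current directory.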
If no errors are identified, we can modify the deployment to increase the logging level with the following environment variables.
These are set on the Kubernetes deployment, so they must either be in place before the issue occurs or be applied afterward, which causes new pods to cycle in.
# Overrides the Prefect logging configuration
- name: PREFECT_CLOUD_DISABLE_LOGGING_CONFIG
  value: 'true'
# Allows database queries to be displayed to stdout
- name: PREFECT_CLOUD_EVENTS_DATABASE_ECHO
  value: 'true'
# Sets the logging level to debug
- name: PREFECT_CLOUD_LOGGING_LEVEL
  value: 'DEBUG'
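For a temporary change, the same variables can be applied to a running deployment with `kubectl set env`, which triggers a rollout of new pods. This sketch assumes the Deployment is named after the service (for example `auth`), which may not match your installation; `enable_debug_logging` is a hypothetical helper name.

```shell
# Apply the debug logging variables to one Deployment and wait for
# the replacement pods to become ready.
enable_debug_logging() {
  service="$1"
  ns="${2:-prefect}"
  kubectl set env "deployment/${service}" -n "$ns" \
    PREFECT_CLOUD_DISABLE_LOGGING_CONFIG=true \
    PREFECT_CLOUD_EVENTS_DATABASE_ECHO=true \
    PREFECT_CLOUD_LOGGING_LEVEL=DEBUG
  kubectl rollout status "deployment/${service}" -n "$ns"
}
```

Changes made this way are likely to be overwritten the next time the chart is applied, so for a lasting change set the variables in `values.yaml` instead.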
Some issues and services might have a chain of dependencies, and one error can cascade into another.
`auth` is a common dependency for `nebula`.
`logs` is a common dependency for `ladler`.
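When a cascade is suspected, comparing error counts across the collected log files can suggest where to look first. The sketch below assumes errors appear in the logs as lines containing `error`, `exception`, or `traceback`; that pattern is a guess about the log format, and `count_errors` is a hypothetical helper name.

```shell
# Count lines that look like errors in each collected log file.
# -H forces the file name to be printed even for a single file.
count_errors() {
  grep -H -c -i -E 'error|exception|traceback' "$@"
}
```

For example, `count_errors auth-*.log nebula-*.log` prints one `file:count` line per log. If both `auth` and `nebula` report errors, start with `auth`, since `nebula` depends on it.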
Engaging Support
With the information at hand, please engage your Self-Managed support representatives to assist and triage further if necessary.