
Troubleshooting Runbook

Help - Something isn't working correctly!

While not comprehensive, the steps below provide structure and guidance for gathering the information needed to triage and support an issue.

Building a Support Bundle

Gathering enough information about the environment is crucial. At a minimum, please provide the following:

  • Currently deployed Server image (for example, 2024.0712.153544-95268ef)
  • Currently deployed UI image (for example, 2024.07.25.4c646abef-fbf3b78 or 2024.07.25.4c646abef-fbf3b78-subpath)
  • Current client version (for example, 2.19.x)
    • This can differ between a local test environment, a build-time environment, and a runtime environment, so note the version used in each
  • A redacted copy of values.yaml
    • Redact all sensitive values; the general configuration is still useful for understanding how the deployment is set up
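The functions below are a minimal sketch for collecting these items from a shell. They assume the chart was installed with Helm into the `prefect` namespace and that the `prefect` CLI is available wherever the client runs; the release name `my-release` is a placeholder, and `redact_values` is a hypothetical, deliberately coarse filter whose output should still be reviewed by hand before sharing.

```shell
# POSIX sh helpers; source this file or paste the functions into a shell.
# Assumption: namespace "prefect" (override with the first argument).

# Print each Deployment with the image(s) it runs (includes the Server and UI images).
list_images() {
  kubectl get deployments -n "${1:-prefect}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}'
}

# Hypothetical redaction filter: masks the inline value of any YAML key whose name
# contains password/secret/token/key. It is line-based only: nested blocks, multi-line
# values, and name/value pairs (e.g. env lists) are NOT caught, so review the result.
redact_values() {
  sed -E 's/^([[:space:]]*(-[[:space:]]+)?[A-Za-z0-9_.-]*([Pp]assword|PASSWORD|[Ss]ecret|SECRET|[Tt]oken|TOKEN|[Kk]ey|KEY)[A-Za-z0-9_.-]*:)[[:space:]]+[^[:space:]].*$/\1 REDACTED/'
}

# Example usage ("my-release" is a placeholder for your Helm release name):
#   list_images prefect > images.txt
#   prefect version > client-version.txt    # repeat in each environment
#   helm get values my-release -n prefect -o yaml | redact_values > values-redacted.yaml
```

The redaction errs on the side of masking too much (any key containing "key" matches), which is usually the safer failure mode for a support bundle.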

UI-Based Issues

If the UI is misbehaving, please capture a HAR (.har) file of the browser's network activity while loading the problematic page.

Capturing a HAR file

For best results, clear the existing network log so no earlier traffic is captured, start recording, then reload or navigate to the page.
This minimizes the number of captured network calls and makes the failing requests easy to spot.

Service Logs

While the UI may appear to be the problem, it is only a front end that makes API calls to backend services.
It is usually possible to tell which service is at fault from the request URI and the matching route in your Istio VirtualService.
For example, if logs are failing to load, look at the logs service.
If a user is unable to log in for any reason, check the auth service logs.
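One way to map a failing request URI to a service is to list the route prefixes and destination hosts in the VirtualService. The sketch below assumes the `VirtualService` resources live in the `prefect` namespace and match on `uri.prefix`; routes that use `exact` or `regex` matches (or no match at all) print an empty prefix and are skipped by the helper. `list_routes` and `route_for_path` are hypothetical names, and `route_for_path` returns the first matching route, mirroring how Istio evaluates the HTTP routes of a VirtualService in order.

```shell
# POSIX sh helpers; assumption: Istio VirtualServices in namespace "prefect".

# Print "<prefixes> -> <destination hosts>" for every HTTP route, in declaration order.
list_routes() {
  kubectl get virtualservices.networking.istio.io -n "${1:-prefect}" \
    -o jsonpath='{range .items[*].spec.http[*]}{.match[*].uri.prefix}{" -> "}{.route[*].destination.host}{"\n"}{end}'
}

# Hypothetical helper: read list_routes output on stdin and print the first route
# whose prefix matches the given request path (e.g. a URL path taken from a HAR file).
route_for_path() {
  path="$1"
  while IFS= read -r line; do
    prefixes="${line%% -> *}"
    host="${line#* -> }"
    for p in $prefixes; do
      case "$path" in
        "$p"*) printf '%s -> %s\n' "$p" "$host"; return 0 ;;
      esac
    done
  done
  return 1
}

# Example usage:
#   list_routes prefect | route_for_path "<path of the failing request>"
```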

You can review the various services and their purpose here

First, review and save a copy of the logs from the currently running containers for that service.
Because multiple replicas can be running, either collect logs from all of the service's containers or first determine which one has errors.

# #### stands for the generated suffix, which differs for each pod
kubectl logs -n prefect auth-#### > auth.log
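To collect logs from every replica in one pass, a small loop over the service's pods works. This sketch assumes pod names start with the service name followed by a hyphen (as in `auth-####` above), a pattern that may also match other workloads sharing that prefix; `collect_service_logs` and `files_with_errors` are hypothetical helper names, and the error keywords are only a starting point.

```shell
# POSIX sh helpers; assumption: namespace "prefect" unless given.

# Save current (and, if available, previous) logs for every pod of a service.
collect_service_logs() {
  svc="$1"; ns="${2:-prefect}"
  for pod in $(kubectl get pods -n "$ns" -o name | grep "^pod/${svc}-"); do
    name="${pod#pod/}"
    kubectl logs -n "$ns" "$name" --all-containers=true > "${name}.log"
    # --previous returns logs from a restarted container; drop the file if there are none.
    if ! kubectl logs -n "$ns" "$name" --all-containers=true --previous \
         > "${name}.previous.log" 2>/dev/null; then
      rm -f "${name}.previous.log"
    fi
  done
}

# List saved log files in the current directory that mention common error keywords.
files_with_errors() {
  grep -liE 'error|exception|traceback' -- *.log
}

# Example usage:
#   collect_service_logs auth prefect
#   files_with_errors
```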

If no errors are identified, increase the logging level by setting the following environment variables.
They must be set on the service's Kubernetes Deployment, and because a running pod's environment is fixed, they must either exist in advance or be applied so that new pods cycle in.

# Overrides the Prefect logging configuration
- name: PREFECT_CLOUD_DISABLE_LOGGING_CONFIG
  value: 'true'
# Allows database queries to be displayed to stdout
- name: PREFECT_CLOUD_EVENTS_DATABASE_ECHO
  value: 'true'
# Sets the logging level to debug
- name: PREFECT_CLOUD_LOGGING_LEVEL
  value: 'DEBUG'
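Assuming `kubectl` access and a Deployment named after the service (for example `auth`, which is an assumption), `kubectl set env` can add these variables, which changes the pod template and triggers a rollout; the trailing `-` form removes them once debugging is done. Changes made this way are not recorded in your Helm values, so treat them as temporary. The function names below are hypothetical.

```shell
# POSIX sh helpers wrapping `kubectl set env`; deployment names are assumptions.

enable_debug_logging() {
  deploy="$1"; ns="${2:-prefect}"
  # Updating the pod template rolls out new pods; wait until they are ready.
  kubectl set env "deployment/${deploy}" -n "$ns" \
    PREFECT_CLOUD_DISABLE_LOGGING_CONFIG=true \
    PREFECT_CLOUD_EVENTS_DATABASE_ECHO=true \
    PREFECT_CLOUD_LOGGING_LEVEL=DEBUG &&
  kubectl rollout status "deployment/${deploy}" -n "$ns"
}

disable_debug_logging() {
  deploy="$1"; ns="${2:-prefect}"
  # A trailing "-" removes each variable from the pod template.
  kubectl set env "deployment/${deploy}" -n "$ns" \
    PREFECT_CLOUD_DISABLE_LOGGING_CONFIG- \
    PREFECT_CLOUD_EVENTS_DATABASE_ECHO- \
    PREFECT_CLOUD_LOGGING_LEVEL- &&
  kubectl rollout status "deployment/${deploy}" -n "$ns"
}

# Example usage:
#   enable_debug_logging auth prefect
#   # ...reproduce the issue and collect logs...
#   disable_debug_logging auth prefect
```

Note that `disable_debug_logging` removes these variables even if they were part of the original deployment configuration.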

Some issues span a chain of service dependencies, where one error cascades into another.
For example, nebula commonly depends on auth, and ladler commonly depends on logs.
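When an error may have cascaded, viewing the dependent service and its dependency side by side can help. The sketch below assumes each service runs as a Deployment of the same name (for example `nebula` and `auth`) and uses `kubectl logs --timestamps` to merge both streams into one roughly time-ordered view. `kubectl logs deployment/<name>` reads from a single pod of the Deployment, and the merge is a plain text sort on the timestamps, so ordering within the same second is approximate. `merge_service_logs` is a hypothetical name.

```shell
# POSIX sh helper: interleave timestamped logs from several Deployments,
# tagging each line with its source service.
merge_service_logs() {
  ns="$1"; shift
  for svc in "$@"; do
    # --timestamps prefixes each line with an RFC3339 timestamp and a space;
    # insert a [service] tag right after that timestamp.
    kubectl logs -n "$ns" "deployment/${svc}" --all-containers=true --timestamps \
      | sed "s/^\([^ ]*\) /\1 [${svc}] /"
  done | LC_ALL=C sort
}

# Example usage:
#   merge_service_logs prefect nebula auth > nebula-auth.merged.log
```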

Engaging Support

With this information in hand, please engage your Self-Managed support representatives to assist with further triage if necessary.