Skip to content

An Introduction to debugging OKD release artifacts

by Denis Moiseev and Michael McCune

During the course of installing, operating, and maintaining an OKD cluster it is natural for users to come across strange behaviors and failures that are difficult to understand. As Red Hat engineers working on OpenShift, we have many tools at our disposal to research cluster failures and to report our findings to our colleagues. We would like to share some of our experiences, techniques, and tools with the wider OKD community in the hopes of inspiring others to investigate these areas.

As part of our daily activities we spend a significant amount of time investigating bugs, and also failures in our release images and testing systems. As you might imagine, to accomplish this task we use many tools and pieces of tribal knowledge to understand not only the failures themselves, but the complexity of the build and testing infrastructures. As Kubernetes and OpenShift have grown, there has always been an organic growth of tooling and testing that helps to support and drive the development process forward. To fully understand the depths of these processes is to be actively following what is happening with the development cycle. This is not always easy for users who are also focused on delivering high quality service through their clusters.

On 2 September, 2022, we had the opportunity to record a video of ourselves diving into the OKD release artifacts to show how we investigate failures in the continuous integration release pipeline. In this video we walk through the process of finding a failing release test, examining the Prow console, and then exploring the results that we find. We explain what these artifacts mean, how to further research failures that are found, and share some other web-based tools that you can use to find similar failures, understand the testing workflow, and ultimately share your findings through a bug report.

To accompany the video, here are some of the links that we explore and related content:

Finally, if you do find bugs or would like report strange behavior in your clusters, remember to visit issues.redhat.com and use the project OCPBUGS.