A way to DevOps - About failure culture and the five whys


After my talks I am regularly asked how we managed to tear down the wall between Operations and Development. In this post I will try to explain how we were able to build the DevOps culture into the way we work.

One of the first questions I'm asked again and again is: How did you start the DevOps move, and how can we do it? Well, when we started about a year and a half ago, we were already using Docker in the operating environment, without our colleagues in the development department knowing about it and without our senior management knowing about it either. You may ask why, but the answer is simple: because it works. Let me explain this in a little more detail.

In my opinion, a lot of the people I know and have met over time are really scared of change. This is a syndrome I have observed ever since I started working in information technology, and it might have something to do with European culture, or more precisely with our failure culture. An example: You wrote an essay of maybe 500-600 words at school, you made 10 mistakes, and the grade you got was a five. So you didn't pass the test. You made ten mistakes in 500 words - that's only two percent, but your teacher and later your superiors will blame you for your mistakes! What I would like to say is: just look at what you were able to achieve - you wrote 490 correct words, 98 percent of your words were correct! You did a great job!

Most of the time, only failures are taken into account, and almost never the successes! If you are measured only by the number of mistakes you make for most of your life, you will lose the ability to see change as a chance rather than a risk!

The sad thing about this is that a lot of people say, on the one hand, that mankind learns and evolves by making mistakes, but on the other hand it is simply not allowed to make mistakes because you get blamed for them. Can you see the contradiction? In my team we try to keep an open mind about failures and try not to blame each other. That is certainly not always an easy task! It is an ongoing learning process that never ends, especially if you are implementing a DevOps culture - the first thing you have to establish is a positive failure culture!

Without a positive failure culture it is impossible to work the DevOps way!

That's why we chose to start the change inside my team. We had already established a positive way of handling failures and mistakes, and we were already managing the containers for the developers (but not with Docker). Therefore, at first nobody knew that we were already changing a lot of things in the background, because our pace was still the same and our average failure rate didn't change. In addition, we knew that we could handle the blame if something went wrong, and we also knew that we would get help from our friends on the internet if we needed it. But this was just the beginning, because at this point the developers were completely missing from the process, and therefore we could not call ourselves DevOps. If you would like to understand what it means to have a good failure culture, I recommend reading the following Tweet and then the Google document that was posted by GitLab.

If you would like to make a bigger change, you may have to be a little bit lucky, and you must be able to see the signs of the times. This was true in our case too. Back then, the developers were about to change the setup of an upcoming project to a more directly driven, agile style. That was the best chance we got to change the way we work together. The deal was simple: We, as operators, would provide a Docker-ready environment to allow fast development and changes, and in return we wanted to go all-in and use the whole range of CI/CD possibilities, including continuous deployment. But much more important, we would also accept each other's failures! This was the beginning of our DevOps working style, which has evolved into the GitLab-like way we work today. I would like to quote the GitLab post-mortem issue template here, as it sums up the respect we have for each other. This is not an advertisement for GitLab, but I like their spirit.

A root cause can never be a person, the way of writing has to refer to the system and the context rather than the specific actors. [by GitLab]

This failure culture is important because you will have problems and you will have outages too. You have to be sure that developers and operators can handle the situation without blaming each other. In our case we knew that we would make some belly landings on sandpaper, and we made a commitment not to fight each other when it happens, but to help each other instead. But this is easily said, and when the first problem arose, we had to calm down and think about what we had agreed beforehand.

There are some things which helped us to manage such situations. First, I read the book The Phoenix Project, the DevOps bible, and took from it the idea of how to make the work we do transparent. The four types of work are essential! We started to document the four types of work as issues on a digital Kanban board. That helped us to see what's going on and to focus on the task which must be done first. Second, we started to use the five whys method. This method is, by the way, not the best one you can use, but it is simple, and it helped us to focus on the problem and not on the person during an outage or an issue (see above).
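To make the five whys a bit more tangible, here is a minimal Python sketch of how such a record could look. It is not the tooling we used; the class `FiveWhys`, its fields, the `people` check and the example incident are all made up for illustration. The only idea it takes from the text is that every answer has to describe the system and the context, not a person.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    """Hypothetical record of one blameless five-whys session."""

    problem: str
    # Names that must not show up in an answer (assumption: a simple name list).
    people: set[str] = field(default_factory=set)
    answers: list[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> None:
        """Store the answer to the next 'why?', rejecting answers that name a person."""
        words = {w.strip(".,!?:;").lower() for w in answer.split()}
        if words & {p.lower() for p in self.people}:
            raise ValueError("Describe the system and the context, not a person.")
        self.answers.append(answer)

    def root_cause(self) -> str:
        """The last recorded answer is treated as the current root cause."""
        if not self.answers:
            raise ValueError("No answers recorded yet.")
        return self.answers[-1]

    def report(self) -> str:
        """Render the chain of whys as plain text, e.g. for an issue comment."""
        lines = [f"Problem: {self.problem}"]
        for number, answer in enumerate(self.answers, start=1):
            lines.append(f"Why #{number}: {answer}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Invented example incident, only to show the flow.
    session = FiveWhys("The web shop was down", people={"Alice", "Bob"})
    session.ask_why("The container ran out of disk space.")
    session.ask_why("The log files were never rotated.")
    session.ask_why("Log rotation is not part of the base image.")
    print(session.report())
```

The name check is deliberately naive. In practice the discipline comes from the people in the room; the code only shows the shape of the method: one problem, a chain of answers, and a root cause that is never a person.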

After some time had gone by, we decided to inform senior management about our success. Thanks to the continuous workflow with GitOps, the transparent work types on the Kanban board and, of course, the five whys, which show how we try to find the core issues behind problems, we were able to provide evidence that we work faster, more flexibly and more reliably. Furthermore, and in my opinion this is the key to the whole DevOps culture, we were able to show that DevOps is a continuous process too.

DevOps is not a one-shot wonder. If you start the process, you should know that you can never finish it, because it is a cycle! It will change the way you work again and again.


How to DevOps?

  • Establish a positive failure culture in your team (if you haven’t already)
  • If you can, start the technical change inside your own team
  • Keep your eyes open for a good chance to reach out to other teams (see the signs of the times)
  • Find people with the same spirit and commit together to a positive failure culture
  • Make your work transparent (four types of work)
  • Keep the method of the five whys in mind
  • Be open to changing everything at any time

That's all. I hope you find this post helpful. If so, let me know on Twitter (@m4r10k). Thanks!
