
How to Deal with Mistakes


Every mistake we make can lower our morale and efficiency, and it can affect our personal lives as well as our work. We may even avoid writing code for a long time, just as a person who has been in a car accident is afraid of getting back into a car. If we cannot manage this situation well, we may lose our jobs in the long run.

Why we make mistakes

There are many reasons why we make mistakes while coding. Some of them are:

We are humans and humans make mistakes. Even when they have the best intentions, humans are known to be unreliable. For example, one study of large internet services found that configuration errors by operators were the leading cause of outages, whereas hardware faults (servers or network) played a role in only 10–25% of outages.
The project may carry a lot of technical debt, which makes everything harder.
The project does not have enough tests. Testing is related to technical debt, but I want to deal with it in more detail separately. If the project does not have enough tests, you may be in big trouble. What’s worse is thinking you have enough tests when you do not. The project may have good coverage and every test may pass, yet many of the tests check the wrong cases. In that situation you trust the tests while coding, and after deployment you face the bitter truth.
The language in which the project is written may complicate things.
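
To make the point about passing tests that check the wrong cases more concrete, here is a minimal Python sketch. The function `apply_discount` and both test functions are hypothetical names invented for this illustration, and the bug is deliberate. The first test runs every line of the function and passes, but only because the chosen input hides the bug.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function with a deliberate bug, for illustration only."""
    # Bug: subtracts the raw percent instead of a percentage of the price.
    # Intended behavior: price * (1 - percent / 100)
    return price - percent


def test_passes_by_coincidence() -> bool:
    # 100 - 10 == 90, which happens to equal 10% off 100.
    # The code is fully covered and the test is green, but the case is badly chosen.
    return apply_discount(100.0, 10.0) == 90.0


def test_exposes_the_bug() -> bool:
    # 10% off 200 should be 180, but the buggy code returns 190.
    return apply_discount(200.0, 10.0) == 180.0


print(test_passes_by_coincidence())  # True
print(test_exposes_the_bug())        # False
```

Coverage only tells you that a line was executed, not that the assertion around it would catch a wrong result.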

How can we improve this situation?

How to avoid making mistakes

We will always make mistakes, and that's okay. After all, we learn by making mistakes, and small mistakes help us avoid bigger ones in the future. Here are some of the things that I learned by making mistakes and that help me avoid them.

Good nutrition, good sleep, good ventilation, etc.
Test thoroughly at all levels, from unit tests to whole-system integration tests and manual tests, and make sure the tests actually cover the right cases. Automated testing is widely used, well understood, and especially valuable for covering corner cases that rarely arise in normal operation.
Design systems in a way that minimizes opportunities for error. For example, well-designed abstractions, APIs, and admin interfaces make it easy to do “the right thing” and discourage “the wrong thing.” However, if the interfaces are too restrictive people will work around them, negating their benefit, so this is a tricky balance to get right.
Keep the documentation up to date.
Choose the programming language for the project carefully. You might wonder why this is relevant, but the language we use can reduce our chances of making mistakes without making things too difficult. See: Rust.
Pair programming.
Good code review process.
Good communication.
Good knowledge of the domain.
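
As one possible illustration of an interface that makes it easy to do "the right thing", here is a small Python sketch. The `Percent` type and the `apply_discount` function are hypothetical and not taken from any real project. The idea is that invalid values are rejected when they are created, and keyword-only arguments make it hard to swap the price and the discount by accident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percent:
    """Hypothetical value type: a percentage that is always within 0..100."""
    value: float

    def __post_init__(self) -> None:
        # Reject invalid values at construction time, not deep inside the business logic.
        if not 0 <= self.value <= 100:
            raise ValueError(f"percent must be between 0 and 100, got {self.value}")


def apply_discount(*, price: float, discount: Percent) -> float:
    # The bare * forces callers to name both arguments.
    return price * (1 - discount.value / 100)


print(apply_discount(price=200.0, discount=Percent(10)))

try:
    Percent(150)
except ValueError as error:
    print(f"rejected: {error}")
```

This is only a sketch of the trade-off mentioned above: the stricter the interface, the more mistakes it prevents, but also the more tempting it becomes to work around it.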

We will make many mistakes over the course of our careers. That is certain, so what we do after a mistake becomes very important.

How to manage the process after making a mistake

If the process after a mistake is not managed well, the person who made the mistake, then the users, and ultimately the company can all be badly affected.

That's why we have to be much more careful at this stage. But no matter how careful we are, we may feel pressured by users or in-house developers: "When will it be solved?", "Is it solved?", "This should be solved urgently.", "Still not solved?" Faced with such legitimate questions, we may panic and make more mistakes, and even worse, the experience may become traumatic.

We need to take some actions to relieve this pressure. Here are some of them:

First of all, make sure you understand the problem correctly; otherwise you may run into much more trouble. Set up detailed and clear monitoring, such as performance metrics and error rates, so that you can use these invaluable metrics when diagnosing the issue.
If the problem can be mitigated with a configuration change, without a redeployment, do that first. This way the fewest people are affected and you have enough time to solve the problem without stress.
If you have a rollback system, roll the application back to the previous version. Again, this limits the number of people affected and buys you time to solve the problem.
If it will take time to resolve the issue, notify all users and in-house developers who may be affected, even if you think they won't notice the issue. This shows that you are aware of the problem and that work is being done on it, which reduces later pressure. Do not forget to notify them again when the problem is resolved.
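
The first two actions above, watching the error rate and mitigating through configuration, could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the `ErrorRateMonitor` class, the `flags` dictionary (a stand-in for a config file or config service that can change without a redeployment), the `checkout` function, and the 0.5 threshold.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the error rate over the most recent requests (hypothetical helper)."""

    def __init__(self, window: int = 100) -> None:
        # True = request succeeded, False = request failed.
        self._outcomes = deque(maxlen=window)

    def record(self, ok: bool) -> None:
        self._outcomes.append(ok)

    def error_rate(self) -> float:
        if not self._outcomes:
            return 0.0
        return self._outcomes.count(False) / len(self._outcomes)


# Stand-in for a configuration source that can change without a redeployment.
flags = {"new_checkout": True}


def checkout(total: float) -> str:
    # The flag is read on every call, so flipping it takes effect immediately.
    if flags["new_checkout"]:
        return f"new flow: {total:.2f}"
    return f"old flow: {total:.2f}"


monitor = ErrorRateMonitor(window=10)
for ok in [True, False, False, True, False]:
    monitor.record(ok)

if monitor.error_rate() > 0.5:  # assumed threshold
    flags["new_checkout"] = False  # mitigate first, debug afterwards

print(monitor.error_rate())  # 0.6
print(checkout(25))          # old flow: 25.00
```

Whether the switch is flipped automatically, as here, or by a person looking at a dashboard is a design choice. The point is only that the broken path can be turned off quickly while the real fix is prepared.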

Before sending the pull request that solves the problem, if you think the tests covering the affected code are insufficient, you should definitely write a test. You may have made more than one mistake in the pull request that caused the issue. Moreover, in the worst case, the pull request you thought would fix the problem could cause bigger problems. You may think that you do not have enough time to write tests, but this is absolutely not true. If you let the stakeholders know that you need extra time, I think they will appreciate it.
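
A regression test for a fix might look like the following Python sketch. The scenario is invented: assume a hypothetical `average` function crashed in production on an empty list, and assume the agreed behavior for that case is to return 0.0. The first test pins the fixed behavior, and the second guards against the fix breaking the normal case.

```python
def average(values: list[float]) -> float:
    """Hypothetical function after the fix."""
    if not values:
        # Assumed desired behavior; before the fix this raised ZeroDivisionError.
        return 0.0
    return sum(values) / len(values)


def test_empty_input_does_not_crash() -> None:
    # Reproduces the input from the incident.
    assert average([]) == 0.0


def test_regular_input_still_works() -> None:
    # Guards against the fix breaking the normal case.
    assert average([2.0, 4.0]) == 3.0


test_empty_input_does_not_crash()
test_regular_input_still_works()
print("regression tests passed")
```

Writing the failing test first, before the fix, also confirms that you have understood the problem correctly.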

Once the pull request has been reviewed and merged and the application has been deployed, you will feel relieved, but you still need to make sure that the problem is really solved. Moreover, learn from your mistakes and allow quick and easy recovery from human errors, to minimize the impact in the case of a failure. For example, make it fast to roll back configuration changes, and roll out new code gradually (so that any unexpected bugs affect only a small subset of users).
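
One common way to roll out new code gradually is to assign each user to a stable bucket and enable the new code only for buckets below a chosen percentage. The Python sketch below assumes string user IDs, and the function name `in_rollout` is hypothetical. It is one possible approach, not a description of any particular deployment tool.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets."""
    # Hashing gives every user a stable bucket, so the same user
    # always gets the same answer for the same percentage.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


users = [f"user-{i}" for i in range(1000)]
enabled = sum(in_rollout(user, 10) for user in users)
print(f"{enabled} of {len(users)} users see the new code")
```

Because the bucket does not depend on the percentage, raising the percentage from 10 to 50 keeps the new code enabled for everyone who already had it and only adds new users.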

In this article, I wanted to tidy up the notes I made for myself and share them with you. I hope they will guide you in dealing with mistakes.