Refactoring

Last modified by chrisby on 2024/01/17 21:43

Refactoring means improving the structure of code without changing its behavior.

Tips and Principles

  • The goal of refactoring is to get clean code. Before you dive into refactoring, you should learn the characteristics of clean code so that you can distinguish bad code from clean code, and know what it takes to convert the former into the latter.
    • The ultimate goal is to speed up development. It is the developer's responsibility to decide whether a particular refactoring is worth the time, and to spend the employer's time economically. In practice, teams tend to refactor too little rather than too much, so in most cases you should do the refactoring.
  • Refactoring until the code is clean is mandatory. The earlier the refactoring is done, the greater the long-term benefit.
  • Refactoring requires extensive tests.
    • The cost of refactoring is the risk of introducing new bugs, which can be greatly reduced by a comprehensive test suite.
    • Stabilize untested code by adding tests. If you intend to change code that has no tests, the risk of introducing new bugs through refactoring is high. Therefore, the first step is to write tests that capture the behavior of the current code. The second step is to do the actual refactoring. If a bug is introduced, i.e. the behavior changes, the tests will immediately indicate the problem, so you can move on safely.
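
A minimal sketch of such behavior-capturing tests (often called characterization tests). The function `legacy_price` and its discount rule are hypothetical stand-ins for untested code; the tests record what the code does today, not what it ideally should do.

```python
import unittest


def legacy_price(quantity, unit_price):
    """Hypothetical legacy function whose current behavior we want to pin down."""
    total = quantity * unit_price
    if quantity >= 10:
        total = total - total * 0.1  # bulk discount, discovered by reading the code
    return round(total, 2)


class LegacyPriceCharacterizationTest(unittest.TestCase):
    """Captures the observed behavior before any refactoring starts."""

    def test_small_order_has_no_discount(self):
        self.assertEqual(legacy_price(2, 5.0), 10.0)

    def test_bulk_order_gets_ten_percent_discount(self):
        self.assertEqual(legacy_price(10, 5.0), 45.0)

    def test_zero_quantity_costs_nothing(self):
        self.assertEqual(legacy_price(0, 5.0), 0.0)
```

Run the tests (for example with `python -m unittest`) before and after every refactoring step; if one fails after a step, that step changed the behavior.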
  • Cleanup before change. If you intend to apply changes to bad code, clean it up first, so that you can apply the changes much more easily.
    • For example, suppose your task requires you to change many places in the code in the same way. That means one big code change all at once, which is error-prone and harder to debug. The better way is to first abstract the duplication in many small, simple steps, which are safe due to frequent testing and staging/committing via Git. After that, the original change is much easier to apply, and safer because there are fewer locations in the code to adjust.
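
As an illustration, assume two functions each used to contain their own copy of the same amount-formatting expression. After the cleanup, both delegate to one helper, so a later change to the display format touches a single line. All names here are hypothetical.

```python
def format_amount(value):
    """Single place that knows how amounts are displayed (extracted helper)."""
    return f"{value:.2f} EUR"


def invoice_line(name, price):
    # Previously contained its own copy of the formatting expression.
    return f"{name}: {format_amount(price)}"


def receipt_total(prices):
    # Previously contained a second copy of the same expression.
    return f"Total: {format_amount(sum(prices))}"
```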
  • Refactor in small steps. Whenever tests pass, you should stage or commit your changes using Git. If a bug is accidentally introduced, it is much easier to catch if the code changes were small, and rollbacks never undo much work. Doing lots of small commits is good practice. See also TDD.
  • Work in two distinct modes during development. Either add new functionality or refactor, but not both. This ensures that you do not mix up different concerns and that you do not encounter associated problems. When you are done in one mode, commit your changes and switch to the other mode.
    1. Add new functionality: Don't change existing code, leave other functionality untouched. Add new tests and functionality.
    2. Refactoring: Don't add new tests or new functionality, just improve existing functionality. Only add tests if you have overlooked something you have not yet tested. Having only one assertion per test is often a good guideline, but not a strict one, since it's sometimes practical to include multiple assertions within a single test if they are closely related and together validate a specific behavior.
  • Immediate Refactoring: Do not allow big messes to emerge. Clean up small messes as soon as they appear.
  • Rule of three: If you have to repeat a task for the third time, do a major refactoring to fix the problem once and for all.
  • Immediate Refactoring Implementation: When you have an idea for improvement, turn it into code right away. This practice not only frees up your mind, but also encourages new insights based on the improvement that might otherwise go unnoticed. By implementing your ideas quickly, you free up cognitive load for other thoughts.
  • Handling Major Changes: Refactoring should take anywhere from a few minutes to a few hours at most. Anything beyond that is considered a major change.
    • Small refactorings should be done immediately.
    • Major changes should be done by applying many small refactorings on the fly, whenever code that is part of the major change is touched anyway because of work on another feature. Apply these changes within the RGB cycle whenever you encounter code that needs to be changed in line with the larger refactoring goals. This means that partial refactorings will be in the main branch, or even in production, but this is fine as long as all the tests pass.
  • Refactoring should be spontaneous. It should happen on the fly and, ideally, help you with your current task; refactoring that paves the way for the actual new feature is called "preparing refactoring". Do not create refactoring schedules.
  • Sometimes refactor clean code: As requirements change, sometimes even perfectly clean code needs to be changed and re-refactored.
  • Preparing refactoring and feature implementation usually belong together, and therefore in the same branch.
  • Do not perform refactorings if
    • you run into a big mess that you can't easily refactor and that still works.
    • a rewrite is cheaper.
  • Handling API Changes: There may be a situation where you want to change a public API that clients depend on. A good approach is to simply add the new API, and when the clients are updated to work with the new API, the old API can be deleted.
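
A minimal sketch of this transition, assuming a hypothetical old function `get_user` and a new function `fetch_user`. Letting the old function delegate to the new one and emit a `DeprecationWarning` is an addition for illustration, not something the text above prescribes; it keeps both APIs consistent while clients migrate.

```python
import warnings


def fetch_user(user_id, *, include_inactive=False):
    """New API (hypothetical): keyword-only flag instead of a positional one."""
    # Fake lookup for the sketch: users with even ids count as active.
    user = {"id": user_id, "active": user_id % 2 == 0}
    if user["active"] or include_inactive:
        return user
    return None


def get_user(user_id, inactive=False):
    """Old API, kept only until all clients have migrated to fetch_user()."""
    warnings.warn(
        "get_user() is deprecated, use fetch_user() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return fetch_user(user_id, include_inactive=inactive)
```

Once no client calls `get_user` any more, it can be deleted without touching `fetch_user`.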
  • Dilemma of Refactoring Bad Legacy Code: Legacy code often lacks tests. Before you start refactoring it, you should stabilize it by adding tests. The problem is that code designed without tests is really hard to write tests for. This leads to a dilemma: Directly refactoring the code runs the risk of introducing new bugs due to lack of tests, and writing tests first is hard without prior refactoring because the code was designed without testing in mind. The best approach is probably to break the code into pieces that are connected by APIs, to which tests can more easily be added afterwards. Once the tests are written, the actual refactoring can be done. This splitting process is still risky because the changes are applied without tests, but less risky than refactoring the entire code without tests.
  • From a refactoring perspective, it is advantageous to use statically typed languages because they have better tools, e.g. renaming is much easier due to unambiguous types and naming. This can be a problem with dynamic languages, but the ultimate choice of language depends on more factors than just this.

Performance Optimization

  • Performance is a feature. Do not optimize performance if the software is already fast enough for the use case, as this is a waste of resources that could be better spent on something more valuable.
  • Make It Work. Make It Right. Make It Fast. This is the order in which you engineer code.
    • Make It Work: Write code that compiles and passes all tests.
    • Make It Right: Refactor until you have clean code, which is much easier to optimize for performance than bad code.
    • Make It Fast: Do the actual performance optimization.
  • Clean Code vs Performance
    • Dilemma: Code often cannot be clean and fast at the same time. Cleaning up code often makes it slower. Performance-optimizing clean code often makes it less clean.
    • Practical solution: Focus on the small, performance-critical snippets of code. Often, only small pieces of code have a large impact on overall software performance, and these can be located with a "profiler" tool. This means that it is usually sufficient to optimize these small snippets so that the majority of the code can remain clean.
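
One way to locate such snippets in Python is the standard-library profiler `cProfile`. The sketch below uses a deliberately naive, hypothetical workload (`slow_sum_of_squares`, called from `handle_request`) and returns a text report of the most expensive calls.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive hot spot (hypothetical example workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request():
    """Stands in for application code that calls the hot spot."""
    return slow_sum_of_squares(200_000)


def profile_report(func, top=5):
    """Run func under cProfile; return its result and a report of the costliest calls."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Calling `profile_report(handle_request)` and printing the report shows where the time goes; only the functions near the top of that list are candidates for optimization.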

Database Refactoring

  • Evolutionary database design: Prefer many small schema changes over a few large ones. As with code, each change should keep the database functional in conjunction with the application. In this way, many small migrations can be applied one after another.
  • You can distribute a refactoring across multiple releases. For example, to rename a column: add a column with the new name, synchronize the old and new columns, let the code use the new column, wait to see whether errors occur, and if none do, delete the old column. Doing this incrementally is especially important in a high-availability production system that does not tolerate errors.
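
The column rename can be sketched with SQLite as follows. The table `customer`, the columns `name` and `full_name`, and the use of triggers for synchronization are assumptions made for illustration; a real system would typically run these steps through its migration tool, one release at a time.

```python
import sqlite3


def step_1_add_new_column(conn):
    """Add the new column and backfill it from the old one."""
    conn.execute("ALTER TABLE customer ADD COLUMN full_name TEXT")
    conn.execute("UPDATE customer SET full_name = name")


def step_2_keep_columns_in_sync(conn):
    """Copy writes from clients that still use the old column."""
    conn.execute(
        """
        CREATE TRIGGER customer_sync_insert AFTER INSERT ON customer
        WHEN NEW.full_name IS NULL
        BEGIN
            UPDATE customer SET full_name = NEW.name WHERE id = NEW.id;
        END
        """
    )
    conn.execute(
        """
        CREATE TRIGGER customer_sync_update AFTER UPDATE OF name ON customer
        BEGIN
            UPDATE customer SET full_name = NEW.name WHERE id = NEW.id;
        END
        """
    )


def step_3_drop_old_column(conn):
    """Run only after all code uses full_name and no errors have shown up."""
    conn.execute("DROP TRIGGER customer_sync_insert")
    conn.execute("DROP TRIGGER customer_sync_update")
    # DROP COLUMN needs SQLite 3.35 or newer; older versions require a table rebuild.
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        conn.execute("ALTER TABLE customer DROP COLUMN name")
```

Between step 2 and step 3 the application is switched to read and write `full_name`, and the old column stays in place as a fallback until the new one has proven itself.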