Your legacy system works. That’s the problem.
It generates revenue, customers depend on it, and nobody wants to touch the payment module that Dave wrote in 2017 before he left. Every feature request takes three times longer than it should. New hires stare at the codebase for weeks before making their first meaningful commit. And somewhere in a Slack channel, an engineer is typing the word “rewrite” for the hundredth time.
I’ve been in this exact situation at multiple companies. At AutoScout24, we inherited systems from acquisitions that were built with completely different philosophies, tech stacks, and quality standards. The temptation to burn it all down and start fresh is always there. But after years of leading platform and engineering teams through these transformations, I can tell you: evolution almost always beats revolution.
This article is an updated version of the one I published in 2019. What’s changed since then is how we evolve these systems. AI-assisted development, platform engineering, and modern observability have fundamentally shifted what’s possible. A refactoring effort that would have taken a team months in 2019 can now be measured in weeks — sometimes days.
Making the Invisible Visible
Before you can fix a legacy system, you need to understand how broken it actually is. “It feels slow” and “the code is messy” aren’t arguments that will get you budget. You need data.
The metrics that matter haven’t changed much, but the tools have gotten dramatically better:
Code health: Static analysis tools like SonarQube or CodeClimate can quantify your technical debt in hours. AI-powered code analysis — through tools like GitHub Copilot, Sourcegraph Cody, or even a well-prompted Claude session — can now identify architectural smells that static analysis misses entirely. Feed your codebase to an LLM and ask it to map coupling between modules. The results might surprise you.
Deployment pipeline: How long from merge to production? If the answer is “days” or “it depends on who’s deploying,” you have infrastructure debt. DORA metrics (deployment frequency, lead time, change failure rate, mean time to recovery) remain the gold standard here.
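If you're not sure where to start with DORA metrics, a rough snapshot takes very little code. The sketch below is purely illustrative — the record format and `dora_snapshot` helper are invented for this example; in practice you'd pull merge and deploy timestamps from your Git host and CD system APIs:

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records: (merge time, production deploy time, failed?).
# In a real setup these would come from your Git host and CD system APIs.
deployments = [
    (datetime(2025, 1, 6, 10, 0), datetime(2025, 1, 8, 9, 0), False),
    (datetime(2025, 1, 7, 14, 0), datetime(2025, 1, 10, 16, 0), True),
    (datetime(2025, 1, 9, 11, 0), datetime(2025, 1, 9, 15, 0), False),
]

def dora_snapshot(deploys, window_days=7):
    """Compute a rough DORA snapshot: lead time, frequency, change failure rate."""
    lead_times = [deployed - merged for merged, deployed, _ in deploys]
    median_lead_hours = median(lt.total_seconds() / 3600 for lt in lead_times)
    frequency_per_day = len(deploys) / window_days
    failure_rate = sum(1 for *_, failed in deploys if failed) / len(deploys)
    return {
        "median_lead_time_hours": median_lead_hours,
        "deploys_per_day": round(frequency_per_day, 2),
        "change_failure_rate": failure_rate,
    }

snapshot = dora_snapshot(deployments)
print(snapshot)
```

Even a crude version of this, run weekly, turns "it depends on who's deploying" into a trend line you can show leadership.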
Test coverage and runtime: Coverage below 60% in critical paths means you’re flying blind during refactoring. And if your test suite takes hours, forget about fast iteration — tests should enable speed, not slow it down.
Incidents and error rates: Track severity, frequency, and — critically — which components cause the most incidents. At one company I worked with, this data alone revealed that 80% of our production issues came from 15% of the codebase. That's where we focused first.
Team velocity trends: A gradual decline in velocity despite a stable team size is one of the clearest signals of compounding technical debt. Don’t just track it — make it visible to stakeholders who control the budget.
The goal isn’t to create a dashboard for the sake of dashboards. It’s to build a case. When you can show leadership that a specific component causes 40% of your incidents and slows feature delivery by 3x, the conversation shifts from “can we afford to modernize?” to “can we afford not to?”
Why Rewrites Still Fail
The rewrite pitch is seductive: throw away the mess, start clean, do it right this time. I’ve seen it proposed at every company I’ve worked at. I’ve also seen it fail spectacularly.
A mid-sized company I consulted with decided to rewrite their customer management system — a core product that had grown unwieldy over ten years. The plan was eighteen months. It took three years. During that time, the old system still needed maintenance, splitting the team’s focus and budget. By the time the rewrite shipped, business requirements had evolved so much that the new system already felt outdated.
The fundamental problem with rewrites hasn’t changed: you’re recreating every feature, every edge case, every hard-won piece of domain knowledge — while the world keeps moving. What has changed is that the alternative — incremental evolution — is now significantly more powerful thanks to AI tooling and platform engineering practices.
Here’s the question I always ask teams proposing a rewrite: If the same engineers who built the legacy system are building the replacement, what exactly will be different this time? Unless the answer involves fundamentally better practices and skills, you’re just creating next decade’s legacy system.
The Platform Engineering Angle
Here’s something that wasn’t part of the conversation in 2019: platform engineering has emerged as one of the most effective ways to evolve legacy systems at scale.
Instead of asking each team to independently modernize their corner of the codebase, you build paved paths — standardized, well-supported ways of doing things that make the modern approach easier than the legacy approach.
At AutoScout24, this means our platform team provides golden paths for compute (Kubernetes-based), observability, CI/CD, and security. When a team needs to modernize a service, they don’t start from scratch. They migrate to the paved path, which comes with built-in best practices, automated compliance checks, and operational support.
The magic of this approach is that it makes evolution the path of least resistance. Teams aren’t “paying down tech debt” — they’re adopting a better developer experience. The framing matters enormously for organizational buy-in.
Practically, this looks like:
Internal Developer Platforms (IDPs) that abstract infrastructure complexity. Tools like Backstage or Port provide a unified interface where teams can provision environments, deploy services, and access documentation — without needing to understand the underlying infrastructure evolution happening beneath them.
Standardized templates and scaffolding. When spinning up a new service (or migrating an old one), teams start from a template that already includes logging, monitoring, CI/CD configuration, and security baselines. This eliminates an entire category of infrastructure debt.
Automated compliance and governance. Instead of manual reviews and checklists, policy-as-code tools (OPA, Kyverno) ensure that every deployment meets your standards automatically. Legacy services that can’t pass these checks become visible candidates for modernization.
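The real tools have their own policy languages — OPA uses Rego, Kyverno uses YAML — but the core idea fits in a few lines. This Python sketch (policy names and manifest fields are invented for illustration) shows the shape: declarative rules evaluated automatically against every deployment manifest, with failures surfacing legacy services as modernization candidates:

```python
# Minimal illustration of the policy-as-code idea. Real tools (OPA's Rego,
# Kyverno's YAML policies) are far richer; this just shows the shape.

POLICIES = [
    ("containers must not run as root",
     lambda m: all(c.get("runAsNonRoot") for c in m["containers"])),
    ("every container needs resource limits",
     lambda m: all("limits" in c.get("resources", {}) for c in m["containers"])),
    ("service must declare an owning team",
     lambda m: bool(m.get("labels", {}).get("team"))),
]

def evaluate(manifest):
    """Return the names of all policies the manifest violates."""
    return [name for name, check in POLICIES if not check(manifest)]

legacy_manifest = {
    "labels": {"app": "payments"},  # no "team" label
    "containers": [{"runAsNonRoot": True, "resources": {}}],
}
violations = evaluate(legacy_manifest)
print(violations)
# A non-empty list flags this service as a modernization candidate.
```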
The platform engineering approach works because it acknowledges a hard truth: you can’t force teams to modernize. But you can make modernization so much easier than the alternative that it becomes the obvious choice.
AI as an Accelerator, Not a Silver Bullet
This is where things have changed the most since 2019 — and even since the early GenAI hype of 2023-2024. AI tooling has matured from “interesting toy” to “genuine force multiplier” for legacy system evolution.
Understanding legacy code. This is where AI shines brightest. Drop a poorly documented module into Claude, Copilot, or Cursor and ask for an explanation of the business logic, data flow, and hidden assumptions. What used to take a new team member weeks of archaeology now takes hours. I’ve seen engineers use AI to reverse-engineer undocumented APIs, map implicit dependencies, and generate documentation that didn’t exist before.
Refactoring at scale. AI-assisted refactoring tools can now handle transformations that would have been prohibitively tedious by hand: extracting interfaces, breaking up god classes, converting callback-heavy code to async/await patterns, even migrating between frameworks. The key is treating AI output as a first draft — always review, always test. But that first draft saves enormous time.
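To make the callback-to-async example concrete, here's the kind of mechanical transformation an AI assistant can draft in seconds (the function bodies are stand-ins, not real lookups):

```python
import asyncio

# Before: callback-heavy style, common in older codebases.
def fetch_user_legacy(user_id, on_success, on_error):
    try:
        user = {"id": user_id, "name": "Ada"}  # stand-in for a real lookup
        on_success(user)
    except Exception as exc:
        on_error(exc)

# After: the same logic refactored to async/await. An AI assistant can
# produce this transformation mechanically; a human still reviews and
# tests the result before it ships.
async def fetch_user(user_id):
    await asyncio.sleep(0)  # stand-in for real async I/O
    return {"id": user_id, "name": "Ada"}

result = asyncio.run(fetch_user(42))
print(result)
```

Multiply that by hundreds of call sites and the time savings become obvious — as does the need for a test suite that catches the cases where the draft is subtly wrong.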
Test generation for untested code. This is arguably the highest-leverage use of AI in legacy modernization. Michael Feathers defined legacy code as code without tests, and he was right — you can’t safely refactor what you can’t test. AI can now generate meaningful test suites for existing code, covering happy paths, edge cases, and error scenarios. These generated tests aren’t perfect, but they provide a safety net that didn’t exist before, making subsequent refactoring dramatically less risky.
Migration assistance. Moving from one framework, language version, or API to another? AI tools can handle a significant portion of the mechanical translation, freeing engineers to focus on the architectural decisions and business logic that actually require human judgment.
But let’s be honest about the limitations. AI doesn’t understand your business domain deeply. It can’t make architectural trade-off decisions. It hallucinates. It can introduce subtle bugs that look correct at first glance. The teams that get the most value from AI in legacy modernization are the ones that use it as a power tool — wielded by skilled engineers who verify everything — not as an autopilot.
Strategies That Actually Work
With the foundation in place — visibility into the problem, a platform to build on, and AI tools to accelerate the work — here are the strategies I’ve seen succeed repeatedly.
Start With Tests
I keep coming back to this because it’s consistently the highest-ROI first move. Before you refactor anything, write tests for the existing behavior. Use AI to generate the initial test suite, then refine it manually. Once you have tests, every subsequent change becomes safer and faster.
The approach is straightforward:
- Cover critical user journeys with end-to-end tests first. These are your safety net during any refactoring.
- Add unit tests to modules you plan to change. Apply TDD to all new code.
- Use mutation testing tools to verify your tests actually catch bugs, not just achieve coverage numbers.
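The coverage-versus-bug-catching distinction in that last point is worth making concrete. This hand-rolled sketch shows what mutation testing tools (for example mutmut in the Python ecosystem) automate: mutate the code, re-run the tests, and see whether anything fails. A surviving mutant means your tests have a blind spot:

```python
def is_adult(age):
    return age >= 18

def weak_test(fn):
    fn(30)          # executes the code (100% coverage) but asserts nothing
    return True

def strong_test(fn):
    return fn(18) is True and fn(17) is False  # checks the boundary

mutant = lambda age: age > 18  # mutation: >= flipped to >

# The weak test "passes" against the mutant, so the mutant survives.
print("mutant survives weak test:", weak_test(mutant))
# The strong test kills it: fn(18) is now False.
print("mutant killed by strong test:", not strong_test(mutant))
```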
Strangler Fig Pattern
Wrap legacy components behind clean interfaces. Route traffic gradually from old to new. Kill the old component when traffic hits zero. This pattern works beautifully with platform engineering — the new services live on your paved paths while the legacy components run on their existing infrastructure. Over time, the legacy surface area shrinks until there’s nothing left to strangle.
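The routing layer at the heart of the pattern can be very small. A minimal sketch (handler names and key format are illustrative): callers hit one interface, and a rollout percentage decides whether the legacy or the new implementation serves each request. Hashing the customer key makes routing sticky per customer rather than random per request:

```python
import hashlib

def legacy_handler(customer_id):
    return f"legacy:{customer_id}"

def new_handler(customer_id):
    return f"new:{customer_id}"

def route(customer_id, rollout_percent):
    # Stable bucket in [0, 100) derived from the customer key, so the
    # same customer always lands on the same implementation.
    bucket = int(hashlib.sha256(str(customer_id).encode()).hexdigest(), 16) % 100
    handler = new_handler if bucket < rollout_percent else legacy_handler
    return handler(customer_id)

# At 0% everything stays on the legacy path; at 100% it is fully strangled.
print(route("cust-42", 0))
print(route("cust-42", 100))
```

Dial `rollout_percent` up as confidence grows, watch the error rates on both paths, and delete `legacy_handler` when it reaches 100.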
Modular Replacement
Not every component needs gentle evolution. Sometimes a module is so outdated — built on a deprecated framework, running an unsupported language version, or simply incomprehensible — that targeted replacement is the right call. The key is targeted: replace one module at a time, not the entire system.
Identify candidates by looking at:
- Incident frequency and maintenance cost.
- Developer pain — survey your teams; they know which parts of the codebase make them dread Monday mornings.
- Business criticality of upcoming changes that touch that module.
Replatforming
Moving from on-premises to cloud, from VMs to containers, from manual deployments to GitOps — these infrastructure-level modernizations can extend a legacy system’s life by years without touching the application code. Start with non-critical services, prove the pattern, then migrate systematically.
At AutoScout24, containerizing legacy services and moving them onto our Kubernetes-based platform gave teams access to modern observability, auto-scaling, and deployment tooling — even before they touched a line of application code. The operational improvements alone justified the effort.
Building the Skills to Sustain It
Tools and strategies are only half the equation. The other half is people.
Why does technical debt accumulate in the first place? Partially because of business pressure, yes. But also because engineers — myself included — don’t always apply what we know. We’ve had decades of accumulated wisdom about clean code, SOLID principles, testing strategies, and system design. The Design Patterns book is over thirty years old. Yet I still see codebases where basic separation of concerns doesn’t exist.
The gap isn’t knowledge — it’s habit. You can name every SOLID principle and still write a 2,000-line controller class because that’s how the existing code looks, and matching the pattern feels easier than fighting it.
Breaking this cycle requires deliberate practice: code reviews that actually discuss design (not just syntax), pair programming on refactoring efforts, internal tech talks where teams share their modernization wins and failures, and — critically — time explicitly allocated for improvement. If every sprint is 100% feature work, technical debt will always grow.
The AI tooling shift makes this even more important, not less. When AI can generate code faster than humans can review it, the ability to evaluate, critique, and improve code becomes the differentiating skill. Engineers who deeply understand design principles will leverage AI to move faster. Engineers who don’t will just produce bad code faster.
The Evolutionary Playbook
If I were starting a legacy system modernization today, here’s the sequence I’d follow:
Month 1: Measure. Instrument everything. Get DORA metrics, incident data, code quality scores, and team velocity on a dashboard that leadership can see. Build the case with data.
Months 2-3: Foundation. Establish your platform — whether that’s a full IDP or just standardized CI/CD pipelines and deployment templates. Define your paved paths. Make the “right way” the easy way.
Months 3-6: Test and strangle. Use AI to generate test suites for critical components. Begin strangler fig migrations for the highest-pain modules. Celebrate early wins publicly — momentum matters.
Ongoing: Incremental evolution. Apply the Boy Scout Rule at scale: every change leaves the codebase better than it was found. New features go on paved paths. Legacy components migrate when they need significant changes. The system gets younger over time, not older.
This isn’t a six-month project. It’s a permanent shift in how your organization builds and maintains software. The companies that get this right don’t have a “modernization initiative” — they have an engineering culture where evolution is continuous.
A legacy system is a business asset that’s accumulated technical debt. It’s not a failure — it’s evidence that your software survived long enough to matter. The question isn’t whether to evolve it, but how deliberately you choose to do so.