A high-level overview of refactoring monolithic code structures


Depending on how long you’ve been in the software development industry, sooner or later there comes a time when we are faced with a large, confusing, monolithic blob of code and the task of making sense of it all.

What is a monolith anyway?

A monolith is a code structure that acts like a code dispatcher, or is entangled with the rest of the codebase in such a way that describing it takes far more than a simple two-sentence explanation. Contrary to popular belief, monoliths come in various forms, some better than others from a developer’s point of view. Some monoliths are simple in nature and have a single responsibility; others span thousands of lines of code with neither head nor tail. Naturally, those are the more dreaded ones. Monoliths can take the form of methods or functions, classes, or even whole packages and modules. Sometimes, in the wild, you can find monoliths that are basically the whole project (application, service, etc.).

I should mention that not every large grouping of code is a monolith. Some classes have tens of thousands of lines of code; these are usually framework-specific classes, but they are centered around one specific piece of functionality, such as an API. Lower-level programming languages also tend to be more verbose, which results in a high line count. Another example would be a simple window component in a view-based UI framework.

What inspired me to write this article was a monolith I created in one of my own applications. Developers are usually given tasks to maintain, update or refactor legacy code or code written by someone else, but I wanted to showcase this particular example from my own code.
For the sake of brevity, I’ll keep this article programming-language agnostic. Everything mentioned here can be applied to most software development projects. These are just my personal views on the topic, along with approaches for taming hard-to-understand blocks of code that I’ve accumulated over the years as a software developer.

ORIGINS

Let me get this out of the way first: no one intentionally sets out to write bad or monolithic code. When we start working on a new project, it’s easy to be distracted by the programming language, by some framework or, the one that comes into the mix far too often, by the rush to get something done and shipped.

Someone who has just gotten into development gravitates more toward creating a monolith than someone who has been developing for a considerable amount of time, but this is by no means a hard rule. There are people with no software development background to speak of who start out writing very respectable, clean code, just as there are people who spend years typing along in passive mode, writing messy code full of code smells.

The monolithic piece of code that inspired this article began as a simple 350-line view controller class. It observed the states of the few view components that existed at that time. For the first few years of the application’s lifecycle, not much changed around this class: minor bug fixes and dependency updates. Then came a time when I decided that the application needed a serious upgrade in every possible way. Huge features and design updates were planned; this would be a substantial V2 of the application, so to speak. There was only a loose deadline and no need to rush the new version out, but I rushed the development anyway, without taking the time to think about how the features would be implemented or about the application’s architecture.
Development went on at a decent pace, but the class in question slowly started to grow: 450, 500, 650 lines of code… No red flags yet. It was still the same class, just with new features added and coupled into it, and the single responsibility principle was still intact. Then came the first of many little hiccups. One of the features required a special hook into some other component. One quick line of code, and the seed of what was to come had been planted: the first of many detours. From that point on, the monolith grew rapidly, up to 2500+ lines of conditional spaghetti code. Seeing working features in the application made me overlook my own rule to at least stop and think about refactoring any class longer than 500 lines of code.

The V2 of the application was released, and over the years new features were added, but always at the expense of code readability and with a direct impact on the monolith’s size. The more lines I added to the class, the more concerned I became. I knew that at some point I would have to roll up my sleeves and tackle this monster, and new features were becoming harder and harder to implement. I was at a crossroads: either continue with the current state and postpone the task at hand, or sit down and, at the cost of not releasing new updates for some time, do what needed to be done.

WHEN TO REFACTOR?

Should you refactor the existing code, or should you start over from the ground up? This is the biggest question you must answer before going into code refactoring.

Depending on the size and scale of the monolith, starting over with a fresh take on the solution is definitely a viable path, but I should note that it’s one of the more extreme approaches to a problem such as untangling spaghetti code.

Smaller projects with a limited number of developers and no impact on a larger user base are better candidates for building new and better solutions from scratch.

Larger projects with a sizable client base, resources and team members are less likely to be candidates for starting anything ‘fresh’, simply because the risk is higher. You can only go to the higher-ups and propose starting over from scratch so many times. Larger projects have time and money invested in their existing solutions, and proposing to throw that away is a huge red flag for everyone involved. It also implies that the existing codebase has flaws so severe that no amount of cleanup can fix them. There’s also the point that the existing code is ‘battle-tested’ and, from a client’s point of view, in a perfectly operational state. Clients do not see our code, and they are not interested in whether it is ‘messy’ and hard to understand and improve.

Ongoing code refactoring presents itself as a compromise. By making small changes to the monolith from time to time, we end up with better code than the initial state, with the added benefit that the process is less risky and satisfactory to everyone involved.

When is the right time to refactor code? There are a few scenarios. The simplest is when the higher-ups assign you the task, perhaps during a short lull between releases. The most common one is when it becomes difficult to integrate a new feature or update an existing one. Implementing a new feature should mostly revolve around developing its core aspects, with only a bare minimum of work spent coupling it into the existing codebase. A red flag in that process is the need for ‘slight’, minimal adjustments to how other features are handled. Those slight changes and adjustments can quickly add up in unpredictable ways, and the cost will be paid down the road. This is the perfect opportunity to look deeper into the code architecture and refactor it to better suit the new feature’s requirements. And now comes the interesting part: how do we approach refactoring a monolith?

FORMAT CODE AND CODE STYLE

I’ll skip the basic prerequisites for starting the refactoring process, like creating a separate version control branch and developing in a sandbox-type environment. The idea here is first and foremost to create a coding environment that gives you a safety net in case something breaks. Isolated development is key.
This might be obvious, but the code you aim to refactor should be formatted for readability as much as possible. This step assumes that the code in question is badly formatted or doesn’t conform to a specific code style. Many IDEs make this very simple, so applying a code style and reformatting the code into a more readable structure should be a quick job. You would be surprised how much a few spaces of indentation or some blank lines make everything more approachable. Remember, at this point we are still in the ‘understanding’ stage of the process.

ANALYZE AND UNDERSTAND THE CODE

One of the first steps in refactoring a monolith is to step back and try to understand what it actually does. Sometimes there are comments that point to the monolith’s job, or maybe the project’s documentation describes it.

Try to answer the question: what does this actually do, at its core? If your answer is more than a few sentences long and contains lots of ‘ands’ and ‘ors’, that’s a good indicator that the code violates the single responsibility principle. Look into the various dependencies and/or global variables sprinkled throughout: the ins and outs. It’s very helpful to simply write down the current state and a description of the code. Describe it as you would to a non-developer.
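Listing those ins and outs can be partly automated. As a minimal Python sketch (the article itself is language-agnostic, so treat this as one possible illustration), the standard library’s `inspect.getclosurevars` reports which module-level globals and builtins a function refers to directly. `handle_request`, `current_user` and `retry_limit` are hypothetical stand-ins for a monolith and its hidden global state; the report misses anything reached indirectly through attributes or other calls.

```python
import inspect

# Hypothetical module-level state that a monolith quietly depends on.
current_user = "guest"
retry_limit = 3


def handle_request(payload):
    """Stand-in for a monolithic function that reads global state."""
    if len(payload) > retry_limit:
        return f"{current_user}: too many items"
    return f"{current_user}: ok"


def describe_inputs(func):
    """Summarize a function's explicit parameters and the globals/builtins it reads."""
    closure = inspect.getclosurevars(func)
    return {
        "parameters": list(inspect.signature(func).parameters),
        "globals": sorted(closure.globals),
        "builtins": sorted(closure.builtins),
    }


print(describe_inputs(handle_request))
# → {'parameters': ['payload'], 'globals': ['current_user', 'retry_limit'], 'builtins': ['len']}
```

Anything that shows up under `globals` but not under `parameters` is an input the function’s signature doesn’t admit to, which makes it a good entry for the written description.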

WHAT IS THE CORE OF THE CODE?

Try to answer this question in one or two sentences. The answer is the goal you are now aiming for. Everything that is around that core description is a prime candidate for trimming and extracting.

You can mark the places where the code steps outside those boundaries and start thinking about where the trimmed parts actually belong. Perhaps a component that does something similar is already implemented? If not, a new component will probably be needed. If only a few parts fall outside the core, or none are visible yet, don’t worry; they sometimes surface later in the refactoring process. With the task and a newfound goal for the code, we have a clear direction. Now we have to bridge those two points.
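As a hypothetical Python illustration of trimming: suppose the one-sentence core of a view controller is “turn the profile state into what the screen shows”, but it also builds analytics records itself. That recording falls outside the core, so in the sketch below it moves into a separate component the controller receives from outside. All class names here are invented for the example.

```python
# Before: rendering and analytics bookkeeping live in the same class.
class ProfileControllerBefore:
    def __init__(self):
        self.analytics_log = []

    def show_profile(self, user):
        self.analytics_log.append(("profile_viewed", {"user_id": user["id"]}))
        return f"Profile: {user['name']}"


# After: recording events is the job of its own component...
class AnalyticsTracker:
    def __init__(self):
        self.events = []

    def track(self, name, **details):
        self.events.append((name, details))


# ...and the controller keeps only its core job, delegating the rest.
class ProfileController:
    def __init__(self, tracker):
        self.tracker = tracker  # injected, so it can be swapped or mocked in tests

    def show_profile(self, user):
        self.tracker.track("profile_viewed", user_id=user["id"])
        return f"Profile: {user['name']}"
```

Both versions render the same text and record the same event; the difference is that the tracker can now be reused by other controllers or replaced without touching the controller.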

One thing to remember is that the important features are already there, just obscured by messy code.

REMOVE ANY COMMENTED OUT CODE AND DEPRECATED COMMENTS

Slowly, we start using our imaginary eraser: look for any commented-out code and remove it. You should feel safe deleting these kinds of lines, because if they were needed, they wouldn’t be commented out. If someone left them there as a hint toward a future solution or a specific case to implement, they are already preserved in the version control system and, simply put, not needed for this task. We are breaking apart the existing state of the code, and that state does not include distracting commented-out code.
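When the codebase is large, a small script can help find candidates for the eraser. The Python sketch below is a heuristic, not a reliable detector: it flags whole-line comments whose text parses as a Python statement, while skipping single words and `TODO:`-style notes. It can also misfire on comment-like lines inside multi-line strings, so review every hit before deleting it.

```python
import ast


def looks_like_code(text):
    """Heuristic: does this comment body parse as a meaningful Python statement?"""
    try:
        tree = ast.parse(text)
    except SyntaxError:
        return False  # ordinary prose usually fails to parse
    if len(tree.body) != 1:
        return bool(tree.body)
    stmt = tree.body[0]
    # A lone word or literal ("# cleanup", "# 42") is more likely prose than code.
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, (ast.Name, ast.Constant)):
        return False
    # "# TODO: refactor" parses as a bare annotation; treat it as prose too.
    if isinstance(stmt, ast.AnnAssign) and stmt.value is None:
        return False
    return True


def find_commented_out_code(source):
    """Return 1-based line numbers of whole-line comments that look like code."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#") and looks_like_code(stripped.lstrip("#").strip()):
            flagged.append(lineno)
    return flagged


SAMPLE = """\
def total(items):
    # Sum the prices of all items in the cart.
    result = 0
    # result = legacy_total(items)
    for item in items:
        result += item.price
        # print(result)
    # TODO: refactor
    return result
"""

print(find_commented_out_code(SAMPLE))  # → [4, 7]
```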

Next come the comments that don’t actually reflect the state of the code they describe. Perhaps they are remnants of a previous refactoring that the developer simply forgot to update. Sometimes they are funny or rage-fueled comments left behind. Either way, remove those comments, or update them yourself with meaningful, code-centric descriptions so they serve a purpose again. You shouldn’t move on without addressing them.

TESTS OR MANUAL TESTS

This is where the fun begins. Does the monolithic code have some test cases around it? Perhaps some integration or unit tests? If so, that’s a new safety net in your refactoring process. With the refactoring process we don’t want to break any existing functionality. We want everything to work like before with the addition of better clarity and ease of maintainability.

If there are no tests, you should at least write a test around the monolith that mocks its dependencies and asserts its outputs. Create a testing sandbox. If the monolith, for example, accesses sensitive data or makes network calls, you can create mock data or mock servers that make it safe to execute the code.
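Here is a minimal sketch of such a test in Python, using the standard `unittest` and `unittest.mock` modules. `ReportService`, its `fetch_orders` network call and the report format are all hypothetical; the point is that the dependency is replaced by a mock, so the test can pin down the monolith’s current outputs without touching the network. Tests like this are often called characterization tests, because they record what the code does today rather than what it should do.

```python
import unittest
from unittest import mock


class ReportService:
    """Hypothetical monolith stand-in with a hard-wired network dependency."""

    def fetch_orders(self, user_id):
        # In production this would call a remote API; tests must never reach it.
        raise RuntimeError("network access is not allowed in tests")

    def build_report(self, user_id):
        orders = self.fetch_orders(user_id)
        paid = [order for order in orders if order["status"] == "paid"]
        total = sum(order["amount"] for order in paid)
        return f"user {user_id}: {len(paid)} paid orders, total {total}"


class BuildReportCharacterizationTest(unittest.TestCase):
    """Pins down the current behaviour before any refactoring starts."""

    @mock.patch.object(ReportService, "fetch_orders")
    def test_only_paid_orders_are_counted(self, fake_fetch):
        fake_fetch.return_value = [
            {"status": "paid", "amount": 30},
            {"status": "refunded", "amount": 50},
            {"status": "paid", "amount": 12},
        ]
        report = ReportService().build_report(7)
        self.assertEqual(report, "user 7: 2 paid orders, total 42")
        fake_fetch.assert_called_once_with(7)

    @mock.patch.object(ReportService, "fetch_orders", return_value=[])
    def test_user_without_orders(self, fake_fetch):
        report = ReportService().build_report(1)
        self.assertEqual(report, "user 1: 0 paid orders, total 0")
```

Run it with `python -m unittest` followed by the module name. If the dependency is a module-level function rather than a method, `mock.patch` with the function’s import path serves the same purpose.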

Manual testing is also important. You want to go over all the possible use cases for the monolith. As refactoring goes on, you’ll probably create completely new tests, but the ones you started out with must remain. Run the tests as often as possible.

CHECK FOR MEANINGFUL VARIABLE, METHOD, FUNCTION OR CLASS NAMES

We continue the refactoring process by finally touching the live code, starting with something as simple as looking for meaningful names for its various parts.
Look for any variable or function names whose purpose is hard to understand. Names like “counter”, “executeTask”, “doStuff”, “doStuffMore” and the like should be renamed to something more descriptive. You can be as verbose as you want; make them as unambiguous as possible. There should never be a question about what a variable or function is for. The same goes for classes, object instances, package names, etc.
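A small before-and-after sketch in Python shows the effect. The invoice records and both function names are hypothetical; the renamed version keeps the same behaviour for `(id, due_date, is_paid)` tuples, which the final assertion checks.

```python
from datetime import date


# Before: nothing here explains what is being counted or why.
def doStuff(l, d):
    c = 0
    for x in l:
        if x[1] < d and not x[2]:
            c += 1
    return c


# After: identical behaviour for (id, due_date, is_paid) records,
# but every name states what it holds or does.
def count_overdue_unpaid_invoices(invoices, today):
    overdue_count = 0
    for _invoice_id, due_date, is_paid in invoices:
        if due_date < today and not is_paid:
            overdue_count += 1
    return overdue_count


invoices = [
    ("INV-1", date(2024, 1, 10), False),  # overdue and unpaid
    ("INV-2", date(2024, 3, 1), False),   # not due yet
    ("INV-3", date(2023, 12, 5), True),   # overdue but already paid
]
today = date(2024, 2, 1)
assert doStuff(invoices, today) == count_overdue_unpaid_invoices(invoices, today) == 1
```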

At this point, some functionality that falls outside the monolith’s core may reveal itself, though not necessarily. If it does, mark it as a candidate for extraction. Moving on…

LOOK FOR BLOCKS OF CODE

Sometimes there are distinct blocks of code in a code structure that seem to be separated by some kind of logic added by whoever wrote the original code. Those blocks can be if/else, while or try/catch statements, or something as simple as a few lines of code separated by blank lines.

Why should we look for blocks of code? A block of code can indicate a routine that can be extracted into a function or even its own class. Modern IDEs can detect duplicate code blocks when you extract one and suggest replacing all similar blocks with calls to the newly created function. Make sure to use the refactoring features your IDE of choice provides.
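The sketch below shows the idea in Python on a hypothetical order-processing function, where a blank line separates a validation block from a totals block. Each block becomes a small named function, and the shipping threshold and fee (assumed values for the example) become parameters that are easy to test. The observable behaviour stays the same.

```python
# Before: one function whose blank-line-separated blocks do unrelated jobs.
def process_order_monolith(order):
    if not order.get("items"):
        raise ValueError("order has no items")
    for item in order["items"]:
        if item["quantity"] <= 0:
            raise ValueError(f"invalid quantity for {item['sku']}")

    subtotal = sum(item["price"] * item["quantity"] for item in order["items"])
    shipping = 0 if subtotal >= 50 else 5
    return {"subtotal": subtotal, "shipping": shipping, "total": subtotal + shipping}


# After: each block is a named function that can be read and tested on its own.
def validate_order(order):
    if not order.get("items"):
        raise ValueError("order has no items")
    for item in order["items"]:
        if item["quantity"] <= 0:
            raise ValueError(f"invalid quantity for {item['sku']}")


def compute_totals(items, free_shipping_from=50, shipping_fee=5):
    subtotal = sum(item["price"] * item["quantity"] for item in items)
    shipping = 0 if subtotal >= free_shipping_from else shipping_fee
    return {"subtotal": subtotal, "shipping": shipping, "total": subtotal + shipping}


def process_order(order):
    validate_order(order)
    return compute_totals(order["items"])
```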

DRAW DEPENDENCY GRAPHS OR LIFECYCLE PATHS/USECASES

To further clarify and understand monolithic code, we can draw dependency graphs for the whole project, both to see all the references to the monolithic code and to see the dependencies it requires to function correctly. By visualizing all of the connections and relationships between the different parts of a project and its lifecycle, we can easily spot “leaf” dependencies: very loose dependencies that the monolith possibly doesn’t need at all times. Perhaps they can be converted into some kind of service classes that are called only when necessary. A graph can also reveal long chains of dependencies that are basically never invoked in the first place. Follow each dependency from its origin to where it’s invoked. Maybe there’s some old code for which there are no use cases in the project’s current state. Excellent! Be sure to remove it step by step, cleaning up any remnants. Don’t worry, the functionality of the affected structures will stay intact, because we have already established that the removed code is no longer needed.
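For a codebase in Python, a rough version of such a graph can be built from import statements alone. The sketch below parses each module with the standard `ast` module, keeps only imports between the project’s own modules, and does a breadth-first search from the entry points; modules that are never reached are candidates for the removal described above. The module names and sources are invented for the example, and the sketch ignores relative imports and dynamic imports such as `importlib.import_module`, so its output is a starting point for investigation, not proof that code is dead.

```python
import ast
from collections import deque


def import_graph(sources):
    """Map each module name to the set of project modules it imports."""
    graph = {}
    for module, code in sources.items():
        deps = set()
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        # Keep only edges between this project's modules (drop stdlib/third-party).
        graph[module] = {dep for dep in deps if dep in sources}
    return graph


def unreachable_modules(graph, entry_points):
    """Breadth-first search from the entry points; unreached modules are removal candidates."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        for dep in graph[queue.popleft()]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(set(graph) - seen)


# Hypothetical project: module name -> source code.
PROJECT = {
    "app": "import view_controller\nimport settings\n",
    "view_controller": "from analytics import track\nimport settings\nimport json\n",
    "analytics": "import settings\n",
    "settings": "DEBUG = False\n",
    "legacy_export": "import settings\n",
}

graph = import_graph(PROJECT)
print(unreachable_modules(graph, ["app"]))  # → ['legacy_export']
```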

From the project graph and lifecycles, we switch to the monolith’s own graph and draw all the different scenarios in which the monolith is called. As we did with the project as a whole, we approach the task by identifying the “leaf” scenarios. Find the root usages. In some cases the root usage doesn’t make sense and is just a placeholder. IDEs usually flag unused references, but not always reliably: some IDEs will report usages of a variable, class, function, etc. that don’t really exist. Some functionality might only be used in test classes and not in the production code, which usually happens when the tests were written after the production code. Just remember to check incoming and outgoing references thoroughly.
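A similar Python sketch works inside the monolith itself: collect every function and method it defines and every name it references, and report the definitions nothing refers to. `ViewController` and its methods are hypothetical, and dunder methods such as `__init__` are skipped because the language calls them implicitly. As the paragraph above warns, static checks like this can be wrong: methods called by a framework, through `getattr`, or by name from configuration will show up as unused, so treat the output as a list of questions. Feeding it production code only, without the test files, also surfaces functions that are used exclusively by tests.

```python
import ast


def unreferenced_functions(source):
    """Names of functions/methods defined in `source` that nothing in `source` refers to."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not (node.name.startswith("__") and node.name.endswith("__"))
    }
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            referenced.add(node.id)      # plain calls: helper()
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)    # method calls: self.render()
    return sorted(defined - referenced)


MONOLITH = """
class ViewController:
    def on_start(self):
        self.render()

    def render(self):
        return "rendered"

    def export_legacy_csv(self):
        return "a,b,c"

controller = ViewController()
controller.on_start()
"""

print(unreferenced_functions(MONOLITH))  # → ['export_legacy_csv']
```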

FOCUS ON WHAT MAKES YOU BETTER UNDERSTAND THE CODE

Software developers have different techniques and approaches for tackling a specific problem. If drawing graphs and diagrams doesn’t help your personal understanding of the problematic code, use something that does. Perhaps talking about the problem helps you, and something as small as a single word can trigger understanding. Perhaps you’re the type of developer who prefers a spike: a quick, throwaway prototype that explores the problem. You can always collapse or hide the blocks of code in your IDE that you’re not currently working on. Every little thing that improves your personal understanding of the code and your development process should be used.

IT WILL GET WORSE BEFORE IT GETS BETTER

This is a very common scenario. One side effect of dealing with a monolith can be the creation of lots of smaller monolithic structures: function or class explosions. Refactoring a monolith of 5000 lines of code can result in many smaller code structures that, combined, add up to tens of thousands of lines of code. That is perfectly OK; just remember what the end goal is and continue the refactoring process. Creating new structures is basically mandatory, and that’s a good thing: we want smaller bits of reusable and testable code. As refactoring continues, some of the newly created code will become obsolete or unnecessary and can be removed altogether. It served a specific purpose during the refactoring process and has no use afterwards.

Some structures will crystallize as the work progresses, and they can easily be merged together. Our focus should not be the number of new lines of code, but untangling the spaghetti code and keeping the code maintainable in the future.

FOCUS ON ONE PROBLEM AT A TIME

Since refactoring a monolith is a long process, other problems will surface along the way. Perhaps they were always there, hidden by messy code, or perhaps the refactoring process itself created them. We don’t want to lose track of them, but we also want to address them one at a time, in the order they appear. Remember, we want to preserve existing functionality and don’t want to break anything, so we resolve the problem first and don’t move on with the main refactoring task until that’s done. Keep TODOs and similar comments to a minimum.

ASK FOR INPUT OR HELP

I’ve intentionally extracted this into its own subtopic because of its general importance in software development.

Regardless of experience level, we all need help with our code from time to time. Perhaps we’ve been invested in a task for so long that we’ve lost the general overview of the original goal. A five-minute chat with a colleague can save hours or even days in the long run. “How would you do this?” and “What do you think about this specific line of code?” go a long way in the development process.
Perhaps you can speak with the developer who wrote the original code? Ask them about their thought process while writing it and explain what you want to accomplish. If you’re new to the project, perhaps someone can explain it to you.

Read the documentation if there’s one available.

Medium and large companies have these processes in place through pair programming, code reviews, etc. If you work at a smaller company or as a freelancer, asking for help online is a suitable option; StackOverflow clearly comes to mind. Chances are that the problem at hand, or something similar, has already been solved. Broaden the scope of your queries if you can’t find a solution immediately. Maybe there’s a solution in another programming language or framework that you can adapt to your particular problem. At the end of the day, we want as much help as possible.

MARATHON RATHER THAN SPRINT

Code refactoring should be a marathon: a long-running, even ongoing, process. It’s much easier to refactor bits of code here and there from time to time than to wait for problems to pile up.

When planning the task of refactoring a monolith, we should give ourselves the proper amount of time to accomplish it. If the schedule is tight, a compromise might be possible, but it should be avoided. Take the time to plan your refactoring strategies.

In the “Origins” part of this article, I mentioned having to sit down and take my time with the problem at the expense of releasing new versions of the application. To an outside observer, the application was on hold for the duration of the refactoring process, and that’s something we want to avoid in the future.

USE THE TOOLS AVAILABLE TO YOU

Software development has come a long way since its origins. Modern developers have access to powerful tools that help them automate or simplify some tasks.

Static code analysis or lint checks can give valuable insight into the quality of the code. Most IDEs include built-in debuggers that help us understand what’s actually going on in our code and even change values on the fly.
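Even a homemade lint check can be useful. The Python sketch below uses the standard `ast` module to flag classes and functions that exceed a line limit; the 500-line class limit mirrors the personal rule from the “Origins” section, while the 50-line function limit is an arbitrary assumption. The script name in the usage comment is hypothetical, and a real project would more likely configure an existing linter with a comparable rule.

```python
import ast
import sys
from pathlib import Path

# Assumed thresholds: 500 lines per class mirrors the personal rule from
# "Origins"; 50 lines per function is an arbitrary example value.
LIMITS = {ast.ClassDef: 500, ast.FunctionDef: 50, ast.AsyncFunctionDef: 50}


def oversized_definitions(source, limits=LIMITS):
    """Return (name, line_count) for every class or function over its limit."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        limit = limits.get(type(node))
        if limit is None:
            continue
        # end_lineno is available on AST nodes since Python 3.8.
        line_count = node.end_lineno - node.lineno + 1
        if line_count > limit:
            findings.append((node.name, line_count))
    return findings


if __name__ == "__main__":
    # Usage (hypothetical script name): python oversized.py path/to/module.py ...
    for path in sys.argv[1:]:
        for name, line_count in oversized_definitions(Path(path).read_text()):
            print(f"{path}: {name} spans {line_count} lines")
```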

Something as simple as a different IDE theme can also impact the understanding of the code.

Some IDEs have an option to modularize a specific code structure. Just previewing the result of the modularization can reveal all kinds of dependencies and relationships to our monolith.

It’s always worth taking the time to research the tools we work with and the ones available to us. After all, we don’t want to spend too much time on something that can easily be automated by an IDE feature or plugin.

FREQUENT COMMITS

It’s easy to overlook mistakes or side effects we introduce while coding, so we want a safe spot to go back to where everything was in a working state. Even just checking the commit log can be very useful. Commit frequently and regularly, with meaningful messages describing what each commit does, the same way you would write a comment above a long function. Avoid single giant commits that bundle multiple large changes together.

Hopefully the examples mentioned in this article were helpful to you. Hindsight is 20/20. What we can do from now on is take examples like these and learn from them so we can grow as developers.

Software developer and photography enthusiast