Thoughts on Refactoring
I recently worked with a team on a fairly significant refactoring. I paired with different team members each day over a three day period as we moved code around; pushing responsibility into other classes, reducing complexity, and making the code more expressive. On the fourth day, I put together some notes on the things we saw and general guidelines for the team to keep in mind. Nothing herein is new to the community, but it might be new to you.
General Approach
Empty your cup
For me, refactoring code is primarily an organic process. I try to approach the code with an empty cup, but this is not always easy. At a minimum, I try to suspend my expectations about what the target design should be. Perhaps I know that I want to push responsibility into a particular class, but I try not to plan out the specifics. I want the code to speak to me; to tell me where it needs to go. As a result, I will occasionally move code more than once - from this method into a class and then from that class into a new or different class. I am okay with this. Had I decided at the onset that the new class needed to exist, I would have pushed in that direction rather than letting it reveal itself to me. When I make these decisions early, I often find that the final design is less elegant.
Making decisions early and implementing them feeds my ego, but does not necessarily serve the code.
Trust Your Tests
For this approach to work, I have to be able to trust my tests. If I am intentionally not concerning myself with how the code works or what it does, then I need somebody or something to do this job for me. As a result, tests are critical.
If I cannot trust the tests, then I first need to profile the code by writing character tests. This, of course, makes emptying my cup more difficult.
Take one small step at a time
Starting with an empty cup helps me to focus in on small opportunities. But now and then, I see something “obvious”. A list of objects is being passed around and specific attributes from that list are being consistently used in this class while other attributes from that list are being consistently referenced in another class.
“Ah.”, I think to myself, “I am going to need a collection object composed of no less than two other objects.”
Even if I am confident the final design will look like this, I don’t push for it. I don’t create the classes that will compose the new collection class. I start by pushing the list into a collection and then slowly, but surely, step by step, encapsulate the list such that no calling class is aware the internals of my new class are a list.
Then, I evaluate the new collection class and see if it has opportunities for improvement. Maybe I create additional classes. Maybe I don’t.
Either way, I go there one small (tiny) safe step at a time where the code is always in a state where it functions as intended and is usually in a state objectively better than I found it.
Things to look for
“But, Doc”, you say, “If I’ve emptied my cup, how do I know where do you start the refactoring?”
Start where you are.
Look for smells and anti-patterns and resolve them. As you do this, the code will reveal new things to you.
(Some) Code Smells and Anti-Patterns
This is not a comprehensive list of code smells and anti-patterns. It is a list of those that I’ve seen often in the past couple of years. Recognizing that over the past couple of years, I’ve had a handful of clients for whom I do technical coaching, so my exposure has been rather limited. Maybe these are common only to my clients.
Primitive Obsession
Primitive Obsession refers to the use of primitive (or raw) types instead of small objects.
Acting on Primitives
Any time we are acting on primitives, say to validate content, this may be a hint that we are hiding a class. An example of this is the use of String to represent PhoneNumber or ZipCode. While this seems okay at first glance, we can quickly see that some other class, which likely has some other primary responsibility, would also need to validate the format of the phone number for use. Same for the zip code.
For example
could be
These are very common and simple enough to resolve.
Lists of Objects
Any time we see a list of objects being passed around, it is an indicator that we need a new object; a collection for the objects we’ve pushed into a list.
Imagine we have a List of Dogs. A Dog has some properties such as breed, age, temperament, and color along with methods such as bark(), sleep(), play(), nextFeeding(), and eat().
Our dog care center wants to make sure the school tour gets exposed to friendly dogs, so we create a list of friendly dogs:
Now, say we pass friendly dogs to another method that returns only dogs that don’t need to eat during the visit.
Consider that the dog list is actually an object hiding in plain sight. If we create a new object called Dogs that is a collection of Dogs, we can then move some of this behavior into the new Dogs class.
Now instead of
We have something like:
Long Methods
I don’t have a specific rule about how long a method can (or cannot) be. For me, I am generally suspicious of any method over 15 lines long. But it is not really about length. Length is a proxy. It is about responsibility.
Maybe a better heuristic than length is whether or not you use the words “and” or “then” when describing what a method does. If you can say, “This method does this and that.”, or, “This method does this then that.”, you have at least two methods, this and that.
Data Clumps
If you find that you are frequently (more than once) passing a set of variables around together, then you have a data clump. A data clump is another object hiding in plain sight.
Use an object to replace the data clump. This will reduce the overall code size, improve the maintenance, improve the readability, and make the code easier to debug.
Often, once a data clump is converted to an object, we discover responsibility that was strewn throughout other classes now finds a more appropriate home in our new class.
Feature Envy
Feature envy is essentially when a method in Object A, interrogates the properties and/or methods of Object B in order to perform a computation or make a decision. Often, we’ll notice that the method in Object A references none of the properties of Object A.
These methods belong in the class being interrogated.
Keep Learning
I’ve listed a few smells and anti-patterns. This isn’t comprehensive by any means. I can’t even say these are the most common issues I see. All I can say is that these are the ones I’ve been talking about recently. And I thought I’d write a bit about them.
Whether you are reading about smells for the first time or you’re surprised I didn’t even mention Refused Bequest or Shotgun Surgery, I encourage you to continue to keep learning and keep practicing. I’ve been coding for over 30 years and I keep learning new things; new approaches and perspectives and techniques.