Removing Code Duplication

Today’s offering is another post inspired by a question on Quora about when to refactor away duplicate code.

The specific question was, “What is the limit for duplicate code to start refactoring?”

I took this to mean, “How much duplication needs to exist before you should refactor it?”

And I’m not sure that’s really a great question. So I decided to start with clarifying what duplication needs to be cleaned up and then when might that happen.

What is being duplicated?

Is it a construct?

“Hey, look, there is a case statement with 3 parts in the method that calculates sales tax and a case statement with three parts in the method that determines user authorization roles.”

This is not duplication we care about. We will find a lot of code that looks similar, but performs distinctly different functions.

Is it logic?

“Hey, look, here we use a chain of command to determine if a user is authorized to access admin pages and here we use a case statement to determine if a user is authorized to perform specific functions on the data entry page.”

This is duplication we care about. Here, we have different expressions of the same fundamental logic; user authorization. This type of duplication indicates a violation of the Single Responsibility Principal.

When do we clean it up?

When we create it

When I create logic duplication, I try to clean it up right away. I might make a TODO and leave the duplication long enough for me to complete the functionality I am currently working on, but I try to clean up such TODO statements within a matter of hours - not days or weeks.

When we discover it

If I come across logic duplication while doing something else - say I am making an update to state validation logic which is present on both the admin screens and the data entry page and I notice the authorization is different for both. In this case, I’ve not caused the duplication and I might not clean it up. If the authorization code itself has not changed in a long time and I have no reason to change it for this story, I might leave it alone. Once there is a story that requires an update to the authorization logic, I will clean up the duplication.

When we feel like it

I am an advocate of leaving the code cleaner than you found it. So, on occasion, I will clean duplication even though I am not working on a story that directly requires an update to the duplicated logic.

Conclusion

For me, the question shouldn’t be “what is the limit” - all duplication of logic is a risk. The question should be “at what time”. If I knowingly make it, I clean it in short order. If I discover it while working on the logic, I clean it in short order. If I discover it while working on something related, I might leave it alone until a logic change triggers the refactoring. And sometimes, I’ll clean it just because I can.