Data clumps - Kotlin refactoring recipe (without tests) ⚠️
Data clumps
In your codebase, you may have come across objects that hang out together. They often are passed around as a group to serve some purpose(s) and lose meaning if you isolate even one of them from the group. These groups of objects are called data clumps.
Data clumps are usually a result of a poorly expressed idea. If you have spotted one of these clumps and are reading this article, it is only a matter of time until a new object is born 😉
Why fix data clumps?
- Express the idea more clearly and concisely.
- Reduce the number of parameters in function signatures.
- Identify behavior associated with the group and extract it into a class a.k.a. encapsulation.
- Eliminate duplicate behavior.
Ground rules
If you are going to refactor without tests, these rules apply more strictly than they do when refactoring with tests.
- Every change should be atomic and easy to follow, and you should record each one in your version control system (usually Git).
- Your code should always be deployable. You should be able to stop doing whatever step you are in (stash or discard the changes) and be able to make a build for deployment.
- Close to zero compilation errors.
- Close to zero manual edits.
- Your reviewer should be able to follow the changes with ease.
Tools we'll rely on
- IntelliJ-based IDE
- Kotlin compiler
- Git GUI tool (recommended, I use Sublime Merge)
Kotlin language features
Fixing data clumps is non-trivial, and the recipe varies from language to language. Three Kotlin language features make this recipe relatively simple and robust: data classes, destructuring declarations and variable shadowing.
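Here is a minimal sketch of the three features working together. The `Point` class and the `describe` function are hypothetical names chosen only for illustration; they are not part of the recipe's example.

```kotlin
// Hypothetical type, used only to illustrate the language features.
// A data class generates component1()/component2(), which enable destructuring.
data class Point(val x: Int, val y: Int)

fun describe(x: Int, y: Int): String {
    // Destructuring declaration: unpacks the Point into two local variables.
    // The locals x and y shadow the parameters with the same names, so the
    // compiler reports a "Name shadowed" warning, not an error.
    val (x, y) = Point(x + 1, y + 1)
    return "x=$x, y=$y"
}
```

After the destructuring line, every use of `x` and `y` in the function body refers to the values that came out of the `Point`, not to the original parameters. The recipe relies on exactly this behavior.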
The recipe
The simplified example below highlights two functions - totalPayable and totalPayableWithPenalty. Both take a few parameters (fig. 1).
You can also see that both functions have a few parameters in common; cGst and sGst are the ones that interest us. These parameters have the same names and types, and are used together across both functions (fig. 2).
I have intentionally excluded the costOfGoods parameter because there are other hypothetical functions where costOfGoods does not "hang out" with cGst and sGst and therefore is not part of the group.
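As a rough text rendering of the starting point, the two functions might look like the sketch below. Only the function names and the `costOfGoods`, `cGst` and `sGst` parameter names come from the example; the `Double` types, the `penalty` parameter and the function bodies are assumptions made so that the code compiles.

```kotlin
// Sketch of the code before refactoring. Types and bodies are assumptions.
fun totalPayable(costOfGoods: Double, cGst: Double, sGst: Double): Double {
    // cGst and sGst are treated as percentage rates here (assumption).
    return costOfGoods + costOfGoods * (cGst + sGst) / 100
}

fun totalPayableWithPenalty(
    costOfGoods: Double,
    cGst: Double,
    sGst: Double,
    penalty: Double // hypothetical extra parameter
): Double {
    return costOfGoods + costOfGoods * (cGst + sGst) / 100 + penalty
}
```

The pair `cGst: Double, sGst: Double` appears in both signatures in the same order, which is the clump.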
Step 1 - Copy the parameter names and types that form the clump.
We know that the parameters cGst and sGst form a clump. To replace these individual parameters with a single object, we have to copy the parameters and their types (fig. 3).
Notice that the parameters are right next to each other in this case and hence easy to copy. If the parameters are not next to one another, use IntelliJ's Change Signature (⌘ + F6 / Ctrl + F6) action to order them one after another before copying. Remember to commit if you make this change.
Step 2 - Name the idea and create a Kotlin data class.
After copying the parameters and types from one of the function signatures, come up with a reasonably good name for the idea. It doesn't have to be perfect; don't spend more than a few seconds choosing a name (Jay Bazuzi prefers 5 seconds). You can always rename the class when you come up with a better name.
Next, create a Kotlin data class using the name you just came up with. In this example, I am using TaxComponent; it may not be perfect, but it is good enough to get the idea across.
The parameters we copied will become public immutable properties of this data class (fig. 4). The data class doesn't have to be in a separate file; use your discretion to place it wherever you see fit.
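In text form, the resulting class could be as small as the sketch below. The `Double` property types are an assumption; use whatever types the copied parameters actually have.

```kotlin
// The copied parameters become public, immutable (val) properties.
// Double is an assumed type; keep the types from your own function signature.
data class TaxComponent(val cGst: Double, val sGst: Double)
```

Because it is a data class, the compiler generates `component1()` and `component2()`, which the destructuring declaration in step 4 depends on.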
Step 3 - Create a new instance of the data class inside the function.
After creating the data class, go back to the totalPayable function and, in the first line of the function body (line 22, in this example), attempt to invoke the constructor of the TaxComponent data class (fig. 5).
Bringing up the code completion menu (⌃ + Space / Ctrl + Space) will automatically suggest appropriate arguments for the constructor (fig. 6).
Hit Enter ↵ after selecting cGst, sGst from the list, and the IDE will automatically fill in the constructor arguments. Ensure there are no compilation errors in the auto-completed code (fig. 7).
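At the end of this step the function might look like the sketch below. The types and the function body are assumptions; the new line is the bare constructor call at the top.

```kotlin
data class TaxComponent(val cGst: Double, val sGst: Double)

fun totalPayable(costOfGoods: Double, cGst: Double, sGst: Double): Double {
    // Step 3: a bare constructor call. Its result is not used yet,
    // so behavior is unchanged and the code still compiles.
    TaxComponent(cGst, sGst)
    return costOfGoods + costOfGoods * (cGst + sGst) / 100 // assumed body
}
```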
Step 4 - Shadow function parameters with properties from the data class.
Use the Introduce Variable (⌥ + ⌘ + V / Ctrl + Alt + V) action on the new TaxComponent instance expression. If a floating menu shows up (depends on where your cursor is), select the entire new instance expression from the menu and hit Enter ↵ (fig. 8).
Another floating menu will show up asking you to either create a single variable or a destructuring declaration. Choose Create destructuring declaration (fig. 9).
IntelliJ will create the destructuring declaration for you. However, it will also append a number to the variable names in the declaration to prevent conflicts with the function parameters (fig. 10).
Usually, this is desirable, but our plan is to replace the function parameters with our new data class TaxComponent. With that in mind, we will rename these variables to match the names of the parameters. So, rename cGst1 to cGst, matching the name of the incoming parameter (fig. 11).
We now rename the second variable sGst1 to sGst to match the second parameter (fig. 12).
If you did this correctly, you will see that IntelliJ highlights all the destructured declaration variables with a warning highlight, usually yellow (fig. 13).
We have to verify if the warning highlight is the one we wanted. Hover your mouse over the warnings, and you should see the "Name shadowed" message (fig. 14).
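After both renames, the function might look like the sketch below (types and body are again assumptions). The destructured variables now shadow the parameters, so the rest of the body reads its values from the `TaxComponent` instance without any further edits.

```kotlin
data class TaxComponent(val cGst: Double, val sGst: Double)

fun totalPayable(costOfGoods: Double, cGst: Double, sGst: Double): Double {
    // Step 4: on the right-hand side, cGst and sGst still refer to the parameters.
    // From the next line on, the names refer to the destructured locals
    // ("Name shadowed" warning).
    val (cGst, sGst) = TaxComponent(cGst, sGst)
    return costOfGoods + costOfGoods * (cGst + sGst) / 100 // assumed body, untouched
}
```

At this point the parameters are used in only one place, the constructor call, which is what lets the next step remove them safely.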
Step 5 - Introduce parameter.
We are in the final stages of refactoring. Use the Introduce Parameter (⌥ + ⌘ + P / Ctrl + Alt + P) action on the new instance expression. You'll get a floating menu asking you to select an expression; select the new instance expression and hit Enter ↵ (fig. 15).
IntelliJ will show you a post-transformation preview of the function signature. You'll notice that IntelliJ introduces a new TaxComponent parameter, and at the same time, removes the old cGst and sGst parameters. The IDE is smart enough to figure out these values are available through TaxComponent (fig. 16).
Hit Enter ↵, and IntelliJ will safely make the change to the function signature and also to the call sites (fig. 17).
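The end result might look like the sketch below. The parameter name `taxComponent`, the types, the function body and the `invoiceTotal` call site are assumptions for illustration; the IDE chooses the actual parameter name and rewrites your real call sites.

```kotlin
data class TaxComponent(val cGst: Double, val sGst: Double)

fun totalPayable(costOfGoods: Double, taxComponent: TaxComponent): Double {
    // Step 5: the constructor call has become a parameter. cGst and sGst
    // are now ordinary locals, so nothing is shadowed any more.
    val (cGst, sGst) = taxComponent
    return costOfGoods + costOfGoods * (cGst + sGst) / 100 // assumed body, untouched
}

// Hypothetical call site: the two loose arguments are now wrapped in a TaxComponent.
fun invoiceTotal(): Double = totalPayable(100.0, TaxComponent(9.0, 9.0))
```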
Tada! 🎉 One down and one more to go, but I'm sure you can handle the other function on your own. And that, my friend, is how you address data clumps in Kotlin using IDE-assisted refactoring.
Post-refactoring checks
- Build your project and ensure there are no compilation errors.
- If you don't have tests, verify manually by executing the code.
- If you have tests, ensure all your tests pass after the change.
- Use a Git GUI client to review every call site affected by the change (fig. 18).
Achievements and possibilities
- Uncover an idea from the business domain. We identified a concept from the business domain and gave it a name, i.e., TaxComponent. Good names help maintainers understand the codebase better; this could be your future self or peers.
- Opportunity to introduce polymorphic behavior. TaxComponent currently takes in cGst and sGst, which apply to intra-state trade. Different GSTs apply to inter-state trade and to trade within a union territory. With TaxComponent in place, we can make it a superclass and have subclasses for every tax.
- Encapsulation. Move behavior associated with the TaxComponent into the class. You may have utility functions or duplicated code that work on cGst and sGst at the moment. We can encapsulate them inside the new class.
- Testing. Easy to unit test this one idea after encapsulation.
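As one possible illustration of the encapsulation point, the tax arithmetic could move into the class. The `taxOn` method name, the `penalty` parameter and the percentage-based formula are assumptions, not something the original example prescribes.

```kotlin
data class TaxComponent(val cGst: Double, val sGst: Double) {
    // Hypothetical behavior moved into the class: the tax due on a given amount,
    // treating cGst and sGst as percentage rates (assumption).
    fun taxOn(amount: Double): Double = amount * (cGst + sGst) / 100
}

fun totalPayable(costOfGoods: Double, taxComponent: TaxComponent): Double =
    costOfGoods + taxComponent.taxOn(costOfGoods)

fun totalPayableWithPenalty(
    costOfGoods: Double,
    taxComponent: TaxComponent,
    penalty: Double // hypothetical extra parameter
): Double = totalPayable(costOfGoods, taxComponent) + penalty
```

With the calculation in one place, both functions stop duplicating it, and `taxOn` can be unit tested on its own.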
Next time you find a data clump in your codebase, you will know how to fix it in a few minutes and leave the codebase better than you found it.