Data clumps - Kotlin refactoring recipe (without tests) ⚠️

Data clumps

In your codebase, you may have come across objects that hang out together. They are often passed around as a group to serve some purpose(s) and lose meaning if you isolate even one of them from the group. These groups of objects are called data clumps.

Data clumps are usually a result of a poorly expressed idea. If you have spotted one of these clumps and are reading this article, it is only a matter of time until a new object is born 😉

Why fix data clumps?

  • Express the idea more clearly and concisely.
  • Reduce the number of parameters in function signatures.
  • Identify behavior associated with the group and extract it into a class (a.k.a. encapsulation).
  • Eliminate duplicate behavior.

Ground rules

If you are going to refactor without tests, these rules apply more strictly than when refactoring with tests.

  • Every change should be atomic and easy to follow, and you should record each one in your version control system (usually Git).
  • Your code should always be deployable. You should be able to stop doing whatever step you are in (stash or discard the changes) and be able to make a build for deployment.
  • Close to zero compilation errors.
  • Close to zero manual edits.
  • Your reviewer should be able to follow the changes with ease.

Tools we'll rely on

  • IntelliJ-based IDE
  • Kotlin compiler
  • Git GUI tool (recommended, I use Sublime Merge)

Kotlin language features

Fixing data clumps is non-trivial, and the recipe varies from language to language. What makes this recipe relatively simple and robust is a set of Kotlin language features: data classes, destructuring declarations, and variable shadowing.

The recipe

The simplified example below highlights two functions, totalPayable and totalPayableWithPenalty. Both take a few parameters (fig. 1).

fig. 1 - totalPayable and totalPayableWithPenalty functions

You can also see that both functions have a few parameters in common; cGst and sGst are the ones that interest us. These parameters have the same names and types, and are used together across both functions (fig. 2).

fig. 2 - cGst and sGst are passed as a group across functions

I have intentionally excluded the costOfGoods parameter because there are other hypothetical functions where costOfGoods does not "hang out" with cGst and sGst and therefore is not part of the group.
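
To make the starting point concrete, here is a minimal sketch of what the two functions could look like. Only the function names and the costOfGoods, cGst, and sGst parameters come from the example; the Double types, the penalty parameter, and the function bodies are assumptions made for illustration.

```kotlin
// Hypothetical starting point: types, bodies, and the `penalty` parameter are assumptions.
fun totalPayable(costOfGoods: Double, cGst: Double, sGst: Double): Double {
    return costOfGoods + cGst + sGst
}

// cGst and sGst travel together again here: this is the clump.
fun totalPayableWithPenalty(
    costOfGoods: Double,
    cGst: Double,
    sGst: Double,
    penalty: Double
): Double {
    return costOfGoods + cGst + sGst + penalty
}
```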

Step 1 - Copy the parameter names and types that form the clump.

We know that the parameters cGst and sGst form a clump. To replace these individual parameters with a single object, we have to copy the parameters and their types (fig. 3).

fig. 3 - select and copy the parameters that form the clump

Notice that the parameters are right next to each other in this case and hence easy to copy. If the parameters are not next to one another, use IntelliJ's Change Signature ⌘ + F6 / Ctrl + F6 action to order them one after another before copying. Remember to commit if you make this change.
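
If the clump members are not adjacent, the reordering might look like the sketch below. The function names with Before/After suffixes and the Double types are hypothetical; they only exist so that both versions can sit side by side.

```kotlin
// Before Change Signature: another parameter sits between cGst and sGst.
fun totalPayableBefore(cGst: Double, costOfGoods: Double, sGst: Double): Double {
    return costOfGoods + cGst + sGst
}

// After Change Signature: cGst and sGst are next to each other and easy to copy.
// The IDE reorders the arguments at every call site as part of the same action.
fun totalPayableAfter(costOfGoods: Double, cGst: Double, sGst: Double): Double {
    return costOfGoods + cGst + sGst
}
```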

Step 2 - Name the idea and create a Kotlin data class.

After copying the parameters and types from one of the function signatures, come up with a reasonably good name for the idea. It doesn't have to be perfect, so don't spend more than a few seconds choosing a name (Jay Bazuzi prefers 5 seconds). You can always rename the class when you come up with a better name.

Next, create a Kotlin data class using the name you just came up with. In this example, I am using TaxComponent; it may not be perfect, but it is good enough to get the idea across.

The parameters we copied will become public immutable properties of this data class (fig. 4). The data class doesn't have to be in a separate file; use your discretion to place it wherever you see fit.

fig. 4 - data class with copied parameters as public immutable properties
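
As a sketch, the data class could look like this. The Double type is an assumption; use whatever types your copied parameters actually have.

```kotlin
// The copied parameters become public, immutable (val) properties.
// Double is an assumed type for illustration.
data class TaxComponent(val cGst: Double, val sGst: Double)
```

Being a data class, TaxComponent gets componentN functions generated for its properties, which is what makes the destructuring in Step 4 possible.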

Step 3 - Create a new instance of the data class inside the function.

After creating the data class, go back to the totalPayable function and, in the first line of the function body (line 22, in this example), attempt to invoke the constructor of the TaxComponent data class (fig. 5).

fig. 5 - attempt to invoke the constructor of TaxComponent data class

Bringing up the code completion menu ⌃ + Space / Ctrl + Space will automatically suggest appropriate arguments for the constructor (fig. 6).

fig. 6 - code completion showing a list of available options

Hit Enter ↵ after selecting cGst, sGst from the list, and the IDE will automatically fill in the constructor arguments. Ensure there are no compilation errors in the auto-completed code (fig. 7).

fig. 7 - constructor parameters filled by IntelliJ code completion without errors
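
At the end of this step, the function might look like the sketch below, assuming Double parameters and a simple sum as the body (both are assumptions). The new instance is not assigned to anything yet, which is fine for now: the code still compiles and behaves exactly as before.

```kotlin
data class TaxComponent(val cGst: Double, val sGst: Double)

fun totalPayable(costOfGoods: Double, cGst: Double, sGst: Double): Double {
    // New instance built from the incoming parameters; not yet used.
    TaxComponent(cGst, sGst)
    return costOfGoods + cGst + sGst
}
```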

Step 4 - Shadow function parameters with properties from the data class.

Use the Introduce Variable ⌥ + ⌘ + V / Ctrl + Alt + V action on the new TaxComponent instance expression. If a floating menu shows up (depends on where your cursor is), select the entire new instance expression from the menu and hit Enter ↵ (fig. 8).

fig. 8 - select the new instance expression from the floating menu

Another floating menu will show up asking you to either create a single variable or a destructuring declaration. Choose Create destructuring declaration (fig. 9).

fig. 9 - create a destructuring declaration from the floating menu

IntelliJ will create the destructuring declaration for you. However, it will also append a number to the variable names in the declaration to prevent conflicts with the function parameters (fig. 10).

fig. 10 - IntelliJ appends 1 to the variable names in the declaration
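
In code, this intermediate state might look roughly like the following sketch (types and function body are assumptions). The body still reads the original parameters, so the numbered variables are unused at this point.

```kotlin
data class TaxComponent(val cGst: Double, val sGst: Double)

fun totalPayable(costOfGoods: Double, cGst: Double, sGst: Double): Double {
    // The IDE suffixes the names with 1 to avoid clashing with the parameters.
    val (cGst1, sGst1) = TaxComponent(cGst, sGst)
    return costOfGoods + cGst + sGst // still uses the parameters
}
```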

Usually, this is desirable, but our plan is to replace the function parameters with our new data class TaxComponent. With that in mind, we will rename these variables to match the names of the parameters. First, rename cGst1 to cGst, matching the name of the incoming parameter (fig. 11).

fig. 11 - Rename cGst1 to cGst

We now rename the second variable sGst1 to sGst to match the second parameter (fig. 12).

fig. 12 - Rename sGst1 to sGst

If you did this correctly, you will see that IntelliJ highlights all the variables in the destructuring declaration with a warning highlight, usually yellow (fig. 13).

fig. 13 - IntelliJ highlights all variables inside the destructuring declaration with a warning

We have to verify that the warning highlight is the one we wanted. Hover your mouse over the warnings, and you should see the "Name shadowed" message (fig. 14).

fig. 14 - Name shadowed warning message from the IDE
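
Here is a sketch of the function after both renames, again assuming Double parameters and a simple sum as the body. On the right-hand side of the declaration, cGst and sGst still refer to the function parameters; from the next line onwards, the same names refer to the destructured local variables, so the rest of the body now reads its values from TaxComponent without a single manual edit.

```kotlin
data class TaxComponent(val cGst: Double, val sGst: Double)

fun totalPayable(costOfGoods: Double, cGst: Double, sGst: Double): Double {
    // Both names trigger a "Name shadowed" warning: the locals hide the parameters.
    val (cGst, sGst) = TaxComponent(cGst, sGst)
    return costOfGoods + cGst + sGst // now resolves to the destructured locals
}
```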

Step 5 - Introduce parameter.

We are in the final stages of refactoring. Use the Introduce Parameter ⌥ + ⌘ + P / Ctrl + Alt + P action on the new instance expression. You'll get a floating menu asking you to select an expression; select the new instance expression and hit Enter ↵ (fig. 15).

fig. 15 - floating menu asking you to choose an expression to extract

IntelliJ will show you a post-transformation preview of the function signature. You'll notice that IntelliJ introduces a new TaxComponent parameter, and at the same time, removes the old cGst and sGst parameters. The IDE is smart enough to figure out these values are available through TaxComponent (fig. 16).

fig. 16 - IntelliJ showing a preview of post-transformation changes

Hit Enter ↵, and IntelliJ will safely make the change to the function signature and also to the call sites (fig. 17).

fig. 17 - totalPayable function after the Introduce Parameter refactoring action
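
The end result might look like the sketch below. The parameter name taxComponent, the Double types, the function body, and the invoiceTotal call site are all assumptions for illustration; the IDE may suggest a different parameter name.

```kotlin
data class TaxComponent(val cGst: Double, val sGst: Double)

// cGst and sGst are gone from the signature; the clump travels as one object.
fun totalPayable(costOfGoods: Double, taxComponent: TaxComponent): Double {
    // No more shadowing: the parameters these names used to hide no longer exist.
    val (cGst, sGst) = taxComponent
    return costOfGoods + cGst + sGst
}

// Hypothetical call site: the IDE wraps the old arguments in a constructor call.
fun invoiceTotal(): Double = totalPayable(100.0, TaxComponent(9.0, 9.0))
```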

Tada! 🎉 One down and one more to go, but I'm sure you can handle the other function on your own. And that, my friend, is how you address data clumps in Kotlin using IDE-assisted refactoring.

Post-refactoring checks

  1. Build your project and ensure there are no compilation errors.
  2. If you don't have tests, verify manually by executing the code.
  3. If you have tests, ensure all your tests pass after the change.
  4. Use a Git GUI client to review every call site affected by the change (fig. 18).

fig. 18 - Sublime Merge showing the refactored function and an affected call site

Achievements and possibilities

  • Uncover an idea from the business domain. We identified a concept from the business domain and gave it a name, i.e., TaxComponent. Good names help maintainers understand the codebase better; this could be your future self or peers.
  • Opportunity to introduce polymorphic behavior. TaxComponent currently takes in cGst and sGst, which apply to intra-state trade. There are different GSTs for inter-state trade and trade within a union territory. With TaxComponent in place, we can turn it into a supertype (a sealed class or interface, since Kotlin data classes cannot be inherited from) and have a subtype for every tax.
  • Encapsulation. Move behavior associated with the TaxComponent into the class. You may have utility functions or duplicated code that work on cGst and sGst at the moment. We can encapsulate them inside the new class.
  • Testing. Easy to unit test this one idea after encapsulation.
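
As a rough sketch of where the encapsulation and polymorphism ideas could lead, the snippet below moves the summing behavior into the type and lets different taxes share one interface. Every name other than TaxComponent, cGst, and sGst is hypothetical, as are the Double types; treat it as one possible direction rather than the way to model GST.

```kotlin
// Hypothetical design: TaxComponent becomes a sealed interface with one subtype per tax.
sealed interface TaxComponent {
    val total: Double
}

// The original clump, now owning the behavior that works on cGst and sGst.
data class CentralAndStateTax(val cGst: Double, val sGst: Double) : TaxComponent {
    override val total: Double get() = cGst + sGst
}

// Placeholder for any other kind of tax that is a single amount.
data class SingleTax(val amount: Double) : TaxComponent {
    override val total: Double get() = amount
}

// Callers no longer need to know which components make up the tax.
fun totalPayable(costOfGoods: Double, tax: TaxComponent): Double {
    return costOfGoods + tax.total
}
```

With the behavior living in one place, a unit test only has to construct a tax and assert on its total.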

Next time you find a data clump in your codebase, you know how to fix it in a few minutes and leave the codebase better than you found it.
