Tidy First?, reflect next: chapter 4
Kent Beck's "Tidy First?" applications in my software development
This is the fourth iteration of Tidy First?, reflect next. A brief recap so far: I worked with Kent and learned a lot. He recently wrote Tidy First?, a book about tidying and about when to do it. I’m studying it as applied to my recent work in bioinformatics. [GitHub].
Chapter 4: New interface, Old implementation
I really like this one. You define the new interface you wish you had, then you wrap it around the old (well, the current) implementation. This wrapping could be straightforward, like passing in object IDs, fetching the objects, then calling the old object-based implementation. This provides a web-friendly, serializable API (plain types for the win!). Or it could be a more substantial change.
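Here is a minimal sketch of the ID-based wrapper described above. Everything in it is hypothetical and for illustration only: the `Sample` class, the in-memory `SAMPLES` store (standing in for a database), and the `summarize_sample` function that plays the role of the old object-based implementation.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical domain object that the old implementation expects."""
    sample_id: str
    readings: list


# Hypothetical stand-in for a database or object store.
SAMPLES = {
    "s1": Sample("s1", [1.0, 2.0, 3.0]),
    "s2": Sample("s2", [10.0, 20.0]),
}


def summarize_sample(sample: Sample) -> dict:
    """Old implementation: takes a full object. Left untouched."""
    mean = sum(sample.readings) / len(sample.readings)
    return {"id": sample.sample_id, "mean": mean}


def summarize_sample_by_id(sample_id: str) -> dict:
    """New interface: plain, serializable types in and out.

    It only fetches the object and delegates to the old implementation.
    """
    sample = SAMPLES[sample_id]
    return summarize_sample(sample)
```

Callers that only have an ID (say, from a web request) can use `summarize_sample_by_id("s1")`, while the object-based function keeps working for everyone else.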
I’ve seen and deployed this pattern many times. It reminds me of the ‘Strangler Fig’ pattern described by Martin Fowler:
This will involve identifying seams that we can insert into the system to allow it to be split […] Once we have seams to isolate a few components, we can start replacing them.
As Wikipedia summarizes, the pattern:
involves wrapping old code, with the intent of redirecting it to newer code or to log uses of the old code.
In other words, you identify the seam, define the interface that makes the seam work, wrap the old implementation, and gradually replace the old internals with new. (Or not, as the case may be.)
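As a rough sketch of that sequence, assume a hypothetical legacy function `old_total` and a later replacement `new_total`. The `total` function is the seam: it logs uses of the old code and can redirect to the new code. The names and the `use_new` flag are invented for this example, not taken from Fowler's article or the book.

```python
import logging

logger = logging.getLogger("strangler")


def old_total(prices):
    """Hypothetical legacy implementation."""
    result = 0
    for price in prices:
        result += price
    return result


def new_total(prices):
    """Hypothetical replacement, introduced later behind the same seam."""
    return sum(prices)


def total(prices, use_new=False):
    """The seam: callers use this, never old_total or new_total directly."""
    if use_new:
        return new_total(prices)
    # Log uses of the old code so we can see who still depends on it.
    logger.info("old_total called with %d prices", len(prices))
    return old_total(prices)
```

Because callers only ever see `total`, the old internals can be replaced gradually, or left alone, without touching any call site.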
Example: stateful class → stateless functions
I used this technique in my ongoing work scaling DeepCell on GCP for cancer research. The DeepCell interface provides an Application object, encapsulating the pieces to predict cells in a tissue sample image. It’s a nice, straightforward API:
app = Mesmer()
segmentations = app.predict(input_channels)
Under the hood, the API does several things as diagrammed here:
The simplicity comes at a cost:
Everything runs in a single process. The entire process succeeds (or fails) together. Aka, instance preemption means restarting everything.
NB: TensorFlow provides a remote inference API, but that imposes a server rather than being serverless
Everything uses the same infra, even though only inference needs a GPU. Aka, wasted money.
The default model is hard-coded, and requires authenticating with DeepCell’s data server & access to AWS S3. The model (like the GPU) is only relevant for prediction.
From what-if to this-here
The core proposal of “new interface, old implementation” is to start with what you wish you had, and work backwards from there. In my case I wished I had a functional interface for each step, not a class object, with fewer implicit/default values. So we started with this:
import tensorflow as tf
from deepcell.applications import Mesmer

model_path = "gs://hardcoded/path"

def predict(input_channels, image_mpp, compartment, batch_size):
    # Load the model from an explicit path rather than Mesmer's hard-coded default.
    model = tf.keras.models.load_model(model_path)
    app = Mesmer(model=model)
    return app.predict(
        input_channels,
        image_mpp=image_mpp,
        compartment=compartment,
        batch_size=batch_size,
    )
Earlier, I glossed over the defaults; here I made them explicit. Now we have an interface that does “just” prediction, except that under the hood it still does everything. The subsequent 25 commits in the PR drive toward the “what if” question: how do we keep transforming this interface to get closer to separate functions whose phases don’t depend on each other? Basically, keep inlining the class methods so they use parameters instead of instance variables.
By the end, we had the three desired new interfaces:
preprocess_image(model_input_shape, image, image_mpp)
infer(model, image, batch_size)
postprocess(output_images, input_shape, compartment="whole-cell", whole_cell_kwargs={}, nuclear_kwargs={})
… all of which wrapped the old implementations, with very slight rearranging. To say it again: we didn’t change how these pieces actually work, only the function signatures, which now take parameters instead of reading class data.
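To illustrate the shape of that transformation without reproducing DeepCell's code, here is a toy sketch. The `Scaler` class and the numbers in it are made up; the point is only that each method's instance variables become explicit parameters while the arithmetic stays identical.

```python
class Scaler:
    """Hypothetical stateful class: configuration lives on the instance."""

    def __init__(self, factor=2, offset=1):
        self.factor = factor
        self.offset = offset

    def _preprocess(self, values):
        return [v + self.offset for v in values]

    def _infer(self, values):
        return [v * self.factor for v in values]

    def run(self, values):
        return self._infer(self._preprocess(values))


def preprocess(values, offset):
    """Same arithmetic as Scaler._preprocess, with offset as a parameter."""
    return [v + offset for v in values]


def infer(values, factor):
    """Same arithmetic as Scaler._infer, with factor as a parameter."""
    return [v * factor for v in values]
```

The two stateless functions compose to the same result as the class, but each one can now be called (and scheduled) on its own, with only the inputs it actually needs.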
Given this new architecture, we can start leveraging serverless parallelism to run the steps separately. Check out this screenshot from GCP Batch, where we process 16 images (tasks) in a 5-step pipeline, processing 4 at a time.
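As a local analogy only (this is not the GCP Batch API), the same "16 tasks, 4 at a time" shape can be sketched with Python's standard thread pool. `process_image` is a hypothetical placeholder for one task's pipeline steps.

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(image_id):
    """Hypothetical stand-in for one task's pipeline steps."""
    return f"segmented-{image_id}"


def process_all(image_ids, max_parallel=4):
    """Run one task per image, with at most max_parallel running at once."""
    # Executor.map returns results in input order, whatever order tasks finish in.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(process_image, image_ids))
```

This only works cleanly because the tasks share no state, which is exactly what the stateless function interfaces provide.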
What I love about this technique is that it combines working backwards from a goal with working forwards from what’s real today.
New interface, unimplemented exception
Sometimes, the new interface simply can’t be implemented with the old code. For example, the old code might assume an implicit date (e.g., today), but you want your interface to support any date.
Well– just add a date parameter, and throw an exception if it’s not today.
This draws a line in the sand regarding how it should work, while also requiring callers to treat the date explicitly. In the future, when other dates are supported, all callers will continue working as if nothing changed. And new callers can vary the date, for honor and glory.
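A minimal sketch of the date example, assuming a hypothetical `_legacy_daily_report` function that only knows about today. The function names are invented for illustration, and `NotImplementedError` is one reasonable choice of exception among several.

```python
import datetime


def _legacy_daily_report() -> str:
    """Hypothetical old implementation with an implicit 'today'."""
    return f"report for {datetime.date.today().isoformat()}"


def daily_report(date: datetime.date) -> str:
    """New interface: the date is explicit, even though only today works."""
    if date != datetime.date.today():
        raise NotImplementedError("only today's date is supported so far")
    return _legacy_daily_report()
```

Existing callers pass `datetime.date.today()` and keep working unchanged once other dates are supported; until then, any other date fails loudly instead of silently returning today's report.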
This is the fourth in a series of posts reflecting on the book Tidy First? by Kent Beck. See also: Chapter 1, Chapter 2, Chapter 3.
About Binary Alchemy
Binary Alchemy is the art of software engineering: part programming, part philosophy. Software melds the elegance of mathematics with the unyielding realities of zeroes and ones. Software is ethereal, less constrained than materials in its ability to metamorphose (or metastasize).
I wrote my first line of code in 1989. I have decades of experience building, operating, and re-building software spanning from MS-DOS to GCP, AWS, and Azure. ️⚙️ I’ve worked with startups and large enterprises, including Google, Microsoft, Square aka Block, Gusto, and others.
I provide software & cloud architecture consulting services. Learn more: RedwoodConsulting.io




