Creating a Technical Debt Payment Plan - What Do We Fix?

In the first post, I laid out an overview of creating a technical debt plan, and then outlined some steps to determine whether or not to address problems in the system.

What Do We Fix?

It's likely the system under discussion has a significant number of problems; however, it makes little sense to try to fix all of them. This next phase drives the team to focus on the highest-value, highest-risk areas of the system.

The Outcome

Work in this phase results in a list of high-level business value propositions. Depending on the methodologies and practices you use, these propositions may be broken out into Epics, Business Value Stories, etc. Rough order-of-magnitude estimates need to be provided to give the business enough information to make good decisions on how to prioritize work.

Building The Goals

Remember the critical theme: Business drives the goals. Delivery, support, and other roles should have input, but we're trying to ensure our organization is improving its revenue!

When looking at the overall system, first consider broad areas: Can we lower risk so we stop losing revenue through lost new sales, avoid getting sued (again), and stop losing existing customers? Can we improve our overall value so we're spending less on support ticket time, or we're gaining new sales and new customers, or we're able to ship new features faster?

As a first step, after deliberation, the entire team will bring up some areas to consider, driven by both technical and business aspects.

Some Examples

I regularly illustrate this process using an online shopping site as an example. It's a well-known concept, so it's easy to make examples everyone can understand. For this discussion, I'll use three specific examples the business might want to address:

  • Login very infrequently takes the user to a different user's account. This appears to happen to one random user out of the 200,000 on the site, and only about once a month. That's bad. Privacy and security are at risk, and bad actors might misuse user financial details.
  • Product recommendation engine shows wildly inappropriate products. We lose revenue because instead of showing reasonable, useful recommendations the engine displays things like snow tires when the shopping cart contains light bulbs, and the user lives in Florida.
  • Users say product searches take a long time. Direct feedback from our customers. We lose revenue and return customers as users get frustrated and go somewhere else.

This set of examples may give the business folks enough information to prioritize before doing further investigative work.

Business May Make Decisions Technical Folks May Not Agree With

At this point in my talks, I play the role of a business decision maker, and I purposely use a controversial stance: I decide not to address the login-related security issue and instead focus on the performance and recommendation engine issues. This example is where most conscientious developers' faces get red, and their heads start to explode. "But it's a security problem! Users' privacy and financial information may be at risk!"

This outraged view from a developer is entirely understandable. It's one I agree with, frankly. However, from the business's perspective, the risk is tiny if it's just one user per month out of 200,000. The business has made an informed decision to accept the liability risk of the situation and instead focus on improving the areas that are losing the company revenue: performance and recommendations.
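To put the business's reasoning in numbers, here's a rough sketch of the arithmetic using only the figures from the example (one affected login per month, 200,000 users). Treating each month as independent is my simplifying assumption, and the function names are just for illustration.

```python
# Figures from the example: roughly one affected login per month
# across a user base of 200,000.
USERS = 200_000
INCIDENTS_PER_MONTH = 1


def monthly_probability(users: int = USERS, incidents: int = INCIDENTS_PER_MONTH) -> float:
    """Chance that any one given user is affected in a given month."""
    return incidents / users


def yearly_probability(users: int = USERS, incidents: int = INCIDENTS_PER_MONTH) -> float:
    """Chance that a given user is affected at least once in 12 months.

    ASSUMPTION: months are independent of each other.
    """
    p = monthly_probability(users, incidents)
    return 1 - (1 - p) ** 12


def expected_yearly_incidents(incidents: int = INCIDENTS_PER_MONTH) -> int:
    """Total incidents the site should expect over a year."""
    return incidents * 12


if __name__ == "__main__":
    print(f"Per-user chance in a month: {monthly_probability():.6%}")
    print(f"Per-user chance in a year:  {yearly_probability():.6%}")
    print(f"Expected incidents a year:  {expected_yearly_incidents()}")
```

The numbers don't say the problem is harmless. They give the business something concrete to weigh against the revenue lost to slow searches and bad recommendations.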

It's worth repeating: the business drives the goals. Technical considerations for fixing the system must support the business goals. The business creates the high-level goals, and then the team starts to consider ways to meet those goals.

Investigating Potential Fixes

At this point, it's time to start digging into lower-level views of the system. As the team begins spelunking, I like to keep three major categories in mind for potential fixes: Does the fix require a small patch (perhaps a few lines of code), is it a mid-sized repair (swapping out a data provider, perhaps), or does the fix require burning part of the codebase to the ground (rebuilding the entire product recommendation engine, for example)?

Again, the focus is on building high-level features, epics, etc. We're trying to understand the magnitude of the fix, what proper testing will take, and what other risks may be lurking in the codebase.

Great places to start investigating include:

  • What's the current state of the codebase?
    • Open and closed bugs
    • Static code analysis (dependencies, complexity, coupling, etc.)
    • State of any automated test suites
  • What do current and past support tickets tell us?
  • What's the gut feel of delivery, support, and ops staff who've had to work with the system? Data is great, but don't ignore your gut feeling!
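If you don't have a static analysis tool handy, even a crude script can point the team at the scariest functions first. Here's a minimal sketch that uses Python's standard `ast` module to count decision points per function. It's a simplified stand-in for cyclomatic complexity, not a replacement for a real analyzer, and the `SAMPLE` source is made up for illustration.

```python
import ast

# Node types that add a decision point. This is a simplification of
# cyclomatic complexity, chosen for illustration only.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def branch_counts(source: str) -> dict[str, int]:
    """Return {function name: 1 + number of branch nodes} for each function."""
    tree = ast.parse(source)
    counts = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.walk visits the function node and everything nested inside it.
            branches = sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))
            counts[node.name] = 1 + branches
    return counts


# Hypothetical code under inspection.
SAMPLE = '''
def search(term, items):
    results = []
    for item in items:
        if term in item and item:
            results.append(item)
    return results

def greet():
    return "hi"
'''

if __name__ == "__main__":
    # List the most branch-heavy functions first.
    for name, score in sorted(branch_counts(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score}")
```

Treat the scores as conversation starters alongside bug history and support tickets, not as a verdict on the code.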

Create Epics and Estimates

At this point, the team should have enough information to complete some high-level description of work to be done. Yes, you must provide estimates. I know, I know, the #NoEstimates crowd is vociferous about skipping estimates and just doing work. (Yes, I'm trivializing and exaggerating.) I don't care. The business needs to have a rough idea of how complex and challenging work might be so that they can prioritize which set of work to do first.

We'll ask for estimates and treat them as deadlines!

In many organizations, estimates are broken. Badly broken. Repairing the trust around them is far outside the scope of this blog post. Let's just level-set: estimates should be taken as honest, informed guesses, and management shouldn't treat them as commitments or promises.

Focus on Small Slices of Value

An essential factor to consider here: keep the epics or value stories as thinly sliced as possible. We want to be able to size our work so that we can accomplish significant pay-down of technical debt while continuing to deliver ongoing value. That means we need epics that are small enough to interleave with other work or hand off to a separate feature team that will focus on this specific effort.

You will want to have another look at how you've sliced the features if the business balks at the amount of work required. In the case of the site performance issues I used as an example above, a team could first do a discovery phase to identify the causes of slow-performing use cases, then create an improvement plan that would allow further prioritization of rework. This could result in a list of improvements with rough effort estimates, for example: move to a dedicated database server (two weeks); move search into a separate application container (four weeks); reassess the indexing strategy (two weeks); and so on.
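Here's one way a team might lay that list out for the business, as a small sketch. The week estimates come from the example above; the `value` scores are entirely hypothetical placeholders for whatever the business assigns, and ranking by value per week is just one simple heuristic, not the only reasonable one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    name: str
    weeks: int  # rough estimate from the example list
    value: int  # HYPOTHETICAL business-value score (1-10), set by the business


SLICES = [
    Slice("Move to dedicated database server", weeks=2, value=5),
    Slice("Move search into a separate application container", weeks=4, value=8),
    Slice("Reassess indexing strategy", weeks=2, value=6),
]


def rank(slices):
    """Order slices by value per week of effort, highest first."""
    return sorted(slices, key=lambda s: s.value / s.weeks, reverse=True)


def pick(slices, budget_weeks):
    """Greedily take ranked slices that still fit in the remaining budget."""
    chosen, remaining = [], budget_weeks
    for s in rank(slices):
        if s.weeks <= remaining:
            chosen.append(s)
            remaining -= s.weeks
    return chosen


if __name__ == "__main__":
    # Suppose the business can spare four weeks for debt paydown right now.
    for s in pick(SLICES, budget_weeks=4):
        print(f"{s.name} ({s.weeks} weeks, value {s.value})")
```

Because each slice is small, the business can swap the scores around, rerun the ranking, and still fit debt work in next to feature work.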

The effort of ensuring tech debt paydown is in small slices is the most critical aspect of this entire work effort. Everything around the work laid out in this entire blog series (and my conference talk on the same subject!) hinges on this specific result. The point is to make small units of work that the business finds of great value and can be interleaved with ongoing other valuable work.

This is exactly how you accomplish your goals of trying to pay down technical debt while delivering other value!

The Outcome

Work performed in this phase results in a set of prioritized epics or features that the business agrees, for now, are what they want to focus on. The team agrees they've done enough investigating to ensure the scope of the features is acceptably accurate and reasonable regarding complexity, testing needs, other resources, external dependencies, etc.

Up Next

Next up I'll walk you through critical aspects like culture change, building up backlogs, and thinking about how to change your culture. Did I mention culture change?

Need Help?

Is any of this sounding familiar? Are you struggling with this situation yourself? I'd love to chat with you and see if I might be able to help! Drop me a line:
