Hypothesis- and Goal-Driven Development

On a Wednesday morning about two weeks ago, I gave a breakfast presentation with my colleague, Lars Dølvik, about the development process our team uses in our work on a web application for one of our major clients. Our redacted slides are available here (PDF, Norwegian).

Edit: A slightly rewritten version of this post is now available in Norwegian on Bouvet's official blog.

Our process essentially follows a form of Lean Startup methodology, which makes it all about eliminating waste (i.e. streamlining the development cycle). In our case, we attempt to do this via testable hypotheses, iterative work on the solution and validated learning. There are also elements from the Goal-Driven Software Development Process and DevOps.

For a short intro to Lean vs. Agile in the context of software development, see this post by Abby Fichtner.

The project

Lars and I entered the picture earlier this year, taking over continued development, administration and operations of the solution.

Our project is approved by the customer for a period of three months at a time, basically in the form of "X consultants will work on the solution during this period, Y of which are developers" and so forth.

We use an agile process where we focus on results rather than a specified (specced) delivery – somebody else's interpretation, perhaps even a non-technical person's, of what is needed – which enables us to meet the customer's needs to a larger extent. On the technical side, our pipeline involves the usual agile tooling: reproducible dev environments in Docker, Git Flow-style branches, linters, continuous integration via Jenkins and deployment via Kubernetes on the Google Cloud Platform.

We also have weekly demo meetings, where we go over the latest data to see whether or not we've reached our goals and prioritize our backlog.

All of this enables us to change course quickly, since the path from idea to product is short. Naturally, this works well with our focus on MVPs and on launching early to iterate on the solution.

As a developer, I find this way of working very inspiring; the collaborative decision-making process affords me more big-picture insight, and makes it easy to see (and understand) how my contributions affect the application as a whole.

The outline of this way of working is: Test your hypotheses (as to how to reach your goals) and fail fast if need be, so that you don't expend resources on actions that do not make the product better. This inspires creativity, and leads to lowered risks and costs for the customer.

In our case, this is all made possible by a good relationship with our customer, from whom we enjoy a great deal of trust – which gives us the freedom to work according to these parameters. The process also depends upon good communication between stakeholders, such as the product owner, and the members of the development team: developers, designers, analysts and project leaders.

The development cycle

During our cycle we plan, measure and document in order to achieve our specified goals. We use the data we collect to track and test our changes, to ensure that they actually improve the product, i.e. that we achieve our goals.

We work according to three levels of goals (decisions):

  • Strategic goal (the customer's overall goal: between 10 and 20 KPIs, e.g. elevate reputation)
  • Tactical goal (the product owner's goal via strategy to achieve the strategic goal, e.g. achieve a high Google ranking to assure more organic traffic)
  • Operational goal (measure effect from our actions towards the tactical goal, e.g. optimize images to reduce load time)

This demands good communication and flow of information.
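
As an illustration only – the type names and fields below are mine, not part of our actual tooling – the relationship between the levels could be sketched like this:

```typescript
// Illustrative sketch of the goal hierarchy; names and fields are
// hypothetical, meant only to show how the levels relate.
interface StrategicGoal {
  description: string; // e.g. "elevate reputation"
  kpis: string[];      // the customer's 10-20 KPIs
}

interface TacticalGoal {
  description: string;          // e.g. "achieve a high Google ranking"
  contributesTo: StrategicGoal; // the strategic goal it serves
}

interface OperationalGoal {
  description: string;         // e.g. "optimize images to reduce load time"
  contributesTo: TacticalGoal; // the tactical goal it serves
  metric: string;              // how we measure the effect of our actions
}
```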

We measure and learn from what we make, in an effort to make the solution better. The point is that our hypotheses and our thoughts are only assumptions – which we verify as theories via measurements (empiricism), and so on.

The steps of our development cycle include:

Identify tactical goal

We and/or the customer identify a goal, e.g. to make visitors stay longer on the site – not return to Google.

Action

Something we can do in an attempt to reach our goal, e.g. to add another level to the breadcrumbs on content pages.

Hypothesis

The reason for performing our action; why we think that the action in question will work (achieve the results we want), e.g. by making more related content from the same category available, users will have alternative paths to explore.

Measure

How we measure the effect of our action; identifying what a successful experiment would look like, e.g. more page views and/or an increase in time spent per visit.

Here we will need to plan our launch date, and possibly collect a baseline (data) for comparison if we don't continually compare with the "original" version of the site. We will also need to define a time frame for the comparison.

Estimated effect

The kind of effect we think we will be able to see from our measurements, e.g. an increase in page views on the category pages.
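
To make the steps concrete, here is a minimal sketch of how one such experiment could be captured as a structured record. The `Experiment` type and its field names are hypothetical – in practice we document our experiments in a spreadsheet and on wiki pages, as described below.

```typescript
// Hypothetical record mirroring the steps of our development cycle.
interface Experiment {
  tacticalGoal: string;      // what we want to achieve
  action: string;            // the change we make in order to get there
  hypothesis: string;        // why we believe the action will work
  measure: string;           // what a successful experiment looks like
  estimatedEffect: string;   // the effect we expect to see in the data
  baseline?: string;         // current numbers, for before/after comparison
  launchDate?: Date;         // when the change goes live
  comparisonWindow?: number; // days of data to collect before concluding
}
```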

Mindset

The only true wisdom is in knowing you know nothing.

Socrates – allegedly...

We need to embrace Socrates' enlightened ignorance; we don't know anything unless we've "proven" it via measurements – because only then can we substantiate our claims and assumptions with knowledge, in the form of data.

Don't expect the first change to meet your goals. Work strategically with small changes continuously. Document changes with numbers along the way, so that you have history and can learn from it. Ideally, you would do this in a central knowledge base, for instance via experiment reports.

Also: Measure the right things! Don't measure too many different things at once – this may make it difficult to separate the different factors and see what is really decisive.

It is important to have numbers from both before and after the change, or it will be difficult to see any resulting progress.

In our case, where pages are built as vertically stacked sections of content, when we (literally) prioritize something up, something else is implicitly down-prioritized. Content priority is thus a zero-sum game – be aware of how singular changes affect the entirety!

Implementing tracking

When implementing tracking, it is important to follow a standard, to structure events logically according to category, action, label, value, etc. and to document everything. In our case, we keep a spreadsheet of all our events (along with comments for explanations and technical details) so that anyone on the project team can look up any event at any time.
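
As a sketch of what such a standard might look like in code – assuming a Google Analytics analytics.js-style event API, which may not match the project's actual setup, and with illustrative category names – a single typed helper keeps event names consistent and easy to reconcile with the spreadsheet:

```typescript
// Assumed analytics.js-style API:
// ga('send', 'event', category, action, label, value).
declare const ga: (
  command: 'send',
  hitType: 'event',
  category: string,
  action: string,
  label?: string,
  value?: number
) => void;

// Restricting categories to a known set (illustrative values) prevents
// ad-hoc naming and keeps the event spreadsheet authoritative.
type EventCategory = 'recipe' | 'comments' | 'navigation';

function trackEvent(
  category: EventCategory,
  action: string,
  label?: string,
  value?: number
): void {
  ga('send', 'event', category, action, label, value);
}

// Example: tracking a click on a related recipe.
trackEvent('recipe', 'related-recipe-click', 'recipe-page');
```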

We also document every experiment on its own wiki page. Here we write down our hypothesis and action, along with goals and any collected statistics. We also note the launch date and experiment plan, along with dates for data collection.

On our JIRA board, we put a link to these pages in the relevant story cards.

Interpreting the numbers

There are many potential pitfalls in trying to make sense of analytics.

There are three kinds of lies: lies, damned lies, and statistics.

Benjamin Disraeli, according to Mark Twain

Global average temperature vs. number of pirates

One way of comparing alternate solutions is through A/B testing. One should preferably also compare these new post-change states with the original version of the product, so that one does not have to think about seasonal variations, and so on.
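
A minimal sketch of how such a comparison could be set up – assuming a stable visitor id is available, and using a deliberately simple hash – is to bucket visitors deterministically, so each visitor always sees the same variant:

```typescript
// Hypothetical deterministic A/B bucketing: hash a stable visitor id
// so each visitor always lands in the same variant.
function hashString(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0; // simple 32-bit rolling hash
  }
  return h;
}

type Variant = 'original' | 'change';

function assignVariant(visitorId: string): Variant {
  // 50/50 split; keeping the original running means seasonal effects
  // hit both groups equally.
  return hashString(visitorId) % 2 === 0 ? 'original' : 'change';
}
```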

It is also important to avoid noise. Too many changes at once can pollute the statistics within a variation, as there might not be enough dimensions in the data to separate the effects of the individual changes. And if one does not measure the product both before and after a change is implemented, one risks missing lessons simply for lack of data.
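
One way to judge whether an observed before/after difference is signal rather than noise is a standard two-proportion z-test. A sketch (the numbers in the example are made up):

```typescript
// Two-proportion z-test: is the difference between two rates
// (e.g. click-through before vs. after a change) statistically significant?
function twoProportionZ(
  successesA: number, totalA: number,
  successesB: number, totalB: number
): number {
  const pA = successesA / totalA;
  const pB = successesB / totalB;
  const pooled = (successesA + successesB) / (totalA + totalB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  return (pB - pA) / se;
}

// |z| > 1.96 corresponds to p < 0.05 (two-sided), a common threshold.
const z = twoProportionZ(5500, 100000, 6100, 100000);
console.log(`z = ${z.toFixed(2)}, significant: ${Math.abs(z) > 1.96}`);
```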

There is also the possibility of supplementing your interpretations of the quantitative data with qualitative research – for instance via limited user tests and semi-structured interviews.

If one is available, you should also make use of a professional analyst to make sense of the data – for the human mind is inclined towards logical fallacies, cognitive biases and apophenia.

Since "specificity is the soul of narrative", as my favourite podcaster John Hodgman says, I'd like to illustrate the main points using a concrete example...

Case

The Lean learning loop

Ideas

From a user test, we discovered that there was little navigation between content: if users did not find what they were looking for, they returned to where they came from.

Percentage of users that scrolled to the lower parts of the page

Build, Product

We therefore laid a plan:

  • Action: Move related recipes upward, over comments.
  • Hypothesis: By displaying related recipes earlier on the page, we believe that users will be more likely to click elsewhere in the solution, rather than going back to Google.
  • Measure: More clicks on related recipes
  • Current numbers: Between 5000 and 7000 clicks per week
  • Estimated effect: 180% increase, about 8000 more clicks
  • Timeframe: Check status 1 week after launch

A possible solution to our problem
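
Captured as the hypothetical Experiment record sketched earlier, the plan would look something like this:

```typescript
// The plan above, expressed as the hypothetical Experiment record.
const relatedRecipes: Experiment = {
  tacticalGoal: 'Make visitors stay on the site, not return to Google',
  action: 'Move related recipes upward, over comments',
  hypothesis:
    'Displaying related recipes earlier makes users more likely to click ' +
    'elsewhere in the solution rather than going back to Google',
  measure: 'More clicks on related recipes',
  baseline: 'Between 5000 and 7000 clicks per week',
  estimatedEffect: '180% increase, about 8000 more clicks',
  comparisonWindow: 7, // check status one week after launch
};
```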

Measure, Data

  • Tactical goal: Increased engagement – make visitors stay on the site, and not return to Google
    • More page views per visit
    • Increased time spent on page per visit

Our numbers

From our data, we can see the following:

  • Change in avg. time spent per visit (HH:MM:SS): +00:00:02
  • Change in avg. pages seen per visit: −0.02

Learn

Percentage of users that triggered our various events

  • By moving content up, we down-prioritize other content, e.g. comments
  • The share of users that see the comments has gone from 19.8% to 9.01%
  • There were fewer comments added this September than the year before

Ideas

  • Action: Make a shortcut to the comments available from the main content of the recipe.

  • Hypothesis: Users cannot easily tell that there are comments on the recipe; it appears that the page ends when one reaches the related content.

  • Measure: Counteract negative effect from changed position on the page

  • Current numbers: From 280 to 145 new comments

    Sketch of a shortcut to the comment section

In summary

Measurements and a focus on goals let us verify that changes are for the better – that we actually reach our goals; the Lean Startup methodology sums this process up as "build-measure-learn". What we really care about are the results.

For this to work well, we need a tight dialogue between developer and analyst, as well as good communication between the development team and the product owner. It is extremely important that both the customer and the entire team are on board with this way of working and understand the strategy.

It is very helpful to document everything related to testing and to keep the information structured.

This way of working, in our experience, speeds up the development cycle, and the fact that we have data about changes makes things very exciting – we can measure the way our changes impact the users. It's motivating to see the progress we make, to see that we achieve our goals. It's also exciting to see how our actions affect the total solution. It makes it easier to feel ownership of the changes; there's less codemonkeying and more power of definition for developers.

The example case above is more or less a translation of some of the slides from a presentation held at an internal event at Bouvet by our project leader, Jasmine "The Lean Machine" Garry.
