Supercrunch Blog

Werner Colangelo December 18, 2017 Agile, Project Management

Building Data Products Using Agile

Part 3

In Parts 1 and 2, we focused on the Product Design Phase of Building Data Products Using Agile. In Part 3 we will focus on the Product Build Phase, the right-hand side of the diagram below.

Once you have passed the ‘RAT’ test of the Product Design Phase, it is time to move to the Product Build Phase. What does this entail?

From a resourcing perspective, you should consider staffing your product team as follows:

  • a Product Manager (PM), responsible for client feedback, managing and planning the mid-to-long-term product roadmap and providing the overall strategic direction for the product
  • a Technical Product Owner (TPO), responsible for translating the features that come from Product Management into User Stories, managing the short-to-mid-term product backlog, and tasked with eliminating technical debt and data science debt
  • a Scrum Master (SM), for managing ceremonies and coaching the Scrum Team

As a product moves from the Design Phase to the Build Phase, three other resourcing considerations should be made:

  • ramping down Data Science involvement, but never eliminating it, since the production Data Science models still need to be built and tested after the Design Phase
  • ramping up to a full team of Software Engineers, Site Reliability Engineers (SREs, more commonly known as DevOps) and Quality Assurance (i.e. a fully staffed Scrum Team)
  • co-locating the Scrum team in order to make communication and collaboration as seamless as possible

The Product Build Phase follows the standard Scrum methodology with the normal Scrum ceremonies: Daily Standups, Backlog Grooming, Sprint Reviews, Retrospectives and Sprint Planning. The product backlog should contain all the user stories for the whole team but should allow for some specialization to occur. In a typical Scrum setup, team members do not specialize in particular tasks, but given the nature of a Data Product Scrum Team, some specialization between the Data Scientists and Software Engineers can occur. One mechanism to allow and manage this specialization is to split the single backlog into two (via a JIRA filter) so that there are separate Software Engineering and Data Science backlogs. (Running such a setup is a blog post in itself…)
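As a rough illustration of the filtered-backlog idea, the sketch below keeps one prioritized backlog and derives two views from it. It is not JIRA code: the `Story` class, the `track` label, the issue keys and the story titles are all hypothetical, chosen only to show how a filter can split a backlog without duplicating or reordering it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    key: str    # hypothetical issue key
    title: str  # hypothetical story title
    track: str  # hypothetical label: "software-engineering" or "data-science"


def filter_backlog(backlog, track):
    """Return the stories for one track, preserving the shared priority order."""
    return [story for story in backlog if story.track == track]


# One prioritized backlog for the whole Scrum Team (illustrative data only).
backlog = [
    Story("DP-101", "Expose model scores through an API", "software-engineering"),
    Story("DP-102", "Retrain the model on production data", "data-science"),
    Story("DP-103", "Add monitoring for the scoring service", "software-engineering"),
    Story("DP-104", "Validate model accuracy on a holdout set", "data-science"),
]

# Two filtered views over the same backlog, one per specialization.
engineering_view = filter_backlog(backlog, "software-engineering")
data_science_view = filter_backlog(backlog, "data-science")

print([story.key for story in engineering_view])
print([story.key for story in data_science_view])
```

Because both views are derived from the same list, the Technical Product Owner still prioritizes in one place, and every story appears in exactly one view.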

Coming out of the Product Design Phase, the team should be focused on building out a Minimum Valuable Product (MVP). Of course, once the MVP is launched, the team should focus on Potentially Shippable Product Increments (PSPIs)… As an aside, one could make a distinction between an MVP and an MSP, where the latter (a Minimum Sellable Product) is the minimum set of features that a client would be willing to pay for. This differs from an MVP, which by definition is a minimum set of features that works, is usable by a client, and provides some value. Note that in many cases the MSP and the MVP are the same.

When building a cross-functional team, especially one that consists of Data Scientists and Software Engineers, there are several things to consider. First, and most importantly, do the Data Scientists and Software Engineers / Technologists have any experience in working together? A (possibly unfair) generalization is that Data Scientists typically do not have much experience in formal Software Engineering methodologies. Key processes and best practices in Software Engineering, such as source control (i.e. version control, code branching, code merging) and writing code and data science models that can scale, are not typical concerns for a Data Scientist (they have other very useful skill sets…). As such, it is very important for the Scrum Master to help provide the necessary training and guidance to the full Scrum Team so that Data Scientists and Software Engineers can work as collaboratively (and seamlessly) as possible…


How have you set up your Data Product Build teams and processes? We would love to hear!