The journey to build analytical big data products usually starts with assessing a range of data science ideas. The key question is how to vet these ideas before committing to building a product around them. An important related question is how to make data science exploration work in an Agile environment.
Let’s call the product-focused data science exploration phase Product Design (left side of the figure below). Its main goal is to answer the key business questions posed by Product Management. To keep the Product Design phase focused and efficient, we have to establish two key research process requirements:
Figure: Two Phases for Product Development
Scrum is an Agile framework for completing complex projects. It was originally formalized for software development, but it works well for any complex, innovative scope of work, and particularly for Product Design. Importantly, the two key research process requirements outlined at the top of this post fit well within the Scrum framework with some minor customizations:
The first customization is to split the role of the Product Owner into two: a Product Manager, who focuses on client needs and truly represents the client in the process, and a Technical Product Owner, who focuses on the technical implementation of the product management requirements. This split serves data science exploration best because data science work is very technical, so it needs to be managed by someone (i.e. the Technical Product Owner) who has deep domain knowledge. It also frees up the Product Manager to conduct in-depth market and client research and to focus on the long-term product roadmap.
The second customization is to set up a two-week Sprint cycle with the standard Scrum ceremonies: daily standups, bi-weekly Sprint planning, bi-weekly retrospectives, backlog grooming and bi-weekly Sprint reviews, with these last two ceremonies customized to meet the specific needs of Product Design. Backlog grooming is generally more ad hoc and should involve the Scrum team’s Senior Data Scientist meeting with the Product Manager and Technical Product Owner to discuss key research questions. The Senior Data Scientist and Technical Product Owner then populate the backlog with the results of these discussions. (You might have noticed that the Senior Data Scientist takes on some responsibilities of a Technical Product Owner as well…)
The Sprint review sessions should be split into two: a Data Science Sprint Review and a more ‘typical’ Sprint Review. The former focuses on explaining the low-level, technical details to an audience of data scientists (although the meeting should be open to all). Its goal is to share information and insights with data scientists across the organization. The latter is more traditional: it presents the results of the Sprint to stakeholders from a higher-level business perspective and aims to answer the questions posed by Product Management.
As for tools to manage Data Science Exploration Sprints, we highly recommend JIRA, configured with a Scrum board for creating and managing the User Stories, and Confluence for collecting and sharing key information and findings.
How do you manage your data science exploratory work? What are your best practices? We would love to hear your thoughts regarding this.