Wasted data science?
Data scientists are expensive. Good data scientists are hard to find, and they are often the bottleneck resource in analytics projects. Reducing the time they spend on unnecessary work by, say, 50% would be amazing, right? It would not only speed up strategically vital projects; it would also reduce costs per project substantially. Well, we found that this is achievable. Let me tell you how. But first, let me explain the problem before I come to the solution.
When developing analytics solutions, there are several major sources of wasted time:
– Waste from overproduction and over-processing: data science effort is spent without any clear need
– Waste from work that creates no value: data scientists have to do “detective work” to understand the data – tracking down people who know the data sets, and so on
– Waste from waiting: data scientists cannot work on a project because they are waiting for data assets
– Waste from defects: effort is spent inspecting data sets for defects and fixing them
– Waste from transport and inventory: data is transferred to databases without any need
These sources of waste may sound familiar – they are essentially the same as in classical lean management and lean production frameworks. All of them can be reduced substantially by following a few important rules that give data scientists an ideal working environment. These rules are the result of the collective experience of our data scientists and our commercial unit, and they reflect our strong commitment to agile development and design thinking in data science.
#1: Feasibility checks – one step after the other, please!
The surest way to waste everyone’s energy is to start developing right away, without thinking things through first. And there is a lot to think through. First, be clear about your use case and check the business sense of your solution: use time and resources smartly by ensuring you develop something people will actually use. Then, check whether the available data has the required quality and can be combined in the required ways: use time and resources smartly by making sure that what you want to do is actually feasible. Finally, build a prototype before the big solution: use time and resources smartly by ensuring you develop something that can be expected to work once it is rolled out.
#2: Required features only
Understand what your solution needs to deliver. Ask your users what they need. If you know this, you will not only address their needs, but you will also be able to focus. And you will use time and resources most efficiently by only developing features that are actually required.
#3: Required data only
Once you have defined the required features, take the time to think about which data you really need. Save valuable time by examining only data that contributes substantial value to the feasibility study.
#4: Data science from the use case perspective
Even if you look only at the data you actually need, you can still waste people’s time. When exploring that data, it is tempting to see what could be done with it in general, instead of just checking whether the operations relevant to the required features can be performed. After all, data scientists are curious people. But if you want to be fast and efficient, resist this temptation. Save valuable time by investigating only the options people really need.
#5: Clear roles and responsibilities
The last point is an important one. Data scientists should do data science – and that’s it. They should not do detective work, trying to get hold of people who might know some important detail about a given data set. Designate a clear Single Point of Contact (SPOC) for each data set and ensure that these persons are available. Save valuable data scientists’ time by sparing them the detective work.
These five rules will help you avoid a lot of waste – easily 50% or more.
Let’s take the example of a simple data exploration – the phase in which the theoretical feasibility of a potential solution is checked by taking a first look at the data, its quality, and the ways data sets can be combined. Exclude a quarter of the potentially relevant data sets from the first exploration by establishing that they are not relevant for core features. Save your data scientists a third of the remaining exploration by concentrating on what is relevant for the use case. And save them one day per week of detective work by handing them SPOCs who know their business and are available.
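If those three savings are treated as independent, multiplicative factors – an assumption on my part, since they could overlap in practice – a quick back-of-envelope calculation shows how they compound:

```python
# Back-of-envelope estimate of compounded savings.
# The three percentages come from the exploration example above;
# treating them as independent multiplicative factors is an assumption.
baseline = 1.0

effort = baseline * (1 - 0.25)   # drop a quarter of the data sets up front
effort *= (1 - 1 / 3)            # skip a third of the remaining exploration
effort *= (1 - 1 / 5)            # SPOCs remove one day per week of detective work

print(f"Remaining exploration effort: {effort:.0%}")  # 40%
print(f"Estimated total savings: {1 - effort:.0%}")   # 60%
```

Even if the factors do not multiply this cleanly in reality, the rough magnitude supports the claim of 50% or more.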
This is one of the key reasons we are committed to design thinking and the use-case-first principle. It not only ensures the quality of the delivery; it can also boost your operational excellence in data handling to new levels.
What are your experiences with this topic? Does it make sense to apply lean management frameworks to data science in your opinion? And what is your perspective on what can be gained from doing so? We’d love to hear your comments.