I recently read Why Data Science Projects Fail: The Harsh Realities of Implementing AI and Analytics, without the Hype. It is a great book about why many data science projects fail. It is a very informative read, and I suggest you check it out. In this post I give a very quick summary of what I learned, for my future self.
The authors highlight that many data science projects fail, even at big companies. They identify four themes (strategy, process, people, and technology) that affect the success of data science projects.
Strategy
The main strategic failings are:
- The need: there is no real need in the organization for the data science project. This can be due to:
  - A poorly defined use case that has no clear business value, or that solves a problem that is not currently a business priority.
  - No actionable insights generated by the project that can drive business decisions.
  - A solution looking for a problem; this can happen when a team wants to adopt a newly hyped technology without first finding a proper use case.
- Measure of success: projects fail because there is no clear definition of done or measure of success (e.g., what the expected deliverables are).
- Buy-in from management: projects fail because there is no buy-in from leadership and management. This could be due to lack of awareness, fear of becoming obsolete, or internal politics.
- Data-driven decision-making culture: data projects need a data-driven or data-informed organization that is analytically oriented. Projects fail in organizations where decisions are driven by the opinion of the highest-paid person.
Process
The main process failings are:
- Data: Not having enough data, or having too many data issues (e.g., missing values, inconsistencies), makes it difficult to build successful projects that deliver value.
- Reasonable Expectations: data science and AI have received a lot of hype, which often sets up unreasonable expectations for a new project. It is important to set realistic expectations from the start, given the state of the data and the expected time frame.
- Clear and Continuous Communication: Lack of communication between the data scientists and business partners causes misunderstanding and disagreement about what the solution is or what it is going to do.
- Project Approach/Scope: Beginning with too wide a scope or too big a project will likely fail due to the complexities involved. Instead, start with a minimum viable product or a simple model.
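To make the "Data" point above a bit more concrete, here is a minimal sketch of the kind of quick data audit one might run before committing to a project. It is not from the book; the function name `audit_records`, the field names, and the sample records are all hypothetical and only meant for illustration.

```python
from collections import Counter


def audit_records(records, required_fields):
    """Return, per field, the share of records where the field is missing or empty."""
    total = len(records)
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            # Treat absent keys, None, and empty strings as missing values.
            if value is None or value == "":
                missing[field] += 1
    return {
        field: (missing[field] / total if total else 0.0)
        for field in required_fields
    }


# Hypothetical sample data, made up for this example.
sample = [
    {"customer_id": 1, "age": 34, "country": "US"},
    {"customer_id": 2, "age": None, "country": "us"},
    {"customer_id": 3, "age": 51, "country": ""},
    {"customer_id": 4, "age": 29},
]

report = audit_records(sample, ["customer_id", "age", "country"])
print(report)
```

A report like this only covers missing values. Inconsistencies (such as "US" versus "us" in the sample) would need their own checks, but even a rough audit helps set realistic expectations early.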
People
People are significantly involved in every step of the data science process, which requires developing many soft skills, specifically clear communication in both verbal and written forms.
Technology
- Proper Model: making incorrect assumptions, misapplying a model, or not setting up a proper experimental design can lead to misleading results or failure to solve the problem at hand.
- Deployment to production: The data science solution and/or model is not useful until it is deployed to production and put in the hands of users. Deploying a model involves many steps besides training, such as ensuring availability and setting up monitoring.
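As an illustration of what "monitoring" can mean in practice, here is a minimal sketch of one possible check: comparing the mean of a feature in production against its mean in the training data. This is my own example, not something the book prescribes; the function names, the sample numbers, and the threshold of 2.0 are assumptions, and real monitoring usually covers much more (latency, errors, prediction quality).

```python
import statistics


def mean_shift(training_values, live_values):
    """Shift of the live mean from the training mean, in training standard deviations."""
    train_mean = statistics.mean(training_values)
    train_std = statistics.stdev(training_values)
    if train_std == 0:
        raise ValueError("training values have zero variance")
    return abs(statistics.mean(live_values) - train_mean) / train_std


def needs_review(training_values, live_values, threshold=2.0):
    """Flag a feature for review when its live mean drifts past the threshold.

    The default threshold is an arbitrary value chosen for this example.
    """
    return mean_shift(training_values, live_values) > threshold


# Hypothetical feature values, made up for this example.
training = [10, 12, 11, 13, 9]
stable_live = [11, 12, 10]
drifted_live = [20, 22, 21]

print(needs_review(training, stable_live))
print(needs_review(training, drifted_live))
```

A flag from a check like this does not say the model is wrong, only that the incoming data no longer looks like the data it was trained on and someone should take a look.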