Data science: What is data science and how does it fail?

Data science has become one of the most important tools for organizations that want to extract value from the data they have collected. Organizations strive to optimize their operations with data science methods, and these methods affect daily life in many ways. According to Koty & Deshpande (2019), “Some of the ways that data science may affect your daily life include determining which advertisements are presented to you online; which movies, books, and friend connections are recommended to you; which emails are filtered into your spam folder; what offers you receive when you renew your cell phone service; the cost of your health insurance premium; the sequencing and timing of traffic lights in your area; how the drugs you may need were designed; and which locations in your city the police are targeting.”

Definitions of data science tend to focus on data scientists, the practitioners who apply analytical methods, rather than on the mathematicians and computer science researchers who come up with new ways to analyze data. Drew Conway’s Venn diagram of data science (Conway, 2015) presents data science as an interdisciplinary activity with three dimensions: domain expertise, mathematics, and computer science. User-friendly data science tools have lowered the barriers to entry into data science (Kelleher & Tierney, 2018), and many of these tools are advertised as easy to use and claim to deliver results with certainty. In this paper, I argue that data science fails without multidisciplinary expertise. When domain knowledge, mathematics, or computer science is missing, data science projects can lead to faulty results and conclusions.

What is data science?

Data science is a collection of techniques that strive to find useful patterns, connections, and relationships within data (Koty & Deshpande, 2019). The goal of data science methods is to improve decision-making by basing decisions on insights gained from large data sets (Kelleher & Tierney, 2018). Data science comprises a set of scientific methods, problem definitions, processes, and algorithms for extracting value from large data sets. According to Prevos (2019), “Data science is a systematic and strategic approach to using data to solve practical problems.” He notes that, paradoxically, data science is not a science of data. Because data science is closely tied to business outcomes, its problems are practical ones, whereas pure science has a different objective from business.

How does data science fail?

According to Kelleher & Tierney (2018), one of the biggest myths related to data science is the belief that data science is an autonomous process into which we simply feed our data to find answers to our problems. In reality, data science requires the supervision of a skilled person at different stages of the process. Human analysts are needed to frame the problem, pre-process the data, select the most appropriate algorithms, critically interpret the results of the analysis, and plan what action to take based on the results. Without professional supervision, a data science project is unlikely to achieve its goals.
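The need for supervision at the pre-processing and interpretation stages can be illustrated with a minimal sketch. The readings and the use of -999 as a "missing value" marker are invented for illustration and are not taken from any of the cited sources; the point is only that a fully automatic calculation cannot know about such a convention.

```python
from statistics import mean

# Hypothetical temperature readings in degrees Celsius.
# Assumption: the data source encodes missing values as -999,
# a convention only someone who knows the data would be aware of.
MISSING = -999
readings = [21.5, 22.0, MISSING, 21.0, 23.5, MISSING]

# Unsupervised use: the raw data is fed straight into the calculation.
naive_mean = mean(readings)

# Supervised use: an analyst removes the missing-value markers first.
valid = [r for r in readings if r != MISSING]
cleaned_mean = mean(valid)

print(f"naive mean:   {naive_mean:.1f}")
print(f"cleaned mean: {cleaned_mean:.1f}")
```

Both calculations run without any error, so nothing in the tool itself signals a problem. Only a person who knows that room temperatures cannot average far below zero would question the first figure.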

Koty & Deshpande (2019) argue that organizations that fail with data science have most often applied its techniques in the wrong context. According to Kelleher & Tierney (2018), data scientists should also have domain expertise, as most data science projects start with a real, industry-specific problem and the need to design a data-based solution to this problem. While the results of advanced applied mathematics, such as machine learning, are impressive, models can do more harm than good when nobody understands the reality they describe. Anyone who analyzes the problem should understand its context and the range of possible solutions.
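A small sketch may make this point concrete. The scenario below (fraud detection with 10 fraudulent cases among 1,000 transactions) is a hypothetical example, not one drawn from the cited sources. It shows how a headline metric can look impressive while the model is useless for the actual business problem.

```python
# Hypothetical labels for 1,000 transactions: 1 = fraud, 0 = legitimate.
actual = [1] * 10 + [0] * 990

# A trivial "model" that ignores its input and always predicts "legitimate".
predicted = [0] * len(actual)

# Accuracy: the share of all transactions that were labelled correctly.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall for fraud: the share of real fraud cases the model detected.
frauds_caught = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = frauds_caught / sum(actual)

print(f"accuracy:     {accuracy:.1%}")
print(f"fraud recall: {recall:.1%}")
```

An analyst without domain context might report the high accuracy as a success. Someone who knows that the purpose of the project is to catch fraud would look at recall first and see that the model detects none of it.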

Experience with similar projects in a similar industry also helps the data scientist determine the focus and scope of the project. To be a successful data scientist, one must also be able to tell the story of the data. According to Kelleher & Tierney (2018), this story may convey new insights that the analysis has revealed, or explain how the models created during the project fit into the organization’s processes and what impact they might have on the organization’s operations. In addition, they state that it is not worthwhile to carry out a data science project unless its results are used and are communicated in a way that is understood and trusted by those with a nontechnical background.

Conclusions

Data science does not give positive results in every project. Sometimes a data science project produces nothing of value, and sometimes the organization is unable to act on the insights produced by the analysis. However, in situations where a business problem is understood and appropriate data and expertise are available, data science can provide the organization with the competitive advantage it needs to succeed (Kelleher & Tierney, 2018). The biggest problems in data science projects are commonly related to incorrect or poor-quality data or a lack of qualified staff. In incompetent hands, a data science project often produces misleading results. A lot of expertise is required from data scientists, but data science projects are often done in teams whose participants each bring their own areas of expertise. According to Kelleher & Tierney (2018), an experienced expert in the field understands the impact of different variables on the results and can check whether the data science process and the results of its analysis are plausible. Data science software has indeed become more user-friendly, but this ease of use may hide the fact that doing data science requires both domain expertise and an understanding of the logic and mathematics used in algorithms.


References

Conway, D. (2015). The data science Venn diagram. Retrieved from: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Kelleher, J. D. & Tierney, B. (2018). Data science. Cambridge, MIT Press.

Koty, V. & Deshpande, B. (2019). Data Science: Concepts and Practice. Cambridge, Morgan Kaufmann Publishers, an imprint of Elsevier.

Prevos, P. (2019). Principles of strategic data science: creating