The Day I Learned What Data Science Is
What is data science? What does a data scientist do? How do I become a data scientist? These are commonly asked questions on data science social media sites and often debated in academic circles. These can be difficult questions to answer because data science is so new and rapidly evolving. Further, the answers are heavily dependent on the backgrounds of those doing the answering. For example, a computer scientist might answer in terms of machine learning and optimization while a statistician might speak to measurement error and inference. Applied mathematicians might have yet a different take focusing on the importance of linear algebra and calculus. All are correct, which is what makes data science such a rich and interesting discipline.
My own academic training is grounded in a specific biological application domain but with formal training in artificial intelligence (AI), complex adaptive systems, and statistics. I didn’t know it at the time, but my cross-disciplinary training prepared me very well for a career in data science. I owe much of my training to my Ph.D. mentor, who was well ahead of his time in insisting his graduate students receive formal degrees in statistics while earning a Ph.D. in a biomedical science. As a result, I have spent my career doing research at the interface of computer science, statistics, and the biomedical sciences. This is what we today call data science.
When I was working on my Ph.D. dissertation, my advisor used to talk about gaining a maturity in statistics. At first, I had no idea what he was talking about. After my fourth or fifth graduate-level statistics course, it clicked. I found myself thinking through problems like a statistician. I understood the logic of how statistics worked and could, for the first time, start to see a path forward for any problem I encountered. This, along with my computational coursework and research in AI and other areas like nonlinear dynamics, gave me the skills and confidence to become the data scientist I am today.
I had a similar epiphany about data science about 15 years ago while attending an AI workshop. Someone was presenting their work on AI and machine learning algorithms for making investment decisions. This person was not an academic and worked with a small group who invested their own money. His work involved setting up 50 different prediction algorithms on Friday to analyze historical financial data over the weekend. He would then choose the best performers and use them to make investments. The results he showed demonstrated superior performance from a type of AI algorithm that is not backed by the depth of theory that popular methods such as neural networks are.
What struck me about his work is that he did not care which algorithm came out on top. He was only concerned with investment profits. It was at that moment that data science clicked for me. He was solving a problem in a truly discipline-agnostic way. At the end of the day, the value of an analytic approach is not citations or awards. The value of an analytic approach is whether you are willing to invest your own money with it.
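For readers who think in code, here is a minimal sketch of that outcome-first selection. It is not the presenter's actual system, which I never saw. The three toy predictors, the made-up price series, and the names `backtest_profit` and `rank_by_profit` are all assumptions for illustration. The only point is that the ranking is driven by simulated profit and nothing else.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A predictor sees the price history so far and guesses the next move:
# +1 means "price will rise", -1 means "price will fall".
Predictor = Callable[[Sequence[float]], int]


def momentum(history: Sequence[float]) -> int:
    """Hypothetical rule: expect the most recent move to repeat."""
    return 1 if history[-1] >= history[-2] else -1


def mean_reversion(history: Sequence[float]) -> int:
    """Hypothetical rule: expect the most recent move to reverse."""
    return -momentum(history)


def always_up(history: Sequence[float]) -> int:
    """Hypothetical rule: always expect a rise."""
    return 1


def backtest_profit(predict: Predictor, prices: Sequence[float], warmup: int = 2) -> float:
    """Replay the series, holding one unit long (+1) or short (-1) each step."""
    profit = 0.0
    for t in range(warmup, len(prices)):
        position = predict(prices[:t])          # only past prices are visible
        profit += position * (prices[t] - prices[t - 1])
    return profit


def rank_by_profit(
    candidates: Dict[str, Predictor], prices: Sequence[float]
) -> List[Tuple[str, float]]:
    """Score every candidate the same way and sort by profit, best first."""
    scores = {name: backtest_profit(fn, prices) for name, fn in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up prices, purely illustrative (not real market data).
TOY_PRICES = [100.0, 101.0, 102.0, 103.0, 102.0, 103.0, 104.0, 105.0]

CANDIDATES: Dict[str, Predictor] = {
    "momentum": momentum,
    "mean_reversion": mean_reversion,
    "always_up": always_up,
}

if __name__ == "__main__":
    for name, profit in rank_by_profit(CANDIDATES, TOY_PRICES):
        print(f"{name}: {profit:+.1f}")
```

Nothing in `rank_by_profit` knows or cares how a candidate works internally. A real version would also need held-out data and transaction costs, which this sketch leaves out.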
Data science is not about theory. It is not about the decades of tradition in disciplines such as applied mathematics, computer science, and statistics. It is not even about the scientific method we champion in academia. Data science, at its heart, is about solving a problem with whatever tools you have at your disposal. My investor colleague did not care about theory or what academic scientists thought of him. He only cared about the end result. I see this as a practical approach, and we have plenty of practical problems with solutions that would help society. This of course doesn’t mean that data science does not benefit from the knowledge arising from the scientific method. What it means is that it is necessary sometimes to be creative and break disciplinary rules to achieve a particular outcome.
Data science will continue to evolve and, as with all disciplines, will likely develop its own traditions and scientific rigor. My hope is that it does not lose sight of its origins — to solve hard problems by bringing tools and methods together to achieve a practical and useful outcome. For now, it is an exciting time to be a data scientist.