Defining Data Science (Part 2)
Some very good progress on what a Data Scientist is, and isn’t. @neilraden and @teddy777 have contributed. Here is where it started, and where we’re at now.
Some people say:
- The definition of a scientist is somebody who does original research and publishes in peer-reviewed journals.
- Most people who call themselves data scientists aren’t actually scientists.
- Data scientists should be stratified depending on the sophistication of the tools they use.
A few points to make:
Science is a learning algorithm. If you’re executing the algorithm, then you’re doing science. If you execute the algorithm frequently, then you’re a scientist. Science is what you do.
Most people aren’t scientists because they don’t actually use the scientific method.
Consider two broad categories of science:
- There’s science for the sake of science, with no practical application in sight.
- There’s applied science.
If you’re in industry, you’re generally in applied science.
Further, there’s a difference between engineers and scientists:
- Engineers design.
- Scientists discover.
Many definitions of data science involve both design and discovery. Nearly all data science involves progressive hypothesis testing and the scientific method. The output of data science is a product, so there’s engineering involved.
Data science can rightfully be called an applied science, and it also involves engineering.
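To make "progressive hypothesis testing" a little more concrete, here is a minimal sketch in Python. It is only one way to illustrate the idea, not a prescribed method: the function names (`permutation_p_value`, `run_experiments`), the 0.05 threshold, and the choice of a permutation test are all assumptions made for illustration. Each experiment poses a hypothesis, tests it against data, and adds the result to what has been learned so far.

```python
import random


def permutation_p_value(control, variant, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the share of random relabelings whose mean difference is at
    least as large as the observed one. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    observed = abs(sum(variant) / len(variant) - sum(control) / len(control))
    pooled = list(control) + list(variant)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        c, v = pooled[:n_control], pooled[n_control:]
        if abs(sum(v) / len(v) - sum(c) / len(c)) >= observed:
            extreme += 1
    return extreme / n_permutations


def run_experiments(experiments, alpha=0.05):
    """Test each hypothesis in turn and record what was learned.

    `experiments` is a list of (name, control, variant) tuples;
    `alpha` is an assumed significance threshold.
    """
    learned = []
    for name, control, variant in experiments:
        p = permutation_p_value(control, variant)
        learned.append((name, p, p < alpha))
    return learned
```

Calling `run_experiments` with a list of `(name, control, variant)` tuples returns one `(name, p_value, significant)` entry per experiment. The discovery part is the loop of hypothesis and test; packaging the result into something people can use is the engineering part.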
***
I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca