There has been some very good progress on defining what a Data Scientist is, and isn’t. @neilraden and @teddy777 have contributed. Here is where it started, and where we’re at now.

Some people say:

  • The definition of a scientist is somebody who does original research and publishes in peer-reviewed journals.
  • Most people who call themselves data scientists aren’t actually scientists.
  • Data scientists should be stratified depending on the sophistication of the tools they use.

A few points to make:

Science is a learning algorithm. If you’re executing the algorithm, then you’re doing science. If you execute the algorithm frequently, then you’re a scientist. Science is what you do.

Most people aren’t scientists because they don’t actually use the scientific method.

Consider two broad categories of science:

  • There’s science for the sake of science, with no practical application in sight.
  • There’s applied science.

If you’re in industry, you’re generally in applied science.

Further, there’s a difference between engineers and scientists:

  • Engineers design.
  • Scientists discover.

Many definitions of data science involve both design and discovery. Nearly all data science involves progressive hypothesis testing and the scientific method. The output of data science is a product, so there’s engineering involved.

Data Science can rightfully be called an applied science, and it also involves engineering.

***

I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca