Summary of Think Like a Data Scientist (Part 1)

Here are some of the key points I took from the book Think Like a Data Scientist by Brian Godsey.

This is a summary of Chapters 1 and 2.

If a self-driving car makes it 90% of the way to the finish line but is washed into a ditch by a rainstorm, it would hardly be appropriate to say that the autonomous car doesn’t work.

Priorities:

  • Knowledge first
  • Technology second
  • Opinions third

Use this ordering to help settle disputes in the never-ending battle between the competing concerns of every data science project, for example software versus statistics, changing business needs versus project timeline, and data quality versus accuracy of results.

Often people are blinded by what they think is possible, and they forget to consider that it might not be possible or that it might be much more expensive than estimated. GUILTY!!!

Some key practices to incorporate as a data scientist (DS):

  1. Documentation: for your future self and for others who may work on your project.
  2. Code repository and versioning.
  3. Code organization: especially useful for code reuse.
  4. Ask questions: of the business side, software engineers, and PMs.
  5. Stay close to the data: sometimes a simple algorithm is all you need.

On Project Goals and Client Expectations:

A notable difference between data science and many other fields is that in data science, if a customer has a wish, even an experienced data scientist may not know whether it's possible. Communicating the uncertainties to expect in a project should therefore be one of the data scientist's first TODOs.

Treat goal discussions with a client as a search for common ground: expectations can be high and, given many factors, may be unrealistic. It often helps to ground the discussion of the final product in concrete suggestions of what it will look like.

It is important to distinguish facts from opinions. Judgements should be based on facts.

You will need to learn how to manage salesperson claims about your in-development projects. It will often happen that a client is already selling a project that is still in development, one you are not even sure will fully work.

  • No one ever wants to declare failure, but data science is a risky business, and to pretend that failure never happens is a failure in itself.

Two dangerous pitfalls with data to avoid:

  1. Expecting data to answer questions it can’t
  2. Asking questions of the data whose answers don't solve the original problem.

The beauty of negative results: they often force you to rethink your project and move toward a more informed solution.

Litmus test for the goal of a DS project:

  • What is possible
  • What is valuable
  • What is efficient