A weekend of a Data Scientist - July 6th, 2018

Alexander Osipenko - Lead Data Scientist at CIndicator
A weekend of a Data Scientist is a series of articles about some cool stuff I care about. The idea is to spend the weekend learning something new, reading, and coding.

Interpreting Model Predictions

One of the problems in Machine Learning is that the more complex a model is, the closer it gets to a black box.

For example, with a simple linear model and only a few features, it is easy to track what impact each feature has on the final result.
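To make this concrete, here is a minimal sketch with a made-up three-feature linear model (the feature names, weights and baseline input are all hypothetical). Because the prediction is just a weighted sum, each feature's impact can be read off directly as its weight times how far the feature sits from a reference input, and these pieces add up exactly to the change in the prediction.

```python
# Hypothetical linear model: prediction = bias + sum(weight_i * feature_i)
weights = {"rooms": 30.0, "area": 2.0, "age": -1.5}
bias = 10.0

def predict(x):
    return bias + sum(weights[name] * value for name, value in x.items())

def contributions(x, baseline):
    # Each feature's effect relative to a baseline (e.g. an "average" input):
    # its weight times its deviation from the baseline value.
    return {name: weights[name] * (x[name] - baseline[name]) for name in weights}

baseline = {"rooms": 3, "area": 80, "age": 20}  # hypothetical reference input
x = {"rooms": 4, "area": 100, "age": 10}        # the input we want to explain

contrib = contributions(x, baseline)
print(contrib)  # {'rooms': 30.0, 'area': 40.0, 'age': 15.0}
# The per-feature contributions sum to the change in prediction.
print(predict(x) - predict(baseline), sum(contrib.values()))
```

With a deep non-linear model there is no single weight per feature to read off like this, which is exactly the problem described next.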

Now imagine a complicated model with hundreds of features, dimensionality reduction layers, and finally a complex neural network performing a non-linear approximation. In this case, it is not easy to explain what impact each feature has on the final result.

In real life, the goal of a Data Scientist is to bring value to the company, which means that the models we create must support business decisions. In that sense, a black box is very bad for a production model because it raises trust issues: can I trust this prediction if I don't understand how the model works?

At NIPS 2017, a new approach to interpreting model predictions based on Shapley values (SHAP) was introduced, and it is supposedly the new state of the art. Before that, there was LIME, another approach that allows us to interpret model results.
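For intuition, here is a brute-force sketch of the Shapley value idea from cooperative game theory that SHAP builds on: a feature's attribution is its marginal contribution to the prediction, averaged over all coalitions S of the other features with weight |S|! (n - |S| - 1)! / n!. The toy model and the all-zeros baseline are made up, and filling "missing" features with a fixed baseline value is a simplifying assumption; the SHAP paper and library instead define missing features through expectations over background data and rely on much faster approximations, since exact enumeration is exponential in the number of features.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical non-linear model: features 0 and 1 interact, feature 2 is additive.
    return x[0] * x[1] + 2 * x[2]

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, enumerating every coalition of features.

    Assumption: a feature absent from a coalition takes its baseline value.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition use their real values, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = Fraction(0)
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

x = [1, 2, 3]
baseline = [0, 0, 0]
phi = shapley_values(model, x, baseline)
print([float(p) for p in phi])  # [1.0, 1.0, 6.0]
print(sum(phi) == model(x) - model(baseline))  # True
```

The interaction term x[0] * x[1] is split equally between features 0 and 1, and the attributions add up exactly to f(x) - f(baseline), the "local accuracy" property emphasised in the SHAP paper. Exact fractions are used only so that the sum check is not affected by floating-point rounding.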

So the goal for this weekend is to learn more about this. Start with a very intuitive understanding by listening to a podcast and reading Medium articles, then proceed to the NIPS paper, and finally get some practical experience with existing Python libraries.

Materials:

  1. Podcast: https://soundcloud.com/linear-digressions/shap-shapley-values-in-machine-learning
  2. Initial paper: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions
  3. Medium article 1: https://medium.com/civis-analytics/demystifying-black-box-models-with-shap-value-analysis-3e20b536fc80
  4. Medium article 2: https://towardsdatascience.com/one-feature-attribution-method-to-supposedly-rule-them-all-shapley-values-f3e04534983d
  5. GitHub repo with SHAP: https://github.com/slundberg/shap
  6. GitHub repo with Lime: https://github.com/marcotcr/lime

Previous articles:

1. Weekend of a Data Scientist - May 25th 2018

2. Podcasts for data scientist

Medium:

https://medium.com/@subpath