Putting SHAP into production with Amazon Sagemaker
A how-to on putting explainable models into production on Amazon Sagemaker.
Deploying explainable models to AWS Sagemaker
Fairness and transparency are becoming increasingly important for machine learning applications, especially when a model is used to make decisions with a large effect on people’s wellbeing, such as the acceptance or rejection of loan applications, fraud investigations, etc.
Building an inscrutable black-box deep learning model and claiming it has an accuracy of over 90% is no longer good enough. Models and model predictions should be explainable to both regulators and end users.
With modern approaches such as SHAP values (Lundberg et al. 2017; 2018), the contribution of each feature to the final prediction can be calculated, providing an explanation of how the model used the inputs to reach its prediction. The challenge, however, is to integrate such SHAP values into a production system, in order to provide transparent model decisions to stakeholders, decision-makers, regulators and customers alike.
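As a minimal local illustration of the idea (a sketch, assuming a fitted sklearn RandomForestClassifier called model and a feature DataFrame X; older shap versions return one array of shap values per class):

import numpy as np
import shap

# explain the fitted tree ensemble
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # list with one array per class

# for the positive class, the base value plus the per-feature contributions
# add up (approximately) to the predicted probability
reconstructed = explainer.expected_value[1] + shap_values[1].sum(axis=1)
print(np.allclose(reconstructed, model.predict_proba(X)[:, 1]))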
Here we will be focusing on getting our model into production using Amazon Sagemaker.
It turns out that the shap library is not included by default in the sagemaker estimator docker images, so it will not work out of the box. This means that in order to provide shap values as part of your output, you have to build a custom docker container. This is doable, but a little bit more complicated than usual.
The other part that differs from a standard model is constructing the response payload to include the shap values, but that turns out to be the easy part.
Github Repository
All the code and examples mentioned in this blog post are hosted at https://github.com/oegedijk/sagemaker-creditscore-explainer
Notebook
It is easiest to deploy sagemaker containers, models and endpoints from within a sagemaker notebook instance. So attach the git repository https://github.com/oegedijk/sagemaker-creditscore-explainer to your sagemaker notebook instance.
More info on how to attach a git repo to your notebook instance here: https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-git-repo.html
After you have connected to the repo, go to the sagemaker/notebooks directory and open sagemaker_explainer.ipynb.
ECR Container
Given that we will be deploying a custom docker container, we need to make sure we have an ECR docker registry set up.
You can go to the AWS console, find Elastic Container Registry and create one in your AWS region. For example, to create one in eu-central-1, go to https://eu-central-1.console.aws.amazon.com/ecr/repositories?region=eu-central-1. The name I gave to my repository is sagemaker-explainer.
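If you prefer to create the repository programmatically rather than through the console, a minimal boto3 sketch (using the same region and repository name as in this post) would be:

import boto3

# create the ECR repository that will hold our custom training image
ecr = boto3.client("ecr", region_name="eu-central-1")
ecr.create_repository(repositoryName="sagemaker-explainer")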
Setting ECR permissions
In order to create a custom docker container for our model we need to add the AmazonEC2ContainerRegistryFullAccess policy to our notebook (a boto3 alternative is sketched below the list).
In the sagemaker console:
- click on notebook instances
- click on the notebook instance that you are using
- go to Permissions and encryption
- click on the IAM role ARN
- click on ‘Attach Policies’
- find AmazonEC2ContainerRegistryFullAccess
- add it to the notebook.
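The same policy can also be attached with boto3 (a sketch; the role name SageMakerNotebookRole is a placeholder for your notebook's actual execution role, and you need IAM permissions to modify it):

import boto3

iam = boto3.client("iam")

# attach the managed ECR policy to the notebook's execution role
iam.attach_role_policy(
    RoleName="SageMakerNotebookRole",  # placeholder: use your notebook's role name
    PolicyArn="arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess",
)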
Attach additional policies:
You may have to add some additional permissions to your notebook policies, namely "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage" and "ecr:BatchCheckLayerAvailability".
You can either edit these manually, or paste the following json:
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "allowSageMakerToPull",
            "Effect": "Allow",
            "Action": [
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability"
            ],
            "Resource": "*"
        }
    ]
}
Installing docker-credential-ecr-login
In order to log into ECR from our sagemaker notebook, we need to install a tool called docker-credential-ecr-login. We download and install it inside our Sagemaker notebook with:
!sudo wget -P /usr/bin https://amazon-ecr-credential-helper-releases.s3.us-east-2.amazonaws.com/0.4.0/linux-amd64/docker-credential-ecr-login
and
!sudo chmod +x /usr/bin/docker-credential-ecr-login
Dockerfile
The Dockerfile to create our custom container is quite straightforward and can be found in sagemaker/container/Dockerfile:
ARG SCIKIT_LEARN_IMAGE
FROM $SCIKIT_LEARN_IMAGE
COPY requirements.txt /requirements.txt
RUN pip install --no-cache -r /requirements.txt && \
    rm /requirements.txt
So basically we take a scikit-learn image (defined by a build argument) and install additional requirements into it (basically joblib and shap).
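The requirements.txt that gets copied into the image lives next to the Dockerfile in sagemaker/container/; the exact contents are in the repository, but a minimal version would look something like:

joblib
shap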
DockerImage deployment helper class
In order to build and push our custom image we make use of this nice helper class:
import json
import subprocess
from pathlib import Path

import boto3
import docker

ecr_client = boto3.client("ecr", region_name=AWS_REGION)
docker_client = docker.APIClient()


class DockerImage:
    def __init__(self, registry, repository_name, tag="latest",
                 docker_config_filepath='/home/ec2-user/.docker/config.json'):
        self.registry = registry
        self.repository_name = repository_name
        self.docker_config_filepath = docker_config_filepath
        self.tag = tag
        self._check_credential_manager()
        self._configure_credentials()

    def __str__(self):
        return "{}/{}:{}".format(self.registry, self.repository_name, self.tag)

    @property
    def repository(self):
        return "{}/{}".format(self.registry, self.repository_name)

    @property
    def short_name(self):
        return self.repository_name

    @staticmethod
    def _check_credential_manager():
        try:
            subprocess.run(
                ["docker-credential-ecr-login", "version"],
                stdout=subprocess.DEVNULL,
            )
        except Exception:
            raise Exception(
                "Couldn't run 'docker-credential-ecr-login'. "
                "Make sure it is installed and configured correctly."
            )

    def _configure_credentials(self):
        # register the ecr-login credential helper for our registry in the docker config
        docker_config_filepath = Path(self.docker_config_filepath)
        if docker_config_filepath.exists():
            with open(docker_config_filepath, "r") as openfile:
                docker_config = json.load(openfile)
        else:
            docker_config = {}
        if "credHelpers" not in docker_config:
            docker_config["credHelpers"] = {}
        docker_config["credHelpers"][self.registry] = "ecr-login"
        docker_config_filepath.parent.mkdir(exist_ok=True, parents=True)
        with open(docker_config_filepath, "w") as openfile:
            json.dump(docker_config, openfile, indent=4)

    def build(self, dockerfile, buildargs):
        path = Path(dockerfile).parent
        for line in docker_client.build(
            path=str(path),
            buildargs=buildargs,
            tag=self.repository_name,
            decode=True,
        ):
            if "error" in line:
                raise Exception(line["error"])
            else:
                print(line)

    def push(self):
        docker_client.tag(
            self.repository_name, self.repository, self.tag, force=True
        )
        for line in docker_client.push(
            self.repository, self.tag, stream=True, decode=True
        ):
            print(line)
Getting scikit-learn image URI
So now we can get our scikit-learn image:
import sagemaker

def scikit_learn_image():
    registry = sagemaker.fw_registry.registry(
        region_name=AWS_REGION, framework="scikit-learn"
    )
    repository_name = "sagemaker-scikit-learn"
    tag = "0.20.0-cpu-py3"
    return DockerImage(registry, repository_name, tag)

sklearn_image = scikit_learn_image()
Building custom image based on scikit-learn image
And use that to build and push our custom image:
def custom_image(aws_account_id, aws_region, repository_name, tag="latest"):
    ecr_registry = f"{aws_account_id}.dkr.ecr.{aws_region}.amazonaws.com"
    return DockerImage(ecr_registry, repository_name, tag)

custom_image = custom_image(AWS_ACCOUNT_ID, AWS_REGION, ECR_REPOSITORY_NAME)

dockerfile = Path.cwd().parent / "container" / "Dockerfile"
custom_image.build(
    dockerfile=dockerfile,
    buildargs={'SCIKIT_LEARN_IMAGE': str(sklearn_image)}
)
custom_image.push()
This will take some time, but by the end you should have a custom training image with shap installed. The URI will be along the lines of
‘AWS_ACCOUNT_ID###.dkr.ecr.AWS_REGION###.amazonaws.com/sagemaker-explainer:latest’
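To double-check that the push succeeded, you can for example list the images in the repository (a quick sanity check, reusing the ecr_client defined earlier):

# should show at least one image tagged 'latest'
images = ecr_client.describe_images(repositoryName=ECR_REPOSITORY_NAME)
print([detail.get("imageTags") for detail in images["imageDetails"]])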
Training the Model
Sagemaker Estimator
Now that we have our custom training image, we can make use of the builtin Sagemaker SKLearn estimator, as long as we make sure to point it towards our custom image:
from sagemaker.sklearn.estimator import SKLearn

estimator = SKLearn(
    image_name=str(custom_image),
    entry_point='entry_point.py',
    source_dir=str(source_dir),
    hyperparameters=hyperparameters,
    role=role,
    train_instance_count=1,
    train_instance_type='ml.m5.2xlarge',
    output_path=output_path,
    code_location=output_path,
)
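The variables role, source_dir, hyperparameters and output_path come from earlier cells in the notebook; a typical setup (the bucket and paths here are illustrative assumptions, not the exact values from the repository) could look like:

import sagemaker
from pathlib import Path

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()  # IAM role of the notebook instance

# where the training code lives and where model artifacts should be written
source_dir = Path.cwd().parent / "src"
output_path = f"s3://{sagemaker_session.default_bucket()}/credit-explainer"
hyperparameters = {}  # no hyperparameters needed for this example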
When training a custom model on sagemaker you have to define a training directory (source_dir) and a file that serves as the entry point (entry_point). This file must contain the train_fn for training, and the model_fn, predict_fn, input_fn, and output_fn for the inference later on.
The training entry point can be found in sagemaker/src/entry_point.py:
import os
import sys

# import training function
from model import parse_args, train_fn
# import deployment functions
from model import model_fn, predict_fn, input_fn, output_fn

if __name__ == "__main__":
    args = parse_args(sys.argv[1:])
    train_fn(args)
The training function itself is located in sagemaker/src/model.py. The main difference with a regular model is that we also fit a shap.TreeExplainer to the model and store this in our model directory:
import argparse
import joblib
import os
from pathlib import Path
import json
import warnings
import numpy as np
import pandas as pd
import shap
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
class DFImputer(BaseEstimator, TransformerMixin):
    """SimpleImputer wrapper that keeps the pandas DataFrame format"""
    def __init__(self, strategy="median", fill_value=None):
        self.imputer = SimpleImputer(strategy=strategy, fill_value=fill_value)
        self.fitted = False

    def fit(self, X, y=None):
        self._feature_names = X.columns
        self.imputer.fit(X)
        self.fitted = True
        return self

    def transform(self, X):
        assert self.fitted, "Need to call .fit(X) first!"
        return pd.DataFrame(
            self.imputer.transform(X[self._feature_names]),
            columns=self._feature_names,
            index=X.index
        ).astype(np.float32)

    def get_feature_names(self):
        return self._feature_names
class NumpyEncoder(json.JSONEncoder):
    """converts numpy arrays to lists before they get json encoded"""
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)
def parse_args(sys_args):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-dir",
        type=str,
        default=os.environ.get("SM_MODEL_DIR")
    )
    parser.add_argument(
        "--train-data",
        type=str,
        default=os.environ.get("SM_CHANNEL_TRAIN_DATA"),
    )
    parser.add_argument(
        "--test-data",
        type=str,
        default=os.environ.get("SM_CHANNEL_TEST_DATA")
    )
    args, _ = parser.parse_known_args(sys_args)
    return args
def train_fn(args):
    print("loading data")
    train_df = pd.read_csv(args.train_data + "/train.csv", engine='python')
    test_df = pd.read_csv(args.test_data + "/test.csv", engine='python')

    TARGET = 'SeriousDlqin2yrs'
    X_train = train_df.drop(TARGET, axis=1)
    y_train = train_df[TARGET]
    X_test = test_df.drop(TARGET, axis=1)
    y_test = test_df[TARGET]

    print("Imputing missing values")
    imputer = DFImputer(strategy='median').fit(X_train)
    X_train = imputer.transform(X_train)
    X_test = imputer.transform(X_test)

    print("Building model...")
    model = RandomForestClassifier(n_estimators=50, max_depth=6, max_leaf_nodes=30)
    model.fit(X_train, y_train)
    explainer = shap.TreeExplainer(model)

    print("Saving artifacts...")
    model_dir = Path(args.model_dir)
    model_dir.mkdir(exist_ok=True, parents=True)
    joblib.dump(imputer, open(str(model_dir / "imputer.joblib"), "wb"))
    joblib.dump(model, open(str(model_dir / "model.joblib"), "wb"))
    joblib.dump(explainer, open(str(model_dir / "explainer.joblib"), "wb"))
Inference
For inference we have to define model_fn to load our model artifacts. Here we have to make sure we also load the explainer.joblib.
def model_fn(model_dir):
    """loads artifacts from model_dir and bundles them in a model_assets dict"""
    model_dir = Path(model_dir)
    imputer = joblib.load(model_dir / "imputer.joblib")
    model = joblib.load(model_dir / "model.joblib")
    explainer = joblib.load(model_dir / "explainer.joblib")

    model_assets = {
        "imputer": imputer,
        "model": model,
        "explainer": explainer
    }
    return model_assets
The function input_fn reads the JSON input and returns a dictionary with a pandas DataFrame.
def input_fn(request_body_str, request_content_type):
    """takes input json and returns a request dict with a 'df' key"""
    assert request_content_type == "application/json", \
        "content_type must be 'application/json'"
    json_obj = json.loads(request_body_str)
    if isinstance(json_obj, str):
        # sometimes you have to unpack the json string twice for some reason.
        json_obj = json.loads(json_obj)
    request = {
        'df': pd.DataFrame(json_obj)
    }
    return request
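For reference, the request body is records-oriented JSON, so a single row looks roughly like this (feature names from the credit-scoring dataset; the values are made up for illustration):

[
    {
        "RevolvingUtilizationOfUnsecuredLines": 0.35,
        "age": 45,
        "NumberOfTime30-59DaysPastDueNotWorse": 0,
        "DebtRatio": 0.25,
        "MonthlyIncome": 5400,
        "NumberOfOpenCreditLinesAndLoans": 6,
        "NumberOfTimes90DaysLate": 0,
        "NumberRealEstateLoansOrLines": 1,
        "NumberOfTime60-89DaysPastDueNotWorse": 0,
        "NumberOfDependents": 2
    }
]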
The predict_fn uses the input dataframe in request and the model assets to calculate the predictions and shap values.
We construct a return dictionary that includes the prediction, the shap base value (comparable to an intercept in regular OLS), and the shap values per feature:
def predict_fn(request, model_assets):
    """
    takes a request dict and model_assets dict and returns a response dict
    with 'prediction', 'shap_base' and 'shap_values'
    """
    print(f"data: {request['df']}")
    features = model_assets["imputer"].transform(request['df'])
    preds = model_assets["model"].predict_proba(features)[:, 1]

    expected_value = model_assets["explainer"].expected_value
    if expected_value.shape == (1,):
        expected_value = expected_value[0].tolist()
    else:
        expected_value = expected_value[1].tolist()

    # shap values for the positive class, transposed to one list per feature
    shap_values = np.transpose(model_assets["explainer"].shap_values(features)[1])

    response = {}
    response['prediction'] = preds
    response['shap_base'] = expected_value
    response['shap_values'] = {
        k: v for k, v in zip(features.columns.tolist(), shap_values.tolist())
    }
    return response
And finally output_fn returns the response in JSON format:
def output_fn(response, response_content_type):
    """takes a response dict and returns a json string of response"""
    assert (
        response_content_type == "application/json"
    ), "accept must be 'application/json'"
    response_body_str = json.dumps(response, cls=NumpyEncoder)
    return response_body_str
Fitting the model
So now we simply fit our model as usual with:
estimator.fit({'train_data': train_data, 'test_data': test_data})
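The train_data and test_data arguments are S3 URIs pointing at the prefixes that hold train.csv and test.csv. One way to create them (a sketch; the local file paths are assumptions) is to upload the CSVs from the notebook:

# upload the local CSVs to the session's default bucket and use the
# returned S3 URIs as input channels for the estimator
train_data = sagemaker_session.upload_data(
    path="data/train.csv", key_prefix="credit-explainer/train_data"
)
test_data = sagemaker_session.upload_data(
    path="data/test.csv", key_prefix="credit-explainer/test_data"
)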
Deploying the endpoint
And deploy it with:
estimator.deploy(
    endpoint_name="credit-explainer",
    initial_instance_count=1,
    instance_type='ml.c4.xlarge')
(this can take some time)
Test the endpoint
We build a predictor with appropriate json serializers baked in and test the endpoint:
from sagemaker.predictor import RealTimePredictor
from sagemaker.predictor import json_serializer, json_deserializer, CONTENT_TYPE_JSON

predictor = RealTimePredictor(
    endpoint=endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=json_serializer,
    deserializer=json_deserializer,
    content_type="application/json",
)

predictor.predict(test_df.sample(1).to_json(orient='records'))
Output should be something like:
{'prediction': [0.020315580737022686],
'shap_base': 0.06050333333333335,
'shap_values': {'RevolvingUtilizationOfUnsecuredLines': [-0.018875482250995595],
'age': [0.0026035737687252584],
'NumberOfTime30-59DaysPastDueNotWorse': [-0.007295913630249845],
'DebtRatio': [-0.001166559449290446],
'MonthlyIncome': [0.00046746497026006246],
'NumberOfOpenCreditLinesAndLoans': [-0.00012379074985687487],
'NumberOfTimes90DaysLate': [-0.010730724822367846],
'NumberRealEstateLoansOrLines': [-0.00029942272129598825],
'NumberOfTime60-89DaysPastDueNotWorse': [-0.004091473195545041],
'NumberOfDependents': [-0.0006754245156943097]}}
Lambda + API
Now all that is left to do is set up a lambda function to forward the API call to your endpoint (Sagemaker endpoints can only be reached from within the AWS ecosystem, so you need to put a lambda function in between), and then get a public facing URL from API Gateway.
This tutorial explains the various steps and configurations quite well, so just follow the steps.
In our case we simply forward the event payload onward without any repackaging, so our lambda_handler is quite straightforward. I added a bunch of prints so that we can check our logs in case anything goes wrong.
import os
import io
import boto3
import json
import csv
# grab environment variables
ENDPOINT_NAME = os.environ['ENDPOINT_NAME']
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='application/json',
                                       Body=event)
    print("raw response: ", response)

    result = json.loads(response['Body'].read().decode())
    print("decoded result: ", result)
    return result
Test the API
After setting up API Gateway you should have a public facing API url, so we can test our explainable model endpoint:
import json
import requests

# take a single sample row and convert it to JSON:
sample_json = df.sample(1).to_json(orient='records')

# define the header
header = {'Content-Type': 'application/json', 'Accept': 'application/json'}

# API url, copy your own here:
api_url = "https://#########.execute-api.eu-central-1.amazonaws.com/test/credit-explainer"

resp = requests.post(api_url,
                     data=json.dumps(sample_json),
                     headers=header)
print(resp.json())
And the result should again be something like this:
{'prediction': [0.3045473967302875],
'shap_base': 0.06050333333333335,
'shap_values': {'RevolvingUtilizationOfUnsecuredLines': [0.017002446159576922],
'age': [0.006427313815255611],
'NumberOfTime30-59DaysPastDueNotWorse': [-0.007124554453726655],
'DebtRatio': [-0.006844505153423333],
'MonthlyIncome': [-0.019672520587649577],
'NumberOfOpenCreditLinesAndLoans': [-0.010014011659840212],
'NumberOfTimes90DaysLate': [0.288998818519516],
'NumberRealEstateLoansOrLines': [-0.0007571802810589933],
'NumberOfTime60-89DaysPastDueNotWorse': [-0.022098932417397806],
'NumberOfDependents': [-0.001872810544298378]}}
In this case the customer was predicted to have a 30% chance of a loan delinquency in the next two years, mostly based on the number of times that they have been more than 90 days late in their repayments.
Now you can take this output and embed it in a dashboard where human decision-makers or end customers have access to it!
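For example, a minimal sketch (using the resp from the request above) that turns the response into a ranked table of feature contributions for the first row:

import pandas as pd

result = resp.json()

# one contribution per feature for the first (and only) row in the request
contributions = pd.Series(
    {feature: values[0] for feature, values in result['shap_values'].items()}
)

print(f"base value: {result['shap_base']:.3f}, prediction: {result['prediction'][0]:.3f}")
# largest absolute contributions first
print(contributions.reindex(contributions.abs().sort_values(ascending=False).index))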
Good luck building your own explainable machine learning models in AWS Sagemaker!