Amazon SageMaker Autopilot builds, trains, and deploys machine learning models. It is useful for creating models without in-depth knowledge of machine learning: it analyzes the data, engineers features, and trains candidate models automatically. Autopilot takes data as input, tries several machine learning algorithms, and returns the best-performing model it finds.

However, Autopilot does not yet support more complex tasks such as image classification and video inference. It is suited to simple supervised learning tasks such as regression and classification.

The advantages of using Autopilot are:

  • Reduces development effort and selects the best-performing algorithm
  • Handles missing values
  • Tunes hyperparameters
  • Generates machine learning code as output

[Figure: Autopilot lifecycle]

Hands-on Project

Below is an example of using Autopilot for binary classification. The model predicts whether the credit card request should be granted or denied.

Exploratory Data Analysis

The data is available at ics.uci.edu, and the code is available on GitHub. The data has 16 columns (15 attributes plus the target variable) and some missing values.
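
A quick way to see where the gaps are is to count the missing entries per column. The sketch below assumes a comma-separated file with no header row in which missing values are written as `?`; the three sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in a 16-column layout (A1..A16); not real data.
SAMPLE = """\
b,30.5,1.0,u,g,w,v,1.25,t,t,1,f,g,200,0,+
?,41.0,4.5,u,g,q,h,3.0,t,t,6,f,g,?,560,+
a,?,0.5,u,g,q,h,1.5,t,f,0,f,g,280,824,-
"""

COLUMNS = [f"A{i}" for i in range(1, 17)]


def count_missing(lines, marker="?"):
    """Return a Counter mapping column name -> number of missing entries."""
    missing = Counter()
    for row in csv.reader(lines):
        for name, value in zip(COLUMNS, row):
            if value.strip() == marker:
                missing[name] += 1
    return missing


missing = count_missing(io.StringIO(SAMPLE))
print(dict(missing))
```

To run this on the real file, pass an open file handle instead of the `io.StringIO` sample.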

[Figure: Binary classification dataset]

A16 is the target variable. It denotes whether the credit card application should be approved or denied.

[Figure: Target variable]

Autopilot Implementation

It is recommended practice to divide the data into train and test datasets. Store the datasets in an S3 bucket.
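
The split and upload can be sketched as follows. This is a minimal illustration, not the code from the original project: the 80/20 ratio, the seed, and the file, bucket, and key names in the usage comment are all assumptions. `boto3` is imported inside the upload helper so that the split can be tried without AWS credentials.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def upload_to_s3(local_path, bucket, key):
    """Upload one local file to S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the split works without boto3

    boto3.client("s3").upload_file(local_path, bucket, key)


rows = [f"row-{i}" for i in range(10)]
train, test = train_test_split(rows)
print(len(train), len(test))  # 8 2

# Hypothetical usage once train.csv has been written locally:
# upload_to_s3("train.csv", "my-bucket", "autopilot/input/train.csv")
```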

[Figure: Train and test datasets]

Configure Autopilot with arguments: specify the S3 locations of the input data and the output folder. Autopilot creates notebooks and stores them at the output location.

[Figure: Configuring Autopilot with arguments]

The MaxCandidates argument sets the maximum number of candidate models to be created. We need to specify the target variable, and can optionally specify the problem type (the type of machine learning problem). Valid values for the problem type are Regression, BinaryClassification, and MulticlassClassification. If the problem type argument is not present, Autopilot determines the type of machine learning problem itself.
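
As a rough sketch, these arguments can be collected into one dictionary shaped like the request for boto3's `create_auto_ml_job`. The helper name, the default of 10 candidates, and the `F1` objective are assumptions made for illustration; the original configuration may differ. The sketch also assumes the training CSV has a header row, so that `A16` can be referenced by name, and it sets an objective metric alongside the problem type because the API expects the two together.

```python
def build_automl_request(job_name, train_s3_uri, output_s3_uri, role_arn,
                         target="A16", max_candidates=10):
    """Build keyword arguments for SageMaker's create_auto_ml_job call."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": train_s3_uri,
                }
            },
            "TargetAttributeName": target,
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Optional: omit both keys below to let Autopilot infer the problem type.
        "ProblemType": "BinaryClassification",
        "AutoMLJobObjective": {"MetricName": "F1"},  # assumed metric
        "AutoMLJobConfig": {
            "CompletionCriteria": {"MaxCandidates": max_candidates}
        },
        "RoleArn": role_arn,
    }


# All names below are placeholders, not values from the original project.
request = build_automl_request(
    job_name="credit-approval-demo",
    train_s3_uri="s3://my-bucket/autopilot/input/",
    output_s3_uri="s3://my-bucket/autopilot/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(request["AutoMLJobConfig"])
```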

Create an Autopilot job and start it. The job creates models and terminates after completion. Given below is the code for creating an Autopilot job.

[Figure: Autopilot job code]
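
Since the original code survives only as an image, here is a minimal sketch of what starting and monitoring a job could look like with boto3. It is not a transcription of the author's code. The helper name and the polling interval are assumptions; `request` stands for a dictionary of `create_auto_ml_job` keyword arguments that includes at least `AutoMLJobName`.

```python
import time

# Statuses after which the job will not change any more.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def run_autopilot_job(sm_client, request, poll_seconds=30):
    """Start an Autopilot job and block until it reaches a terminal status.

    sm_client is a SageMaker client, e.g. boto3.client("sagemaker").
    Returns the final describe_auto_ml_job response.
    """
    sm_client.create_auto_ml_job(**request)
    while True:
        desc = sm_client.describe_auto_ml_job(
            AutoMLJobName=request["AutoMLJobName"]
        )
        status = desc["AutoMLJobStatus"]
        print(status, "-", desc.get("AutoMLJobSecondaryStatus", ""))
        if status in TERMINAL_STATUSES:
            return desc
        time.sleep(poll_seconds)


# Hypothetical usage (requires AWS credentials and a valid request dict):
# import boto3
# description = run_autopilot_job(boto3.client("sagemaker"), request)
```

Passing the client in as a parameter keeps the function easy to try out with a stub before running it against a real account.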

The Autopilot job creates child jobs for data analysis, feature engineering, and model creation. These jobs can be viewed in the SageMaker console under Training > Training jobs.

[Figure: Training jobs]

After completion of the Autopilot jobs, we can view the best score from all the created models.

[Figure: Best score from the created models]
[Figure: Final AutoML objective metric name and value]
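
The best score can also be read programmatically from the `describe_auto_ml_job` response, which carries a `BestCandidate` entry once the job has finished. The helper below is a sketch; the example dictionary is a made-up fragment, and its candidate name and score are not results from this project.

```python
def best_candidate_summary(description):
    """Return (candidate name, metric name, metric value) for the best
    candidate in a describe_auto_ml_job response dictionary."""
    best = description["BestCandidate"]
    metric = best["FinalAutoMLJobObjectiveMetric"]
    return best["CandidateName"], metric["MetricName"], metric["Value"]


# Hypothetical response fragment, for illustration only.
example = {
    "BestCandidate": {
        "CandidateName": "candidate-1",
        "FinalAutoMLJobObjectiveMetric": {"MetricName": "F1", "Value": 0.9},
    }
}
name, metric_name, value = best_candidate_summary(example)
print(f"{name}: {metric_name} = {value}")
```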

The notebooks from the Autopilot jobs are available at the S3 output path mentioned during the configuration step.

[Figure: Notebooks]
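
The exact notebook locations are also reported in the `AutoMLJobArtifacts` part of the `describe_auto_ml_job` response. The sketch below pulls them out and splits an S3 URI into bucket and key for downloading. The helper names and the example URIs are assumptions for illustration.

```python
from urllib.parse import urlparse


def notebook_locations(description):
    """Return the S3 URIs of the two notebooks Autopilot generates."""
    artifacts = description["AutoMLJobArtifacts"]
    return {
        "candidate_definition": artifacts["CandidateDefinitionNotebookLocation"],
        "data_exploration": artifacts["DataExplorationNotebookLocation"],
    }


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical response fragment, for illustration only.
example = {
    "AutoMLJobArtifacts": {
        "CandidateDefinitionNotebookLocation":
            "s3://my-bucket/autopilot/output/notebooks/candidates.ipynb",
        "DataExplorationNotebookLocation":
            "s3://my-bucket/autopilot/output/notebooks/exploration.ipynb",
    }
}
locations = notebook_locations(example)
bucket, key = split_s3_uri(locations["candidate_definition"])
print(bucket, key)

# To download (requires boto3 and AWS credentials):
# import boto3
# boto3.client("s3").download_file(bucket, key, "candidates.ipynb")
```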

The candidate definition notebook has information on the optimal algorithm along with its hyperparameter values.

[Figure: Hyperparameters]

For the current credit card approval project, Autopilot recommends the XGBoost algorithm.

Reports

Autopilot creates reports that aid in understanding the optimal model. It creates two reports:

1. The Explainability Report, which contains feature importance charts. It shows the importance of each attribute with respect to the target variable. Autopilot uses the KernelSHAP method internally to calculate feature importance.

[Figure: Explainability report]

2. The Model Quality Report, which contains the metrics and graphs of the selected model.
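
To give a feel for the feature importances in the Explainability Report: KernelSHAP approximates Shapley values, which assign each feature its average marginal contribution to a prediction. The sketch below is not KernelSHAP and not Autopilot's implementation. It computes exact Shapley values by enumerating every feature subset, which is only feasible for a handful of features, and it uses a made-up toy model and an all-zero baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of each coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_model(features):
    """Made-up model: one additive term plus one interaction term."""
    return 2.0 * features[0] + features[1] * features[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The values sum to the difference between the prediction at `x` and at the baseline, and the two interacting features share their joint contribution equally.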

Shown below is the metrics table of the XGBoost model selected by Autopilot.

[Figure: Metrics table]

As the current project is a binary classification, the model quality report displays the confusion matrix, ROC curve, and precision-recall curve of the optimal model.

[Figure: Model quality report]
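
For readers who want to check such numbers by hand, the confusion matrix and the metrics derived from it can be computed from true and predicted labels as sketched below. The labels are made up for illustration, and treating `+` as the positive (approved) class is an assumption.

```python
def binary_metrics(y_true, y_pred, positive="+"):
    """Return confusion-matrix counts and derived metrics as a dict."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up labels, not predictions from the Autopilot model.
y_true = ["+", "+", "-", "-", "+", "-"]
y_pred = ["+", "-", "-", "+", "+", "-"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```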

If you have questions regarding Amazon SageMaker or need help with implementation or support, please contact us. We provide expert AI (artificial intelligence) and machine learning consulting.