Machine Learning with Amazon SageMaker


Background

One of our clients is in the grocery business. They need to take millions of grocery items as input (e.g. “1 quart” of “Sour Cream” from “Sysco”, item number “7074419”) and assign each one to a standard set of categories (e.g. Dairy, Produce, Beverage, Beer, etc.). Given the vast amount of data to be processed, they want to use machine learning to perform the category assignment.

We have prototyped this process using the SageMaker software from Amazon. Here is an overview of the process.

How To

Connecting to AWS

To connect to AWS, start here.

 

In this example, jjmn-net refers to the developer who worked on this task, and jg is the name of the user that jjmn created for me.

Navigating to SageMaker

Once logged in to AWS, search for Amazon SageMaker and navigate to SageMaker Canvas:

This provides a good overview of how SageMaker works:

To continue to SageMaker, click on Canvas in the SageMaker Domain navigation:

 

and then select Canvas from the Launch app dropdown menu:

 

Using SageMaker

On the subsequent page, click New model:

 

and name the new model Predict SIMPLE category:

 
 

Training the model

On the next page, select the trainingdata(3).csv file and click Select dataset.

 

where the training data looks like this:

 

The last column contains the SIMPLE category that the given item belongs to. This is the label the model learns from: it teaches the model which categories correspond to which items.
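As a rough illustration of that layout, the following Python sketch writes a training file whose last column is the label. The column names other than SIMPLE Category are hypothetical (the real headers of trainingdata(3).csv are not shown here), and the pairing of the sour cream item with Dairy is only an illustrative guess.

```python
import csv
import io

# Assumption: only "SIMPLE Category" is a known header; the other
# column names are hypothetical stand-ins for the real file's headers.
FIELDS = ["Item Number", "Description", "Size", "Vendor", "SIMPLE Category"]

# Illustrative row built from the example item in the Background section.
# The "Dairy" label is an assumed pairing, not taken from the real data.
rows = [
    {
        "Item Number": "7074419",
        "Description": "Sour Cream",
        "Size": "1 quart",
        "Vendor": "Sysco",
        "SIMPLE Category": "Dairy",
    },
]


def write_training_csv(rows, out):
    """Write labeled rows as CSV, with the label as the last column."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Every training row needs a label; refuse rows without one.
        if not row.get("SIMPLE Category"):
            raise ValueError(f"row has no label: {row}")
        writer.writerow(row)


buffer = io.StringIO()
write_training_csv(rows, buffer)
training_csv = buffer.getvalue()
print(training_csv)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same function could be handed an open file instead.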

On the next page, under Select a column to predict, choose the SIMPLE Category column.

 

After clicking Standard build, the analysis process begins.

 

The build takes approximately two hours. Once it is complete, a summary of the analysis appears:

 

You can review the information in the Overview and Scoring tabs for insights into the analysis:

 

Predictions

To make predictions using the model, click the Predict button and then Select dataset from the subsequent page:

 

On the following page, select testdata(2).csv, which looks like this:

 

Note that the SIMPLE Category column is blank, since that's the column we want to predict. Click Generate predictions and you'll see the following results page:

 

To view the results, click the View icon:

 

You will see the predicted value of SIMPLE Category for each row, along with the probability that the prediction is correct. Click Download CSV to download all of the rows, including the predicted values.
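Once the CSV is downloaded, one possible next step is to flag low-probability predictions for manual review. The sketch below is a minimal example under stated assumptions: the column name Probability, the 0.8 threshold, and the sample rows are all made up for illustration, so check the headers of the actual download before reusing it.

```python
import csv
import io

# Assumptions: the download has a predicted "SIMPLE Category" column and
# a probability column. "Probability" is a hypothetical header name.
PREDICTED = "SIMPLE Category"
PROBABILITY = "Probability"


def low_confidence_rows(predictions_csv, threshold=0.8):
    """Return rows whose prediction probability is below the threshold."""
    reader = csv.DictReader(io.StringIO(predictions_csv))
    flagged = []
    for row in reader:
        if float(row[PROBABILITY]) < threshold:
            flagged.append(row)
    return flagged


# Made-up sample standing in for the downloaded file.
sample = (
    "Description,SIMPLE Category,Probability\n"
    "Sour Cream,Dairy,0.97\n"
    "Ginger Beer,Beer,0.55\n"
)

for row in low_confidence_rows(sample):
    print(row["Description"], row[PREDICTED], row[PROBABILITY])
```

The threshold is a judgment call: a lower value sends fewer items to manual review but lets more doubtful categories through.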

Source: https://www.finitewisdom.com/people/joshua...