Using Amazon Forecast To Predict Sales For The Coming Weeks: A Case Study

Introduction

  • Forecasting is the science of predicting future demand from historical demand data
  • Forecasting is used not only to meet customer demand but also to optimize the supply chain and minimize cost
  • Amazon Forecast provides forecasts that are up to 50% more accurate than traditional systems
  • Amazon Forecast gives developers machine learning capabilities without requiring any machine learning expertise, and includes deep learning models in its processing capabilities
  • It helps you build an inventory planning solution
  • Amazon Forecast provides statistical and machine learning algorithms to better predict future demand
  • Can be used in tandem with other software to build larger solutions

You need to provide your historical demand data to Amazon Forecast and it handles the rest. It creates a data pipeline to ingest the data, trains a model, provides accuracy metrics, and creates forecasts. In the background it is doing a lot more: it identifies features in your dataset, applies the algorithm best suited to your data type, and tunes the resulting forecasting models. It then hosts the model so that when you forecast, you can easily query the service for further computation.

Amazon Forecast can be used for a number of functions apart from inventory planning, including workforce and web traffic prediction; this requires defining the forecasting domain so that the appropriate model can be computed.

Key steps for building your inventory planning system

  1. Identify your business objective and business use case
    1. For some customers it is increasing revenue
    2. For others it is decreasing cost
  2. Identify the datasets that are needed for those business use cases. It is critical that the dataset and CSV file you use are accurate: Amazon Forecast follows a pay-as-you-use model, so the more time you spend regenerating output or re-running functions because of discrepancies, the more expensive it gets.
  3. Train and optimise your model, look at its accuracy, and tie that accuracy back to the business use case. This involves choosing a predictor: AutoML is the preferred route for the uninitiated, or you can choose an algorithm yourself if you are familiar with the process (a minimal sketch of an AutoML predictor request follows this list).
  4. Deployment involves exporting the forecasts generated by Amazon Forecast, which give the user insight into how to adjust production or inventory for maximum benefit.
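
For illustration, the predictor training in step 3 can be kicked off with a single createPredictor call. The sketch below uses the AWS SDK for JavaScript; the predictor name, forecast horizon, frequency, and dataset group ARN are illustrative assumptions, not values from this project.

    // Minimal sketch: train a predictor with AutoML (AWS SDK for JavaScript v2).
    // All names, ARNs and the horizon below are illustrative placeholders.
    const AWS = require('aws-sdk');
    const forecast = new AWS.ForecastService({ region: 'us-east-1' });

    async function trainPredictor() {
      const params = {
        PredictorName: 'salesPredictor',            // hypothetical name
        ForecastHorizon: 4,                         // e.g. 4 weeks ahead
        PerformAutoML: true,                        // let Forecast pick the algorithm
        InputDataConfig: {
          DatasetGroupArn: 'arn:aws:forecast:...:dataset-group/salesDatasetGroup' // placeholder ARN
        },
        FeaturizationConfig: {
          ForecastFrequency: 'W'                    // weekly forecasts
        }
      };
      const { PredictorArn } = await forecast.createPredictor(params).promise();
      return PredictorArn;                          // training continues asynchronously
    }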

First I will walk you through the process of generating a forecast from the AWS console, and then through the steps to generate it programmatically.

Business requirement

Create suggested orders for customers using Amazon Forecast, based on the average depletion rate of their products and the quantities on hand.

Our input data was the historical sales data (customer, product id, quantity, and date) and the output was predicted sales data for the next few weeks (customer, product, quantity, and predicted date).

Steps to generate a forecast from the AWS console

1. Create a dataset group.

Creation of dataset

Forecasting domain selection

Amazon Forecast supports a range of forecasting domains to fit different types of data. The supported domains are shown below:

[Screenshot: supported forecasting domains]

2. Create dataset

[Screenshot: dataset creation]

If you use the schema builder, the UI is pictured below:

[Screenshot: schema builder]

Attribute definition

The attribute definition must match your CSV file: if the file has five attributes, you need to define five attributes in the same sequence.
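
For illustration, here is roughly what such a schema looks like when defined in code rather than through the schema builder. This is a minimal sketch assuming the RETAIL domain and the four columns from our input file (customer, product id, quantity, date); the dataset name is hypothetical, the extra customer column is modelled as a string dimension, and you should adjust the attributes to whatever your own CSV actually contains.

    // Minimal sketch: dataset schema matching a 4-column CSV
    // (customer, product id, quantity, date), assuming the RETAIL domain.
    // A sample CSV row might look like: CUST-001,SKU-42,12,2019-06-30
    const params = {
      DatasetName: 'salesHistory',                 // hypothetical name
      Domain: 'RETAIL',
      DatasetType: 'TARGET_TIME_SERIES',
      DataFrequency: 'D',                          // daily sales history
      Schema: {
        Attributes: [                              // must match the CSV column order
          { AttributeName: 'customer',  AttributeType: 'string' },   // extra dimension
          { AttributeName: 'item_id',   AttributeType: 'string' },
          { AttributeName: 'demand',    AttributeType: 'float' },
          { AttributeName: 'timestamp', AttributeType: 'timestamp' }
        ]
      }
    };
    // forecast.createDataset(params).promise() would create this dataset.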

[Screenshot: detailed attribute overview]

Importing your data file

Defining IAM role

This process will take a while to complete.

3. Train predictor

Frequency definition

Amazon Forecast supports a number of built-in predictors. We need to test and determine which model suits our business requirements best.
The ideal way to determine this is:

  1. Let’s say you have historical data for two years [2018 Jan 01 to 2019 Dec 31]
  2. Split the historical data into:
    1. 1.5 years [2018 Jan 01 to 2019 June 30]. We feed this data into Forecast, which generates forecasts for the next month; in this case, July 2019.
    2. 0.5 years [2019 July 01 to 2019 Dec 31]. We use this data as a reference to evaluate the forecasts and figure out which algorithm and percentile work best for us (a minimal sketch of this split follows).
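
The split itself is just a date filter over the historical rows. Here is a minimal sketch; the field names and the cutoff date are assumptions based on the example above.

    // Minimal sketch: split two years of history into a training set and a
    // hold-out set at an assumed cutoff date (2019-06-30, as in the example above).
    const cutoff = new Date('2019-06-30T23:59:59Z');

    function splitHistory(rows) {
      // rows are assumed to look like { timestamp: '2019-06-15', item_id: 'SKU-42', demand: 12 }
      const training = rows.filter(r => new Date(r.timestamp) <= cutoff); // goes to Forecast
      const holdout  = rows.filter(r => new Date(r.timestamp) >  cutoff); // kept for evaluation
      return { training, holdout };
    }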

Algorithm selection

Forecast dimensions

Only edit these parameters if you understand the algorithms and predictors.

Defining pipeline

Supplementary features

This step will take some time to complete.

This will start the generation of the forecast, aka the moment we’ve been waiting for.

4. Create Forecast

Defining forecast parameters

 

A note on prediction quantiles: By calculating prediction quantiles, the model shows how much uncertainty is associated with each forecast.

 

For the P10 prediction, the true value is expected to be lower than the predicted value 10% of the time. For the P50 prediction, the true value is expected to be lower than the predicted value 50% of the time, and similarly for P90. If you think you might face a storage-space problem or a high cost of invested capital from overstocking the item, the P10 forecast is preferable: you will be overstocked only 10% of the time, and the rest of the time you are likely to sell out.

On the other hand, if the cost of not having the item in stock is extremely high, the cost of invested capital is low, or a stock-out would result in large amounts of lost revenue, you might want to choose the P90 forecast.
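
If you create the forecast through the API rather than the console, these quantiles are requested via the ForecastTypes parameter. A minimal sketch follows; the forecast name and predictor ARN are placeholders.

    // Minimal sketch: request the P10, P50 and P90 quantiles when creating the forecast.
    const AWS = require('aws-sdk');
    const forecast = new AWS.ForecastService({ region: 'us-east-1' });

    const params = {
      ForecastName: 'salesForecast',                                  // hypothetical name
      PredictorArn: 'arn:aws:forecast:...:predictor/salesPredictor',  // placeholder ARN
      ForecastTypes: ['0.10', '0.50', '0.90']                         // the quantiles discussed above
    };

    forecast.createForecast(params).promise()
      .then(data => console.log('Forecast ARN:', data.ForecastArn));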

 

This step will take some time. This is the final step.

5. Now we can look up the forecast

Our output, aka the generated forecast

Steps to generate the forecast programmatically

There are two ways to get the forecast:

  1. AWS.ForecastQueryService [Retrieves a forecast for a single item, filtered by the supplied criteria.]
  2. Export job [To get the full forecast]

We used the second option in our case. We chose a serverless stack to generate forecasts, as it allows us to scale with the needs of the business and keeps our cost restricted to our actual usage.
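
For completeness, option 1 (querying a single item) looks roughly like the sketch below; the forecast ARN and item id are placeholders.

    // Minimal sketch: query the forecast for a single item (option 1).
    const AWS = require('aws-sdk');
    const forecastQuery = new AWS.ForecastQueryService({ region: 'us-east-1' });

    const params = {
      ForecastArn: 'arn:aws:forecast:...:forecast/salesForecast',   // placeholder ARN
      Filters: { item_id: 'SKU-42' }                                // hypothetical item id
    };

    forecastQuery.queryForecast(params).promise()
      .then(data => {
        // Predictions are keyed by quantile, e.g. data.Forecast.Predictions.p50
        console.log(JSON.stringify(data.Forecast.Predictions, null, 2));
      });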

Steps we followed to generate the sales forecast

There are several steps involved in the forecast generation process. The current step information is stored in DynamoDB. We created a generic Lambda function called “StatusCheckActionForecast” that checks the status of the current job and triggers the action for the next job based on the step information stored in DynamoDB (a simplified sketch of the two Lambda functions follows the step list below).

    1. Upload the sales history CSV file to S3 (bucket/path: SalesForecast/saleshistory). The S3 bucket represents the data lake from which we draw information.
    2. We created an S3 trigger (an event notification that invokes a function) to call a Lambda function (StartForecastFunction) whenever a file is uploaded to the path SalesForecast/saleshistory. StartForecastFunction does the following:
      1. Create a dataset in AWS Forecast. The function is as follows:
          createDataset(params = {}, callback) ⇒ AWS.Request
      2. Create a dataset group in AWS Forecast
          createDatasetGroup(params = {}, callback) ⇒ AWS.Request
      3. Create a dataset import job in AWS Forecast. (The dataset import is time-consuming, so we use SQS to call a function periodically and check the status of the import job.)
          createDatasetImportJob(params = {}, callback) ⇒ AWS.Request
      4. Trigger SQS (forecastSQS)
    • Trigger SQS (forecastSQS) – calls the StatusCheckActionForecast Lambda function to check the status of the dataset import job.
      1. If the job is done, we move on to the next step.
        1. Create the predictor in AWS Forecast. (This is a time-consuming process, so we again use SQS to call a function periodically and check the status of the predictor.)
            createPredictor(params = {}, callback) ⇒ AWS.Request
        2. Trigger SQS (forecastSQS).
      2. If the job is not done, we again trigger SQS (forecastSQS).
        1. We trigger this every 5 minutes until the job is done.
    • Trigger SQS (forecastSQS) – calls the StatusCheckActionForecast Lambda function to check the status of the predictor job.
      1. If the job is done, we move on to the next step.
        1. Create the forecast
            createForecast(params = {}, callback) ⇒ AWS.Request
        2. Trigger SQS (forecastSQS). [This process will also take some time.]
      2. If the job is not done, we again trigger SQS (forecastSQS).
        1. We trigger this every 5 minutes until the job is done.
    • Trigger SQS (forecastSQS) – calls the StatusCheckActionForecast Lambda function to check the status of the forecast job.
      1. If the job is done, we move on to the next step.
        1. Create the forecast export
            createForecastExportJob(params = {}, callback) ⇒ AWS.Request
        2. Trigger SQS (forecastSQS). [This process will also take some time.]
      2. If the job is not done, we again trigger SQS (forecastSQS).
        1. We trigger this every 5 minutes until the job is done.
    • Trigger SQS (forecastSQS) – calls the StatusCheckActionForecast Lambda function to check the status of the forecast export job.
      1. If the job is done, we move on to the final step.
        1. This is the last step. At this point all the forecast data has been exported to the SalesForecast/exportforecast path.
        2. We process the files based on our business requirements.
        3. Save all the data in the DynamoDB table [ForecastResuts table].
      2. If the job is not done, we again trigger SQS (forecastSQS).
        1. We trigger this every 5 minutes until the job is done.
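
The two Lambda functions described above can be sketched roughly as follows. This is a simplified illustration of the flow rather than our production code: the queue URL, role ARN, table name, schema, and step-record layout are all assumptions, and only the first describe-then-advance transition is shown.

    // Simplified sketch of the two Lambda functions (AWS SDK for JavaScript v2).
    // Queue URL, role ARN, table name, schema and all other names are illustrative assumptions.
    const AWS = require('aws-sdk');
    const forecast = new AWS.ForecastService();
    const sqs = new AWS.SQS();
    const ddb = new AWS.DynamoDB.DocumentClient();

    const QUEUE_URL = process.env.FORECAST_QUEUE_URL;   // the forecastSQS queue
    const STEP_TABLE = 'ForecastSteps';                 // hypothetical step-tracking table

    // StartForecastFunction: fired by the S3 trigger when a sales history file lands.
    exports.startForecast = async (event) => {
      const record = event.Records[0].s3;
      const s3Path = `s3://${record.bucket.name}/${record.object.key}`;

      const { DatasetArn } = await forecast.createDataset({
        DatasetName: 'salesHistory',
        Domain: 'RETAIL',
        DatasetType: 'TARGET_TIME_SERIES',
        DataFrequency: 'D',
        Schema: { Attributes: [
          { AttributeName: 'timestamp', AttributeType: 'timestamp' },
          { AttributeName: 'item_id',   AttributeType: 'string' },
          { AttributeName: 'demand',    AttributeType: 'float' }
        ] }
      }).promise();

      const { DatasetGroupArn } = await forecast.createDatasetGroup({
        DatasetGroupName: 'salesDatasetGroup',
        Domain: 'RETAIL',
        DatasetArns: [DatasetArn]
      }).promise();

      const { DatasetImportJobArn } = await forecast.createDatasetImportJob({
        DatasetImportJobName: 'salesImport',
        DatasetArn,
        DataSource: { S3Config: { Path: s3Path, RoleArn: process.env.FORECAST_ROLE_ARN } },
        TimestampFormat: 'yyyy-MM-dd'
      }).promise();

      // Record the current step and schedule the first status check in 5 minutes.
      await ddb.put({
        TableName: STEP_TABLE,
        Item: { id: 'current', step: 'IMPORT', jobArn: DatasetImportJobArn, datasetGroupArn: DatasetGroupArn }
      }).promise();
      await sqs.sendMessage({ QueueUrl: QUEUE_URL, MessageBody: 'check', DelaySeconds: 300 }).promise();
    };

    // StatusCheckActionForecast: fired from the SQS queue; checks the current job and either
    // starts the next step or re-queues itself for another check in 5 minutes.
    exports.statusCheckAction = async () => {
      const { Item } = await ddb.get({ TableName: STEP_TABLE, Key: { id: 'current' } }).promise();

      if (Item.step === 'IMPORT') {
        const { Status } = await forecast.describeDatasetImportJob({ DatasetImportJobArn: Item.jobArn }).promise();
        if (Status === 'ACTIVE') {
          // Import finished: start training the predictor and record the new step.
          const { PredictorArn } = await forecast.createPredictor({
            PredictorName: 'salesPredictor',
            ForecastHorizon: 4,
            PerformAutoML: true,
            InputDataConfig: { DatasetGroupArn: Item.datasetGroupArn },
            FeaturizationConfig: { ForecastFrequency: 'W' }
          }).promise();
          await ddb.put({
            TableName: STEP_TABLE,
            Item: { ...Item, step: 'PREDICTOR', jobArn: PredictorArn }
          }).promise();
        }
      }
      // The PREDICTOR, FORECAST and EXPORT steps follow the same describe-then-advance
      // pattern and are omitted here for brevity; in the real flow the re-queueing
      // stops once the export job is ACTIVE.
      await sqs.sendMessage({ QueueUrl: QUEUE_URL, MessageBody: 'check', DelaySeconds: 300 }).promise();
    };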

Challenges

  1. Processing a large volume of data was a challenge. Lambda functions have a 15-minute timeout, which limits our processing capabilities.
    1. Export job – exports the forecast files to S3, and a single file can contain around 8K records. So we had to split the files into chunks of 300 records each (or any small number, based on how much data you can process within 15 minutes), process each chunk of the forecast file, and save it to DynamoDB (see the sketch after this list). We use DynamoDB because it handles unstructured data, similar to the JSON schema we use in the dataset.
  2. Keep a tab on AWS costs while you are building and testing your solution. In our case we overlooked a bug in the code that checks job status, creates triggers, and updates DynamoDB. The bug resulted in a large number of unnecessary calls, and our AWS cost shot up during testing. It is also key to ensure your dataset has no errors and that you choose the appropriate models and training settings.
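
As a rough illustration of the first workaround, here is a minimal sketch of writing one chunk of exported forecast records to DynamoDB; the table name and record shape are assumptions, and DynamoDB's batchWrite accepts at most 25 items per call, so each 300-record chunk is written as several batches.

    // Minimal sketch: write one chunk of exported forecast records to DynamoDB.
    // The table name and record shape are illustrative assumptions.
    const AWS = require('aws-sdk');
    const ddb = new AWS.DynamoDB.DocumentClient();

    async function saveForecastChunk(records) {
      // DynamoDB batchWrite accepts at most 25 items per request,
      // so a 300-record chunk becomes 12 batches.
      for (let i = 0; i < records.length; i += 25) {
        const batch = records.slice(i, i + 25).map(r => ({
          PutRequest: { Item: r }          // e.g. { customer, item_id, date, p10, p50, p90 }
        }));
        // Retrying UnprocessedItems is omitted here for brevity.
        await ddb.batchWrite({ RequestItems: { ForecastResults: batch } }).promise();
      }
    }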
