Getting Started with Smart Predict: Concepts of Predictive Modeling
Smart Predict is an Augmented Analytics feature in SAP Analytics Cloud that helps you generate predictions about future events, values, and trends by leveraging historical data.
The predictive experience in SAP Analytics Cloud is simple. Smart Predict guides you step by step to create a predictive model based on historical data. You can then use the resulting predictive model to generate reliable predictions that give you insights for planning how your business will evolve.
Before using Smart Predict for the first time, it really helps to understand a few basic concepts of predictive modeling. So, here they are!
The Different Types of Predictive Scenarios
There are currently 3 types of predictive scenarios available in Smart Predict:
- Classification
- Regression
- Time Series Forecasting
Defining the business problem or business question you want to address will help you choose the right type of predictive scenario.
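As a rough illustration of matching a business question to a scenario type, the sketch below applies a common rule of thumb: a question about how a value evolves over time points to Time Series Forecasting, a target with two possible answers (such as yes/no) points to Classification, and a numeric target points to Regression. The `suggest_scenario` function and this mapping are assumptions for illustration only; they are not part of Smart Predict.

```python
from enum import Enum


class Scenario(Enum):
    CLASSIFICATION = "Classification"
    REGRESSION = "Regression"
    TIME_SERIES = "Time Series Forecasting"


def suggest_scenario(target_values, over_time=False):
    """Hypothetical rule of thumb, not a Smart Predict API.

    - forecasting a value over time          -> Time Series Forecasting
    - target with exactly two distinct values -> Classification
    - numeric target                          -> Regression
    """
    if over_time:
        return Scenario.TIME_SERIES
    distinct = set(target_values)
    if len(distinct) == 2:
        return Scenario.CLASSIFICATION
    # bool is a subclass of int, so exclude it explicitly from "numeric"
    if distinct and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    ):
        return Scenario.REGRESSION
    raise ValueError("the target does not match this rule of thumb")
```

For example, a churn question with yes/no answers maps to Classification, while "how much will this customer spend?" maps to Regression.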
Watch this video for an introduction to the types of predictive scenarios.
Our Data Sources in a Nutshell
To create and manage predictive scenarios in Smart Predict, you need data sources. In Smart Predict, you can use two different types of data sources, depending on the type of predictive model you are creating: datasets and planning models.
Datasets can be used for any type of predictive model. They can be either acquired or live.
- Acquired dataset: the data must be prepared locally on your computer and then imported into SAP Analytics Cloud.
- Live dataset: the data stays in the source system, so you only need to connect to it and create a live dataset.
Live datasets can be created from an SAP HANA system. The datasets used to train and apply a predictive model must come from the same data source location: if you use a live dataset to train your predictive model, you need a live application dataset to generate your predictions.
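The same-location rule can be expressed as a simple check. In this hypothetical sketch, each dataset carries a `location` label such as `"acquired"` or `"live"`; the `Dataset` class and the `check_same_location` function are illustrative only and do not correspond to any SAP Analytics Cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    location: str  # hypothetical label, e.g. "acquired" or "live"


def check_same_location(training: Dataset, application: Dataset) -> None:
    """Raise ValueError if training and application data come from different locations."""
    if training.location != application.location:
        raise ValueError(
            f"{training.name!r} is {training.location} but {application.name!r} "
            f"is {application.location}: train and apply on the same location"
        )
```

A live training dataset paired with a live application dataset passes; pairing it with an acquired application dataset raises an error.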
You need different datasets depending on where you are in the modeling process:
- The training dataset contains the historical data your predictive model will learn from. In this dataset, the values for your target (or signal) variable, which is the column related to your business question, are known.
- The application dataset contains current or new data that you would like to create predictions for. In this dataset, the values for the target variable are unknown.
- The generated dataset is a new dataset created by Smart Predict. It contains your predictions and any additional columns that you have requested. You can then augment your SAP Analytics Cloud stories or models using the data available in this generated dataset.
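To make the roles of the three datasets concrete, here is a minimal sketch in plain Python. The "model" is a deliberately trivial stand-in (the average target value per category of one influencer), not one of Smart Predict's algorithms; `train_model`, `apply_model` and the `predicted_` column prefix are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def train_model(training_rows, target, influencer):
    """Learn the average target per influencer category.

    Training rows contain known target values.
    """
    groups = defaultdict(list)
    for row in training_rows:
        groups[row[influencer]].append(row[target])
    per_category = {key: mean(values) for key, values in groups.items()}
    fallback = mean(row[target] for row in training_rows)  # used for unseen categories
    return per_category, fallback


def apply_model(model, application_rows, target, influencer, extra_columns=()):
    """Build the generated dataset: requested columns plus one prediction column."""
    per_category, fallback = model
    generated = []
    for row in application_rows:  # application rows have no target value
        out = {column: row[column] for column in extra_columns}
        out[f"predicted_{target}"] = per_category.get(row[influencer], fallback)
        generated.append(out)
    return generated


training = [
    {"id": 1, "region": "EU", "revenue": 100.0},
    {"id": 2, "region": "EU", "revenue": 120.0},
    {"id": 3, "region": "US", "revenue": 200.0},
]
application = [{"id": 4, "region": "EU"}, {"id": 5, "region": "APAC"}]
model = train_model(training, "revenue", "region")
generated = apply_model(model, application, "revenue", "region", extra_columns=("id",))
```

The generated rows keep the requested `id` column next to the prediction, which is what lets you join them back to other data in a story or model.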
For a time series predictive model, only one training dataset is used: the predictive model is trained on the historical data and applied at the same time to generate the forecasts.
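The following sketch illustrates how a time series model can be trained and applied in a single step on one dataset. It fits a least-squares linear trend to the signal and extends it over a forecast horizon; this is an assumed stand-in technique, much simpler than the models Smart Predict actually builds, and `train_and_forecast` is a hypothetical name.

```python
def train_and_forecast(signal, horizon):
    """Fit a least-squares linear trend to the signal and extend it `horizon` steps.

    Training (fitting the trend) and application (forecasting) happen in one
    call, on one historical dataset.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two historical points")
    xs = range(n)  # time index 0..n-1
    x_mean = (n - 1) / 2
    y_mean = sum(signal) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, signal))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Forecast the periods that follow the last historical point
    return [intercept + slope * (n + h) for h in range(horizon)]
```

For instance, `train_and_forecast([10, 12, 14, 16], 2)` continues the trend to 18 and 20.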
Smart Predict can also use an SAP Analytics Cloud planning model directly as a data source. Once the predictive model is created, you can augment a private version of your planning model with predictive forecasts for one or several business dimensions.
Smart Predict supports only standalone planning models. The input version must be either a public version that is not in edit mode, or a private version. You need a private version of your planning model to save your generated forecasts back to it.
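The planning model rules above can be summarized as a small validation function. This is a sketch under stated assumptions: the `Version` class, its `public` and `in_edit_mode` flags, and `setup_problems` are hypothetical and only mirror the rules described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    name: str
    public: bool
    in_edit_mode: bool = False  # hypothetical flag; only relevant for public versions


def setup_problems(standalone, input_version, output_version):
    """List the rule violations for a planning-model forecasting setup."""
    problems = []
    if not standalone:
        problems.append("only standalone planning models are supported")
    # Input: a private version, or a public version that is not in edit mode
    if input_version.public and input_version.in_edit_mode:
        problems.append(f"input version {input_version.name!r} is public and in edit mode")
    # Output: forecasts are saved back to a private version
    if output_version.public:
        problems.append(f"forecasts must be saved to a private version, not {output_version.name!r}")
    return problems
```

An empty list means the setup satisfies all three rules.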
Watch this video for an overview of the data sources:
To go further:
- Combine the power of SAP HANA and Smart Predict to generate predictions on live data
- What are Datasets?
What are Variables and their Roles?
Variables are the columns of your dataset or the dimensions of a planning model. To create a predictive model, you need to assign roles to the variables:
- Target or Signal variable: This is the answer to your business question. The target variable is used for Classification and Regression, whereas the signal variable is used for Time Series Forecasting.
- Date variable: This is the time dimension. This variable is mandatory for a Time Series Forecasting predictive scenario. Smart Predict supports various date formats, depending on the type of data source; refer to the documentation for more information.
- Entity variable: This allows you to divide your data source into several subsections, leading to more customized predictions. This variable is only used in a Time Series Forecasting predictive scenario. You can optionally select up to 5 columns, dimensions, or attributes from your data source for which you want distinct forecasts.
- Influencers: Influencers are variables that describe your data and serve to explain the target. All variables that aren't already assigned to a role are considered influencers, and only the most significant ones are retained after training for the debriefing.
- Excluded influencers: Some influencers might have too much influence on the target and should therefore be excluded: the predictive model will not take them into consideration. You should exclude influencers that are directly related to the target, especially variables that indirectly contain the target.
Excluding influencers that have no influence on the target (for example, <account number>) can also help speed up training.
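The role rules above can be sketched as a function that maps column names to roles. Any column not assigned as target/signal, date, entity or excluded becomes an influencer. The `assign_roles` function, the scenario strings and the returned dictionary layout are hypothetical, not a Smart Predict API.

```python
TIME_SERIES = "Time Series Forecasting"


def assign_roles(columns, scenario, target, date=None, entities=(), excluded=()):
    """Map column names to roles following the rules described above (hypothetical)."""
    is_ts = scenario == TIME_SERIES
    if is_ts and date is None:
        raise ValueError("a date variable is mandatory for Time Series Forecasting")
    if entities and not is_ts:
        raise ValueError("entity variables are only used in Time Series Forecasting")
    if len(entities) > 5:
        raise ValueError("select at most 5 entity variables")
    assigned = {target, date, *entities, *excluded}
    # Every remaining column is treated as an influencer
    influencers = [c for c in columns if c not in assigned]
    return {
        ("signal" if is_ts else "target"): target,
        "date": date,
        "entities": list(entities),
        "excluded": list(excluded),
        "influencers": influencers,
    }
```

For a churn model, excluding `account_number` leaves `region` and `age` as influencers; for a sales forecast, `month` is the date, `store` the entity, and the remaining columns are influencers.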
Watch this video to learn more about variables:
About Training and Debriefing
Once you have specified a training dataset and identified the required variables, you can generate a predictive model.
To train a predictive model, Smart Predict splits your training dataset into 2 subsets. It generates several predictive models using the first subset and applies each of them to the second subset to test for accuracy and robustness. The best-performing version becomes your selected predictive model.
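The two-subset approach can be sketched as follows. Here the candidate models are k-nearest-neighbour regressors on a single numeric influencer with different values of k, scored by root mean squared error on the second subset. The candidates, the metric, the 75/25 split and all function names are assumptions for illustration, not a description of Smart Predict's internal algorithms.

```python
import math
import random


def knn_predict(fit_rows, x, k):
    """Average target of the k rows in fit_rows whose influencer is closest to x.

    k-NN has no separate fitting step: 'fitting' simply means keeping fit_rows.
    """
    nearest = sorted(fit_rows, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(fit_rows, test_rows, k):
    """Root mean squared error of the k-NN candidate on the held-out rows."""
    squared = [(knn_predict(fit_rows, x, k) - y) ** 2 for x, y in test_rows]
    return math.sqrt(sum(squared) / len(squared))


def select_model(rows, candidate_ks=(1, 3, 5), fit_share=0.75, seed=0):
    """Split rows into two subsets, build each candidate on the first subset and
    keep the one with the lowest error on the second subset."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fit_share)
    fit_rows, test_rows = shuffled[:cut], shuffled[cut:]
    if not test_rows or len(fit_rows) < max(candidate_ks):
        raise ValueError("not enough rows for this split and these candidates")
    scores = {k: rmse(fit_rows, test_rows, k) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# (influencer, target) pairs: a noisy linear relationship
rows = [(x, 2.0 * x + (1.0 if x % 2 else -1.0)) for x in range(40)]
best_k, scores = select_model(rows)
```

Scoring on rows the candidates never saw is what makes the comparison a test of robustness rather than of memorization.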
The debriefing stage is where you assess the selected predictive model to decide whether or not the predictive model is ready for use. At this point, you can decide to apply the predictive model to generate predictions, improve the predictive model by adding or removing data, or create a new predictive model from scratch.
Watch this video for an overview of the predictive model training and debriefing stages:
To go further:
- Looking for the best predictive model
- Training a predictive model
By: Cindy Venet
Cindy Venet is the Documentation Lead for Smart Predict and SAP Predictive Analytics. She first joined SAP as a project manager for SAP Language Services, before moving to User Assistance in 2014. Her role has been to coordinate communication between multiple teams with the aim of delivering high-quality user assistance for our products. For any feedback regarding the Smart Predict and Predictive Analytics documentation, contact Cindy.