Hi,

I tried running two of the available Predix Machine Learning Analytics, specifically the two for predicting energy price (Training and Predict). From the sample data set provided, I produced a training set by cutting out the last 24 hours of entries that have an energy price value (the final rows have a price of "NA", so I removed those as well, keeping aside the last 24 priced entries so I could validate the prediction analytic against them). As described in the documentation, the Training analytic outputs the key variables and their coefficients, which I then fed into the prediction analytic.

Next, I ran the prediction analytic (Energy Price), which should output the predicted energy price for the next 24 hours. However, the output did not match the actual values of the 24 entries I had cut out of the original data set. This gives me the impression that the model generated by the training analytic is poor, which resulted in poor prediction results. Can anyone explain how to make this work more accurately in Predix, or is this simply how Predix behaves for ML prediction? Did I do anything wrong in the steps above?
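For reference, the holdout split I describe above looks roughly like this. This is only a sketch on synthetic stand-in data; the column names (`timestamp`, `energy_price`) are placeholders, not the actual Predix sample schema.

```
# Sketch of the holdout split described above, on synthetic stand-in data;
# column names are placeholders, not the actual Predix sample schema.
import numpy as np
import pandas as pd

hours = pd.date_range('2014-01-01', periods=100, freq='h')
df = pd.DataFrame({'timestamp': hours,
                   'energy_price': np.random.default_rng(1).uniform(20, 60, 100)})

df = df.dropna(subset=['energy_price'])  # drop any trailing "NA" price rows
train = df.iloc[:-24]                    # training set: everything but the last day
holdout = df.iloc[-24:]                  # last 24 priced entries, kept for validation
```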

Thanks

prediction.jpg
(340.1 kB)

screen-shot-2016-11-18-at-41620-pm.png
(450.2 kB)


**Answer** by Manohar Swamynathan · Nov 20, 2016 at 11:03 PM

As you may be aware, regression-based models are used to predict continuous values such as energy price. A well-fitting regression model's predicted values will be close to the actual values. Note that if a statistical model gives you 100% accurate predictions, it is likely over-fitting the data used for training and may not give you good results on unseen data. You can read more about over-fitting here!
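To make the over-fitting point concrete, here is a small illustration on synthetic data (not your energy data): a very flexible model drives the training error down much further than a simpler model, yet does worse on the held-out points it has never seen.

```
# Toy over-fitting demo on synthetic data (not the energy data set):
# a high-degree polynomial fits the training points almost perfectly
# but generalizes worse to held-out points.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# interleaved split: even-indexed points for training, odd for testing
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

def rmse(pred, actual):
    return np.sqrt(np.mean((pred - actual) ** 2))

errors = {}
for degree in (3, 12):
    coefs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (rmse(np.polyval(coefs, x_train), y_train),
                      rmse(np.polyval(coefs, x_test), y_test))
    print(degree, errors[degree])  # (train RMSE, test RMSE) per degree
```

The degree-12 fit has a much lower training error than the degree-3 fit, but its test error is noticeably higher than its own training error, which is the signature of over-fitting.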

Root Mean Squared Error (RMSE) is a good metric for assessing the performance of a regression model. RMSE can be interpreted as the standard deviation of the unexplained variance, and it has the useful property of being in the same units as the response variable. Lower RMSE values indicate a better model fit.

Mathematical notation:

RMSE = √( (1/n) · Σⱼ₌₁ⁿ (yⱼ − ŷⱼ)² )

where yⱼ is the actual value, ŷⱼ is the predicted value, and n is the total number of observations.

Below is a sample Python code that I used to calculate RMSE on your data.

```
# import libraries
import numpy as np
import pandas as pd

# load data
df = pd.read_csv('data.csv')

# function to calculate RMSE
def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

rmse_val = rmse(df['Actual'], df['Predicted'])
print("rms error is: " + str(rmse_val))
# output
# rms error is: 0.00606619454222
```

OR

Python's sklearn package has a built-in function that can be used to compute RMSE.

```
from sklearn.metrics import mean_squared_error
from math import sqrt

print(sqrt(mean_squared_error(df['Actual'], df['Predicted'])))
# output
# 0.00606619454222
```

From the RMSE value we can say that there is a deviation of only 0.006 between actual and predicted values, which indicates that the model has performed well on your data set.

Hope this helps.

rmse-formula.png
(3.9 kB)

Thank you very much, Manohar, for your reply; this is very informative and helpful. I really appreciate it.

Please allow me to ask a follow up question.

Is the variance in my data (actual vs. predicted) typical, or is the training set I used too small?

I used the provided input data from the Energy Price Analytic catalog, which contains almost 3 years' worth of data.

I hope to hear from you again, thanks.

The variance of 0.006 is only for the 24 data points for which you had prediction values; it will be different for the full data set. In the real world, no statistical model will give you 100% accurate predictions. Generally, these models are there to help us make informed business decisions based on data, so acceptable model accuracy depends on the business context.
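One way to judge whether an RMSE like 0.006 is acceptable in a given business context is to scale it against the spread of the actual values. The helper below is a hypothetical illustration of that idea, not part of the Predix catalog.

```
# Hypothetical helper (not part of the Predix catalog): scale RMSE by the
# spread of the actual values, so the error can be read as a fraction of
# the observed price range.
import numpy as np

def normalized_rmse(predictions, targets):
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    rmse = np.sqrt(np.mean((predictions - targets) ** 2))
    return rmse / (targets.max() - targets.min())
```

For example, an RMSE of 0.006 on prices that span one currency unit is 0.6% of the observed range, which is easier to discuss with business stakeholders than the raw number.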
