Dear Dinu,
First, let me say that time series prediction is the most difficult thing to start with when it comes to neural networks, because you have to deal with time variant networks, where the output depends not only on the input but also on the history of the net (i.e. the calculation steps it has already performed since it was last reset).
There is one simple way to get around a time variant network: you can equip the network with more than one input, i.e. one input for every past value. Let's assume you want to predict the next value based on the four previous ones. Then you could create a net with four inputs (t, t-1, t-2, t-3) and one output (t+1). This is still a time invariant network, although you use it with data that represents a time series.
The disadvantage of this approach is that the net can only draw on the past four values to predict the next one, and that more effort is required to prepare the data accordingly. Still, this can be a good option since it is much easier to handle!
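To illustrate the data preparation this approach needs, here is a minimal Python sketch. It is not part of MemBrain or of the attached script; the function name `make_windows` and the sample numbers are made up for the example. It turns a plain series into rows of four inputs plus one target value.

```python
def make_windows(series, n_inputs=4):
    """Turn a 1-D series into (inputs, target) rows for a time invariant net.

    Each row holds n_inputs consecutive values, oldest first, and the target
    is the value that immediately follows them. (Hypothetical helper.)
    """
    rows = []
    for i in range(len(series) - n_inputs):
        inputs = list(series[i:i + n_inputs])  # x(t-3), x(t-2), x(t-1), x(t)
        target = series[i + n_inputs]          # x(t+1)
        rows.append((inputs, target))
    return rows


series = [10, 11, 12, 13, 14, 15]  # made-up sample data
windows = make_windows(series, n_inputs=4)
for inputs, target in windows:
    print(inputs, "->", target)
```

Note that a series of N values yields only N - 4 rows, which is one reason this approach needs a bit more data preparation.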
However, for the moment let's assume you want to use a real time variant network for your solution.
In this case I have automated the most critical steps for you in the script file I have attached. You need to adjust some settings in the script file to match your needs. However, you may want to try it as it is first:
- Extract the attached file; this will create a separate folder.
- In MemBrain select 'Scripting' - 'Execute Script' and select the file 'TimeSeries.as' from the newly created folder.
Here is what the script does:
- Load a training and a validation lesson
- Train five different net candidates with these data for a specified training time and determine the best candidate
- Merge training and validation data into one single new lesson that has all data points from the start up to the present
- Re-train the best net candidate with this combined lesson
- Record the final output of the net on this lesson plus an adjustable number of future extrapolated data points into a CSV file.
The data files (Train.mbl and Validate.mbl) in the example are taken from a sine wave. The script is adjusted to extrapolate 4 points of the sine wave into the future.
After the script has executed, open the file 'Extrapolated.csv' in Excel and add a column to the left in order to have an X-axis. Then you can easily create a plot of the input and output data of the net over the whole time series, including the four extrapolated data points at the very end.
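If it helps to see the overall flow outside of MemBrain, the following Python sketch mimics the steps listed above. It is only an illustration under loud assumptions: the two "candidates" are trivial stand-in models (not neural nets, and without any internal state), all names are made up, and none of this is the code in 'TimeSeries.as'.

```python
class Persistence:
    """Stand-in candidate: predicts x(t+1) = x(t); nothing to train."""
    def fit(self, pairs):
        return self

    def predict(self, x):
        return x


class LinearStep:
    """Stand-in candidate: least-squares fit of x(t+1) = a * x(t) + b."""
    def fit(self, pairs):
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        self.a = cov / var_x if var_x else 0.0
        self.b = mean_y - self.a * mean_x
        return self

    def predict(self, x):
        return self.a * x + self.b


def validation_error(model, pairs):
    """Mean squared error of one-step predictions on the given pairs."""
    return sum((model.predict(x) - y) ** 2 for x, y in pairs) / len(pairs)


def train_select_extrapolate(candidates, train, validate, extrapolate_count):
    # 1. Train every candidate on the training data only.
    fitted = [c.fit(train) for c in candidates]
    # 2. Pick the candidate with the smallest error on the validation data.
    best = min(fitted, key=lambda m: validation_error(m, validate))
    # 3. Re-train the winner on training and validation data combined.
    combined = train + validate
    best.fit(combined)
    # 4. Extrapolate by feeding each prediction back in as the next input.
    x = combined[-1][1]  # last known value of the series
    future = []
    for _ in range(extrapolate_count):
        x = best.predict(x)
        future.append(x)
    return best, future


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # made-up sample data
pairs = list(zip(series[:-1], series[1:]))          # (x(t), x(t+1)) rows
train, validate = pairs[:-2], pairs[-2:]
best, future = train_select_extrapolate(
    [Persistence(), LinearStep()], train, validate, extrapolate_count=4
)
print(type(best).__name__, future)
```

The important design point is step 3: the validation data is only held back to choose between candidates, and is then merged back in so the final net has seen everything up to the present before it extrapolates.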
To perform the same with your data you need to do the following:
- Prepare at least one net with one input x(t) and one output x(t+1).
- Next, you need a training and a validation data set. Each of them needs to have two columns:
-- x(t)
-- x(t+1)
It is very important that the validation data comprises only a few data points and that the majority of the data goes into the training lesson. Furthermore, the validation data must continue seamlessly from where the training data ends, i.e. the validation lesson is the direct continuation of the training time series up to the last known data point in time.
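As a rough sketch of how the two data sets could be produced from a single series, consider the Python below. The helper names, the ';' separator and the choice of two validation points are assumptions for the example; please check which separators and layout your CSV import actually expects before using anything like this.

```python
import csv
import io


def make_pairs(series):
    """Pair each value x(t) with its successor x(t+1)."""
    return [(series[i], series[i + 1]) for i in range(len(series) - 1)]


def split_pairs(pairs, n_validation):
    """Keep the last n_validation rows for validation, the rest for training."""
    if not 0 < n_validation < len(pairs):
        raise ValueError("n_validation must be between 1 and len(pairs) - 1")
    return pairs[:-n_validation], pairs[-n_validation:]


def to_csv_text(pairs, separator=";"):
    """Render rows as two-column CSV text without a header line.

    The separator is an assumption; adapt it to your setup.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()


series = list(range(10))  # made-up sample data
train, validate = split_pairs(make_pairs(series), n_validation=2)

# The validation rows must pick up exactly where the training rows end.
assert train[-1][1] == validate[0][0]

print(to_csv_text(train))
print(to_csv_text(validate))
```

Taking the validation rows from the very end of the series, rather than at random, is what keeps the two lessons in one unbroken time order.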
In MemBrain, load your net, open the lesson editor and click 'Names from Net'. This sets up your lesson editor with the same number of inputs and outputs and the same names as the net. Then, in the lesson editor, select 'Raw CSV files' - 'Import Current Lesson (RAW CSV)' and choose the training CSV file, which is then loaded. Still in the lesson editor, select 'Lesson Files' - 'Save Current Lesson As...' and save the training data as 'Train.mbl'.
Do the same for the validation data, save it as 'Validate.mbl'.
Next, you need to ensure that the nets you want to use are adjusted to the correct Normalization Range:
Open each net candidate, select both the INPUT and the OUTPUT neuron, press ENTER, click on 'Normalization', enable the check box 'Use Normalization' and enter your specific valid data minimum and maximum limits according to the range of your data. Leave some headroom for future data!
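One simple way to arrive at such limits is to take the minimum and maximum of your known data and widen the range by some margin. The Python sketch below does this; the helper name and the 10 % headroom are arbitrary assumptions for illustration, not a MemBrain recommendation.

```python
def normalization_limits(values, headroom=0.1):
    """Return (low, high) limits covering values plus a safety margin.

    headroom is the fraction of the data span added on each side
    (0.1 = 10 %, an arbitrary example value).
    """
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # Constant data: fall back to the magnitude of the value, or 1.0.
        span = abs(high) or 1.0
    margin = headroom * span
    return low - margin, high + margin


data = [-1.0, 0.5, 1.0]  # made-up sample data
low, high = normalization_limits(data, headroom=0.1)
print(low, high)
```

How much headroom is sensible depends on how far you expect future values to leave the range seen so far.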
- Copy the files you have created into the directory where the script resides
- In the script 'TimeSeries.as' adjust the following according to your needs:
-- NUMBER_OF_NETS
-- NET_FILE_NAME_BASE
-- EXTRAPOLATE_COUNT
-- LIST_SEPARATOR
-- DEC_SEPARATOR
Note that every net candidate must have a name that consists of the value assigned to NET_FILE_NAME_BASE plus a number, starting at '1' for the first net candidate.
Please note that most of the steps above are one-time steps. The only thing you need to do for every data set is to create a training and a validation data set.
You do not have to use the names 'Train.mbl' and 'Validate.mbl'. If you prefer different names, adjust them in the script file:
- TRAIN_LESSON_NAME
- VALIDATION_LESSON_NAME
If you want to learn in detail what happens in the script, try reading through the code. Even if you are not a programmer you should be able to get an idea of what is going on from the comments in the script and the names of the functions that are called. Look for the function 'void main()'; this is the main program and entry point of the script.
When reading MemBrain script files, it is best to use an editor with syntax highlighting for C/C++ or Java. The scripting example download on the MemBrain homepage includes a description of how the free editor PSPad can be set up to best display MemBrain script files and even to support debugging them.
Note that one of the called functions is not located in the file 'TimeSeries.as' but in the file 'TrainAndExtrapolate.as'. This is for historical reasons: these were separate scripts in the past. You could also cut and paste the function into the actual executable script file 'TimeSeries.as' and delete the second script file.
As you can see, there is quite a lot to do to handle time variant networks properly. That is why doing this manually is both tedious and error prone, and why I would have to write many pages to explain it all.
Hope that gives you a good start!
Regards
- TimeSeriesPrediction.zip
- Script that trains and validates several net variants for time series prediction, selects the best one and predicts an adjustable number of future values. Demo data contained in the sample is a sine wave.