
Cross Entropy Error: User Defined Issue Maybe?

Posted: Wed 9. Mar 2011, 18:40
by MrCreosote
QUESTION: If the user specifies the error function, aren't user-specified partial derivatives of that error function also required for back-propagation?

First, it is difficult to find the actual formula for this error type. The version I like best is the one MathWorks gives when telling users how to define it themselves, since it is not in their toolbox. http://www.mathworks.com/support/soluti ... on=1-1A4F6

NOTE: There is one requirement for this calculation: SoftMax activation must be used on the output nodes. (SoftMax maps each output into (0,1) with the additional constraint that all the outputs add up to 1. As far as I understand, this makes the outputs probabilities, which is what an entropy measure needs, but the details are beyond my pay grade.)

From their definition:


The cross-entropy error, C is then expressed as follows:

C = - sum [all cases and outputs] (d*log(y) + (1-d)*log(1-y) )

The derivative of this error function for a given output and training example (this is the value we actually back-propagate) is as follows:

dC/dy = - d/y + (1-d)/(1-y)
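
Differentiating the summand of C term by term with respect to y gives -d/y + (1-d)/(1-y). As a sanity check, here is a minimal Python sketch (not MemBrain script; all function names are made up for illustration) that compares this analytic derivative with a central finite difference:

```python
import math

def cross_entropy_term(d, y):
    """One summand of C for a single output: -(d*log(y) + (1-d)*log(1-y))."""
    return -(d * math.log(y) + (1.0 - d) * math.log(1.0 - y))

def cross_entropy_grad(d, y):
    """Analytic derivative of the summand with respect to y."""
    return -d / y + (1.0 - d) / (1.0 - y)

def numeric_grad(d, y, h=1e-6):
    """Central finite difference of the summand with respect to y."""
    return (cross_entropy_term(d, y + h) - cross_entropy_term(d, y - h)) / (2.0 * h)

# Compare both derivatives for a few (target, activation) pairs.
for d, y in [(1.0, 0.8), (0.0, 0.3), (0.5, 0.6)]:
    print(d, y, cross_entropy_grad(d, y), numeric_grad(d, y))
```

The two columns should agree to several decimal places for any y strictly between 0 and 1.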


_____________________________

Defining C in MemBrain's new Scripted Elements would mean writing the function CalculateNetErrorSummand as

return = - d*log(y) - (1-d)*log(1-y) where d = targetActivation and y = Activation

This appears to be the only required input, which raises the question: do we also need user-defined partial derivatives?
  • Does Membrain use them for BP?
  • Does Membrain calculate them numerically from the function itself?
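
One practical point about the summand itself: log(y) and log(1-y) are undefined when an activation reaches exactly 0 or 1, so a user-defined version probably wants to clip y first. A minimal sketch in Python (not MemBrain's scripting language; the function name and the eps value are assumptions chosen for illustration):

```python
import math

def safe_cross_entropy_summand(target_activation, activation, eps=1e-12):
    """Cross-entropy summand with the activation clipped into [eps, 1 - eps].

    The clipping avoids log(0) when an output saturates at exactly 0 or 1.
    """
    y = min(max(activation, eps), 1.0 - eps)
    d = target_activation
    return -(d * math.log(y) + (1.0 - d) * math.log(1.0 - y))

# A perfect prediction contributes (almost) nothing; a completely wrong
# one contributes a large but finite value instead of raising an error.
print(safe_cross_entropy_summand(1.0, 1.0))
print(safe_cross_entropy_summand(1.0, 0.0))
```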
Thanks so much for the new feature,
Tom

PS. Cross Entropy Error seems to have been everywhere since the early 2000s. Would it be possible to build it in and give it a radio button?

Re: Cross Entropy Error: User Defined Issue Maybe?

Posted: Wed 9. Mar 2011, 22:17
by Admin
Tom,

Without talking about any specific net error calculation method, we seem to have a misunderstanding here:

For BP it is not the derivative of the error function that is required. Rather, it is the derivative of the activation function of the corresponding nodes (neurons).

Moreover, the Net Error calculation function in MemBrain does not directly influence the training:
For BP purposes the simple delta between target and actual activation is used. The Net Error calculation is just something to obtain one single scalar error representation for all outputs of the net and all patterns in the data set.
As such it influences the point where the training is stopped (by comparing it to the target net error) but not the training itself.

I really wonder why MathWorks says something about the derivative here. I may need to dive deeper into your post and the link.

Different thoughts here? Then please let me know so that we can clarify.

Regards and many thanks for the feedback!

Re: Cross Entropy Error: User Defined Issue Maybe?

Posted: Wed 9. Mar 2011, 22:55
by Admin
Tom,

I was certainly wrong in what I said above. I had forgotten about this after such a long time...

The derivative of the error function is certainly used if this is the function that is to be minimized. As such, the scripted net error calculation currently available in MemBrain is not fully implemented with respect to its internal use during backpropagation: for internal BP operation, MemBrain still assumes that the net error function is Squared Error.
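
To illustrate why the error function matters inside BP, here is a textbook-style Python sketch. It is not MemBrain's internal code: it assumes a logistic (sigmoid) output neuron, and the function names are hypothetical. It compares the output delta obtained from Squared Error with the one obtained from the cross-entropy formula discussed in this thread:

```python
def sigmoid_derivative(y):
    """Derivative of the logistic function, written in terms of its output y."""
    return y * (1.0 - y)

def output_delta_squared_error(d, y):
    """dE/dx for E = 0.5*(y - d)^2: error derivative times activation derivative."""
    return (y - d) * sigmoid_derivative(y)

def output_delta_cross_entropy(d, y):
    """dC/dx for C = -(d*log(y) + (1-d)*log(1-y)), again via the chain rule.

    Algebraically this product simplifies to y - d.
    """
    return (-d / y + (1.0 - d) / (1.0 - y)) * sigmoid_derivative(y)

# A saturated, badly wrong output: target 0, activation 0.99.
# (Sign convention here is gradient form, y - d; some texts use d - y.)
d, y = 0.0, 0.99
delta_se = output_delta_squared_error(d, y)
delta_ce = output_delta_cross_entropy(d, y)
print(delta_se, delta_ce)
```

With Squared Error the delta is scaled down by y*(1-y) when the output saturates, whereas with cross-entropy that factor cancels and the delta stays close to the plain difference between activation and target.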

I'll consider adding this in the next MemBrain version as another scripted element, but it will probably need some more thought.

Many thanks for kicking off my thoughts on this again!

Regards,

Re: Cross Entropy Error: User Defined Issue Maybe?

Posted: Thu 10. Mar 2011, 01:32
by MrCreosote
I will try to find the reference, but I believe the derivative they use is a "combination of" the activation function AND the CEE.

NOTE: Just thinking: if it is d(E)/dw, then the activation is involved, since it is required to calculate E (or CEE in our case).

It is made clear in a number of papers that for CEE to work, the activation has to be such that the network predicts probabilities.

What is ironic about the CEE is that its purpose seems to be getting the simple deviation (delta between target and prediction) as the BP error correction; in other words, a simple partial derivative. It's almost as if the actual CEE value is of secondary importance. I just found an internet page that explains this aspect. I'll post it tomorrow when I'm back in the office.
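
The "simple deviation" property can be checked numerically. The sketch below is plain Python, not MemBrain script, and makes two assumptions: SoftMax outputs, and the multi-class form of the cross-entropy, C = -sum(d*log(y)), which differs from the two-term formula quoted earlier. It compares the gradient of C with respect to the output node inputs against y - d:

```python
import math

def softmax(x):
    """SoftMax over a list of node inputs (shifted by the max for stability)."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(d, x):
    """C = -sum_k d_k * log(y_k), where y = softmax(x)."""
    y = softmax(x)
    return -sum(dk * math.log(yk) for dk, yk in zip(d, y))

def numeric_input_grad(d, x, h=1e-6):
    """Central finite-difference gradient of C with respect to each input x_i."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((categorical_cross_entropy(d, xp)
                     - categorical_cross_entropy(d, xm)) / (2.0 * h))
    return grad

x = [1.0, -0.5, 2.0]   # example node inputs (arbitrary values)
d = [0.0, 0.0, 1.0]    # one-hot target
y = softmax(x)
simple_delta = [yk - dk for yk, dk in zip(y, d)]
print(simple_delta)
print(numeric_input_grad(d, x))
```

Both printed lists should match closely, which is the sense in which the back-propagated quantity reduces to the plain difference between prediction and target.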

Since CEE places a requirement on the activation function, making CEE a selectable option would need some additional checks and defaults in MemBrain. As a User Defined Function, it would be up to the user to do what is correct. One more thing: if you implemented CEE in MemBrain, it would, as far as I know, be the only neural net package that has it built in; as you saw, even MATLAB does not have it. That would be a great feature for MemBrain!

Please keep in mind, this is all new to me so I could easily have something quite wrong.
______________________
EDIT: Here is a paper that may help: OPTIMIZATION IN COMPANION SEARCH SPACES: THE CASE OF CROSS-ENTROPY AND THE LEVENBERG-MARQUARDT ALGORITHM. Craig L. Fancourt and Jose C. Principe http://citeseerx.ist.psu.edu/viewdoc/do ... 1&type=pdf