
I have trained an XOR neural network in MATLAB and got these weights:

iw: [-2.162 2.1706; 2.1565 -2.1688]

lw: [-3.9174 -3.9183]

b{1}: [2.001; 2.0033]

b{2}: [3.8093]

Just out of curiosity, I tried to write MATLAB code that computes the output of this network (two neurons in the hidden layer and one in the output layer, TANSIG activation function).

The code I came up with:

l1w = [-2.162 2.1706; 2.1565 -2.1688];   %# input-to-hidden weights (iw)
l2w = [-3.9174 -3.9183];                 %# hidden-to-output weights (lw)
b1w = [2.001 2.0033];                    %# hidden layer biases (b{1})
b2w = [3.8093];                          %# output layer bias (b{2})

input = [1, 0];

out1 = tansig (input(1)*l1w(1,1) + input(2)*l1w(1,2) + b1w(1));   %# hidden neuron 1
out2 = tansig (input(1)*l1w(2,1) + input(2)*l1w(2,2) + b1w(2));   %# hidden neuron 2
out3 = tansig (out1*l2w(1) + out2*l2w(2) + b2w(1))                %# output neuron

The problem is that when the input is, say, [1,1], the code outputs -0.9989, and for [0,1] it outputs 0.4902, whereas simulating the network generated by MATLAB gives 0.00055875 and 0.99943 respectively.

What am I doing wrong?

1 Comment

Why don't you post the actual code you used to build and train the network? Commented Mar 10, 2010 at 20:25

2 Answers


I wrote a simple example of an XOR network. I used newpr, which defaults to the tansig transfer function for both the hidden and output layers.

input = [0 0 1 1; 0 1 0 1];               %# each column is an input vector
outputActual = [0 1 1 0];

net = newpr(input, outputActual, 2);      %# 1 hidden layer with 2 neurons
net.divideFcn = '';                       %# use the entire input for training

net = init(net);                          %# initialize net
net = train(net, input, outputActual);    %# train
outputPredicted = sim(net, input);        %# predict

Then we check the result by computing the output ourselves. The important thing to remember is that, by default, inputs/outputs are scaled to the [-1,1] range:

scaledIn = (2*input - 1);           %# from [0,1] to [-1,1]
for i=1:size(input,2)
    in = scaledIn(:,i);             %# i-th input vector
    hidden(1) = tansig( net.IW{1}(1,1)*in(1) + net.IW{1}(1,2)*in(2) + net.b{1}(1) );
    hidden(2) = tansig( net.IW{1}(2,1)*in(1) + net.IW{1}(2,2)*in(2) + net.b{1}(2) );
    out(i) = tansig( hidden(1)*net.LW{2,1}(1) + hidden(2)*net.LW{2,1}(2) + net.b{2} );
end
scaledOut = (out+1)/2;              %# from [-1,1] to [0,1]

Or, more efficiently, expressed as a matrix product in one line:

scaledIn = (2*input - 1);           %# from [0,1] to [-1,1]
out = tansig( net.LW{2,1} * tansig( net.IW{1}*scaledIn + repmat(net.b{1},1,size(input,2)) ) + repmat(net.b{2},1,size(input,2)) );
scaledOut = (1 + out)/2;            %# from [-1,1] to [0,1]
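
As a side note, the same scaling can be written with mapminmax, which (if I recall correctly) is the preprocessing newpr applies by default. This is just a sketch of the equivalent computation and reuses outputActual from the training snippet above:

scaledIn = mapminmax(input);                        %# maps each row to [-1,1]; same as 2*input-1 for 0/1 data
out = tansig( net.LW{2,1} * tansig( net.IW{1}*scaledIn + repmat(net.b{1},1,size(input,2)) ) + repmat(net.b{2},1,size(input,2)) );
[scaledTargets, outSettings] = mapminmax(outputActual);   %# settings describing the [0,1] target range
scaledOut = mapminmax('reverse', out, outSettings); %# map the tansig output back from [-1,1] to [0,1]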

1 Comment

Excellent. Helped me a lot.

You usually don't use a sigmoid on your output layer. Are you sure you should have the tansig on out3? And are you sure you are looking at the weights of the appropriately trained network? It looks like you've got a network trained to do XOR on [1,1], [1,-1], [-1,1], and [-1,-1], with +1 meaning "xor" and -1 meaning "same".
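
For illustration, with a linear (purelin) output neuron the last step is just the weighted sum, with no squashing. This sketch reuses the weight variables from the question purely to show the shape of the computation, not as a corrected network:

out1 = tansig (input(1)*l1w(1,1) + input(2)*l1w(1,2) + b1w(1));   %# hidden neuron 1, still tansig
out2 = tansig (input(1)*l1w(2,1) + input(2)*l1w(2,2) + b1w(2));   %# hidden neuron 2, still tansig
out3 = out1*l2w(1) + out2*l2w(2) + b2w(1)                         %# purelin(x) is just x: no squashing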

4 Comments

Then how do you normalize your output if you don't use a sigmoid in the output layer? Furthermore, how do you measure error if your output is not normalized?
For a classifier, you pick the output with the highest value (or toggle at the 50% point) to make your decision. You don't need the nonlinearity. In this case it's okay to do it, but it doesn't really add much.
The problem with using a linear function in the output layer becomes apparent when you want to get posterior probabilities for each class in addition to the classifications.
@Amro: Fair enough. If you want them to be forced into the range (0,1), then yes, you should use 1/(1+exp(-y)); you get approximate probabilities either way but you might exceed 1 (or fall below 0) if you just treat it as a function approximation. Whether that is a problem depends on the application.
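For instance, with a hypothetical raw output y:

y = -0.3;                     %# hypothetical raw (linear) network output, for illustration only
p = 1./(1 + exp(-y));         %# logistic squashing into (0,1), same as logsig(y); here roughly 0.4256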
