Artificial Intelligence (AI) has a long history of big breakthroughs
being just over the ten-year horizon. Enthusiastic reports in the early 1960s predicted
machine awareness by the early 1970s. So far as we know, this has not happened yet.
Deep Blue notwithstanding, to date the most successful part of AI is neural nets, the
part most directly inspired by biology.
A neural net is a computational
architecture based on the way neurons in our brains are interconnected. There
are many variations, but one of the simplest is a feed-forward net.
[Figure: a feed-forward net, with an input layer, a hidden layer, and an output layer joined by weighted links.]
The input layer receives values x1, ..., xL, which it feeds
forward to the hidden layer through the links indicated in the diagram. Associated
with each link is a weight: wi,j
is the weight of the link between the ith input neuron and the
jth hidden neuron. The input to the jth hidden neuron is
the sum
w1,j*x1 + ... + wL,j*xL
There are several schemes for how this input is handled by the
hidden neuron. The simplest is called a threshold function: if
the weighted sum of the inputs exceeds a value called the threshold, then the value
of yj is set to 1; otherwise, it is set to 0. The other common approach
is called a sigmoid function, pictured below. Note that as the
weighted sum of the inputs increases, the value of yj increases gradually,
rather than jumping abruptly as it does with the threshold function. (The threshold function more closely
follows the behavior of biological neurons.)
[Figure: graph of a sigmoid function.]
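To make the arithmetic concrete, here is a minimal sketch in Python of the weighted sum and the two activation schemes just described. The function names weighted_input, threshold, and sigmoid, and the sample weights, are our own labels for this sketch, not part of any standard package.

import math

def weighted_input(x, w_col):
    # Weighted sum w1,j*x1 + ... + wL,j*xL for one hidden neuron j;
    # w_col holds the weights on the links feeding that neuron.
    return sum(wi * xi for wi, xi in zip(w_col, x))

def threshold(s, t=0.0):
    # Threshold activation: 1 if the weighted sum exceeds the threshold t, else 0.
    return 1 if s > t else 0

def sigmoid(s):
    # Sigmoid activation: rises gradually from 0 toward 1 as s increases.
    return 1.0 / (1.0 + math.exp(-s))

# Two inputs feeding one hidden neuron with made-up weights.
s = weighted_input([0, 1], [2.5, 1.2])
print(threshold(s))          # 1
print(round(sigmoid(s), 2))  # 0.77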
Another set of weights, vi,j,
connects the hidden neurons to the output neurons, and
the same process feeds the values of the hidden neurons forward to the output neurons. For
example, given the network and weights pictured below, suppose the inputs are
x1 = 0 and x2 = 1. What will the feed-forward
process give for the output neuron?
[Figure: a net with two input neurons, two hidden neurons, and one output neuron. The weights are w1,1 = 2.5, w2,1 = 1.2, w1,2 = -1.4, w2,2 = 2.1, v1,1 = -1.6, and v2,1 = 1.4.]
First, compute the weighted inputs for the hidden neurons.
For y1 the input is 2.5*0 + 1.2*1 = 1.2
For y2 the input is -1.4*0 + 2.1*1 = 2.1
Using a threshold function with a threshold value of 0, we see
y1 = 1 and y2 = 1. Now the
weighted input for the output neuron is
-1.6*1 + 1.4*1 = -0.2
Since -0.2 does not exceed the threshold of 0, the output value is z1 = 0.
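As a check on this arithmetic, here is a short sketch that feeds the same inputs through the pictured weights. The layout of the weight lists is simply our own choice of representation.

def threshold(s, t=0.0):
    return 1 if s > t else 0

# Weights of the pictured net: w[i][j] links input i to hidden neuron j,
# v[j] links hidden neuron j to the single output neuron.
w = [[2.5, -1.4],   # weights leaving x1: w1,1 = 2.5, w1,2 = -1.4
     [1.2,  2.1]]   # weights leaving x2: w2,1 = 1.2, w2,2 = 2.1
v = [-1.6, 1.4]     # v1,1 = -1.6, v2,1 = 1.4

x = [0, 1]
y = [threshold(w[0][j] * x[0] + w[1][j] * x[1]) for j in range(2)]
z = threshold(v[0] * y[0] + v[1] * y[1])
print(y, z)   # [1, 1] 0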
So what? The strength of neural nets lies not in their ability to
compute in this fashion, but in their ability to learn, to generalize.
Neural nets can be trained. Think of the training as learning to answer sample
questions: we want the net to produce specific outputs for certain inputs. Training
consists of adjusting the weights until the inputs produce the desired outputs. Typically, there is a
training set, a collection of inputs paired with outputs.
The weights are adjusted so the first input produces the first output. Next the
weights are adjusted so the second input produces the second output. This
continues until the last input gives the last output. By now, the weights have
changed so much that the first input no longer produces the first output, so the training
process is repeated through all the input-output pairs. This is done again and again until
the net gets them all right.
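In outline, this schedule is just a loop over the training pairs, repeated until nothing needs fixing. Here is a deliberately tiny sketch of that loop, using a single threshold neuron and a made-up two-input training set of our own; the weight-adjustment rule (error times input) anticipates the back-propagation rule worked out below.

def threshold(s):
    return 1 if s > 0 else 0

# A made-up training set: the neuron should output 1 exactly when
# at least one input is 1.
training_set = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = [0.0, 0.0]

done = False
while not done:
    done = True
    for x, target in training_set:
        out = threshold(w[0] * x[0] + w[1] * x[1])
        error = target - out
        if error != 0:
            done = False
            # Nudge each weight by the error times its input value.
            w[0] += error * x[0]
            w[1] += error * x[1]

print(w)   # weights that now get every pair in the training set right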
The remarkable thing about this process is that we have no idea
what the final weights mean. But often the net generalizes beyond its training set: it can
correctly answer questions not in the training set. We shall mention examples in a
moment.
How are the weights adjusted to match the input-output pairs? One
of the most common methods is back-propagation. The
difference between the feed-forward value of zi and the training set
output value is the error, and the error is "propagated back" through the net, using the
weights to compute errors at the hidden neurons, and ultimately to adjust the
weights.
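Here is a minimal sketch of that procedure for a net with one hidden layer and a single output neuron. The function name backprop_step and the list layout are our own, and the updates (error times source value, with no learning rate) follow the arithmetic of the example worked out below rather than the usual gradient formulas; the call at the end reproduces the weight changes computed by hand next.

def threshold(s):
    return 1 if s > 0 else 0

def backprop_step(x, target, w, v):
    # Feed forward: hidden values y and the output z.
    y = [threshold(sum(w[i][j] * x[i] for i in range(len(x))))
         for j in range(len(v))]
    z = threshold(sum(v[j] * y[j] for j in range(len(v))))

    # Error at the output, propagated back to the hidden neurons
    # through the weights v.
    e_z = target - z
    e_y = [e_z * v[j] for j in range(len(v))]

    # Each weight changes by the error at its destination times the
    # value at its source.
    v_new = [v[j] + e_z * y[j] for j in range(len(v))]
    w_new = [[w[i][j] + e_y[j] * x[i] for j in range(len(v))]
             for i in range(len(x))]
    return w_new, v_new

# The net pictured above, and the training pair (x1 = 0, x2 = 1; z1 = 1).
w = [[2.5, -1.4], [1.2, 2.1]]
v = [-1.6, 1.4]
w2, v2 = backprop_step([0, 1], 1, w, v)
print([[round(val, 1) for val in row] for row in w2],
      [round(val, 1) for val in v2])
# [[2.5, -1.4], [-0.4, 3.5]] [-0.6, 2.4]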
For example, suppose the training set for the net pictured above
contains the input-output pair
(x1 = 0, x2 = 1; z1 = 1)
The net pictured above is not trained for this input-output
pair. The output error is
ez1 = 1 - 0 = 1
Multiplying by the weight v1,1 gives the error at y1:
ey1 = ez1*v1,1 = 1*(-1.6) = -1.6
Similarly,
ey2 = ez1*v2,1 = 1*1.4 = 1.4
With these error values, we compute the changes in the weights.
dv1,1 = ez1*y1 = 1*1 = 1
dv2,1 = ez1*y2 = 1*1 = 1
dw1,1 = ey1*x1 = -1.6*0 = 0
dw2,1 = ey1*x2 = -1.6*1 = -1.6
dw1,2 = ey2*x1 = 1.4*0 = 0
dw2,2 = ey2*x2 = 1.4*1 = 1.4
Now new weights are computed.
v1,1 -> v1,1 + dv1,1 = -1.6 + 1 = -0.6
v2,1 -> v2,1 + dv2,1 = 1.4 + 1 = 2.4
w1,1 -> w1,1 + dw1,1 = 2.5 + 0 = 2.5
w2,1 -> w2,1 + dw2,1 = 1.2 + (-1.6) = -0.4
w1,2 -> w1,2 + dw1,2 = -1.4 + 0 = -1.4
w2,2 -> w2,2 + dw2,2 = 2.1 + 1.4 = 3.5
Here is the new net. It is easy to verify this net is trained
for the input-output pair (x1 = 0, x2 = 1; z1 = 1).
[Figure: the net with the updated weights.]
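A few lines of arithmetic confirm the claim; this sketch simply repeats the feed-forward pass with the new weights.

def threshold(s):
    return 1 if s > 0 else 0

x1, x2 = 0, 1
y1 = threshold(2.5 * x1 + (-0.4) * x2)   # weighted input -0.4, so y1 = 0
y2 = threshold(-1.4 * x1 + 3.5 * x2)     # weighted input  3.5, so y2 = 1
z1 = threshold(-0.6 * y1 + 2.4 * y2)     # weighted input  2.4, so z1 = 1
print(y1, y2, z1)   # 0 1 1 -- the desired output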
Here is a more interesting example, generated using BrainMaker,
a commercial neural net package. The goal is to teach the net to recognize digits presented
as an 8 by 8 pixel array. This net has 64 input neurons, 10 output neurons (one for each
digit), and 25 hidden neurons. The weights are randomized initially. For example, the
input 0 produces the output shown here. The darkness of the box indicates the "certainty"
the net has of the value of the digit. This net is rather confused: it is quite sure 0
is 0, 5, and 6. The picture below shows the initial outputs of this randomized net.
[Figure: the initial outputs of the randomized net for the input 0.]
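BrainMaker's internals are not reproduced here, but the shape of the experiment is easy to sketch: a 64-25-10 net with randomized weights, fed an 8 by 8 pixel pattern. The sketch below assumes sigmoid neurons and a made-up random input pattern; it illustrates only the architecture, not BrainMaker itself.

import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

random.seed(0)
n_in, n_hidden, n_out = 64, 25, 10   # pixels, hidden neurons, one output per digit

# Randomized initial weights, as in the untrained net shown above.
w = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
v = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden)]

x = [random.choice([0, 1]) for _ in range(n_in)]   # stand-in 8 by 8 pixel pattern

y = [sigmoid(sum(w[i][j] * x[i] for i in range(n_in))) for j in range(n_hidden)]
z = [sigmoid(sum(v[j][k] * y[j] for j in range(n_hidden))) for k in range(n_out)]
print([round(zk, 2) for zk in z])   # ten "certainties", one per digit, meaningless before training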
After training, the net will recognize the test digits, but what happens
if we present it with slightly modified test digits? Top left is the trained response for the
digit 9. Note how the net's interpretation of the pixel pattern changes as we go step by step
from 8 to 9. The patterns for 8 and 9 are quite similar, so it is little surprise that small
changes destroy the net's certainty.
[Figure: the net's responses as the pixel pattern changes step by step from 8 to 9.]
On the other hand, here is the net's trained response to the pixel pattern
for 0, and several variants. No other pattern resembles 0 very much, so the net is able to
generalize: small changes from the pattern of 0 it still recognizes as 0. Only with enough
changes does the net begin to recognize a 3.
[Figure: the net's trained response to the pixel pattern for 0 and several variants.]
Although the training process is completely deterministic, the
initial weight space is so high-dimensional that predicting the outcome of training the
randomized initial net is hopeless. Adding to the complications is the observation
that usually there are many different combinations of weights that satisfy the entire
training set. Do the basins of attraction have fractal boundaries? We do not know.
Neural nets have many practical applications. Perhaps one of the
most surprising is landing large jet airliners. Boeing and Airbus both have neural nets
trained to land their largest planes. A net can assimilate much more information,
more rapidly, than a human pilot, and if it has been trained for hundreds of hours in
conditions like those used to train people, we would expect it to perform well. How
well is a matter of some disagreement, best summarized by this observation: in a difficult
landing, the Boeing pilot can override the neural net, whereas the Airbus net cannot be
overridden. Scary? Just wait.