Feed-forward neural network models approximate nonlinear functions connecting inputs to outputs. The cascade correlation (CC) learning algorithm allows networks to grow dynamically, starting from the simplest topology, to solve increasingly difficult problems. It has been demonstrated that CC networks can solve a wide range of problems, including some on which other kinds of networks (e.g., back-propagation networks) have been found to fail. In this paper we examine the mechanisms and characteristics of nonlinear function learning and representation in CC networks, their generalization capabilities, and the effects of environmental bias, using a variety of knowledge representation analysis tools.
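As a concrete illustration, the growth loop at the heart of CC can be sketched in a few lines of NumPy. This is a toy version of the cascade idea (candidate hidden units trained to correlate with the residual error, then frozen as new input features), not the full CC algorithm; the XOR task, learning rates, and unit limit are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: XOR, which a network with no hidden units cannot solve.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

def with_bias(F):
    return np.hstack([F, np.ones((len(F), 1))])

def fit_output(F, y):
    """Least-squares output weights on the current feature matrix."""
    Fb = with_bias(F)
    w, *_ = np.linalg.lstsq(Fb, y, rcond=None)
    return Fb @ w

def train_candidate(F, resid, steps=2000, lr=0.5):
    """Gradient ascent on |covariance| between a tanh unit and the residual."""
    Fb = with_bias(F)
    w = rng.normal(scale=0.5, size=Fb.shape[1])
    r = resid - resid.mean()
    for _ in range(steps):
        h = np.tanh(Fb @ w)
        cov = (h - h.mean()) @ r
        w += lr * np.sign(cov) * (Fb.T @ ((1 - h**2) * r)) / len(F)
    return np.tanh(Fb @ w)

F = X.copy()
pred = fit_output(F, y)
for _ in range(3):                      # grow at most three hidden units
    resid = y - pred
    if np.max(np.abs(resid)) < 0.05:    # good enough: stop growing
        break
    h = train_candidate(F, resid)       # train, then freeze, the candidate
    F = np.hstack([F, h[:, None]])      # its output becomes a new input feature
    pred = fit_output(F, y)
```

Starting from the simplest topology (a direct input-output mapping), the network adds units only as the residual error demands, which is the dynamic-growth behaviour the abstract describes.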
One of the widely acknowledged drawbacks of flexible statistical models is that the fitted models are often extremely difficult to interpret. However, if flexible models are constrained to be additive, the fitted models are much easier to interpret, as each input can be considered independently. The problem with additive models is that they cannot provide an accurate model if the phenomenon being modeled is not additive. This paper shows that a tradeoff between accuracy and additivity can be implemented easily in Gaussian process models, which are a type of flexible model closely related to feedforward neural networks. One can fit a series of Gaussian process models that begins with a completely flexible model and proceeds through models constrained to be progressively more additive, and thus progressively more interpretable. Observing how the degree of non-additivity and the test error change as the models become more additive gives insight into the importance of interactions in a particular model. Fitted models in the series can also be interpreted graphically with a technique for visualizing the effects of inputs in nonadditive models, adapted from plots for generalized additive models. This visualization technique shows the overall effects of different inputs and also shows which inputs are involved in interactions and how strong those interactions are.
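To make the tradeoff concrete, here is a minimal sketch (not the authors' implementation) contrasting the two ends of such a series: a fully flexible GP whose kernel couples all inputs, and a GP whose kernel is a sum of one-dimensional kernels and is therefore constrained to be additive. The toy target, kernel length-scale, and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, scale=1.0):
    """Squared-exponential kernel over full input vectors: fully flexible."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * scale**2))

def additive_rbf(a, b, scale=1.0):
    """Sum of one-dimensional RBF kernels: constrains the GP to be additive."""
    return sum(rbf(a[:, [i]], b[:, [i]], scale) for i in range(a.shape[1]))

def gp_predict(kernel, Xtr, ytr, Xte, noise=1e-2):
    """Posterior mean of GP regression with the given kernel."""
    K = kernel(Xtr, Xtr) + noise * np.eye(len(Xtr))
    return kernel(Xte, Xtr) @ np.linalg.solve(K, ytr)

# Toy target with a genuine interaction term x0 * x1.
f = lambda X: np.sin(X[:, 0]) + np.cos(X[:, 1]) + X[:, 0] * X[:, 1]
Xtr = rng.uniform(-2, 2, size=(80, 2))
Xte = rng.uniform(-2, 2, size=(200, 2))
ytr = f(Xtr) + 0.05 * rng.normal(size=80)

err_full = np.mean((gp_predict(rbf, Xtr, ytr, Xte) - f(Xte)) ** 2)
err_add = np.mean((gp_predict(additive_rbf, Xtr, ytr, Xte) - f(Xte)) ** 2)
# The gap between the two test errors reflects the importance of interactions.
```

Because the toy target contains an interaction, the additive model's higher test error exposes exactly the kind of non-additivity the paper's model series is designed to reveal; intermediate models in the series would interpolate between these two kernels.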
Researchers in analog computation theory have shown that a recurrent neural network (RNN) can be built to simulate a Turing machine (Pollack, 1987b; Siegelmann & Sontag, 1995). Recently, we showed that it is possible to train RNNs that implement some aspects of analog computation theory: namely, a network can develop trajectories that count symbols (Wiles & Elman, 1995). But what are the implications for psychological models of sequence processing based on RNNs? As a first step toward answering this question, we investigate an RNN in a psycholinguistically motivated task: predict the next letter in a simple deterministic context-free language with one level of center-embedding. We demonstrate how the network develops a simple coordination between trajectories that enables it to perform limited counting and, in some cases, to generalize to longer strings. We geometrically identify and analyze several properties relevant to this task, including the information loss that results from approaching attractors, the divergence in phase space that is used to split states, and the difficulty of learning temporal dependencies when the input-output probabilities overlap for different input symbols.
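The counting-by-trajectory mechanism can be caricatured by hand. The sketch below is hand-wired rather than trained, and it recognises the language a^n b^n rather than performing the paper's letter-prediction task, but it illustrates the same idea: the state contracts toward an attractor on one symbol, expands away on the other, and balance is recovered when the trajectory returns to its start. In finite precision, repeated contraction eventually loses the count, an analogue of the information loss near attractors analyzed in the paper:

```python
def accepts_anbn(s, tol=1e-6):
    """Accept strings of the form a^n b^n (n >= 1) over the alphabet {a, b}.
    Each 'a' halves the state (contraction toward the attractor at 0);
    each 'b' doubles it (expansion back out). The string balances exactly
    when the state returns to its starting value."""
    x = 1.0
    seen_b = False
    for ch in s:
        if ch == 'a':
            if seen_b:          # 'a' after 'b' breaks the a^n b^n shape
                return False
            x *= 0.5
        else:
            seen_b = True
            x *= 2.0
            if x > 1.0:         # more b's than a's
                return False
    return seen_b and abs(x - 1.0) < tol

print(accepts_anbn('aaabbb'))   # balanced: True
print(accepts_anbn('aabbb'))    # unbalanced: False
```

For very large n the repeated halving underflows to zero, after which no amount of doubling recovers the count: a concrete instance of how approaching an attractor destroys the information needed to generalize to longer strings.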
Two methods of obtaining structured information from a trained feed-forward neural network are discussed. The first of these methods extracts structured representations from the hidden layer and tests these against a set of hypotheses. The second method uses a rule-based system based on multiple-valued logic that can be trained using gradient descent.
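In the spirit of the second method, a rule built from graded (multiple-valued) truth values can itself be trained by gradient descent. The sketch below is an illustrative stand-in, not the paper's system: it uses sigmoid literals combined with a product t-norm as a soft conjunction, a hand-picked target concept, and a numerical gradient for brevity:

```python
import numpy as np

rng = np.random.default_rng(4)

# Target concept on binary inputs: x0 AND (NOT x1).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0., 0., 1., 0.])

def rule(params, X):
    """A differentiable rule: soft AND (product t-norm) of sigmoid literals."""
    a0, b0, a1, b1 = params
    lit0 = 1 / (1 + np.exp(-(a0 * X[:, 0] + b0)))
    lit1 = 1 / (1 + np.exp(-(a1 * X[:, 1] + b1)))
    return lit0 * lit1

def loss(params):
    return np.mean((rule(params, X) - y) ** 2)

def num_grad(f, p, eps=1e-5):
    """Central-difference gradient; analytic gradients omitted for brevity."""
    g = np.zeros_like(p)
    for i in range(len(p)):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2 * eps)
    return g

params = rng.normal(scale=0.5, size=4)   # a0, b0, a1, b1
for _ in range(10000):                   # plain gradient descent
    params -= 1.0 * num_grad(loss, params)

# After training, the literals sharpen so that lit0 tracks x0 and
# lit1 tracks NOT x1, and thresholding the rule recovers the concept.
```

Because the trained parameters belong to interpretable literals rather than opaque hidden units, the fitted rule can be read off directly, which is the structured-information payoff the abstract describes.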
The neural network rule extraction problem aims at obtaining rules from an arbitrarily trained artificial neural network. Recently there have been several approaches to rule extraction. Many of them implement a priori knowledge of the data or of rule requirements into the neural network before the rules are extracted. Although this may simplify the final phase of acquiring the rules from a particular type of neural network, it limits the methodology's general-purpose use. This article approaches the neural network rule extraction problem in its essential and general form. Preference is given to multilayer perceptron networks (MLP networks) because of their universal approximation capabilities. The article establishes general theoretical grounds for rule extraction from trained artificial neural networks and then focuses on the problem of crisp rule extraction. The problem of crisp rule extraction from trained MLP networks is first approached on a theoretical level. The presented theoretical results state conditions that guarantee equivalence between classification by an MLP network and a crisp logical formalism. Based on these results, an algorithm for crisp rule extraction, independent of the training strategy, is proposed. The rule extraction algorithm can be used even in cases where the theoretical conditions are not strictly satisfied, by offering an approximate classification. The introduced rule extraction algorithm is demonstrated experimentally.
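For networks over a small binary input space, the connection between MLP classification and crisp logic can be illustrated directly: enumerate the inputs and read the classification off as a DNF formula. The sketch below is a brute-force toy stand-in, not the proposed algorithm, and the MLP weights are hand-set for illustration (they happen to compute XOR):

```python
import numpy as np
from itertools import product

# Hand-set weights (an illustrative assumption): a tiny MLP computing XOR.
W1 = np.array([[5.0, 5.0], [-5.0, -5.0]]); b1 = np.array([-2.5, 7.5])
W2 = np.array([5.0, 5.0]);                 b2 = -7.5

def mlp(x):
    """Classify a binary input vector with the two-layer tanh network."""
    h = np.tanh(W1 @ x + b1)
    return np.tanh(W2 @ h + b2) > 0

def extract_dnf(net, n_inputs):
    """Enumerate the binary input space and express the network's
    classification as a crisp DNF rule: one conjunction per positively
    classified input vector."""
    terms = []
    for bits in product([0, 1], repeat=n_inputs):
        if net(np.array(bits, float)):
            lits = [f"x{i}" if b else f"not x{i}" for i, b in enumerate(bits)]
            terms.append("(" + " and ".join(lits) + ")")
    return " or ".join(terms)

print(extract_dnf(mlp, 2))   # a crisp formula equivalent to the network
```

When the equivalence conditions hold, the extracted formula classifies exactly as the network does on every input; the paper's contribution is making this correspondence precise, and approximate, for trained MLPs without exhaustive enumeration being the point.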
Artificial Neural Networks (ANNs) are able, in general and in principle, to learn complex tasks. Interpretation of the models induced by ANNs, however, is often extremely difficult due to their non-linear and non-symbolic nature. To enable better interpretation of the way knowledge is represented in ANNs, we present BP-SOM, a neural network architecture and learning algorithm. BP-SOM is a combination of a multi-layered feed-forward network (MFN) trained with the back-propagation learning rule (BP) and Kohonen's self-organising maps (SOMs). The involvement of the SOM in learning leads to highly structured knowledge representations both at the hidden layer and on the SOMs. We focus on a particular phenomenon within trained BP-SOM networks, viz. that the SOM part acts as an organiser of the learning material into instance subsets that tend to be homogeneous with respect to both class labelling and subsets of attribute values. We show that the structured knowledge representation can either be exploited directly for rule extraction, or be used to explain a generic type of checksum solution found by the network for learning M-of-N tasks.
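The SOM half of the architecture can be sketched on its own: a small map trained on stand-in "hidden activations" organises them into homogeneous subsets, which is the phenomenon exploited for rule extraction. Everything below (the synthetic activations, map size, and learning schedule) is an illustrative assumption, not the BP-SOM algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for hidden-layer activation vectors: two well-separated clusters,
# as if the MFN had formed two kinds of internal representation.
acts = np.vstack([rng.normal(0.2, 0.05, size=(50, 2)),
                  rng.normal(0.8, 0.05, size=(50, 2))])

units = rng.uniform(size=(4, 2))         # a 1-D map of 4 units in activation space
for t in range(100):
    lr = 0.1 * (100 - t) / 100           # decaying learning rate
    for x in rng.permutation(acts):
        bmu = np.argmin(np.linalg.norm(units - x, axis=1))  # best-matching unit
        for j in range(len(units)):
            h = np.exp(-((j - bmu) ** 2) / 2.0)             # map neighbourhood
            units[j] += lr * h * (x - units[j])

# Assign each activation to its best-matching unit: in BP-SOM these subsets
# tend to be homogeneous in class label and attribute values.
labels = np.argmin(np.linalg.norm(acts[:, None, :] - units[None], axis=2), axis=1)
```

In the full architecture the SOM also feeds back into back-propagation learning; here only the organising behaviour is shown, since that is what the abstract identifies as the source of the structured representations.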
We propose GLLL2, a hybrid architecture of a global and a local learning module, which learn default and exceptional knowledge, respectively, from noisy examples. The global learning module, a feedforward neural network, captures global trends gradually, while the local learning module stores local exceptions quickly. The latter module distinguishes noise from exceptions and learns only the exceptions, which makes GLLL2 noise-tolerant. Experimental results show the process by which training examples are formed into default and exceptional knowledge, and demonstrate that the predictive accuracy, space efficiency, and training efficiency of GLLL2 are higher than those of each individual module.
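The division of labour can be sketched with stand-in modules. Below, a majority-class predictor plays the role of the global module and a recurrence-filtered exception table plays the local one; the two-occurrence threshold and all names are illustrative assumptions, not part of GLLL2:

```python
from collections import Counter

class GlobalLocalLearner:
    """Sketch of the global/local idea (not GLLL2 itself): a majority-class
    'global' model plus an exception store that admits an input only after
    its non-default label recurs, so one-off noisy labels never become
    exceptions."""

    def __init__(self):
        self.class_counts = Counter()   # global trend: overall class frequencies
        self.candidates = Counter()     # non-default (x, y) pairs seen so far
        self.exceptions = {}            # x -> y, confirmed exceptions

    def default(self):
        return self.class_counts.most_common(1)[0][0] if self.class_counts else None

    def fit_one(self, x, y):
        self.class_counts[y] += 1
        if y != self.default():
            self.candidates[(x, y)] += 1
            if self.candidates[(x, y)] >= 2:   # recurrence confirms an exception
                self.exceptions[x] = y

    def predict(self, x):
        return self.exceptions.get(x, self.default())

m = GlobalLocalLearner()
data = [("a", 0), ("b", 0), ("c", 0), ("d", 1),    # 'd'=1 seen once: noise
        ("a", 0), ("e", 0), ("f", 1), ("f", 1)]    # 'f'=1 seen twice: exception
for x, y in data:
    m.fit_one(x, y)
```

After training, `m.predict("d")` falls back to the default class (the single noisy label was filtered out), while `m.predict("f")` returns the confirmed exception, mirroring the noise-tolerance mechanism the abstract attributes to the local module.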
The observed features of a given phenomenon are not all equally informative: some may be noisy, others correlated or irrelevant. The purpose of feature selection is to select a set of features pertinent to a given task. This is a complex process, but it is an important issue in many fields. In neural networks, feature selection has been studied for the last ten years, using conventional and original methods. This paper is a review of neural network approaches to feature selection. We first briefly introduce baseline statistical methods used in regression and classification. We then describe families of methods which have been developed specifically for neural networks. Representative methods are then compared on different test problems.
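As one concrete baseline of the kind such a review compares, permutation importance scores a feature by how much test error grows when its column is shuffled, breaking its relation to the target. The sketch below uses a linear least-squares fit as a stand-in for a trained network; the synthetic data and the absence of a selection threshold are illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: the target depends on features 0 and 1; feature 2 is pure noise.
X = rng.normal(size=(300, 3))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=300)

# "Model": ordinary least squares as a stand-in for a trained network.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda X: X @ w

def permutation_importance(predict, X, y):
    """Relevance of each feature = increase in MSE when that feature's
    column is randomly permuted."""
    base = np.mean((predict(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        scores.append(np.mean((predict(Xp) - y) ** 2) - base)
    return np.array(scores)

scores = permutation_importance(predict, X, y)
# The irrelevant feature should receive a near-zero score and rank last.
```

Because the score only requires predictions, the same procedure applies unchanged to any trained neural network, which is why model-agnostic baselines like this appear alongside the network-specific methods the review surveys.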