Abstract
The multilayer perceptron, a type of neural network, is suited to modeling complex phenomena when data are noisy and limited. In spatial interaction modeling, Openshaw (1993) and Fischer and Gopal (1994) showed that perceptrons outperform classical models. Openshaw wrote, “If the dependent variable is binomial or Poisson then the net will deduce this from the training data, there is no need to be explicit”. In the results of Fischer and Gopal (1994), however, the residual patterns of the perceptrons and of a log-linear gravity model were similar. They log-transformed the output data and trained the networks with the least-squares criterion, as is usual for perceptrons. If each perceptron output is regarded as a probabilistic expectation, then both their trained perceptrons and the log-linear gravity model implicitly assume a log-normal error distribution. The similar residual patterns (under-prediction of large flows) could therefore stem from this shared assumption about the error distribution. To overcome this well-known weakness of log-normal models, Poisson regression (Flowerdew and Aitkin, 1982) can be used.
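The effect of the error assumption can be illustrated with a deliberately minimal sketch, assuming a model that predicts a single constant flow. Under least squares on log-transformed flows the best constant is the geometric mean, whereas under Poisson maximum likelihood it is the arithmetic mean. The flow values below are hypothetical and are not taken from the migration data.

```python
import numpy as np

# Hypothetical flow counts spanning several orders of magnitude
# (illustrative only, not data from the paper).
flows = np.array([1.0, 10.0, 100.0, 1000.0])

# Least squares on log(flow): the optimal constant in log space is the
# mean of the logs, so the back-transformed prediction is the geometric mean.
log_ls_prediction = np.exp(np.mean(np.log(flows)))

# Poisson maximum likelihood: the optimal constant is the arithmetic mean.
poisson_prediction = np.mean(flows)

print(log_ls_prediction, poisson_prediction)
```

Because the geometric mean never exceeds the arithmetic mean, the log-space fit falls well short of the largest flow and of the total flow, which is consistent with the under-prediction of large flows described above.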
In this paper, I specify the PR-perceptron, a special type of two-layer perceptron (Fig. 1). Its features are as follows. 1) With sigmoid hidden units, the PR-perceptron can approximate any continuous mapping. 2) An exponential output unit produces a positive real number with no upper limit, and 3) it also gives the PR-perceptron a formal similarity to Poisson regression models. 4) Its training criterion is maximum likelihood under the Poisson distribution. The PR-perceptron thus shares not only the parameter estimation procedure but also the theory of the underlying spatial interaction process with Poisson regression, the family of maximum-entropy models, and discrete choice models, e.g. the multinomial logit model.
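The four features listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names, the parameter names (`W1`, `b1`, `w2`, `b2`), and the array shapes are all hypothetical, and the constant term log(y!) is dropped from the likelihood because it does not depend on the parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, w2, b2):
    """Predicted flows mu (always positive) and hidden activations H."""
    H = sigmoid(X @ W1 + b1)      # sigmoid hidden units, shape (n, h)
    mu = np.exp(H @ w2 + b2)      # exponential output unit, shape (n,)
    return mu, H

def poisson_nll(y, mu):
    """Negative Poisson log-likelihood without the log(y!) constant."""
    return np.sum(mu - y * np.log(mu))

def gradients(X, y, W1, b1, w2, b2):
    """Gradients of poisson_nll with respect to all parameters."""
    mu, H = forward(X, W1, b1, w2, b2)
    d_out = mu - y                              # derivative w.r.t. log(mu)
    g_w2 = H.T @ d_out
    g_b2 = d_out.sum()
    d_hid = np.outer(d_out, w2) * H * (1.0 - H)  # back through the sigmoid
    g_W1 = X.T @ d_hid
    g_b1 = d_hid.sum(axis=0)
    return g_W1, g_b1, g_w2, g_b2
```

With the hidden layer removed, so that the output is exp(X @ w + b), the same loss reduces to that of an ordinary Poisson regression with a log link, which is the formal similarity noted in feature 3. The gradients can be passed to any gradient-based optimizer; the choice of optimizer here is left open.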
I applied the PR-perceptron to 1990 inter-prefectural migration data for Japan and compared the results with those of Poisson regression models (unconstrained, singly constrained, and doubly constrained). The following results were obtained. 1) The PR-perceptron, which is unconstrained, showed the same level of goodness-of-fit as the singly constrained Poisson regression models (Tables 1 and 2). The PR-perceptron remained superior even when the trade-off between the number of parameters and the generality of the training results was taken into account. 2) The residual distribution of the PR-perceptron resembled that of the unconstrained Poisson regression model (Fig. 5), and the under-prediction of large flows seen in log-linear models was not observed. This suggests that the assumed error distribution affects the outcomes of perceptrons. 3) The contents of the trained PR-perceptron were visualized (Figs. 6 and 7). This visualization makes it possible to a) verify assumptions about the form of utility functions, e.g. the distance-decay function, and b) explore non-linear relations and interdependencies among variables. In this case study, I found a distance-decay effect intermediate between the power and exponential functions, and non-monotonic, non-linear relations with the characteristics of origins and destinations.
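One way to judge whether a fitted distance-decay effect is closer to a power or an exponential function is to examine the slope of log(flow) against log(distance). A power function gives a constant slope, while an exponential function gives a slope whose magnitude grows in proportion to distance. The sketch below is an assumption about how such a check could be coded and is not necessarily the visualization method behind Figs. 6 and 7. The helper `log_log_slope` and the two reference functions are hypothetical; in practice `predict` would be the trained model evaluated along a sweep of distances with the other inputs held fixed.

```python
import numpy as np

def log_log_slope(predict, distances):
    """Numerical slope of log(flow) against log(distance) along a sweep."""
    d = np.asarray(distances, dtype=float)
    return np.gradient(np.log(predict(d)), np.log(d))

# Hypothetical reference curves with illustrative parameters.
def power_decay(d):
    return d ** -2.0

def exponential_decay(d):
    return np.exp(-0.01 * d)

distances = np.linspace(50.0, 1000.0, 20)
slope_power = log_log_slope(power_decay, distances)        # constant slope
slope_exp = log_log_slope(exponential_decay, distances)    # steepens with distance
```

A slope that steepens with distance, but more slowly than in the exponential case, would correspond to an intermediate form of the kind reported above.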