Local governments enact ordinances and regulations (hereinafter collectively referred to as “statutes”). These are structured documents with a chapter > article > paragraph > item hierarchy. Because each local government enacts statutes independently in its own council, similar statutes on the same matter are often found across different local governments (e.g. penalties for obscene behavior). In legal education, legal research, and legal work at local governments and businesses, similar statutes are compared to clarify their differences. For practical comparison of laws, article correspondence tables are normally created, in which pairs of corresponding articles are aligned horizontally or vertically. The objective of our research is to generate automatically, by computer, the article correspondence tables that are currently created manually. To this end, we focused on the relationships between articles in article correspondence tables and modeled them as directed bipartite graphs with each article as a node. We examined 96 methods based on the vector space model, longest common subsequence, and sequence alignment to identify effective methods for finding corresponding articles. We automatically generated article correspondence tables for 22 statutes in total (11 each from Ehime and Kagawa Prefectures) and calculated their accuracy against article correspondence tables created by legal scholars. The vector space model-based method achieved the highest accuracy, 85%, with nouns, adverbs, adjectives, verbs and attributives as its target words. The sequence alignment-based method reached an accuracy of up to 81%, and the longest common subsequence method reached 75%.
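The vector space model approach to finding corresponding articles can be illustrated with a minimal sketch. It is not the implementation evaluated in this research: whitespace tokenization stands in for morphological analysis and part-of-speech filtering, raw term frequency is assumed as the weighting, and the names `tf_vector`, `cosine` and `match_articles` are hypothetical. Each article of one statute is linked to its most similar article in the other, which yields the directed edges of the bipartite graph.

```python
import math
from collections import Counter


def tf_vector(text):
    # Assumption: whitespace tokens replace the morphological analysis and
    # part-of-speech filtering that real statute text would need.
    return Counter(text.lower().split())


def cosine(u, v):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_articles(source, target):
    """Link every source article to its most similar target article.

    Both arguments map an article id to its text. The result maps each
    source id to (best_target_id, similarity), i.e. one directed edge
    per source article.
    """
    target_vecs = {tid: tf_vector(text) for tid, text in target.items()}
    edges = {}
    for sid, text in source.items():
        vec = tf_vector(text)
        scores = {tid: cosine(vec, tvec) for tid, tvec in target_vecs.items()}
        best = max(scores, key=scores.get)
        edges[sid] = (best, scores[best])
    return edges
```

Running `match_articles` in both directions (A to B, then B to A) gives the two edge sets of a directed bipartite graph, from which a correspondence table could be laid out.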
Because the computer-generated article correspondence tables are checked by legal scholars in practice, each relationship between articles needs to carry a degree of reliability. To meet this requirement, we evaluated two measures for the three methods using receiver operating characteristic (ROC) curves. The results show that the ratio between the selected relation and the runner-up gives an AUC of 0.80 for the longest common subsequence method. In this research, the problem was defined by focusing on the correspondence relationships between articles in article correspondence tables. For practical purposes, however, it is necessary to clarify not only which articles correspond but also which words differ between corresponding articles. Since the vector space model cannot reveal such differences, sequence alignment, which can identify differing text, is necessary. Composite methods that combine the two will therefore be required in the future.
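The reliability measure can be sketched as follows. This is an illustration under assumptions, not the procedure used in this research: the ratio is taken to be the best similarity score divided by the runner-up score, the AUC is computed with the rank-based (pairwise comparison) formulation, and the names `lcs_length`, `top_ratio` and `auc` are hypothetical.

```python
def lcs_length(a, b):
    # Length of the longest common subsequence, by dynamic programming
    # over two sequences (strings or token lists).
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def top_ratio(scores):
    # Assumed reliability measure: best score divided by the runner-up.
    # A larger ratio suggests a less ambiguous choice.
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return float("inf")
    if ranked[1] == 0:
        return float("inf") if ranked[0] > 0 else 1.0
    return ranked[0] / ranked[1]


def auc(confidences, labels):
    # Area under the ROC curve: the probability that a correct relation
    # receives a higher confidence than an incorrect one (ties count 0.5).
    pos = [c for c, ok in zip(confidences, labels) if ok]
    neg = [c for c, ok in zip(confidences, labels) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect relation")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For each article, `top_ratio` would be applied to its scores against all candidate articles, with `labels` marking whether the selected relation agrees with the scholar-made table. An AUC near 1.0 would mean the ratio separates correct from incorrect relations well, while 0.5 would mean it is uninformative.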