This paper describes the impact of the WWW on decision support research, focusing on decision support systems that acquire information from the Web. The rich, up-to-date information on the Web will make it easier for people to construct decision problems systematically. To do so, however, information on the Web must be gathered, integrated, and then presented to users interactively. This paper focuses in particular on the relationship among the Web, data, and users from the viewpoint of human decision processes and decision support systems.
This paper proposes and evaluates a method for extracting personal web pages from a large number of unclassified web pages. The method can be used for content filtering in reputation searches. To extract personal pages from unclassified pages, the method focuses on four kinds of text features that appear on personal pages. It measures these features quantitatively for each page and divides the pages into several groups by k-means clustering of the measurements. From these groups, the method identifies those that consist of personal web pages. We evaluated the search performance of the method by measuring precision. Experimental results show that the average performance of the method is 2.1 times higher than that of a keyword-based search engine.
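The clustering step described above can be illustrated with a minimal sketch. The abstract does not name the four text features, so the feature vectors below are hypothetical values, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic. It is not the authors' implementation.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means. The first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids

# Hypothetical 4-dimensional feature vectors, one per page.
pages = [
    (0.9, 0.8, 0.7, 0.9),
    (0.1, 0.2, 0.1, 0.0),
    (0.8, 0.9, 0.8, 0.7),
    (0.2, 0.1, 0.0, 0.1),
]
labels, centroids = kmeans(pages, k=2)
print(labels)
```

How the groups consisting of personal pages are then identified is not specified in the abstract, so that step is left out of the sketch.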
We have developed a novel application called My Portal Viewer (MPV), which integrates articles collected from multiple news sites and presents them through a familiar interface, such as a page the user has viewed often. MPV dynamically determines keywords the user is likely to be interested in, based on the articles the user has read so far, and dynamically creates categories from these interest words. Users can therefore easily infer which categories interest them from the category names and read the articles of interest. MPV and many similar integration systems, however, share a problem: a criterion based only on the frequency and co-occurrence of interest words cannot separate the articles that interest a user from the rest. As a result, users must read too many articles in each category. We therefore propose a new method that further selects articles within each category by using the user's impressions of articles. The improved MPV, called MPV Plus, uses the proposed method to select and recommend more desirable articles. This paper presents the design concept and processing flow of MPV Plus and reports how effectively it performed in evaluation experiments.
Search engines are the main means of Web page retrieval, but a frequently noted problem is that the pages a user needs do not appear near the top of the results. One reason is that results are selected solely because they contain the search keywords. Even for the same keywords, different kinds of pages tend to be mixed in the results because of the polysemy or ambiguity of words. To improve retrieval results, some studies classify the results into groups according to page content using the vector space model. The vector space model measures the similarity between pages from the frequency of word occurrences, but it has two problems. The first is that the cost of calculating similarity is too high, because every word that appears even once in any page is used. This cost should be reduced, because a quick response matters in Web retrieval. In our system, we reduce the computational cost by selecting words using fuzzy reasoning. The second is that it is difficult to show the user a name or title for each group. Because the method only calculates the similarity between pages, it cannot choose words that represent a group. For the user's convenience, a technique for creating group names or titles automatically is desirable. In our method, we create group names automatically by using the frequency of word co-occurrence. In this paper, we propose a system that classifies retrieval results into groups according to page content by combining fuzzy reasoning for selecting index words, word co-occurrence frequency for creating group indexes, and the vector space model for classifying pages. Our experiments confirmed two points. First, 200 is a suitable number of selected words with respect to calculation cost and classification accuracy. Second, the proposed system classifies retrieved pages in a way similar to human judgment.
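A minimal sketch of the vector space model with a restricted vocabulary is given below. The fuzzy-reasoning word selection is not described in enough detail to reproduce, so the sketch substitutes a plain "most frequent words" selection as a stand-in. The helper names and the sample texts are illustrative assumptions.

```python
from collections import Counter
from math import sqrt

def top_words(texts, n):
    """Stand-in for index-word selection: the n most frequent words overall."""
    total = Counter()
    for text in texts:
        total.update(text.lower().split())
    return [word for word, _ in total.most_common(n)]

def term_vector(text, vocabulary):
    """Term-frequency vector of a text over the selected vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical result snippets for the ambiguous query "jaguar".
docs = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar cat jungle habitat",
]
vocabulary = top_words(docs, 200)  # 200 is the word count reported above
vectors = [term_vector(d, vocabulary) for d in docs]
sim_cars = cosine(vectors[0], vectors[1])
sim_mixed = cosine(vectors[0], vectors[2])
print(sim_cars, sim_mixed)
```

Capping the vocabulary bounds the length of every vector, which is where the saving in similarity computation comes from.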
We have studied ranking methods for retrieving web pages easily and precisely, based on the mutual evaluation method we proposed earlier. A problem with that method is that the ranking precision of the retrieved pages remains low when fewer than 15% of the pages in the retrieved set match the requirement. In this paper, we improve the precision in that case by adding to our earlier method a heuristic function that evaluates the strength of the match between a page and a requirement, combined with a function that summarizes the text in each page. The improvement is validated by an objective experimental test that was proposed and developed in the third NTCIR workshop. The results show that, in terms of DCG, the proposed ranking method produces better rankings of retrieved web pages than those reported for other ranking systems using the same test collection provided by NTCIR.
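For readers unfamiliar with the measure, DCG (discounted cumulative gain) rewards rankings that place relevant pages early. The sketch below uses one common formulation, in which the gain at rank i is divided by log2(i + 1). The exact gain values and discount used in the NTCIR evaluation may differ, and the relevance grades here are hypothetical.

```python
from math import log2

def dcg(gains):
    """Discounted cumulative gain: sum of gain_i / log2(i + 1), ranks from 1."""
    return sum(g / log2(rank + 1) for rank, g in enumerate(gains, start=1))

# Hypothetical graded relevance of the top four results from two rankers.
good_ranking = [3, 2, 0, 1]   # most relevant pages ranked first
poor_ranking = [0, 1, 2, 3]   # same pages in the reverse order
print(dcg(good_ranking), dcg(poor_ranking))
```

Both lists contain the same pages, so the difference in score comes only from the order in which they are ranked.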
As an attempt to discover Web users with similar tastes, this paper proposes a method for discovering user communities from Web audience measurement data (Web log data). The method is based on the assumption that the terms in a URL often characterize the content of the Web page the URL points to. Complete bipartite graphs are searched for in a user-term graph obtained from the Web audience measurement data, without analyzing the content of the Web pages. Experimental results show that our method succeeds in discovering many interesting user communities. Our approach, based on graph search, which is common in Web structure mining, is also effective for Web usage mining. The terms attached to the discovered user communities can be regarded as labels of the communities, and they make manual analysis of the communities easier.
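The idea can be sketched as follows, under simplifying assumptions: URLs are split into terms on non-alphanumeric characters, a small hypothetical stop list removes boilerplate tokens, and complete bipartite subgraphs are found by brute-force enumeration of term sets. The authors' actual search procedure is not described in the abstract and is likely more efficient than this.

```python
import re
from itertools import combinations

# Hypothetical stop list of URL boilerplate tokens.
STOP = {"http", "https", "www", "com", "org", "net", "html", "htm"}

def url_terms(url):
    """Split a URL into lowercase terms, dropping short and boilerplate tokens."""
    tokens = re.split(r"[^a-z0-9]+", url.lower())
    return {t for t in tokens if len(t) > 2 and t not in STOP}

def communities(log, n_terms=2, min_users=2):
    """Return (users, terms) pairs in which every user is linked to every term."""
    user_terms = {}
    for user, url in log:
        user_terms.setdefault(user, set()).update(url_terms(url))
    all_terms = sorted(set().union(*user_terms.values()))
    found = []
    for terms in combinations(all_terms, n_terms):
        users = sorted(u for u, ts in user_terms.items() if set(terms) <= ts)
        if len(users) >= min_users:
            found.append((users, terms))
    return found

# Hypothetical access log of (user, URL) records.
log = [
    ("u1", "http://www.example.com/soccer/worldcup.html"),
    ("u2", "http://news.example.org/worldcup/soccer"),
    ("u3", "http://www.example.com/cooking/pasta"),
]
found = communities(log)
for users, terms in found:
    print(users, terms)
```

Each returned pair is a complete bipartite subgraph of the user-term graph, and its term set serves as the community label.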
This paper discusses a conference support system that integrates Web mining, a social networking service, and real-world interaction using IC cards. The system was operated at JSAI2003, JSAI2004, and JSAI2005. We focus on and analyze the user log data. Three kinds of user logs are obtained: relations among participants measured by Web mining, relations among participants registered by the users themselves, and interaction data from information kiosks. By comparing the three kinds of data, we can see how Web information promotes social networking and how social networking promotes real-world interaction. This insight is a useful step toward a foundation for designing real-world interaction systems.
The primary mission of an NPO (Nonprofit Organization) is public benefit, not profitability. However, as the number of staff members increases, sharing the mission becomes a challenge, and staff members' voluntarism weakens. To achieve successful results, strong leadership behaviors are needed in managing staff. In this paper, leadership behaviors are revealed by integrating analyses of a questionnaire survey and mailing list archives. The questionnaire survey was administered to 97 staff members of dot-jp, an NPO in Japan. The mailing list archives were analyzed by applying the IDM (Influence Diffusion Model), which measures the influence relationships among staff members. The results indicate that, to build trust relationships, a leader should take in staff members' messages as well as send out messages. Otherwise, the leader becomes self-righteous and causes members to complain about the organization.
This paper describes the Parent-Child Agents information presentation model in our system, Interactive e-Hon, which helps children understand difficult content. The system works by transforming the text of electronic content into an easily understandable “storybook world” with animations and dialogues. It then explains the content using the Parent-Child Agents information presentation model. In this system, easy-to-understand content is created by a semantic tag generator based on semantic information processing, an animation generator that uses an animation archive and animation tables, a dialogue generator that uses semantic tag information, and concept explanation by metaphor using world-view databases. Based on experimental results, this paper shows that the Parent-Child Agents presentation model does not disturb users' reception and understanding of the content.
The goal of the research presented in this paper is to support users in exploring a large amount of data for the purpose of decision-making and problem-solving. Our approach is to design human-computer interaction as a natural discourse between the user who explores the data, and the system that interprets the user's query, retrieves data based on the query, and presents the result. InTREND (an Interactive Tool for Reflective Exploration through Natural Discourse) supports this type of interaction by (1) interpreting the user's query represented in a natural language, (2) composing a graph, based on the interpreted query, for retrieved data, and (3) presenting an animated graph for the retrieval results. InTREND encourages iterative exploration by maintaining the context of past interactions and uses this context to improve discourse with the user. The paper describes our research motivation, presents a natural discourse framework, and explains the InTREND system. Our user studies evaluate the context preservation mechanisms of InTREND.
We propose a multiple-document summarization system with user interaction that copes appropriately with users' varying summarization needs. Automatic document summarization is generally a technology for producing a summary of a single document. However, to better support a person's intellectual activities, producing a summary of more than one document (i.e., multiple-document summarization) becomes more important than summarizing a single document. Our system extracts keywords from the document set to be summarized and displays the k best-scoring keywords to the user on the screen. From the displayed keywords, the user selects those that reflect the user's summarization needs. In this paper, we define a user's “summarization needs” as the content in which that particular user is interested. The system then produces a summary suited to those needs by using the user-selected keywords. To evaluate our method, we participated in TSC3 of NTCIR4, an evaluation workshop for information retrieval and summarization held by the National Institute of Informatics. For this workshop, our system itself selected its 12 best-scoring keywords, and the submitted system performed well in the content evaluation of the multiple-document summarization task. We also evaluated the effectiveness of user interaction, and the experimental results show that it is effective.
We have many opportunities to read and write documents in our social activities. A document that is easy to comprehend has a document stream running throughout it, and the relationships among its consecutive segments are clear. If a document stream can be extracted and evaluated quantitatively, support systems that help us comprehend or create documents become possible. In this paper, we propose a sub-topic model that models a document stream and a criterion for evaluating a document stream quantitatively. Preliminary experiments were conducted to show the validity of the model, and the proposed criterion for document streams was evaluated through segment ordering experiments.
This paper proposes a method for analyzing textual data with time information. The method extracts events from the textual data by using a key concept dictionary, which is a kind of thesaurus. It then generates sequential event data from the extracted events based on the time information and attributes of the textual data. Finally, it extracts sequential event patterns that are consistent with constraint sub-patterns designated by an analyst. The extracted patterns support the analyst's decision making, because they can predict future events or suggest events leading to a target. This paper verifies the effect of the method by applying it to daily business reports collected by a Sales Force Automation system.
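The final step, extracting sequential patterns that satisfy an analyst's constraint, can be sketched as below. This is a brute-force illustration that assumes events have already been extracted and ordered by time. The event names, the fixed pattern length, and the support threshold are hypothetical, and the paper's own mining algorithm is not specified here.

```python
from collections import Counter
from itertools import combinations

def is_subsequence(sub, seq):
    """True if the items of sub occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in sub)

def frequent_patterns(sequences, length, min_support, constraint):
    """Count ordered sub-patterns of the given length across sequences and keep
    those that meet min_support and contain the constraint sub-pattern."""
    support = Counter()
    for seq in sequences:
        # combinations() preserves order, so each tuple is a subsequence of seq.
        # The set ensures a pattern is counted at most once per sequence.
        support.update(set(combinations(seq, length)))
    return {p: n for p, n in support.items()
            if n >= min_support and is_subsequence(constraint, p)}

# Hypothetical time-ordered event sequences, one per customer.
sequences = [
    ["inquiry", "visit", "quote", "order"],
    ["inquiry", "quote", "order"],
    ["visit", "complaint"],
]
patterns = frequent_patterns(sequences, length=2, min_support=2,
                             constraint=("order",))
print(patterns)
```

With the constraint set to the target event, the surviving patterns show which earlier events tend to lead to it.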
Syllabi are very important documents that inform people, particularly students, of the contents of curricula in detail, so techniques for retrieving them efficiently are in keen demand. Efficient retrieval requires not only high recall and precision but also presenting the results in a form that helps users select documents. In this research, we developed a new retrieval system in which search terms are expanded with their synonyms and the results are displayed as tree-like structures expressing the relationships among the semantically related terms. We have shown its usefulness through experiments.
In inquiry-based learning, it is important for learners to investigate their own interests or questions and to reconstruct information they have already obtained. We have implemented a learning environment that supports such inquiry-based learning. The environment helps learners add information obtained from web pages outside the prepared learning materials to those materials, and reconstruct the learning materials according to their own thinking. This paper describes the implementation of the environment and evaluations of its usability.
Research is currently in progress on displaying search results in groups so that search engine users can understand them easily. Classification uses fixed hierarchical category labels as category names, whereas dynamic clustering assigns category names extracted from the search results and keywords. However, these approaches do not satisfy users in terms of the following: semantic validity, where category names and categorizations are easy to understand and not redundant; pertinence, where the group of web documents in a user-selected category gives information effective for solving the user's problem; formal validity, where undesired types of pages are not included; and minimal cross-category redundancy, where needed web documents are not duplicated across categories and the target information can be found easily. Based on an analysis of the problems of conventional techniques, this paper proposes a technique for adaptive classification according to the user's selective input, with six groups of page types as candidate categories. In addition, a prototype system based on the proposed technique is evaluated by comparison with Yahoo and Vivisimo, representative open engines that provide grouping and display functions. Compared with these conventional systems, the prototype system scored up to 36.7% higher in the evaluation.