In this paper, we discuss variable and value symmetries in distributed constraint reasoning and efficient methods to represent and propagate them in distributed environments. Unlike commonly used centralised methods, which detect symmetries according to their global definition, we suggest defining them at the individual constraint level, then defining operations on those symmetries in order to propagate them through the depth-first search tree generated by efficient distributed constraint reasoning algorithms. In our algorithm, we represent constraints (or utility functions) by a list of costs: while the usual representation lists one cost per assignment, we drastically reduce the size of that list by keeping only one cost per equivalence class of assignments. In practice, for a constraint with n symmetric variables defined on a domain of n symmetric values, this approach cuts the size of the cost list from n^n to p(n) (the partition function of n), i.e., from 10^10 to 42 when n = 10. We then devised algorithms to process these sparse representations of utility functions and to propagate them, along with symmetry information, among distributed agents. We implemented this new representation of constraints and tested it with the DPOP algorithm on distributed graph colouring problems, which are rich in symmetries. Our evaluation shows that in 19% of execution instances the volume of communication is reduced tenfold, while no significant overhead appears in non-symmetric executions. These results open serious perspectives on bounding memory and communication bandwidth consumption in some subclasses of distributed constraint reasoning problems.
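The quoted reduction is easy to check numerically: the partition function p(n) can be computed with a standard coin-change-style dynamic program. This is a generic sketch to verify the abstract's figures, not the paper's implementation.

```python
def partitions(n):
    # dp[t] counts partitions of t using part sizes considered so far
    dp = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            dp[total] += dp[total - part]
    return dp[n]

print(partitions(10))   # 42, versus 10**10 possible raw assignments
```

For n = 10 this confirms the claimed drop from 10^10 cost entries to p(10) = 42 equivalence classes.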
We propose an active learning framework for sequence labeling tasks. In each iteration, a set of subsequences is selected and manually labeled, while the other parts of the sequences are left unannotated. Learning stops automatically when the training data does not change significantly between consecutive iterations. We evaluate the proposed framework on the chunking and named entity recognition data provided by CoNLL. Experimental results show that we match supervised F1 with only 6.98% and 7.01% of tokens annotated, respectively.
This paper presents a study on the use of word context features for Word Sense Disambiguation (WSD). State-of-the-art WSD systems achieve high accuracy by using resources such as dictionaries, taggers, lexical analyzers or topic modeling packages. However, these resources are either too heavy or do not have sufficient coverage for large-scale tasks such as information retrieval. The use of local context for WSD is common, but the rationale behind the formulation of features is often based on trial and error. We therefore investigate the notion of relatedness of context words to the target word (the word to be disambiguated), and propose an unsupervised method for finding the optimal weights for context words based on their distance to the target word. The key idea behind the method is that the optimal weights should maximize the similarity of two context models constructed from different context samples of the same word. Our experimental results show that the strength of the relation between two words follows approximately a power law. The resulting context models are used in Naïve Bayes classifiers for word sense disambiguation. Our evaluation on Semeval WSD tasks in both English and Japanese shows that our method can achieve state-of-the-art effectiveness even though it does not use external tools, unlike most existing methods. Its high efficiency makes it possible to use our method in large-scale applications such as information retrieval.
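A power-law weighting of context words by distance to the target can be sketched as follows. This is an illustrative model of our own, not the authors' code; the decay exponent `alpha` and the normalization are assumptions.

```python
from collections import defaultdict

def context_model(samples, alpha=1.0):
    """Build a weighted bag-of-words context model.
    samples: list of (words, target_index) pairs, where words is a
    tokenized sentence and target_index points at the target word.
    A context word at distance d contributes weight d ** -alpha,
    so closer words count more (alpha is an assumed free parameter)."""
    model = defaultdict(float)
    for words, t in samples:
        for i, w in enumerate(words):
            if i == t:
                continue
            d = abs(i - t)
            model[w] += d ** -alpha
    # normalize to a probability-like distribution for use in a
    # Naive Bayes-style classifier
    total = sum(model.values())
    return {w: v / total for w, v in model.items()}
```

Two such models built from disjoint context samples of the same word could then be compared (e.g., by cosine similarity) to tune `alpha`, in the spirit of the similarity-maximization idea described above.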
An overview of the SemEval-2 Japanese WSD task is presented. The new characteristics of our task are (1) the task uses the first balanced Japanese sense-tagged corpus, and (2) the task takes into account not only the instances that have a sense in the given set but also the instances whose sense cannot be found in the set. It is a lexical sample task, and word senses are defined according to a Japanese dictionary, the Iwanami Kokugo Jiten. This dictionary and a training corpus were distributed to participants. The number of target words was 50, with 22 nouns, 23 verbs, and 5 adjectives. Fifty instances of each target word were provided, for a total of 2,500 instances for the evaluation. Nine systems from four organizations participated in the task.
Web spamming has emerged to deceive search engines and obtain a higher ranking in search result lists, which brings more traffic and profits to web sites. A link farm is one of the major spamming techniques: it creates a large set of densely inter-linked spam pages to deceive link-based ranking algorithms that regard incoming links to a page as endorsements of it. Those link farms need to be eliminated when we are searching, analyzing and mining the Web, but they are also interesting social activities in cyberspace. Our purpose is to understand the dynamics of link farms, such as how much they are growing or shrinking, and how their topics change over time. Such information is helpful in developing new spam detection techniques and tracking spam sites to observe their topics. In particular, we are interested in where to find emerging spam sites, which is useful for updating spam classifiers. In this paper, we study the overall size/topic distribution and evolution of link farms in large-scale Japanese web archives spanning three years and containing four million hosts and 83 million links. As far as we know, the overall characteristics of link farms in a time-series of web snapshots of this scale have never been explored. We propose a method for extracting link farms and investigate their size distribution and topics. We observe the evolution of link farms from the perspective of size growth and change in topic distribution. We recursively decomposed host graphs into link farms and found that from 4% to 7% of hosts were members of link farms. This implies we can remove quite a number of spam hosts without content analysis. We also found that the two dominant topics, “Adult” and “Travel”, accounted for over 60% of spam hosts in link farms. The size evolution of link farms showed that many link farms were maintained for years, but most of them did not grow.
The distribution of topics in link farms did not change significantly, but the hosts and keywords related to each topic changed dynamically. These results suggest that we can observe topic changes in each link farm, but we cannot efficiently find emerging spam sites by monitoring link farms. This implies that, to detect newly created spam sites, monitoring current link farms is not enough; detecting sites that generate links to spam sites would be an effective approach.
Historically, various different notions of trust can be found, each addressing particular aspects of ICT systems, e.g., trust in electronic commerce systems based on reputation and recommendation, or trust in public key infrastructures. While these notions support the understanding of trust establishment and degrees of trustworthiness in their respective application domains, they are insufficient when addressing the more general notion of trust needed when reasoning about security in ICT systems. Furthermore, their purpose is not to elaborate on the security mechanisms used to substantiate trust assumptions, and thus they do not support reasoning about security in ICT systems. In this paper, a formal notion of trust is presented that expresses trust requirements from the view of different entities involved in the system and that makes it possible to relate, in a step-by-step process, high-level security requirements to those trust assumptions that cannot be further substantiated by security mechanisms, thus supporting formal reasoning about system security properties. Integrated into the Security Modeling Framework SeMF, this formal definition of trust can support security engineering processes and formal validation and verification by enabling reasoning about security properties with respect to trust.
Physical motion features that cause a difference in the perceived rhythmic sense of a dancer were identified. We compared the rhythmical movements of Latin American and Japanese people to find such features. A 2-D motion capture system was used to measure rhythmical movement, and two motion features were identified. The first was the phase shift of the rotational angles between the hips and chest, and the second was the phase shift between the hips' rotational angle and horizontal position. A psychophysical study was conducted to determine whether these features affect the perceived rhythmic sense of a dancer. The results showed that both motion features significantly affected the perceived sense, and one of them also affected how a viewer guessed the home country or area of a dancer. Consequently, the rhythmic sense of a virtual character can be controlled easily by adding features to and removing them from the character's synthesized motion.
We propose a visible light road-to-vehicle communication system at intersections as an ITS (Intelligent Transport System) technique. In this system, communication between a vehicle and an LED traffic light is conducted using the LED traffic light as a transmitter and an on-vehicle high-speed camera as a receiver. The LEDs in the transmitter emit light at high frequency, and the emitted light is captured by the high-speed camera for communication. Here, the luminance values of the LEDs in the transmitter must be captured in consecutive frames to achieve communication. For this purpose, the transmitter must first be found and then tracked in consecutive frames while the vehicle is moving, by processing the images from the high-speed camera. In this paper, we propose new algorithms for finding and tracking the transmitter, which result in increased communication speed and data rate compared to previous methods. Experiments using appropriate images showed the effectiveness of the proposed algorithms.
We propose a fast face detection method, called cascade step search (CSS), that is based on the characteristics of a cascade classifier. The proposed method has two features. The first is a gradual classification that uses only a few layers of the cascade classifier to estimate the face-likelihood distribution. The second is an efficient search that uses this face-likelihood distribution. The search operates at intervals that are optimally changed according to the likelihood, which reduces the number of sub-windows that must be processed. The face likelihood at an image window also indicates the likelihood near that window and at the next scale. These features reduce the cost of classifications in the face detection process. Our experiments on face detection show that the proposed method is about five times faster than traditional searches while maintaining a high detection rate.
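The idea of a likelihood-driven scan interval can be sketched as follows. This is a hypothetical one-dimensional illustration of ours, not the paper's algorithm; the function names and the step formula are assumptions.

```python
def adaptive_scan(likelihood, width, base_step=8, min_step=1):
    """Return the x positions to evaluate along one image row.
    likelihood(x) in [0, 1] is a coarse face-likelihood estimate
    (e.g., from the first few cascade layers). High likelihood
    shrinks the step, so promising regions are searched densely."""
    positions = []
    x = 0
    while x < width:
        positions.append(x)
        # larger likelihood -> smaller step (denser search)
        step = max(min_step, int(base_step * (1.0 - likelihood(x))))
        x += step
    return positions
```

With a flat low likelihood the scan uses the coarse `base_step`; near a likely face it degenerates to an exhaustive pixel-by-pixel search, which matches the abstract's description of intervals changing with the likelihood.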
Accelerated by technological advances in the biomedical domain, the size of its literature has been growing very rapidly. As a consequence, it is not feasible for individual researchers to comprehend and synthesize all the information related to their interests. Therefore, it may be possible to discover hidden knowledge, or hypotheses, by linking fragments of information independently described in the literature. In fact, such hypotheses have been reported in the literature mining community, some of which have even been corroborated by experiments. This paper mainly focuses on hypothesis ranking and investigates an approach to identifying reasonable hypotheses based on semantic similarities between the events that lead to them. Our assumption is that hypotheses generated from semantically similar events are more reasonable. We developed a prototype system called Hypothesis Explorer, and conducted evaluative experiments through which the validity of our approach is demonstrated in comparison with approaches based on term frequencies, often adopted in previous work.
Evaluating the credibility of online information is important as the Web plays an increasingly major role in our lives. As it is often difficult for users to manually estimate the level of trustworthiness of encountered content, providing automatic support for assessing the credibility of online information should improve user satisfaction. In this paper, we approach the problem of credibility evaluation from the author viewpoint. In particular, we propose analyzing the temporal characteristics of bloggers' posting behavior. Our hypothesis is that precursor bloggers who persistently post content ahead of the community are often credible authors; the topics discussed by such bloggers frequently become common topics inside the community of bloggers. We propose a categorization of blogging behavior depending on the time delay and content similarity between published posts, and demonstrate several methods for calculating blogger scores.
When a camera moves during its exposure time, the captured image is degraded by the motion. Despite several decades of research, image deconvolution to restore a blurred image remains an issue, particularly in blind deconvolution cases in which the actual shape of the blur is unknown. Cepstral approaches have been used to estimate linear motion. In this paper, we propose a Point Spread Function (PSF) estimation method that works from a single blurred image. We extend the classical cepstral approaches that have been used for Uniform Linear motion PSF estimation. Focusing on Uniform Non-Linear Motion (UNLM) that proceeds in one direction and may curve weakly, we solve the PSF estimation problem as a camera path estimation problem. To solve this ill-posed problem, we derive a constraint on the behavior of the cepstra of UNLM PSFs. In the first step, we estimate several PSF candidates from the cepstrum of a blurred image. Then, we select the best PSF candidate by evaluating the candidates based on the ringing artifacts in the restored images obtained using them. The performance of the proposed method is verified using both synthetic and real images.
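For readers unfamiliar with the cepstral approach mentioned above: the (real) cepstrum is the inverse Fourier transform of the log-magnitude spectrum, and for linear motion blur it exhibits characteristic negative peaks at the blur extent. A minimal NumPy sketch of the transform itself (not the paper's candidate-selection method):

```python
import numpy as np

def cepstrum_2d(img):
    """Real 2-D cepstrum of an image: IFFT of the log-magnitude
    spectrum. For uniform linear motion blur, strong negative
    peaks appear at a distance corresponding to the blur length,
    which is what classical cepstral PSF estimators exploit."""
    spec = np.fft.fft2(img)
    log_mag = np.log(np.abs(spec) + 1e-8)  # epsilon avoids log(0)
    return np.real(np.fft.ifft2(log_mag))
```

Locating those peaks in `cepstrum_2d(blurred)` yields candidate blur lengths and directions, which the method above generalizes to weakly curved camera paths.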
Associative search is information retrieval based on the similarity between two different items of text information. This paper reports experiments on associative search over a large number of short documents containing a small set of words. We also show the extension of the set of words with a semantic relation, and investigate its effect. As a test case, experiments were performed on 49,767 professional (non-handicapped) Shogi game records, with 1,923 next-move problems for evaluation. The extension of the set of words by pairing under semantic relations, called semantic coupling, is examined to see the effect of enlarging the word space from unigrams to bigrams. Although the search results are not as precise as next-move search, we observe improvement by filtering the unigram search results with the bigram search, especially in the early phase of Shogi games. This also fits our general feeling that the bigram search detects castle patterns well.
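The unigram-then-bigram filtering described above can be sketched generically. This is a hypothetical illustration, not the paper's system: documents are token lists, "semantic coupling" is approximated here simply by adjacent-word pairs, and `top_k` is an assumed parameter.

```python
def bigram_filter(query, docs, top_k=5):
    """Retrieve candidate documents by unigram overlap with the
    query, then re-rank those candidates by shared bigrams
    (adjacent word pairs), so bigram matches refine the coarser
    unigram result."""
    q_uni = set(query)
    q_bi = set(zip(query, query[1:]))
    # stage 1: coarse unigram ranking
    candidates = sorted(range(len(docs)),
                        key=lambda i: len(q_uni & set(docs[i])),
                        reverse=True)[:top_k]
    # stage 2: re-rank candidates by bigram overlap
    candidates.sort(key=lambda i: len(q_bi & set(zip(docs[i], docs[i][1:]))),
                    reverse=True)
    return candidates
```

In the Shogi setting, the bigram stage would promote records sharing word-order-sensitive patterns (such as castle formations) that unigram overlap alone cannot distinguish.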
Empirical evidence has consistently shown that trust facilitates coordination, reduces conflicts and enhances longevity within cooperative relationships. Conditions leading to trust have been considered repeatedly in research papers. Whereas the link between reputation and trust, for example, has been extensively researched, the study of relational competence as a determinant of trust has largely been ignored. Although some academic articles note the impact of competence on trust, the study of the mode of action of relational competence in the trust-developing process is underdeveloped. Therefore, the main purpose of this paper is to analyse the relationship between relational competence and trust. To this end, a laboratory experiment was conducted. In its conclusion, the paper presents an empirically confirmed strong correlation between relational competence and trust within cooperative relationships, taking into account situational and personal factors.
Complex software-security policies are difficult to specify, understand, and update. The same is true for complex software in general, but while many tools and techniques exist for decomposing complex general software into simpler reusable modules (packages, classes, functions, aspects, etc.), few tools exist for decomposing complex security policies into simpler reusable modules. The tools that do exist for modularizing policies either encapsulate entire policies as atomic modules that cannot be decomposed or allow fine-grained policy modularization but require expertise to use correctly. This paper presents PoliSeer, a GUI-based tool designed to enable users who are not expert policy engineers to flexibly specify, visualize, modify, and enforce complex runtime policies on untrusted software. PoliSeer users rely on expert policy engineers to specify universally composable policy modules; PoliSeer users then build complex policies by composing those expert-written modules. This paper describes the design and implementation of PoliSeer and a case study in which we have used PoliSeer to specify and enforce a policy on PoliSeer itself.
The majority of recommender systems predict user preferences by relating users with similar attributes or taste. Prior research has shown that trust networks improve the accuracy of recommender systems, predominantly using algorithms devised by individual researchers. In this work, omitting any specific trust inference algorithm, we investigate how useful it might be if explicit trust relationships are used to select the best neighbors or predictors for generating accurate recommendations. We conducted a series of evaluations using data from Epinions.com, a popular collaborative reviewing system. We find that, for highly active users, using trusted sources as predictors does not give more accurate recommendations compared to the classic similarity-based collaborative filtering scheme, except in improving the precision of recommending items that are to users' liking. This cautions against the intuition that inputs from trusted sources would always be more accurate or helpful. The use of explicit trust links, however, provides a slight gain in prediction accuracy when it comes to the less active users. These findings highlight the need and potential to adapt the use of trust information for different groups of users, as well as the need to better understand trust when employing it in recommender systems. In parallel with the trust criterion, we also investigated the effects of requiring candidate predictors to have an equal or higher experience level.
The Wikipedia is a web-based encyclopedia, written and edited collaboratively by Internet users. The Wikipedia has an extremely open editorial policy that allows anybody to create or modify articles. This has promoted a broad and detailed coverage of subjects, but also introduced problems relating to the quality of articles. The Wikipedia Recommender System (WRS) was developed to help users determine the credibility of articles based on feedback from other Wikipedia users. The WRS implements a collaborative filtering system with trust metrics, i.e., it provides a rating of articles which emphasizes feedback from recommenders that the user has agreed with in the past. This exposes the problem that most recommenders are not equally competent in all subject areas. The first WRS prototype did not include an evaluation of the areas of expertise of recommenders, so the trust metric used in the article ratings reflected the average competence of recommenders across all subject areas. We have now developed a new version of the WRS, which evaluates the expertise of recommenders within different subject areas. In order to do this, we need to identify a way to classify the subject area of all the articles in the Wikipedia. In this paper, we examine different ways to classify the subject areas of Wikipedia articles according to well-established knowledge classification schemes. We identify a number of requirements that a classification scheme must meet in order to be useful in the context of the WRS, and present an evaluation of four existing knowledge classification schemes with respect to these requirements. This evaluation helped us identify a classification scheme, which we have implemented in the current version of the Wikipedia Recommender System.
Network operators must understand the status of a network from various viewpoints to manage the changing conditions of the Internet and diagnose the cause of anomalies within it. For this purpose, analyzing information obtained from different sources, including network devices such as routers and network monitoring tools, is required. Of that information, route information exchanged by Border Gateway Protocol (BGP) is essential for analyzing and diagnosing the Internet at an inter-autonomous-system level. However, the information is not easily deployed for actual network management because of its enormous amount and the difficulties in manipulating it in an original format. To overcome these problems and support network analysis essential for actual network management, we propose IM-DB, a network information retrieval system for interactive analysis of the Internet. Our IM-DB is based on a relational database and can store large amounts of network information including BGP information. By providing several functions for manipulating stored network information, the IM-DB enables network operators to easily search, integrate, and explore information they require. We developed the first version of our IM-DB for storing BGP routing information and other application-level network information. The experimental results using this prototype demonstrate the feasibility of our IM-DB.
Device Comfort is a concept that uses an enhanced notion of trust to allow a personal (likely mobile) device to better reason about the state of interactions and actions between it, its owner, and the environment. This includes allowing a better understanding of how to manage information in fine-grained context as well as addressing the personal security of the user. To do this, it forms a unique relationship with the user, focusing on the device's judgment of the user in context. This paper introduces and defines Device Comfort, including an examination of what makes up the comfort of a device in terms of trust and other considerations, and discusses the uses of such an approach. It also presents some ongoing developmental work on the concept and an initial formal model of Device Comfort, its makeup and behaviour.
The necessary cooperation in packet forwarding by wireless mobile ad hoc network users can be achieved if nodes create a distributed cooperation enforcement mechanism. One of the most significant roles in this mechanism is played by a trust system, which enables forwarding nodes to distinguish between cooperative (therefore trustworthy) and selfish (untrustworthy) nodes. As shown in this paper, the performance of the system depends on the data classes describing the forwarding behaviour of nodes, which are used for the evaluation of their level of cooperation. The paper demonstrates that partitioning such data into personal and general classes can help to create better protection against clique-building among nodes. Personal data takes into account the status of packets originated by the node itself, while general data considers the status of packets originated by other nodes. Computational experiments demonstrate that, in the presence of a large number of selfish and colluding nodes, prioritising the personal data improves the performance of cooperative nodes and creates a better defence against colluding free-riders.
Computers connected to the Internet are a target for a myriad of complicated attacks. Companies can use sophisticated security systems to protect their computers; however, average users usually rely on built-in or personal firewalls to protect their computers while avoiding the more complicated and expensive alternatives. In this paper we study the feasibility, from the network performance point of view, of a VM configured as an integrated security appliance for personal computers. After discussing the main causes of network performance degradation, we use netperf on the host computer to measure the network performance overhead of using a virtual appliance. We are mainly concerned with the bandwidth and the latency that would limit a network link. We compared the bandwidth and latency of this integrated security virtual appliance with current market products, and in both cases the performance of the virtual appliance was excellent compared with its hardware counterparts. This security virtual appliance, for example, allows more than an 80 Mbps data transfer rate for individual users, while security appliances generally allow only 150 Mbps for small-office users. In brief, our tests show that the network performance of a security virtual appliance is on par with the current security appliances available in the market; therefore, this solution is quite feasible.
“Anshin” is an emotion in Japanese that is difficult to translate because it is vague, varies from person to person, and is subjective. It means something like “a feeling of contentment”. The demand for Internet use with “Anshin” is high, and we believe that both the emotion and the demand could be universal. To study “Anshin,” we conducted group interviews as our first step. We obtained 95 cases of “Anshin” and 157 cases of anxiety from 28 people. From the results, we found that studying anxiety is valuable: anxiety is, in a sense, the opposite of “Anshin,” and controlling it leads to a kind of “Anshin.” To explore this, we constructed a model of the process of anxiety generation and selected candidates for the related elements. After investigating the obtained cases, we produced a questionnaire on Internet anxieties in preparation for evaluating them.
In this paper, we investigate the awareness gaps between information security managers and workers with regard to the effectiveness of organizational information security measures in Japanese organizations by analyzing micro data from two Web-based surveys of information security managers and workers. We find that there are no awareness gaps between information security managers and workers with regard to the effectiveness of organizational information security measures in large companies. However, we find that awareness gaps between them tend to exist in small and medium-sized companies. We then discuss how to bridge these gaps, and propose that information security managers implement two-sided organizational measures by communicating with workers in their organizations.
In modern information service architectures, many servers are involved in building a service, and servers must rely on information provided by other servers, thereby creating trust relations. These trust relations are central to building services in distributed environments and are closely related to information security. Almost every standard on information security is concerned with the internal controls of an organization, and particularly with authentication. In this paper, we focus on a trust model of certificate authentication. Conventionally, a trust model of certificates is defined as the validation of chains of certificates. However, today this trust model does not function well because of the fragmentation problem caused by the complexity of paths and by fine-grained requirements on security levels. In this paper, we propose “dynamic path validation” together with an alternative trust model of PKI for controlling this situation. First, we propose a Policy Authority. The Policy Authority assigns a level of compliance (LoC) to CAs in its trust domain; the LoC is evaluated in terms of the certificate common criteria of the Policy Authority. Moreover, it controls path building with consideration of LoC. Therefore, we can flexibly evaluate levels of CP/CPSs in a single server. In a typical bridge model, we need as many bridge CAs as the number of required levels of CP/CPSs. In our framework, instead, we can do the same task in a single server, which saves the cost of maintaining lists of trust anchors at multiple levels.
In contemporary society, many services are offered electronically, and electronically available personal identification is used to identify the users of these services. e-Money, a potential medium that contains an eID, is widely used in Japan. Service providers encounter certain limitations, because of privacy concerns, both when collecting the attribute values related to such eIDs and when using them for analysis. A survey was conducted to clarify which trade-offs (privacy, economic value, benefit, or services) should be considered before deploying e-Money. Regression analysis and conjoint analysis were performed. The results of the analyses of the questionnaires revealed a preference for economic value, service, and privacy, in that order, even though many people were anxious about privacy.
We propose an identity management system that supports role-based pseudonyms that are bound to a given set of services (service contexts) and support the use of reputation systems. Our proposal offers a solution to the problem of providing privacy protection and reputation mechanisms concurrently. The trust information used to evaluate the reputation of users is dynamic and associated with their pseudonyms. In particular, our solution does not require support or assistance from central authorities during the operation phase. Moreover, the presented scheme provides inherent detection and mitigation of Sybil attacks. Finally, we present an attacker model and evaluate the security and privacy properties and robustness of our solution.
The purpose of this paper is to elucidate the formation of “trust” in Internet society, in the context of the relationships between the real social system and “trust” and between Internet space and “trust”. Although the paper is based on the dualities of real-world “system trust” versus Internet “technological trust”, and real-world “human trust” versus Internet “personality trust”, we intend to discuss trust in the Internet space as “communications trust” (a duality of system and personality reliabilities), so that trust in the Internet space is not reduced to a personal issue of named versus anonymous identity prior to trust in the space's system. This study suggests that trust in the Internet space lies in the joint composition of technology (civilization) and society (culture) and clearly exists as a complex of security (info-tech) and humanity (info-arts). Based on this idea, the final purpose of this study is to show a path towards forming new human trust (internal controls) in “information security”, constructed from the viewpoints of human mind controls (laws, morality, ethics, and custom) and information engineering. This course signifies the “security arts” (the study of trust) that incorporate information arts and info-tech in the Internet space.