We present a system that assists Japanese-language teachers with the creation of electronic reading materials containing glosses. Although glosses are traditionally generated only for content words, we propose a hybrid method for identifying and glossing functional expressions and conjugations, which enables the system both to generate more glosses and to display them in a manner appropriate for learners. Coverage analysis and empirical evaluations show that our hybrid method allows our system to cover more functional expressions and conjugations than previous systems while maintaining good performance. Feedback received during interviews with Japanese-language teachers shows that they react very positively toward both the new glosses and the system in general. Finally, results from a survey of learners of Japanese as a foreign language show that they find the new glosses for functional expressions and conjugations very helpful.
Currently, many universities use Web services, such as SlideShare and edubase, to store presentation files. These files provide varying levels of knowledge and are useful and valuable to students. However, self-learners retrieving such files still lack support in identifying which slides meet their specific needs. This is because presentation files are written for different levels of expertise, which makes it difficult to understand the context of a user query within a slide and thus to identify the relevant information. We describe a novel browsing method for e-learning that generates snippets for target slides. To do so, we consider the relationships between slides and identify the portions of the slides that are relevant to the query. By analyzing the conceptual structure of keywords on the basis of semantic relations, and the document structure on the basis of the indent levels in the slides, not only can target slides be precisely retrieved, but their relevant portions can also be brought to the attention of the user. This is done by focusing on portions from either detailed or generalized slides at the conceptual level; this provides the surrounding context that helps users easily determine which slides are useful. We also present a prototype system and the results of an evaluation of its effectiveness.
Modern embedded processors commonly use a set-associative scheme to reduce cache misses. However, a conventional set-associative cache has drawbacks in terms of power consumption because it probes all ways in parallel to reduce the access time, although only the matching way is used. The energy spent in accessing the other ways is wasted, and the fraction of such energy grows as cache associativity increases. Previous techniques, such as phased caches, way-prediction caches and partial tag comparison, have been proposed to reduce the power consumption of set-associative caches by optimizing the cache access mode. However, these methods cannot adapt to program behavior because they use a single access mode throughout the program execution. In this paper, we propose a behavior-based adaptive access mode for set-associative caches in embedded systems, which can dynamically adjust the access mode during program execution. First, a program is divided into several phases based on the principle that program behavior repeats. Then, an off-system pre-analysis is used to determine the optimal access mode for each phase, so that each phase employs its own optimal access mode during program execution to meet the application's demand. Our proposed approach requires little hardware overhead and delegates most of the workload to software, so it is well suited to embedded processors. Simulation using SPEC 2000 shows that our proposed approach can reduce roughly 76.95% and 64.67% of the power for an instruction cache and a data cache, respectively, while the performance degradation is less than 1%.
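The per-phase selection described above can be sketched as a simple cost comparison. The energy model, mode names, and all numbers below are hypothetical illustrations of the idea, not the paper's actual analysis:

```python
# Toy per-phase access-mode selection: for each program phase, pick the cache
# access mode with the lowest estimated energy given the phase's profiled hit
# rate. Energy figures (arbitrary units) and the prediction accuracy are
# assumptions for illustration only.

def energy(mode, hit_rate, ways=4):
    if mode == "parallel":        # all ways probed on every access
        return ways * 1.0
    if mode == "phased":          # tag probe first, then one data way on a hit
        return ways * 0.2 + hit_rate * 1.0
    if mode == "way-predict":     # one way, plus a full re-probe on misprediction
        predict_ok = 0.9
        return 1.0 + (1 - predict_ok) * ways * 1.0
    raise ValueError(mode)

def best_mode(hit_rate):
    return min(("parallel", "phased", "way-predict"),
               key=lambda m: energy(m, hit_rate))

# A phase with a high hit rate favors way prediction; a phase with a poor hit
# rate favors the phased mode, since the data-way probe is often avoided.
for phase_hit_rate in (0.99, 0.5):
    print(phase_hit_rate, best_mode(phase_hit_rate))
```

The point of the per-phase analysis is exactly this divergence: no single mode is optimal for every phase, so the selected mode is switched at phase boundaries at run time.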
In this work, a Symbiotic Computing (SC) based solution to the problem of Information Explosion is addressed. Symbiotic Computing was proposed to bridge the gap between the Real Space (RS) and the Digital Space (DS) by creating symbiotic relations among users in the RS and information resources, such as software and data, in the DS. SC is realized by adding a new axis, S/P computing (Social and Perceptual Computing), to advanced ubiquitous computing consisting of ambient and Web computing. Here, a new framework of SC based on a Symbiotic Space (SS) and a Symbiotic Space Platform (SSP) has been designed to construct and maintain symbiotic relations for S/P computing in order to reduce the burden of Information Explosion. Finally, the feasibility of our proposal has been tested by bench-top simulation, applying a logical model of Symbiotic Computing to a typical example of Information Explosion.
Convergence between cyber and physical spaces is accelerating due to the penetration of various ubiquitous services based on sensors and actuators. Effective sensors, such as ultra-low-cost wireless sensors, smart phones and pads, allow us to couple real objects, people, places, and environments with the corresponding entities in the cyber space. Similarly, soft sensors such as blogs, Facebook, Twitter, Foursquare and other applications create new types of sensed data for cyber-physical coupling. In this paper, we describe sensor-enabled cyber-physical coupling for creating ubiquitous services in the SenseCampus project. We first classify the cyber-physical couplings and ubiquitous services in the project. Several ubiquitous services, such as SensingCloud, DIY smart object services, Twitthings, Airy Notes, and Mebius Ring, are described. We then address the challenges in cyber-physical coupling for creating advanced ubiquitous services, particularly for educational facilities.
Two-Phase geographic Greedy Forwarding (TPGF) is a pure on-demand geographic greedy forwarding protocol for transmitting multimedia streams in wireless multimedia sensor networks (WMSNs). It performs explicit route discovery: a node greedily forwards a routing packet to the neighbor closest to the destination to build a route. Like most geographic routing protocols, TPGF is vulnerable to greedy forwarding attacks, e.g., spoofing or modifying control packets. As the first research effort to investigate secure routing in WMSNs, this paper identifies vulnerabilities in TPGF and proposes corresponding countermeasures, e.g., secure neighbor discovery and secure route discovery. We propose SecuTPGF, an extended version of TPGF that exactly follows the original TPGF routing mechanism but with enhanced security and reliability. The effectiveness of SecuTPGF is demonstrated through security analysis and evaluation experiments.
With the unprecedented growth of data generated by mankind nowadays, it has become critical to develop efficient techniques for processing these massive data sets. To tackle such challenges, analytical data processing systems must be extremely efficient, scalable, and flexible, as well as economically effective. Recently, Hadoop, an open-source implementation of MapReduce, has gained interest as a promising big data processing system. Although Hadoop offers the desired flexibility and scalability, its performance has been noted to be suboptimal when it is used to process complex analytical tasks. This paper presents E3, an elastic and efficient execution engine for scalable data processing. E3 adopts a "middle" approach between MapReduce and Dryad: E3 has a simpler communication model than Dryad, yet it supports multi-stage jobs better than MapReduce. E3 avoids reprocessing intermediate results by adopting a stage-based evaluation strategy and collocating data and user-defined (map or reduce) functions into independent processing units for parallel execution. Furthermore, E3 supports block-level indexes and built-in functions for specifying and optimizing data processing flows. Benchmarking on an in-house cluster shows that E3 achieves significantly better performance than Hadoop; put another way, building an elastically scalable and efficient data processing system is possible.
Graph patterns can represent the complex structural relations among objects in many applications across various domains. The objective of graph summarization is to obtain a concise representation of a single large graph, G, that is interpretable and suitable for analysis. A good summary can reveal the hidden relationships between nodes in a graph. The key issue is how to construct a high-quality and representative super-graph, GS, in which a super-node summarizes a collection of nodes in G based on the similarity of their attribute values and neighborhood relationships, and a super-edge summarizes the edges between nodes in G that are represented by two different super-nodes in GS. We propose an entropy-based unified model for measuring the homogeneity of the super-graph. The best summary in terms of homogeneity could be too large to explore. Using the unified model, we relax three summarization criteria to obtain an approximately homogeneous summary of reasonable size. We propose both agglomerative and divisive algorithms for approximate summarization, as well as pruning techniques and heuristics for both algorithms to save computation cost. Experimental results confirm that our approaches can efficiently generate high-quality summaries.
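The entropy-based homogeneity idea can be illustrated with a minimal sketch. This toy measure scores only attribute-value purity within each super-node (the paper's unified model also covers neighborhood relationships); the function name and weighting are our assumptions:

```python
# Size-weighted average entropy of attribute values inside each super-node.
# Lower is better: 0.0 means every super-node groups only identically-labeled
# nodes, i.e., the summary is perfectly homogeneous in its attributes.
import math
from collections import Counter

def group_entropy(attrs, groups):
    """attrs: node -> attribute value; groups: list of node sets (super-nodes)."""
    n = sum(len(g) for g in groups)
    total = 0.0
    for g in groups:
        counts = Counter(attrs[v] for v in g)
        h = -sum((c / len(g)) * math.log2(c / len(g)) for c in counts.values())
        total += (len(g) / n) * h       # weight each group by its size
    return total

attrs = {1: "x", 2: "x", 3: "y", 4: "y"}
print(group_entropy(attrs, [{1, 2}, {3, 4}]))  # pure groups: entropy 0.0
print(group_entropy(attrs, [{1, 3}, {2, 4}]))  # fully mixed groups: entropy 1.0
```

An agglomerative summarizer would greedily merge the pair of super-nodes whose merge increases this entropy the least, stopping when the summary reaches the desired size.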
Threads share a single address space with each other, whereas a process has its own address space. Since the decision of whether or not to share the address space depends on each data structure in the whole program, the choice of "a thread or a process" for the whole program is an overly "all-or-nothing" decision. With this motivation, this paper proposes the half-process, a process that partially shares its address space with other processes. This paper describes the design and the kernel-level implementation of the half-process and discusses its potential applicability to multi-thread programming with thread-unsafe libraries, intra-node communication in parallel programming frameworks, and transparent kernel-level thread migration. In particular, the thread migration based on the half-process is the first work that achieves transparent kernel-level thread migration by solving the problem of sharing global variables between threads.
In this paper, we propose DATFM/DA (Data Acquisition and Transmission with Fixed and Mobile nodes with Deployment Adjusting), an extension of our previous mobile sensor control method, DATFM. DATFM/DA uses two types of sensor nodes: fixed nodes and mobile nodes. The data acquired by the nodes are accumulated on a fixed node before being transferred to the sink node. DATFM/DA divides the target region into multiple areas and statically deploys mobile nodes to each divided area. In addition, DATFM/DA adjusts the number of mobile nodes deployed in each area based on an analysis of the performance. We also conduct simulation experiments to verify that this method further improves the performance of sensing and data transfer.
Keyword search in relational databases has been widely studied in recent years because it requires users neither to master a structured query language nor to know the complex underlying database schemas. Most existing methods focus on answering snapshot keyword queries in static databases. In practice, however, databases are updated frequently, and users may have long-term interests in specific topics. To deal with such situations, it is necessary to build effective and efficient facilities in a database system to support continual keyword queries. In this paper, we propose an efficient method for answering continual keyword queries over relational databases. The proposed method consists of two core algorithms. The first one computes a set of potential top-k results by evaluating the range of the future relevance score of every query result and creates a lightweight state for each keyword query. The second one uses these states to maintain the top-k results of keyword queries while the database is continually being updated. Experimental results validate the effectiveness and efficiency of the proposed method.
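The idea of pruning by future-score ranges can be sketched in a few lines. This is our illustrative simplification, not the paper's algorithm: each result carries a lower and upper bound on its future relevance score, and a result stays in the monitored state only if it could still enter the top-k:

```python
# A result whose upper bound is below the k-th largest lower bound can never
# enter the top-k, so it is safe to drop it from the per-query state.

def candidate_set(results, k):
    """results: list of (id, lo, hi) future relevance-score ranges.
    Returns the ids that may appear in the top-k at some future time."""
    if len(results) <= k:
        return {r[0] for r in results}
    # k-th largest guaranteed (lower-bound) score
    kth_lo = sorted((lo for _, lo, _ in results), reverse=True)[k - 1]
    return {rid for rid, _, hi in results if hi >= kth_lo}

results = [("r1", 0.9, 0.95), ("r2", 0.5, 0.8), ("r3", 0.2, 0.3)]
print(candidate_set(results, 1))  # only r1: r2's upper bound 0.8 < 0.9
```

On each database update, only the results in this candidate set need their scores re-evaluated, which is what makes the maintained state lightweight.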
Parallel programming/execution frameworks for many/multi-core platforms should support as many applications as possible. In general, work-stealing frameworks provide efficient load balancing even for irregular parallel applications. Unfortunately, naïve parallel programs that traverse graph-based data structures (e.g., for constructing spanning trees) cause stack overflow or unacceptable load imbalance. In this study, we develop parallel programs that perform probabilistically balanced divide-and-conquer graph traversals. We propose a programming technique for accumulating overflowed calls for the next iteration of repeated parallel stages. For an emerging backtracking-based work-stealing framework called "Tascell," which features on-demand concurrency, we propose a programming technique for the long-term exclusive use of workspaces, which leads to a similar technique in the Cilk framework.
To improve the resource utilization of clusters and supercomputers, and thus deliver application results to users faster, it is essential for a job scheduler to be able to expand and shrink parallel computations flexibly. To enable such flexible job scheduling, the parallel computations themselves have to be reconfigurable. With this motivation, this paper proposes, implements and evaluates DMI, a global-view-based PGAS framework that enables easy programming of reconfigurable and high-performance parallel iterative computations. DMI provides programming interfaces with which a programmer can easily program the reconfiguration with a global view. Our performance evaluations showed that, through reconfiguration, DMI can efficiently adapt the parallelism of long-running parallel iterative computations, such as a real-world finite element method and a large-scale iterative graph search, to dynamic increases and decreases in the available resources.
In this paper, we investigate the limitation caused by reception bandwidth in current overlay streaming and propose a novel overlay network architecture for high-quality, real-time streaming. The proposed architecture consists of two components: 1) join and retransmission control (JRC), and 2) redundant node selection (RNS). The JRC dynamically adjusts the number of join and data retransmission requests based on the network condition and the fluctuation of packets received by the receiver. The RNS selects retransmission nodes based on the retention probability of the lost packets requested by the receivers. We have designed and implemented the algorithms of the proposed architecture. According to our evaluation, our approach achieves an additional 1-2 Mbps of reception bandwidth over existing overlay streaming applications.
Wireless sensor network technologies have attracted a lot of attention in recent years. In this paper, we propose an energy-efficient data gathering mechanism using a traveling wave and spatial interpolation for wireless sensor networks. In our proposed mechanism, sensor nodes schedule their message transmission timing in a fully distributed manner such that they can gather sensor data over a whole wireless sensor network and transmit the data to a sink node while switching between a sleep state and an active state. In addition, each sensor node determines the redundancy of its sensor data according to received messages, so that only the necessary sensor data are gathered and transmitted to the sink node. Our proposed mechanism does not require additional control messages and enables both data traffic and control traffic to be drastically reduced. Through simulation experiments, we confirmed that with our proposed mechanism, the number of message transmissions can be reduced by up to 77% and the amount of transmitted data by up to 13% compared to a conventional mechanism.
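The redundancy check by spatial interpolation can be illustrated with a toy sketch. The tolerance threshold and the mean-based interpolation below are our simplifications of the idea, not the paper's mechanism:

```python
# A node skips transmitting its own reading when the reading can already be
# predicted from overheard neighbor readings within an error tolerance, so only
# non-redundant sensor data travel toward the sink.

def is_redundant(own_value, neighbor_values, tolerance=0.5):
    if not neighbor_values:
        return False                     # nothing to interpolate from: transmit
    # simple spatial interpolation: the mean of overheard neighbor readings
    estimate = sum(neighbor_values) / len(neighbor_values)
    return abs(own_value - estimate) <= tolerance

print(is_redundant(20.2, [20.0, 20.3]))  # close to neighbors: suppress
print(is_redundant(25.0, [20.0, 20.3]))  # anomalous reading: transmit
```

Because the decision uses only messages the node overhears anyway, no extra control traffic is needed, which matches the mechanism's design goal.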
In this paper, we introduce a large-scale activity gathering system based on mobile sensor devices such as smart phones and accelerometers. We gathered over 35,000 activity data records from more than 200 people over approximately 13 months. We describe the design rationale of the system, analyze the gathered data through statistics and clustering, and apply an existing activity recognition method. The performance of the existing algorithm deteriorated drastically when the gathered data were used as training data. These results show that larger-scale activity data remain a challenging field for activity recognition.
High-quality, high-performance real-time interactive video streaming requires both keeping the data transmission rate as high as possible and minimizing data packet loss to achieve the best possible streaming quality. TCP-friendly rate control (TFRC) is the most widely recognized mechanism for achieving relatively smooth data transmission while competing fairly with TCP flows. However, because its data transmission rate depends largely on packet loss conditions, high-quality real-time streaming suffers a significant degradation of streaming quality due to both a reduction in the data transmission rate and data packet losses. This paper proposes the dynamic probing forward error correction (DP-FEC) mechanism, which enables high-quality real-time streaming to maximize its streaming quality when competing TCP flows cause packet losses in the streaming flow. DP-FEC estimates the network condition by dynamically adjusting the degree of FEC redundancy while trying to recover lost data packets. It effectively utilizes network resources and adjusts the degree of FEC redundancy to improve the playback quality at the user side while minimizing the performance impact on competing TCP flows. We describe the DP-FEC algorithm and evaluate its effectiveness using the NS-2 simulator. The results show that, by effectively utilizing network resources, DP-FEC retains higher streaming quality while minimizing the adverse impact on TCP performance, thus achieving TCP friendliness.
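The probing behavior can be sketched as a simple control loop. The specific rules and parameters below are our assumptions for illustration; they are not the DP-FEC algorithm itself:

```python
# Toy FEC-redundancy controller: probe upward while no loss is observed,
# hold steady while losses are being fully recovered, and back off
# multiplicatively when unrecovered losses suggest the extra redundancy is
# pressuring competing TCP flows.

def adjust_redundancy(redundancy, lost, recovered, max_red=10):
    if lost == 0:
        return min(redundancy + 1, max_red)   # probe for spare bandwidth
    if recovered >= lost:
        return redundancy                     # current level is sufficient
    return max(redundancy // 2, 1)            # back off multiplicatively

r = 4
r = adjust_redundancy(r, lost=0, recovered=0)   # no loss: probe up to 5
r = adjust_redundancy(r, lost=3, recovered=1)   # unrecovered loss: halve to 2
```

The additive-increase/multiplicative-decrease shape mirrors how TCP itself probes for bandwidth, which is one way such a mechanism can remain TCP-friendly.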
A policy provisioning framework is described that supports management of the lifecycle of personal information and its data-handling policies distributed across security domains. A model for creating data-handling policies that reflect the intentions of the system administrator and the privacy preferences of the data owner is explained. Algorithms for systematically propagating and integrating data-handling policies from system entities in different administrative domains are also presented. This framework enables data-handling policies to be properly deployed and enforced in a way that enhances security and privacy.
We propose a novel method for the incremental construction of causal networks to clarify the relationships among news events. We propose the Topic-Event Causal (TEC) model as a causal network model, together with an incremental construction method based on it. In the TEC model, causal relations are expressed as a directed graph in which each vertex represents an event. A vertex contains structured keywords consisting of topic keywords and an SVO tuple. An SVO tuple, which consists of subject, verb and object keywords, represents the details of the event. To obtain a chain of causal relations, vertices representing similar events need to be detected. We reduce the time taken to detect them by restricting the calculation to relevant topics using topic keywords, and we detect them at the concept level. We propose an identification method that identifies the sense of the keywords and introduce three semantic distance measures to compare keywords. Our method detects vertices representing similar events more precisely than conventional methods. We carried out experiments to validate the proposed methods.
Due to the explosive growth in the amount of information over the last decade, it is becoming increasingly difficult to obtain the necessary information through conventional information access methods, and the creation of drastically new technologies is needed. Developing such new technologies requires search engine infrastructures. Although the existing search engine APIs can be regarded as such infrastructures, these APIs have several restrictions, such as limits on the number of API calls. To support the development of new technologies, we are running an open search engine infrastructure, TSUBAKI, on a high-performance computing environment. In this paper, we describe the TSUBAKI infrastructure.
This paper proposes a method that speeds up a classifier trained with many conjunctive features, i.e., combinations of (primitive) features. The key idea is to precompute, as partial results, the weights of primitive feature vectors that represent fundamental classification problems and appear frequently in the target task. A prefix tree (trie) compactly stores the primitive feature vectors with their weights, and it enables the classifier to find, for a given feature vector, its longest prefix feature vector whose weight has already been computed. Experimental results on base phrase chunking and dependency parsing demonstrated that our method sped up the SVM and LLM classifiers by a factor of 1.8 to 10.6.
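The longest-prefix lookup can be sketched as follows. This is an illustrative simplification (feature vectors as sorted tuples of feature ids, a single fallback weight table), not the authors' implementation:

```python
# A trie maps frequent prefixes of feature vectors to precomputed weights.
# At classification time we walk the trie to find the longest precomputed
# prefix, then add the weights of the remaining primitive features one by one.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.weight = None   # precomputed weight for the prefix ending here

def insert(root, features, weight):
    node = root
    for f in features:
        node = node.children.setdefault(f, TrieNode())
    node.weight = weight

def score(root, features, primitive_weight):
    """Longest-prefix lookup, then fall back to per-feature weights."""
    node, best, matched = root, 0.0, 0
    for i, f in enumerate(features):
        node = node.children.get(f)
        if node is None:
            break
        if node.weight is not None:
            best, matched = node.weight, i + 1
    # add weights of the remaining primitive features individually
    return best + sum(primitive_weight.get(f, 0.0) for f in features[matched:])

root = TrieNode()
insert(root, (1, 2), 0.7)          # precomputed weight for the frequent prefix (1, 2)
w = {1: 0.3, 2: 0.4, 3: 0.2}       # per-feature fallback weights
score(root, (1, 2, 3), w)          # 0.7 (prefix) + 0.2 (feature 3)
```

The speedup comes from replacing many conjunctive-feature weight lookups for a frequent prefix with a single precomputed value.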
We address the problem of improving variable-length-to-fixed-length codes (VF codes). The VF codes we deal with here are encoding schemes that parse an input text into variable-length substrings and then assign a fixed-length codeword to each parsed substring. VF codes have favourable properties for fast decoding and fast compressed pattern matching, but their compression ratios are worse than those of the latest compression methods. The compression ratio of a VF code depends on the parse tree used as a dictionary. To gain a better compression ratio, we present several improved methods for constructing parse trees. All of them are heuristic solutions, since constructing the optimal parse tree is intractable. We compared our methods with previous VF codes and showed experimentally that their compression ratios reach the level of state-of-the-art compression methods.
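The basic VF-coding scheme can be sketched with a toy dictionary in place of a real parse tree. The dictionary contents and greedy longest-match parsing below are illustrative assumptions, not the paper's construction methods:

```python
# Parse the text greedily into the longest dictionary substrings, then emit a
# fixed-width index for each parsed substring. With a 4-entry dictionary, each
# codeword fits in a fixed 2 bits, regardless of the substring's length.

def vf_encode(text, dictionary):
    """dictionary: list of substrings; every single character must be present
    so that parsing never fails. Returns a list of fixed-width indices."""
    by_len = sorted(dictionary, key=len, reverse=True)
    index = {s: i for i, s in enumerate(dictionary)}
    out, pos = [], 0
    while pos < len(text):
        for s in by_len:                     # longest match first
            if text.startswith(s, pos):
                out.append(index[s])
                pos += len(s)
                break
        else:
            raise ValueError("dictionary cannot parse text")
    return out

def vf_decode(codes, dictionary):
    return "".join(dictionary[c] for c in codes)

dic = ["ab", "ba", "a", "b"]
codes = vf_encode("abbab", dic)              # 5 characters -> 3 codewords
assert vf_decode(codes, dic) == "abbab"
```

A better parse tree is one whose substrings match longer, more frequent fragments of typical inputs, so fewer fixed-length codewords are emitted per text; that is exactly the quantity the paper's construction heuristics try to optimize.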
In this era of information explosion, automating the annotation process for digital images is a crucial step towards efficient and effective management of this increasingly high volume of content. However, this is still a highly challenging task for the research community. One of the main bottlenecks is the lack of integrity and diversity of features. We propose to solve this problem by utilizing 43 image features that cover the holistic content of the image, from the global level to the subject, background and scene. In our approach, salient regions and the background are separated without prior knowledge. Each of them, together with the whole image, is treated independently for feature extraction. Extensive experiments were designed to show the efficiency and effectiveness of our approach. We chose two publicly available, manually annotated datasets containing images of a diverse nature for our experiments, namely the Corel5K and ESP Game datasets. We confirm the superior performance of our approach over the use of a single whole image using a sign test with p-value < 0.05. Furthermore, our combined feature set gives satisfactory performance compared to recently proposed approaches, especially in terms of generalization, even with just a simple combination. We also obtain better performance with the same feature set than the grid-based approach. More importantly, when our features are used with a state-of-the-art technique, our results show higher performance on a variety of standard metrics.
A spatio-temporal correlation analysis between visual saliency and eye movements is presented for estimating mental focus toward videos. We extract spatio-temporal dynamics patterns of saliency areas from the videos, which we refer to as saliency-dynamics patterns, and evaluate eye movements based on their correlation with the saliency-dynamics patterns in view. Experimental results using TV commercials demonstrate the effectiveness of the proposed method for mental-focus estimation.
Watching a real teacher in a real environment from a good distance and with a clear viewing angle has a significant effect on learning physical tasks. This applies to physical-task learning in a mixed-reality environment as well. Observing and imitating body motion is important for learning some physical tasks, including spatial collaborative work. When people learn a task with physical objects, they want to try and practice the task with the actual objects, and they want to keep the referential behavior model close to them at all times. Showing a virtual teacher using mixed-reality technology can create such an environment, and is therefore investigated in this study. It is known that a virtual teacher model's position and orientation influence (a) the number of errors and (b) the completion time in physical-task learning using mixed-reality environments. This paper proposes an automatic adjustment method governing the virtual teacher's horizontal rotation angle so that the learner can easily observe important body motions. The method divides the whole task motion into fixed-duration segments, seeks the most important moving part of the body in each segment, and then rotates the virtual teacher accordingly to show the most important part to the learner. To evaluate the method, a generic physical-task learning experiment was conducted. The method proved effective for motions that gradually reposition the most important moving part, as in some manufacturing and cooking tasks. This study is therefore likely to enhance the transfer of physical-task skills.
This paper presents a robust hybrid approach to Predictive Lane Detection (PLD), which utilizes information from a digital map to improve the efficiency and accuracy of a vision-based lane detector. Traditional approaches are mostly designed for well-maintained, simple road conditions, such as motorways or interstate roads with clear lane markers, to solve the estimation problems of the upcoming road shape as well as the vehicle's position and ego-state; these problems, however, become ambiguous or unsolvable in complicated road environments and under difficult weather or illumination conditions. In this paper, the proposed approach uses vehicle localization on a digital map for road geometry estimation, which gives strong cues for the vision-based detector to limit the search region of road candidates and suppress noise. In addition, other information from the digital map, such as lane marker painting color and categories, is utilized in a high-level data fusion for road geometry estimation. Experimental results on real and synthesized roads verified the effectiveness and efficiency of our approach.
The penetration rate is one of the most important factors affecting the effectiveness of mobile phone-based traffic state estimation. This article thoroughly investigates the influence of the penetration rate on traffic state estimation using mobile phones as traffic probes and proposes reasonable solutions to minimize that influence. In this research, the so-called "acceptable" penetration rate, at which the estimation accuracy is kept at an "acceptable" level, is identified. This recognition is important for bringing mobile phone-based traffic state estimation systems into realization. In addition, two novel "velocity-density inference" models, namely the "adaptive" and the "adaptive feedback" velocity-density inference circuits, are proposed to improve the effectiveness of the traffic state estimation. Furthermore, an artificial neural network-based prediction approach is introduced to improve the effectiveness of the velocity and density estimation when the penetration rate degrades to 0%. These improvements are practically meaningful because they help guarantee highly accurate traffic state estimation even in cases of very low penetration rate. The experimental evaluations reveal the effectiveness as well as the robustness of the proposed solutions.
The rapid spread of smart phones is causing an explosion of media traffic, which encourages mobile network operators (MNOs) to coordinate multiple access networks. For such an MNO, the IP Multimedia Subsystem (IMS) is a promising service control infrastructure for ensuring a QoS-guaranteed communication path for media in the multi-access network. IMS-based service continuity enables user equipments (UEs) to continuously use IMS-based services (e.g., VoIP) even when the UEs make handovers between different access networks, in each of which the UE is assigned a different IP address. In the case where a UE cannot simultaneously use multiple wireless devices, there is the possibility of a long media disruption time during handovers. This is caused by several consecutive handovers that result from attempting to discover an access network in which the UE can have QoS-guaranteed communications. In this paper, we propose a method for reducing the media disruption time when the UE makes handovers between different access networks. In the proposal, the UE proactively performs the service continuity procedure and selects an access network that can provide the required network resources to the UE. We implement and evaluate the proposed method, and show how the media disruption time can be reduced.
This paper proposes a distributed algorithm to calculate a subnetwork of a given wireless sensor network (WSN) connecting a set of sources and a set of sinks, in such a way that: 1) the length of the shortest path connecting a source to a sink in the subgraph does not exceed the distance from the source to the farthest sink in the original graph, and 2) the number of links contained in the subgraph is smallest. The proposed algorithm tries to improve an initial solution generated by a heuristic scheme by repeatedly applying a local search. The simulation results indicate that: 1) using a heuristic to generate the initial solution reduces its size by 10% compared with a simple shortest-path tree; and 2) the local search reduces the size of the resultant subgraph by 20%, and the cost of this improvement can be recovered by utilizing the resultant subgraph for a sufficiently long time, such as a few days.
A majority of research efforts in the domain of Electronic Health Records (EHRs) concentrate on standardization and related issues. Earlier forms of medical records did not permit a high level of exchange, interoperability, or extensive search and querying. Recent research has focused on the development of open standards for lifetime-long health record archives for individual patients, which facilitates the extensive use of data mining and querying techniques for analysis. These efforts can increase the depth and extent of the utilization of patient data. For example, association analysis can be used to identify common features among disparate patients to check whether diagnoses or procedures are effective. Pattern discovery techniques can also be used to create census reports and generate meaningful visualizations of summary data at hospitals. For handling the large volume of data, there is a need to focus on improving usability. The current study proposes a model for the development of EHR support systems that aims to capture health workers' needs in a scientific way, on a continuous basis. The proposal has been evaluated for the accuracy of knowledge discovery to improve usability.