Suppose that we have a DTD and XML documents valid against that DTD, and consider writing an XPath query over the documents. Unfortunately, users often do not understand the entire structure of the documents exactly, especially when the documents are very large and/or complex, or when the DTD has been updated without the user's knowledge. In such cases, the user tends to write an invalid XPath query, and correcting the query by hand is difficult precisely because the user lacks exact knowledge of the document structure. In this paper, we propose an algorithm that, given an XPath query q, a DTD D, and a positive integer K, finds the top-K XPath queries syntactically closest to q among the queries conforming to D, so that the user can select an appropriate query from the K candidates. We also present experimental studies.
Most implementations of regular expression matching in programming languages are based on backtracking. With this strategy, matching may not run in time linear in the length of the input; in the worst case, it may take exponential time. In this paper, we propose a method of checking whether or not regular expression matching runs in linear time. We construct a top-down tree transducer with regular lookahead that translates the input string into a tree representing the execution steps of backtracking-based matching. The regular expression matching then runs in linear time if the tree transducer is of linear size increase; to check this property of the tree transducer, we apply a result of Engelfriet and Maneth. We implemented the method in OCaml and conducted experiments that checked the time linearity of regular expressions appearing in several popular PHP programs. Our implementation showed that 47 of 393 regular expressions were not linear.
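The exponential behaviour targeted here can be observed without any transducer machinery. The sketch below (our illustration, not the paper's method) counts the recursion-tree size of a naive backtracking matcher for the regex `(a|aa)*b`; on failing inputs `a^n`, every split of the `a`-run into 1s and 2s is explored, so the count grows like a Fibonacci number in n:

```python
def backtrack_steps(s: str) -> int:
    """Size of the call tree of a naive backtracking matcher for
    the regex (a|aa)*b applied to s."""
    steps = 0

    def match(i: int) -> bool:
        nonlocal steps
        steps += 1
        # Accept if exactly one trailing 'b' remains.
        if i == len(s) - 1 and s[i] == 'b':
            return True
        # Alternative 1: consume a single 'a'.
        if i < len(s) and s[i] == 'a' and match(i + 1):
            return True
        # Alternative 2: backtrack and consume 'aa' instead.
        if i + 1 < len(s) and s[i:i + 2] == 'aa' and match(i + 2):
            return True
        return False

    match(0)
    return steps
```

Doubling the input length multiplies the step count by two orders of magnitude here, whereas a DFA-based matcher rejects the same input in a number of steps linear in n.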
This paper proposes a parallel algorithm for extracting, from a given graph in which each vertex is associated with an itemset, all connected subgraphs that share a common itemset whose size is not less than a given threshold. We also propose implementations of this algorithm using the task-parallel language Tascell. This kind of graph mining can be applied to the analysis of social or biological networks. We have already proposed an efficient sequential search algorithm called COPINE for this problem. COPINE reduces the search space of a dynamically growing tree structure by pruning branches corresponding to subgraphs that are already visited, that have itemsets smaller than the threshold, or that have already-visited supergraphs with identical itemsets. For the third kind of pruning, we use a table associating already-visited subgraphs with their itemsets. To avoid excessive pruning in a parallel search, where a distinct set of subtrees (tasks) is assigned to each worker, we must place a certain restriction on a worker when it refers to a table entry registered by another worker. We designed a parallel algorithm as an extension of COPINE by introducing this restriction. A key implementation problem is how workers can efficiently share the table entries so that a worker can safely use as many entries registered by other workers as possible. We implemented two sharing methods: (1) a victim worker makes a copy of its own table and passes it to a thief worker when the victim spawns a task by dividing its own task and assigns it to the thief, and (2) a single table controlled by locks is shared among the workers. We evaluated these implementations using a real protein network; the single-table implementation achieved a speedup of approximately four with 16 workers.
Tree data such as XML trees have recently been getting larger and larger. Parallel and distributed processing is a promising way of dealing with such big data, but the data must first be divided. Since computation over trees often requires relationships between parents and children and/or among siblings, the division should respect such relationships. The “m-bridge” technique divides trees of any shape and is easy to compute. However, division with the m-bridge technique is sometimes unsatisfactory for shallow XML trees. In this study, we propose a tree-division method for XML trees in which we apply the m-bridge technique to a one-to-one corresponding binary tree. We implement the tree-division algorithm using the Simple API for XML (SAX) parser; an important feature of our algorithm is that it transforms and divides XML trees in the order in which the SAX parser reads them. We carried out experiments and discuss the properties of the proposed algorithm. In addition, we discuss how the divided trees can be used, with query examples.
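A standard way to realize a "one-to-one corresponding binary tree" is the left-child right-sibling (LCRS) encoding; the sketch below (our illustration, not necessarily the paper's exact construction) converts an arbitrary-arity tree into its LCRS binary form, to which an m-bridge-style division can then be applied:

```python
class Node:
    """A node of an arbitrary-arity tree (e.g. parsed from XML)."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

class BNode:
    """Binary-tree node: left = first child, right = next sibling."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def to_lcrs(node):
    """Left-child right-sibling encoding. The mapping is bijective,
    so the original tree is recoverable from the binary tree."""
    if node is None:
        return None
    b = BNode(node.label)
    prev = None
    for child in node.children:
        cb = to_lcrs(child)
        if prev is None:
            b.left = cb        # first child hangs on the left
        else:
            prev.right = cb    # later children chain to the right
        prev = cb
    return b
```

Note how a wide, shallow tree (a root with n children) becomes a right-leaning chain of depth n, which explains why a depth-sensitive division technique can behave better on the binary form.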
Assurance cases are documented bodies of evidence that provide valid and convincing arguments that a system is adequately dependable in a given application and environment. Assurance cases are widely required by regulation for safety-critical systems in the EU. Several graphical notations exist for assurance cases; GSN (Goal Structuring Notation) and CAE (Claim, Argument, Evidence) are two such notations. However, these notations have not been defined formally. This paper presents a formal definition of GSN and its pattern extensions, taking the framework of a functional programming language as the basis of our study. The implementation has been done on an Eclipse-based GSN editor. We report case studies on previous work about GSN and show the applicability of the design and implementation. This is a step toward developing an assurance case language.
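To give a flavour of what a formal definition of GSN might look like (this is our sketch, not the paper's actual definition), GSN's node kinds can be modelled as a recursive algebraic datatype, here transliterated into Python dataclasses:

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical GSN term language: a goal is supported either by
# evidence (a solution) or by a strategy decomposing it into subgoals.
@dataclass
class Solution:
    description: str

@dataclass
class Strategy:
    description: str
    subgoals: List['Goal'] = field(default_factory=list)

@dataclass
class Goal:
    claim: str
    support: Union[Solution, Strategy, None] = None  # None = undeveloped

def undeveloped(g: Goal) -> List[str]:
    """Collect claims of goals that have no support yet - the kind of
    structural check a formal definition enables."""
    if g.support is None:
        return [g.claim]
    if isinstance(g.support, Solution):
        return []
    out = []
    for sg in g.support.subgoals:
        out.extend(undeveloped(sg))
    return out
```

In a typed functional language the same structure would be a sum type, and checks like `undeveloped` become total recursive functions over it.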
In statistical machine translation, Chinese and Japanese form a well-known long-distance language pair that causes difficulties for word alignment techniques. Pre-reordering methods have proven efficient and effective; however, they need reliable parsers to extract the syntactic structure of the source sentences. First, we propose a framework in which only part-of-speech (POS) tags and unlabeled dependency parse trees are used to minimize the influence of parse errors, and linguistic knowledge of structural differences is encoded in the form of reordering rules. We show significant improvements in the translation quality of sentences in the news domain over state-of-the-art reordering methods. Second, we explore the relationship between dependency parsing and our pre-reordering method from two aspects: POS tags and dependencies. We observe the effects of different parse errors on reordering performance by combining empirical and descriptive approaches. In the empirical approach, we quantify the distribution of general parse errors along with reordering quality. In the descriptive approach, we extract seven influential error patterns and examine their correlations with reordering errors.
We propose a method of collective sentiment classification that assumes dependencies among the labels of an input set of reviews. The key observation behind our method is that the distribution of polarity labels over reviews written by each user or about each product is often skewed in the real world: intolerant users tend to report complaints, while popular products are likely to receive praise. We encode these characteristics of users and products (referred to as user leniency and product popularity) by introducing global features in supervised learning. To resolve dependencies among the labels of a given set of reviews, we explore two approximate decoding algorithms, “easiest-first decoding” and “two-stage decoding.” Experimental results on real-world datasets with user and/or product information confirm that our method substantially improves classification accuracy.
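Easiest-first decoding can be sketched generically (our illustration of the decoding strategy, not the paper's classifier): at each step, commit to the yet-unlabeled review whose current score is most confident, then let the committed label feed the global user/product features when re-scoring the remaining reviews:

```python
def easiest_first(scores, update):
    """scores: dict id -> signed confidence (positive = positive label).
    update(labels, i): re-scoring function (stands in for a classifier
    with global features over already-committed labels).
    Returns the committed label (+1/-1) for every instance."""
    labels = {}
    remaining = set(scores)
    while remaining:
        # Commit to the instance we are currently most confident about.
        best = max(remaining, key=lambda i: abs(scores[i]))
        labels[best] = 1 if scores[best] >= 0 else -1
        remaining.discard(best)
        # Re-score the rest, now that one more label is fixed.
        for i in remaining:
            scores[i] = update(labels, i)
    return labels
```

A toy `update` that nudges an undecided review toward the majority of committed labels shows the intended effect: confident decisions are made first and propagate to ambiguous ones.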
A fundamental problem in conventional photography is that movement of the camera or the captured object causes motion blur in the image. In this research, we propose coding motion blur to be motion-invariant using a programmable aperture camera. The camera realizes virtual camera motion by translating the aperture opening, and as a result we obtain a coded image in which motion blur is invariant with respect to object velocity. Therefore, we can reduce motion blur without estimating motion blur kernels or knowing the object speed. We model the projection of the programmable aperture camera and demonstrate that the proposed coding works using a prototype camera.
This paper presents a novel image processing method to enhance the appearance of the micro-structure of living-organ mucosa using polarized lighting and imaging. A new technique is presented that uses two pairs of parallel- and crossed-Nicol polarimetric images captured under two different linearly polarized illuminations, and an averaged subtracted polarization image (AVSPI), calculated from the above four images, is introduced. Feasibility experiments were performed using a prototype polarimetric endoscope on excised porcine stomachs.
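The abstract does not give the AVSPI formula, but one plausible reading of "averaged subtracted polarization image" (our assumption, stated here only for illustration) is the per-pixel average of the two parallel-minus-crossed difference images, one pair per illumination polarization:

```python
def avspi(par1, crs1, par2, crs2):
    """Hypothetical AVSPI: average of the two parallel-minus-crossed
    images (one pair per linearly polarized illumination). This formula
    is our assumption, not taken from the abstract.
    Images are lists of rows of intensity values."""
    return [
        [((p1 - c1) + (p2 - c2)) / 2.0
         for p1, c1, p2, c2 in zip(r1, r2, r3, r4)]
        for r1, r2, r3, r4 in zip(par1, crs1, par2, crs2)
    ]
```

Subtracting the crossed-Nicol image suppresses depolarized (deeply scattered) light, which is why such a combination can enhance surface micro-structure.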
It is known that time-to-contact with an object can be estimated solely from changes in the object's size in camera images, without any additional information such as the distance to the object, the camera speed, or the camera parameters. However, existing methods cannot compute time-to-contact if there are no geometric features in the images. In this paper, we propose a new method for computing time-to-contact from photometric information. When a light source moves in the scene, the observed intensity changes according to the motion of the light source. We analyze this change in intensity in camera images and show that time-to-contact can be estimated from the intensity change alone. Our method does not require any additional information, such as the radiance of the light source, the reflectance of the object, or the orientation of the object surface. The proposed method can be used in various applications, such as vehicle driver assistance.
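As a worked example of the photometric idea (our simplification under an inverse-square falloff assumption, not the paper's full model): if the observed intensity obeys I(t) = k / d(t)^2, then dI/dt = -2k d'(t) / d(t)^3, and the unknown constant k cancels, giving time-to-contact TTC = -d/d' = 2 I / (dI/dt) from intensity measurements alone:

```python
def ttc_from_intensity(i0, i1, dt):
    """Time-to-contact from two intensity samples dt apart, assuming
    (our assumption) an inverse-square law I = k / d**2, under which
    TTC = -d / d' = 2 * I / (dI/dt). No distance, speed, radiance,
    or reflectance values are needed."""
    didt = (i1 - i0) / dt
    return 2.0 * i0 / didt

# Synthetic check: approach a source at constant speed v.
d0, v = 10.0, 1.0               # true TTC at t = 0 is d0 / v = 10
dt = 1e-4
intensity = lambda t: 1.0 / (d0 - v * t) ** 2
est = ttc_from_intensity(intensity(0.0), intensity(dt), dt)
```

The finite-difference estimate recovers the true time-to-contact up to the discretization error of dI/dt.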
This paper addresses the recognition of defocused patterns. Although recognition algorithms typically assume that input images are focused and sharp, this assumption does not always hold for actual camera-captured images, so a recognition method that can handle defocused patterns is required. In this paper, we propose a novel recognition framework for defocused patterns that relies on a single camera without a depth sensor. The framework is based on the coded aperture, which can recover a less-degraded image from a defocused image if depth is available. However, with a single camera and no depth sensor, estimating depth is ill-posed, and an additional assumption is required. We therefore introduce a new assumption suitable for pattern recognition: the templates are known. This is based on the fact that in pattern recognition, all templates must be available in advance for training. The experiments confirmed that the proposed method is fast and robust to defocus and scaling, especially for heavily defocused patterns.
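The "templates are known" assumption can be sketched in 1D (our illustration of the idea, not the paper's coded-aperture pipeline): because every template is available in advance, each template can be pre-blurred with the point-spread function of every candidate depth, and the best (template, depth) pair is chosen by matching against the observation, resolving the otherwise ill-posed depth:

```python
def convolve(signal, kernel):
    """'Same'-size 1D convolution with zero padding."""
    n, k = len(signal), len(kernel)
    half = k // 2
    out = []
    for i in range(n):
        s = 0.0
        for j in range(k):
            idx = i + j - half
            if 0 <= idx < n:
                s += signal[idx] * kernel[j]
        out.append(s)
    return out

def classify(observed, templates, psfs):
    """Return (template index, depth index) minimizing the squared
    error between the observation and each pre-blurred template.
    psfs[d] stands for the defocus blur at candidate depth d."""
    best, ans = float('inf'), None
    for ti, t in enumerate(templates):
        for di, psf in enumerate(psfs):
            b = convolve(t, psf)
            err = sum((o - v) ** 2 for o, v in zip(observed, b))
            if err < best:
                best, ans = err, (ti, di)
    return ans
```

Jointly searching over templates and depths is what turns the known-template assumption into a depth estimate as a by-product of recognition.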
This paper describes a quality-dependent score-level fusion framework for face, gait, and height biometrics from a single walking image sequence. Person authentication accuracies based on face, gait, and height biometrics generally degrade when the spatial resolution (image size) and temporal resolution (frame rate) of the input image sequence decrease, and the degree of degradation differs among the individual modalities. We therefore set the optimal weights of the individual modalities based on a linear logistic regression framework depending on the pair of spatial and temporal resolutions, which we call the quality in this paper. However, it is not realistic to compute and store the optimal weights for all possible qualities in advance, and the optimal weights change across qualities in a nonlinear way. We therefore propose a method to estimate the optimal weights for an arbitrary quality from a limited number of training pairs of optimal weights and qualities, based on Gaussian process regression with a nonlinear kernel function. Experiments using a publicly available large-population gait database with 1,935 subjects under various qualities showed that person authentication accuracy improved by successfully estimating the weights depending on the qualities.
We seek to localize a query panorama with a wide field of view against a large database of street-level geotagged imagery. This is a challenging task because of significant changes in appearance due to viewpoint, season, occluding people, or newly constructed buildings. An additional key challenge is computational and memory efficiency, owing to the planet-scale size of available geotagged image databases. The contributions of this paper are two-fold. First, we develop a compact image representation for scalable retrieval of panoramic images that represents each panorama as an ordered set of vertical image tiles. Two panoramas are matched by efficiently searching for their optimal horizontal alignment while respecting the tile-ordering constraint. Second, we collect a new challenging query test dataset from Shibuya, Tokyo, containing more than a thousand panoramic and perspective query images with manually verified ground-truth geolocations. We demonstrate significant improvements of the proposed method over the standard bag-of-visual-words and VLAD baselines.
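The ordered-tile matching can be sketched as follows (our illustration): each panorama is a circular sequence of per-tile descriptors, and two panoramas are scored under the best circular shift, which searches all horizontal alignments while preserving the left-to-right tile order:

```python
def match_score(tiles_a, tiles_b):
    """tiles_a, tiles_b: equal-length lists of descriptor vectors,
    one per vertical tile, ordered left to right around the panorama.
    Returns (best score, best circular shift of b)."""
    n = len(tiles_a)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    best = (float('-inf'), 0)
    for shift in range(n):           # every horizontal alignment
        s = sum(dot(tiles_a[i], tiles_b[(i + shift) % n])
                for i in range(n))
        best = max(best, (s, shift))
    return best
```

Because only n cyclic shifts are tried (rather than arbitrary tile permutations), the ordering constraint keeps matching both cheap and geometrically meaningful.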
We propose a novel method to estimate the head orientation of a pedestrian. Many existing methods estimate head orientation from pedestrians' facial textures; however, these methods cannot be applied to the low-resolution images captured by a distant surveillance camera. To deal with this problem, we construct a method based not on facial textures but on gait features, which can be obtained robustly even from low-resolution images. In our method, size-normalized silhouette images of pedestrians are first generated from the captured images. We then obtain the Gait Energy Image (GEI) from the silhouette images as the gait feature. Finally, we train a discriminant model to classify head orientation. For this training step, we built a dataset consisting of gait images of over 100 pedestrians and their head orientations. In evaluation experiments using the dataset, we classified head orientation with the proposed method and confirmed that gait changes of the whole body are effective for estimation in very low-resolution images, which existing methods cannot handle owing to the lack of facial textures.
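The Gait Energy Image used as the feature is simply the pixel-wise mean of the size-normalized binary silhouettes over a sequence; a minimal sketch:

```python
def gait_energy_image(silhouettes):
    """silhouettes: list of equal-size binary images (lists of rows,
    values 0/1). Returns the pixel-wise mean image with values in
    [0, 1]; stable regions stay near 0 or 1, moving limbs blur to
    intermediate values."""
    n = len(silhouettes)
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[y][x] for s in silhouettes) / n
             for x in range(w)]
            for y in range(h)]
```

Averaging makes the feature robust to per-frame segmentation noise, which is one reason GEI works even at very low resolutions.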
This paper proposes a background estimation method from a single omnidirectional image sequence that removes undesired regions, such as moving objects, specular regions, and uncaptured regions caused by the camera's blind spot, without manual specification. The proposed method aligns multiple frames using a reconstructed 3D model of the environment and generates background images by minimizing an energy function that selects a frame for each pixel. In the energy function, we introduce patch similarity and camera positions to remove undesired regions more reliably and to generate high-resolution images. In experiments, we demonstrate the effectiveness of the proposed method by comparing its results with those of conventional approaches.
In this paper, we propose a method to estimate the positions and poses of multiple cameras, and the temporal synchronization among them, using blinking calibration patterns. In the proposed method, calibration patterns are shown on tablet PCs or monitors and are observed by multiple cameras. By observing several frames from the cameras, we can obtain the camera positions, poses, and frame correspondences among the cameras. The proposed calibration patterns are based on pseudo-random volumes (PRV), a 3D extension of pseudo-random sequences; the uniqueness of local windows in a PRV is what makes this estimation possible. We believe our method is useful not only for multiple-camera systems but also for AR applications with multiple users.
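The usefulness of pseudo-random sequences comes from their window property: in a maximal-length sequence generated by an n-bit LFSR, every length-n window of the cyclic sequence occurs exactly once, so an observer can identify its position from a small local window. A 1D sketch of this property (our illustration; PRV extends it to three dimensions, covering space and time):

```python
def lfsr_sequence(length=15):
    """Maximal-length binary sequence from a 4-bit Fibonacci LFSR
    with feedback polynomial x^4 + x + 1 (period 2^4 - 1 = 15)."""
    state = 0b1000
    out = []
    for _ in range(length):
        out.append(state & 1)
        fb = (state ^ (state >> 1)) & 1      # XOR of the two low taps
        state = (state >> 1) | (fb << 3)
    return out

def windows(seq, n=4):
    """All cyclic length-n windows of seq."""
    m = len(seq)
    return [tuple(seq[(i + j) % m] for j in range(n)) for i in range(m)]
```

Since all 15 windows of the period-15 sequence are distinct, observing any 4 consecutive bits pins down the position; in the PRV setting the analogous uniqueness pins down both the spatial location of the pattern and the frame index.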
In this work, we propose what is, to the best of our knowledge, the first stand-alone large-scale image classification system running on an Android smartphone. The objective of this work is to show that large-scale image classification on a mobile device requires no communication with external servers. To this end, we propose a scalar-based compression method for the weight vectors of linear classifiers. As an additional characteristic, the proposed method does not need to decompress the compressed vectors to evaluate the classifiers, which saves recognition time. We have implemented a large-scale image classification system on an Android smartphone that performs 1000-class classification of a given image in 0.270 seconds. In the experiments, we show that compressing the weights to 1/8 of their original size led to only a 0.80% performance loss for 1000-class classification on the ILSVRC2012 dataset. In addition, the experimental results indicate that weight vectors compressed to a few bits, even in the binarized case (1 bit), remain valid for classifying high-dimensional vectors.
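The "evaluate without decompressing" idea can be illustrated with plain scalar quantization (our sketch, not necessarily the paper's exact scheme): each weight is stored as a small integer code into a table of levels, and at test time the classifier score is computed by accumulating feature mass per code and taking one multiplication per quantization level, so the float weight vector is never reconstructed:

```python
def quantize(weights, levels):
    """Scalar quantization: map each weight to the index of the
    nearest centroid in `levels` (a short list of floats)."""
    return [min(range(len(levels)), key=lambda k: abs(levels[k] - w))
            for w in weights]

def score(codes, levels, x):
    """Dot product <w_hat, x> evaluated directly on the codes:
    first sum the features sharing each code, then apply one
    multiplication per level - no decompressed weight vector."""
    acc = [0.0] * len(levels)
    for c, xi in zip(codes, x):
        acc[c] += xi
    return sum(l * a for l, a in zip(levels, acc))
```

With 8 levels each weight needs only 3 bits, and the per-level accumulation trick is what removes the decompression step from the inner loop.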
For 3D active measurement methods using a video projector, there is an implicit limitation that the projected patterns must be in focus on the target object. This limitation sets severe constraints on the possible depth range for reconstruction. To overcome the problem, in this paper we propose a Depth from Defocus (DfD) method using multiple patterns with different in-focus depths to expand the depth range. With this method, not only is the depth range extended, but the shape can also be recovered even if there is an obstacle between the projector and the target, owing to the large aperture of the projector. Furthermore, because DfD does not require a baseline between the camera and the projector, occlusion does not occur with our method. To verify the effectiveness of the method, several experiments using an actual system were conducted to estimate the depth of several objects.
This paper introduces a novel method for image classification using local feature descriptors. The method characterizes the distribution of local descriptors by linear subspaces and extracts image features from them. The extracted features are transformed into more discriminative features by linear discriminant analysis and employed for recognizing image categories. Experimental results demonstrate that this method is competitive with the Fisher kernel method in terms of classification accuracy.