In recent years, dramatic improvements have been made to computer hardware. In particular, the number of cores on a chip has been growing exponentially, enabling an ever-increasing number of processes to be executed in parallel. Having been originally developed for single-core processors, database (DB) management systems (DBMSs) running on multicore processors suffer from cache conflicts as the number of concurrently executing DB processes (DBPs) increases. Therefore, a cache-efficient solution for arranging the execution of concurrent DBPs on multicore platforms would be highly attractive for DBMSs. In this paper, we propose CARIC-DA, middleware for achieving higher performance in DBMSs on multicore processors, by reducing cache misses with a new cache-conscious dispatcher for concurrent queries. CARIC-DA logically range-partitions the dataset into multiple subsets. This enables different processor cores to access different subsets by ensuring that different DBPs are pinned to different cores and by dispatching queries to DBPs according to the data-partitioning information. In this way, CARIC-DA is expected to achieve better performance via a higher cache hit rate for the private cache of each core. It can also balance the loads between cores by changing the range of each subset. Note that CARIC-DA is pure middleware, meaning that it avoids any modification to existing operating systems (OSs) and DBMSs, thereby making it more practical. This is important because the source code for existing DBMSs is large and complex, making it very expensive to modify. We implemented a prototype that uses unmodified existing Linux and PostgreSQL environments, and evaluated the effectiveness of our proposal on three different multicore platforms. The performance evaluation against benchmarks revealed that CARIC-DA achieved improved cache hit rates and higher performance.
Power-aware distributed file systems for efficient Big Data processing are increasingly moving towards power-proportional designs. However, current data placement methods for such systems have not given careful consideration to the effect of gear-shifting during operations. If the system wants to shift to a higher gear, it must reallocate the updated datasets that were modified in a lower gear when a subset of the nodes was inactive, but without disrupting the servicing of requests from clients. Inefficient gear-shifting that requires a large amount of data reallocation greatly degrades the system performance. To address this challenge, this paper proposes a data placement method known as Accordion, which uses data replication to arrange the data layout comprehensively and provide efficient gear-shifting. Compared with current methods, Accordion reduces the amount of data transferred, which significantly shortens the period required to reallocate the updated data during gear-shifting then able to improve the performance of the systems. The effect of this reduction is larger with higher gears, so Accordion is suitable for smooth gear-shifting in multigear systems. Moreover, the times when the active nodes serve the requests are well distributed, so Accordion is capable of higher scalability than existing methods based on the I/O throughput performance. Accordion does not require any strict constraint on the number of nodes in the system therefore our proposed method is expected to work well in practical environments. Extensive empirical experiments using actual machines with an Accordion prototype based on the Hadoop Distributed File System demonstrated that our proposed method significantly reduced the period required to transfer updated data, i.e., by 66% compared with an existing method.
Filtering uninteresting data is important to utilize “big data”. Skyline query is popular technique to filter uninteresting data, in which it selects a set of objects that are not dominated by another from a given large database. However, a skyline query often retrieves too many objects to analyze intensively especially for high-dimensional dataset. To solve the problem, k-dominant skyline queries have been introduced. The size of databases sometimes become too large to compute in a centralized environment. Conventional algorithms for computing k-dominant skyline queries are not well suited for parallel and distributed environments, such as the MapReduce framework. In this paper, we consider an efficient parallel algorithm to process k-dominant skyline query in MapReduce framework. Extensive experiments demonstrate the scalability of proposed algorithm for synthetic big datasets under different settings of data distribution, dimensionality, and cardinality.
Objects tracking methods have been wildly used in the field of video surveillance, motion monitoring, robotics and so on. Particle filter is one of the promising methods, but it is difficult to apply to real-time objects tracking because of its high computation cost. In order to reduce the processing cost without sacrificing the tracking quality, this paper proposes a new method for real-time 3D objects tracking, using parallelized particle filter algorithms by MapReduce architecture which is running on GPGPU. Our methods are as follows. First, we use a Kinect to get the 3D information of objects. Unlike the conventional 2D-based objects tracking, 3D objects tracking adds depth information. It can track not only from the x and y axis but also from the z axis, and the depth information can correct some errors in 2D objects tracking. Second, to solve the high computation cost problem, we use the MapReduce architecture on GPGPU to parallelize the particle filter algorithm. We implement the particle filter algorithms on GPU and evaluate the performance by actually running a program on CUDA5.5.
Developing a practical and accurate statistical parser for low-resourced languages is a hard problem, because it requires large-scale treebanks, which are expensive and labor-intensive to build from scratch. Unsupervised grammar induction theoretically offers a way to overcome this hurdle by learning hidden syntactic structures from raw text automatically. The accuracy of grammar induction is still impractically low because frequent collocations of non-linguistically associable units are commonly found, resulting in dependency attachment errors. We introduce a novel approach to building a statistical parser for low-resourced languages by using language parameters as a guide for grammar induction. The intuition of this paper is: most dependency attachment errors are frequently used word orders which can be captured by a small prescribed set of linguistic constraints, while the rest of the language can be learned statistically by grammar induction. We then show that covering the most frequent grammar rules via our language parameters has a strong impact on the parsing accuracy in 12 languages.
Memory leak occurs when useless objects cannot be released for a long time during program execution. Memory leaked objects may cause memory overflow, system performance degradation and even cause the system to crash when they become serious. This paper presents a dynamic approach for detecting and measuring memory leaked objects in Java programs. First, our approach tracks the program by JDI and records heap information to find out the potentially leaked objects. Second, we present memory leaking confidence to measure the influence of these objects on the program. Finally, we select three open-source programs to evaluate the efficiency of our approach. Furthermore, we choose ten programs from DaCapo 9.12 benchmark suite to reveal the time overhead of our approach. The experimental results show that our approach is able to detect and measure memory leaked objects efficiently.
Ontology mapping is important in many areas, such as information integration, semantic web and knowledge management. Thus the effectiveness of ontology mapping needs to be further studied. This paper puts forward a mapping method between different ontology concepts in the same field. Firstly, the algorithms of calculating four individual similarities (the similarities of concept name, property, instance and structure) between two concepts are proposed. The algorithm features of four individual similarities are as follows: a new WordNet-based method is used to compute semantic similarity between concept names; property similarity algorithm is used to form property similarity matrix between concepts, then the matrix will be processed into a numerical similarity; a new vector space model algorithm is proposed to compute the individual similarity of instance; structure parameters are added to structure similarity calculation, structure parameters include the number of properties, instances, sub-concepts, and the hierarchy depth of two concepts. Then similarity of each of ontology concept pairs is represented by a vector. Finally, Support Vector Machine (SVM) is used to accomplish mapping discovery by training and learning the similarity vectors. In this algorithm, Harmony and reliability are used as the weights of the four individual similarities, which increases the accuracy and reliability of the algorithm. Experiments achieve good results and the results show that the proposed method outperforms many other methods of similarity-based algorithms.
Recently, the ratio of probability density functions was demonstrated to be useful in solving various machine learning tasks such as outlier detection, non-stationarity adaptation, feature selection, and clustering. The key idea of this density ratio approach is that the ratio is directly estimated so that difficult density estimation is avoided. So far, parametric and non-parametric direct density ratio estimators with various loss functions have been developed, and the kernel least-squares method was demonstrated to be highly useful both in terms of accuracy and computational efficiency. On the other hand, recent study in pattern recognition exhibited that deep architectures such as a convolutional neural network can significantly outperform kernel methods. In this paper, we propose to use the convolutional neural network in density ratio estimation, and experimentally show that the proposed method tends to outperform the kernel-based method in outlying image detection.
Tracking algorithms for arbitrary objects are widely researched in the field of computer vision. At the beginning, an initialized bounding box is given as the input. After that, the algorithms are required to track the objective in the later frames on-the-fly. Tracking-by-detection is one of the main research branches of online tracking. However, there still exist two issues in order to improve the performance. 1) The limited processing time requires the model to extract low-dimensional and discriminative features from the training samples. 2) The model is required to be able to balance both the prior and new objectives' appearance information in order to maintain the relocation ability and avoid the drifting problem. In this paper, we propose a real-time tracking algorithm called coupled randomness tracking (CRT) which focuses on dealing with these two issues. One randomness represents random projection, and the other randomness represents online random forests (ORFs). In CRT, the gray-scale feature is compressed by a sparse measurement matrix, and ORFs are used to train the sample sequence online. During the training procedure, we introduce a tree discarding strategy which helps the ORFs to adapt fast appearance changes caused by illumination, occlusion, etc. Our method can constantly adapt to the objective's latest appearance changes while keeping the prior appearance information. The experimental results show that our algorithm performs robustly with many publicly available benchmark videos and outperforms several state-of-the-art algorithms. Additionally, our algorithm can be easily utilized into a parallel program.
Topic features are useful in improving text summarization. However, independency among topics is a strong restriction on most topic models, and alleviating this restriction can deeply capture text structure. This paper proposes a hybrid topic model to generate multi-document summaries using a combination of the Hidden Topic Markov Model (HTMM), the surface texture model and the topic transition model. Based on the topic transition model, regular topic transition probability is used during generating summary. This approach eliminates the topic independence assumption in the Latent Dirichlet Allocation (LDA) model. Meanwhile, the results of experiments show the advantage of the combination of the three kinds of models. This paper includes alleviating topic independency, and integrating surface texture and shallow semantic in documents to improve summarization. In short, this paper attempts to realize an advanced summarization system.
This report describes a robust method of instantaneous heart rate (IHR) extraction from noisy electrocardiogram (ECG) signals. Generally, R-waves are extracted from ECG using a threshold to calculate the IHR from the interval of R-waves. However, noise increases the incidence of misdetection and false detection in wearable healthcare systems because the power consumption and electrode distance are limited to reduce the size and weight. To prevent incorrect detection, we propose a short-time autocorrelation (STAC) technique. The proposed method extracts the IHR by determining the search window shift length which maximizes the correlation coefficient between the template window and the search window. It uses the similarity of the QRS complex waveform beat-by-beat. Therefore, it has no threshold calculation process. Furthermore, it is robust against noisy environments. The proposed method was evaluated using MIT-BIH arrhythmia and noise stress test databases. Simulation results show that the proposed method achieves a state-of-the-art success rate of IHR extraction in a noise stress test using a muscle artifact and a motion artifact.
To enhance the privacy of vehicle owners, combinatorial certificate management schemes assign each certificate to a large enough group of vehicles so that it will be difficult to link a certificate to any particular vehicle. When an innocent vehicle shares a certificate with a misbehaving vehicle and the certificate on the misbehaving vehicle has been revoked, the certificate on the innocent vehicle also becomes invalid and is said to be covered. When a group of misbehaving vehicles collectively share all the certificates assigned to an innocent vehicle and these certificates are revoked, the innocent vehicle is said to be covered. We point out that the previous analysis of the vehicle cover probability is not correct and then provide a new and exact analysis of the vehicle cover probability.
This paper presents a novel peer-to-peer protocol to efficiently distribute virtual machine images in a datacenter. A primary idea of it is to improve the performance of peer-to-peer content delivery by employing deduplication to take advantage of similarity both among and within VM images in cloud datacenters. The efficacy of the proposed scheme is validated through an evaluation that demonstrates substantial performance gains.
Appearance changes conform to certain rules for a same person,while for different individuals the changes are uncontrolled. Hence, this paper studies the age progression rules to tackle face verification task. The age progression rules are discovered in the difference space of facial image pairs. For this, we first represent an image pair as a matrix whose elements are the difference of a set of visual words. Thereafter, the age progression rules are trained using Support Vector Machine (SVM) based on this matrix representation. Finally, we use these rules to accomplish the face verification tasks. The proposed approach is tested on the FGnet dataset and a collection of real-world images from identification card. The experimental results demonstrate the effectiveness of the proposed method for verification of identity.
Robust crater recognition is a research focus on deep space exploration mission, and sparse representation methods can achieve desirable robustness and accuracy. Due to destruction and noise incurred by complex topography and varied illumination in planetary images, a robust crater recognition approach is proposed based on dictionary learning with a low-rank error correction model in a sparse representation framework. In this approach, all the training images are learned as a compact and discriminative dictionary. A low-rank error correction term is introduced into the dictionary learning to deal with gross error and corruption. Experimental results on crater images show that the proposed method achieves competitive performance in both recognition accuracy and efficiency.