This paper reviews the recent literature on occlusion handling in online visual tracking. The discussion first explores the visual tracking realm and pinpoints why the occlusion problem demands dedicated attention. The findings suggest that although occlusion detection can facilitate tracking considerably, it has been largely ignored, and that the mainstream of the research gathers around human tracking and crowd analysis. This is followed by a novel taxonomy of occlusion types and of the challenges that arise during and after the emergence of an occlusion. The discussion then investigates approaches to handling occlusion on a frame-by-frame basis. The analysis reveals that researchers have examined every aspect of tracker design hypothesized to benefit robust tracking under occlusion: state-of-the-art solutions identified in the literature involve various camera settings, simplifying assumptions, appearance and motion models, target state representations, and observation models. The identified clusters are then analyzed and discussed, their merits and demerits are explained, and areas of potential future research are presented.
MapReduce and its open-source implementation Hadoop are now widely deployed for big data analysis. Because MapReduce runs over a cluster of many machines, data transfer often becomes a bottleneck in job processing. In this paper, we explore the influence of data transfer on job processing performance and analyze how job performance deteriorates under data-transfer-induced congestion at disk I/O and/or network I/O. Based on this analysis, we extend Hadoop's Heartbeat messages to carry the real-time system status of each machine, such as the disk I/O and link usage rates. This enhancement makes Hadoop's scheduler aware of each machine's workload, so it can make more accurate scheduling decisions. Experiments were conducted to evaluate the effectiveness of the enhanced scheduling methods, and discussions are provided comparing the several proposed scheduling policies.
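The load-aware scheduling idea can be sketched in a few lines. This is an illustrative toy, not Hadoop's actual scheduler: the node names, the status tuple, and the weight `alpha` are hypothetical stand-ins for the extended Heartbeat fields.

```python
# Illustrative toy, not Hadoop's scheduler: pick the node whose combined
# disk I/O and link usage (as would be reported in extended Heartbeat
# messages) is lowest. Node names and the weight `alpha` are hypothetical.

def pick_node(heartbeats, alpha=0.5):
    """heartbeats: node -> (disk_io_rate, link_usage_rate), each in [0, 1]."""
    def load(node):
        disk, link = heartbeats[node]
        return alpha * disk + (1 - alpha) * link
    return min(heartbeats, key=load)

beats = {"node-a": (0.9, 0.2), "node-b": (0.3, 0.4), "node-c": (0.7, 0.8)}
print(pick_node(beats))  # node-b: lowest combined load (0.35)
```

A real scheduler would also weigh data locality and task requirements; the point here is only that per-node I/O status turns scheduling into a simple comparison.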
Adherence to coding conventions during the code production stage of software development is essential. Benefits include enabling programmers to quickly understand the context of shared code, communicate with one another in a consistent manner, and easily maintain the source code at low cost. In reality, however, programmers tend to doubt or ignore the degree to which the quality of their code is affected by adherence to these guidelines. This paper addresses research questions such as “Do violations of coding conventions affect the readability of the produced code?”, “What kinds of coding violations reduce code readability?”, and “How much do variable factors such as developer experience, project size, team size, and project maturity influence coding violations?” To answer these questions, we examined 210 open-source Java projects against 117 coding conventions from the Sun standard checklist. We believe our findings and the analysis approach used in this paper will encourage programmers and QA managers to develop their own customized and effective coding style guidelines.
Graphical User Interfaces (GUIs) are critical for the security, safety, and reliability of software systems. Injection attacks, for instance via SQL, succeed due to insufficient input validation and can be avoided if contract-based approaches, such as Design by Contract, are followed in the software development lifecycle of GUIs. This paper proposes a model-based testing approach for detecting GUI data contract violations, which may result in serious failures such as system crashes. A contract-based model of GUI data specifications is used to develop test scenarios and to serve as the test oracle. The technique uses multi-terminal binary decision diagrams, designed as an integral part of decision-table-augmented event sequence graphs, to implement a GUI testing process. A case study validating the approach on a port scanner written in the Java programming language is presented.
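A decision-table test oracle for a GUI input contract can be illustrated in miniature. The port-number field, its contract, and the "observed" behaviors below are invented for illustration and are not taken from the paper's case study.

```python
# Hypothetical decision-table oracle for a port-number input field; the
# field, its contract, and the observed behaviors are invented examples.

def expected_outcome(text):
    """Contract: accept only integer port numbers in 1..65535."""
    if not text.isdigit():
        return "reject"
    return "accept" if 1 <= int(text) <= 65535 else "reject"

# behaviors observed while driving the GUI (invented test data)
observed = {"80": "accept", "0": "accept", "abc": "reject"}
violations = [t for t in observed if expected_outcome(t) != observed[t]]
print(violations)  # ['0'] - the GUI accepted an out-of-range port
```

Each discrepancy between the table's expected outcome and the observed behavior is a contract violation that the testing process reports.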
We propose a spread spectrum digital watermarking method with quantization index modulation (QIM) and evaluate it on the basis of the IHC evaluation criteria. The spread spectrum technique makes watermarks robust by using spread codes; since the watermarks carry redundancy, messages can be decoded even from a degraded stego-image. Under the IHC evaluation criteria, messages must be decoded without the original image. To do so, we propose a method in which watermarks are generated using the spread spectrum technique and embedded by QIM, an embedding method that allows decoding without the original image. The IHC evaluation criteria include JPEG compression and cropping as attacks. Because JPEG compression is lossy, errors occur in the watermarks; because cropping puts the watermarks in stego-images out of synchronization, the position of the embedded watermarks may be unclear, and this position must be detected during decoding. Therefore, a digital watermarking method requires both error correction and synchronization. As a countermeasure against cropping, the original image is divided into segments for embedding, and each segment is further divided into 8×8-pixel blocks; a watermark is embedded into a DCT coefficient of each block by QIM. To synchronize during decoding, the proposed method uses the correlation between watermarks and spread codes. After synchronization, the watermarks are extracted by QIM, and the messages are then estimated from them. The proposed method was evaluated on the basis of the IHC evaluation criteria, which require a PSNR higher than 30 dB. Ten 1920×1080 rectangular regions were cropped from each stego-image, and 200-bit messages were decoded from these regions; their BERs were calculated to assess the tolerance. As a result, the BERs were less than 1.0%, and the average PSNR was 46.70 dB. Therefore, our method achieves high image quality under the IHC evaluation criteria.
In addition, the proposed method was evaluated using StirMark 4.0. We found that it is robust not only to JPEG compression and cropping but also to additive noise and Gaussian filtering. Moreover, the method has the advantage that detection is fast, since synchronization is processed in 8×8-pixel blocks.
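The QIM step described above can be sketched for a single scalar coefficient: a bit selects one of two interleaved quantization lattices, and decoding picks the nearest lattice. This is a minimal illustration with a hypothetical step size, not the authors' full scheme.

```python
# Minimal scalar QIM sketch (illustrative, not the authors' full method):
# embed one bit into a DCT coefficient by quantizing it onto one of two
# interleaved lattices; decode by finding the nearest lattice point.
# The step size `delta` is a hypothetical parameter.

def qim_embed(coeff, bit, delta=8.0):
    """Quantize `coeff` onto the lattice for `bit` (0 or 1)."""
    offset = delta / 2 if bit else 0.0
    return round((coeff - offset) / delta) * delta + offset

def qim_decode(coeff, delta=8.0):
    """Return the bit whose lattice point is nearest to `coeff`."""
    d0 = abs(coeff - qim_embed(coeff, 0, delta))
    d1 = abs(coeff - qim_embed(coeff, 1, delta))
    return 0 if d0 <= d1 else 1

c = qim_embed(37.3, 1)   # 36.0, nearest point on the bit-1 lattice
print(c, qim_decode(c))  # decoding recovers the embedded bit: 1
```

Note the robustness margin: any perturbation smaller than `delta / 4` still decodes to the embedded bit, which is what lets the watermark survive mild JPEG quantization noise.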
A two-handed distance control method is proposed for precisely and efficiently manipulating a virtual 3D object by hand in an immersive virtual reality environment. The proposed method enhances direct manipulation by hand: the distance between the two hands is used to precisely control and efficiently adjust both the position of an object and the viewpoint. The two-handed method is evaluated against the previously proposed one-handed speed control method, which adjusts the position of an object according to the speed of one hand. Experimental evaluation shows that the two-handed methods for position and viewpoint adjustment perform best among the six combinations of control and adjustment methods.
Digital video signals are subject to several distortions due to compression, transmission over noisy channels, or video processing. Video quality evaluation has therefore become a necessity for broadcasters and content providers interested in offering high video quality to their customers. Thus, an objective no-reference video quality assessment metric is proposed, based on a sigmoid model using spatial-temporal features weighted by parameters obtained by solving a nonlinear least-squares problem with the Levenberg-Marquardt algorithm. Experimental results show that when applied to MPEG-2 streams our method presents better linearity than full-reference metrics, and for H.264 streams its performance is close to that achieved by full-reference metrics.
A cascaded video denoising method based on frame averaging is proposed in this paper. A novel segmentation approach using intensity and the structure tensor is used for change compensation, which effectively suppresses noise while preserving image structure. The cascaded framework solves the problem of residual noise left by single-frame averaging, and the classical Wiener filter is used for spatial denoising in changing areas. Because it does not rely on future frames, our algorithm runs in real time on an FPGA. Experiments on standard grayscale videos at various noise levels demonstrate that the proposed method is competitive with current state-of-the-art video denoising algorithms in both peak signal-to-noise ratio and structural similarity, particularly at high noise levels.
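The change-gated averaging step can be sketched as follows. The blending factor and the flat-list pixel representation are illustrative assumptions, not the paper's implementation; in the paper, the change mask comes from the intensity/structure-tensor segmentation.

```python
# Illustrative sketch of change-gated frame averaging (hypothetical
# parameters): static pixels are blended into the running average to
# suppress noise; pixels flagged as changed keep their current value so
# moving structure is preserved (and would go to the spatial Wiener filter).

def average_frame(prev_avg, cur, changed, alpha=0.5):
    """Blend the current frame into the running average outside changed areas.

    prev_avg, cur: flat lists of pixel intensities.
    changed: flat list of booleans from the segmentation step.
    """
    return [c if ch else alpha * p + (1 - alpha) * c
            for p, c, ch in zip(prev_avg, cur, changed)]

avg = average_frame([100, 100, 100], [110, 90, 200], [False, False, True])
print(avg)  # [105.0, 95.0, 200]
```

Because each output depends only on the previous average and the current frame, the recursion needs no future frames, which is what makes a streaming FPGA implementation possible.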
In this paper, we propose a method for reconstructing 3D sequential patterns from multiple images without knowing exact image correspondences and without calibrating the linear camera sensitivity parameters on intensity. A sequential pattern is defined as a series of colored 3D points. We assume that the series of points is observed in multiple images, but that the correspondence of individual points among the images is unknown. For reconstructing sequential patterns, we consider a camera projection model that combines geometric and photometric information of objects, and we formulate camera projections in frequency space. Using the multi-view relationship under this new projection model, we show that 3D sequential patterns can be reconstructed without knowing the exact correspondence of individual image points; moreover, the recovered 3D patterns do not suffer from changes in the linear camera sensitivity parameters. The effectiveness of the proposed method is demonstrated using real images.
We propose a fully automatic method for detecting the carotid artery in volumetric ultrasound images as a preprocessing stage for building three-dimensional images of the structure of the carotid artery. The proposed detector uses support vector machine classifiers to discriminate between carotid artery images and non-carotid artery images using two kinds of LBP-based features, switching between them depending on the anatomical position along the artery. We evaluate the method on actual clinical cases; detection accuracies are 100%, 87.5%, and 68.8% for the common, internal, and external carotid artery sections, respectively.
Breast cancer is a serious disease worldwide and one of the leading causes of cancer death among women. Traditional diagnosis is not only time consuming but also prone to error. Hence, artificial intelligence (AI), especially neural networks, has been widely used to assist in detecting cancer. In recent years, the computational ability of a single neuron has attracted increasing attention, and the main computational capacity of a neuron is located in its dendrites. In this paper, a novel neuron model with dendritic nonlinearity (NMDN) is proposed to classify breast cancer in the Wisconsin Breast Cancer Database (WBCD). In an NMDN, the dendrites realize excitatory, inhibitory, constant-1, and constant-0 synapses nonlinearly instead of simply weighting their inputs, and the nonlinear interaction among the synapses on a dendrite is defined as the product of the synaptic inputs. The soma sums the products of all branches to produce an output. A back-propagation-based learning algorithm is introduced to train the NMDN, and its performance is compared with classic back-propagation neural networks (BPNNs). Simulation results indicate that the NMDN is superior in terms of accuracy, convergence rate, stability, and area under the ROC curve (AUC). Moreover, branches whose connections evolve to 0 can be eliminated from the dendrite morphology to reduce computational load without affecting classification performance. The results suggest that the computational ability of the neuron has been undervalued and that the proposed NMDN can be an interesting choice for medical researchers in further research.
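The dendritic forward pass described above can be sketched minimally: each synapse applies a nonlinearity, each dendrite multiplies its synaptic outputs, and the soma sums the branch products. The sigmoidal synapse form and all parameter values below are hypothetical illustrations, not the paper's learned model.

```python
import math

# Illustrative sketch of a product-of-synapses dendritic neuron
# (hypothetical synapse form and parameters, not the trained NMDN).

def synapse(x, k, theta):
    """Sigmoidal synapse: excitatory for k > 0, inhibitory for k < 0."""
    return 1.0 / (1.0 + math.exp(-k * (x - theta)))

def dendritic_forward(inputs, dendrites):
    """dendrites: list of branches; each branch is a per-input list of
    (k, theta) synapse parameters. The soma sums the branch products."""
    total = 0.0
    for branch in dendrites:
        prod = 1.0
        for x, (k, theta) in zip(inputs, branch):
            prod *= synapse(x, k, theta)  # multiplicative interaction
        total += prod
    return total

out = dendritic_forward([0.8, 0.2],
                        [[(5.0, 0.5), (5.0, 0.5)],     # branch 1
                         [(-5.0, 0.5), (5.0, 0.5)]])   # branch 2
print(0.0 < out < 2.0)  # True: output is bounded by the branch count
```

Since each synapse output lies in (0, 1), a branch whose synapses all saturate near 0 contributes nothing, which is why such branches can be pruned from the morphology without changing the classification.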
We evaluated the software maintenance of an open-source cloud platform system that we developed using an agile software development method. We previously reported a rapid service launch using the agile method despite the large scale of the development. For this study, we analyzed inquiries and the defect removal efficiency of the recently developed software over one year of operation. We found that the defect removal efficiency was 98%, indicating that we could achieve sufficient quality in spite of large-scale agile development. In terms of the maintenance process, we answered all inquiries within three business days and conducted version upgrades quickly. Thus, we conclude that software maintenance under agile software development can be performed effectively.
In this Letter, we investigate the outage performance of MIMO amplify-and-forward (AF) multihop relay networks with maximum ratio transmission/receive antenna selection (MRT/RAS) over Nakagami-m fading channels, with and without co-channel interference (CCI). In particular, lower bounds for the outage probability with and without CCI are derived, which provide an efficient means of evaluating the joint effects of key system parameters such as the number of antennas, the interfering power, and the severity of channel fading. In addition, the asymptotic behavior of the outage probability is investigated, revealing that the full diversity order is achieved regardless of CCI. Finally, simulation results are provided to confirm the correctness of the derived analytical results.
In localization systems based on time difference of arrival (TDOA), multipath fading and interference sources deteriorate localization performance. To address this, a TDOA estimator based on blind beamforming in the frequency domain is proposed. An additional constraint is designed for blind beamforming based on maximum power collecting (MPC), and the relationship between the weight coefficients of the beamformer and the TDOA is revealed. Based on this relationship, the TDOA is estimated via the discrete Fourier transform (DFT). The efficiency of the proposed estimator is demonstrated by simulation results.
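To see why the frequency domain is natural here, note that a delay appears as a linear phase term recoverable with the DFT. The sketch below is a generic cross-power-spectrum delay estimate, not the paper's blind-beamforming estimator; the signal and delay are synthetic.

```python
import numpy as np

# Generic frequency-domain TDOA sketch (not the paper's MPC beamformer):
# a delay of d samples multiplies the spectrum by exp(-j*2*pi*k*d/N), so
# the inverse DFT of the cross-power spectrum peaks at lag d.

def tdoa(x, y):
    """Return the integer number of samples by which x lags y (circular)."""
    X, Y = np.fft.fft(x), np.fft.fft(y)
    cc = np.fft.ifft(X * np.conj(Y)).real    # circular cross-correlation
    lag = int(np.argmax(cc))
    n = len(x)
    return lag - n if lag > n // 2 else lag  # map to a signed delay

rng = np.random.default_rng(0)
s = rng.standard_normal(256)
delayed = np.roll(s, 5)                      # delayed[n] = s[n - 5]
print(tdoa(delayed, s))                      # x lags y by 5 samples
```

Multipath and interference smear this correlation peak, which is the degradation the proposed beamforming-based estimator is designed to counter.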
We present a new framework for embedding holographic halftone watermarking data into images by fusing scale-related wavelet coefficients. The halftone watermarking image is obtained with an error-diffusion method and converted into a Fresnel hologram, which serves as the initial password. After encryption, the watermarking image, scrambled by an Arnold transform, is embedded into the host image during the halftoning process. We characterize the multi-scale representation of the original image using the discrete wavelet transform: the boundary information of the target image is fused through the correlation of wavelet coefficients across transform layers to increase the pixel resolution scale. We apply this inter-scale fusion to obtain the fine-scale fusion coefficients, which account for both the detail and the approximation information of the image. With the proposed method, the watermarking information can be embedded into the host image and recovered despite the halftoning operation. Experimental results show that the proposed approach provides better security and robustness against JPEG compression and other attacks than previous alternatives.
An appropriate similarity measure between images is one of the key techniques in search-based image annotation models. To capture the nonlinear relationships between visual features and image semantics, many kernel distance metric learning (KML) algorithms have been developed. However, when challenged with large-scale image annotation, their metrics cannot explicitly represent the similarity between image semantics, and the algorithms suffer from high computational cost, losing their efficiency. In this paper, we propose a manifold kernel metric learning (M_KML) algorithm that simultaneously learns the manifold structure and the image annotation metrics. The main merit of M_KML is that the distance metrics are built on the intrinsic manifold structure of the image features, and the dimensionality reduction performed on this structure handles the high-dimensionality challenge faced by KML. Experiments verify the method's efficiency and effectiveness by comparing it with state-of-the-art image annotation approaches.
Using traditional single-layer dictionary learning methods, it is difficult to reveal the complex structures hidden in hyperspectral images. Motivated by deep learning techniques, a deep dictionary learning approach is proposed for hyperspectral image denoising, consisting of hierarchical dictionary learning, feature denoising, and fine-tuning. Hierarchical dictionary learning helps uncover the hidden factors in the spectral dimension, while fine-tuning helps preserve the spectral structure. Experiments demonstrate the effectiveness of the proposed approach.
This paper describes an evaluation of the temporally stable spectral envelope estimator proposed in our previous research, which demonstrated that the algorithm can synthesize speech as natural as the input speech. This paper focuses on an objective comparison of the proposed algorithm with two modern estimation algorithms in terms of estimation performance and temporal stability. The results show that the proposed algorithm is superior to the others in both respects.
In traditional speech emotion recognition systems, recognition rates decrease dramatically when the training and testing utterances come from different corpora. To tackle this problem, in this letter, inspired by recent developments in sparse coding and transfer learning, a novel sparse transfer learning method is presented for speech emotion recognition. First, a sparse coding algorithm is employed to learn a robust sparse representation of emotional features. Then, a sparse transfer learning approach is presented in which the distance between the feature distributions of the source and target datasets is used to regularize the sparse coding objective. The experimental results demonstrate that, compared with the automatic recognition approach, the proposed method achieves promising improvements in recognition rates and significantly outperforms the classic dimension-reduction-based transfer learning approach.
Barrel distortion is a critical problem that can hinder the successful application of wide-angle cameras. This letter presents an implementation method for fast correction of barrel distortion. In the proposed method, the required scaling factor is obtained by interpolating the mapping polynomial with a non-uniform spline instead of evaluating it directly, which reduces the number of computations required for the correction. This reduction leads to faster correction while maintaining quality: compared with the conventional method, the correction time is reduced by about 89%, and the correction quality is 35.3 dB in terms of average peak signal-to-noise ratio.
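The tabulate-then-interpolate idea can be illustrated with a toy radial mapping polynomial. Piecewise-linear interpolation stands in here for the paper's non-uniform spline, and the distortion coefficients `k1`, `k2` are hypothetical.

```python
# Toy illustration: approximate a radial distortion-correction polynomial
# from a small table of knot values instead of evaluating it per pixel.
# Coefficients and knot spacing are hypothetical; linear interpolation
# stands in for the paper's non-uniform spline.

def scale_poly(r, k1=0.12, k2=0.03):
    """Radial scaling factor of a simple barrel-distortion model."""
    return 1 + k1 * r * r + k2 * r ** 4

knots = [i / 8 for i in range(9)]           # tabulated radii in [0, 1]
table = [scale_poly(r) for r in knots]

def scale_interp(r):
    """Approximate scale_poly(r) from the precomputed table."""
    i = min(int(r * 8), 7)
    t = r * 8 - i
    return (1 - t) * table[i] + t * table[i + 1]

err = max(abs(scale_interp(r / 100) - scale_poly(r / 100)) for r in range(101))
print(err < 1e-2)  # True: the table tracks the polynomial closely
```

Per pixel, the lookup replaces two multiplications and a fourth power with one table read and a blend, which is where the computation savings come from.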
In this letter, we study the rate-distortion (R-D) properties of independent sources based on MSE and SSIM, and compare bit allocation performance under the MINAVE and MINMAX criteria in video encoding. The results show that with SSIM, MINMAX yields average distortion similar to MINAVE, illustrating the consistency between the two criteria in independent perceptual video coding. Furthermore, MINMAX produces lower quality fluctuation, which shows its advantage for perceptual video coding.
Texture feature descriptors such as local binary patterns (LBP) have proven effective for ground-based cloud classification. Traditionally, these descriptors are predefined in a handcrafted way. In this paper, we propose a novel method that automatically learns discriminative features from labeled samples for ground-based cloud classification. Our key idea is to learn these features through mutual information maximization, which learns a transformation matrix for the local difference vectors of LBP. The experimental results show that our learned features greatly improve ground-based cloud classification performance compared with other state-of-the-art methods.
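For reference, the handcrafted 3×3 LBP code that the learned features generalize can be computed as follows (a minimal sketch; real pipelines histogram these codes over image regions).

```python
# Minimal 3x3 local binary pattern (LBP): threshold the 8 neighbors
# against the center pixel and pack the comparison bits into one byte.

def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch (list of 3 rows of 3 intensities)."""
    c = patch[1][1]
    # neighbors in clockwise order starting at the top-left
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (i, j) in enumerate(order):
        if patch[i][j] >= c:
            code |= 1 << bit
    return code

print(lbp_code([[9, 9, 9], [1, 5, 1], [1, 1, 1]]))  # 7: only the top row >= center
```

The local difference vector mentioned above is the pre-threshold version of this pattern (the raw neighbor-minus-center differences), which is what the learned transformation matrix acts on.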
This paper proposes an algorithm to expose spliced photographs. First, a graph-based segmentation, which defines a predictor to measure the boundary evidence between two neighboring regions, is used to make greedy merging decisions. The algorithm then computes a prediction-error image using non-negative linear least-squares prediction. For each pair of neighboring segmented regions, it gathers statistical features and calculates gray-level co-occurrence matrix features. K-means clustering is applied to create a dictionary, and the vector-quantization histogram is taken as a fixed-length feature vector. For a tampered image, the noise is assumed to follow a zero-mean Gaussian distribution; the proposed method checks the similarity between the noise distribution and a zero-mean Gaussian, followed by local flatness and texture measurements. Finally, all features are fed to a support vector machine classifier. The algorithm has low computational cost, and experiments show its effectiveness in exposing forgery.
It is unclear whether Hidden Markov Model (HMM) or Dynamic Time Warping (DTW) mapping is more appropriate for visual speech recognition when only small data samples are available. In this letter, the two approaches are compared in terms of sensitivity to the amount of training data and computing time, with the objective of determining the tipping point. The limited-training-data problem is addressed by exploiting straightforward template matching via weighted DTW (WDTW). The proposed framework refines DTW by adjusting the warping paths with judiciously injected weights to ensure a smooth diagonal path for accurate alignment without added computational load. The proposed WDTW is evaluated on three databases (two in the public domain and one developed in-house) for visual recognition performance. Subsequent experiments indicate that WDTW significantly enhances the recognition rate compared to DTW- and HMM-based algorithms, especially with limited data samples.
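The weight-adjusted DTW idea can be sketched with a simple recurrence in which off-diagonal steps carry a larger weight, biasing the warping path toward the diagonal. The weight values below are hypothetical, not the ones used in the letter.

```python
# Sketch of weighted DTW (hypothetical weights): horizontal and vertical
# steps are penalized more than diagonal ones, so the warping path stays
# near the diagonal at no extra asymptotic cost over plain DTW.

def wdtw(a, b, w_diag=1.0, w_side=1.5):
    """Weighted DTW distance between two sequences of numbers."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = min(w_diag * cost + D[i - 1][j - 1],  # diagonal step
                          w_side * cost + D[i - 1][j],      # vertical step
                          w_side * cost + D[i][j - 1])      # horizontal step
    return D[n][m]

print(wdtw([1, 2, 3], [1, 2, 3]))  # 0.0 for identical sequences
print(wdtw([1, 2, 3], [1, 2, 2, 3]))
```

Setting `w_side` back to 1.0 recovers ordinary DTW, so the weighting adds no computational load, only a preference among paths.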
In this letter, we propose a new semantic part learning approach to address the object detection problem given only bounding boxes with object category labels. Our main observation is that even though the appearance and arrangement of object parts vary across instances of different object categories, the constituent parts still maintain geometric consistency. Specifically, we propose a discriminative clustering method with sparse representation refinement to discover the mid-level semantic part set automatically. Each semantic part detector is then learned by a linear SVM in a one-vs-all manner. Finally, we use the learned part detectors to score the test image and integrate all of their response maps to obtain the detection result. The learned class-generic part detectors can capture objects across different categories. Experimental results show that our approach outperforms several recent competing methods.