The purpose of our research is to reproduce the appearance of frangible historical ink paintings in order to preserve frangible historical documents and illustrations. We propose a method that reproduces both the reflectance and the transmittance of ink paintings simultaneously by stacking multiple sheets of printed paper. First, we measure the relationship between printed ink patterns and their optical properties. Then, by stacking multiple sheets of paper printed with ink patterns chosen according to these measurements, we fabricate a photo-realistic duplicate.
We propose a method for extracting multi-view images from a light field (plenoptic) camera that accurately handles the physical pixel arrangement of this camera. We use a Lytro Illum camera to obtain 4D light field data (a set of multi-viewpoint images) through a micro-lens array. The light field data are multiplexed on a single image sensor, and thus the data are first demultiplexed into a set of multi-viewpoint (sub-aperture) images. However, the demultiplexing process usually includes interpolation of the original data, such as demosaicing for a color filter array and pixel resampling for the hexagonal pixel arrangement of the original sub-aperture images. If this interpolation is performed, some information is added to or lost from the original data. In contrast, we preserve the original data as faithfully as possible and use them directly for super-resolution reconstruction, where the super-resolved image and the corresponding depth map are alternately refined. We experimentally demonstrate the effectiveness of our method in resolution enhancement through comparisons with Light Field Toolbox and Lytro Desktop Application. Moreover, we also discuss another type of light field camera, a Raytrix camera, and describe how it can be handled to extract high-quality multi-view images.
Due to the need to protect personal information and the impracticality of exhaustive data collection, there is an increasing need to deal with datasets at various levels of granularity, such as user-individual data and user-group data. In this study, we propose a new method for jointly analyzing multiple datasets with different granularity. The proposed method is a probabilistic model based on nonnegative matrix factorization, which is derived by introducing latent variables that indicate the high-resolution data underlying the low-resolution data. Experiments on purchase logs show that the proposed method outperforms existing methods. Furthermore, by deriving an extension of the proposed method, we show that the proposed method is a new fundamental approach for analyzing datasets with different granularity.
Malware-infected hosts have typically been detected using network-based Intrusion Detection Systems on the basis of characteristic patterns of HTTP requests collected with dynamic malware analysis. Since attackers continuously modify malicious HTTP requests to evade detection, novel HTTP requests sent from new malware samples need to be exhaustively collected in order to maintain a high detection rate. However, analyzing all new malware samples for a long period is infeasible in a limited amount of time. Therefore, we propose a system for efficiently collecting HTTP requests with dynamic malware analysis. Specifically, our system analyzes a malware sample for a short period and then determines whether the analysis should be continued or suspended. Our system identifies malware samples whose analyses should be continued on the basis of the network behavior in their short-period analyses. To make an accurate determination, we focus on the fact that malware communications resemble natural language from the viewpoint of data structure. We apply the recursive neural network, which has recently exhibited high classification performance in the field of natural language processing, to our proposed system. In the evaluation with 42,856 malware samples, our proposed system collected 94% of novel HTTP requests and reduced analysis time by 82% in comparison with the system that continues all analyses.
To address the low accuracy of recommender systems for long-term users, we propose a top-N-balanced sequential recommendation method based on a recurrent neural network. We postulate and verify that the interactions between users and items are time-dependent in the long term but time-independent in the short term. We balance top-N recommendation and sequential recommendation to generate a better recommendation list by improving the loss function and the generation method. The experimental results demonstrate the effectiveness of our method. Compared with a state-of-the-art recommender algorithm, our method clearly improves the hit rate of the recommendations. Besides improving basic performance, our method can also handle the cold-start problem and provide new users with the same quality of service as existing users.
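The hit rate mentioned above can be illustrated with a minimal sketch. It assumes the common leave-one-out definition (the fraction of users whose held-out item appears in their top-N list); the abstract does not state the exact evaluation protocol, and the function name `hit_rate_at_n` and the toy data are hypothetical.

```python
from typing import Dict, Hashable, List


def hit_rate_at_n(recommended: Dict[Hashable, List[Hashable]],
                  held_out: Dict[Hashable, Hashable],
                  n: int) -> float:
    """Fraction of users whose held-out item is in their top-n list.

    Assumption: one held-out item per user (leave-one-out evaluation).
    """
    if not held_out:
        return 0.0
    hits = 0
    for user, target in held_out.items():
        # Users without a recommendation list count as misses.
        top_n = recommended.get(user, [])[:n]
        if target in top_n:
            hits += 1
    return hits / len(held_out)


# Hypothetical toy data: ranked lists per user and one held-out item each.
recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"], "u3": ["g", "h", "i"]}
truth = {"u1": "b", "u2": "f", "u3": "z"}

print(hit_rate_at_n(recs, truth, n=2))
print(hit_rate_at_n(recs, truth, n=3))
```

With N = 2 only `u1` is a hit; with N = 3 both `u1` and `u2` are hits, so the metric can only grow or stay equal as N increases.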
In recent years, we have witnessed an increasing demand to process queries on large datasets in Non-ordered Discrete Data Spaces (NDDS). In particular, one type of query in an NDDS, called a box query, is used in many emerging applications, including error correction in bioinformatics and network intrusion detection in cybersecurity. Effective indexing methods are necessary for efficiently processing queries on large datasets on disk. However, most existing NDDS indexing methods were not designed for box queries. Several recent indexing methods developed for box queries on a large NDDS dataset on disk are based on the popular data-partitioning approach. Unfortunately, a space-partitioning-based indexing scheme, which is more effective for box queries in an NDDS, has not been studied before. In this paper, we propose a novel indexing method based on space partitioning, called the BINDS-tree, for supporting efficient box queries on a large NDDS dataset on disk. A number of effective strategies, such as node splitting based on minimum span and cross optimal balance, redundancy reduction utilizing a singleton dimension inheritance property, and a space-efficient structure for the split history, are incorporated in the construction algorithm for the BINDS-tree. Experimental results demonstrate that the proposed BINDS-tree significantly improves box query I/O performance compared with that of the state-of-the-art data-partitioning-based NDDS indexing method.
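To make the query type concrete, the following is a minimal sketch of box query semantics evaluated by a linear scan, with no index at all. It assumes the usual NDDS definition, in which a box gives a set of allowed values per dimension and a vector matches when every component lies in the corresponding set. It is not the BINDS-tree; the function name `box_query` and the DNA-like toy data are hypothetical.

```python
from typing import List, Sequence, Set


def box_query(vectors: Sequence[Sequence[str]],
              box: Sequence[Set[str]]) -> List[Sequence[str]]:
    """Return the vectors whose value in every dimension is allowed by the box.

    This is a brute-force scan that only shows what a box query returns;
    an index such as the one described above exists to avoid this scan.
    """
    return [
        v for v in vectors
        if len(v) == len(box)
        and all(value in allowed for value, allowed in zip(v, box))
    ]


# Hypothetical 4-dimensional vectors over the alphabet {A, C, G, T}.
dna = ["ACGT", "ACGA", "TCGT", "AGGT"]
# Allowed values per dimension: A, then C or G, then G, then T.
box = [{"A"}, {"C", "G"}, {"G"}, {"T"}]

matches = box_query(dna, box)
print(matches)  # ['ACGT', 'AGGT']
```

Because the values have no natural order, the box is a Cartesian product of value sets rather than a product of intervals.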
There are different types of social ties among people, and recognizing specific types of relationships, such as family or friends, is of considerable importance. It can be applied to personal credit, criminal investigation, anti-terrorism, and many other business scenarios. So far, machine learning algorithms such as Decision Tree, Support Vector Machine, and Naive Bayes have been used to build social relationship inference models. Although these algorithms can discover family members in some contexts, they still suffer from low accuracy, parameter sensitivity, and weak robustness. In this work, we develop a Novel Family Relationship Recognition (NFRR) algorithm on a telecom dataset for identifying a user's family members from the user's contact list. In the telecom dataset, all attributes are divided into three series: temporal, spatial, and behavioral. First, we discover the most probable places of residence and work using statistical models; then we aggregate the data and select the top-ranked contacts as the user's intimate contacts. Next, we establish the Relational Spectrum Matrix (RSM) of each user and the user's intimate contacts to form communication features. Then we search for the user's nearest neighbors in the labelled training set and generate the user's Specialized Family Spectrum (SFS). Finally, we decide the family relationship by comparing the similarity between the RSM of the intimate contacts and the SFS. We conduct comprehensive experiments to demonstrate the effectiveness of the proposed algorithm, and the experimental results also show that it has lower complexity.
In the library, recognizing the activity of the reader can better uncover the reading habit of the reader and make book management more convenient. In this study, we present the design and implementation of a reading activity recognition approach based on passive RFID tags. By collecting and analyzing the phase profiling distribution feature, our approach can trace the reader's trajectory, recognize which book is picked up, and detect the book misplacement. We give a detailed analysis of the factors that can affect phase profiling in theory and combine these factors with relevant activities. The proposed approach recognizes the activities based on the amplitude of the variation of phase profiling, so that the activities can be inferred in real time through the phase monitoring of tags. We then implement our approach with off-the-shelf RFID equipment, and the experiments show that our approach can achieve high accuracy and efficiency in activity recognition in a real-world situation. We conclude our work and further discuss the necessity of a personalized book recommendation system in future libraries.
We propose a method to find, from a set of 3D CAD assembly models, the assembly models contained in another assembly model given as a query. A 3D CAD assembly model consists of multiple components and is constructed using 3D CAD software. The proposed method identifies assembly models that consist of a subset of the components constituting the query model and whose components have the same layout as that subset. We compute the differences between the shapes and layouts of the components from sinograms, which are constructed by applying the Radon transform to their projections from various angles. We evaluate the proposed method experimentally using assembly models that we prepared as a benchmark. The proposed method can also be used to find the database models that contain a query model.
As big data attracts attention in a variety of fields, research on data exploration for analyzing large-scale scientific data has gained popularity. To support exploratory analysis of scientific data, effective summarization and visualization of the target data as well as seamless cooperation with modern data management systems are in demand. In this paper, we focus on the exploration-based analysis of scientific array data, and define a spatial V-Optimal histogram to summarize it based on the notion of histograms in the database research area. We propose histogram construction approaches based on a general hierarchical partitioning as well as a more specific one, the l-grid partitioning, for effective and efficient data visualization in scientific data analysis. In addition, we implement the proposed algorithms on the state-of-the-art array DBMS, which is appropriate to process and manage scientific data. Experiments are conducted using massive evacuation simulation data in tsunami disasters, real taxi data as well as synthetic data, to verify the effectiveness and efficiency of our methods.
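As background for the histogram notion used above, the following is a minimal sketch of the classic one-dimensional V-Optimal histogram, which chooses bucket boundaries to minimize the total within-bucket sum of squared errors (SSE) by dynamic programming. It is a simplification: the spatial, hierarchical, and l-grid variants in the abstract operate on multi-dimensional arrays and are not reproduced here. The function name `v_optimal_histogram` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def v_optimal_histogram(values: Sequence[float],
                        num_buckets: int) -> Tuple[float, List[int]]:
    """Return (minimal total SSE, start index of each bucket) for a 1D array."""
    n = len(values)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(values)")

    # Prefix sums of values and squared values give any bucket's SSE in O(1).
    ps = [0.0] * (n + 1)
    pss = [0.0] * (n + 1)
    for i, v in enumerate(values):
        ps[i + 1] = ps[i] + v
        pss[i + 1] = pss[i] + v * v

    def sse(i: int, j: int) -> float:
        """SSE of values[i:j] around its mean (requires j > i)."""
        s = ps[j] - ps[i]
        return (pss[j] - pss[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[b][j]: minimal SSE when the first j values are split into b buckets.
    cost = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    cost[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):  # the last bucket is values[i:j]
                c = cost[b - 1][i] + sse(i, j)
                if c < cost[b][j]:
                    cost[b][j] = c
                    cut[b][j] = i

    # Walk the stored cut points backwards to recover the bucket starts.
    starts: List[int] = []
    j = n
    for b in range(num_buckets, 0, -1):
        j = cut[b][j]
        starts.append(j)
    starts.reverse()
    return cost[num_buckets][n], starts


total_sse, bucket_starts = v_optimal_histogram([1, 1, 1, 10, 10, 10], 2)
print(total_sse, bucket_starts)  # 0.0 [0, 3]
```

This dynamic program takes O(B n^2) time for n values and B buckets, which is one reason restricted partitionings are attractive for large arrays.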
In the current era of data science, data quality has a significant and critical impact on business operations. This is no different for the meteorological data encountered in the field of meteorology. However, the conventional methods of meteorological data quality control mainly focus on error detection and null-value detection; that is, they only consider the results of the data output but ignore the quality problems that may also arise in the workflow. To rectify this issue, this paper proposes the Total Meteorological Data Quality (TMDQ) framework based on the Total Quality Management (TQM) perspective, especially considering the systematic nature of data warehousing and process focus needs. In practical applications, this paper uses the proposed framework as the basis for the development of a system to help meteorological observers improve and maintain the quality of meteorological data in a timely and efficient manner. To verify the feasibility of the proposed framework and demonstrate its capabilities and usage, it was implemented in the Tamsui Meteorological Observatory (TMO) in Taiwan. The four quality dimension indicators established through the proposed framework will help meteorological observers grasp the various characteristics of meteorological data from different aspects. The application and research limitations of the proposed framework are discussed and possible directions for future research are presented.
Multi-attributed graphs, in which each node is characterized by multiple types of attributes, are ubiquitous in the real world. Detection and characterization of communities of nodes could have a significant impact on various applications. Although previous studies have attempted to tackle this task, it remains challenging because of the difficulty of integrating graph structures with multiple attributes and the presence of noise in the graphs. Therefore, in this study, we focus on clusters of attribute values and on strong correlations between communities and attribute-value clusters. The graph clustering methodology adopted in this study involves Community detection, Attribute-value clustering, and deriving Relationships between communities and attribute-value clusters (CAR for short). Based on these concepts, the proposed multi-attributed graph clustering is modeled as CAR-clustering. To achieve CAR-clustering, a novel algorithm named CARNMF is developed based on non-negative matrix factorization (NMF) that can detect CAR in a cooperative manner. Results obtained from experiments using real-world datasets show that CARNMF can detect communities and attribute-value clusters more accurately than existing comparable methods. Furthermore, the clustering results indicate that CARNMF can successfully detect informative communities with meaningful semantic descriptions through correlations between communities and attribute-value clusters.
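For readers unfamiliar with the building block, the following is a minimal sketch of plain NMF with the standard multiplicative updates for the squared Frobenius error. It is not CARNMF: the joint objective over graph structure and attributes is not given in the abstract and is not reproduced here. The function names, the small constant `eps`, and the toy matrix are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(x: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for xr, whr in zip(x, wh) for a, b in zip(xr, whr))


def nmf(x: Matrix, rank: int, iterations: int = 200,
        seed: int = 0, eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor a nonnegative X (m x n) into W (m x rank) and H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    # Strictly positive random initialization keeps every factor nonnegative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


# Hypothetical toy matrix with two obvious blocks ("communities").
x = [[5.0, 5.0, 0.0, 0.0],
     [4.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0],
     [0.0, 0.0, 2.0, 2.0]]
w, h = nmf(x, rank=2)
print(frobenius_error(x, w, h))
```

In clustering applications, each row of X is typically assigned to the factor with the largest entry in the corresponding row of W.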
This study presents a joint dictionary learning approach for speech emotion recognition named locality preserved joint nonnegative matrix factorization (LP-JNMF). The learned representations are shared between the learned dictionaries and annotation matrix. Moreover, a locality penalty term is incorporated into the objective function. Thus, the system's discriminability is further improved.
Given a graph G=(V,E), where V and E are the vertex and edge sets of G, and a subset V_NT of vertices called a non-terminal set, the minimum spanning tree with a non-terminal set V_NT, denoted by MSTNT, is a connected and acyclic spanning subgraph of G that contains all vertices of V with the minimum weight, where each vertex in the non-terminal set is not a leaf. On general graphs, the problem of finding an MSTNT of G is NP-hard. We show that if G is a series-parallel graph, then an MSTNT of G can be found in time linear in the number of vertices.
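The definition above can be checked on tiny instances with a brute-force sketch that enumerates every spanning tree and keeps the lightest one in which no non-terminal vertex is a leaf. This is exponential in the number of edges and is only meant to illustrate the problem statement; it is not the linear-time algorithm for series-parallel graphs. The function names and the example graph are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Sequence, Tuple

Edge = Tuple[int, int, int]  # (u, v, weight), vertices numbered 0..n-1


def is_spanning_tree(n: int, edges: Sequence[Edge]) -> bool:
    """True if the edges form an acyclic set of exactly n - 1 edges."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # this edge would close a cycle
            return False
        parent[ru] = rv
    # n - 1 acyclic edges on n vertices are necessarily connected.
    return len(edges) == n - 1


def mstnt_brute_force(n: int, edges: Sequence[Edge],
                      non_terminals: Iterable[int]
                      ) -> Optional[Tuple[int, List[Edge]]]:
    """Return (weight, tree) of a lightest valid tree, or None if none exists."""
    required = list(non_terminals)
    best: Optional[Tuple[int, List[Edge]]] = None
    for subset in combinations(edges, n - 1):
        if not is_spanning_tree(n, subset):
            continue
        degree = [0] * n
        for u, v, _ in subset:
            degree[u] += 1
            degree[v] += 1
        # Every non-terminal vertex must be internal (degree at least 2).
        if any(degree[v] < 2 for v in required):
            continue
        weight = sum(w for _, _, w in subset)
        if best is None or weight < best[0]:
            best = (weight, list(subset))
    return best


# Hypothetical graph: a cheap star centred on vertex 0 plus two heavier edges.
graph = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (1, 2, 2), (2, 3, 2)]
plain = mstnt_brute_force(4, graph, non_terminals=[])
constrained = mstnt_brute_force(4, graph, non_terminals=[2])
print(plain[0], constrained[0])  # 3 4
```

In the example, the ordinary minimum spanning tree is the star of weight 3, where vertex 2 is a leaf; forcing vertex 2 to be internal raises the optimal weight to 4.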
Most recent work has used raw electroencephalography (EEG) data to train deep learning (DL) models, on the assumption that DL models can learn discriminative features by themselves. It is not yet clear what kinds of features specific to rapid serial visual presentation (RSVP) can be selected and combined with raw EEG data to improve the RSVP classification performance of DL models. In this paper, we extract RSVP-specific features and combine them with raw EEG data to capture more spatial and temporal correlations of target and non-target events and to improve EEG-based RSVP target detection performance. We report experimental results on the X2 Expertise RSVP dataset. We conduct detailed performance evaluations of different features and feature combinations with traditional classification models and different CNN models in within-subject and cross-subject tests. Compared with the state-of-the-art traditional Bagging Tree (BT) and Bayesian Linear Discriminant Analysis (BLDA) classifiers, our proposed combined features with CNN models achieve 1.1% better performance in the within-subject test and 2% better performance in the cross-subject test. This sheds light on the ability of the combined features to serve as an efficient tool in RSVP target detection with deep learning models and thus to improve RSVP target detection performance.
The linguistic Multi-Criteria Group Decision-Making (MCGDM) problem involves various types of uncertainties. To deal with this problem, a new linguistic MCGDM method combining the cloud model and evidence theory is proposed. The cloud model is first used to handle the fuzziness and randomness of the linguistic concept, by taking both the average level and the fluctuation degree of the linguistic concept into consideration. Hence, a method is presented to transform linguistic variables into clouds, and then an asymmetrical weighted synthetic cloud is proposed to aggregate the clouds of decision makers on each criterion. Moreover, evidence theory is used to handle the imprecision and incompleteness of the group assessment, with the belief degree and the ignorance degree. Hence, the conversion from the cloud to the belief degree is investigated, and then the evidential reasoning algorithm is adopted to aggregate the criteria values. Finally, the average utility is applied to rank the alternatives. A numerical example, which is given to confirm the validity and feasibility of the method, also shows that the proposed method can take advantage of the cloud model and evidence theory to deal efficiently with the uncertainties caused by both the linguistic concept and the group assessment.
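To illustrate how a cloud captures both an average level and a fluctuation, the following is a minimal sketch of the standard forward normal cloud generator, which describes a concept by an expectation Ex, an entropy En, and a hyper-entropy He. Whether the method above uses exactly this generator is an assumption; the mapping from linguistic terms to (Ex, En, He), the asymmetrical synthetic cloud, and the evidential reasoning step are not reproduced. The function name and parameter values are hypothetical.

```python
import math
import random
from typing import List, Tuple


def forward_normal_cloud(ex: float, en: float, he: float,
                         num_drops: int, seed: int = 0
                         ) -> List[Tuple[float, float]]:
    """Generate (value, membership degree) cloud drops for the cloud (Ex, En, He)."""
    rng = random.Random(seed)
    drops: List[Tuple[float, float]] = []
    while len(drops) < num_drops:
        # The entropy itself is randomized; He controls how much it fluctuates.
        en_prime = rng.gauss(en, he)
        if en_prime == 0.0:
            continue  # avoid dividing by zero in the membership formula
        x = rng.gauss(ex, abs(en_prime))
        mu = math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
        drops.append((x, mu))
    return drops


# Hypothetical cloud for a linguistic term such as "good" on a 0-10 scale.
drops = forward_normal_cloud(ex=7.5, en=0.8, he=0.1, num_drops=1000)
print(len(drops), max(mu for _, mu in drops))
```

A larger He makes the cloud "thicker", meaning that the same value can receive noticeably different membership degrees across drops.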
We examine the feasibility of applying the Deutsch-Jozsa algorithm, a basic quantum algorithm, to a machine-learning-based logistic regression problem. Its key property, distinguishing the function type with an exponential speedup, can help identify feature unsuitability much more quickly. Although strict conditions and restrictions must be observed, we reconfirm quantum superiority in many aspects of modern computing.
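The property referred to above, deciding whether a Boolean function is constant or balanced with a single oracle call, can be illustrated with a small classical state-vector simulation. The sketch assumes the phase-oracle form of the algorithm (equivalent to the textbook circuit with the ancilla prepared in the minus state) and says nothing about how the function is derived from a logistic regression problem. The function names are hypothetical.

```python
from typing import Callable, List


def hadamard_transform(state: List[float]) -> List[float]:
    """Apply a Hadamard gate to every qubit (normalized Walsh-Hadamard transform)."""
    out = list(state)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = n ** 0.5
    return [amp / norm for amp in out]


def deutsch_jozsa(f: Callable[[int], int], num_qubits: int) -> str:
    """Classify f as 'constant' or 'balanced', given the promise that it is one of them."""
    size = 2 ** num_qubits
    state = [0.0] * size
    state[0] = 1.0                       # start in |0...0>
    state = hadamard_transform(state)    # uniform superposition
    # Phase oracle: flip the sign of every basis state x with f(x) = 1.
    state = [(-1) ** f(x) * amp for x, amp in enumerate(state)]
    state = hadamard_transform(state)
    # Probability of measuring all zeros: 1 if f is constant, 0 if balanced.
    p_zero = state[0] ** 2
    return "constant" if p_zero > 0.5 else "balanced"


print(deutsch_jozsa(lambda x: 0, 3))      # constant
print(deutsch_jozsa(lambda x: x & 1, 3))  # balanced
```

The simulation itself costs time exponential in the number of qubits; the speedup applies only to the number of oracle queries on quantum hardware.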
In this letter, a flexible and compatible radar frequency measurement receiver with fine resolution is designed. The receiver is implemented on a Virtex-5 Field Programmable Gate Array (FPGA) platform from Xilinx. Mixer-free Digital Down Conversion (DDC) based on a polyphase filter has been successfully introduced in this receiver to obtain a lower-speed data flow and better resolution. The receiver can adapt to more modulation types and a higher pulse density, up to 200,000 pulses per second. The measurement results indicate that the receiver is capable of detecting radar pulse signals of 0.2 µs to 2.5 ms width with a major frequency root mean square error (RMSE) within 0.44 MHz. Moreover, wider pulse widths and higher DDC decimation rates result in better performance. This frequency measurement receiver has been successfully used in a spaceborne radar system.
Spectrum-based fault localization (SBFL), which identifies fault locations in a buggy program by comparing the execution statistics of the program spectra of passed and failed executions, is a popular automatic debugging technique. However, the usefulness of SBFL in practice is mainly affected by two factors: accuracy and fault understanding. To address these issues, we propose an SBFL framework that supports fault understanding. In the framework, we first localize a suspicious fault module from which to start debugging and then generate a weighted fault propagation graph (WFPG) for the hypothesized fault module, which weights the suspiciousness of the nodes in order to further perform block-level fault localization. To evaluate the proposed framework, we conduct a controlled experiment to compare two different module-level SBFL approaches and validate the effectiveness of the WFPG. According to our preliminary experiments, the results are promising.
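The comparison of passed and failed execution spectra can be sketched with a suspiciousness score per program element. The example below uses the Ochiai formula, one widely used SBFL metric; the abstract does not say which formula the framework uses, so this choice is an assumption, and the WFPG construction is not reproduced. The function name and the coverage data are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Set


def ochiai_scores(coverage: List[Set[str]],
                  passed: List[bool]) -> Dict[str, float]:
    """Compute the Ochiai suspiciousness of each covered program element.

    coverage[i] is the set of elements executed by test i, and passed[i]
    tells whether test i passed. Ochiai = ef / sqrt(total_failed * (ef + ep)),
    where ef and ep count the failed and passed tests executing the element.
    """
    total_failed = sum(1 for ok in passed if not ok)
    elements: Set[str] = set().union(*coverage)
    scores: Dict[str, float] = {}
    for element in elements:
        ef = sum(1 for cov, ok in zip(coverage, passed)
                 if element in cov and not ok)
        ep = sum(1 for cov, ok in zip(coverage, passed)
                 if element in cov and ok)
        denom = sqrt(total_failed * (ef + ep))
        scores[element] = ef / denom if denom > 0 else 0.0
    return scores


# Hypothetical spectra: four tests over three statements s1, s2, s3.
coverage = [{"s1", "s2"}, {"s1", "s3"}, {"s1", "s2", "s3"}, {"s1", "s2"}]
passed = [True, False, False, True]

scores = ochiai_scores(coverage, passed)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['s3', 's1', 's2']
```

Here `s3` is executed by both failing tests and by no passing test, so it receives the maximum score of 1.0 and is ranked first.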
Very few existing works on inertial-sensor-based air-writing have focused on the effects of writing constraints on recognition performance. We propose an LSTM-based system and conduct several quantitative analyses under different constraint settings against CHMM, DTW-AP, and CNN. The proposed system shows advantages in accuracy, real-time performance, and flexibility.
This letter proposes a fast superpixel segmentation method based on boundary sampling and interpolation. The basic idea is as follows: instead of labeling all pixels in local regions, we estimate superpixel boundaries by interpolating candidate boundary pixels from a segmentation of the down-sampled image. On the one hand, there is high spatial redundancy within each local region, which can be discarded. On the other hand, we estimate the labels of candidate boundary pixels by sampling the superpixel boundary within the corresponding neighbourhood. By reducing the number of distance calculations for candidate pixels, the proposed method significantly accelerates superpixel segmentation. Experiments on the BSD500 benchmark demonstrate that our method needs half the time of state-of-the-art methods with almost no reduction in accuracy.