Industrial and scientific datasets have grown enormously in size and complexity in recent years. The largest transactional databases and data warehouses can no longer be hosted cost-effectively in off-the-shelf commercial database management systems. Other forums for discussing databases and data warehouses exist, but they typically deal with problems at smaller scales and do not always focus on practical solutions or on influencing DBMS vendors. Given the small (but highly influential and growing) community of users with such databases, and the scarcity of opportunities to exchange practical information about DBMSes at extremely large scale, a workshop on extremely large databases was organized. This paper is the final report of the discussions and activities at the workshop.
Patent information is a derivative product of the legal patent system. This information, which includes patent applications, patent descriptions, patent gazettes, patent abstracts, and patent data, is prepared in exact compliance with the regulations and specifications of the patent acts. Unlike other published and circulated information, patent information is well protected legally. For convenience, this study classifies patent information into bibliographic and numeric data in order to create a patent map.
This research investigates the applicability of Davis's Technology Acceptance Model (TAM) to agriculturists' acceptance of a knowledge management system (KMS) developed by the authors and called AGROWIT. Although previous TAM user-acceptance research served as the basis for investigating user acceptance of AGROWIT, the model had to be extended; constructs added from the Triandis model increased the predictive power of the TAM, but only slightly. Relationships among the primary TAM constructs are in substantive agreement with those characteristic of previous TAM research. Significant positive relationships between perceived usefulness, ease of use, and system usage were consistent with previous TAM research. The observed mediating role of perceived usefulness in the relationship between ease of use and usage was also in consonance with earlier findings. The findings are significant because they suggest that the considerable body of previous TAM-related information technology research may be usefully applied to the knowledge management domain to promote further investigation of the factors affecting the acceptance and usage of knowledge management systems such as AGROWIT by farmers, extension workers, and agricultural researchers.
With the development of computer graphics and digitizing technologies, 3D model databases are becoming ubiquitous. This paper presents a method for content-based retrieval of similar 3D models from such databases. To assess the similarity between 3D models, shape features must be extracted from the models and compared. We propose a new 3D shape feature extraction algorithm. Experimental results show that the proposed method achieves good retrieval performance with short computation time.
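The abstract does not specify the proposed feature-extraction algorithm, so as a generic illustration of content-based 3D similarity the following sketch computes a classic D2 shape distribution (a histogram of distances between random vertex pairs); this is a well-known baseline descriptor, not the authors' method, and all names are illustrative.

    import numpy as np

    def d2_descriptor(vertices, n_pairs=10000, bins=64, seed=None):
        """D2 shape distribution: histogram of distances between random vertex pairs."""
        rng = np.random.default_rng(seed)
        i = rng.integers(0, len(vertices), size=n_pairs)
        j = rng.integers(0, len(vertices), size=n_pairs)
        d = np.linalg.norm(vertices[i] - vertices[j], axis=1)
        hist, _ = np.histogram(d / d.max(), bins=bins, range=(0.0, 1.0))
        return hist / hist.sum()            # normalized, so scale-invariant

    def dissimilarity(desc_a, desc_b):
        """L1 distance between descriptors; smaller means more similar."""
        return np.abs(desc_a - desc_b).sum()

Models are then ranked by dissimilarity to the query descriptor, which keeps both feature extraction and comparison cheap.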
By distinguishing nested attributes as decomposable and non-decomposable, it is proved that, for all nested relations, unnesting and then renesting on the same attribute yields the original relation, subject only to the elimination of duplicate data. This result prompts reconsideration of a statement that was popular in nested-relations research: "Unnesting and then nesting on the same attribute of a nested relation does not always yield the original relation."
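A minimal sketch of the roundtrip claim, assuming a single nested attribute and set semantics (names are illustrative): unnesting flattens the nested attribute, renesting regroups it, and the original relation is recovered; when distinct tuples share the same atomic part, renesting merges them, which is the qualification the paper formalizes.

    def unnest(relation):
        """Flatten the nested (second) attribute into ordinary flat tuples."""
        return {(k, v) for k, vs in relation for v in vs}

    def nest(flat):
        """Regroup flat tuples on the first attribute, renesting the second."""
        groups = {}
        for k, v in flat:
            groups.setdefault(k, set()).add(v)
        return {(k, frozenset(vs)) for k, vs in groups.items()}

    r = {("a", frozenset({1, 2})), ("b", frozenset({3}))}
    assert nest(unnest(r)) == r   # roundtrip recovers the original relation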
In this paper, we obtain the Bayes estimators of the scale and shape parameters of the generalized exponential distribution using Lindley's approximation (L-approximation) under asymmetric loss functions. The proposed estimators are compared with the corresponding maximum likelihood estimators (MLEs) in terms of their risks, based on samples simulated from the generalized exponential distribution.
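For reference, these are the standard forms underlying the abstract above (not reproduced from the paper, and shown in the one-parameter case for brevity; the paper treats both the scale and the shape parameter): the LINEX loss, the Bayes estimator it induces, and Lindley's approximation to the posterior expectations involved.

    L(\Delta) = b\,(e^{a\Delta} - a\Delta - 1), \qquad \Delta = \hat\theta - \theta, \; a \ne 0, \; b > 0
    \hat\theta_{BL} = -\frac{1}{a}\,\ln E\!\left[ e^{-a\theta} \mid \underline{x} \right]
    E\!\left[ u(\theta) \mid \underline{x} \right] \approx u(\hat\theta)
        + \tfrac{1}{2}\left[ u''(\hat\theta) + 2\,u'(\hat\theta)\,\rho'(\hat\theta) \right] \sigma^2
        + \tfrac{1}{2}\, L_3(\hat\theta)\, u'(\hat\theta)\, \sigma^4

Here \hat\theta is the MLE, \rho is the log of the prior, \sigma^2 = [-L''(\hat\theta)]^{-1} is the inverse observed information, and L_3 is the third derivative of the log-likelihood.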
This work proposes a data mining algorithm, the Unordered Rule Set Continuous Ant-Miner, whose goal is to extract classification rules from data. Swarm intelligence (SI) studies the collective behavior of decentralized, self-organized systems such as ant colonies, and provides techniques through which rules may be discovered. The Ant-Miner algorithm, first proposed by Parpinelli and colleagues (2002), applies an ant colony optimization (ACO) heuristic to the classification task of data mining to discover an ordered list of classification rules; it is a rule-induction algorithm that uses SI techniques to form rules. Ant-Miner uses a discretization process to deal with continuous attributes in the data, transforming numeric attributes into nominal ones. Discretization may suffer from a loss of information, as the real relationship underlying the individual values of a numeric attribute is unknown. The objective of this work is to apply ACO heuristic techniques to discover unordered rule sets for mixed variables in a data set. The proposed algorithm handles both nominal and continuous attributes using multimodal functions. It has the advantage of discovering more modular rules, i.e., rules that can be interpreted independently of other rules, unlike the rules in an ordered list, where interpreting a rule requires knowledge of the previous rules in the list. The results provide evidence that the Unordered Rule Set Continuous Ant-Miner algorithm is competitive in accuracy with other Ant-Miner versions and generates simpler rule sets.
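A hedged sketch of the core Ant-Miner loop described above (identifiers and the stopping rule are simplifications, not the authors' code): each ant adds terms of the form (attribute, value) to a rule with probability proportional to pheromone times a heuristic value, and the best rule of the iteration reinforces the pheromone on its terms.

    import random

    def construct_rule(terms, pheromone, heuristic):
        """One ant builds a rule as a list of (attribute, value) terms."""
        rule, candidates = [], list(terms)
        while candidates:
            weights = [pheromone[t] * heuristic[t] for t in candidates]
            term = random.choices(candidates, weights=weights)[0]
            rule.append(term)
            # each attribute may appear at most once in a rule
            candidates = [t for t in candidates if t[0] != term[0]]
            if random.random() < 0.5:   # placeholder for the coverage-based stop
                break
        return rule

    def reinforce(rule, pheromone, quality, evaporation=0.1):
        """Evaporate all pheromone, then reward the terms of the best rule."""
        for t in pheromone:
            pheromone[t] *= 1.0 - evaporation
        for t in rule:
            pheromone[t] += quality * pheromone[t]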
A mini-workshop with representatives from the data-driven science and database research communities was organized in response to suggestions at the first XLDB Workshop. The goal was to develop common requirements and primitives for a next-generation database management system that scientists, including those from high-energy physics, astronomy, biology, geoscience, and fusion, would use, in order to stimulate research and advance technology. The database researchers considered these requirements novel and unlikely to be fully met by current commercial vendors. The two groups accordingly decided to explore building a new open source DBMS. This paper is the final report of the discussions and activities at the workshop.
This paper describes an approach to visualizing concurrency control (CC) algorithms for real-time database systems (RTDBs). The approach is based on the principles of software visualization, which have been applied in related fields. The Model-View-Controller (MVC) architecture is used to alleviate the black-box syndrome associated with studying the behaviour of concurrency control algorithms for RTDBs. We propose an exploratory visualization tool that assists the RTDB designer in understanding the actual behaviour of the chosen concurrency control algorithms and in evaluating their performance. We demonstrate the feasibility of our approach using an optimistic concurrency control model as a case study. The developed tool substantiates earlier simulation-based performance studies by exposing, when the algorithm is visualized dynamically, spikes at certain points that are not observed in the usual static graphs. Ultimately, the tool helps resolve the problem of contradictory assumptions about CC in RTDBs.
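As a pointer to the behaviour such a tool animates, here is a minimal sketch of the backward-validation step of optimistic concurrency control, the case study's general mechanism rather than the paper's implementation:

    from dataclasses import dataclass, field

    @dataclass
    class Txn:
        read_set: set = field(default_factory=set)
        write_set: set = field(default_factory=set)

    def validate(txn, committed_since_start):
        """Backward validation: restart txn if a concurrently committed
        transaction wrote any item that txn read."""
        return all(not (txn.read_set & other.write_set)
                   for other in committed_since_start)

Visualizing how often validation fails under different workloads is exactly the kind of dynamic behaviour that static throughput graphs average away.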
This paper provides Bayes estimators of the failure rate and the reliability function for the one-parameter exponential distribution, utilizing a point guess of the parameter. To derive the Bayes estimators, the prior distributions are chosen so that they are centered at the known prior values of the parameters. The performance of the proposed estimators is examined relative to the maximum likelihood estimator (MLE) and Thompson's shrinkage estimator on the basis of Monte Carlo simulations with 1000 samples.
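For reference, the standard forms behind this construction, assuming (as an illustrative choice, not necessarily the paper's) a Gamma(a, b) prior centered at the point guess \theta_0 (i.e., a/b = \theta_0) and squared error loss:

    f(x;\theta) = \theta e^{-\theta x}, \qquad h(t) = \theta, \qquad R(t) = e^{-\theta t}
    \theta \mid \underline{x} \sim \mathrm{Gamma}\!\left(n + a,\; b + \textstyle\sum_i x_i\right)
    \hat\theta_B = \frac{n + a}{b + \sum_i x_i}, \qquad
    \hat R_B(t) = E\!\left[ e^{-\theta t} \mid \underline{x} \right]
               = \left( \frac{b + \sum_i x_i}{b + \sum_i x_i + t} \right)^{n + a}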
On March 4-5, 2008, the CODATA Task Group for Exchangeable Material Data Representation to Support Research and Education held a two-day seminar-cum-meeting at the National Physical Laboratory (NPL), New Delhi, India, with NPL materials researchers and task group members representing materials activities and databases from seven countries: the European Union (the Czech Republic, France, and the Netherlands), India, Korea, Japan, and the United States. The NPL seminar included presentations about the researchers' work, and the Task Group meeting included presentations about the members' current data-related activities. Joint discussions between NPL researchers and CODATA task group members began an exchange of viewpoints among materials data producers, users, and database developers. The seminar-cum-meeting also produced plans to continue and expand Task Group activities at the 21st CODATA Meeting in Kyiv, Ukraine, in 2008.
In the present article, some shrinkage testimators for the scale parameter of a two-parameter Weibull life testing model are suggested under the LINEX loss function, assuming the shape parameter to be known. The proposed testimators are compared with the improved estimator.
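For context, a generic shrinkage testimator has the following form (this is the standard template, not the paper's specific proposal): a preliminary test of H_0: \theta = \theta_0 decides whether to shrink the usual estimator \hat\theta toward the point guess \theta_0 by a factor k, and under the LINEX loss L(\Delta) = b(e^{a\Delta} - a\Delta - 1) the risks of the resulting testimators are compared.

    \hat\theta_{ST} =
    \begin{cases}
        k\,\hat\theta + (1 - k)\,\theta_0, & \text{if the preliminary test accepts } H_0: \theta = \theta_0,\\
        \hat\theta, & \text{otherwise,}
    \end{cases}
    \qquad 0 \le k \le 1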
In an e-Science environment, large-scale distributed resources in autonomous domains are aggregated by unified collaborative platforms to support scientific research across organizational boundaries. To enhance the scalability of access management, an integrated approach is needed for decentralizing this task from resource owners to administrators on the platform. We propose an extensible access management framework that meets this requirement by supporting an administrative delegation policy. This feature allows administrators on the platform to make new policies based on the original policies made by resource owners. An access protocol that merges SAML and XACML is also included in the framework; it defines how distributed parties operate with each other to make decentralized authorization decisions.
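A minimal sketch of the administrative-delegation idea (all names are illustrative, not the framework's API): a Permit issued by a platform administrator is honored only if that administrator's authority chains back, through delegation grants, to the resource owner; cyclic delegations are ignored for simplicity.

    from dataclasses import dataclass

    @dataclass
    class Policy:
        issuer: str
        resource: str
        effect: str          # "Permit" or "Deny"

    def decide(resource, owner, policies, delegations):
        """delegations maps (resource, admin) to whoever delegated to admin."""
        for p in (q for q in policies if q.resource == resource):
            issuer = p.issuer
            while issuer != owner:
                issuer = delegations.get((resource, issuer))
                if issuer is None:
                    break                 # chain broken: ignore this policy
            else:
                return p.effect == "Permit"
        return False                      # default deny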
Bayes estimators of the traffic intensity r and of various queue characteristics in an M/M/1 queue are derived under different priors for r and the quadratic error loss function (QELF). Finally, a numerical example is given to illustrate the results.
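For reference, the standard M/M/1 relations connecting the traffic intensity to the usual queue characteristics, together with the QELF fact used throughout (these are textbook identities, not results specific to the paper):

    r = \frac{\lambda}{\mu} \;\; (0 < r < 1), \qquad
    L = \frac{r}{1 - r}, \qquad
    L_q = \frac{r^2}{1 - r}, \qquad
    W = \frac{1}{\mu(1 - r)}
    \hat g_B = E\!\left[ g(r) \mid \underline{x} \right] \text{ under QELF, e.g. }
    \hat L_B = E\!\left[ \tfrac{r}{1-r} \,\middle|\, \underline{x} \right]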
Some aspects of financial tools for countering climate change under the flexible Kyoto mechanisms are studied. For individual industry sectors and production processes, data from the National Greenhouse Gas (GG) Cadastre for the period 1998-2005 on energy consumption and GG emissions are processed by means of an information-analytical system built on the MicroStrategy platform. Ranking the industrial sectors by saved emission allowances makes it possible to direct investment flows toward the development of innovative technologies according to each sector's estimated contribution to the country's total emission allowances.
In this paper, we calculate the dependence of the spin-wave reflection intensity on frequency and external magnetic field for a ferrogarnet structure in the exchange mode, in which the magnetostatic contribution to the energy is neglected in comparison with the exchange contribution. A ferrogarnet structure is chosen because it has a very small damping parameter and provides high-quality data transmission.
In traditional risk evaluation, the weights of the risk indices are assigned in advance and therefore lack objectivity. Drawing on the idea of information entropy, a comprehensive weight that combines subjective weights with entropy weights is calculated. A grey evaluation model of the project risk evaluation index based on this comprehensive entropy weight is then built. Empirical research on a real project indicates that the approach is easy to compute, assigns weights scientifically, and evaluates accurately.
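A minimal sketch of the standard entropy-weight computation, with one common way to blend it with subjective weights (the abstract does not specify the paper's exact combination rule, so the linear blend and its alpha factor are assumptions):

    import numpy as np

    def entropy_weights(X):
        """Entropy weights for a decision matrix X (m alternatives x n risk indices)."""
        P = X / X.sum(axis=0)                                  # proportions p_ij per index
        m = X.shape[0]
        E = -(P * np.log(P + 1e-12)).sum(axis=0) / np.log(m)   # entropy e_j in [0, 1]
        d = 1.0 - E                                            # degree of diversification
        return d / d.sum()                                     # entropy weight w_j

    def comprehensive_weights(subjective, entropy, alpha=0.5):
        """Assumed linear blend of subjective and entropy weights."""
        w = alpha * subjective + (1.0 - alpha) * entropy
        return w / w.sum()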
Utilization of XML techniques is seen as a necessary step towards more powerful ways of incorporating semantics into the data exchanged by heterogeneous systems. In this paper, various techniques are studied and tested, such as XSL transformations (XSLT) and ways of extending the contents of XML Schemas, with the final aim of creating an understanding of the possibilities and a roadmap that could lead to useful real-world applications. Based on a materials database, an XML Schema is specified that defines the structure of an XML document capable of representing fairly complex materials test data together with mandatory metadata. Several approaches are discussed, and some are implemented in prototypes to study how to comply with and use MatML in order to support the sharing of experimentally measured materials data.
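As a small illustration of the validation step such a schema enables (the file names are placeholders; the paper's schema is not reproduced here), a document of materials test data can be checked against an XML Schema with lxml:

    from lxml import etree

    schema = etree.XMLSchema(etree.parse("materials_testdata.xsd"))
    doc = etree.parse("tensile_test_001.xml")

    if schema.validate(doc):
        print("document conforms to the schema")
    else:
        for err in schema.error_log:       # e.g. missing mandatory metadata
            print(err.line, err.message)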
The complexity and sophistication of large-scale analytics in science and industry have advanced dramatically in recent years. Analysts struggle to use complex techniques such as time series analysis and classification algorithms because their familiar, powerful tools do not scale and cannot effectively use scalable database systems. The 2nd Extremely Large Databases (XLDB) workshop was organized to understand these issues, examine their implications, and brainstorm possible solutions. The design of SciDB, a new open source science database that emerged from the first workshop in this series, was also debated. This paper is the final report of the discussions and activities at this workshop.