This paper discusses the usability of convertibility, a principle for data quality used by the Xplain-DBMS. Convertibility (uniqueness) of type definitions is a helpful criterion for database design, whereas convertibility of instances is a criterion for the uniqueness of instances (records). However, in many situations, with or without generalization/specialization, convertibility proves to be an insufficient criterion for the correctness of instances, as we illustrate with many examples. To specify more rigorous rules for the correctness of instances, we propose new concepts such as 'identifying property'. These new concepts also facilitate the transformation of relational databases into Xplain databases.
The 5th XLDB workshop brought together scientific and industrial users, developers, and researchers of extremely large data and focused on emerging challenges in the healthcare and genomics communities, spreadsheet-based large scale analysis, and challenges in applying statistics to large scale analysis, including machine learning. Major problems discussed were the lack of scalable applications, the lack of expertise in developing solutions, the lack of respect for or attention to big data problems, data volume growth exceeding Moore's Law, poorly scaling algorithms, and poor data quality and integration. More communication between users, developers, and researchers is sorely needed. A variety of future work to help all three groups was discussed, ranging from collecting challenge problems to connecting with particular industrial or academic sectors.
We document the history and progress of two international ocean data management projects. The "Global Oceanographic Data Archaeology and Rescue" project was initiated in 1993 under the auspices of the UNESCO Intergovernmental Oceanographic Commission (IOC). Its goal is to locate (archaeology) and to digitize or copy to modern electronic media (rescue) historical (pre-1992) oceanographic data that exist in manuscript or electronic form and are at risk of loss due to media decay. The IOC "World Ocean Database" project, initiated in 2001, focuses on encouraging international data exchange for the post-1991 period and on the development of regional atlases.
In this paper, we consider the Bayesian estimation of parameters in the proportional hazards model of random censorship for the Weibull distribution under different asymmetric loss functions. It is well known that a joint conjugate prior on the parameters does not exist for the Weibull distribution; we therefore use both informative and noninformative priors on the model parameters. Bayes estimates under LINEX and general entropy loss functions are obtained using the Gibbs sampling scheme. A simulation study is carried out to observe the behavior of the proposed estimators for different sample sizes and different censoring parameters. The Bayes estimators under LINEX and general entropy loss functions are found to be effective when the respective loss function parameters are chosen appropriately. One real data set is analyzed for illustrative purposes.
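As an illustration of how Bayes estimates under the two loss functions can be computed from sampler output, the following is a minimal sketch, not the authors' implementation. It assumes that posterior draws of a positive parameter are already available as a list, and it uses the standard closed forms: under the LINEX loss exp(a*d) - a*d - 1, with d the estimate minus the parameter, the Bayes estimate is -(1/a) ln E[exp(-a*theta)]; under the general entropy loss (estimate/theta)^c - c ln(estimate/theta) - 1, it is (E[theta^(-c)])^(-1/c). The function names and the sample values are hypothetical.

```python
import math


def linex_estimate(samples, a):
    """Bayes estimate under LINEX loss from posterior draws.

    Uses the closed form -(1/a) * ln E[exp(-a * theta)], with the
    expectation approximated by the average over the draws.
    """
    if a == 0:
        raise ValueError("LINEX shape parameter a must be nonzero")
    mean_exp = sum(math.exp(-a * t) for t in samples) / len(samples)
    return -math.log(mean_exp) / a


def general_entropy_estimate(samples, c):
    """Bayes estimate under general entropy loss from posterior draws.

    Uses the closed form (E[theta^(-c)])^(-1/c); draws must be positive.
    """
    if c == 0:
        raise ValueError("general entropy parameter c must be nonzero")
    mean_pow = sum(t ** (-c) for t in samples) / len(samples)
    return mean_pow ** (-1.0 / c)


# Hypothetical posterior draws, standing in for Gibbs sampler output.
draws = [1.0, 2.0, 4.0]
posterior_mean = sum(draws) / len(draws)
linex = linex_estimate(draws, a=1.0)
entropy = general_entropy_estimate(draws, c=1.0)
```

With a positive LINEX parameter the estimate falls below the posterior mean, reflecting the heavier penalty on overestimation; with c = -1 the general entropy estimate reduces to the posterior mean.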
In a research project funded by the German Research Foundation, meteorologists, data publication experts, and computer scientists optimised the publication process of meteorological data and developed software that supports metadata review. The project group placed particular emphasis on scientific and technical quality assurance of primary data and metadata. At the end of the publication process, the software automatically registers a Digital Object Identifier at DataCite. The software has been successfully integrated into the infrastructure of the World Data Center for Climate, but a key objective was to make the results applicable to data publication processes in other sciences as well.
Patent network analysis, an advanced method of patent analysis, is a useful tool for technology management. This method visually displays the relationships among patents and enables analysts to grasp intuitively the overall structure of a set of patents in the technology field being studied. Although patent network analysis offers advantages over traditional methods of patent analysis, it is subject to several crucial limitations. To overcome the drawbacks of the current method, this study proposes a novel patent analysis method, called the intelligent patent network analysis method, which generates a visual network with great precision. Based on artificial intelligence techniques, the proposed method provides an automated procedure for searching patent documents, extracting patent keywords, and determining the weight of each patent keyword in order to generate a sophisticated visualization of the patent network. The study details this procedure, which helps improve the efficiency and quality of patent analysis. Furthermore, patents in the field of Carbon Nanotube Backlight Unit (CNT-BLU) were analyzed to verify the utility of the proposed method.
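To make the last step of such a procedure concrete, the sketch below shows one possible way to turn weighted patent keywords into network edges. It is an assumption for illustration only: the abstract does not state how the network is derived from the weights, so cosine similarity with a fixed threshold is swapped in here, and the patent identifiers, keywords, weights, and function names are all hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two keyword-weight dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_patent_network(patents, threshold):
    """Return edges (id_a, id_b, similarity) for similar patent pairs.

    `patents` maps a patent id to a {keyword: weight} dictionary.
    Linking pairs whose similarity reaches `threshold` is an assumed
    rule, not the method described in the study.
    """
    edges = []
    for a, b in combinations(sorted(patents), 2):
        sim = cosine_similarity(patents[a], patents[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Hypothetical keyword weights for three patents.
example = {
    "P1": {"nanotube": 0.8, "backlight": 0.6},
    "P2": {"nanotube": 0.7, "backlight": 0.5, "cathode": 0.2},
    "P3": {"battery": 1.0},
}
network = build_patent_network(example, threshold=0.5)
```

In this toy input only P1 and P2 share keywords, so they are the only pair that can be linked; the resulting edge list could then be handed to any graph drawing tool.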
This article is devoted to general problems in developing reference data on the properties of nanosized objects. It is shown that the peculiar physical characteristics of nanostructures influence how an expert builds the relevant computer database of property data. The building procedure includes comprehensive data systematization on the basis of a classification of nanostructures and detailed identification of a nano-inherent object within the selected class. The key features of data on nanosized objects are discussed, including variation of property nomenclature, dimensional effects, and a high level of data uncertainty. The approaches to data systematization proposed in the article are considered in terms of ISO recommendations. Along with systematization, we propose a procedure for data certification that takes into account a quantitative statement of uncertainty as well as quality indicators. These indicators address the completeness of the description of both an object and a measurement method as well as the reproducibility of results. As an example, property data of carbon nanoforms (nanotubes, graphene, etc.) are analyzed.
Tackling the global challenges relating to health, poverty, business, and the environment is heavily dependent on the flow and utilisation of data. However, while enhancements in data generation, storage, modelling, dissemination, and the related integration of global economies and societies are fast transforming the way we live and interact, the resulting dynamic, globalised information society remains digitally divided. On the African continent in particular, this division has resulted in a gap between knowledge generation and its transformation into tangible products and services. This paper proposes some fundamental approaches for a sustainable transformation of data into knowledge for the purpose of improving people's quality of life. Its main strategy is based on a generic data sharing model providing access to data-utilising and data-generating entities in a multi-disciplinary environment. It highlights the great potential of unsupervised and supervised modelling in tackling the typically predictive challenges we face. Using both simulated and real data, the paper demonstrates how some of the key parameters may be generated and embedded in models to enhance their predictive power and reliability. The paper's conclusions include a proposed implementation framework setting the scene for the creation of decision support systems capable of addressing the key issues in society. It is expected that a sustainable data flow will forge synergies among the private sector and academic and research institutions within and among countries. It is also expected that the paper's findings will help in the design and development of knowledge extraction from data in the wake of cloud computing and, hence, contribute towards improving people's overall quality of life. To avoid incurring high implementation costs, selected open source tools are recommended for developing and sustaining the system.
In most scientific fields, significant improvements have been made in terms of data sharing among scientists and researchers. Although there are clear benefits to data sharing, there is at least one field where this norm has yet to be developed: the behavioural sciences. In this paper, we propose an innovative methodology as a means to change existing norms within the behavioural sciences and move towards increased data sharing. Based on recent advances in social psychology, we theorize that a Survey Research Instrument that takes into account basic psychological processes can be effective in promoting data sharing norms.
Fundamental to building any materials database is the ability to describe accurately the materials whose data it contains. While many systems exist for describing traditional materials, such as metals, polymers, ceramics, and others, the evolving field of nanotechnology presents new challenges. In this paper, we define the goals of a materials description system and the information categories used to describe traditional materials. We then discuss the challenges presented by materials on the nanoscale and suggest ways of overcoming those challenges.
Several challenges are involved in developing and maintaining materials property databases, including improvements in measurement procedures, the changing nature of materials, access to proprietary data on new materials, and the need for quality evaluation. In this paper we discuss each of these issues and their impact on the availability of high quality material property data, using ceramics as an example material.
With ICT Standards playing a key role in support of research and development in many disciplines, the European Commission Institute for Energy and Transport is keen to promote the development and adoption of ICT Standards for engineering data. In this respect, its MatDB Online facility is a Standards-based system for preserving, managing, and exchanging engineering materials test data. While MatDB Online has evolved over more than 30 years to incorporate the latest innovations in data preservation and exchange, such as XML-based data transfer and data citation using digital object identifiers, it continues to rely on a robust data model developed more than 30 years ago through the joint efforts of the National Research Institute for Metals (the predecessor to NIMS, the National Institute for Materials Science), the European Commission Joint Research Centre, and the National Institute of Standards and Technology. While this data model has endured over many years, there is no corresponding Standard. Similarly, related efforts by the engineering materials community to deliver a Standard representation for engineering materials, such as MatML, have failed to be ratified. In consequence of the continued absence of a Standard representation for engineering materials data, there is no common mechanism for preserving and exchanging materials data and no formal means of maintaining a data model to support advances in materials technology, such as the emergence of nanomaterials. It is for these reasons that the European Commission Institute for Energy and Transport is supporting SERES, a CEN Workshop on Standards for Electronic Reporting in the Engineering Sector. As one of more than thirty organisations supporting the SERES Workshop, the Institute for Energy and Transport will make the MatDB XML schema available as one of several resources that will be taken into consideration when the prenormative Standard for representing engineering materials data is formulated. 
With the participation of the Institute for Energy and Transport in the SERES Workshop taking place in parallel with a related project with Oak Ridge National Laboratory, there is good reason to expect that a Standard representation for engineering materials, which has so far eluded the materials community, will be realised. This paper describes MatDB support for engineering materials Standards and related innovative features.
Many relationships between parameters and physical properties in materials science and engineering are represented as mathematical expressions, such as empirical equations and regression expressions. Some materials databases handle such information indirectly: as tables of parameter sets, as lists of programming-language statements, or in other ways. There is no standardized way to represent mathematical relationships, which makes it difficult to exchange, process, and display such information.
The AIST (National Institute of Advanced Industrial Science and Technology in Japan) thermophysical property database manages sets of parameter values for expressions and Fortran statements that represent relationships between physical parameters (e.g., temperature and pressure) and thermophysical properties. However, with this method, it is not easy to add new parameters, to process expressions, or to exchange information with other software tools. In this paper, we describe the current implementation of representing mathematical knowledge in the AIST thermophysical property database, and we also discuss its problems, sample implementations, and definitions of the OpenMath content dictionary for materials science and engineering.
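To illustrate what representing such a relationship in OpenMath can look like, the sketch below builds the OpenMath XML encoding of a simple linear expression a + b*T. It is a minimal example under stated assumptions: the linear form, the coefficient values, the variable name T, and the helper function names are hypothetical, and only symbols from the standard arith1 content dictionary are used, not the materials science content dictionary discussed in the paper.

```python
import xml.etree.ElementTree as ET

# Namespace of the OpenMath XML encoding.
OM_NS = "http://www.openmath.org/OpenMath"


def oms(cd, name):
    """OpenMath symbol taken from content dictionary `cd`."""
    return ET.Element("OMS", {"cd": cd, "name": name})


def omv(name):
    """OpenMath variable."""
    return ET.Element("OMV", {"name": name})


def omf(value):
    """OpenMath floating-point number in decimal notation."""
    return ET.Element("OMF", {"dec": repr(float(value))})


def oma(head, *args):
    """OpenMath application: `head` applied to `args`."""
    node = ET.Element("OMA")
    node.append(head)
    node.extend(args)
    return node


def linear_in_temperature(a, b):
    """Encode a + b*T as an OpenMath object (illustrative form only)."""
    product = oma(oms("arith1", "times"), omf(b), omv("T"))
    expr = oma(oms("arith1", "plus"), omf(a), product)
    root = ET.Element("OMOBJ", {"xmlns": OM_NS})
    root.append(expr)
    return root


# Hypothetical coefficients; serialise the object to an XML string.
xml_text = ET.tostring(linear_in_temperature(1.5, 0.02), encoding="unicode")
```

Because the expression is stored as a tree of symbols and variables rather than as Fortran text, adding a parameter amounts to adding a node, and other tools can parse the same XML without a language-specific interpreter.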
The Semantic Web is a W3C approach that integrates the different sources of semantics within documents and services using ontology-based techniques. The main objective of this approach in the geoscience domain is the improvement of understanding, integration, and usage of Earth and space science related web content in terms of data, information, and knowledge for machines and people. The modeling and representation of semantic attributes and relations within and among documents can be realized by human readable concept maps and machine readable OWL documents. The objectives for the usage of the Semantic Web approach in the GFZ data center ISDC project are the design of an extended classification of metadata documents for product types related to instruments, platforms, and projects as well as the integration of different types of metadata related to data product providers, users, and data centers. Sources of content and semantics for the description of Earth and space science product types and related classes are standardized metadata documents (e.g., DIF documents), publications, grey literature, and Web pages. Other sources are information provided by users, such as tagging data and social navigation information. The integration of controlled vocabularies as well as folksonomies plays an important role in the design of well formed ontologies.