Access to and sharing of data are essential for biodiversity conservation. However, workers from developing nations that harbor rich biodiversity often do not have access to biodiversity information and often are not keen on making what data they have accessible to others. Open access initiatives offer a great opportunity to make the world's biodiversity information accessible to anyone, at any time and in any place. This article reviews the state of open access in the developing world and argues for increasing the amount of biodiversity data in the public domain. It makes specific suggestions about how the developing world can reap the benefits of this global S&T movement to better conserve and sustain biotic resources through the creation of a "virtual biodiversity research space".
The number of online resources for biodiversity information is growing. Names of organisms underpin access to information but present a number of unique problems when used as search terms. We examine these problems and assert that a taxonomic name-server or thesaurus is necessary to enable optimal retrieval of records from multiple datasets. A simple solution is presented, based upon our experience working with "real-world" data in the National Biodiversity Network (NBN) in the United Kingdom. The NBN provides access to over 18 million observational records and incorporates a nomenclator covering 198,000 names.
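The name-server idea in the abstract above can be illustrated with a minimal sketch. It is not the NBN implementation: the class `NameThesaurus`, the function `search`, and the record layout are hypothetical, and the sample names are illustrative only. The sketch assumes that every recorded name (synonym or vernacular) maps to one accepted name, so a query on any variant can be expanded before the datasets are searched.

```python
from collections import defaultdict


class NameThesaurus:
    """Hypothetical thesaurus: maps each known name to an accepted name."""

    def __init__(self):
        self._accepted = {}                # normalized name -> accepted name
        self._variants = defaultdict(set)  # accepted name -> all known names

    @staticmethod
    def normalize(name):
        # Case-insensitive, whitespace-insensitive comparison key.
        return " ".join(name.lower().split())

    def add(self, accepted, *synonyms):
        for name in (accepted, *synonyms):
            self._accepted[self.normalize(name)] = accepted
            self._variants[accepted].add(name)

    def expand(self, query):
        """Return every known name for the taxon, or just the query itself."""
        accepted = self._accepted.get(self.normalize(query))
        if accepted is None:
            return {query}
        return set(self._variants[accepted])


def search(datasets, thesaurus, query):
    """Search all datasets for records filed under any variant of the query."""
    terms = {NameThesaurus.normalize(n) for n in thesaurus.expand(query)}
    return [
        (dataset_name, record)
        for dataset_name, records in datasets.items()
        for record in records
        if NameThesaurus.normalize(record["name"]) in terms
    ]


# Illustrative data only: two datasets that file the same species differently.
thesaurus = NameThesaurus()
thesaurus.add("Cyanistes caeruleus", "Parus caeruleus", "Blue Tit")
datasets = {
    "survey_a": [{"name": "Parus caeruleus", "site": 1}],
    "survey_b": [
        {"name": "Cyanistes caeruleus", "site": 2},
        {"name": "Erithacus rubecula", "site": 3},
    ],
}
hits = search(datasets, thesaurus, "blue tit")
```

Without the expansion step, a query for the vernacular name would miss both records, which is the retrieval problem the abstract describes.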
With the increasing importance of computer-based communication technologies, communication networks are becoming crucial in supply chain management. Given the objectives of the supply chain, supply chain management is situated at the intersection of different professional sectors, each of them with its own vocabulary, its own knowledge and rules. This paper provides a review of the main approaches to supply chain communications through the analysis of different ways of modelling a supply chain and the presentation of new semantic-based approaches that have been and are being developed to improve the quality of the information exchanges within the supply chain.
A standardized data schema for material properties in XML is under development to establish a common and exchangeable expression. The next stage toward the management of knowledge about material usage, selection or processing is to define an ontology that represents the structure of concepts related to materials, e.g., definition, classification or properties of material.
Material selection for designing artifacts is a process that translates required material properties into material substances, which in turn requires a definition of data analysis and rules to interpret the result. In this paper, an ontology structure to formalize this kind of process is discussed using an example of the translation of creep property data into design data.
In developed countries, we now live in a networked society: a society of information, knowledge and services (Castells, 1996), with strong specificities in the Health field (Bourret, 2003; Silber, 2003). The World Health Organization (WHO) has outlined the importance of information for improving health for all. However, financial resources remain limited. Health costs represent 11% of GNP in France, Germany, Switzerland and Canada, 14% in the USA, and 7.5% in Spain and the United Kingdom. Governments, local powers, health or insurance organizations therefore face difficult choices in terms of opportunities and priorities, and for that they need specific and valuable data. First, this paper provides a comprehensive overview of our networked society and of the convergence of ICT (Information and Communication Technologies) and Health (in other words, e-Health) from the perspective of needs and uses at the micro, meso, and macro levels. We point out the main challenges in developing a Nationwide Health Information Network in the US, UK and France. Then we analyze the main issues concerning data for Decision Making in Networked Health: coordination and evaluation. In the last sections, we use an Information System perspective to investigate the three interoperability layers (micro, meso and macro). We analyze the requirements and challenges of designing a global interoperability architecture that supports different kinds of interactions; then we focus on the harmonization efforts made at several levels. Finally, we identify common methodological and engineering issues.
The 'Berlin Declaration' was published in 2003 as a guideline to policy makers to promote the Internet as a functional instrument for a global scientific knowledge base. Because knowledge is derived from data, the principles of the 'Berlin Declaration' should apply to data as well. Today, access to scientific data is hampered by structural deficits in the publication process. Data publication needs to offer authors an incentive to publish data through long-term repositories. Data publication also requires an adequate licence model that protects the intellectual property rights of the author while allowing further use of the data by the scientific community.
Access to information is necessary, but not sufficient in our digital era. The challenge is to objectively integrate digital resources based on user-defined objectives for the purpose of discovering information relationships that facilitate interpretations and decision making. The Antarctic Treaty Searchable Database (http://aspire.nvi.net), which is in its sixth edition, provides an example of digital integration based on the automated generation of information granules that can be dynamically combined to reveal objective relationships within and between digital information resources. This case study further demonstrates that automated granularity and dynamic integration can be accomplished simply by utilizing the inherent structure of the digital information resources. Such information integration is relevant to library and archival programs that require long-term preservation of authentic digital resources.
This study investigates the effect of bias-corrected estimators in analyzing real-world skewed data where categorization and transformation are necessary. It also reports a small-scale simulation study to indicate factors that can make the bias correction small or large. For the complete data set, the maximum likelihood estimates and Schaefer's bias-corrected estimates are not greatly different. However, when the original sample size is reduced by about 50%, the difference between the estimates is much larger, possibly even large enough to influence the conclusions drawn. The impact of transformation and categorization is visibly present. The broad impression gained is the same across categorizations, although differences between types of categorization cannot be overlooked. A factor that seems to influence the size of the bias correction is identified.
Scientific publications written in natural language still play a central role as our knowledge source. However, due to the flood of publications, the literature survey process has become a highly time-consuming and tangled process, especially for novices in the discipline. Therefore, tools supporting the literature-survey process may help the individual scientist to explore new useful domains. Natural language processing (NLP) is expected to be one of the promising techniques to retrieve, abstract, and extract knowledge. In this contribution, NLP is first applied to the literature of chemical vapor deposition (CVD), which is a sub-discipline of materials science and is a complex and interdisciplinary field of research involving chemists, physicists, engineers, and materials scientists. Causal knowledge extraction from the literature is demonstrated using NLP.
Medical data mining has great potential for exploring the hidden patterns in the data sets of the medical domain. These patterns can be utilized for clinical diagnosis. However, the available raw medical data are widely distributed, heterogeneous in nature, and voluminous. These data need to be collected in an organized form. The collected data can then be integrated to form a hospital information system. Data mining technology provides a user-oriented approach to novel and hidden patterns in the data. Data mining and statistics both strive towards discovering patterns and structures in data. Statistics deals with heterogeneous numbers only, whereas data mining deals with heterogeneous fields. We identify a few areas of healthcare where these techniques can be applied to healthcare databases for knowledge discovery. In this paper we briefly examine the impact of data mining techniques, including artificial neural networks, on medical diagnostics.
Redundant or duplicate data are the most troublesome problem in database management and applications. Approximate field matching is the key solution to resolve the problem by identifying semantically equivalent string values in syntactically different representations. This paper considers token-based solutions and proposes a general field matching framework to generalize the field matching problem in different domains. By introducing a concept of String Matching Points (SMP) in string comparison, string matching accuracy and efficiency are improved, compared with other commonly applied field matching algorithms. The paper discusses the development of field matching algorithms from the developed general framework. The framework and corresponding algorithm are tested on a public data set of the NASA publication abstract database. The approach can be applied to address similar problems in other databases.
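As a rough illustration of token-based field matching in general, the following sketch scores two field values by the Jaccard overlap of their token sets. It is not the String Matching Points (SMP) algorithm, whose details the abstract does not give; the function names and the threshold of 0.6 are assumptions made for this example.

```python
import re


def tokens(field):
    """Split a field value into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", field.lower()))


def token_similarity(a, b):
    """Jaccard similarity of the two token sets (1.0 means identical sets)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty fields are treated as equal
    return len(ta & tb) / len(ta | tb)


def is_match(a, b, threshold=0.6):
    """Assumed decision rule: similarity at or above an arbitrary threshold."""
    return token_similarity(a, b) >= threshold


# The same name in two syntactically different representations.
same_author = is_match("Smith, John A.", "John A. Smith")
```

Because the comparison works on token sets, word order and punctuation do not affect the score, which is the main attraction of token-based approaches for duplicate detection.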
Efficient storage and retrieval of data and information is a constant requirement in large database systems. For this purpose, many file organization techniques have already been developed, and much additional research is still going on. Hashing is one such technique. In this paper we propose an enhanced hashing technique that uses a hash table combined with a binary tree, searching on the binary representation of a portion of the primary key of records that is associated with each index of the hash table. The paper contains numerous examples to describe the technique. The technique shows significant improvements in searching, insertion, and deletion for systems with huge amounts of data. The paper also presents the mathematical analysis of the proposed technique and comparative results.
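One possible reading of the structure described above is sketched below. The abstract does not specify how the key is divided, so everything here is an assumption: the bucket index is taken as `key % num_buckets`, and the remaining portion of the integer key is stored bit by bit in a binary tree hanging off that bucket. The class name `HashTrie` and its parameters are hypothetical, and stored records are assumed never to be `None`.

```python
class _Node:
    """Binary tree node: child 0 and child 1, plus an optional record."""

    __slots__ = ("children", "record")

    def __init__(self):
        self.children = [None, None]
        self.record = None


class HashTrie:
    """Hash table whose buckets are binary trees over the rest of the key."""

    def __init__(self, num_buckets=16, tree_bits=16):
        self.num_buckets = num_buckets
        self.tree_bits = tree_bits
        self.buckets = [None] * num_buckets

    def _split(self, key):
        # Assumed split: low part selects the bucket, high part walks the tree.
        if not 0 <= key < (self.num_buckets << self.tree_bits):
            raise ValueError("key out of range for this table")
        return key % self.num_buckets, key // self.num_buckets

    def _bits(self, rest):
        # Most significant bit first, always exactly tree_bits bits.
        for shift in range(self.tree_bits - 1, -1, -1):
            yield (rest >> shift) & 1

    def _find(self, key):
        index, rest = self._split(key)
        node = self.buckets[index]
        for bit in self._bits(rest):
            if node is None:
                return None
            node = node.children[bit]
        return node

    def insert(self, key, record):
        index, rest = self._split(key)
        if self.buckets[index] is None:
            self.buckets[index] = _Node()
        node = self.buckets[index]
        for bit in self._bits(rest):
            if node.children[bit] is None:
                node.children[bit] = _Node()
            node = node.children[bit]
        node.record = record

    def search(self, key):
        node = self._find(key)
        return None if node is None else node.record

    def delete(self, key):
        # Clears the record but does not prune empty nodes (kept simple).
        node = self._find(key)
        if node is None or node.record is None:
            return False
        node.record = None
        return True


table = HashTrie()
table.insert(5, "record A")
table.insert(21, "record B")  # same bucket as key 5, different tree path
```

In this sketch a collision in the hash table costs at most `tree_bits` steps in the tree rather than a scan of a chain, which is one way such a combination could bound the search cost.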
A primary goal of the International Virtual Observatory Alliance, which brings together Virtual Observatory Projects from 16 national and international development projects, is to develop, evaluate, test, and agree upon standards for astronomical data formatting, data discovery, and data delivery. In the three years that the IVOA has been in existence, substantial progress has been made on standards for tabular data, imaging data, spectroscopic data, and large-scale databases and on managing the metadata that describe data collections and data access services. In this paper, I describe how the IVOA operates and give my views as to why such a broadly based international collaboration has been able to make such rapid progress.
The Crystallographic Information File (CIF), owned by the International Union of Crystallography, is a file structure based on tag-value ASCII pairs with tags defined in machine-readable dictionaries. The crystallographic community publishes and archives large quantities of numeric information generated by crystal structure determinations, and CIF's acceptance was assured by its adoption as the submission format for Acta Crystallographica and by the obvious needs of the community. CIF's strength lies in its dictionaries, which define most of the concepts of crystallography; its weakness is the difficulty of writing software that exploits its full potential.
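To make the tag-value idea concrete, here is a minimal sketch of a reader for the simplest part of the format: single-line tag-value pairs inside `data_` blocks. It deliberately ignores loops, multi-line text fields, and dictionary validation, which is where the software difficulty mentioned above arises. The function name is hypothetical and the sample values are made up.

```python
def parse_simple_cif(text):
    """Read single-line `_tag value` pairs, grouped by data block name."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            current = blocks.setdefault(line[5:], {})
        elif line.startswith("_") and current is not None:
            parts = line.split(None, 1)
            value = parts[1].strip() if len(parts) > 1 else ""
            # Strip one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            current[parts[0]] = value
    return blocks


# Made-up sample values, for illustration only.
sample = """\
data_example
# unit cell
_cell_length_a          5.4307(2)
_chemical_name_common   'example salt'
"""
parsed = parse_simple_cif(sample)
```

Values are kept as strings because a number such as `5.4307(2)` carries its standard uncertainty in parentheses; interpreting it correctly requires the dictionary definition of the tag.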
Geography Markup Language (GML) is an XML application that provides a standard way to represent geographic information. GML is developed and maintained by the Open Geospatial Consortium (OGC), which is an international consortium consisting of more than 250 members from industry, government, and university departments. Many of the conceptual models described in the ISO 19100 series of geomatics standards have been implemented in GML, and it is itself en route to becoming an ISO Standard (TC/211 CD 19136). An overview of GML together with its implications for the geospatial web is given in this paper.
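As a small illustration of GML being ordinary XML, the sketch below reads point geometries with Python's standard library. It assumes the `http://www.opengis.net/gml` namespace and the `gml:Point` / `gml:pos` encoding; other GML versions use a different namespace URI, and the function name and coordinates are invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; GML 3.2 documents use a different URI.
GML = "http://www.opengis.net/gml"

sample = (
    '<gml:Point xmlns:gml="http://www.opengis.net/gml" srsName="EPSG:4326">'
    "<gml:pos>52.52 13.40</gml:pos>"
    "</gml:Point>"
)


def read_points(xml_text):
    """Return (srsName, coordinates) for every gml:Point in the document."""
    root = ET.fromstring(xml_text)
    points = []
    # Element.iter also visits the root element itself.
    for point in root.iter(f"{{{GML}}}Point"):
        pos = point.find(f"{{{GML}}}pos")
        if pos is None or not pos.text:
            continue
        coords = tuple(float(v) for v in pos.text.split())
        points.append((point.get("srsName"), coords))
    return points


points = read_points(sample)
```

The coordinate order and units are not visible in the numbers themselves; they follow from the coordinate reference system named in `srsName`.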
In its 2004 report "Data and information", the International Council for Science (ICSU) strongly recommended a new strategic framework for scientific data and information. On an initiative from a working group from the Committee on Data for Science and Technology (CODATA), the German Research Foundation (DFG) started the project "Publication and Citation of Scientific Primary Data" as part of the program "Information-infrastructure of network-based scientific-cooperation and digital publication" in 2004. Starting with the field of earth science, the German National Library of Science and Technology (TIB) is now established as a registration agency for scientific primary data as a member of the International DOI Foundation (IDF).
This paper discusses MathML, an XML-based standard for expressing mathematics - everything from elementary mathematics to undergraduate college-level mathematics. Limitations of pre-existing options led to the creation of MathML. MathML is designed to be useful for authoring and publishing, for creating online interactive math resources, and as a non-proprietary approach for archiving. MathML is supported by the W3C and by multiple scholarly publishers and vendors of computer-based mathematics software. The question now is whether MathML can achieve greater acceptance among authors and better integration with standards in related domains, either through cross-walks or through direct incorporation into other domain schemas.
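For readers unfamiliar with the markup, the sketch below builds a tiny presentation MathML fragment for x squared using Python's standard library. The helper name `power` is hypothetical; the element names `math`, `msup`, `mi`, and `mn` and the namespace URI come from the W3C MathML specification.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def power(base, exponent):
    """Build presentation MathML for an identifier raised to a number."""
    # The namespace is written as a plain attribute to keep the output short.
    math = ET.Element("math", {"xmlns": MATHML_NS})
    msup = ET.SubElement(math, "msup")       # superscript construct
    ET.SubElement(msup, "mi").text = base     # mi: identifier
    ET.SubElement(msup, "mn").text = str(exponent)  # mn: number
    return ET.tostring(math, encoding="unicode")


markup = power("x", 2)
```

The resulting string can be embedded in a web page or archived as-is, which reflects the publishing and archiving uses the abstract mentions.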
Network approaches in Current Research Information Systems support the shift from a document-centric to a data-centric view, which acknowledges the primacy of data in the scientific process. E-science holds the promise of a complete, data-centred documentation of the scientific process.
There is clear demand for a global spatial public domain roads data set with improved geographic and temporal coverage, consistent coding of road types, and clear documentation of sources. The currently best available global public domain product covers only one-quarter to one-third of the existing road networks, and this varies considerably by region. Applications for such a data set span multiple sectors and would be particularly valuable for the international economic development, disaster relief, and biodiversity conservation communities, not to mention national and regional agencies and organizations around the world. The building blocks for such a global product are available for many countries and regions, yet thus far there has been neither strategy nor leadership for developing it. This paper evaluates the best available public domain and commercial data sets, assesses the gaps in global coverage, and proposes a number of strategies for filling them. It also identifies stakeholder organizations with an interest in such a data set that might either provide leadership or funding for its development. It closes with a proposed set of actions to begin the process.