Data Science Journal
Online ISSN : 1683-1470
Volume 5
Contents of Volume 5, 2006
  • Jitendra Gaikwad, Vishwas Chavan
    2006, Volume 5, pp. 1-17
    Published: 2006
    Released online: 2006/06/28
    Journal, free access
    Access to and sharing of data is essential for biodiversity conservation. However, workers from developing nations that harbor rich biodiversity often do not have access to biodiversity information and often are not keen on making what data they have accessible to others. Open access initiatives offer a great opportunity to make the world's biodiversity information accessible to anyone, at any time and in any place. This article reviews the state of open access in the developing world and argues for the increase of data on biodiversity in the public domain. It makes specific suggestions about how the developing world can reap the benefits of this global S&T movement to better conserve and sustain biotic resources through the creation of a "virtual biodiversity research space".
  • C Hussey, S Wilkinson, J Tweddle
    2006, Volume 5, pp. 18-28
    Published: 2006
    Released online: 2006/06/28
    Journal, free access
    The number of online resources for biodiversity information is growing. Names of organisms underpin access to information but present a number of unique problems when used as search terms. We examine these problems and assert that a taxonomic name-server or thesaurus is necessary to enable optimal retrieval of records from multiple datasets. A simple solution is presented, based upon our experience working with "real-world" data in the National Biodiversity Network (NBN) in the United Kingdom. The NBN provides access to over 18 million observational records and incorporates a nomenclator covering 198,000 names.
  • A F Cutting-Decelle, B P Das, R I Young, K Case, S Rahimifard, C J Anu ...
    2006, Volume 5, pp. 29-51
    Published: 2006
    Released online: 2006/06/28
    Journal, free access
    With the increasing importance of computer-based communication technologies, communication networks are becoming crucial in supply chain management. Given the objectives of the supply chain, supply chain management is situated at the intersection of different professional sectors, each of them with its own vocabulary, its own knowledge and rules. This paper provides a review of the main approaches to supply chain communications through the analysis of different ways of modelling a supply chain and the presentation of new semantic-based approaches that have been and are being developed to improve the quality of the information exchanges within the supply chain.
  • Toshihiro Ashino, Mitsutane Fujita
    2006, Volume 5, pp. 52-63
    Published: 2006
    Released online: 2006/06/28
    Journal, free access
    A standardized data schema for material properties in XML is under development to establish a common and exchangeable expression. The next stage toward the management of knowledge about material usage, selection or processing is to define an ontology that represents the structure of concepts related to materials, e.g., definition, classification or properties of material.

    Material selection for designing artifacts is a process that translates required material properties into material substances, which in turn requires a definition of data analysis and rules to interpret the result. In this paper, an ontology structure to formalize this kind of process is discussed using an example of the translation of creep property data into design data.
  • Christian Bourret, Gabriella Salzano
    2006, Volume 5, pp. 64-78
    Published: 2006
    Released online: 2006/06/28
    Journal, free access
    In developed countries, we now live in a networked society: a society of information, knowledge and services (Castells, 1996), with strong specificities in the health field (Bourret, 2003; Silber, 2003). The World Health Organization (WHO) has outlined the importance of information for improving health for all. However, financial resources remain limited. Health costs represent 11% of GNP in France, Germany, Switzerland and Canada, 14% in the USA, and 7.5% in Spain and the United Kingdom. Governments, local authorities, and health or insurance organizations therefore face difficult choices in terms of opportunities and priorities, and for that they need specific and valuable data. First, this paper provides a comprehensive overview of our networked society and of the convergence of ICT (Information and Communication Technologies) and health (in other words, e-Health) from the perspective of needs and uses at the micro, meso, and macro levels. We point out the main challenges in developing a Nationwide Health Information Network in the US, the UK, and France. Then we analyze the main issues concerning data for decision making in networked health: coordination and evaluation. In the last sections, we use an information system perspective to investigate the three interoperability layers (micro, meso and macro). We analyze the requirements and challenges of designing a global interoperability architecture that supports different kinds of interactions; then we focus on the harmonization efforts made at several levels. Finally, we identify common methodological and engineering issues.
  • Jens Klump, Roland Bertelmann, Jan Brase, Michael Diepenbroek, Hannes ...
    2006, Volume 5, pp. 79-83
    Published: 2006
    Released online: 2006/06/28
    Journal, free access
    The 'Berlin Declaration' was published in 2003 as a guideline to policy makers to promote the Internet as a functional instrument for a global scientific knowledge base. Because knowledge is derived from data, the principles of the 'Berlin Declaration' should apply to data as well. Today, access to scientific data is hampered by structural deficits in the publication process. Data publication needs to offer authors an incentive to publish data through long-term repositories. Data publication also requires an adequate licence model that protects the intellectual property rights of the author while allowing further use of the data by the scientific community.
  • Paul Arthur Berkman, George James Morgan III, Reagan Moore, Babak Hamid ...
    2006, Volume 5, pp. 84-99
    Published: 2006
    Released online: 2006/06/28
    Journal, free access
    Access to information is necessary, but not sufficient in our digital era. The challenge is to objectively integrate digital resources based on user-defined objectives for the purpose of discovering information relationships that facilitate interpretations and decision making. The Antarctic Treaty Searchable Database (http://aspire.nvi.net), which is in its sixth edition, provides an example of digital integration based on the automated generation of information granules that can be dynamically combined to reveal objective relationships within and between digital information resources. This case study further demonstrates that automated granularity and dynamic integration can be accomplished simply by utilizing the inherent structure of the digital information resources. Such information integration is relevant to library and archival programs that require long-term preservation of authentic digital resources.
  • M.A. Matin
    2006, Volume 5, pp. 100-107
    Published: 2006
    Released online: 2006/07/06
    Journal, free access
    This study investigates the effect of bias-corrected estimators in analyzing real-world skewed data where categorization and transformation are necessary. It also reports a small-scale simulation study to indicate factors that can make the bias correction small or large. For the complete data set, the maximum likelihood estimates and Schaefer's bias-corrected estimates are not greatly different. However, when the original sample size is reduced by about 50%, the difference between the estimates is much larger, possibly even large enough to influence the conclusions drawn. The impact of transformation and categorization is visibly present. The broad impression gained under categorization is the same, although differences between types of categorization cannot be overlooked. A factor that seems to influence the size of the bias correction is identified.
  • Yuya Kajikawa, Yoshihide Sugiyama, Hideki Mima, Katsumori Matsushima
    2006, Volume 5, pp. 108-118
    Published: 2006
    Released online: 2006/11/28
    Journal, free access
    Scientific publications written in natural language still play a central role as our knowledge source. However, due to the flood of publications, the literature survey has become a highly time-consuming and tangled process, especially for novices in the discipline. Therefore, tools supporting the literature survey process may help the individual scientist to explore new useful domains. Natural language processing (NLP) is expected to be one of the promising techniques to retrieve, abstract, and extract knowledge. In this contribution, NLP is first applied to the literature of chemical vapor deposition (CVD), which is a sub-discipline of materials science and a complex, interdisciplinary field of research involving chemists, physicists, engineers, and materials scientists. Causal knowledge extraction from the literature is demonstrated using NLP.
  • Siri Krishan Wasan, Vasudha Bhatnagar, Harleen Kaur
    2006, Volume 5, pp. 119-126
    Published: 2006
    Released online: 2006/11/28
    Journal, free access
    Medical data mining has great potential for exploring the hidden patterns in the data sets of the medical domain. These patterns can be utilized for clinical diagnosis. However, the available raw medical data are widely distributed, heterogeneous in nature, and voluminous. These data need to be collected in an organized form. The collected data can then be integrated to form a hospital information system. Data mining technology provides a user-oriented approach to novel and hidden patterns in the data. Data mining and statistics both strive towards discovering patterns and structures in data. Statistics deals with heterogeneous numbers only, whereas data mining deals with heterogeneous fields. We identify a few areas of healthcare where these techniques can be applied to healthcare databases for knowledge discovery. In this paper we briefly examine the impact of data mining techniques, including artificial neural networks, on medical diagnostics.
  • Mingzhen Wei, Andrew H. Sung, Martha E. Cather
    2006, Volume 5, pp. 127-142
    Published: 2006
    Released online: 2006/11/28
    Journal, free access
    Redundant or duplicate data are the most troublesome problem in database management and applications. Approximate field matching is the key to resolving the problem: it identifies semantically equivalent string values in syntactically different representations. This paper considers token-based solutions and proposes a general field matching framework to generalize the field matching problem across different domains. By introducing the concept of String Matching Points (SMP) in string comparison, string matching accuracy and efficiency are improved compared with other commonly applied field matching algorithms. The paper discusses the development of field matching algorithms from the general framework. The framework and the corresponding algorithm are tested on a public data set from the NASA publication abstract database. The approach can be applied to similar problems in other databases.
  • Mehedi Masud, Gopal Chandra Das, Anisur Rahman, Arunashis Ghose
    2006, Volume 5, pp. 143-161
    Published: 2006
    Released online: 2006/11/28
    Journal, free access
    Efficient storage and retrieval of data and information in a large database system is always in major demand. For this purpose, many file organization techniques have already been developed, and much additional research is still going on. Hashing is one such technique. In this paper we propose an enhanced hashing technique that uses a hash table combined with a binary tree, searching on the binary representation of a portion of the primary key of records that is associated with each index of the hash table. The paper contains numerous examples to describe the technique. The technique shows significant improvements in searching, insertion, and deletion for systems with huge amounts of data. The paper also presents the mathematical analysis of the proposed technique and comparative results.
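
The general idea in the last abstract above, a hash table whose slots each lead into a binary tree walked bit by bit over part of the key, can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' algorithm: the class name `HashTrieTable`, the choice of `key % num_buckets` as the hash, and the use of the quotient `key // num_buckets` as the "portion of the primary key" are all assumptions made for the example.

```python
class _Node:
    """One node of a per-bucket binary tree (hypothetical helper)."""
    __slots__ = ("children", "record", "occupied")

    def __init__(self):
        self.children = [None, None]  # child for bit 0, child for bit 1
        self.record = None
        self.occupied = False


class HashTrieTable:
    """Hash table whose buckets are binary trees over key bits (sketch)."""

    def __init__(self, num_buckets=16, bits=16):
        self.num_buckets = num_buckets
        self.bits = bits  # fixed number of bits walked inside a bucket
        self.buckets = [None] * num_buckets

    def _split(self, key):
        # Assumption: the hash is key % num_buckets, and the remaining
        # portion of the key (the quotient) selects the path in the tree.
        if key < 0 or key >= self.num_buckets << self.bits:
            raise ValueError("key out of range for this table")
        return key % self.num_buckets, key // self.num_buckets

    def _path(self, rest):
        # Bits of the key portion, most significant bit first.
        return [(rest >> i) & 1 for i in range(self.bits - 1, -1, -1)]

    def _find(self, key):
        index, rest = self._split(key)
        node = self.buckets[index]
        for bit in self._path(rest):
            if node is None:
                return None
            node = node.children[bit]
        return node

    def insert(self, key, record):
        index, rest = self._split(key)
        if self.buckets[index] is None:
            self.buckets[index] = _Node()
        node = self.buckets[index]
        for bit in self._path(rest):
            if node.children[bit] is None:
                node.children[bit] = _Node()
            node = node.children[bit]
        node.record = record
        node.occupied = True

    def search(self, key):
        node = self._find(key)
        if node is None or not node.occupied:
            return None
        return node.record

    def delete(self, key):
        # Marks the leaf as empty; empty branches are not pruned here.
        node = self._find(key)
        if node is None or not node.occupied:
            return False
        node.record = None
        node.occupied = False
        return True
```

In this sketch, keys that collide on the same slot never need to be compared against each other: each lookup costs one hash step plus at most `bits` tree steps, regardless of how many records share the slot.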
Special Issue "Thousand Words"
Contents of Volume 5, 2006
  • Andrew Nelson, Alexander de Sherbinin, Francesca Pozzi
    2006, Volume 5, pp. 223-265
    Published: 2006
    Released online: 2006/12/07
    Journal, free access
    There is clear demand for a global spatial public domain roads data set with improved geographic and temporal coverage, consistent coding of road types, and clear documentation of sources. The best currently available global public domain product covers only one-quarter to one-third of the existing road networks, and this varies considerably by region. Applications for such a data set span multiple sectors and would be particularly valuable for the international economic development, disaster relief, and biodiversity conservation communities, not to mention national and regional agencies and organizations around the world. The building blocks for such a global product are available for many countries and regions, yet thus far there has been neither strategy nor leadership for developing it. This paper evaluates the best available public domain and commercial data sets, assesses the gaps in global coverage, and proposes a number of strategies for filling them. It also identifies stakeholder organizations with an interest in such a data set that might provide either leadership or funding for its development. It closes with a proposed set of actions to begin the process.