Data Science Journal

Contents of Volume 5, 2006

Open access and biodiversity conservation: challenges and potentials for the developing world

Jitendra Gaikwad, Vishwas Chavan

2006, Vol. 5, pp. 1-17
Issue date: 2006
Published online: 2006/06/28

DOI: https://doi.org/10.2481/dsj.5.1

Free access

Access to and sharing of data is essential for biodiversity conservation. However, workers in developing nations that harbor rich biodiversity often lack access to biodiversity information and are often reluctant to make the data they hold accessible to others. Open access initiatives offer a great opportunity to make the world's biodiversity information accessible to anyone, at any time and in any place. This article reviews the state of open access in the developing world and argues for placing more biodiversity data in the public domain. It makes specific suggestions about how the developing world can reap the benefits of this global science and technology (S&T) movement to better conserve and sustain biotic resources through the creation of a "virtual biodiversity research space".

Delivering a name-server for biodiversity information

C Hussey, S Wilkinson, J Tweddle

2006, Vol. 5, pp. 18-28
Issue date: 2006
Published online: 2006/06/28

DOI: https://doi.org/10.2481/dsj.5.18

Free access

The number of online resources for biodiversity information is growing. Names of organisms underpin access to information but present a number of unique problems when used as search terms. We examine these problems and assert that a taxonomic name-server or thesaurus is necessary to enable optimal retrieval of records from multiple datasets. A simple solution is presented, based upon our experience working with "real-world" data in the National Biodiversity Network (NBN) in the United Kingdom. The NBN provides access to over 18 million observational records and incorporates a nomenclator covering 198,000 names.
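The core idea of a taxonomic name-server can be shown in a few lines. Every name string, including synonyms and variant spellings, maps to a single concept, and a search expands the query to all names for that concept before it matches any records. This is a hypothetical illustration, not the NBN's actual design. The class and function names, the concept ID and the placeholder taxon names are all invented.

```python
from collections import defaultdict


class NameServer:
    """Toy taxonomic thesaurus (hypothetical): every known name points to one concept ID."""

    def __init__(self):
        self._concept_of = {}               # normalized name -> concept ID
        self._names_of = defaultdict(set)   # concept ID -> original name strings

    @staticmethod
    def normalize(name):
        # Ignore case and repeated whitespace, a common source of mismatches.
        return " ".join(name.split()).lower()

    def add(self, concept_id, name):
        self._concept_of[self.normalize(name)] = concept_id
        self._names_of[concept_id].add(name)

    def expand(self, query):
        """All names sharing the query's concept; the query alone if it is unknown."""
        concept = self._concept_of.get(self.normalize(query))
        return {query} if concept is None else set(self._names_of[concept])


def search(records, server, query):
    """Records whose 'taxon' field matches any name of the query's concept."""
    wanted = {server.normalize(n) for n in server.expand(query)}
    return [r for r in records if server.normalize(r["taxon"]) in wanted]


# Placeholder names (hypothetical), not real taxonomy.
server = NameServer()
server.add("C1", "Examplia vulgaris")   # treated as the accepted name
server.add("C1", "Examplia communis")   # treated as a synonym
records = [
    {"id": 1, "taxon": "Examplia vulgaris"},
    {"id": 2, "taxon": "examplia  communis"},
    {"id": 3, "taxon": "Otheria minor"},
]
matches = [r["id"] for r in search(records, server, "Examplia communis")]
```

A plain exact-string search for "Examplia communis" would have missed record 1, which uses the other name. Without normalization it would also have missed record 2.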

Building supply chain communication systems: a review of methods and techniques

A F Cutting-Decelle, B P Das, R I Young, K Case, S Rahimifard, C J Anu ...

2006, Vol. 5, pp. 29-51
Issue date: 2006
Published online: 2006/06/28

DOI: https://doi.org/10.2481/dsj.5.29

Free access

With the increasing importance of computer-based communication technologies, communication networks are becoming crucial in supply chain management. Given the objectives of the supply chain, supply chain management sits at the intersection of different professional sectors, each with its own vocabulary, knowledge, and rules. This paper reviews the main approaches to supply chain communications by analyzing different ways of modelling a supply chain and by presenting new semantic-based approaches, both existing and under development, that aim to improve the quality of information exchange within the supply chain.

Definition of a web ontology for design-oriented material selection

Toshihiro Ashino, Mitsutane Fujita

2006, Vol. 5, pp. 52-63
Issue date: 2006
Published online: 2006/06/28

DOI: https://doi.org/10.2481/dsj.5.52

Free access

A standardized XML data schema for material properties is under development to establish a common, exchangeable representation. The next step toward managing knowledge about material usage, selection, or processing is to define an ontology that represents the structure of concepts related to materials, e.g., the definition, classification, or properties of a material.

Material selection for designing artifacts is a process that translates required material properties into material substances. This in turn requires definitions of the data analysis and of the rules used to interpret the results. In this paper, an ontology structure that formalizes this kind of process is discussed, using the translation of creep property data into design data as an example.

Data for decision making in networked health

Christian Bourret, Gabriella Salzano

2006, Vol. 5, pp. 64-78
Issue date: 2006
Published online: 2006/06/28

DOI: https://doi.org/10.2481/dsj.5.64

Free access

In developed countries, we now live in a networked society: a society of information, knowledge, and services (Castells, 1996), with strong specific features in the health field (Bourret, 2003; Silber, 2003). The World Health Organization (WHO) has outlined the importance of information for improving health for all. However, financial resources remain limited. Health costs represent 11% of GNP in France, Germany, Switzerland, and Canada, 14% in the USA, and 7.5% in Spain and the United Kingdom. Governments, local authorities, and health or insurance organizations therefore face difficult choices about opportunities and priorities, and to make these choices they need specific, valuable data. First, this paper provides a comprehensive overview of our networked society and of the contribution of ICT (Information and Communication Technologies) to health (in other words, e-Health), from the perspective of needs and uses at the micro, meso, and macro levels. We point out the main challenges in developing Nationwide Health Information Networks in the US, the UK, and France. Then we analyze the main issues concerning data for decision making in networked health: coordination and evaluation. In the last sections, we use an information system perspective to investigate the three interoperability layers (micro, meso, and macro). We analyze the requirements and challenges of designing a global interoperability architecture that supports different kinds of interactions. We then focus on the harmonization efforts made at several levels. Finally, we identify common methodological and engineering issues.

Data publication in the open access initiative

Jens Klump, Roland Bertelmann, Jan Brase, Michael Diepenbroek, Hannes ...

2006, Vol. 5, pp. 79-83
Issue date: 2006
Published online: 2006/06/28

DOI: https://doi.org/10.2481/dsj.5.79

Free access

The 'Berlin Declaration' was published in 2003 as a guideline to policy makers to promote the Internet as a functional instrument for a global scientific knowledge base. Because knowledge is derived from data, the principles of the 'Berlin Declaration' should apply to data as well. Today, access to scientific data is hampered by structural deficits in the publication process. Data publication needs to offer authors an incentive to publish data through long-term repositories. Data publication also requires an adequate licence model that protects the intellectual property rights of the author while allowing further use of the data by the scientific community.

Automated granularity to integrate digital information: the "Antarctic Treaty Searchable Database" case study

Paul Athur Berkman, George James Morgan III, Reagan Moore, Babak Hamid ...

2006, Vol. 5, pp. 84-99
Issue date: 2006
Published online: 2006/06/28

DOI: https://doi.org/10.2481/dsj.5.84

Free access

Access to information is necessary, but not sufficient in our digital era. The challenge is to objectively integrate digital resources based on user-defined objectives for the purpose of discovering information relationships that facilitate interpretations and decision making. The Antarctic Treaty Searchable Database (http://aspire.nvi.net), which is in its sixth edition, provides an example of digital integration based on the automated generation of information granules that can be dynamically combined to reveal objective relationships within and between digital information resources. This case study further demonstrates that automated granularity and dynamic integration can be accomplished simply by utilizing the inherent structure of the digital information resources. Such information integration is relevant to library and archival programs that require long-term preservation of authentic digital resources.

Effect of using bias-corrected estimators in logistic regression model in small samples: prostate-specific antigen (PSA) data

M.A. Matin

2006, Vol. 5, pp. 100-107
Issue date: 2006
Published online: 2006/07/06

DOI: https://doi.org/10.2481/dsj.5.100

Free access

This study investigates the effect of using bias-corrected estimators to analyze real-world skewed data for which categorization and transformation are necessary. It also reports a small-scale simulation study that identifies factors determining whether the bias correction is small or large. For the complete data set, the maximum likelihood estimates and Schaefer's bias-corrected estimates are not greatly different. However, when the original sample size is reduced by about 50%, the difference between the estimates is much larger, possibly even large enough to influence the conclusions drawn. The impact of transformation and categorization is clearly visible. The broad conclusions are the same across categorizations, although the differences between types of categorization cannot be overlooked. A factor that appears to influence the size of the bias correction is identified.
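Schaefer's exact procedure is not reproduced here. As an assumption, the sketch below uses the standard first-order (Cox-Snell type) bias approximation for logistic regression, b = (XᵀWX)⁻¹ Xᵀ(h ∘ (π − 1/2)). In this formula W = diag(π(1 − π)) and h is the diagonal of the hat matrix. The corrected estimate is β̂ − b. The data are a hypothetical toy set, not the PSA data. The code fits the same pattern at two sample sizes. Doubling the sample halves the estimated bias, which is consistent with the abstract's observation that the correction matters more in smaller samples.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the small dense system a x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def probs(X, beta):
    return [1.0 / (1.0 + math.exp(-dot(x, beta))) for x in X]


def fisher_info(X, pi):
    """X^T W X with W = diag(pi * (1 - pi))."""
    p = len(X[0])
    return [[sum(q * (1 - q) * x[j] * x[k] for x, q in zip(X, pi))
             for k in range(p)] for j in range(p)]


def fit_mle(X, y, max_iter=100, tol=1e-12):
    """Maximum likelihood logistic regression by Newton-Raphson."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        pi = probs(X, beta)
        score = [sum((yi - q) * x[j] for x, yi, q in zip(X, y, pi))
                 for j in range(len(beta))]
        step = solve(fisher_info(X, pi), score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def first_order_bias(X, beta):
    """b = (X^T W X)^{-1} X^T (h * (pi - 1/2)), with h the hat-matrix diagonal."""
    pi = probs(X, beta)
    info = fisher_info(X, pi)
    weights = []
    for x, q in zip(X, pi):
        h = q * (1 - q) * dot(x, solve(info, x))  # leverage h_ii
        weights.append(h * (q - 0.5))
    rhs = [sum(w * x[j] for w, x in zip(weights, X)) for j in range(len(beta))]
    return solve(info, rhs)


def bias_corrected(X, y):
    beta = fit_mle(X, y)
    bias = first_order_bias(X, beta)
    return beta, bias, [b - d for b, d in zip(beta, bias)]


# Hypothetical toy data (intercept + one covariate); NOT the PSA data set.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
X_small = [[1.0, float(v)] for v in xs]
X_large = X_small * 2      # same pattern, twice the sample size
y_large = ys * 2

mle_s, bias_s, corr_s = bias_corrected(X_small, ys)
mle_l, bias_l, corr_l = bias_corrected(X_large, y_large)
```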

Causal knowledge extraction by natural language processing in material science: a case study in chemical vapor deposition

Yuya Kajikawa, Yoshihide Sugiyama, Hideki Mima, Katsumori Matsushima

2006, Vol. 5, pp. 108-118
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.108

Free access

Scientific publications written in natural language still play a central role as a source of knowledge. However, because of the flood of publications, surveying the literature has become a highly time-consuming and tangled process, especially for novices in the discipline. Tools that support the literature survey may therefore help individual scientists explore useful new domains. Natural language processing (NLP) is expected to be one of the most promising techniques for retrieving, abstracting, and extracting knowledge. In this contribution, NLP is first applied to the literature on chemical vapor deposition (CVD). CVD is a sub-discipline of materials science and a complex, interdisciplinary field of research involving chemists, physicists, engineers, and materials scientists. Causal knowledge extraction from this literature using NLP is then demonstrated.
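Real causal-knowledge extraction relies on parsing and domain vocabularies. To make the basic idea concrete, the toy below pulls cause-effect triples from sentences using a regular expression over a hand-picked, hypothetical list of causal verbs. It is not the authors' NLP pipeline. The example sentences are placeholders, not CVD findings.

```python
import re

# Hypothetical cue verbs; a real system would use parsing and a domain lexicon.
CAUSAL_VERBS = "increases|decreases|causes|enhances|suppresses|reduces"
PATTERN = re.compile(
    rf"^(?P<cause>.+?)\s+(?P<verb>{CAUSAL_VERBS})\s+(?P<effect>.+?)\.?$",
    re.IGNORECASE,
)


def extract_causal(sentence):
    """Return (cause, verb, effect) for 'X <verb> Y.' sentences, else None."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("cause"), match.group("verb").lower(), match.group("effect")


# Placeholder sentences, not statements about real materials.
sentences = [
    "Raising parameter A increases property B.",
    "Property B was measured at room temperature.",
    "Additive C reduces defect density D",
]
triples = [t for t in map(extract_causal, sentences) if t is not None]
```

Sentences without a listed cue verb yield no triple. This is the main weakness of pattern matching compared with the linguistic analysis an NLP system provides.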

The impact of data mining techniques on medical diagnostics

Siri Krishan Wasan, Vasudha Bhatnagar, Harleen Kaur

2006, Vol. 5, pp. 119-126
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.119

Free access

Medical data mining has great potential for exploring hidden patterns in medical data sets. These patterns can be used for clinical diagnosis. However, the available raw medical data are widely distributed, heterogeneous, and voluminous. These data need to be collected in an organized form, and the collected data can then be integrated into a hospital information system. Data mining technology provides a user-oriented approach to finding novel and hidden patterns in the data. Data mining and statistics both strive to discover patterns and structures in data. Statistics deals with heterogeneous numbers only, whereas data mining deals with heterogeneous fields. We identify a few areas of healthcare where these techniques can be applied to healthcare databases for knowledge discovery. In this paper we briefly examine the impact of data mining techniques, including artificial neural networks, on medical diagnostics.

Improving database quality through eliminating duplicate records

Mingzhen Wei, Andrew H. Sung, Martha E. Cather

2006, Vol. 5, pp. 127-142
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.127

Free access

Redundant or duplicate data are the most troublesome problem in database management and applications. Approximate field matching is the key to resolving this problem because it identifies semantically equivalent string values that have syntactically different representations. This paper considers token-based solutions and proposes a general field matching framework that generalizes the field matching problem across different domains. Introducing the concept of String Matching Points (SMP) into string comparison improves string matching accuracy and efficiency compared with other commonly applied field matching algorithms. The paper discusses how field matching algorithms can be developed from the general framework. The framework and the corresponding algorithm are tested on a public data set, the NASA publication abstract database. The approach can be applied to similar problems in other databases.
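The SMP technique is the paper's own contribution and is not reproduced here. The sketch below shows a common baseline for token-based field matching instead: Jaccard similarity over normalized tokens. It flags record pairs whose field values share most of their tokens, regardless of case, punctuation or word order. The 0.8 threshold and the sample titles are hypothetical.

```python
import re
from itertools import combinations


def tokens(value):
    """Lower-case alphanumeric tokens; punctuation and spacing differences are ignored."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def token_similarity(a, b):
    """Jaccard similarity of the two token sets (1.0 = same tokens in any order)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def candidate_duplicates(records, field, threshold=0.8):
    """Return (i, j, score) for every record pair whose field similarity >= threshold."""
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        score = token_similarity(r1[field], r2[field])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Hypothetical sample titles.
records = [
    {"title": "Thermal Analysis of Composite Panels"},
    {"title": "thermal analysis of composite panels."},
    {"title": "Composite panels: thermal analysis of"},
    {"title": "Orbital Debris Tracking Methods"},
]
dupes = candidate_duplicates(records, "title")
```

Comparing all pairs is quadratic in the number of records. Production systems usually add a blocking or sorting step first, and efficiency is one of the aspects the abstract says SMP improves.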

A hashing technique using separate binary tree

Mehedi Masud, Gopal Chandra Das, Anisur Rahman, Arunashis Ghose

2006, Vol. 5, pp. 143-161
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.143

Free access

Efficient storage and retrieval of data and information are always major requirements in a large database system. Many file organization techniques have been developed for this purpose, and much research is still ongoing; hashing is one such technique. In this paper we propose an enhanced hashing technique that combines a hash table with a binary tree. Each index of the hash table is associated with a binary tree that is searched using the binary representation of a portion of the records' primary key. The paper contains numerous examples to describe the technique. The technique shows significant improvements in searching, insertion, and deletion for systems with huge amounts of data. The paper also presents a mathematical analysis of the proposed technique and comparative results.
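The abstract leaves the details of this technique to the paper. The sketch below assumes one plausible reading: `key % slots` selects the hash-table index, and the remaining portion of the key (`key // slots`) is walked bit by bit through a binary trie attached to that index. The class names, slot count and key width are hypothetical. The code is not taken from the paper.

```python
class _Node:
    """Binary-trie node: child[0] / child[1] follow a 0 / 1 bit of the key."""
    __slots__ = ("child", "record")

    def __init__(self):
        self.child = [None, None]
        self.record = None


class TreeHashTable:
    """Hash table whose slots each hold a binary trie (hypothetical sketch)."""

    def __init__(self, slots=8, key_bits=16):
        if key_bits < 1:
            raise ValueError("key_bits must be at least 1")
        self.slots = slots
        self.key_bits = key_bits
        self.table = [None] * slots

    def _locate(self, key):
        # Slot from key % slots; the remaining portion is key // slots.
        if key < 0 or key // self.slots >= 1 << self.key_bits:
            raise ValueError("key outside the supported range")
        return key % self.slots, key // self.slots

    def _bits(self, rest):
        # Most significant bit first, fixed width so every leaf has the same depth.
        return [(rest >> i) & 1 for i in reversed(range(self.key_bits))]

    def insert(self, key, record):
        idx, rest = self._locate(key)
        if self.table[idx] is None:
            self.table[idx] = _Node()
        node = self.table[idx]
        for bit in self._bits(rest):
            if node.child[bit] is None:
                node.child[bit] = _Node()
            node = node.child[bit]
        node.record = (key, record)

    def search(self, key):
        idx, rest = self._locate(key)
        node = self.table[idx]
        for bit in self._bits(rest):
            if node is None:
                return None
            node = node.child[bit]
        return None if node is None or node.record is None else node.record[1]

    def delete(self, key):
        idx, rest = self._locate(key)
        node, path = self.table[idx], []
        for bit in self._bits(rest):
            if node is None:
                return False
            path.append((node, bit))
            node = node.child[bit]
        if node is None or node.record is None:
            return False
        node.record = None
        # Prune nodes that no longer lead to any record, from the leaf upward.
        for parent, bit in reversed(path):
            child = parent.child[bit]
            if child.record is None and child.child == [None, None]:
                parent.child[bit] = None
            else:
                break
        if self.table[idx].child == [None, None]:
            self.table[idx] = None
        return True


table = TreeHashTable(slots=8, key_bits=16)
for k in (3, 11, 19, 42):          # 3, 11 and 19 all hash to slot 3
    table.insert(k, f"record-{k}")
```

Keys that collide in the same slot share a trie, so a lookup costs one modulo operation plus at most `key_bits` bit tests.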

Special Issue "Thousand Words"

Editor's note

2006, Vol. 5, p. 162
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.162

Free access

Data science as an academic discipline

F. Jack Smith

2006, Vol. 5, pp. 163-164
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.163

Free access

International Scientific Data, Standards, and Digital Libraries: An NSF NSDL (U.S.) and CODATA Workshop

Laura M. Bartolo

2006, Vol. 5, pp. 165-167
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.165

Free access

Data standards for the international virtual observatory

R. J. Hanisch

2006, Vol. 5, pp. 168-173
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.168

Free access

A primary goal of the International Virtual Observatory Alliance (IVOA), which brings together 16 national and international Virtual Observatory development projects, is to develop, evaluate, test, and agree upon standards for astronomical data formatting, data discovery, and data delivery. In the three years that the IVOA has existed, substantial progress has been made on standards for tabular data, imaging data, spectroscopic data, and large-scale databases, and on managing the metadata that describe data collections and data access services. In this paper, I describe how the IVOA operates and give my views on why such a broadly based international collaboration has been able to make such rapid progress.

The Crystallographic Information File (CIF)

I.D. Brown, B. McMahon

2006, Vol. 5, pp. 174-177
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.174

Free access

The Crystallographic Information File (CIF), owned by the International Union of Crystallography, is a file structure based on tag-value ASCII pairs with tags defined in machine-readable dictionaries. The crystallographic community publishes and archives large quantities of numeric information generated by crystal structure determinations, and CIF's acceptance was assured by its adoption as the submission format for Acta Crystallographica and by the obvious needs of the community. CIF's strength lies in its dictionaries, which define most of the concepts of crystallography; its weakness is the difficulty of writing software that exploits its full potential.
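The tag-value structure can be illustrated with a deliberately partial reader for single-line items. It does not handle `loop_` tables, `;`-delimited multi-line text fields or trailing comments, all of which real CIF files use. The function names and sample values are hypothetical. CIF data names are case-insensitive, so tags are lower-cased. Numeric values may carry a standard uncertainty in parentheses, e.g. `5.4307(2)`.

```python
def parse_simple_cif(text):
    """Collect single-line `_tag value` items from CIF text (partial sketch).

    loop_ tables and ;-delimited multi-line values are rejected; trailing
    comments after a value are not stripped.
    """
    items = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("data_"):
            continue
        if line.startswith("loop_") or line.startswith(";"):
            raise ValueError("loop_ and multi-line values are outside this sketch")
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) < 2:
            raise ValueError(f"value not on the same line: {parts[0]}")
        tag, value = parts[0], parts[1].strip()
        # Strip one pair of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        items[tag.lower()] = value    # data names are case-insensitive
    return items


def cif_number(value):
    """Numeric value without its parenthesized standard uncertainty."""
    return float(value.split("(")[0])


SAMPLE = """\
data_example
# illustrative values only
_cell_length_a    5.4307(2)
_cell_angle_alpha 90
_symmetry_space_group_name_H-M 'F d -3 m'
"""
items = parse_simple_cif(SAMPLE)
```

Most of the work in a full implementation lies in what this sketch omits: loops and text fields, plus validation against the machine-readable dictionaries the abstract describes.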

Geography Markup Language

David S. Burggraf

2006, Vol. 5, pp. 178-204
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.178

Free access

Geography Markup Language (GML) is an XML application that provides a standard way to represent geographic information. GML is developed and maintained by the Open Geospatial Consortium (OGC), which is an international consortium consisting of more than 250 members from industry, government, and university departments. Many of the conceptual models described in the ISO 19100 series of geomatics standards have been implemented in GML, and it is itself en route to becoming an ISO Standard (TC/211 CD 19136). An overview of GML together with its implications for the geospatial web is given in this paper.
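The sketch below shows roughly what a GML fragment looks like by building a single `gml:Point` with Python's standard `xml.etree.ElementTree`. The namespace URI is the one used by GML 3.1; GML 3.2 moved to `http://www.opengis.net/gml/3.2`. The CRS URN, `gml:id` value and coordinates are illustrative assumptions and are not taken from the paper. With the `EPSG::4326` URN, the axis order is latitude then longitude.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"   # GML 3.1.x namespace (assumption, see above)
ET.register_namespace("gml", GML_NS)


def gml_point(lat, lon, srs="urn:ogc:def:crs:EPSG::4326", gml_id="p1"):
    """Serialize one gml:Point with a gml:pos child (latitude first for EPSG:4326)."""
    point = ET.Element(f"{{{GML_NS}}}Point",
                       {f"{{{GML_NS}}}id": gml_id, "srsName": srs})
    pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
    pos.text = f"{lat} {lon}"
    return ET.tostring(point, encoding="unicode")


xml_text = gml_point(51.5, -0.12)
```

A real GML application schema would wrap such geometries in feature types, for example a road or a river with its own properties. This part of the model is where the ISO 19100 conceptual models mentioned in the abstract come in.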

The publication of scientific data by World Data Centers and the National Library of Science and Technology in Germany

J. Brase, U. Schindler

2006, Vol. 5, pp. 205-208
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.205

Free access

In its 2004 report "Data and information", the International Council for Science (ICSU) strongly recommended a new strategic framework for scientific data and information. On the initiative of a working group of the Committee on Data for Science and Technology (CODATA), the German Research Foundation (DFG) started the project "Publication and Citation of Scientific Primary Data" in 2004 as part of the program "Information-infrastructure of network-based scientific-cooperation and digital publication". Starting with the field of earth science, the German National Library of Science and Technology (TIB) is now established as a registration agency for scientific primary data and is a member of the International DOI Foundation (IDF).

MathML in practice: issues and promise

T.W. Cole

2006, Vol. 5, pp. 209-218
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.209

Free access

This paper discusses MathML, an XML-based standard for expressing mathematics - everything from elementary mathematics to undergraduate college-level mathematics. Limitations of pre-existing options led to the creation of MathML. MathML is designed to be useful for authoring and publishing, for creating online interactive math resources, and as a non-proprietary approach for archiving. MathML is supported by the W3C and by multiple scholarly publishers and vendors of computer-based mathematics software. The question now is whether MathML can achieve greater acceptance among authors and better integration with standards in related domains, either through cross-walks or through direct incorporation into other domain schemas.

Data-centric view in e-Science information systems

Gregor Erbach

2006, Vol. 5, pp. 219-222
Issue date: 2006
Published online: 2006/11/28

DOI: https://doi.org/10.2481/dsj.5.219

Free access

Network approaches in Current Research Information Systems support the shift from a document-centric to a data-centric view, which acknowledges the primacy of data in the scientific process. E-science holds the promise of a complete, data-centred documentation of the scientific process.

Contents of Volume 5, 2006

Towards development of a high quality public domain global roads database

Andrew Nelson, Alexander de Sherbinin, Francesca Pozzi

2006, Vol. 5, pp. 223-265
Issue date: 2006
Published online: 2006/12/07

DOI: https://doi.org/10.2481/dsj.5.223

Free access

There is clear demand for a global spatial public domain roads data set with improved geographic and temporal coverage, consistent coding of road types, and clear documentation of sources. The currently best available global public domain product covers only one-quarter to one-third of the existing road networks, and this varies considerably by region. Applications for such a data set span multiple sectors and would be particularly valuable for the international economic development, disaster relief, and biodiversity conservation communities, not to mention national and regional agencies and organizations around the world. The building blocks for such a global product are available for many countries and regions, yet thus far there has been neither strategy nor leadership for developing it. This paper evaluates the best available public domain and commercial data sets, assesses the gaps in global coverage, and proposes a number of strategies for filling them. It also identifies stakeholder organizations with an interest in such a data set that might either provide leadership or funding for its development. It closes with a proposed set of actions to begin the process.
