Data Science Journal
Online ISSN : 1683-1470
Volume 3
Displaying 1-19 of 19 articles from this issue
Contents of Volume 3, 2004
  • Wing Tsang
    2004 Volume 3 Pages 1-9
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    This paper describes the present situation regarding chemical kinetic databases for the simulation of the combustion of liquid fuels. Past work in the area is summarized. Much is known about the reactions of the smaller fragments from combustion processes. In order to describe real liquid fuels, there is a need for an understanding of how the larger organic fuels are broken down to these fragments. The types of reactions that need to be considered are described, and the breakdown of heptane is used as an example.
    Download PDF (188K)
  • E.A. Kihn, M. Zhizhin, R. Siquig, R. Redmon
    2004 Volume 3 Pages 10-28
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    The Environmental Scenario Generator (ESG) is a network distributed software system designed to allow a user to interact with archives of environmental data for the purpose of scenario extraction, data analysis and integration with existing models that require environmental input. The ESG uses fuzzy-logic based search tools to allow a user to look for specific environmental scenarios in vast archives by specifying the search in human linguistic terms. For example, the user can specify a scenario such as a "cloud free week" or "high winds and low pressure" and then search relevant archives available across the network to get a list of matching events. The ESG hooks to existing archives of data by providing a simple communication framework and an efficient data model for exchanging data. Once data has been delivered by the distributed archives in the ESG data model, it can easily be accessed by the visualization, integration and analysis components to meet specific user requests. The ESG implementation provides a framework which can be taken as a pattern applicable to other distributed archive systems.
    Download PDF (630K)
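The fuzzy-logic search described in the abstract above can be illustrated with a minimal sketch. This is not the ESG implementation: the `Observation` record, the membership thresholds (10 to 20 m/s for "high winds", 990 to 1010 hPa for "low pressure"), and the use of `min` as the fuzzy AND are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical record layout; the real ESG data model is not given in the abstract.
    time: str
    wind_speed_ms: float
    pressure_hpa: float


def ramp_up(x, lo, hi):
    """Membership that rises linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def high_wind(obs):
    # Assumed thresholds: fully "high" at 20 m/s, not "high" at all below 10 m/s.
    return ramp_up(obs.wind_speed_ms, 10.0, 20.0)


def low_pressure(obs):
    # Assumed thresholds: fully "low" at 990 hPa, not "low" at all above 1010 hPa.
    return 1.0 - ramp_up(obs.pressure_hpa, 990.0, 1010.0)


def score(obs):
    """Degree to which obs matches "high winds and low pressure" (fuzzy AND as min)."""
    return min(high_wind(obs), low_pressure(obs))


def search(observations, threshold=0.5):
    """Return (score, time) pairs at or above the threshold, best match first."""
    hits = [(score(o), o.time) for o in observations]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)


archive = [
    Observation("t1", 25.0, 985.0),
    Observation("t2", 5.0, 985.0),
    Observation("t3", 15.0, 1000.0),
]
matches = search(archive)
```

Here `matches` ranks `t1` (a full match) ahead of `t3` (a partial match) and drops `t2`, whose wind speed has zero membership in "high winds".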
  • P Ginsparg
    2004 Volume 3 Pages 29-37
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    If we were to start from scratch today to design a quality-controlled archive and distribution system for scientific and technical information, it could take a very different form from what has evolved in the past decade from pre-existing print infrastructure. Ultimately, we might expect some form of global knowledge network for research communications. Over the next decade, there are many technical and non-technical issues to address along the way, everything from identifying optimal formats and protocols for rendering, indexing, linking, querying, accessing, mining, and transmitting the information, to identifying sociological, legal, financial, and political obstacles to realization of ideal systems. What near-term advances can we expect in automated classification systems, authoring tools, and next-generation document formats to facilitate efficient data mining and long-term archival stability? How will the information be authenticated and quality controlled? What differences should be expected in the realization of these systems for different scientific research fields? Can recent technological advances provide not only more efficient means of accessing and navigating the information, but also more cost-effective means of authentication and quality control? Relevant experiences from open electronic distribution of research materials in physics and related disciplines during the past decade are used to illuminate these questions, and some of their implications for proposals to improve the implementation of peer review are then discussed.
    Download PDF (44K)
  • G Cotter, M Frame, R Sepic
    2004 Volume 3 Pages 38-59
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    Information concerning biodiversity and ecosystems is critical to a wide range of scientific, educational, and government uses; however, much of this information is not easily accessible. This paper presents the core concepts underlying the National Biological Information Infrastructure (NBII), a Web-based system coordinated by the U.S. Geological Survey that provides data and information on U.S. biological resources and, through a variety of partnerships, biological resources in many other nations. This paper will highlight NBII development, implementation, technological innovation, and successful user applications at two regional nodes: the NBII Southern Appalachian Information Node and the NBII Central Southwest/Gulf Coast Node.
    Download PDF (785K)
  • K Kurihara, T Kunisawa
    2004 Volume 3 Pages 60-79
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    A gene order database of 32 completely sequenced plastid genomes was developed. The data structure is formally identical to that of the feature tables in the major GenBank/EMBL/DDBJ databases. The quality of annotations was largely improved. A normalizing gene-labeling system across the complete plastid genomes was developed so that comparative studies are made available without having to go back to sequence analysis. Many incorrect coordinates of tRNA-encoding regions found in the major databases were corrected. We attempted to distinctively label tRNA genes with the anticodon sequence CAT, which encodes either the initiator tRNA, elongator tRNA, or Ile-tRNA. The database is available at http://www.rs.noda.tus.ac.jp/~kunisawa.
    Download PDF (530K)
  • B Delecroix, R Epstein
    2004 Volume 3 Pages 80-87
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    Co-word analysis is based on a sociological theory developed by the CSI and the SERPIA (Callon, Courtial, Turner, 1991) in the mid-eighties. This method, originally dedicated to scientific fields, measures the association strength between terms in documents to reveal and visualise the evolution of scientific fields through the construction of clusters and strategic diagrams. This method has since been successfully applied to investigate the structure of many scientific areas. It is now found in many software systems that companies use to improve their business and define their strategy, but its relevance to this kind of application has not yet been proved. Using the example of economic and marketing information on DSL technologies from Reuters Business Briefing, this presentation gives an interpretation of co-word analysis for this kind of information. After an overview of the software we used (Sampler) and an outline of the experimental protocol, we investigate and explain each step of the co-word analysis process: terminological extraction, computation of clusters and the strategic diagram. In particular, we explain the meaning of each parameter of the method: the choice of variables and similarity measures is discussed. Finally we try to give a global interpretation of the method in an economic context. Further studies will be added to this work in order to allow a generalisation of these results.
    Download PDF (142K)
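The association strength between terms mentioned in the abstract above can be sketched as follows. The abstract does not name the measure used; this sketch assumes the equivalence index common in co-word analysis, E_ij = c_ij^2 / (c_i * c_j), where c_i is the number of documents containing term i and c_ij the number containing both terms. The function name and the sample terms are hypothetical.

```python
from collections import Counter
from itertools import combinations


def equivalence_index(documents):
    """Map each co-occurring term pair to c_ij**2 / (c_i * c_j).

    documents is an iterable of term lists, one list per document.
    Pairs are keyed as alphabetically sorted tuples.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in documents:
        unique = sorted(set(terms))  # count each term once per document
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (a, b): c * c / (term_counts[a] * term_counts[b])
        for (a, b), c in pair_counts.items()
    }


# Hypothetical indexed documents, for illustration only.
docs = [
    ["dsl", "broadband", "price"],
    ["dsl", "broadband"],
    ["dsl", "price"],
    ["cable"],
]
strengths = equivalence_index(docs)
```

In this toy corpus the pair (`broadband`, `dsl`) gets a strength of 2/3, while (`broadband`, `price`) gets 0.25. Clustering and the strategic diagram would be built on top of such pair strengths; those steps are not shown here.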
  • Y Kaji, H Tsuji, M Fujita, Y Xu, K Yoshida, S Mashiko, K Shimura, S Mi ...
    2004 Volume 3 Pages 88-95
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    The distributed materials database system named 'Data-Free-Way' has been developed by four organizations (the National Institute for Materials Science, the Japan Atomic Energy Research Institute, the Japan Nuclear Cycle Development Institute, and the Japan Science and Technology Corporation) under a cooperative agreement in order to share new and accumulated information for use in the development of advanced nuclear materials and for use in the design of structural components, etc. In order to make the system more valuable, the development of a knowledge based system, in which knowledge extracted from the material database is expressed, is planned for more effective utilization of Data-Free-Way. XML (eXtensible Markup Language) has been adopted as the method of describing the retrieved results and their meanings. A knowledge note described with XML is stored as a knowledge item in the knowledge base. Since this knowledge note is described with XML, the user can easily convert the displayed tables and graphs into a data format that the user usually uses. This paper will describe the current status of Data-Free-Way, the method of describing knowledge extracted from the materials database with XML and the distributed materials knowledge based system.
    Download PDF (553K)
  • Christian Bourret
    2004 Volume 3 Pages 96-113
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    Since the beginning of the 1980s most industrialized countries have had to cope with the problem of managing the costs of their Health care systems and particularly the costs of hospitalization. The development of information technologies, accelerated by the Internet, has led to responses that depend greatly on the better management of information. Health care is entering the Information Society: the rise of networks is a key aspect of our Knowledge and Information Society.
    With information and services developments that allow for greater involvement of the patient, Health data and more particularly medical data represent a very important economic and social concern. This paper analyzes the rise of networks within the Health field from the French experience in Healthcare Networks and compares it with that of the United States, Canada, the United Kingdom and Spain. Then we outline the special properties of Health data: its personal, sensitive and confidential nature as well as how it needs to conform to particular legislation. Finally we study the use of these data, the essential role of Information and Communication Systems with its key element, the patient's Electronic Health Record, in promoting new practices based on information sharing and the quality of data. This marks the beginning of huge changes.
    Download PDF (319K)
  • Enrico Franceschi, Andrea Bulgarelli, Fabio Grandi, Fulvio Gianotti, M ...
    2004 Volume 3 Pages 114-134
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    The development of a space mission requires the implementation of several Test Equipments which simulate data flows for a specific payload. This paper describes a project involving the adoption of XML-related technologies in the management of those data flows arranged into telemetry packets. The project is mainly made up of three parts: the creation of an XML database containing the packet descriptors (requiring the definition of an XML Schema); the development of an interface between preexisting software modules and the new XML database; the implementation of a QuickLook module which processes packet data converted into the XML-based FITSML format developed at NASA/GSFC. The final result is a set of interacting software modules that implement a demonstrative but fully operational prototype.
    Download PDF (584K)
  • P Arzberger, P Schroeder, A Beaulieu, G Bowker, K Casey, L Laaksonen, ...
    2004 Volume 3 Pages 135-152
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    Access to and sharing of data are essential for the conduct and advancement of science. This article argues that publicly funded research data should be openly available to the maximum extent possible. To seize upon advancements of cyberinfrastructure and the explosion of data in a range of scientific disciplines, this access to and sharing of publicly funded data must be advanced within an international framework, beyond technological solutions. The authors, members of an OECD Follow-up Group, present their research findings, based closely on their report to OECD, on key issues in data access, as well as operating principles and management aspects necessary to successful data access regimes.
    Download PDF (367K)
  • Udeepta D. Bordoloi, David L. Kao, Han-Wei Shen
    2004 Volume 3 Pages 153-162
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    Novel visualization methods are presented for spatial probability density function data. These are spatial datasets in which each pixel is a random variable with multiple samples, the results of experiments on that random variable. We use clustering as a means to reduce the information contained in these datasets, and present two different ways of interpreting and clustering the data. The clustering methods are used on two datasets, and the results are discussed with the help of visualization techniques designed for the spatial probability data.
    Download PDF (433K)
  • H Huang, ZZ Hu, BE Suzek, CH Wu
    2004 Volume 3 Pages 163-174
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    The Protein Information Resource (PIR) provides many databases and tools to support genomic and proteomic research. PIR is a member of UniProt (Universal Protein Resource), the central repository of protein sequence and function, which maintains the UniProt Knowledgebase with extensively curated annotation, the UniProt Reference databases to speed sequence searches, and the UniProt Archive to reflect sequence history. PIR also provides the PIRSF family classification system, based on evolutionary relationships of full-length proteins, and the iProClass integrated database of protein family, function, and structure. These databases are easily accessible from the PIR web site using a centralized data retrieval system for information retrieval and knowledge discovery.
    Download PDF (1017K)
  • Helga Tabuchi
    2004 Volume 3 Pages 175-181
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    The Standing Committee on Copyright and Related Rights of the World Intellectual Property Organization (WIPO) has been discussing the possibility of introducing intellectual property protection of non-original databases through new international norms. It has been examining whether databases that do not presently qualify for copyright protection should also be protected. On the other hand, it is pointed out that the needs of the scientific, research and educational sectors and the issue of access to information should also be taken into account.
    Download PDF (152K)
  • F J Smith
    2004 Volume 3 Pages 182-190
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    An experimental system called QPS (Quantitative Problem Solver) has shown that a numerical database of quantities in the physical sciences can be enhanced by adding intelligence for problem-solving. The system needs to store not only numerical data but also the formulae that operate on the data. It also needs the logical software that enables the system to find and combine data and formulae to solve problems. It is shown that this logical software is similar to the backward-chaining algorithm used in expert systems with factual data. The system has been successfully tested on a large number of problems, including many taken from textbooks in physics and chemistry and some taken from practical problems in engineering, among them problems that need the solution of simultaneous equations and a novel solution to the problem of choosing the optimum material for a component. It has an interface based on the well-known symbols used in equations; it can work with any system of units and it can check the accuracy of the calculations. The principle can be used in any numerical database that contains data which can be manipulated using formulae, not just in the physical sciences.
    Download PDF (208K)
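The idea of combining stored data and formulae by backward chaining, as described in the abstract above, can be sketched roughly as follows. This is not the QPS code: the `solve` function, the quantity names, and the sample formulae are hypothetical, and the sketch ignores units, accuracy checking, and simultaneous equations.

```python
def solve(goal, facts, formulae, _pending=None):
    """Return the value of goal, or None if it cannot be derived.

    facts maps quantity names to known values and is updated in place
    with every derived quantity. formulae is a list of
    (output_name, input_names, function) triples.
    """
    pending = _pending if _pending is not None else set()
    if goal in facts:
        return facts[goal]
    if goal in pending:
        return None  # already being derived higher up: avoid circular reasoning
    pending.add(goal)
    try:
        for output, inputs, func in formulae:
            if output != goal:
                continue
            # Work backwards: each input becomes a sub-goal.
            values = [solve(name, facts, formulae, pending) for name in inputs]
            if all(v is not None for v in values):
                facts[goal] = func(*values)
                return facts[goal]
        return None
    finally:
        pending.discard(goal)


# Hypothetical stored data and formulae, for illustration only.
facts = {"mass_kg": 2.0, "volume_m3": 0.5, "g_ms2": 9.81, "depth_m": 10.0}
formulae = [
    ("density_kg_m3", ("mass_kg", "volume_m3"), lambda m, v: m / v),
    ("pressure_pa", ("density_kg_m3", "g_ms2", "depth_m"), lambda rho, g, h: rho * g * h),
]
pressure = solve("pressure_pa", dict(facts), formulae)
```

Asking for `pressure_pa` makes the solver first derive `density_kg_m3` from the stored mass and volume, then apply the second formula; a goal with no matching formula and no stored value returns `None`.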
Special Section on Selection, Appraisal, and Retention of Digital Scientific Data
  • W. L. Anderson
    2004 Volume 3 Pages 191-201
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    One goal of the Committee on Data for Science and Technology is to solicit information about, promote discussion of, and support action on the many issues related to scientific and technical data preservation, archiving, and access. This brief paper describes four broad categories of issues that help to organize discussion, learning, and action regarding the work needed to support the long-term preservation of, and access to, scientific and technical data. In each category, some specific issues and areas of concern are described.
    Download PDF (201K)
  • Terry Eastwood
    2004 Volume 3 Pages 202-208
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    This paper aims to extract lessons from archivists' experience of appraising electronic records that are likely to have wider application in the preservation of other digital materials, including scientific data. It relies mainly on the work of the Appraisal Task Force of the InterPARES project on long-term preservation of authentic electronic records to develop a picture of the process of appraisal. It concludes that the aspects of assessment of authenticity, determination of the feasibility of preservation, and monitoring electronic records as they are maintained in the live environment are likely to find counterparts in attempts to appraise digital objects for long-term preservation in the scientific community. It also argues that the activities performed during appraisal constitute the first vital step in the process of preservation of digital materials.
    Download PDF (166K)
  • M Gutmann, K Schürer, D Donakowski, Hilary Beedham
    2004 Volume 3 Pages 209-221
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    The number of data collections produced in the social sciences prohibits the archiving of every scientific study. It is therefore necessary to make decisions regarding what can be preserved and why it should be preserved. This paper reviews the processes used by two data archives, one from the United States and one from the United Kingdom, to illustrate how data are selected for archiving, how they are appraised, and what steps are required to retain the usefulness of the data for future use. It also presents new initiatives that seek to encourage an increase in the long-term preservation of digital resources.
    Download PDF (217K)
  • Luigi Fusco, Joost van Bemmelen
    2004 Volume 3 Pages 222-226
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    Earth Observation Missions provide continuous surveillance of the Earth regardless of atmospheric conditions, producing huge amounts of data every year that need to be processed, elaborated, appraised and archived by dedicated systems. Emerging institutional and international environmental initiatives, like the ESA and EC Global Monitoring for Environment and Security (GMES), require access to full historical data collections, including the performed data elaborations, scientific analysis, models and results. The historical ESA Earth Observation archives hold petabytes of data, and since the launch of Envisat in 2002 they have grown by some 500 terabytes per year. The access and utilisation of these archives is an important measure of long-term data preservation; improving it is a continuous challenge at the programmatic, technological and operational levels. This article describes how Digital Library and Grid technology can support the underlying infrastructure for long-term data preservation.
    Download PDF (1229K)
  • J. Esanu, J. Davidson, S. Ross, W. Anderson
    2004 Volume 3 Pages 227-232
    Published: 2004
    Released on J-STAGE: January 05, 2006
    JOURNAL FREE ACCESS
    CODATA and ERPANET collaborated to convene an international archiving workshop on the selection, appraisal, and retention of digital scientific data, which was held on 15-17 December 2003 at the Biblioteca Nacional in Lisbon, Portugal. The workshop brought together more than 65 researchers, data and information managers, archivists, and librarians from 13 countries to discuss the issues involved in making critical decisions regarding the long-term preservation of the scientific record. One of the major aims for this workshop was to provide an international forum to exchange information about data archiving policies and practices across different scientific, institutional, and national contexts. Highlights from the workshop discussions are presented.
    Download PDF (176K)