Data Science Journal
Online ISSN : 1683-1470
Volume 9
Showing 1-29 of 29 articles from the selected issue
Contents of Volume 9, 2010
Papers
  • S. Shankar, T. Purusothaman
    2010, Volume 9, pp. 1-12
    Published: 2010/02/24
    Released on J-STAGE: 2010/02/24
    [Advance online publication] Released: 2010/02/12
    Journal, free access
    This article proposes an innovative utility sentient approach for the mining of interesting association patterns from transaction databases. First, frequent patterns are discovered from the transaction database using the FP-Growth algorithm. From the frequent patterns mined, this approach extracts novel interesting association patterns with emphasis on significance, utility, and the subjective interests of the users. The experimental results portray the efficiency of this approach in mining utility-oriented and interesting association rules. A comparative analysis is also presented to illustrate our approach's effectiveness.
  • Yi Wu, Peng Zhou, Jian Lin, Wanhua Qiu
    2010, Volume 9, pp. 13-28
    Published: 2010/03/12
    Released on J-STAGE: 2010/03/12
    [Advance online publication] Released: 2010/02/28
    Journal, free access
    By comparing a hard real-time system and a soft real-time system, this article identifies the risk of over-design in soft real-time system design. To deal with this risk, a novel concept of statistical design is proposed. Statistical design is the process of accurately accounting for and mitigating the effects of variation in part geometry and other environmental conditions while at the same time optimizing a target performance factor. However, statistical design can be a very difficult and complex task when using classical mathematical methods. Thus, a simulation methodology to optimize the design is proposed in order to bridge the gap between real-time analysis and optimization for robust and reliable system design.
  • Weizhong Lu, Yuanchun Zhou, Lei Liu, Baoping Yan
    2010, Volume 9, pp. 29-41
    Published: 2010/05/26
    Released on J-STAGE: 2010/05/26
    [Advance online publication] Released: 2010/05/19
    Journal, free access
    It is important to improve data reliability and data access efficiency for data-intensive applications in a data grid environment. In this paper, we propose an Information Dispersal Algorithm (IDA)-based parallel storage scheme for massive data distribution and parallel access in the Scientific Data Grid. The scheme partitions a data file into unrecognizable blocks and distributes them across many target storage nodes according to user profile and system conditions. Only a subset of the blocks, which can be downloaded in parallel by remote clients, is required to reconstruct the data file. This scheme can be deployed on top of current grid middleware. A demonstration and experimental analysis show that the IDA-based parallel storage scheme has better data reliability and data access performance than the existing data replication methods. Furthermore, this scheme has the potential to considerably reduce storage requirements for large-scale databases on a data grid.
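    The dispersal idea in this abstract can be illustrated with a minimal sketch of a Rabin-style information dispersal code. This is an assumption about the general technique, not the authors' scheme: the field size P, the Vandermonde encoding matrix, and the function names encode, solve_mod, and decode are all hypothetical choices made for illustration, and block placement by user profile or system conditions is not modelled. The sketch needs Python 3.8 or later for the modular inverse via pow.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def encode(data: bytes, k: int, n: int):
    """Disperse data into n shares; any k of them suffice to rebuild it."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    shares = {}
    for x in range(1, n + 1):  # one share per (hypothetical) storage node x
        row = [pow(x, j, P) for j in range(k)]  # Vandermonde row for node x
        shares[x] = [
            sum(c * b for c, b in zip(row, padded[i:i + k])) % P
            for i in range(0, len(padded), k)
        ]
    return shares


def solve_mod(matrix, rhs):
    """Solve matrix * u = rhs over GF(P) by Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse of the pivot
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[k] for row in aug]


def decode(subset, k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k surviving shares."""
    xs = sorted(subset)[:k]
    matrix = [[pow(x, j, P) for j in range(k)] for x in xs]
    out = bytearray()
    for i in range(len(subset[xs[0]])):
        out.extend(solve_mod(matrix, [subset[x][i] for x in xs]))
    return bytes(out[:length])


message = b"scientific data grid"
shares = encode(message, k=3, n=5)
survivors = {x: shares[x] for x in (2, 4, 5)}  # nodes 1 and 3 are unavailable
print(decode(survivors, 3, len(message)))
```

    In this sketch each share holds about 1/k of the file's symbols, so storing n shares costs roughly n/k times the file size, whereas whole-file replication that tolerates the same n - k node losses would need n - k + 1 full copies.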
  • Eric C. Kansa, Ahrash Bissell
    2010, Volume 9, pp. 42-53
    Published: 2010/07/08
    Released on J-STAGE: 2010/07/08
    [Advance online publication] Released: 2010/06/29
    Journal, free access
    In some areas of science, sophisticated web services and semantics underlie "cyberinfrastructure". However, in "small science" domains, especially in field sciences such as archaeology, conservation, and public health, datasets often resist standardization. Publishing data in the small sciences should embrace this diversity rather than attempt to corral research into "universal" (domain) standards. A growing ecosystem of increasingly powerful Web syndication based approaches for sharing data on the public Web can offer a viable approach. Atom Feed based services can be used with scientific collections to identify and create linkages across different datasets, even across disciplinary boundaries without shared domain standards.
  • Toshihiro Ashino
    2010, Volume 9, pp. 54-61
    Published: 2010/07/08
    Released on J-STAGE: 2010/07/08
    [Advance online publication] Released: 2010/06/26
    Journal, free access
    We have rich information resources for materials science and engineering - raw measurement data, computational simulation methods, digitized handbooks, and digital libraries. However, these resources have a wide variety of formats, terminologies, and concepts, which makes it difficult to find appropriate information for materials design, development, and evaluation. One solution to this problem is to integrate these resources into a computer readable concept map, called a domain ontology, which describes concepts and relationships among the concepts in materials science and engineering. This paper describes a trial that constructs a standard of metadata description using ontology language and demonstrates the validity of this construction through data exchange among heterogeneous material databases. "Materials Ontology," which consists of several sub ontologies corresponding to substance, process, environment, and property, is developed using the ontology language of the Semantic Web, OWL, which enables the definition of a flexible and detailed structure of materials information. A versatile "materials data format" is built on the Materials Ontology as a component of the materials information platform and is applied to exchange data among three different thermal property databases, maintained by two major materials science research institutes in Japan.
  • E Poovammal, M Ponnavaikko
    2010, Volume 9, pp. 62-72
    Published: 2010/07/17
    Released on J-STAGE: 2010/07/17
    [Advance online publication] Released: 2010/06/23
    Journal, free access
    Micro data is a valuable source of information for research. However, publishing data about individuals for research purposes, without revealing sensitive information, is an important problem. The main objective of privacy preserving data mining algorithms is to obtain accurate results/rules by analyzing the maximum possible amount of data without unintended information disclosure. Data sets for analysis may be in a centralized server or in a distributed environment. In a distributed environment, the data may be horizontally or vertically partitioned. We have developed a simple technique by which horizontally partitioned data can be used for any type of mining task without information loss. The partitioned sensitive data at 'm' different sites are transformed using a mapping table or graded grouping technique, depending on the data type. This transformed data set is given to a third party for analysis. This may not be a trusted party, but it is still allowed to perform mining operations on the data set and to release the results to all the 'm' parties. The results are interpreted among the 'm' parties involved in the data sharing. The experiments conducted on real data sets prove that our proposed simple transformation procedure preserves one hundred percent of the performance of any data mining algorithm as compared to the original data set while preserving privacy.
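    The mapping-table step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the table contents, the site data, and the names mapping, inverse, and transform are hypothetical, the graded grouping technique for other data types is not shown, and the counting task merely stands in for whatever mining the third party performs.

```python
from collections import Counter

# Hypothetical one-to-one mapping table shared by the data-owning sites only.
mapping = {"flu": "D1", "asthma": "D2", "diabetes": "D3"}
inverse = {code: value for value, code in mapping.items()}

# Horizontally partitioned data: same attributes, different records per site.
site_a = [("30-39", "flu"), ("40-49", "asthma")]
site_b = [("30-39", "flu"), ("50-59", "diabetes")]


def transform(rows):
    """Replace each sensitive value with its code from the mapping table."""
    return [(age, mapping[diagnosis]) for age, diagnosis in rows]


# The third party sees only the pooled, transformed records.
pooled = transform(site_a) + transform(site_b)

# Stand-in mining task: find the most frequent (age group, code) pattern.
pattern, count = Counter(pooled).most_common(1)[0]

# The sites interpret the released result with the inverse table.
print((pattern[0], inverse[pattern[1]]), count)
```

    Because the mapping in this sketch is one-to-one, every frequency computed on the transformed records equals the corresponding frequency on the original records, which is the sense in which such a transformation need not lose information for the mining task.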
  • Robert R. Downs, James J. Marshall
    2010, Volume 9, pp. 73-92
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/07/17
    Journal, free access
    The use of scientific data is becoming increasingly dependent on the software that fosters such use. As the ability to reuse software contributes to capabilities for reusing software-dependent data, instruments for measuring software reusability contribute to the reuse of software and related data. The development and current state of a proposed set of Reuse Readiness Levels (RRLs) are summarized, and potential uses of the software reusability measures are described, along with proposed use cases to support sponsorship of software projects, software production, software adoption, and data stewardship during the systems development lifecycle and the data lifecycle.
  • Rajesh Tailor, Med Ram Verma, Balkishan Sharma
    2010, Volume 9, pp. 93-99
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/06/29
    Journal, free access
    An alternative ratio-cum-product estimator of population mean using the coefficient of kurtosis for two auxiliary variates has been proposed. The proposed estimator has been compared with a simple mean estimator, the usual ratio estimator, a product estimator, and estimators proposed by Singh (1967) and Singh et al. (2004). An empirical study is also carried out in support of the theoretical findings.
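    For orientation, the comparison estimators named in this abstract are commonly written as below. These are the standard textbook forms and are assumed, not confirmed, to match the notation of the paper: \bar{y} and \bar{x} denote sample means of the study and auxiliary variates, \bar{X} the known population mean of the auxiliary variate, and subscripts 1 and 2 the two auxiliary variates. The proposed kurtosis-based estimator is not reproduced because the abstract does not state it.

```latex
\bar{y}_{R} = \bar{y}\,\frac{\bar{X}}{\bar{x}}, \qquad
\bar{y}_{P} = \bar{y}\,\frac{\bar{x}}{\bar{X}}, \qquad
\bar{y}_{RP} = \bar{y}\,\frac{\bar{X}_{1}}{\bar{x}_{1}}\cdot\frac{\bar{x}_{2}}{\bar{X}_{2}}
```

    Here \bar{y}_{R} is the usual ratio estimator, \bar{y}_{P} the product estimator, and \bar{y}_{RP} the ratio-cum-product estimator attributed to Singh (1967).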
  • Mary Zborowski
    2010, Volume 9, pp. 100-106
    Published: 2011/01/25
    Released on J-STAGE: 2011/01/25
    [Advance online publication] Released: 2011/01/13
    Journal, free access
    NRC-CISTI serves Canada as its National Science Library (as mandated by Canada's Parliament in 1924) and also provides direct support to researchers of the National Research Council of Canada (NRC). By reason of its mandate, vision, and strategic positioning, NRC-CISTI has been rapidly and effectively mobilizing Canadian stakeholders and resources to become a lead player on both the Canadian national and international scenes in matters relating to the organization and management of scientific research data. In a previous communication (CODATA International Conference, 2008), the orientation of NRC-CISTI towards this objective and its short- and medium-term plans and strategies were presented. Since then, significant milestones have been achieved. This paper presents NRC-CISTI's most recent activities in these areas, which are progressing well alongside a strategic organizational redesign process that is realigning NRC-CISTI's structure, mission, and mandate to better serve its clients. Throughout this transformational phase, activities relating to data management remain vibrant.
  • Steven Kraines, Weisen Guo
    2011, Volume 9, pp. 107-123
    Published: 2011/01/29
    Released on J-STAGE: 2011/01/29
    [Advance online publication] Released: 2011/01/13
    Journal, free access
    Work towards creation of a knowledge sharing system for sustainability science through the application of semantic data modeling is described. An ontology grounded in description logics was developed based on the ISO 15926 data model to describe three types of sustainability science conceptualizations: situational knowledge, analytic methods, and scenario frameworks. Semantic statements were then created using this ontology to describe expert knowledge expressed in research proposals and papers related to sustainability science and in scenarios for achieving sustainable societies. Semantic matching based on logic and rule-based inference was used to quantify the conceptual overlap of semantic statements, which shows the semantic similarity of topics studied by different researchers in sustainability science, similarities that might be unknown to the researchers themselves.
  • Guang Li, Yadong Wang
    2011, Volume 9, pp. 124-132
    Published: 2011/02/16
    Released on J-STAGE: 2011/02/16
    [Advance online publication] Released: 2011/02/08
    Journal, free access
    Privacy protection is indispensable in data mining, and many privacy-preserving data mining (PPDM) methods have been proposed. One such method is based on singular value decomposition (SVD), which uses SVD to find unimportant information for data mining and removes it to protect privacy. Independent component analysis (ICA) is another data analysis method. If both SVD and ICA are used, unimportant information can be extracted more comprehensively. Accordingly, this paper proposes a new PPDM method using both SVD and ICA. Experiments show that our method performs better in preserving privacy than the SVD-based methods while also maintaining data utility.
"Proceedings of the International Symposium: Fifty Years after IGY - Modern Information Technologies and Earth and Solar Sciences -" (Eds. Iyemori, T. et al.) Part 2
  • Masatoshi Ohishi
    2010, Volume 9, pp. S128-S134
    Published: 2010/03/13
    Released on J-STAGE: 2010/03/13
    [Advance online publication] Released: 2010/03/10
    Journal, free access
    Astronomical Virtual Observatories (VOs) are an emerging research environment for astronomy, and 16 countries and a region have funded the development of their VOs based on international standard protocols for interoperability. The 16 funded VO projects have established the International Virtual Observatory Alliance (http://www.ivoa.net/) to develop standard interoperable interfaces such as registry (metadata), data access, query languages, output format (VOTable), data model, application interface, and so on. The IVOA members have each constructed their VO environment through the IVOA interfaces. The National Astronomical Observatory of Japan (NAOJ) started its VO project (Japanese Virtual Observatory - JVO) in 2002 and developed its VO system. Since December 2004, we have succeeded in interoperating the latest JVO system with other VOs in the USA and Europe. Data observed by the Subaru telescope, satellite data taken by JAXA/ISAS, etc. are connected to the JVO system. Successful interoperation of the JVO system with other VOs means that astronomers around the world will be able to utilize top-level data obtained by these telescopes from anywhere in the world at any time. The system design of the JVO system, experiences during our development, including problems with the current standard protocols defined in the IVOA, and proposals to resolve these problems in the near future are described.
  • H. Nagao, S. Tsuboi, Y. Ishihara, H. Yanaka
    2010, Volume 9, pp. S135-S139
    Published: 2010/03/28
    Released on J-STAGE: 2010/03/28
    [Advance online publication] Released: 2010/03/21
    Journal, free access
    The data center of our institute distributes solid earth science data obtained by the Ocean Hemisphere Project (OHP) network through the Pacific 21 website. We have developed the Java-based software "GDSClient", which enables us to collect not only the data of the OHP network but also data distributed by other data centers by means of web service technology. Data can be requested by specifying controlling parameters such as data centers, observatories, a data period, and other auxiliary detailed parameters. Users do not need to know the differences between data centers, provided that a WSDL (Web Services Description Language) file, in which the user interface information is described in XML format, is prepared. The latest GDSClient releases are available from the Pacific 21 website.
Special Issue
Information Technology Challenges in Earth and Solar Sciences (Part 2)
CRIS for European e-Infrastructure
1 Introduction
  • K. Jeffery, A. Asserson
    2010, Volume 9, pp. CRIS1-CRIS6
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/04/29
    Journal, free access
    The European e-infrastructure is the ICT support for research, although the infrastructure will be extended for commercial/business use. It supports the research process from funding agencies through research institutions to innovation. It supports experimental facilities, modelling and simulation, communication between researchers, and the workflow of research processes and research management. We propose that the core should be CERIF: an EU recommendation to member states for exchanging research information and for homogeneous access to heterogeneous information. CERIF can also integrate associated systems (such as finance, human resource, project management, and library services) and provides interoperation among research institutions, research funders, and innovators.
2 The need for a CRIS. Structure and Use of a CRIS - The Common European Research Information Format Model (CERIF)
  • Keith Jeffery
    2010, Volume 9, pp. CRIS7-CRIS13
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/04/29
    Journal, free access
    A CERIF-CRIS consists of base entities with records describing components of the research and link entities describing relationships among records in the base entities. As an example, three base entities may contain records describing a person, a publication and a project while two link entities relate respectively the person to the publication in role author and the person to the project in role project leader. This powerful linking or inter-relating capability includes temporal as well as role aspects and inter-relates dynamically and flexibly all the components of R&D. The CERIF model can be extended to inter-relate appropriate information from legacy information systems in an organisation, such as those covering accounting, human resources, project management, assets, stock control, etc. A CERIF-CRIS can thus provide a flexible low-cost integration comparable with an ERP (Enterprise Resource Planning) System, particularly in an organisation with R&D as its primary business.
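    The base-entity and link-entity structure in this abstract can be sketched as a small data model. This is a simplified illustration, not the CERIF schema: the class names BaseRecord and Link, the field names, the helper roles_of, and the sample records and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BaseRecord:
    """A record in a base entity (hypothetical, simplified)."""
    entity: str  # e.g. "person", "publication", "project"
    id: str
    label: str


@dataclass(frozen=True)
class Link:
    """A record in a link entity: two base records, a role, and a period."""
    source: BaseRecord
    target: BaseRecord
    role: str
    start: date
    end: Optional[date] = None  # None means the relationship is still open

    def active_on(self, day: date) -> bool:
        """True if the relationship holds on the given day."""
        return self.start <= day and (self.end is None or day <= self.end)


person = BaseRecord("person", "p1", "A. Researcher")
paper = BaseRecord("publication", "pub1", "An Example Paper")
project = BaseRecord("project", "proj1", "An Example Project")

links = [
    Link(person, paper, "author", date(2009, 5, 1)),
    Link(person, project, "project leader", date(2008, 1, 1), date(2009, 12, 31)),
]


def roles_of(record: BaseRecord, day: date):
    """Return (role, target label) pairs for links from record active on day."""
    return [
        (link.role, link.target.label)
        for link in links
        if link.source == record and link.active_on(day)
    ]


print(roles_of(person, date(2009, 6, 1)))
print(roles_of(person, date(2010, 6, 1)))
```

    Keeping the role and the time period on the link rather than on the base records is what lets the same person appear as author of one record and as project leader of another, each for its own period, without changing the person record.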
  • A. Asserson, K. Jeffery
    2010, Volume 9, pp. CRIS14-CRIS23
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/04/30
    Journal, free access
    CRIS (Current Research Information Systems) provide researchers, research managers, innovators, and others with a view over the research activity of a domain. IRs (institutional repositories) provide a mechanism for an organisation to showcase through OA (open access) its intellectual property. Increasingly, organizations are mandating that their employed researchers deposit peer-reviewed published material in the IR. Research funders are increasingly mandating that publications be deposited in an open access repository: some mandate a central (or subject-based) repository, some an IR. In parallel, publishers are offering OA but replacing subscription-based access with author (or author institution) payment for publishing. However, many OA repositories have metadata based on DC (Dublin Core), which is inadequate; a CERIF (Common European Research Information Format) CRIS provides metadata describing publications with formal syntax and declared semantics, thus facilitating interoperation or homogeneous access over heterogeneous sources. The formality is essential for research output metrics, which are increasingly being used to determine future funding for research organizations.
3 How to Set up and Use a CERIF-CRIS
  • Brigitte Jörg
    2010, Volume 9, pp. CRIS24-CRIS31
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/06/29
    Journal, free access
    With increased computing power, more data than ever are being and will be produced, stored, and (re-)used. Data are collected in databases, computed and annotated, or transformed by specific tools. The knowledge from data is documented in research publications, reports, presentations, or other types of files. The management of data and knowledge is difficult, and even more complicated is their re-use, exchange, or integration. To allow for quality analysis or integration across data sets and to ensure access to scientific knowledge, additional information - Research Information - has to be assigned to data and knowledge entities. We present the metadata model CERIF to add information to entities such as Publication, Project, Organisation, Person, Product, Patent, Service, Equipment, and Facility and to manage the semantically enhanced relationships between these entities in a formalized way. CERIF was released as an EC Recommendation to European Member States in 2000. Here, we refer to the latest version, CERIF 2008-1.0.
  • Anne Asserson
    2010, Volume 9, pp. CRIS32-CRIS38
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/04/29
    Journal, free access
    CRISs (Current Research Information Systems) are becoming increasingly important for organizations that are related to research, such as funding organisations, universities, and ministries. A CRIS holds information on research activities, results of research, and competence. A CRIS is useful for assessing a person or department, showing the institution's activity, monitoring scholarly activities, and as a base for the development of research strategy. This information could come from a local CRIS, a national CRIS, or interoperable CRISs. A CRIS will be really useful if it is structured and can interoperate with other CRISs. The CERIF model (Common European Research Information Format) is a structured model and is able to give statistics for planning, evaluation, and assessment within an institution or benchmarking among institutions. CERIF CRISs are able to give multiple views, such as a researcher's CV and an overview of an institution's projects (ongoing or ended) with project partners on an organizational or personal level. The output publications of a project are given for an individual researcher or institution, with linkage to the full text (in the local repository) and a list of journals where researchers or organizations are publishing, events, and an annual report on an individual researcher. A CERIF CRIS is recommended by the EU for interoperability among CRISs. A CERIF CRIS provides a one-stop shop for users and gives uniform access to full text publications and scientific data. A partial model for people, organisation, and results, not projects, can be used. It is recommended, however, to implement the full model. To secure consistent information, it is also recommended to establish authority lists for people (unique ID, name, organization, position, age, sex, etc.), organisations (name, acronym, address, etc.), journals (title, acronym, publisher, URL, etc.), and books (publisher, acronym, address, county, etc.) in the CERIF CRIS.
4 CRIS and the European e-Infrastructure, Enabling European Research
  • K. Jeffery
    2010, Volume 9, pp. CRIS39-CRIS43
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/04/29
    Journal, free access
    The ESFRI Roadmap marked a turning point in the evolution of European thinking on research facilities, providing a catalogue of such facilities with their characteristics. In parallel, the ESF (European Science Foundation) completed a questionnaire-based survey of research facilities. Finally, the ERF (European Research Facilities) consortium representing national facilities with international access was formed to parallel EIROForum (the European laboratories funded by international subscriptions). It is becoming increasingly clear that management of these facilities and management of the research process require extensive ICT. For research managers, that is provided by CRIS (Current Research Information Systems); researchers additionally need access to facilities to control experiments, with associated modelling and simulation, and access to research datasets and software.
  • K. Jeffery
    2010, Volume 9, pp. CRIS44-CRIS52
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/04/29
    Journal, free access
    The end-user demands low-effort-threshold access to systems providing e-information, e-business, and e-entertainment. Innovators and entrepreneurs also require equally low-energy access to heterogeneous information homogenised to a form and language familiar to them. On top of that, decision-makers, whether in a control room or government strategic planning, demand equally easy access to information that is statistically or inductively enhanced to knowledge and access to modelling or simulation systems to allow 'what if?' requests. Researchers and technical workers have an additional requirement for rapid integration of information with statistical, induction, modelling, and simulation systems to generate and verify hypotheses, thereby generating data and information to be used by others, which in turn advances knowledge. Access is required, and can now be provided, anytime, anyhow, anywhere through ambient computing technology. A new paradigm, GRIDs, provides the architectural framework.
5 Using a CRIS for e-Infrastructure
  • S. C. Lambert
    2010, Volume 9, pp. CRIS53-CRIS58
    Published: 2010/07/23
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/05/11
    Journal, free access
    Scientific research is supported by infrastructure, and e-infrastructure is one part of this. Repositories of data are a part of the e-infrastructure and have their own particular needs arising from the requirement for permanence of their data holdings. There are many threats to permanence, and there is a growing awareness of these threats and how they may be countered. Current Research Information Systems and other support to the research lifecycle, while focused on facilitating research activities in the present, will have a role in the preservation of the outputs of research into the future.
  • E. Dijk, M. van Meel
    2010, Volume 9, pp. CRIS59-CRIS65
    Published: 2010/07/24
    Released on J-STAGE: 2010/07/24
    [Advance online publication] Released: 2010/05/03
    Journal, free access
    Scholarly publications are a major part of the research infrastructure. One way to make output available is to store the publications in Open Access Repositories (OAR). A Current Research Information System (CRIS) that conforms to the standard CERIF (Common European Research Information Format) could be a key component in the e-infrastructure. A CRIS provides the structure and makes it possible to interoperate the CRIS metadata at every stage of the research cycle. The international DRIVER projects are creating a European repository infrastructure. Knowledge Exchange has launched a project to develop a metadata exchange format for publications between CRIS and OAR systems.
Essay
Meeting Reports
  • Jacek Becla, Kian-Tat Lim, Daniel Liwei Wang
    2010, Volume 9, pp. MR1-MR8
    Published: 2011/03/01
    Released on J-STAGE: 2011/03/01
    [Advance online publication] Released: 2011/02/22
    Journal, free access
    Academic and industrial users are increasingly facing the challenge of petabytes of data, but managing and analyzing such large data sets remains a daunting task. The 4th Extremely Large Databases workshop was organized to examine the needs of communities that face these issues and were under-represented at the past workshops. Approaches to big data statistical analytics as well as opportunities related to emerging hardware technologies were also debated. Writable extreme scale databases and the science benchmark were discussed. This paper is the final report of the discussions and activities at this workshop.
Errata