The first Web-based version of the NIST X-ray Photoelectron Spectroscopy Database (XPSDB) is described. The current database, built from a relational database management system (RDBMS), contains critically evaluated data with over 19,000 line positions, chemical shifts, doublet splittings, and energy separations of photoelectron and Auger-electron lines. It is available free of charge to the public through the Internet at http://srdata.nist.gov/xps/.
SWISS-PROT is a curated, non-redundant protein sequence database which provides a high level of annotation and is integrated with a large number of other biological databases. It is supplemented by TrEMBL, a computer-annotated database which contains translations of all coding sequences in the EMBL Nucleotide Sequence Database which are not yet in SWISS-PROT. Each fully curated SWISS-PROT entry contains as much up-to-date information as possible from a variety of sources and the high quality of the annotation in SWISS-PROT provides the basis for the procedure which is used to automatically annotate the TrEMBL database. The large amounts of different data types found in both databases are stored in a highly structured and uniform manner and this structured organisation means that SWISS-PROT and TrEMBL together provide a comprehensive resource with data that are readily accessible for users and easily retrievable by computer programs.
The relationship between apparently disparate sets of data is a critical component of interpreting materials' behavior, especially in terms of assessing the impact of the microscopic characteristics of materials on their macroscopic or engineering behavior. In this paper we demonstrate the value of principal component analysis of property data associated with high temperature superconductivity to examine the statistical impact of the materials' intrinsic characteristics on high temperature superconducting behavior.
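As a minimal sketch of the technique named above (not the authors' analysis), the following Python finds the first principal component of a small property table by power iteration on the sample covariance matrix. The function name `first_principal_component`, the starting vector, and the iteration count are assumptions made for illustration; no real superconductor data are used.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, explained-variance fraction) of the leading PC.

    rows: list of equal-length lists of floats (samples x properties).
    Illustrative only: power iteration on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    # Center each column so the covariance is taken about the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Arbitrary non-uniform starting vector (an assumption; it must not be
    # orthogonal to the leading eigenvector for the iteration to converge).
    v = [float(j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("starting vector lies in the null space of the covariance")
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the variance along the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance
```

Called on a table whose columns are material properties, the returned direction shows which properties vary together, and the fraction shows how much of the total variance that single combination captures.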
Results of applying ordinary kriging and cokriging techniques, as well as the turning bands simulation method, to a survey of heavy metal pollution of the superficial soil layer (depth 0-20 cm) in a selected mining region of Upper Silesia (S Poland) are presented. The multivariate structural analysis, estimation and conditional simulation were performed on data from the regional monitoring of soils. Based on the estimated and simulated cadmium and zinc soil concentrations, the most polluted zones and places in the Dabrowa Górnicza region, where environmental monitoring should be instituted again, were determined.
The present contribution is a case study of the possibilities of using data from world scientific collections to understand the distribution and conservation of Mexican birds. Information was gathered on specimens from Mexico housed in 40 scientific collections in Mexico, the United States, Canada, and Europe. This information was compiled in a centralized database and various analyses were developed to address historical patterns of ornithological investigations in Mexico: current and potential distribution areas of the species; patterns of species richness, endemism and seasonality; and conservation applications.
A recurring theme during the CODATA 2000 conference (Lake Maggiore, Italy, 15 - 19 October 2000) was the increasing convergence in data-rich branches of science between the storage and retrieval of data and the publication of conclusions drawn from the data. Web publishing technologies facilitate access to publications and data through the same interfaces and tools. For crystallography, the ability to deliver the experimental data alongside the research commentary offers tremendous advantages. A structured file format has been developed that allows not only submission of a research article accompanied by a complete supporting data set, but also automated validation of the description of the crystal structure reported in the article against the accompanying data. Such validation is an important component of the review process, and encourages better-quality publications. The adopted format is different from XML but shares some of the properties of that markup language, and it suggests the improvements in quality that might result in other subject areas from the adoption of similar methodology. The International Union of Crystallography fully exploits the convergence of publishing and data-handling technologies in its online journals and associated Web site.
In this paper we survey the recent activities and achievements of our research group in the deployment of XML-related technologies in Cultural Heritage applications concerning the encoding of temporal semantics in Web documents. In particular we will review "The Valid Web", which is an XML/XSL infrastructure we defined and implemented for the definition and management of historical information within multimedia documents available on the Web, and its further extension to the effective encoding of advanced temporal features like indeterminacy, multiple granularities and calendars, enabling efficient processing in a user-friendly Web-based environment. Potential uses of the developed infrastructures include a broad range of applications in the cultural heritage domain, where the historical perspective is relevant, with potentially positive impacts on E-Education and E-Science.
Recent developments on the World-Wide Web provide an unparalleled opportunity to revolutionise scientific, technical and medical publication. The technology exists for the scientific world to use primary publication to create a knowledge base, or Semantic Web, with a potential greatly beyond the paper archives and electronic databases of today.
A data set of 412 olfactory compounds, divided into animal, camphoraceous, ethereal and fatty olfaction classes, was submitted to an analysis by a Fuzzy Logic procedure called Adaptive Fuzzy Partition (AFP). This method aims to establish molecular descriptor/chemical activity relationships by dynamically dividing the descriptor space into a set of fuzzily partitioned subspaces. The ability of these AFP models to classify the four olfactory notes was validated after dividing the data set compounds into training and test sets, including 310 and 102 molecules, respectively. The main olfactory note was correctly predicted for 83% of the test set compounds.
A review of issues in image compression is presented, with a strong focus on the wavelet transform and other closely related multiresolution transforms. The roles of information content, resolution scale, and image capture noise are discussed. Experimental and practical results are reviewed.
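To illustrate the multiresolution idea in its simplest form, here is a sketch of one level of the 1-D Haar wavelet transform, with a crude threshold standing in for lossy compression. The Haar filter is chosen only for brevity; the review does not say which wavelets it covers, and the function names and the thresholding rule are assumptions.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform on an even-length list.

    Returns (approximation, detail) coefficient lists, each half the length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def compress(signal, threshold):
    """Zero detail coefficients smaller than `threshold`, then reconstruct."""
    approx, detail = haar_forward(signal)
    kept = [d if abs(d) >= threshold else 0.0 for d in detail]
    return haar_inverse(approx, kept)
```

Small detail coefficients often reflect capture noise rather than image content, so discarding them replaces each affected pair of samples by its mean while leaving strong edges intact. A 2-D image codec would apply the same step along rows and columns and recurse on the approximation band.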
STMML is an XML-based markup language covering many generic aspects of scientific information. It has been developed as a re-usable core for more specific markup languages. It supports data structures, data types, metadata, scientific units and some basic components of scientific narrative. The central means of adding semantic information is through dictionaries. The specification is through an XML Schema which can be used to validate STMML documents or fragments. Many examples of the language are given.
This paper presents an overview of the IMSA application, a patient-oriented medical information system. IMSA stands for Interactive Multimedia System for Auto-medication and aims to provide a health-care Internet tool for the end-user. The system provides an environment that integrates on-line health information, medical and pharmaceutical databases and a knowledge-based system for medical diagnosis. The implementation process focuses on cognitive science, knowledge representation and human-computer interaction.
Modeling of the solubility of amino acids and purine and pyrimidine bases with a set of sixteen molecular descriptors has been thoroughly analyzed to detect and understand the reasons for anomalies in the description of this property for these two classes of compounds. Unsatisfactory modeling can be ascribed to incomplete collateral data, i.e., to the fact that too little is known about the behavior of these compounds in solution. This is usually because intermolecular forces cannot be modeled. The anomalous modeling can be detected from the rather large values of the standard deviation of the estimates of the whole set of compounds, and from the unsatisfactory modeling of some of the subsets of these compounds. Thus the detected abnormalities can be used (i) to get an idea about weak intermolecular interactions such as hydration, self-association, and hydrogen-bond phenomena in solution, and (ii) to reshape the molecular descriptors with the introduction of parameters that allow better modeling. This last procedure should be used with care, bearing in mind that the solubility phenomenon is rather complex.
Complete genomic sequence data are stored in the public GenBank/EMBL/DDBJ databases so that any investigator can make use of the data. This report describes a comparative analysis of codon usage that is impossible without such a public and open data system. A limited number of bacteriophages harbor their own transfer RNAs. Based on a comparison between T4 phage-encoded tRNA species and the relative cellular amounts of host Escherichia coli tRNAs, it is hypothesized that T4 tRNAs could serve to supplement host isoacceptor tRNA species that are present in minor amounts and thus enhance the translational efficiency of phage proteins. When compared to their respective host bacteria, the codon usage data of bacteriophages D3, φC31, HP1, D29 and 933W all show an increased frequency of synonymous codons or amino acids that correspond to phage tRNA species, suggesting their supplemental role in the efficient production of phage proteins. The data analysis presents an example in which the availability of an open and fully accessible database system allows one to obtain comprehensive insights into a fundamental problem in molecular biology.
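The kind of codon-usage comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the `ratio` cutoff are hypothetical, the inputs are assumed to be in-frame coding sequences, and frequencies are computed over all codons rather than per amino acid.

```python
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA alphabets
    usable = len(cds) - len(cds) % 3     # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def enriched_codons(phage_cds, host_cds, ratio=1.5):
    """Codons whose frequency in the phage exceeds `ratio` times the host's.

    Codons absent from the host sequence are reported as enriched.
    """
    phage = codon_frequencies(phage_cds)
    host = codon_frequencies(host_cds)
    return sorted(c for c, f in phage.items() if f > ratio * host.get(c, 0.0))
```

In a real analysis the inputs would be concatenated coding sequences retrieved from GenBank/EMBL/DDBJ, and the enriched codons would then be checked against the anticodons of the phage-encoded tRNAs.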
The WFCC-MIRCEN World Data Centre for Microorganisms (WDCM) was set up more than 30 years ago as a data center of the World Federation for Culture Collections (WFCC). It published the World Directory of Collections of Cultures of Microorganisms when it was established and now provides a portal site for microbial resource centers and their customers by fully utilizing Internet technology. This paper introduces international initiatives on biological resources centers together with the activities of WDCM.
China's Natural Resources Database (CNRD) is a comprehensive database developed to support research on natural resources, social sustainable development and environmental security in China. This paper introduces the background, contents, characteristics and application of the CNRD.
We present an overview of the VL approach to promote research and education in developing countries and to help reduce the technology gap of the digital divide. We discuss software tools for instrument control, data sharing and e-collaboration and communications with special attention to low-bandwidth networks. We analyse the tentative costs involved in VL and the skills needed for VL administration. We conclude by identifying some VL strategies for development.
This paper presents the SYDOX/MATCOMP/Xi project, funded by the French Ministry of Industry. The goal of the project was to provide construction professionals with an on-line aid for component specification and selection at different levels of the construction life cycle. This two-year project started in 1997 and involved several partners. This paper describes the main features of the information system: databases, query and communication systems. SYDOX (SYstème de DOnnées compleXes) is aimed at defining and demonstrating a prototype to access information about MATerials and COMPonents used in construction, implemented on a WWW server. Though the objective is general, the work was focused on a restricted sub-section of the construction domain. We describe the domain and the scope of the project, the starting point and the lessons learnt from the development of the prototype. We also propose some important ideas on which this research is based.
Various receptor data were collected, edited and integrated into an Integrated Receptor Database (IRDB). The stored data include structural data (amino acid sequences and their secondary and three-dimensional structures), functional data, binding affinities, cell signaling data, etc. The purpose of this database is to allow structural biologists, drug designers and toxicologists to analyse and elucidate receptor-ligand dockings and the resultant post-binding signal transduction pathways. IRDB is available online (http://impact.nihs.go.jp/RDB.html).