We explore the use of the LZW and LZSS data compression methods. These methods, or variants of them, are commonly used to compress different types of data. Even though LZSS gives better compression results on average, we determine the case in which LZW performs best and when the compression efficiency gap between the LZW algorithm and its LZSS counterpart is the largest.
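To make the dictionary-based idea behind LZW concrete, the following is a minimal sketch of a textbook LZW encoder and decoder. It is not the implementation evaluated in the paper: the function names `lzw_compress` and `lzw_decompress` are illustrative, the dictionary is allowed to grow without bound, and codes are kept as Python integers rather than packed into bits.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW dictionary codes (textbook variant)."""
    # Assumption: the dictionary starts with one entry per byte value
    # and grows without limit.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the phrase while it is still known.
            current = candidate
        else:
            # Emit the longest known phrase and learn the new one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from the codes produced by lzw_compress."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the phrase that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)
```

Highly repetitive input such as `b"ABABABA"` is encoded in fewer codes than it has bytes, which illustrates the kind of input on which a growing phrase dictionary pays off.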
Chemical analyses of volcanic gases consist of: location of sampling, date of sampling, identification of the sampling, etc. Nowadays, these data are generally represented in different formats. All of these formats are inflexible and machine dependent. XML has become the most important method of transferring data between computers. VolcanoGasML is a new format, based on XML, for the chemical analyses of volcanic gases. Its definition is divided into several layers: the first one describes the general information concerning the sample, the second, which is organized in several sublayers, contains the chemical data.
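The layered structure described above can be pictured with a small sketch that builds an XML document with a general-information layer and a chemical-data layer. Every element and attribute name here (`volcanoGasSample`, `generalInformation`, `chemicalData`, `species`) is hypothetical and is not taken from the VolcanoGasML definition; the values used in the usage note are placeholders, not measurements.

```python
import xml.etree.ElementTree as ET


def build_sample(location: str, date: str, sample_id: str,
                 species: dict[str, tuple[float, str]]) -> ET.Element:
    """Build a two-layer XML record; all tag names are hypothetical."""
    root = ET.Element("volcanoGasSample")

    # First layer: general information about the sample.
    general = ET.SubElement(root, "generalInformation")
    ET.SubElement(general, "location").text = location
    ET.SubElement(general, "date").text = date
    ET.SubElement(general, "identifier").text = sample_id

    # Second layer: chemical data, one element per measured species.
    chemical = ET.SubElement(root, "chemicalData")
    for name, (value, unit) in species.items():
        element = ET.SubElement(chemical, "species", name=name, unit=unit)
        element.text = str(value)
    return root


def to_xml_string(root: ET.Element) -> str:
    """Serialize the record to a text string."""
    return ET.tostring(root, encoding="unicode")
```

For example, `to_xml_string(build_sample("site-1", "2000-01-01", "S001", {"CO2": (1.0, "mol%")}))` yields a document that any XML parser can read, independent of the machine that produced it.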
Recently, the use of the Burrows-Wheeler method for data compression has been expanded. A method of enhancing the compression efficiency of the common JPEG standard is presented in this paper, exploiting the Burrows-Wheeler compression technique. The paper suggests a replacement of the traditional Huffman compression used by JPEG by the Burrows-Wheeler compression. When using high quality images, this replacement will yield a better compression ratio. If the image is synthetic, even a poor quality image can be compressed better.
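As background for the technique named above, here is a minimal sketch of the Burrows-Wheeler transform and its inverse on a short string. It uses the naive sorted-rotations construction, assumes a sentinel character `$` that does not occur in the input, and shows only the transform itself, not the JPEG integration or the entropy coding stages discussed in the paper.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (naive construction)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sort all rotations and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert bwt() by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

The transform groups identical characters together (`bwt("banana")` gives `"annb$aa"`), which is what makes the output easier for later coding stages to compress.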
This paper reviews the present status and major problems of the existing ISO standards related to imagery metadata. An imagery metadata model is proposed to facilitate the development of imagery metadata on the basis of conformance to these standards and combination with other ISO standards related to imagery. The model presents an integrated metadata structure and content description for any imagery data for finding data and data integration. Using the application of satellite data integration in CEOP as an example, satellite imagery metadata is developed, and the resulting satellite metadata list is given.
This paper presents data mining techniques that can be used to study voting patterns in the United States House of Representatives and shows how the results can be interpreted. We processed the raw data available at http://clerk.house.gov, performed t-weight calculations, an attribute relevance study, association rule mining, and decision tree analysis and present and interpret interesting results. WEKA and SQL Server 2005 were used for mining association rules and decision tree analysis.
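The two basic measures behind association rule mining, support and confidence, can be sketched in a few lines. The vote records used in the usage note are made-up placeholders rather than data from http://clerk.house.gov, the function names are illustrative, and the sketch does not reproduce the WEKA or SQL Server 2005 workflow used in the paper.

```python
def support(transactions: list[frozenset], itemset) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions: list[frozenset], antecedent, consequent) -> float:
    """Estimate P(consequent | antecedent) from the transactions."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / antecedent_support
```

With four hypothetical members whose votes are `{"bill_a:yes", "bill_b:yes"}` twice, `{"bill_a:yes", "bill_b:no"}` once, and `{"bill_a:no", "bill_b:no"}` once, the rule "bill_a:yes implies bill_b:yes" has support 0.5 and confidence 2/3.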
Spatial Data Infrastructures (SDIs) have been developing in some countries for over 10 years but still suffer from having a relatively small installed base. Most SDIs will soon converge around a service-oriented-architecture (SOA) using IT standards promulgated primarily by the Open Geospatial Consortium (OGC) and ISO Technical Committee 211. There are very few examples of these types of architected SDIs in action, and as a result little detailed information exists on suitable governance models. This paper discusses the governance issues that are posed by SOA-based SDIs, particularly those issues surrounding standards and services management, with reference to an Australian marine case study and the general literature. A generalised governance framework is then postulated using an idealised use case model which is applicable for "bottom-up," community-based initiatives. This model incorporates guiding principles and motivational and self-regulation instruments that are characteristically found in successful open source development activities. It is argued that harnessing an open development model, using a voluntary workforce, could rapidly increase the size of the SDI installed base and importantly defray infrastructure build costs.
We present an approach for improving the relevance of search results by clustering the search results obtained for a query string with the help of a Concept Clustering Algorithm. The Concept Clustering Algorithm combines common phrase discovery and latent semantic indexing techniques to separate search results into meaningful groups. It looks for meaningful phrases to use as cluster labels and then assigns documents to the labels to form groups. The labels assigned to each document cluster provide meaningful information on the various documents available under that cluster. This gives search engine users a more interactive and easier way to probe through search results and identify the relevant documents.
Linear regression (LR) and support vector regression (SVR) are widely used in data analysis. Geometrical correlation learning (GcLearn) was proposed recently to improve the predictive ability of LR and SVR through mining and using correlations between data of a variable (inner correlation). This paper theoretically analyzes prediction performance of the GcLearn method and proves that GcLearn LR and SVR will have better prediction performance than traditional LR and SVR for prediction tasks when good inner correlations are obtained and predictions by traditional LR and SVR are far away from their neighbor training data under inner correlation. This gives the applicable condition of GcLearn method.
The Inverted Exponential Distribution is studied as a prospective life distribution. In this paper, we derive Bayes' estimators for the parameter θ of inverted exponential distribution. These estimators are obtained on the basis of squared error and LINEX loss functions. Comparisons in terms of risks with the estimate of θ under squared error loss and LINEX loss functions have been made. Finally, numerical study is given to illustrate the results.
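The general form of the two estimators can be written out as a worked sketch. The parametrization of the density and the gamma prior with shape $\alpha$ and rate $\beta$ are assumptions made here for illustration; the paper may use a different parametrization or prior.

```latex
% Assumed density of the inverted exponential distribution:
%   f(x \mid \theta) = \frac{\theta}{x^{2}} e^{-\theta/x}, \quad x > 0,\ \theta > 0.
% For a sample x_1, \dots, x_n let T = \sum_{i=1}^{n} 1/x_i, so that the
% likelihood is proportional to \theta^{n} e^{-\theta T}.
% Assumed prior: \theta \sim \mathrm{Gamma}(\alpha, \beta) (shape \alpha, rate \beta),
% which gives the posterior \theta \mid x \sim \mathrm{Gamma}(n + \alpha,\ \beta + T).
\begin{align}
  \hat{\theta}_{\mathrm{SE}}
    &= E[\theta \mid x] = \frac{n + \alpha}{\beta + T}, \\
  L(\Delta)
    &= e^{a\Delta} - a\Delta - 1, \qquad \Delta = \hat{\theta} - \theta,\ a \neq 0, \\
  \hat{\theta}_{\mathrm{LINEX}}
    &= -\frac{1}{a} \ln E\!\left[e^{-a\theta} \mid x\right]
     = \frac{n + \alpha}{a} \ln\!\left(1 + \frac{a}{\beta + T}\right),
     \qquad a > -(\beta + T).
\end{align}
```

The first line holds for any posterior, since the squared error loss is minimized by the posterior mean; the closed forms on the right-hand sides depend on the assumed gamma prior. As $a \to 0$ the LINEX estimator approaches the squared error estimator.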
Patents in general contain much novel technological information. This paper demonstrates that the usage of patent analysis can facilitate a unique scheme for tracking technology development. In this paper, the walking technique of the Japanese biped robot is tracked as an example. The searching method of the FI (file index) and F-term classification system developed by JPO (Japan Patent Office) was employed in this study, where all the related patent data were searched from the IPDL (Intellectual Property Digital Library). This study investigated an important technique applied to the humanoid biped robot that imitates the walking behavior of the human beings on two legs. By analyzing the patent information obtained, the relative research capabilities, technical strengths, and patent citation conditions among patent competitors were compared. Furthermore, a formulated technical matrix of patent map is established in this paper to indicate that the ZMP (Zero Moment Point) control means is the main technology to achieve stabilized walking control of the humanoid biped robot. This study also incorporates relevant academic journal findings and industrial information. Results presented herein demonstrate that patents can function not only as a map for tracking a technology trajectory, but also as a guide to the main development of a new technology in years to come.
Herein, an extension to the object query language (OQL) for incorporating binary relational expressions is investigated. The extended query language is suitable for query submissions to an object oriented database whose functionality is based upon the algebra of binary relations. Algebraic expressions, consisting of simple and multiple merged chains of binary relations, are stated in SQL syntax-based object queries, which are utilized by a multiwavefront algorithm mapped on a multi-directional multi-functional engine (M2FE) for object oriented parallel query processing. The proposed extension also attempts to solve other object oriented database issues, such as inheritance, relationships between objects and literals, and recursive queries.
The purpose of this study is to develop a plausible method to code and compile Buddhist texts from original Tibetan scripts into Romanized form. Using GUI (Graphical User Interface) based on Object Oriented Design, a dictionary of Tibetan characters can be easily made for Buddhist literature researchers. It is hoped that a computer system capable of highly accurate character recognition will be actively used by all scholars engaged in Buddhist literature research. In the present study, an efficient automatic recognition method for Tibetan characters is established. The result of the experiments performed is that the recognition rate achieved is 99.4% for 28,954 characters.
Malaria pandemic (MP) has been linked to a range of serious health problems including premature mortality. The main objective of this research is to quantify uncertainties about impacts of malaria on mortality. A multivariate spatial regression model was developed for estimation of the risk of mortality associated with malaria across Ogun State in Nigeria, West Africa. We characterize different local governments in the data and model the spatial structure of the mortality data in infants and pregnant women. A flexible Bayesian hierarchical model was considered for a space-time series of counts (mortality) by constructing a likelihood-based version of a generalized Poisson regression model that combines methods for point-level misaligned data and change of support regression. A simple two-stage procedure for producing maps of predicted risk is described. Logistic regression modeling was used to determine an approximate risk on a larger scale, and geo-statistical ("Kriging") approaches were used to improve prediction at a local level. The results suggest improvement of risk prediction brought about in the second stage. The advantages and shortcomings of this approach highlight the need for further development of a better analytical methodology.
Ministers of science and technology asked the OECD in January 2004 to develop international guidelines on access to research data from public funding. The resulting Principles and Guidelines for Access to Research Data from Public Funding were recently approved by OECD governments and are discussed below. They are intended to promote data access and sharing among researchers, research institutions, and national research agencies. OECD member countries have committed to taking these principles and guidelines into account in developing their own national laws and research policies, taking account of differences in their respective national context.
The controversial provisions in the European Union's Database Directive have created considerable uncertainty for commercial producers of databases, while recent case law has emasculated much of the Directive. However, researchers and academics must still work in a restrictive copyright environment within Europe. This paper reviews the Directive in the light of two recent UK reports that suggest a more liberal copyright regime is both culturally and economically desirable. The author suggests that unfair competition problems should be addressed by new unfair competition laws for Ireland and the UK and not through revision of the Directive.
As an important part of the science and technology infrastructure platform of China, the Ministry of Science and Technology launched the Scientific Data Sharing Program in 2002. Twenty-four government agencies now participate in the Program. After five years of hard work, great progress has been achieved in the policy and legal framework, data standards, pilot projects, and international cooperation. By the end of 2005, one-third of the existing public-interest and basic scientific databases in China had been integrated and upgraded. By 2020, China is expected to build a more user-friendly scientific data management and sharing system, with 80 percent of scientific data available to the general public. In order to realize this objective, the emphases of the project are to perfect the policy and legislation system, improve the quality of data resources, expand and establish national scientific data centers, and strengthen international cooperation. It is believed that with the opening up of access to scientific data in China, the Program will play a bigger role in promoting science and national innovation.
In June 2004, an expert Task Force, appointed by the National Research Council Canada and chaired by Dr. David Strong, came together in Ottawa to plan a National Forum as the focus of the National Consultation on Access to Scientific Research Data. The Forum, which was held in November 2004, brought together more than seventy Canadian leaders in scientific research, data management, research administration, intellectual property and other pertinent areas. This article presents a comprehensive review of the issues, opportunities, and challenges identified during the Forum. Complex and rich arrays of scientific databases are changing how research is conducted, speeding the discovery and creation of new concepts. Increased access will accelerate such changes even more, creating other new opportunities. With the combination of databases within and among disciplines and countries, fundamental leaps in knowledge will occur that will transform our understanding of life, the world and the universe. The Canadian research community is concerned by the need to take swift action to adapt to the substantial changes required by the scientific enterprise. Because no national data preservation organization exists, many experts believe that a national strategy on data access or policies needs to be developed, and that a "Data Task Force" should be created to prepare a full national implementation strategy. Once such a national strategy is broadly supported, it is proposed that a dedicated national infrastructure, tentatively called "Data Canada", be established to assume overall leadership in the development and execution of a strategic plan.
The digital revolution has transformed the accumulation of properly curated public research data into an essential upstream resource whose value increases with use. The potential contributions of such data to the creation of new knowledge and downstream economic and social goods can in many cases be multiplied exponentially when the data are made openly available on digital networks. Most developed countries spend large amounts of public resources on research and related scientific facilities and instruments that generate massive amounts of data. Yet precious little of that investment is devoted to promoting the value of the resulting data by preserving and making them broadly available. The largely ad hoc approach to managing such data, however, is now beginning to be understood as inadequate to meet the exigencies of the national and international research enterprise. The time has thus come for the research community to establish explicit responsibilities for these digital resources. This article reviews the opportunities and challenges to the global science system associated with establishing an open data policy.
The National Institutes of Health (NIH) implemented a policy on data sharing in 2003. The policy reaffirmed the principle that data should be made as widely and freely available as possible while safeguarding the privacy of research participants, and protecting confidential and proprietary data. Restricted availability of unique resources upon which further studies are dependent can impede the advancement of research and the delivery of medical care. Therefore, research data supported with NIH funds should be made readily available for research purposes to qualified individuals within the scientific community.
One approach to sharing data is to establish a network of databases. However, there are a number of barriers to creating successful networks, which can include fundamental differences in informatics infrastructure and communication tools used at various research sites. Solutions will entail standards for data collection, processing, and archiving to allow interoperability among the databases and the ability to query data across databases. Open architectures for data collection as well as software to facilitate communication across different databases are needed.
A distributed infrastructure that would enable those who wish to do so to contribute their scientific or technical data to a universal digital commons could allow such data to be more readily preserved and accessible among disciplinary domains. Five critical issues that must be addressed in developing an efficient and effective data commons infrastructure are described. We conclude that creation of a distributed infrastructure meeting the critical criteria and deployable throughout the networked university library community is practically achievable.
Large-scale data policies easily may have unplanned effects of "homogeneity" on the available data supply. A narrowing of the scope of the data supply tailored to established research paradigms could limit the opportunities for unconventional, but also adventurous, new research directions with the risk of slowing down scientific progress. A systematic assessment of data portfolios not only on the aspects of quality, accessibility and sustainability, but diversity as well, could help to diminish this risk.
Supplement: Proceedings of the 20th International CODATA Conference
By 2001, the biodiversity databases in Taiwan were dispersed among various institutions and colleges, each holding a limited amount of data. The Natural Resources and Ecology GIS Database sponsored by the Council of Agriculture, which is part of the National Geographic Information System planned by the Ministry of Interior, was the most well established biodiversity database in Taiwan. This database, however, mainly collected the distribution data of terrestrial animals and plants within the Taiwan area. In 2001, GBIF was formed, and Taiwan joined as an Associate Participant, starting the establishment and integration of animal and plant species databases; therefore, TaiBIF was able to co-operate with GBIF. The information on Catalog of Life, specimens, and alien species was integrated using the Darwin Core standard. Such metadata standards allowed the biodiversity information of Taiwan to connect with global databases.
Optical disks, such as DVDs and CDs, are convenient recording media on which to safely store data for a long period of time. However, the complete erasure of data from recorded media is also important for data security. After data have been erased from optical disks, the material needs to be recycled in order to recover the valuable components of the disks. Here, data erasure methods for optical disks are discussed from the viewpoint of material recycling. The main finding of the study is that the explosion of optical disks in water is a very suitable method for complete erasure of data on the disks as well as recycling of their materials.
Anthropometric data are used by numerous types of organizations for health evaluation, ergonomics, apparel sizing, fitness training, and many other applications. Data have been collected and stored in electronic databases since at least the 1940s. These databases are owned by many organizations around the world. In addition, the anthropometric studies stored in these databases often employ different standards, terminology, procedures, or measurement sets. To promote the use and sharing of these databases, the World Engineering Anthropometry Resources (WEAR) group was formed and tasked with the integration and publishing of member resources. It is easy to see that organizing worldwide anthropometric data into a single database architecture could be a daunting and expensive undertaking. The challenges of WEAR integration reflect mainly in the areas of distributed and disparate data, different standards and formats, independent memberships, and limited development resources. Fortunately, XML schema and web services provide an alternative method for networking databases, referred to as the Loosely Coupled WEAR Integration. A standard XML schema can be defined and used as a type of Rosetta stone to translate the anthropometric data into a universal format, and a web services system can be set up to link the databases to one another. In this way, the originators of the data can keep their data locally along with their own data management system and user interface, but their data can be searched and accessed as part of the larger data network, and even combined with the data of others. This paper will identify requirements for WEAR integration, review XML as the universal format, review different integration approaches, and propose a hybrid web services/data mart solution.
The concept of a bulletin board system (BBS) equipped with information visualization techniques is proposed for supporting online data analysis. Although group discussion is known to be effective for analyzing data from various viewpoints, the number of participants is limited by time and space constraints. To solve that problem, this paper proposes to augment a BBS, a popular web based tool. In order for discussion participants to share data online, the system provides them with a visual representation of target data, which elicits comments from participants as well as compares these comments. In order to illustrate the concept's potential, a BBS equipped with KeyGraph is also developed for supporting online chance discovery. It has functions for making visual annotations on the KeyGraph as well as a function for retrieving similar scenarios. The experimental results show the effectiveness of the BBS in terms of the usefulness of scenario generation support functions as well as that of scenario retrieval engines.
In the present work, energy recovery and mechanical recycling, two treatment options for plastic wastes from discarded television sets, have been assessed and compared in the context of the life cycle assessment methodology (LCA). The environmental impact of each option was assessed by calculating the depletion of abiotic resources (ADP) and the global warming potential (GWP). Then, the indicators were compared, and the option with the smaller environmental impact was selected. The main finding of this study was that mechanical recycling of plastics is a more attractive treatment option in environmental terms than incineration for energy recovery.
It is important for a collaborative community to decide its next action. The leader of a collaborative community must choose an action that increases rewards and reduces risks. When a leader cannot make this decision, action will be determined through community member discussion. However, this decision cannot be made in blind discussions, so systematic discussion is necessary to choose effective action in a limited time. In this paper, we propose a bulletin board system framework in which effective discussion is established through visualized discussion logs.
We propose an approach for understanding leadership behavior in dot-jp, a non-profit organization, by analyzing heterogeneous multi-data composed of questionnaires and mailing list archives. Attitudes toward leaders were obtained from the questionnaires, and human networks were extracted from the mailing list archives. By integrating the results, we discovered that leaders must receive messages from other people as well as send messages to construct reliable relationships.
Scholarly data, such as academic articles, research reports and theses/dissertations, traditionally have limited dissemination in that they generally require journal subscription or affiliation with particular libraries. The notion of open access, made possible by rapidly advancing digital technologies, aims to break the limitations that hinder academic developments and information exchange. This paper presents the Electronic Thesis & Dissertation (ETD) Project at the Simon Fraser University Library, British Columbia, Canada, and discusses various technological considerations associated with the Project including selection of software, capture of metadata, and long-term preservation of the digitized data. The paper concludes that a well-established project plan that takes into account not only technological issues but also issues relating to project policies, procedures, and copyright permissions that occur in the process of providing open access plays a vital role for the overall success of such projects.
The construction of e-government entails the construction of an information system connecting the government with its society. The infrastructure and content of the applications determine the success of an e-government's implementation. Limitations of Internet access and people's awareness are the main problems in most developing countries, including Indonesia. Thus, the implementers of Indonesian e-government have to consider the social conditions and cultural behavior of their society. We propose the development of Community-based Information Systems (CIS) to empower the implementation of Indonesian e-government. A CIS is designed to take into account the Indonesian societal system, which is structured by communities. CIS is formatted based on one primary portal that groups numbers of community websites. In this paper, one pilot system is presented (www.nagari.org).
As a matter of fact, humans continuously delegate and distribute cognitive functions to the environment to lessen their limits. They build models, representations, and other various mediating structures that are thought to be good constructions. In doing this, humans are engaged in a process of cognitive niche construction. More precisely, we argue that a cognitive niche emerges from a network of continuous interplay between individuals and environment, in which people alter and modify the environment by mimetically externalizing fleeting thoughts, private ideas, etc., into external supports. This can turn out to be useful, especially for all those situations that require information transmission, shared knowledge, and more generally, cognitive resources.
The UNESCO office in Venice (the Regional Bureau for Science and Culture in Europe) has promoted, in collaboration with the Italian Agency for New Technologies, Energy, and the Environment (ENEA), an e-learning project on renewable energy: the DESIRE-net project (Development and Sustainability with International Renewable Energies network). The project's aim is to share the best available knowledge on renewable energies among all the countries that have joined the project and exploit this knowledge at every level. Currently the project involves 30 Eastern European and Southern Mediterranean countries as well as Australia, Indonesia, and China.
Although experimental as well as epidemiological studies have revealed the health effects of ionizing radiation, most of our knowledge is for high doses of radiation, while little is known for low doses. For practical purposes, we estimate the risk of low dose radiation by extrapolating the effects at high doses to low doses in a linear relationship. However, several lines of evidence have accumulated in recent years that suggest this linear extrapolation is not necessarily correct and needs further scientific evaluation. Today, many scientists in the field are striving to understand the biological responses to low dose radiation. This work will provide new and perhaps convincing data which are necessary for risk estimation of low dose radiation. Here, I overview the background of the issue.
In this paper, we introduce integrated data mining. Because of recent rapid progress in medical science as well as clinical diagnosis and treatment, integrated and cooperative research among researchers in medicine, biology, engineering, cultural science, and sociology is required. Therefore, we propose a framework called Cyber Integrated Medical Infrastructure (CIMI). Within this framework, we can deal with various types of data and consequently need to integrate those data prior to analysis. In this study, for medical science, we analyze the features and relationships among various types of data and show the possibility of integrated data mining.
Astronomy is one of the most data-intensive of the sciences. Data technology is accelerating the quality and effectiveness of its research, and the rate of astronomical discovery is higher than ever. As a result, many view astronomy as being in a "Golden Age," and projects such as the Virtual Observatory are amongst the most ambitious data projects in any field of science. But these powerful tools will be impotent unless the data on which they operate are of matching quality. Astronomy, like other fields of science, therefore needs to establish and agree on a set of guiding principles for the management of astronomical data. To focus this process, we are constructing a "data manifesto," which proposes guidelines to maximise the rate and cost-effectiveness of scientific discovery.
This paper describes a proper translation-selecting and translation-clustering algorithm for Korean translation of words automatically extracted from newspapers. As about 80% of the English words in Korean newspapers appear in abbreviated form, it is necessary to make clusters of translation words to construct easily bilingual knowledge bases such as dictionaries and translation patterns. As a seed to acquiring a translation cluster, we selected a proper translation word from a given translation set using bi-gram-based histograms. Translation words that share bi-grams with the chosen proper translation word are assigned to the cluster for the proper word. The given translation set then picks out the translation words of the cluster. These processes continue until the translation set becomes empty. Experimental results show that our algorithms are superior to bi-gram-based binary vectors including Dice coefficient and Jaccard coefficient in selecting the proper translation word for each translation cluster.
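The two baseline similarity measures named above, the Dice and Jaccard coefficients over bi-gram sets, can be sketched as follows. This illustrates only the baselines, not the histogram-based selection algorithm proposed in the paper; the function names are illustrative, and treating a word as a set of character bi-grams (a binary vector) is the assumption made here.

```python
def bigrams(word: str) -> set[str]:
    """Return the set of character bi-grams of word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient: 2|X & Y| / (|X| + |Y|) over bi-gram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient: |X & Y| / |X | Y| over bi-gram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)
```

For instance, "night" and "nacht" share only the bi-gram "ht", so their Dice coefficient is 2/8 = 0.25 and their Jaccard coefficient is 1/7. Because Python strings are sequences of Unicode characters, the same functions apply to Korean translation words without modification.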
The latent structure behind an observation often plays an important role in the dynamics of visible events. Such latent structure is composed of invisible events named dark events. Human-interactive annealing is developed to visualize and understand dark events. This paper presents an application of human-interactive annealing for extracting new scenarios for patent technology using the latent technology structure behind current patented technology.
The concept of sustainable ecological-social-economic development is considered proceeding from the condition of obligatory coordination of economic, ecological, and human dimensions in such a way that from one generation to the other, the quality and safety of life should not decrease, the environmental conditions should not worsen, and social progress should meet the needs of every person. An approach of system coordination and balancing of these three constituents is suggested.
The National Institute of Standards and Technology (NIST) is developing a digital library to replace the widely used National Bureau of Standards Handbook of Mathematical Functions published in 1964. The NIST Digital Library of Mathematical Functions (DLMF) will include formulas, methods of computation, references, and links to software for over forty functions. It will be published both in hardcopy format and as a website featuring interactive navigation, a mathematical equation search, 2D graphics, and dynamic interactive 3D visualizations. This paper focuses on the development and accessibility of the 3D visualizations for the digital library. We examine the techniques needed to produce accurate computations of function data, and through a careful evaluation of several prototypes, we address the advantages and disadvantages of using various technologies, including the Virtual Reality Modeling Language (VRML), interactive embedded graphics, and video capture to render and disseminate the visualizations in an environment accessible to users on various platforms.
KISTI-ACOMS has been developed as part of a national project, begun in 2001, for raising the information society index of academic societies. ACOMS automates almost every activity of academic societies, including membership management, journal processing, conference organization, and e-journal management, and provides a search system. ACOMS can be customized easily by the system administrator of an academic society. The electronic databases built by ACOMS are served to users through the KISTI website (http://www.yeskisti.net) along with other journal databases created in a conventional way. KISTI plans to raise the usage ratio of the ACOMS database for delivering society services to 100% in the future.
The National Institute of Information and Communications Technology (NICT) operates a Japanese space forecast center for the International Space Environment Service (ISES). Information on space weather is exchanged daily among space weather forecast centers all over the world. Space weather research requires data covering large regions of space, from the Sun to the Earth's upper atmosphere. Space weather researchers need to access a variety of data and to communicate with other researchers over a network. We describe experiments on the space weather network using the Japan Gigabit Network 2 (JGN2) operated by the NICT.
With the support of the National Digital Archive Program (NDAP), basic species information about most Taiwanese fishes, including their morphology, ecology, distribution, specimens with photos, and literature, has been compiled into the "Fish Database of Taiwan" (http://fishdb.sinica.edu.tw). We expect that the databank of all Taiwanese fish species (RSD), with 2,800+ species, and the digital "Fish Fauna of Taiwan" will be completed in 2007. Underwater ecological photos and video images for all 2,800+ fishes are quite difficult to obtain but will continue to be collected in the future. In the last year of NDAP, we successfully integrated all fish specimen data deposited at 7 different institutes in Taiwan, as well as their collection maps on Google Maps and Google Earth. Further, the database provides the pronunciation of Latin scientific names and the transliteration of Chinese common names, following the Romanization system, for all Taiwanese fishes (2,902 species in 292 families so far). The Taiwanese fish species checklist, with Chinese common/vernacular names and specimen data, has been updated periodically and provided to the global FishBase as well as to the Global Biodiversity Information Facility (GBIF) through the national portal of the Taiwan Biodiversity Information Facility (TaiBIF). Thus, Taiwanese fish data can be queried and browsed on the WWW. As a contribution to the "Barcode of Life" and "All Fishes" international projects, alcohol-preserved specimens of more than 1,800 species and cryobanked tissues of 800 species have been accumulated at RCBAS in the past two years. Through this close collaboration between local and global databases, "The Fish Database of Taiwan" now attracts more than 250,000 visitors and receives 5 million hits per month. We believe that this local database is becoming an important resource for education, research, conservation, and sustainable use of fish in Taiwan.
This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the role of visual layout, stylistic features, and language model features in clustering documents and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.
This paper proposes an efficient algorithm for compressing cubes during parallel data cube generation. This low-overhead compression mechanism provides block-by-block and record-by-record compression by using tuple difference coding techniques, thereby maximizing the compression ratio and minimizing the decompression penalty at run-time. The experimental results demonstrate that the typical compression ratio is about 30:1 without sacrificing running time. This paper also demonstrates that the compression method is suitable for the Hilbert space-filling curve, a mechanism widely used in multi-dimensional indexing.
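The abstract above does not give the details of its coding scheme, so the following is only a minimal sketch of the general idea behind tuple difference coding, under stated assumptions: each tuple of dimension indices is mapped to a single integer by treating it as a mixed-radix number, the integers within a block are sorted, and only the gaps between consecutive values are stored. The function names and the `domain_sizes` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def tuple_to_ordinal(tup: Sequence[int], domain_sizes: Sequence[int]) -> int:
    """Map a tuple of dimension indices to one integer (mixed-radix number)."""
    ordinal = 0
    for value, size in zip(tup, domain_sizes):
        if not 0 <= value < size:
            raise ValueError("dimension index outside its domain")
        ordinal = ordinal * size + value
    return ordinal


def ordinal_to_tuple(ordinal: int, domain_sizes: Sequence[int]) -> Tuple[int, ...]:
    """Invert tuple_to_ordinal by peeling off digits from the last dimension."""
    values = []
    for size in reversed(domain_sizes):
        ordinal, value = divmod(ordinal, size)
        values.append(value)
    return tuple(reversed(values))


def encode_block(tuples: Sequence[Sequence[int]], domain_sizes: Sequence[int]) -> List[int]:
    """Sort the block by ordinal and keep the first ordinal plus successive gaps."""
    ordinals = sorted(tuple_to_ordinal(t, domain_sizes) for t in tuples)
    if not ordinals:
        return []
    return [ordinals[0]] + [b - a for a, b in zip(ordinals, ordinals[1:])]


def decode_block(encoded: Sequence[int], domain_sizes: Sequence[int]) -> List[Tuple[int, ...]]:
    """Rebuild the sorted tuples by accumulating the stored gaps."""
    tuples = []
    running = 0
    for delta in encoded:
        running += delta
        tuples.append(ordinal_to_tuple(running, domain_sizes))
    return tuples


# Illustrative data only: three dimensions with 4, 3, and 5 distinct values.
domain_sizes = (4, 3, 5)
block = [(1, 2, 3), (0, 0, 4), (1, 2, 4), (3, 0, 0)]
encoded = encode_block(block, domain_sizes)
decoded = decode_block(encoded, domain_sizes)
```

In this sketch the gaps are usually much smaller than the ordinals themselves, so a real encoder could store them in fewer bytes; that byte-level packing is left out here.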
The concept of information has become a crucial topic in several emerging scientific disciplines, as well as in organizations, in companies, and in everyday life. Hence it is legitimate to speak of the so-called information society, but a scientific understanding of the Information Age has not had time to develop. Following this evolution, we face the need for a new transdisciplinary understanding of information, encompassing many academic disciplines and new fields of interest. Therefore, a Science of Information is required. The goal of this paper is to discuss the aims, the scope, and the tools of a Science of Information. Furthermore, we describe the new Science of Information Institute (SOII), which will be established as an international and transdisciplinary organization that takes into consideration a larger perspective of information.
It is now widely accepted that colloids play an important role in the contaminant migration process. However, the structure of colloid deposits on rock surfaces has scarcely been studied. In this paper, preliminary results of a fractal characterization of colloid deposition in saturated fractures are presented, taking into account the pH value, ionic strength, and flow rate of the solution. Deposition behavior changed markedly under different chemical conditions, and fractal analysis appears to be an effective tool for capturing the evolution and general behavior of the deposits. Scanning Electron Microscopy (SEM) is used to observe colloidal growth on granite surfaces and to acquire detailed images. The images are analyzed for their mass fractal dimensions. The influence of these factors on colloid fractal deposition is discussed.
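The abstract does not say how the mass fractal dimension is computed from the SEM images. One common approach, sketched here purely as an assumption, is the mass-radius method: count the occupied pixels M(r) within distance r of a chosen centre and fit the slope of log M(r) against log r, since M(r) scales as r^D. The function name, the pixel representation, and the test shapes below are all hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def mass_fractal_dimension(
    pixels: Iterable[Tuple[int, int]],
    center: Tuple[float, float],
    radii: Sequence[float],
) -> float:
    """Estimate D in M(r) ~ r**D by a least-squares fit of log M against log r.

    pixels: coordinates of occupied (deposit) pixels in a thresholded image.
    radii:  distinct positive radii at which the enclosed mass is counted.
    """
    points = list(pixels)
    cx, cy = center
    log_r: List[float] = []
    log_m: List[float] = []
    for r in radii:
        # Mass = number of occupied pixels inside the disc of radius r.
        mass = sum(1 for x, y in points if (x - cx) ** 2 + (y - cy) ** 2 <= r * r)
        if mass > 0:
            log_r.append(math.log(r))
            log_m.append(math.log(mass))
    if len(log_r) < 2:
        raise ValueError("need at least two radii that enclose occupied pixels")
    mean_x = sum(log_r) / len(log_r)
    mean_y = sum(log_m) / len(log_m)
    sxx = sum((x - mean_x) ** 2 for x in log_r)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_r, log_m))
    return sxy / sxx  # slope of the log-log fit


# Sanity checks on synthetic shapes (not data from the paper):
radii = [8, 16, 32, 64]
filled = [(x, y) for x in range(-64, 65) for y in range(-64, 65)]
line = [(x, 0) for x in range(-64, 65)]
d_filled = mass_fractal_dimension(filled, (0, 0), radii)  # close to 2
d_line = mass_fractal_dimension(line, (0, 0), radii)      # close to 1
```

A fully covered surface gives a dimension near 2 and a thin line near 1, so a branched deposit would be expected to fall in between; the choice of centre and radii affects the estimate and would need care on real images.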
The fast-growing Open Content movement has profound consequences for pedagogical approaches to learning. This paper explores the use of Open Content in higher education, including training for scientists and scholars at large, and considers its pedagogic implications. The relevance of these issues is expected to grow in the near future, as scholars must cope with an increased need to access, search through, and fruitfully draw knowledge from data, especially in teams where cross-disciplinary competences are required to analyse, evaluate, and exchange data across a variety of research fields.