Gathering information from social media content is becoming increasingly popular. Twitter, a microblogging service where posts are limited to 140 characters, is an excellent platform for gathering instant, interactive information. Considerable research has focused on Twitter’s effectiveness for disseminating emergency alerts and confirming the safety of acquaintances. However, less attention has been paid to analyzing Twitter posts to obtain information specific to particular domains. Such analysis could enable the simple and rapid identification of information related to state-of-the-art technology. Against this background, this study reports a preliminary analysis of tweets by Japanese academic researchers. Our content and text analyses reveal that many academic researchers tweet about their individual activities, education, or research. Their tweets contain domain-specific knowledge and have identifiable textual characteristics. This study provides basic findings that can be applied to obtaining domain-specific knowledge from Twitter.
The UC CEISMIC Canterbury Earthquakes Digital Archive was established in response to the devastating earthquakes that struck the Canterbury region of New Zealand from September 2010 onwards, including four quakes of magnitude 6 or greater and over 11,000 aftershocks. 185 people died, and significant parts of Christchurch city were either destroyed or have needed to be demolished, resulting in financial losses estimated at NZ$30 billion. The rebuild is expected to take 10–15 years, and the UC CEISMIC archive is designed to accommodate this, acting as a distributed national (and eventually international) repository for digital content produced as a result of the earthquakes. This paper outlines the design principles and architecture of the archive, describing the commitment to open access and open source that allowed the project team to bring together a broad-ranging national consortium of leading cultural organizations, who work alongside content providers ranging from individual citizens, government agencies and community groups, to large media companies. Principles common to the digital humanities community were used to bond the broader project team, in an interesting example of scholar-led community engagement. The goal is to provide a model that can be used, either in whole or in part, by future teams in need of similar capability.
This study proposes a framework for accessing the modern history of Japanese philosophy using natural language processing (NLP) and visualization. Discovering new knowledge from massive amounts of information requires the support of information technology. To support knowledge discovery from a vast number of books, we developed an OCR-based framework for automatic book digitization, together with a system that visualizes documents and the relationships among them, computed using NLP techniques. We applied the framework to the Japanese journal Shisō (“Thought”), published by Iwanami Shoten. We show an example of a knowledge structure extracted from Shisō using our visualization system.
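The abstract does not specify how inter-document relationships are computed. A minimal sketch of one common approach (TF-IDF weighting with cosine similarity, over a hypothetical toy corpus standing in for the digitized articles) might look like:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF weight vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf}
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toy documents; a real pipeline would tokenize OCR output.
docs = [
    ["philosophy", "history", "japan", "thought"],
    ["philosophy", "ethics", "thought", "kyoto"],
    ["archive", "earthquake", "digital"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shared vocabulary -> positive similarity
print(cosine(vecs[0], vecs[2]))  # no shared terms -> 0.0
```

The resulting pairwise similarities can then serve as edge weights when the documents are laid out as a graph for visualization.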
To support the automatic semantic analysis of texts in the humanities, it is not sufficient to analyze words and evaluate word pairs; larger units, such as phrases, sentences, and paragraphs, must also be processed. This study proposes the introduction of intratextuality into a digital archive system. In the future, this method will be developed as the basis for semantic analysis of larger units. Classical literary structures that are used frequently in the Old and New Testaments were digitized as a case study. A literary structure data format for a relational database was also implemented. The literary structures of 39 books in the Old Testament and 27 books in the New Testament were digitized. The total number of digitized literary structures was 1,507, and the elements of these structures comprised 7,715 pairs. These data were stored in a Java-based relational database system, and a web-based viewer program for rhetorical structures was implemented as a JSP servlet. This web-based program will be combined with an existing digital archive system that can manage intertextuality data. The Java-based relational database system and the JSP servlet will facilitate numerical analyses of the intertextuality and intratextuality of digital archive systems of classical texts, thus making it much easier to conduct scientific analyses of the meanings of texts.
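The abstract does not give the data format itself, but its shape follows from the counts it reports: each literary structure belongs to a book and groups several element pairs. A hypothetical relational sketch (here in SQLite rather than the project's Java stack; table and column names are illustrative) could be:

```python
import sqlite3

# Hypothetical schema: one row per literary structure, one row per element
# pair, linked by a foreign key. Verse references are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structure (
    id   INTEGER PRIMARY KEY,
    book TEXT NOT NULL,               -- e.g. 'Genesis'
    kind TEXT NOT NULL                -- e.g. 'chiasmus', 'parallelism'
);
CREATE TABLE element_pair (
    structure_id INTEGER REFERENCES structure(id),
    first_ref    TEXT NOT NULL,       -- reference of the first member
    second_ref   TEXT NOT NULL        -- reference of its paired member
);
""")
conn.execute("INSERT INTO structure VALUES (1, 'Genesis', 'chiasmus')")
conn.executemany(
    "INSERT INTO element_pair VALUES (?, ?, ?)",
    [(1, "Gen 1:1", "Gen 2:3"), (1, "Gen 1:3", "Gen 2:1")],
)
n, = conn.execute(
    "SELECT COUNT(*) FROM element_pair WHERE structure_id = 1"
).fetchone()
print(n)  # number of pairs stored for structure 1
```

Storing pairs in their own table is what makes the counts in the abstract (1,507 structures, 7,715 pairs) directly queryable for numerical analysis.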
Printed books are finished products once published. Digital editions, by contrast, may have an eternal life as updateable items, at least as archives. Since work on digital editions started in the Nordic countries more than 20 years ago, a consensus or convergence of practice seems to have grown in the planning and development of digital scholarly editions in the Nordic countries (here defined as Denmark, Finland, Norway, and Sweden; for Finnish editions, especially those based on material written in Swedish) (Dahlström and Ore 2013). Whereas it is possible to find some examples of similar developments in other countries or continents, the similarities between the Nordic countries are remarkable. In this paper I discuss the factors that have contributed to this situation and illustrate them by presenting some fairly recent digital editions from the Nordic countries.
This paper introduces a data-sharing strategy based on a conversion service rather than on a sharing application, scheme, or ontology, the approaches that dominate proposals for language documentation. Although these three methods have been the basic tactics for sharing corpora, they have a conceptual flaw from the standpoint of descriptive linguistics. In this paper we report the results of a previous project, the LingDy project, and propose a basic concept for a corpus-sharing strategy that supports personal diachronic data sharing. This paper is a revised version of a handout presented at JADH2012, so readers should note that its content reflects results as of 2012.
This paper describes a knowledge-based character-processing model that resolves some problems of the coded character model. Currently, in the field of digital text processing, each character is represented and processed under the “Coded Character Model”: each character is defined and shared using a coded character set (a code) and represented by a code point (an integer) of that code. In other words, once knowledge about characters is defined (standardized) in the specification of a coded character set, there is no need to store large and detailed knowledge about characters in computers for basic text processing. In terms of flexibility, however, the coded character model has problems, because it assumes a finite set of characters, each with a stable concept shared by the community. Real character usage is not so static and stable. For Chinese characters especially, it is not easy to select a finite set of characters that covers all usages. To resolve these problems, we have proposed the “Chaon” model, a new model of character processing based on a character ontology. This report briefly describes the Chaon model and the CHISE (Character Information Service Environment) project, focusing on how to represent Chinese characters and their glyphs in the context of multiple unification rules.
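To make the contrast with the coded character model concrete, here is a minimal sketch in the spirit of an ontology-based model: a character is a bundle of features that can be queried by any property, not just looked up by code point. The feature names and glyph labels are illustrative, not CHISE's actual vocabulary.

```python
# Each character is a feature bundle; a Unicode code point ("ucs") is just
# one feature among others, and one character may map to several glyphs.
characters = [
    {"ucs": 0x9AA8, "radical": 188, "strokes": 10,
     "glyphs": ["bone-form-A", "bone-form-B"]},   # 骨, regional glyph variants
    {"ucs": 0x4E00, "radical": 1, "strokes": 1,
     "glyphs": ["one"]},                          # 一
]

def find(feature, value):
    """Look up characters by any feature, not only by code point."""
    return [c for c in characters if c.get(feature) == value]

match = find("radical", 188)
print(chr(match[0]["ucs"]))  # the character carrying radical 188
```

Because lookup goes through features rather than a fixed code table, new characters or new unification rules can be accommodated by adding or adjusting feature entries instead of revising a standardized code.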
The Collaborative EuropeaN Digital Archival Research Infrastructure (CENDARI) project has developed a new virtual environment for humanities research, reimagining the analogue landscape of research sources for medieval and modern history, and humanities research infrastructure models, for the digital age. To achieve this, the project has needed to be sensitive to the ways in which historical research practices in the 21st century are distinct from those of earlier eras: harnessing the affordances of technology to reveal connections and support or refute hypotheses, enabling transnational approaches, and federating sources beyond the well-known and across the largely national organizational paradigms that dominate traditional knowledge infrastructures (libraries, archives and museums). This paper describes both the user-centered development methodology deployed by the project and the resulting technical architecture adopted to meet these challenging requirements. The resulting system is a robust ‘enquiry environment’ able to integrate a variety of data types and standards with bespoke tools for the curation, annotation, communication and validation of historical insight.