Historians have traditionally relied on close readings of select primary sources to evaluate linguistic and discursive changes over time, but this approach is limited in scope. Numeric representations of language allow us to quantify and compare the statistical significance of discursive changes and to capture linguistic relationships over time. Here, we compare two deep learning methods for quantitatively identifying the chronology of linguistic shifts: recurrent neural network (RNN) classification and RNN language modeling. In particular, we examine deep learning methods for isolating stylistic from topical changes, generating “decade embeddings,” and charting the changing average perplexity of a language model trained on chronologically sorted data. We apply these models to a historical diplomatic corpus and find that the two world wars were notable moments of linguistic change in American foreign relations. With this example, we illustrate how text-based deep learning methods can be applied in the digital humanities.
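The perplexity measure mentioned above can be illustrated with a far simpler model than an RNN. The following is a minimal sketch, assuming an add-one-smoothed unigram model in place of the RNN language model and a handful of invented toy sentences in place of the diplomatic corpus; the function names `train_unigram` and `perplexity` are hypothetical and not taken from the article.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return a smoothed unigram probability function fitted to tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def prob(token):
        # Add-one (Laplace) smoothing so unseen tokens get non-zero mass.
        return (counts[token] + 1) / (total + vocab_size)

    return prob


def perplexity(prob, tokens):
    """Perplexity = exp of the average negative log-probability per token."""
    log_sum = sum(math.log(prob(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy, invented data: an earlier period for training and two later samples.
train = "the minister sent the treaty to the senate".split()
similar = "the senate sent the treaty".split()
shifted = "atomic weapons changed strategic doctrine".split()

model = train_unigram(train)
ppl_similar = perplexity(model, similar)
ppl_shifted = perplexity(model, shifted)
```

In this toy setting the sample with unfamiliar vocabulary receives the higher perplexity, which is the kind of signal one would chart across chronologically ordered text to look for moments of linguistic change.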
The Song Dynasty was a decisive period of transformation in ancient China, during which relationships between scholars and politicians are thought to have become increasingly close; this change is considered part of the “Tang–Song transition.” Within the Song Dynasty, the Yuanyou 元祐 era (1086–94) was a critical and politically complex period. The main purpose of this paper is to investigate the relationships between scholars and politicians during this period. The connections between figures, collected from the CBDB (China Biographical Database), include both literary and political relations. Two scholars have a literary relation when both wrote to a common third figure, while a political relation between two politicians is indicated by connections such as political support, recommendation or sponsorship, and political opposition. In the present study, two matrices are constructed from the literary and political relations among figures, respectively, and a Poisson-Gamma factorization model is used to extract the key factors of each matrix. According to the calculated results and literary history, the scholars can be clearly classified into three groups. The method identifies two groups of politicians, while other politicians appear to have steered a course between them. Furthermore, figures engaged in common literary pursuits are more likely to share common political goals. The observation that scholars and politicians were closely related in the Yuanyou era thus confirms that this period featured literati politics.
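The definition of a literary relation given above (two scholars both writing to a common third figure) can be sketched as a small counting routine. This is only an illustration under stated assumptions: the figures "A", "B", "C" and their correspondences are invented placeholders rather than CBDB data, the function name `literary_relation_matrix` is hypothetical, and the Poisson-Gamma factorization step itself is not shown.

```python
from itertools import combinations


def literary_relation_matrix(wrote_to):
    """Count, for each pair of writers, the third figures both wrote to."""
    writers = sorted(wrote_to)
    index = {name: i for i, name in enumerate(writers)}
    matrix = [[0] * len(writers) for _ in writers]
    for a, b in combinations(writers, 2):
        # A shared recipient must be a third figure, not one of the pair.
        shared = (wrote_to[a] & wrote_to[b]) - {a, b}
        matrix[index[a]][index[b]] = len(shared)
        matrix[index[b]][index[a]] = len(shared)
    return writers, matrix


# Placeholder figures and correspondences, invented for illustration only.
wrote_to = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "A"},
    "C": {"Z"},
}
writers, matrix = literary_relation_matrix(wrote_to)
```

The resulting symmetric count matrix is the kind of non-negative input that a count-based factorization model could then decompose into latent groups.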
Although the literary structures employed in the Bible (chiasmus, concentric structures, and parallelism) are important for its interpretation, their ambiguity makes them problematic. This paper proposes a quantitative method of analysis in order to establish an objective framework for evaluating these structures. The hypothesis tested is the Parallel Literary Structure hypothesis concerning the hierarchical literary structure of the Bible, which proposes that the literary structures in all books of the Bible share a common parallel system. Specifically, the validity of text divisions was evaluated against the divisions found in a number of extant Bible translations. Then, corresponding pericope pairs (a pericope is a small story unit in the Bible) that include various “common rare” words and phrases were counted, and the number of valid pairs was compared with the number found in randomly constructed structures. The analysis yields statistically significant results that strongly support the hypothesis.
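The comparison against randomly constructed structures described above resembles a permutation test, which can be sketched as follows. This is a minimal illustration under assumptions not stated in the abstract: each pericope is reduced to a set of rare words, a pair counts as valid when the two sets overlap, and random structures are drawn as uniformly random pairings. The toy word sets and all function names are invented for the example.

```python
import random


def count_valid_pairs(pericopes, pairing):
    """Count paired pericopes that share at least one rare word."""
    return sum(1 for i, j in pairing if pericopes[i] & pericopes[j])


def random_pairing(n, rng):
    """Pair up indices 0..n-1 at random (n is assumed even)."""
    order = list(range(n))
    rng.shuffle(order)
    return [(order[k], order[k + 1]) for k in range(0, n, 2)]


def permutation_p_value(pericopes, pairing, trials=2000, seed=0):
    """Share of random pairings with at least as many valid pairs."""
    rng = random.Random(seed)
    observed = count_valid_pairs(pericopes, pairing)
    hits = sum(
        count_valid_pairs(pericopes, random_pairing(len(pericopes), rng)) >= observed
        for _ in range(trials)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (trials + 1)


# Toy pericopes as sets of rare words; a concentric pairing A-B-C-C'-B'-A'.
pericopes = [{"ark"}, {"manna"}, {"ephod"}, {"ephod"}, {"manna"}, {"ark"}]
proposed = [(0, 5), (1, 4), (2, 3)]
observed, p_value = permutation_p_value(pericopes, proposed)
```

With six pericopes there are 15 possible pairings and only one of them reaches three valid pairs, so the estimated p-value should land near 1/15; a real analysis would involve far more units and a more careful definition of a valid pair.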
The Landscapes of Injustice project is a multi-institutional seven-year research project funded by a Partnership Grant from the Social Sciences and Humanities Research Council of Canada. The project mission is to investigate, document, and analyze the process by which, beginning in 1942, tens of thousands of people of Japanese ethnicity were interned, and their property was seized and disposed of by the Canadian government and its agents. Part of this process involves identifying where Japanese Canadians were living and working prior to their internment, what property they owned, and how the dispossession affected them. This requires that we identify Japanese Canadian individuals across a range of different types of official and unofficial records, often based only on name, on a scale that makes an entirely manual process impractical. The project has therefore developed a semi-automated algorithmic approach to determining whether any name in the records is Japanese or not. This article describes the algorithm in detail, along with its application and limitations.
Migrants all over the world have left multiple traces in different countries, and this cultural heritage is of growing interest to researchers and to the migrant communities themselves. Cultural heritage institutions, however, have dwindling funds and resources to meet the demand for the heritage of immigrant communities to be protected. In this article we propose that the key to bridging this gap is to be found in new possibilities that are opened up if resources are linked to enable digital exploration of archival records and collections. In particular, we focus on the value of building a composite and distributed resource around migrants’ life courses. If this approach is used and dispersed collections held by heritage institutions can be linked, migrant communities can have access to detailed information about their families and researchers to a wealth of data—serial and qualitative—for sophisticated and innovative research. Not only does the scattered data become more usable and manageable, it becomes more visible and coherent; patterns can be discovered that were not apparent before. We use the Dutch-Australian collaborative project “Migrant: Mobilities and Connection” as an example and case study of this life course–centered methodology and propose that this may develop into a migration heritage template for migrants worldwide.
This article uses character n-gram methods to assess the authorship of three religious texts written in medieval Japan, comparing them to the works of Monkan (1278–1357), a Shingon monk active during the first half of the 14th century. These texts belong to a literary genre called shōgyō, which presents many challenges that make traditional authorship attribution methods inappropriate. They were composed in Japanese kanbun (classical Chinese read in Japanese word order), and their contents are closer to cumulative work than to original creations by a single author.
This article thus draws on previous research on the translators of the Chinese Buddhist canon, whose methods have proven far more effective for this type of literature than traditional methods developed for modern languages. Concretely, it proposes a workflow that proceeds from the preparation of the corpus (manuscript edition, encoding,…) to data analysis using the variable-length n-gram method. The last part of the article discusses future directions that could further refine the results, such as taking into account the various speaking voices found in the shōgyō and their relationship to the author, as well as stylistic analysis based on grammatical patterns.
Overall, the experiment succeeded in showing global trends in the texts of the Japanese Shingon schools, finding stylistic differences between the works of Kūkai, of monks from the 12th century, and of Monkan. Combined with a rigorous historical enquiry into the redaction context of the texts and their manuscripts, the data analysis also demonstrates that one of the three texts was almost certainly written by Monkan, and another either by him or by one of his close disciples.
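The character n-gram comparison at the heart of this approach can be sketched in a few lines. This is a simplified illustration, not the variable-length n-gram method used in the article: it assumes that pooling all n-grams of lengths 1 to 3 into one relative-frequency profile and comparing profiles by cosine similarity is an acceptable stand-in. The strings are placeholders rather than kanbun text, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngram_profile(text, min_n=1, max_n=3):
    """Relative frequencies of all character n-grams with min_n <= n <= max_n."""
    counts = Counter(
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    )
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


# Placeholder strings standing in for texts; not drawn from any real corpus.
reference = "abcabcabcabd"
candidate_close = "abcabcabd"
candidate_far = "xyzzyxxyz"

ref = ngram_profile(reference)
sim_close = cosine_similarity(ref, ngram_profile(candidate_close))
sim_far = cosine_similarity(ref, ngram_profile(candidate_far))
```

Because the profile is built from raw characters, the same code runs unchanged on any Unicode string, which is one reason character n-grams suit unsegmented scripts where word boundaries are not marked.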