Abstract
In this paper, we (i) propose a general-purpose database schema that can represent multichannel and multimodal spoken discourse corpora, and (ii) develop tools that build a database instantiating this schema, guided by configuration files, from annotations in various formats created with existing annotation tools. Spoken discourse corpora involve more than ten different types of annotation, covering both verbal and nonverbal information. Such corpora require the integration of a large number of linguistic and nonlinguistic units and the relations among them, as well as the ability to search them with complex queries that refer to multiple units. For spoken discourse corpora, it is also essential to make use of the existing annotation tools that are widely used in the community. We propose a method for constructing an environment for working with spoken discourse corpora that makes effective use of existing annotation and search tools. The method has been applied to spoken discourse corpora developed by different organizations and has been used effectively in corpus-linguistic research.