We describe an agent-based parallel HPSG parser that operates on shared-memory parallel machines. It efficiently parses real-world corpora using a wide-coverage HPSG grammar. The efficiency comes from a parallel parsing algorithm and from efficient treatment of feature structures. The parsing algorithm is based on the CKY algorithm, in which resolving the constraints between a mother and her daughters is treated as an atomic operation; this formulation yields good data distribution and an appropriate granularity of parallelism. The keys to the efficient treatment of feature structures are i) transferring them through shared memory, ii) copying them on demand, and iii) writing them to and reading them from memory simultaneously. Being parallel, our parser is more efficient than sequential parsers: the average parsing time per sentence on the EDR Japanese corpus was 78 msec, and the speed-up reached 13.2 when 50 processors were used.
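To illustrate the chart structure underlying the algorithm, here is a minimal sequential CKY sketch over a toy context-free grammar. This is not the HPSG parser described above: in the HPSG setting each rule application would unify feature structures rather than match atomic symbols, and cells would be filled by parallel agents. The grammar, lexicon, and function names are illustrative assumptions.

```python
from collections import defaultdict

# Toy binary grammar: (left daughter, right daughter) -> mother.
# In HPSG, each such application would instead unify feature structures.
RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
LEXICON = {"the": "Det", "dog": "N", "saw": "V", "cat": "N"}

def cky(words):
    n = len(words)
    # chart[(i, j)] holds the categories spanning words[i:j].
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[(i, i + 1)].add(LEXICON[w])
    for span in range(2, n + 1):          # shorter spans before longer ones
        for i in range(n - span + 1):     # cells of equal span are independent,
            j = i + span                  # which is what makes parallelisation natural
            for k in range(i + 1, j):     # split point between the two daughters
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        mother = RULES.get((left, right))
                        if mother:
                            chart[(i, j)].add(mother)
    return chart

chart = cky("the dog saw the cat".split())
print("S" in chart[(0, 5)])  # True: the whole string parses as a sentence
```

Treating the mother-daughters step as atomic, as the abstract describes, means each chart-cell update above is the unit of work distributed across processors.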
This paper examines the use of linguistic techniques in automatic term recognition. It describes the TRUCKS model, which draws on different types of contextual information (syntactic, semantic, terminological and statistical), seeking in particular to identify those parts of the context that are most relevant to terms. From an initial corpus of sublanguage texts, the system identifies, disambiguates and ranks candidate terms. It is evaluated with respect to the statistical approach on which it is built, and with respect to its expected theoretical performance. We show that by using deeper forms of contextual information, we can improve the extraction of multi-word terms. The resulting list of ranked terms improves on that produced by traditional methods in terms of precision and distribution, and the information acquired in the process can also be used for a variety of other applications, such as disambiguation, lexical tuning and term clustering.
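As a rough illustration of combining a statistical base score with contextual evidence, the sketch below ranks multi-word candidate terms by frequency boosted by a weight for relevant context words. The weights, data, and function are hypothetical stand-ins; TRUCKS's actual syntactic, semantic and terminological context factors are richer than this.

```python
from collections import Counter

# Assumed, domain-specific weights for contextually relevant neighbours.
CONTEXT_WEIGHTS = {"infection": 0.5, "treatment": 0.3}

def rank_terms(candidates, contexts):
    """Rank candidate terms by frequency scaled by contextual relevance.

    candidates: list of multi-word term occurrences (may repeat).
    contexts: dict mapping a candidate term to its neighbouring words.
    """
    freq = Counter(candidates)
    scores = {}
    for term, count in freq.items():
        bonus = sum(CONTEXT_WEIGHTS.get(w, 0.0) for w in contexts.get(term, []))
        scores[term] = count * (1.0 + bonus)
    return sorted(scores, key=scores.get, reverse=True)

ranked = rank_terms(
    ["viral infection", "viral infection", "blood cell"],
    {"viral infection": ["treatment"], "blood cell": []},
)
print(ranked[0])  # viral infection
```

The point of the sketch is the structure of the score: a purely statistical measure supplies the base, and contextual information reweights it, which is the general shape of the improvement the abstract claims over frequency-only methods.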