Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
26th (2012)
Session ID : 3M2-IOS-3b-3

On Chinese Postal Address and Associated Information Extraction
*Chia-Hui CHANG, Chia-Yi HUANG, Yueng-Sheng SU
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Address information is closely linked to people's daily lives. People often need to look up the addresses of shopping malls, schools, and organizations, and use map services to confirm their actual locations. MapMarker is a service that extracts English postal addresses from general web pages and marks them, together with associated information, on a map. This paper extends the idea to Chinese postal address extraction on the Web and improves the extraction of associated information for each address with hierarchical clustering. We show how to prepare the training data and conduct full address extraction using both BIEO and IO tagging methods. We also compare the results with and without Yahoo Chinese word segmentation. The results show that Chinese postal addresses can be extracted with a high F-measure of 0.97 using BIEO tagging without word segmentation, since incorrect segmentation can lead to worse labeling of address tokens. Meanwhile, associated information for each address is identified by clustering the addresses into address blocks, which improves the F-measure from 0.90 to 0.92.
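
To illustrate the two tagging schemes named in the abstract, the following is a minimal sketch of character-level labeling, that is, labeling without word segmentation. It assumes the usual reading of the tags (B = beginning, I = inside, E = end, O = outside an address). The function name `tag_chars`, the span format, and the sample string are hypothetical and are not taken from the paper.

```python
def tag_chars(text, spans, scheme="BIEO"):
    """Label every character of `text` as part of an address or not.

    `spans` is a list of (start, end) index pairs marking addresses,
    with `end` exclusive. `scheme` is "BIEO" or "IO".
    """
    tags = ["O"] * len(text)          # default: outside any address
    for start, end in spans:
        for i in range(start, end):
            if scheme == "IO":
                tags[i] = "I"         # IO marks only membership
            elif i == start:
                tags[i] = "B"         # first character of the address
            elif i == end - 1:
                tags[i] = "E"         # last character of the address
            else:
                tags[i] = "I"         # interior character
    return tags


# Illustrative string only: a label followed by a made-up address.
example = "地址:桃園市中大路300號"
for ch, bieo, io in zip(example,
                        tag_chars(example, [(3, 13)], "BIEO"),
                        tag_chars(example, [(3, 13)], "IO")):
    print(ch, bieo, io)
```

In this sketch BIEO marks the address boundaries explicitly, while IO records only whether a character belongs to an address, so two adjacent addresses would be indistinguishable under IO.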

© 2012 The Japanese Society for Artificial Intelligence