This paper proposes a new data compression method for Japanese text files encoded in Shift-JIS (JIS X 0208). In the first pass, a dictionary array is built from the most frequent characters, both single-byte and double-byte. In the second pass, every registered character is replaced with its dictionary item; the code 0xFF is written to the compressed file in front of each non-registered ASCII character to distinguish non-registered characters from registered ones. Using hashing, it takes
O(1) time to check whether an input character belongs to the dictionary and to convert its code to a dictionary item. Furthermore, run-length encoding is applied to sequences of consecutive identical characters to achieve a much higher compression ratio; the code 0xFE is the indicator that starts this encoding. A feature of the method is that it is a non-modal type of compression.
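
The two passes described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the dictionary size, the layout of dictionary items, or the run-length format, so the sketch assumes one-byte dictionary codes 0x00-0x7F, runs stored as 0xFE, a count byte, and the encoded character, and non-registered double-byte characters copied verbatim (their lead bytes cannot collide with the assumed dictionary codes). It also escapes every non-registered single-byte character with 0xFF, not only ASCII. All function and constant names (`is_lead`, `split_chars`, `token`, `compress`, `decompress`, `DICT_SIZE`, `MIN_RUN`) are hypothetical.

```python
from collections import Counter

ESC, RLE = 0xFF, 0xFE   # marker bytes named in the abstract
DICT_SIZE = 128         # assumption: one-byte dictionary codes 0x00-0x7F
MIN_RUN = 4             # assumption: shortest run worth encoding


def is_lead(b):
    """True if b is the lead byte of a double-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_chars(data):
    """Split Shift-JIS bytes into single- and double-byte characters."""
    chars, i = [], 0
    while i < len(data):
        n = 2 if is_lead(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + n])
        i += n
    return chars


def token(ch, codes):
    """Encode one character: dictionary code, raw pair, or escaped byte."""
    if ch in codes:
        return bytes([codes[ch]])
    return ch if len(ch) == 2 else bytes([ESC]) + ch


def compress(data, dict_size=DICT_SIZE):
    chars = split_chars(data)
    # Pass 1: register the most frequent characters.
    table = [ch for ch, _ in Counter(chars).most_common(dict_size)]
    codes = {ch: i for i, ch in enumerate(table)}  # hash lookup, O(1) expected
    # Pass 2: replace characters, collapsing runs of identical characters.
    out, i = bytearray(), 0
    while i < len(chars):
        run = 1
        while i + run < len(chars) and run < 255 and chars[i + run] == chars[i]:
            run += 1
        tok = token(chars[i], codes)
        out += bytes([RLE, run]) + tok if run >= MIN_RUN else tok * run
        i += run
    return table, bytes(out)


def decompress(table, blob):
    out, i = bytearray(), 0
    while i < len(blob):
        count = 1
        if blob[i] == RLE:                 # run marker: read the count byte
            count, i = blob[i + 1], i + 2
        b = blob[i]
        if b == ESC:                       # escaped non-registered single byte
            ch, i = blob[i + 1:i + 2], i + 2
        elif is_lead(b):                   # non-registered double-byte character
            ch, i = blob[i:i + 2], i + 2
        else:                              # registered character
            ch, i = table[b], i + 1
        out += ch * count
    return bytes(out)
```

For example, `table, blob = compress("こんにちは".encode("shift_jis"))` followed by `decompress(table, blob)` should return the original bytes. In this sketch the decoder never needs a mode switch: the first byte of each token alone decides how it is read, which is one possible reading of the "non-modal" property.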