A Universal Algorithm for Sequential Data Compression
JACOB ZIV, FELLOW, IEEE, AND ABRAHAM LEMPEL, MEMBER, IEEE
then show that the efficiency of our universal code with no a priori knowledge of the source approaches those bounds. The proposed compression algorithm is an adaptation of a simple copying procedure discussed recently in a study on the complexity of finite sequences. Basically, we employ the concept of encoding future segments of the source output via maximum-length copying from a buffer containing the recent past output. The transmitted codeword consists of the buffer address and the length of the copied segment. With a predetermined initial load of the buffer and the information contained in the codewords, the source data can readily be reconstructed at the decoding end of the process.

The main drawback of the proposed algorithm is its susceptibility to error propagation in the event of a channel error.

I. INTRODUCTION

IN MANY situations arising in digital communications and data processing, the encountered strings of data display various structural regularities or are otherwise subject to certain constraints, thereby allowing for storage- and time-saving techniques of data compression. Given a discrete data source, the problem of data compression is first to identify the limitations of the source, and second to devise a coding scheme which, subject to certain performance criteria, will best compress the given source.

Once the relevant source parameters have been identified, the problem reduces to one of minimum-redundancy coding. This phase of the problem has received extensive treatment in the literature [1]-[7].

When no a priori knowledge of the source characteristics is available, and if statistical tests are either impossible or unreliable, the problem of data compression becomes considerably more complicated. In order to overcome these difficulties one...
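The copying scheme summarized above (emit a buffer address and a copy length, then reconstruct at the decoder) can be sketched as follows. This is a minimal illustration only, not the paper's exact parameterization: it uses greedy longest-match search over a small sliding window, and each codeword carries an explicit literal symbol in addition to the (distance, length) pair, whereas the paper assumes a fixed buffer with a predetermined initial load.

```python
def lz_encode(data: str, window: int = 16) -> list:
    """Encode `data` by maximum-length copying from a buffer of recent output.

    Each codeword is a triple (distance, length, next_char): copy `length`
    symbols starting `distance` positions back in the output, then append
    `next_char` literally.  (Illustrative sketch only.)
    """
    i, codewords = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # The match may run past position i (an overlapping copy);
            # the last symbol is always reserved for the literal field.
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        codewords.append((best_dist, best_len, data[i + best_len]))
        i += best_len + 1
    return codewords


def lz_decode(codewords: list) -> str:
    """Reconstruct the source data from the codewords alone."""
    buf = []
    for dist, length, ch in codewords:
        for _ in range(length):
            buf.append(buf[-dist])  # copy one symbol at a time: overlap-safe
        buf.append(ch)
    return "".join(buf)
```

Copying one symbol at a time in the decoder is what lets a codeword's copy region overlap the segment currently being produced, which is how long runs are encoded compactly. Note also the susceptibility to error propagation mentioned above: a single corrupted codeword misaligns every subsequent copy.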