sicp-ex-2.72



<< Previous exercise (2.71) | Index | Next exercise (2.73) >>


shawn

I think the solution depends on how encode-symbol is implemented. In an efficient implementation, only one branch need to be searched to find the most frequent symbol. And that branch only has one symbol. Therefore, to encode the most frequent symbol, the order of growth is actually O(1).

As for encoding the least frequent symbol, we need to search both left and right branches at every node. Since we need to search n-1 nodes, the order of growth is O(n) + O(n-1) + ... + O(1), which is O(n^2).

Note that the solution for most frequent symbol is true for all cases, while the solution for least frequent symbol is only true for the special case of 2.71.


pluies

Encoding the most frequent element as per ex. 2.71 is a mere search into the symbol list, which is accomplished in O(n).

Encoding the least frequent element involves descending down the tree, with a search in the symbol list each time.

The complexity is O(n) + O(n-1) + ... + O(1), akin to O(n²).

Still not 100% sure of that; correct me if I'm wrong.

*given complexity of `member` / `element-of-set` function is O(n)


torinmr

Note: I misunderstood the question as analyzing the time to encode an entire message, instead of the time to encode one symbol.

Note: This answer is based on the following implementation:

 (define (encode message tree)                                                       
   (if (null? message)                                                               
       '()                                                                           
       (append (encode-symbol (car message) tree)                                    
               (encode (cdr message) tree))))                                           
                                                                                     
 (define (encode-symbol symbol tree)                                                 
   (cond ((leaf? tree) '())                                                          
         ((member symbol (symbols (left-branch tree)))                             
          (cons 0 (encode-symbol symbol (left-branch tree))))                        
         ((member symbol (symbols (right-branch tree)))                            
          (cons 1 (encode-symbol symbol (right-branch tree))))                       
         (else (error "Symbol -- not in tree" symbol)))) 
  
 (define (generate-huffman-tree pairs)                                               
   (successive-merge (make-leaf-set pairs)))                                         
                                                                                     
 (define (successive-merge leaf-set)                                                 
   (if (null? (cdr leaf-set))                                                        
       (car leaf-set)                                                                
       (successive-merge (adjoin-set                                                 
                          (make-code-tree (car leaf-set)                             
                                          (cadr leaf-set))                           
                          (cddr leaf-set))))) 

Let's consider the case where we have a message which contains (2^n - 1) symbols from an alphabet of n distinct symbols, with the symbols having the relative frequency described in 71. (i.e. The ith symbol occurs 2^(i-1) times.)

The encode procedure makes one call to encode-symbol for each symbol in the message. Besides this call, all operations take constant time except for the append operation, which takes time proportional to the number of bits in the representation of the symbol being considered. Thus, the total time spent in encode, not including encode-symbol, is:

O(n*1 + n*2 + (n-1)*4 + ... + 1*2^(n-1)) = O(2^n)

Now, let's analyze a call to encode-symbol, for the ith symbol (the one with frequency 2^(i-1)). encode-symbol is called once for each level of the tree traversed, for a total of (i+1) calls if i < n, or i calls if i == n. In each call on a non-leaf node, member is called on the left branch, which always takes constant time (since the left branch always has one element). Then, member is called on the right branch. This list contains (n - [current tree level]) elements, and the desired element is at the (i - [current tree level])th position, so this step takes O(i - [current tree level]) time. All the other operations in encode-symbol are constant time.

So, a call to encode-symbol for the ith symbol takes O(i^2) time: constant + O(i) = O(i) time at the first level, constant + O(i-1) = O(i), and so on down to the final recursive step, which is constant time.

So, encoding the most frequent symbol is O(1), while encoding the least frequent is O(n^2). Note that this does not change the running time for encoding the whole message, which is:

O(2^n + 4*2^(n-1) + 9*2^(n-2) + 16*2^(n-3) + 25*2^(n-4) + 36*2^(n-5) + ...)
= O(2^n * SUM_{1-n}(i^2/2^i))
= O(2^n * SUM_{1->n}(i^2/2^i))

We can prove that SUM_{1->n}(i^2/2^i) is convergent using the ratio test:

((i+1)^2 / 2^(i+1)) / (i^2 / 2^i))
= (1/2) (i+1)^2/i^2
= (1/2) (i^2 + 2i + 1)/i^2
= (1/2) (1 + 2/i + 1/i^2)

Which approaches 1/2 as i approaches infinity, proving the series is convergent. Thus, the running time is O(2^n) for encoding a complete message.

Note that this answer can be different depending on the specific implementation of encode-symbol and generate-huffman-tree. For example, if the huffman tree implementation prefers to put smaller nodes on the right, but encode-symbol searches the left subtree first (or vice-versa), then looking up the most frequent symbol will become O(n), as the entire symbol list for the subtree which doesn't contain the symbol will be searched.

Conversely, if we changed encode-symbol so that it assumed any symbol not in the left subtree was in the right subtree without calling member (and just returned an error if it eventually got to a leaf node which was not the requested symbol), then it would only take O(n) time to look up the least-frequent symbol.