sicp-ex-2.72



<< Previous exercise (2.71) | Index | Next exercise (2.73) >>


RWitak

I think the answer to this question depends too much on the implementations to be answered generally. Take this version of encode-symbol:

    (define (encode-symbol symbol tree)
      (define (iter-encode symbol tree result)
        (if (leaf? tree)
            result
            (if (memq symbol (symbols (left-branch tree)))
                (iter-encode symbol (left-branch tree) 
                             (append result (list 0)))
                (iter-encode symbol (right-branch tree) 
                             (append result (list 1))))))
      (if (memq symbol (symbols tree))
          (iter-encode symbol tree '())
          (error "bad symbol -- ENCODE-SYMBOL" symbol)))

The symbol lists, just as the tree itself, could be made so that they're ordered by decreasing weight, eg. when implemented similar to SICP's "sample-tree" (and remember that we only discuss special cases similar to Ex. 2.71!):

    (define power2-huffman
      (make-code-tree (make-leaf 'A 16)
                      (make-code-tree
                       (make-leaf 'B 8)
                       (make-code-tree
                        (make-leaf 'C 4)
                        (make-code-tree
                         (make-leaf 'D 2)
                         (make-leaf 'E 1))))))

Codes:

  A 0
  B 10
  C 110
  D 1110
  E 1111    

In this scenario, encode-symbol only checks the full list once. Then it iterates over the tree, only ever checking the left branch - which in the special case we're supposed to examine always has just a single leaf with a single symbol, no matter how big the tree.

This means that in more than half of all cases (16/31), we're done with one iteration - O(1). In more than half of the remaining cases, we need 2 iterations, each checking just a single symbol. This continues until the last case, which is the rarest of all, and takes n iterations, making it O(n). Also, the initial lookup is very fast for most cases, only the least likely case has to traverse the full list.

Now comes the speculative part, as I suck at math: Given that in my implementation, the number of cases that depend on deeper levels gets cut in half and each step on its own is O(n), I estimate the overall complexity of that procedure as about O(log n) - please correct me, if that's wrong.


Apart from all that, if we know beforehand that only a "powers-of-two" kind of tree will be used (and that it's implemented in the way given above), we don't even need the tree itself for encoding, just the ordered list of all symbols:

    (define (power2-encode-symbol symbol power2-huffman)
      (define (traverse symlist)
        (cond ((null? symlist) 
               (error "bad symbol -- POWER2-ENCODE-SYMBOL" symbol))
              ((eq? symbol (car symlist))
               (if (null? (cdr symlist))
                   '(1)
                   '(0)))
              (else (cons 1 (traverse (cdr symlist))))))
      (traverse (symbols power2-huffman)))

In this very efficient encoding algorithm, the resulting code is already finished by the time we checked if the symbol is even contained in the tree! I guess it has the same overall complexity, but fewer steps in reality.


aos

Here's my take on this problem.

NB: This specifically refers to the encode-symbol procedure, which is a sub-procedure of the actual encode. If we were to extend the solution to encode, we'd have to multiply by an additional n for the entire message.

For the encode-symbol procedure in 2.68:

For the special case described in 2.71:

1. Encoding the most frequent symbol:

2. Encoding the least frequent symbol:

The above answer is not comprehensive enough.

For the encode-symbol procedure in 2.68:

    Consider the maximum and minimum
  • The minimum value is the case of a full binary tree:O(n * log_n)
  • The maximum value is like in 2.71:O(n^2)
  • The final answer is between O(n * log_n) and O(n^2).