<< Previous exercise (2.71) | Index | Next exercise (2.73) >>

Here's my take on this problem.

NB: This specifically refers to the *encode-symbol* procedure, which is a sub-procedure of the actual *encode*. If we were to extend the solution to *encode*, we'd have to multiply by an additional **n** for the entire message.

For the *encode-symbol* procedure in 2.68:

- Search the symbol list at each node: O(n) time
- Then take log_n branches
- Total:
**O(n * log_n)**

For the special case described in 2.71:

1. Encoding the most frequent symbol:

- Search through symbol list: O(n) time
- Take the first single branch, since it will be at the top of the list: constant
- Total:
**O(n)**

2. Encoding the least frequent symbol:

- Search through symbol list at each level: O(n) time
- Take the next branch, since we are only removing one node, it would be: O(n - 1)
- Total: O(n * (n - 1)), or
**O(n^2)**

The above answer is not comprehensive enough.

For the *encode-symbol* procedure in 2.68:

Consider the maximum and minimum

- The minimum value is the case of a full binary tree:
**O(n * log_n)** - The maximum value is like in 2.71:
**O(n^2)** - The final answer is between
**O(n * log_n)**and**O(n^2)**.

I think the answer to this question depends too much on the implementations to be answered generally. Take this version of

encode-symbol:The symbol lists, just as the tree itself, could be made so that they're ordered by decreasing weight, eg. when implemented similar to SICP's "sample-tree" (and remember that we only discuss special cases similar to Ex. 2.71!):

Codes:

In this scenario,

encode-symbolonly checks the full list once. Then it iterates over the tree, only ever checking the left branch - which in the special case we're supposed to examine always has just a single leaf with a single symbol, no matter how big the tree.This means that in more than half of all cases (16/31), we're done with one iteration - O(1). In more than half of the remaining cases, we need 2 iterations, each checking just a single symbol. This continues until the last case, which is the rarest of all, and takes n iterations, making it O(n). Also, the initial lookup is very fast for most cases, only the least likely case has to traverse the full list.

Now comes the speculative part, as I suck at math: Given that in my implementation, the number of cases that depend on deeper levels gets cut in half and each step on its own is O(n), I estimate the overall complexity of that procedure as about O(log n) - please correct me, if that's wrong.

Apart from all that, if we know beforehand that only a "powers-of-two" kind of tree will be used (and that it's implemented in the way given above), we don't even need the tree itself for encoding, just the ordered list of all symbols:

In this very efficient encoding algorithm, the resulting code is already finished by the time we checked if the symbol is even contained in the tree! I guess it has the same overall complexity, but fewer steps in reality.