sicp-ex-2.60


 (define (element-of-set? x set) 
   (cond ((null? set) false) 
         ((equal? x (car set)) true) 
         (else (element-of-set? x (cdr set))))) 
  
 (define (adjoin-set x set) 
   (cons x set)) 
  
 (define (union-set set1 set2) 
   (append set1 set2)) 
  
 (define (remove-set-element x set) 
   (define (remove-set-element-iter acc rest) 
     (cond ((null? rest) acc) 
           ((equal? x (car rest)) (append acc (cdr rest))) 
           (else (remove-set-element-iter (adjoin-set (car rest) acc) (cdr rest))))) 
   (remove-set-element-iter '() set)) 
  
 (define (intersection-set set1 set2) 
   (cond ((or (null? set1) (null? set2)) '()) 
         ((element-of-set? (car set1) set2) 
          (cons (car set1) 
                (intersection-set (cdr set1) (remove-set-element (car set1) set2)))) 
         (else (intersection-set (cdr set1) set2)))) 
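A quick check of these procedures (restating the definitions above so the snippet runs standalone, with `#t`/`#f` in place of SICP's `true`/`false`):

```scheme
(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-set? x (cdr set)))))

(define (adjoin-set x set)
  (cons x set))

(define (union-set set1 set2)
  (append set1 set2))

(define (remove-set-element x set)
  (define (remove-set-element-iter acc rest)
    (cond ((null? rest) acc)
          ((equal? x (car rest)) (append acc (cdr rest)))
          (else (remove-set-element-iter (adjoin-set (car rest) acc) (cdr rest)))))
  (remove-set-element-iter '() set))

(define (intersection-set set1 set2)
  (cond ((or (null? set1) (null? set2)) '())
        ((element-of-set? (car set1) set2)
         (cons (car set1)
               (intersection-set (cdr set1) (remove-set-element (car set1) set2))))
        (else (intersection-set (cdr set1) set2))))

;; adjoin and union keep duplicates; intersection pairs them off,
;; removing each matched element from set2 as it goes:
(adjoin-set 2 '(2 3))                    ; => (2 2 3)
(union-set '(1 2) '(2 3))                ; => (1 2 2 3)
(intersection-set '(1 2 2 3) '(2 2 4))   ; => (2 2)
```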



sgm

With the need to check for pre-existence lifted, the procedure adjoin-set can be a straight cons and the time complexity drops to Θ(1).

 (define (adjoin-set x set) 
   (cons x set)) 

Also, union-set can now become a straight append and its time complexity drops to Θ(n).

 (define (union-set set1 set2) 
   (append set1 set2)) 

There isn’t any way to improve element-of-set? or intersection-set.


bravely

@sgm, actually there is a way to improve `intersection-set`: collect all the duplicates using union, then filter via a strict intersection. The strict version by itself only keeps duplicates from the first set, not the second. I didn't try to optimize it, though. `element-of-set?`, sure, stays as it is.

 (define (element-of-set? x set) 
   (cond ((null? set) false) 
         ((equal? x (car set)) true) 
         (else (element-of-set? x (cdr set))))) 
  
 (define (adjoin-set x set) 
   (cons x set)) 
  
 (define (intersection-set-strict set1 set2) 
   (cond ((or (null? set1) (null? set2)) '()) 
         ((element-of-set? (car set1) set2)         
          (cons (car set1) 
                (intersection-set-strict (cdr set1) set2))) 
         (else (intersection-set-strict (cdr set1) set2)))) 
  
 (define (intersection-set set1 set2) 
   (let ((inter (intersection-set-strict set1 set2)) 
         (union (union-set set1 set2))) 
     (filter (lambda (x) (element-of-set? x inter)) 
             union))) 
  
 (define (union-set set1 set2) 
   (append set1 set2)) 
  
  
 ; from racket's rackunit 
 (check-equal? (intersection-set '(0 1 1 2 2 3) '(2 2 3 4)) 
               '(2 2 3 2 2 3)) 

Zelphir

@bravely I got another version for keeping the duplicates, although it might not be as elegant.

 (define (element-of-duplicate-set? x set) 
   (cond [(empty? set) false] 
         [(equal? x (car set)) true] 
         [else (element-of-duplicate-set? x (cdr set))])) 
  
 (define (adjoin-duplicate-set x set) 
   (cons x set)) 
  
 (define (intersection-duplicate-set set1 set2) 
   (define (iter result s1 s2 already-swapped) 
     (cond 
       [(and (empty? s1) (empty? s2)) 
         result] 
        
       [(and (empty? s1) (not already-swapped)) 
         (iter result s2 result true)] 
  
       [(and (empty? s1) already-swapped) 
         result] 
  
       [(element-of-duplicate-set? (car s1) s2) 
         (iter (append result (list (car s1))) 
               (cdr s1) 
               s2 
               already-swapped)] 
  
       [else 
         (iter result (cdr s1) s2 already-swapped)])) 
  
   (iter '() set1 set2 false)) 
  
 (define (union-duplicate-set set1 set2) 
   (append set1 set2)) 
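For what it's worth, a standalone check of this version (restated in standard Scheme, with `null?`/`#t`/`#f` instead of Racket's `empty?`/`true`/`false`), under the interpretation that every instance of a common value is kept:

```scheme
(define (element-of-duplicate-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-duplicate-set? x (cdr set)))))

(define (intersection-duplicate-set set1 set2)
  (define (iter result s1 s2 already-swapped)
    (cond ((and (null? s1) (null? s2)) result)
          ;; first pass done: rescan set2 against the matches found so far
          ((and (null? s1) (not already-swapped))
           (iter result s2 result #t))
          ((and (null? s1) already-swapped) result)
          ((element-of-duplicate-set? (car s1) s2)
           (iter (append result (list (car s1))) (cdr s1) s2 already-swapped))
          (else (iter result (cdr s1) s2 already-swapped))))
  (iter '() set1 set2 #f))

;; One 2 in set1 plus two 2s in set2 yield three 2s in total:
(intersection-duplicate-set '(1 2) '(2 2))   ; => (2 2 2)
```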

sebastian

@bravely, if your improvement on intersection uses the strict intersection, it has the same Θ(n^2) complexity, so it is not an improvement at all. Even though the definition might look cleaner, it is actually worse than simply traversing the first set and checking whether each element is also in the other set, because your version additionally needs to append the sets to form the union, which takes n extra steps.

Also, we haven't answered the last question in the exercise. You should use this duplicate-allowing representation (which is just an ordinary list, really) whenever you can. The set data structure is defined by its lack of repetition, and it should be used only where that property is necessary, because the basic set operations are expensive compared to those of other data structures.
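A sketch of that trade-off (the names `adjoin-nodup`, `adjoin-dup` and `build-set` are made up for the comparison): when the incoming data is known to be duplicate-free, the duplicate-allowing `cons` does in one step what the checking version pays a full scan for.

```scheme
(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-set? x (cdr set)))))

;; Non-duplicate representation: every adjoin scans the set first, Θ(n).
(define (adjoin-nodup x set)
  (if (element-of-set? x set) set (cons x set)))

;; Duplicate-allowing representation: a single cons, Θ(1).
(define (adjoin-dup x set)
  (cons x set))

;; Building a set of n known-distinct items: Θ(n) total with adjoin-dup,
;; Θ(n^2) total with adjoin-nodup.
(define (build-set adjoin items)
  (if (null? items)
      '()
      (adjoin (car items) (build-set adjoin (cdr items)))))

(build-set adjoin-dup '(1 2 3))     ; => (1 2 3)
(build-set adjoin-nodup '(1 2 3))   ; => (1 2 3)
```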


master

WARNING: This is wrong

-master

Here's how I implemented element-of-set? and intersection-set:

  
 (define (remove x set) 
   (filter (lambda (y) (not (equal? x y))) set)) 
  
 (define (element-of-set? x set) 
   (cond ((null? set) false) 
         ((equal? x (car set)) true) 
         (else (element-of-set? x (remove (car set) (cdr set)))))) 
  
 (define (intersection-set set1 set2) 
   (cond ((or (null? set1) (null? set2)) '()) 
         ((element-of-set? (car set1) set2) 
          (cons (car set1) (intersection-set (remove (car set1) (cdr set1)) set2))) 
         (else (intersection-set (cdr set1) set2)))) 

I believe my element-of-set? is more efficient because it removes duplicates of negative matches while it is searching. Imagine, for example, an ordered list with an initial member 1, a final member n (which is what is being searched for), and n-2 other members in between, each with an exorbitant number of repetitions. If we don't filter out those members which we know aren't the target, then we will need to check them every time. This way we only need to check them once, turning something with quadratic time complexity into something with linearithmic time complexity (I think; time complexity isn't really my forte).

@master, unfortunately, your implementation of element-of-set? is quite inefficient. Since remove has an algorithmic complexity of Θ(n), and is potentially being called once for every element in the set when you call element-of-set?, you've created a procedure with an algorithmic complexity of Θ(n^2), whereas simply iterating over the full set without removing any elements would have a complexity of Θ(n). As an example, imagine the following procedure call:

 (element-of-set? 11 (list 1 2 3 4 5 6 7 8 9 10)) 

remove will have been called 10 times, for no benefit.

You're right, there is no benefit; my thought process was mistaken. Although I do want to defend my implementation a little: yes, for ordered sets with no repetition it is absolute rubbish. However, while it does have Θ(n^2) worst-case time complexity, if there are lots of repetitions then I think it is still Θ(n) in the best case and Θ(n log n) in the average case (correct me if I'm wrong). Obviously in this case that is indeed slower, but not as bad as Θ(n^2).



I feel as though we're missing the point of the exercise. Please correct me if I'm wrong, but the point was to create operations that work on sets that allow duplicates. So the intersection-set operation should return the duplicates from each set as well, returning the values which appear in both sets. Admittedly there is some ambiguity here as to whether to include all common values (e.g. if there are six 2s in set1 and four 2s in set2, return four 2s) or all instances of common values (e.g. if there are six 2s in set1 and four 2s in set2, return ten 2s), but this depends on what you're trying to do, I suppose. Either way, the remove-element procedures above seem to be imposing the distinct-element narrative from the previous exercise.

 (define (intersection-set-dup set1 set2)
   ;; Partition seq into (matches non-matches) in a single pass.
   (define (filter-set-dup predicate seq in out)
     (cond ((null? seq) (list in out))
           ((predicate (car seq))
            (filter-set-dup predicate (cdr seq) (cons (car seq) in) out))
           (else (filter-set-dup predicate (cdr seq) in (cons (car seq) out)))))
 
   (if (or (null? set1) (null? set2))
       '()
       (let ((filtered_set1 (filter-set-dup (lambda (x) (equal? x (car set1))) set1 '() '()))
             (filtered_set2 (filter-set-dup (lambda (x) (equal? x (car set1))) set2 '() '())))
         (if (null? (car filtered_set2))
             ;; (car set1) never occurs in set2: drop all its copies
             (intersection-set-dup (cadr filtered_set1) (cadr filtered_set2))
             (append (car filtered_set1)
                     (car filtered_set2)
                     (intersection-set-dup (cadr filtered_set1) (cadr filtered_set2)))))))

There are, no doubt, more elegant/efficient implementations. Say you were querying two databases and wanted all transactions above a certain value -- you'd want all duplicates (however they're defined -same value, same customer, same time, same item etc) to be present in the result.
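A standalone check of this approach (a sketch; it keeps a run of matches only when the value actually occurs in both sets, under the "all instances of common values" interpretation described above):

```scheme
(define (intersection-set-dup set1 set2)
  ;; Partition seq into (matches non-matches) in a single pass.
  (define (filter-set-dup predicate seq in out)
    (cond ((null? seq) (list in out))
          ((predicate (car seq))
           (filter-set-dup predicate (cdr seq) (cons (car seq) in) out))
          (else (filter-set-dup predicate (cdr seq) in (cons (car seq) out)))))
  (if (or (null? set1) (null? set2))
      '()
      (let ((filtered1 (filter-set-dup (lambda (x) (equal? x (car set1))) set1 '() '()))
            (filtered2 (filter-set-dup (lambda (x) (equal? x (car set1))) set2 '() '())))
        (if (null? (car filtered2))
            ;; (car set1) never occurs in set2: drop all its copies
            (intersection-set-dup (cadr filtered1) (cadr filtered2))
            (append (car filtered1)
                    (car filtered2)
                    (intersection-set-dup (cadr filtered1) (cadr filtered2)))))))

;; Two 2s and one 3 in set1, one 2 and two 3s in set2:
(intersection-set-dup '(2 2 3) '(2 3 3))   ; => (2 2 2 3 3 3)
```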



partj

I think nobody has answered the last part of the question: when is the duplicate representation to be preferred? Here's my answer.

 ; Theta(n) but n could be large due to duplicates. Previously Theta(n) with no duplicates. 
 (define (element-of-set? x set) 
   (cond ((null? set) false) 
         ((equal? x (car set)) true) 
         (else (element-of-set? x (cdr set))))) 
  
 ; Theta(1). Previously Theta(n). 
 (define (adjoin-set x set) (cons x set)) 
  
 ; Theta(n). Previously Theta(n^2). 
 (define (union-set set1 set2) (append set1 set2)) 
  
 ; Theta(n^2) but n could be large due to duplicates. Previously Theta(n^2). 
 (define (intersection-set set1 set2) 
   (cond ((or (null? set1) (null? set2)) '()) 
         ((element-of-set? (car set1) set2) 
          (cons (car set1) (intersection-set (cdr set1) set2))) 
         (else (intersection-set (cdr set1) set2)))) 

Observation: there is an efficiency improvement for the operations that add to the set, and a drop in efficiency for the others (membership check and intersection). That drop grows with the number of duplicates added to the set(s); e.g. for an intersection of sets that are 2x their duplicate-free size, the cost is 4x.

For applications where the sets are to be constructed from known non-duplicate elements, this representation is more efficient than the non-duplicate one.


alice-packer

 (define adjoin-set cons) 
 (define union-set append)