"""
Provides the implementation for various knapsack algorithms.

Knapsack algorithms are "fit" algorithms, used to take a set of "things" and
decide on the optimal way to fit them into some container.  The focus of this
code is to fit files onto a disc, although the interface (in terms of item,
item size and capacity size, with no units) is generic enough that it can
be applied to items other than files.

All of the algorithms implemented below assume that "optimal" means "use up as
much of the disc's capacity as possible", but each produces slightly different
results.  For instance, the best fit and first fit algorithms tend to include
fewer files than the worst fit and alternate fit algorithms, even if they use
the disc space more efficiently.

Usually, for a given set of circumstances, it will be obvious to a human which
algorithm is the right one to use, based on trade-offs between number of files
included and ideal space utilization.  It's a little more difficult to do this
programmatically.  For Cedar Backup's purposes (i.e. trying to fit a small
number of collect-directory tarfiles onto a disc), worst-fit is probably the
best choice if the goal is to include as many of the collect directories as
possible.

@sort: firstFit, bestFit, worstFit, alternateFit

@author: Kenneth J. Pronovici <pronovic@ieee.org>
"""
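
# Illustrative sketch, not part of the original module.  The fit functions
# expect a dictionary keyed on item whose values are (item, size) tuples.
# Assuming a caller starts from a plain item-to-size mapping, a hypothetical
# helper such as _exampleBuildItems() could produce that shape.  The helper
# name and its argument are assumptions made for this example only.
def _exampleBuildItems(sizes):
   # Repeat each key as the first element of its value tuple
   return dict((item, (item, size)) for (item, size) in sizes.items())

# For instance, _exampleBuildItems({"home.tar": 300}) yields
# {"home.tar": ("home.tar", 300)}.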


def firstFit(items, capacity):
   """
   Implements the first-fit knapsack algorithm.

   The first-fit algorithm proceeds through an unsorted list of items until
   running out of items or meeting capacity exactly.  If an item would cause
   capacity to be exceeded, that item is thrown away and the next one is
   tried.  This algorithm generally performs more poorly than the other
   algorithms both in terms of capacity utilization and item utilization, but
   can be as much as an order of magnitude faster on large lists of items
   because it doesn't require any sorting.

   The "size" values in the items and capacity arguments must be comparable,
   but they are unitless from the perspective of this function.  Zero-sized
   items and capacity are considered degenerate cases.  If capacity is zero,
   no items fit, period, even if the items list contains zero-sized items.

   Each value in the dictionary repeats its own key as the first element of
   the tuple.  This seems strange at first glance, but it facilitates easy
   sorting of the items as a list if needed.

   The function assumes that the list of items may be used destructively, if
   needed.  This avoids the overhead of having the function make a copy of the
   list, if this is not required.  Callers should pass C{items.copy()} if they
   do not want their version of the list modified.

   The function returns a list of chosen items and the unitless amount of
   capacity used by the items.

   @param items: Items to operate on
   @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer

   @param capacity: Capacity of container to fit to
   @type capacity: integer

   @returns: Tuple C{(items, used)} as described above
   """

   # Track the chosen keys in a dictionary
   included = { }

   # Walk the items in dictionary order, skipping anything that does not fit
   used = 0
   remaining = capacity
   for key in items.keys():
      if remaining == 0:
         break
      if remaining - items[key][1] >= 0:
         included[key] = None
         used += items[key][1]
         remaining -= items[key][1]

   # Return the chosen keys as a list, along with the capacity used
   return (list(included.keys()), used)
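

# Illustrative sketch, not part of the original module.  It restates the
# greedy pass described in the first-fit documentation over an explicit
# sequence of (item, size) pairs, so the skip-and-continue behaviour is easy
# to trace.  The name _exampleGreedyPass and its argument shape are
# assumptions made for this example only.
def _exampleGreedyPass(pairs, capacity):
   chosen = []
   used = 0
   for (item, size) in pairs:
      if used == capacity:
         break                  # capacity met exactly, or capacity is zero
      if used + size <= capacity:
         chosen.append(item)    # the item fits, so keep it
         used += size
      # otherwise the item is thrown away and the next one is tried
   return (chosen, used)

# With capacity 10 and sizes visited in the order 7, 5, 4, 2, the pass keeps
# the 7, throws away the 5 and the 4, and keeps the 2, for 9 units used.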


def bestFit(items, capacity):
   """
   Implements the best-fit knapsack algorithm.

   The best-fit algorithm proceeds through a sorted list of items (sorted from
   largest to smallest) until running out of items or meeting capacity exactly.
   If an item would cause capacity to be exceeded, that item is thrown away
   and the next one is tried.  The algorithm effectively includes the minimum
   number of items possible in its search for optimal capacity utilization.
   For large lists of mixed-size items, it's not unusual to see the algorithm
   achieve 100% capacity utilization by including fewer than 1% of the items.
   Probably because it often has to look at fewer of the items before
   completing, it tends to be a little faster than the worst-fit or
   alternate-fit algorithms.

   The "size" values in the items and capacity arguments must be comparable,
   but they are unitless from the perspective of this function.  Zero-sized
   items and capacity are considered degenerate cases.  If capacity is zero,
   no items fit, period, even if the items list contains zero-sized items.

   Each value in the dictionary repeats its own key as the first element of
   the tuple.  This seems strange at first glance, but it facilitates easy
   sorting of the items as a list if needed.

   The function assumes that the list of items may be used destructively, if
   needed.  This avoids the overhead of having the function make a copy of the
   list, if this is not required.  Callers should pass C{items.copy()} if they
   do not want their version of the list modified.

   The function returns a list of chosen items and the unitless amount of
   capacity used by the items.

   @param items: Items to operate on
   @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer

   @param capacity: Capacity of container to fit to
   @type capacity: integer

   @returns: Tuple C{(items, used)} as described above
   """

   # Track the chosen keys in a dictionary
   included = { }

   # Order the keys by item size, largest first
   keys = sorted(items.keys(), key=lambda key: items[key][1], reverse=True)

   # Walk the sorted keys, skipping anything that does not fit
   used = 0
   remaining = capacity
   for key in keys:
      if remaining == 0:
         break
      if remaining - items[key][1] >= 0:
         included[key] = None
         used += items[key][1]
         remaining -= items[key][1]

   # Return the chosen keys as a list, along with the capacity used
   return (list(included.keys()), used)


def worstFit(items, capacity):
   """
   Implements the worst-fit knapsack algorithm.

   The worst-fit algorithm proceeds through a sorted list of items (sorted
   from smallest to largest) until running out of items or meeting capacity
   exactly.  If an item would cause capacity to be exceeded, that item is
   thrown away and the next one is tried.  The algorithm effectively includes
   the maximum number of items possible in its search for optimal capacity
   utilization.  It tends to be somewhat slower than either the best-fit or
   alternate-fit algorithm, probably because on average it has to look at
   more items before completing.

   The "size" values in the items and capacity arguments must be comparable,
   but they are unitless from the perspective of this function.  Zero-sized
   items and capacity are considered degenerate cases.  If capacity is zero,
   no items fit, period, even if the items list contains zero-sized items.

   Each value in the dictionary repeats its own key as the first element of
   the tuple.  This seems strange at first glance, but it facilitates easy
   sorting of the items as a list if needed.

   The function assumes that the list of items may be used destructively, if
   needed.  This avoids the overhead of having the function make a copy of the
   list, if this is not required.  Callers should pass C{items.copy()} if they
   do not want their version of the list modified.

   The function returns a list of chosen items and the unitless amount of
   capacity used by the items.

   @param items: Items to operate on
   @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer

   @param capacity: Capacity of container to fit to
   @type capacity: integer

   @returns: Tuple C{(items, used)} as described above
   """

   # Track the chosen keys in a dictionary
   included = { }

   # Order the keys by item size, smallest first
   keys = sorted(items.keys(), key=lambda key: items[key][1])

   # Walk the sorted keys, skipping anything that does not fit
   used = 0
   remaining = capacity
   for key in keys:
      if remaining == 0:
         break
      if remaining - items[key][1] >= 0:
         included[key] = None
         used += items[key][1]
         remaining -= items[key][1]

   # Return the chosen keys as a list, along with the capacity used
   return (list(included.keys()), used)
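

# Illustrative sketch, not part of the original module.  It runs the same
# greedy pass over one set of sizes in both sort directions, to show the
# trade-off between capacity utilization (largest first) and number of items
# included (smallest first).  The name _exampleCompareOrders and the plain
# item-to-size mapping it takes are assumptions made for this example only.
def _exampleCompareOrders(sizes, capacity):
   def greedy(names):
      chosen = []
      used = 0
      for name in names:
         if used == capacity:
            break
         if used + sizes[name] <= capacity:
            chosen.append(name)
            used += sizes[name]
      return (chosen, used)
   largestFirst = sorted(sizes, key=lambda name: sizes[name], reverse=True)
   smallestFirst = sorted(sizes, key=lambda name: sizes[name])
   return (greedy(largestFirst), greedy(smallestFirst))

# With sizes 8, 2, 1, 1, 1 and capacity 10, the largest-first pass keeps two
# items and uses all 10 units, while the smallest-first pass keeps four items
# but uses only 5 units.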


def alternateFit(items, capacity):
   """
   Implements the alternate-fit knapsack algorithm.

   This algorithm (which I'm calling "alternate-fit" as in "alternate from one
   to the other") tries to balance small and large items to achieve better
   end-of-disc performance.  Instead of just working one direction through a
   list, it alternately works from the start and end of a sorted list (sorted
   from smallest to largest), throwing away any item which would cause
   capacity to be exceeded.  The algorithm tends to be slower than the
   best-fit and first-fit algorithms, and slightly faster than the worst-fit
   algorithm, probably because of the number of items it considers on average
   before completing.  It often achieves slightly better capacity utilization
   than the worst-fit algorithm, while including slightly fewer items.

   The "size" values in the items and capacity arguments must be comparable,
   but they are unitless from the perspective of this function.  Zero-sized
   items and capacity are considered degenerate cases.  If capacity is zero,
   no items fit, period, even if the items list contains zero-sized items.

   Each value in the dictionary repeats its own key as the first element of
   the tuple.  This seems strange at first glance, but it facilitates easy
   sorting of the items as a list if needed.

   The function assumes that the list of items may be used destructively, if
   needed.  This avoids the overhead of having the function make a copy of the
   list, if this is not required.  Callers should pass C{items.copy()} if they
   do not want their version of the list modified.

   The function returns a list of chosen items and the unitless amount of
   capacity used by the items.

   @param items: Items to operate on
   @type items: dictionary, keyed on item, of C{(item, size)} tuples, item as string and size as integer

   @param capacity: Capacity of container to fit to
   @type capacity: integer

   @returns: Tuple C{(items, used)} as described above
   """

   # Track the chosen keys in a dictionary
   included = { }

   # Order the keys by item size, smallest first
   keys = sorted(items.keys(), key=lambda key: items[key][1])

   # Split the sorted keys in half; the back half is walked largest first
   used = 0
   remaining = capacity

   front = keys[0:len(keys)//2]
   back = keys[len(keys)//2:len(keys)]
   back.reverse()

   i = 0
   j = 0

   # Alternate between the small end and the large end, skipping anything that does not fit
   while remaining > 0 and (i < len(front) or j < len(back)):
      if i < len(front):
         if remaining - items[front[i]][1] >= 0:
            included[front[i]] = None
            used += items[front[i]][1]
            remaining -= items[front[i]][1]
         i += 1
      if j < len(back):
         if remaining - items[back[j]][1] >= 0:
            included[back[j]] = None
            used += items[back[j]][1]
            remaining -= items[back[j]][1]
         j += 1

   # Return the chosen keys as a list, along with the capacity used
   return (list(included.keys()), used)
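

# Illustrative sketch, not part of the original module.  It shows only the
# order in which the alternate-fit pass visits an already-sorted list: the
# front half from smallest upward, interleaved with the back half from
# largest downward.  The name _exampleAlternateOrder is an assumption made
# for this example only, and capacity checks are deliberately left out.
def _exampleAlternateOrder(sortedKeys):
   middle = len(sortedKeys) // 2
   front = sortedKeys[:middle]
   back = sortedKeys[middle:][::-1]
   order = []
   for index in range(max(len(front), len(back))):
      if index < len(front):
         order.append(front[index])
      if index < len(back):
         order.append(back[index])
   return order

# For the sorted sizes [1, 2, 3, 4, 5], the visiting order is [1, 5, 2, 4, 3].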