Bin-arrays

Bin-arrays are a compact way of storing a large collection of lists while allowing modifications in O(1) time. They share many of the characteristics of doubly-linked lists and are applicable to many of the situations where doubly-linked lists would be used. Bin-arrays are most useful, however, when dealing with large amounts of data, and when using a minimal amount of memory is more important than having extremely fast lookups, insertions, or deletions.

Structure and Algorithms

Stated precisely, the problem being solved is this: how do you compactly store a large collection of homogeneous lists of varying lengths while allowing deletion of arbitrary elements and insertion at arbitrary positions in constant time?

The most straightforward solution to the stated problem is shown in Figure 2 above. The general scheme is to allocate a node for each element in each list and to link the elements together using forward and backward pointers. This approach is easy to implement, easy to understand, and reasonably compact. However, the memory usage is not as low as it could be, for two reasons. First, allocating a large number of small objects can incur significant overhead in a language's general-purpose memory allocation routines (for example, malloc() in C). Some memory allocators handle this situation better than others, but it is reasonable to assume that there will be some overhead. (Note that in C++, a programmer can write a class-specific operator new() to dramatically reduce, though perhaps not entirely eliminate, the overhead.) Second, using a separate object for each element entails a nontrivial amount of fixed overhead per object in Java. This is because every class in Java derives from Object, which means that even an empty object takes up around 16 bytes.

Figure 3 above shows an approach which avoids both these problems by allocating only three arrays rather than lots of individual objects. The general scheme is to store each element in an array and to use a pair of array indices to link the elements together. Memory usage, however, may still not be as low as desired. If the user data consists simply of 4-byte integers and each link is a 4-byte integer, then there is a constant 200% overhead in this representation. When storing a million elements, this amounts to 8 MB of overhead. This may seem tolerable given the large amounts of memory common in even desktop systems these days, but it still seems worthwhile to try to reduce the memory usage further. It is quite likely that we may want to deal with not just a million elements but, say, ten million or even fifty million elements. These numbers imply overheads of 80 MB and 400 MB, which are sizable amounts even today.
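As a rough illustration of the array-based scheme, here is a minimal Java sketch. The class name, the field names, the NIL sentinel, and the choice of one data array plus separate "next" and "prev" arrays are assumptions made for this example; the article's Figure 3 may lay the three arrays out differently. Freed slots are not recycled here.

```java
// Sketch only: all names are hypothetical and the layout is an assumption.
final class ArrayLinkedLists {
    static final int NIL = -1;   // assumed sentinel meaning "no link"
    final int[] data;            // user data, 4 bytes per element
    final int[] next;            // index of the following element, or NIL
    final int[] prev;            // index of the preceding element, or NIL
    int used = 0;                // number of slots handed out so far

    ArrayLinkedLists(int capacity) {
        data = new int[capacity];
        next = new int[capacity];
        prev = new int[capacity];
    }

    /** Inserts value after slot `after` (or starts a new list if after == NIL). Returns the new slot. */
    int insertAfter(int after, int value) {
        int slot = used++;
        data[slot] = value;
        prev[slot] = after;
        if (after == NIL) {
            next[slot] = NIL;
        } else {
            next[slot] = next[after];
            if (next[after] != NIL) prev[next[after]] = slot;
            next[after] = slot;
        }
        return slot;
    }

    /** Unlinks a slot in O(1) by patching its neighbours' links. */
    void unlink(int slot) {
        if (prev[slot] != NIL) next[prev[slot]] = next[slot];
        if (next[slot] != NIL) prev[next[slot]] = prev[slot];
    }

    /** Link overhead in bytes: two 4-byte links per element. */
    static long linkOverheadBytes(long elements) {
        return elements * 2L * Integer.BYTES;
    }
}
```

With 4-byte data and two 4-byte links per element, linkOverheadBytes reproduces the arithmetic above: one million elements carry 8,000,000 bytes of link data.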

The bin-arrays data structure, shown in Figure 4 above, is a modification of the array-based linked-list approach. It reduces memory usage by reducing the overhead of the link data. In this scheme, the elements of a given list are stored in one of four arrays. The array chosen depends on the length of the list. A list of length 1 (for example, [6]) is stored in the 1-bins array; one of length 2 (for example, [4, 5]) is stored in the 2-bins array; one of length 3 (for example, [1, 2, 3]) is stored in the 3-bins array; and finally, one of length 4 or longer (for example, [7, 8, 9, 10, 11, 12, 13]) is stored in the threaded 4-bins array. Inserting into, or deleting from, a list may cause that list to move to a different array. For example, repeatedly appending to the list [6] will cause it to move successively to the 2-bins array, then the 3-bins array, and finally to the 4-bins array. Similarly, repeatedly deleting from the list [1, 2, 3] will cause it to move successively to the 2-bins array and then to the 1-bins array. Two bits of each entry in the index array are reserved for indicating which of these four arrays actually contains the elements of a given list.
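One way the two reserved bits could be packed into an index entry is sketched below. The article does not say which two bits are used or how the four arrays are numbered, so the low-order placement, the tag values, and all names here are assumptions for illustration only.

```java
// Sketch only: bit placement and tag values are assumptions, not from the article.
final class BinIndexEntry {
    static final int TAG_BITS = 2;
    static final int TAG_MASK = (1 << TAG_BITS) - 1;

    // Hypothetical tag values naming the four arrays.
    static final int ONE_BINS = 0, TWO_BINS = 1, THREE_BINS = 2, FOUR_BINS = 3;

    /** Packs an array tag and a position within that array into one int (position must fit in 30 bits). */
    static int encode(int tag, int position) {
        return (position << TAG_BITS) | tag;
    }

    static int tag(int entry) {
        return entry & TAG_MASK;
    }

    static int position(int entry) {
        return entry >>> TAG_BITS;
    }

    /** Chooses the array for a list of the given length (length >= 1). */
    static int tagForLength(int length) {
        return Math.min(length, 4) - 1;
    }
}
```

Under these assumptions, moving a list between arrays amounts to rewriting its index entry with a new tag and position.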

The 1-bins, 2-bins, and 3-bins arrays store user data contiguously in fixed-size blocks. Because each stored list fits entirely within a block, we can do without storing any link data. Hence, the memory usage in this scheme for a collection of lists that each contain fewer than 4 elements is near optimal.
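The fixed-size block layout can be sketched as follows. This is a minimal illustration with hypothetical names, assuming block b of a k-bins array occupies slots b*k through b*k + k - 1; the article does not spell out the addressing.

```java
import java.util.Arrays;

// Sketch only: names and addressing are assumptions made for illustration.
final class FixedBins {
    final int k;        // list length this array serves (1, 2, or 3)
    final int[] slots;  // block b occupies slots[b*k .. b*k + k - 1]

    FixedBins(int k, int blocks) {
        this.k = k;
        this.slots = new int[k * blocks];
    }

    /** Stores a list of exactly k elements in the given block; no link data is needed. */
    void store(int block, int... list) {
        if (list.length != k) {
            throw new IllegalArgumentException("list must have length " + k);
        }
        System.arraycopy(list, 0, slots, block * k, k);
    }

    /** Returns a copy of the list held in the given block. */
    int[] load(int block) {
        return Arrays.copyOfRange(slots, block * k, block * k + k);
    }
}
```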

The 4-bins array stores the lists that have 4 or more elements. The approach is similar to the array-based linked-list representation except that there are now four data elements per pair of forward and backward links. There are also a few other slight differences. The "previous" link of the first bin of a given list stores the index of the last bin of the same list. This makes it possible to find the last bin in constant time, in order to do appends, and to determine when the end of the chain of bins has been reached during traversals. Another difference is that the "next" link of the last bin gives the size of the list. In combination with the "previous" link of the first bin, this makes it possible to determine the length of any list in constant time. Finally, because each bin has room for four elements, and because not all of the slots may be occupied at a given moment, two bits in the "previous" link of each bin are reserved for recording the current occupancy (assuming that all data items are packed at the beginning of the bin).
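The link conventions just described can be sketched as below. The names, the low-order placement of the occupancy bits, and the encoding of occupancies 1 through 4 as the values 0 through 3 are assumptions; the article only states that two bits of the "previous" link record the occupancy.

```java
// Sketch only: bit layout and names are assumptions, not the article's code.
final class FourBinLinks {
    static final int OCC_BITS = 2;
    static final int OCC_MASK = (1 << OCC_BITS) - 1;

    final int[] prev;  // packed: (bin index << 2) | (occupancy - 1)
    final int[] next;  // index of the next bin, or the list size in the last bin

    FourBinLinks(int bins) {
        prev = new int[bins];
        next = new int[bins];
    }

    /** Packs a bin index together with an occupancy in the range 1..4. */
    static int pack(int binIndex, int occupancy) {
        return (binIndex << OCC_BITS) | (occupancy - 1);
    }

    int occupancy(int bin) {
        return (prev[bin] & OCC_MASK) + 1;
    }

    /** The bin the "previous" link points at; for the first bin this is the last bin. */
    int prevTarget(int bin) {
        return prev[bin] >>> OCC_BITS;
    }

    /** List length in O(1): follow the first bin's "previous" link, then read the last bin's "next". */
    int length(int firstBin) {
        return next[prevTarget(firstBin)];
    }
}
```

For the seven-element list [7, 8, 9, 10, 11, 12, 13], this sketch would use two bins: the first full, and the last holding three elements with its "next" link set to 7.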

There are two distinct algorithms for modifying the lists in the 4-bins array. One algorithm assumes that the lists are unordered, and the other that they are ordered. All lists in a given 4-bins array must be of the same type, either ordered or unordered.

The algorithm for unordered lists is the simpler of the two and always ensures that the 4-bins array is used as compactly as possible. The procedures for inserting and deleting an element are as follows: (i) to insert an element, simply append it to the end of the list, and (ii) to delete an element, overwrite the slot of the element with the last element in the list. See Figure 5 below.
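The two procedures can be illustrated on a deliberately simplified model: a single list held in one flat array, ignoring bins and links. The class and method names are hypothetical. The array-doubling step stands in for linking a fresh bin, which the real structure would do instead.

```java
import java.util.Arrays;

// Sketch only: a flat-array stand-in for one unordered list, not the bin layout itself.
final class UnorderedListSketch {
    int[] slots = new int[4];
    int size = 0;

    /** (i) Insert: append to the end of the list. */
    void insert(int value) {
        if (size == slots.length) {
            // Simplification: a bin-array would link a new bin here rather than copy.
            slots = Arrays.copyOf(slots, size * 2);
        }
        slots[size++] = value;
    }

    /** (ii) Delete: overwrite the slot with the last element, then shrink by one. */
    void deleteAt(int i) {
        slots[i] = slots[size - 1];
        size--;
    }
}
```

Because the last element fills the hole, the occupied slots stay packed with no gaps, at the cost of not preserving element order.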

First, consider the insertion algorithm when inserting an element into bin i. (Here u(j) denotes the number of occupied slots in bin j, and k is the capacity of a bin.) By hypothesis, the invariants hold, and so u(i-1) + u(i) > k and u(i) + u(i+1) > k. It is easy to see that these relations will continue to hold if any of the u(j) values is increased by one, which is what would happen if an element were inserted into bin i-1, i, or i+1. So suppose that none of the three adjacent bins has room. Then u(i-1) = u(i) = u(i+1) = k, and bin i will be split into two bins, i' and i'', with the elements distributed in some manner between the two. The result will be that u(i') > 0, u(i'') > 0, and u(i') + u(i'') = u(i) + 1 = k + 1. This implies that u(i-1) + u(i') = k + u(i') > k and u(i'') + u(i+1) = u(i'') + k > k. Hence the invariants will be maintained.
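The insertion case analysis can be traced on occupancy counts alone. The sketch below is an assumption-laden model, not the article's algorithm: it tracks only u(j) for each bin in a chain, tries bin i and then its neighbours in an arbitrary order, and splits a full bin roughly in half, whereas the article only says the elements are distributed "in some manner".

```java
import java.util.List;

// Sketch only: models bins by their occupancy counts u(j); names are hypothetical.
final class OrderedInsertSketch {
    /** Adds one element at bin i of a chain whose bins each hold at most k elements. */
    static void insert(List<Integer> u, int i, int k) {
        // Use bin i, or an adjacent bin, if any of them has room.
        for (int j : new int[] {i, i - 1, i + 1}) {
            if (j >= 0 && j < u.size() && u.get(j) < k) {
                u.set(j, u.get(j) + 1);
                return;
            }
        }
        // All candidate bins are full: split bin i so the halves hold k + 1 in total.
        int total = k + 1;
        u.set(i, total / 2);
        u.add(i + 1, total - total / 2);
    }
}
```

With k = 4, inserting into the middle of a chain with occupancies [4, 4, 4] gives [4, 2, 3, 4] under this split rule, and every adjacent pair still sums to more than k.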

Next, consider the deletion algorithm when deleting an element from bin i. First, consider the case where bin i has only one element, so that u(i) = 1. By hypothesis, the invariants hold, and so u(i-1) + u(i) > k and u(i) + u(i+1) > k. That is, u(i-1) > k - 1 and u(i+1) > k - 1. Since no bin holds more than k elements, this means that u(i-1) = u(i+1) = k, and hence u(i-1) + u(i+1) > k. Therefore the invariants will hold after bin i is unlinked from the chain. Next, consider the case where bin i has more than one element. Let i' be bin i after the element has been deleted, so that u(i') = u(i) - 1 > 0. If u(i-1) + u(i') > k and u(i') + u(i+1) > k, then we are done. So suppose that u(i-1) + u(i') <= k. Then the deletion algorithm will combine bins i-1 and i' into bin i'' (see Figure 7 below). In the resulting chain, bins i-2 and i'' are adjacent, and bins i'' and i+1 are adjacent. We need to show that the second invariant holds for each pair. First, note that u(i'') > u(i-1) and u(i'') > u(i'), since u(i'') = u(i-1) + u(i') and both bins are nonempty. Now consider the first pair. By hypothesis, u(i-2) + u(i-1) > k. Therefore, u(i-2) + u(i'') > u(i-2) + u(i-1) > k. Now consider the second pair. Since u(i'') > u(i'), we have u(i'') >= u(i') + 1 = u(i), and so u(i'') + u(i+1) >= u(i) + u(i+1) > k. Therefore the invariant holds for each pair. The other possibility when bin i has more than one element (namely, u(i') + u(i+1) <= k) is similar. Hence, in all cases, the invariants are maintained after deletion.
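The deletion cases can be checked mechanically on occupancy counts. The following sketch is a model under stated assumptions rather than the article's implementation: bins are represented only by u(j), the checker tests the adjacency condition used in the proof plus the assumption that every bin holds between 1 and k elements, and all names are hypothetical.

```java
import java.util.List;

// Sketch only: models a chain of bins by occupancy counts; names are hypothetical.
final class OrderedDeleteSketch {
    /** True if every bin holds 1..k elements and each adjacent pair sums to more than k. */
    static boolean invariantsHold(List<Integer> u, int k) {
        for (int i = 0; i < u.size(); i++) {
            if (u.get(i) < 1 || u.get(i) > k) return false;
            if (i + 1 < u.size() && u.get(i) + u.get(i + 1) <= k) return false;
        }
        return true;
    }

    /** Deletes one element from bin i, unlinking or merging bins as in the proof's cases. */
    static void delete(List<Integer> u, int i, int k) {
        if (u.get(i) == 1) {           // case 1: the bin becomes empty and is unlinked
            u.remove(i);
            return;
        }
        u.set(i, u.get(i) - 1);        // case 2: bin i becomes i'
        if (i > 0 && u.get(i - 1) + u.get(i) <= k) {
            u.set(i - 1, u.get(i - 1) + u.get(i));   // combine i-1 and i' into i''
            u.remove(i);
        } else if (i + 1 < u.size() && u.get(i) + u.get(i + 1) <= k) {
            u.set(i, u.get(i) + u.get(i + 1));       // combine i' and i+1
            u.remove(i + 1);
        }
    }
}
```

For example, with k = 4 and occupancies [4, 2, 3], deleting from the middle bin leaves counts 4, 1, 3; the right-hand pair now sums to exactly k, so those two bins are merged and the chain becomes [4, 4].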
