Introduction to Algorithms (CLRS) states that a hash table using doubly linked lists is able to delete items more quickly than one with singly linked lists. Can anybody tell me what is the advantage of using doubly linked lists instead of single linked list for deletion in Hashtable implementation?
Hashtable with doubly linked lists
algorithmhashtable
Related Solutions
There are several differences between HashMap
and Hashtable
in Java:
Hashtable
is synchronized, whereasHashMap
is not. This makesHashMap
better for non-threaded applications, as unsynchronized Objects typically perform better than synchronized ones.Hashtable
does not allownull
keys or values.HashMap
allows onenull
key and any number ofnull
values.One of HashMap's subclasses is
LinkedHashMap
, so in the event that you'd want predictable iteration order (which is insertion order by default), you could easily swap out theHashMap
for aLinkedHashMap
. This wouldn't be as easy if you were usingHashtable
.
Since synchronization is not an issue for you, I'd recommend HashMap
. If synchronization becomes an issue, you may also look at ConcurrentHashMap
.
The common approach to get a unique collection of items is to use a set
. Sets are unordered collections of distinct objects. To create a set from any iterable, you can simply pass it to the built-in set()
function. If you later need a real list again, you can similarly pass the set to the list()
function.
The following example should cover whatever you are trying to do:
>>> t = [1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> t
[1, 2, 3, 1, 2, 5, 6, 7, 8]
>>> list(set(t))
[1, 2, 3, 5, 6, 7, 8]
>>> s = [1, 2, 3]
>>> list(set(t) - set(s))
[8, 5, 6, 7]
As you can see from the example result, the original order is not maintained. As mentioned above, sets themselves are unordered collections, so the order is lost. When converting a set back to a list, an arbitrary order is created.
Maintaining order
If order is important to you, then you will have to use a different mechanism. A very common solution for this is to rely on OrderedDict
to keep the order of keys during insertion:
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]
Starting with Python 3.7, the built-in dictionary is guaranteed to maintain the insertion order as well, so you can also use that directly if you are on Python 3.7 or later (or CPython 3.6):
>>> list(dict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]
Note that this may have some overhead of creating a dictionary first, and then creating a list from it. If you don’t actually need to preserve the order, you’re often better off using a set, especially because it gives you a lot more operations to work with. Check out this question for more details and alternative ways to preserve the order when removing duplicates.
Finally note that both the set
as well as the OrderedDict
/dict
solutions require your items to be hashable. This usually means that they have to be immutable. If you have to deal with items that are not hashable (e.g. list objects), then you will have to use a slow approach in which you will basically have to compare every item with every other item in a nested loop.
Best Solution
The confusion here is due to the notation in CLRS. To be consistent with the true question, I use the CLRS notation in this answer.
We use the hash table to store key-value pairs. The value portion is not mentioned in the CLRS pseudocode, while the key portion is defined as
k
.In my copy of CLR (I am working off of the first edition here), the routines listed for hashes with chaining are insert, search, and delete (with more verbose names in the book). The insert and delete routines take argument
x
, which is the linked list element associated with keykey[x]
. The search routine takes argumentk
, which is the key portion of a key-value pair. I believe the confusion is that you have interpreted the delete routine as taking a key, rather than a linked list element.Since
x
is a linked list element, having it alone is sufficient to do an O(1) deletion from the linked list in theh(key[x])
slot of the hash table, if it is a doubly-linked list. If, however, it is a singly-linked list, havingx
is not sufficient. In that case, you need to start at the head of the linked list in sloth(key[x])
of the table and traverse the list until you finally hitx
to get its predecessor. Only when you have the predecessor ofx
can the deletion be done, which is why the book states the singly-linked case leads to the same running times for search and delete.Additional Discussion
Although CLRS says that you can do the deletion in O(1) time, assuming a doubly-linked list, it also requires you have
x
when calling delete. The point is this: they defined the search routine to return an elementx
. That search is not constant time for an arbitrary keyk
. Once you getx
from the search routine, you avoid incurring the cost of another search in the call to delete when using doubly-linked lists.The pseudocode routines are lower level than you would use if presenting a hash table interface to a user. For instance, a delete routine that takes a key
k
as an argument is missing. If that delete is exposed to the user, you would probably just stick to singly-linked lists and have a special version of search to find thex
associated withk
and its predecessor element all at once.