Hashing Table and Binary Tree
Hashing Table
In computing, a hash table (hash
map) is a data structure that implements an associative
array abstract data type, a structure that can
map keys to values. A hash table uses a hash
function to compute an index, also called a hash code,
into an array of buckets or slots, from which the
desired value can be found.
Ideally, the hash function will assign each key to a
unique bucket, but most hash table designs employ an imperfect hash function,
which might cause hash collisions where the hash function
generates the same index for more than one key. Such collisions are always
accommodated in some way.
Hashing is a
technique that is used to uniquely identify a specific object from a group of
similar objects. Some examples of how hashing is used in our lives include:
- In
universities, each student is assigned a unique roll number that can be
used to retrieve information about them.
- In
libraries, each book is assigned a unique number that can be used to
determine information about the book, such as its exact position in the
library or the users it has been issued to etc.
In both
these examples the students and books were hashed to a unique number.
Assume that
you have an object and you want to assign a key to it to make searching easy.
To store the key/value pair, you can use a simple array like a data structure
where keys (integers) can be used directly as an index to store values.
However, in cases where the keys are large and cannot be used directly as an
index, you should use hashing.
In hashing,
large keys are converted into small keys by using hash functions.
The values are then stored in a data structure called hash table.
The idea of hashing is to distribute entries (key/value pairs) uniformly across
an array. Each element is assigned a key (converted key). By using that key you
can access the element in O(1) time. Using the key, the
algorithm (hash function) computes an index that suggests where an entry can be
found or inserted.
Hashing is
implemented in two steps:
1. An element is converted into an
integer by using a hash function. This element can be used as an index to store
the original element, which falls into the hash table.
2. The element is stored in the hash
table where it can be quickly retrieved using hashed key.
hash = hashfunc(key)
index = hash % array_size
In this
method, the hash is independent of the array size and it is then reduced to an
index (a number between 0 and array_size − 1) by using the modulo operator (%).
Hash function
A hash
function is any function that can be used to map a data set of an arbitrary
size to a data set of a fixed size, which falls into the hash table. The values
returned by a hash function are called hash values, hash codes, hash sums, or
simply hashes.
To achieve a
good hashing mechanism, It is important to have a good hash function with the
following basic requirements:
1. Easy to compute: It should be easy
to compute and must not become an algorithm in itself.
2. Uniform distribution: It should
provide a uniform distribution across the hash table and should not result in
clustering.
3. Less collisions: Collisions occur
when pairs of elements are mapped to the same hash value. These should be
avoided.
Note: Irrespective of how good a hash
function is, collisions are bound to occur. Therefore, to maintain the
performance of a hash table, it is important to manage collisions through various
collision resolution techniques.
Need for a good hash function
Let us
understand the need for a good hash function. Assume that you have to store
strings in the hash table by using the hashing technique {“abcdef”, “bcdefa”,
“cdefab” , “defabc” }.
To compute
the index for storing the strings, use a hash function that states the
following:
The index
for a specific string will be equal to the sum of the ASCII values of the
characters modulo 599.
As 599 is a
prime number, it will reduce the possibility of indexing different strings
(collisions). It is recommended that you use prime numbers in case of modulo.
The ASCII values of a, b, c, d, e, and f are 97, 98, 99, 100, 101, and 102
respectively. Since all the strings contain the same characters with different
permutations, the sum will 599.
The hash
function will compute the same index for all the strings and the strings will
be stored in the hash table in the following format. As the index of all the
strings is the same, you can create a list on that index and insert all the
strings in that list.
Hash table
A hash table is a data structure that
is used to store keys/value pairs. It uses a hash function to compute an index
into an array in which an element will be inserted or searched. By using a good
hash function, hashing can work well. Under reasonable assumptions, the average
time required to search for an element in a hash table is O(1).
What is
Collision?
Since a hash
function gets us a small number for a key which is a big integer or string,
there is a possibility that two keys result in the same value. The situation
where a newly inserted key maps to an already occupied slot in the hash table
is called collision and must be handled using some collision handling
technique.
What
are the chances of collisions with large table?
Collisions are
very likely even if we have big table to store keys. An important observation
is Birthday Paradox. With only 23 persons, the probability that two people
have the same birthday is 50%.
How to
handle Collisions?
There are mainly two methods to handle collision:
1) Separate Chaining
2) Open Addressing
In this article,
only separate chaining is discussed. We will be discussing Open addressing in
the next post.
Separate
Chaining:
The idea is to make
each cell of hash table point to a linked list of records that have same hash
function value.
Let us consider a
simple hash function as “key mod 7” and
sequence of keys as 50, 700, 76, 85, 92, 73, 101.
Advantages:
1) Simple to implement.
2) Hash table never fills up, we can always add more elements to the chain.
3) Less sensitive to the hash function or load factors.
4) It is mostly
used when it is unknown how many and how frequently keys may be inserted or
deleted.
Disadvantages:
1) Cache performance of chaining is not good as keys are stored using a
linked list. Open addressing provides better cache performance as everything is
stored in the same table.
2) Wastage of Space (Some Parts of hash table are never used)
3) If the chain becomes long, then search time can become O(n) in the worst
case.
4) Uses extra
space for links.
Performance
of Chaining:
Performance of
hashing can be evaluated under the assumption that each key is equally likely
to be hashed to any slot of table (simple uniform hashing).
Binary Tree
A tree whose elements have at most 2
children is called a binary tree. Since each element in a binary tree can have
only 2 children, we typically name them the left and right child.
A Binary Tree node contains following
parts.
1.
Data
2.
Pointer to left child
3.
Pointer to right child
Main
applications of trees include:
1. Manipulate hierarchical data.
2. Make information easy to search (see tree traversal).
3. Manipulate
sorted lists of data.
4. As
a workflow for compositing digital images for visual effects.
5. Router
algorithms
6. Form of a
multi-stage decision-making (see business chess).
Binary Tree: A tree whose elements have at most 2 children is called a binary
tree. Since each element in a binary tree can have only 2 children, we
typically name them the left and right child.
Tree is a
hierarchical data structure. Main uses of trees include maintaining
hierarchical data, providing moderate access and insert/delete operations.
Binary trees are special cases of tree where every node has at most two
children.
Types of Binary Tree
Full Binary Tree A Binary Tree is full if every node has 0 or 2 children. Following
are examples of a full binary tree. We can also say a full binary tree is a
binary tree in which all nodes except leaves have two children.
Complete Binary Tree: A Binary Tree is complete Binary Tree if all levels are completely
filled except possibly the last level and the last level has all keys as left
as possible
Perfect Binary Tree A Binary tree is Perfect Binary Tree in which
all internal nodes have two children and all leaves are at the same level.
Balanced Binary Tree
A binary tree is balanced if the
height of the tree is O(Log n) where n is the number of nodes. For Example, AVL
tree maintains O(Log n) height by making sure that the difference between
heights of left and right subtrees is atmost 1. Red-Black trees maintain O(Log
n) height by making sure that the number of Black nodes on every root to leaf
paths are same and there are no adjacent red nodes. Balanced Binary Search
trees are performance wise good as they provide O(log n) time for search,
insert and delete.
Expression Tree
Expression tree is a binary tree in
which each internal node corresponds to operator and each leaf node corresponds
to operand so for example expression tree for 3 + ((5+9)*2) would be:
Inorder traversal of expression tree
produces infix version of given postfix expression (same with preorder traversal
it gives prefix expression)
Construction of Expression Tree:
Now For constructing expression tree
we use a stack. We loop through input expression and do following for every
character.
1) If character is operand push that
into stack
2) If character is operator pop two
values from stack make them its child and push current node again.
At the end only element of stack will
be root of expression tree.
Threaded Binary Tree Concept
Inorder traversal
of a Binary tree can either be done using recursion or with the use
of a auxiliary stack. The idea of threaded binary trees is to make inorder
traversal faster and do it without stack and without recursion. A binary tree is
made threaded by making all right child pointers that would normally be NULL
point to the inorder successor of the node (if it exists).
There are two types of threaded binary trees.
Single Threaded: Where a NULL right pointers is made to point to the inorder successor (if
successor exists)
Double Threaded: Where both left and right NULL pointers are made to point to inorder
predecessor and inorder successor respectively. The predecessor threads are
useful for reverse inorder traversal and postorder traversal.
The threads are
also useful for fast accessing ancestors of a node.
Following diagram
shows an example Single Threaded Binary Tree. The dotted lines represent
threads.
Since right
pointer is used for two purposes, the boolean variable rightThread is used to
indicate whether right pointer points to right child or inorder successor.
Similarly, we can add leftThread for a double threaded binary tree.
Inorder
Taversal using Threads
Following is C
code for inorder traversal in a threaded binary tree.
Source :
https://www.hackerearth.com/practice/data-structures/hash-tables/basics-of-hash-tables/tutorial/
https://en.wikipedia.org/wiki/Hash_table
https://binusmaya.binus.ac.id
https://www.geeksforgeeks.org/binary-tree-data-structure/
https://www.geeksforgeeks.org/expression-tree/
https://www.geeksforgeeks.org/threaded-binary-tree/
Comments
Post a Comment