As databases increased in size and became more
complex new techniques were developed to cater for this change. Two of those
techniques are indexed and hashed file structure.
“An index for a file contains a list of the keys
stored in the file along with entries indicating where the record containing
each key is stored.” (Brookshear, 2006). Searching for a record requires
searching the index file, which is usually located in the same storage media as
the original record located, after moving it to the main memory. Searching the
index file instead of searching the database for the record itself is faster
because the index file is smaller in size than the entire database.
On the other hand, “hashing allows a record to
be located by means of a key value. But, rather than looking up the key in an
index, hashing identifies the location of the record directly from the key”
(Brookshear, 2006). In hashing the data is stored in “buckets”, each bucket can
hold more than one record, placing data into buckets are done using a hash
function “Algorithm that coverts key values into bucket numbers” (Brookshear,
2006). Hashing can be applied to files in mass storage and main memory and the
result will be hash file and hash table.
The index in indexed file structure can grow up
in size and consumes disk space; this depends on the table size and “the size
and number of columns in the index“ IBM as well each transaction on the table
requires and update to the index file and this will lead into performance
degradation.
On the other hand, hashed file structure
although doesn’t consume as much space as he index file, but incase of
insufficient buckets or multiple records occupy the same bucket, bucket
collision will occur, which will lead to overflow that cannot be eliminated but
can be reduced using “overflow buckets”.
Hashing is used is cache implementation, to
increase data access speed when stored in slower media, another usage would be in
data authentication using cryptographic hash function, hashed file structure
“is especially useful when the number
of values to
be stored is
large and the
keys are some objects other than small integers that
cannot be used directly as index like strings.” (Yi, Y. 2008)
Index is prefered when using hash fuction will
cause lots of collions, and when frequency of incersion and deletion is low.
References:
Brookshear J. (2006), Computer Science An
Overview, edition 9, Boston: Pearson Education Inc.
IBM, (2006), ‘Advantages and disadvantages of
indexes’ DB2 Information Center Home [Online],
Available from: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0005052.htm
(Accessed on 13 February 2011)
Russell, J. et al. (1999), ‘Selecting an Index
Strategy’ Application Developer's Guide – Fundamentals [Online], Available from: http://www.mscd.edu/~ittsdba/oradoc817/appdev.817/a76939/adg06idx.htm
(Accessed
on 13 February 2011)
Wikipedia, (2011) ‘Hash Table’. [Online].
Avialable from:http://en.wikipedia.org/wiki/Hash_table
(Accessed
on 13 February 2011)
Yi, Y. (2008) ‘Hash Table – Problem context’.
[Online], Available from:
http://parlab.eecs.berkeley.edu/wiki/_media/patterns/paraplop_g2_4.pdf
(Accessed
on 13 February 2011)
No comments:
Post a Comment