MSc Information Systems Management: Random vs Indexed

As databases increased in size and became more complex new techniques were developed to cater for this change. Two of those techniques are indexed and hashed file structure.

“An index for a file contains a list of the keys stored in the file along with entries indicating where the record containing each key is stored.” (Brookshear, 2006). Searching for a record requires searching the index file, which is usually located in the same storage media as the original record located, after moving it to the main memory. Searching the index file instead of searching the database for the record itself is faster because the index file is smaller in size than the entire database.

On the other hand, “hashing allows a record to be located by means of a key value. But, rather than looking up the key in an index, hashing identifies the location of the record directly from the key” (Brookshear, 2006). In hashing the data is stored in “buckets”, each bucket can hold more than one record, placing data into buckets are done using a hash function “Algorithm that coverts key values into bucket numbers” (Brookshear, 2006). Hashing can be applied to files in mass storage and main memory and the result will be hash file and hash table.

The index in indexed file structure can grow up in size and consumes disk space; this depends on the table size and “the size and number of columns in the index“ IBM as well each transaction on the table requires and update to the index file and this will lead into performance degradation.

On the other hand, hashed file structure although doesn’t consume as much space as he index file, but incase of insufficient buckets or multiple records occupy the same bucket, bucket collision will occur, which will lead to overflow that cannot be eliminated but can be reduced using “overflow buckets”.

Hashing is used is cache implementation, to increase data access speed when stored in slower media, another usage would be in data authentication using cryptographic hash function, hashed file structure “is especially useful when the number of values to be stored is large and the keys are some objects other than small integers that cannot be used directly as index like strings.” (Yi, Y. 2008)

Index is prefered when using hash fuction will cause lots of collions, and when frequency of incersion and deletion is low.

References:

Brookshear J. (2006), Computer Science An Overview, edition 9, Boston: Pearson Education Inc.

IBM, (2006), ‘Advantages and disadvantages of indexes’ DB2 Information Center Home [Online], Available from: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0005052.htm

(Accessed on 13 February 2011)

Russell, J. et al. (1999), ‘Selecting an Index Strategy’ Application Developer's Guide – Fundamentals [Online], Available from: http://www.mscd.edu/~ittsdba/oradoc817/appdev.817/a76939/adg06idx.htm (Accessed on 13 February 2011)

Wikipedia, (2011) ‘Hash Table’. [Online]. Avialable from:http://en.wikipedia.org/wiki/Hash_table (Accessed on 13 February 2011)

Yi, Y. (2008) ‘Hash Table – Problem context’. [Online], Available from:

http://parlab.eecs.berkeley.edu/wiki/_media/patterns/paraplop_g2_4.pdf (Accessed on 13 February 2011)

MSc Information Systems Management

Thursday, April 5, 2012

Random vs Indexed

No comments:

Post a Comment

Contact Form