GUID Vs Int data type as primary key

Recently one of my friend ask me when I should go for GUID and When I should go for Int as primary key in table. So decided to write a blog post for it. Here are advantages and disadvantage of the GUID and INT.

INT Data Type:

Advantages:

  1. Its required small space in terms of the storage it will only allocates 4 bytes to store data.
  2. Insert and update performance will be faster then the GUID. It will increase the performance of the application.
  3. Easy to index and Join will give best performance with the integer.
  4. Easy to understand and remember
  5. Support of function that will give last value generated like Scope_Indentity()

Disadvantages:

  1. If you are going to merge table frequently then there may be a chance to duplicated primary key.
  2. Limited range of uniqueness if you are going to store lots of data then it may be chance to run out of storage for INT data type.
  3. Hard to work with distributed tables.

GUID Data Type:

Advantages:

  1. It is unique for the current domains. For primary key is uniquely identifies the table.
  2. Less chances of for duplication.
  3. Suitable for inserting and updating large amount of data.
  4. Easy for merging data across servers.

Disadvantages:

  1. Bigger storage size (16bytes) will occupy more disk size then integer.
  2. Hard to remember and lower performance with Join then integer.
  3. Don’t have function to get last uniquely generated primary key.
  4. A GUID primary Key will added to all the other indexes on tables. So it will decrease the performance.

Conclusion:

From above the advantages and disadvantages we can conclude that if you are having very large amount of data in table then go for the GUID as primary key in database. Otherwise INT will give best performance. Hope this will help you. Please post your opinion as comment.

Shout it
kick it on DotNetKicks.com

3 comments:

  1. You missed an advantage of GUIDs which can partially nullify disadvantage #3:

    You can create a GUID in code and use it before you even hit the database. Then you don't need a function like Scope_Identity().

    ReplyDelete
  2. You can use BIGINT to decrease chances of duplication.

    ReplyDelete
  3. All relational database engines are optimized to work with indexes on columns whose data value increments after insertions, i.e., integer data types, - this is known as a natural progression.

    Relational database engines are NOT optimized to work with indexes on columns whose data value is randomized after insertions, i.e., GUIDs - this is known as an un-natural progression.

    If you are dealing with extremely small data sets, and you don't care if you are perceived as not knowning how to choose a suitable data type, then use GUID.

    But, if you are dealing with large data sets (particularly when getting into GBs and TBs) you will never be able to scale the database if you are use GUIDs as a data type for index columns or primary key columns.

    As regards the listed advantages of GUIDs vs integers, well, not a single one of them is valid. Whenever loading a record sourced from another database always deprecated the primary key and generate a new integer primary key value for that record. As regards duplication, if you have viable algorithms in place there is no chance of duplication (this is just an common excuse for bad programming). As regards loading large data set, GUID offer no benefit, but in fact will seriously fragment the data files. Integer data types are perfectly suited for loading small or extremely large data set, that is when you know how to write decent code.

    ReplyDelete

Your feedback are very important to me. Please provide your feedback via putting comments.