Friday, 26 February 2010

Quindecillion

I am currently using a SHA-1 hash function to build unique indexes over large text columns.
The NVARCHAR(1024) columns I am hashing can take as much as 2048 bytes each to store before compression (as they’re Unicode, at 2 bytes per character).
The idea is to reduce storage and to make large text columns indexable by hashing them with HASHBYTES. A SHA-1 hash fits in a VARBINARY(20) column, and 20 bytes is a lot friendlier than 2048.
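
To make the size difference concrete, here is a minimal Python sketch (not T-SQL, and not the code actually running on the server). It assumes the column text is encoded as UTF-16LE, which is how SQL Server stores NVARCHAR data; the sample value `text` is made up purely for illustration.

```python
import hashlib

# Hypothetical sample value: a completely full NVARCHAR(1024) column.
text = "x" * 1024

# Assumption: NVARCHAR data is UTF-16LE, so each of these characters is 2 bytes.
raw = text.encode("utf-16-le")

# SHA-1 always produces a 160-bit (20-byte) digest, whatever the input length.
digest = hashlib.sha1(raw).digest()

print(len(raw))     # bytes needed for the raw text
print(len(digest))  # bytes needed for the hash
```

The digest length stays at 20 bytes for any input, which is why a fixed VARBINARY(20) column is enough.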

Links:
1) SHA Hash functions
2) MSSQLTips.com : Unique constraints for large text columns (using hashbytes)

To quote that article,

The odds of a duplicate hash value being generated are 1 in 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
1 in 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 is ‘one in a quindecillion’.
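
As a rough sanity check on that figure, the arithmetic can be sketched in Python. This assumes the quoted odds come from the number of possible 160-bit SHA-1 digests (the quote itself does not say so) and uses the short-scale quindecillion, 10^48.

```python
# Assumption: the quoted odds are based on the count of distinct SHA-1 digests.
sha1_outputs = 2 ** 160       # a SHA-1 digest is 160 bits long
quindecillion = 10 ** 48      # short-scale quindecillion: 1 followed by 48 zeros

ratio = sha1_outputs / quindecillion

print(sha1_outputs)           # the exact count, a 49-digit integer
print(round(ratio, 2))        # how many quindecillions that is
```

Under that assumption the exact count comes out at a little under one and a half quindecillion, so "one in a quindecillion" is the right order of magnitude for any one given pair of values.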

Here’s how I found out this amazing fact...
Really Big Numbers

(you can all go back to sleep now...)
