I needed a 32bit hash generation function and there appeared to be no such obvious hash in ruby. With a bit of help I found murmurhash 2.0 which appeared to fit the job. Below is the code for a pure ruby version of the endian-neutral version:
module Digest
def self.murmur_hash2( string, seed )
# seed _must_ be an integer, but I do try to enforce that.
# m and r are mixing constants generated offline.
# They are not really magic, they just happen to work well.
raise "seed isn't an integer, and I can't convert it either." unless
seed.is_a?( Integer ) or seed.respond_to?( 'to_i' )
seed = seed.to_i unless seed.is_a?( Integer )
m = 0x5bd1e995
r = 24
len = string.length
h = ( seed ^ len )
while len >= 4
string.scan( /..../ ) do |data|
k = data[0]
k |= data[1] << 8
k |= data[2] << 16
k |= data[3] << 24
k = ( k * m ) % 0x100000000
k ^= k >> r
k = ( k * m ) % 0x100000000
h = ( h * m ) % 0x100000000
h ^= k
len -= 4
end
end
if len == 3 then
h ^= string[-1] << 16
h ^= string[-2] << 8
h ^= string[-3]
end
if len == 2 then
h ^= string[-1] << 8
h ^= string[-2]
end
if len == 1 then
h ^= string[-1]
end
h = ( h * m ) % 0x100000000
h ^= h >> 13
h = ( h * m ) % 0x100000000
h ^= h >> 15
return h
end
end
To use it copy the above into a separate .rb file (say murmurhash2.rb) and:
require 'murmerhash2.rb'
string = "the string to be hashed"
seed = an integer
result = Digest::murmur_hash2( string, seed )
I note that MurmurHash 3.0 is currently in beta so I shall have a go at coding that up once it becomes stable. I also would quite like to get this into a gem but before I do if anyone has any comments on the above code please let me know!