Toblog

1. v. The act of writing a weblog or 2. n. Toby’s weblog.

Pure ruby version of MurmurHash 2.0

I needed a 32bit hash generation function and there appeared to be no such obvious hash in ruby. With a bit of help I found murmurhash 2.0 which appeared to fit the job. Below is the code for a pure ruby version of the endian-neutral version:

module Digest
  def self.murmur_hash2( string, seed )
    # seed _must_ be an integer, but I do try to enforce that.
    # m and r are mixing constants generated offline.
    # They are not really magic, they just happen to work well.

    raise "seed isn't an integer, and I can't convert it either." unless 
      seed.is_a?( Integer ) or seed.respond_to?( 'to_i' )

    seed = seed.to_i unless seed.is_a?( Integer )

    m = 0x5bd1e995
    r = 24
    len = string.length

    h = ( seed ^ len )

    while len >= 4
      string.scan( /..../ ) do |data|
        k = data[0]
        k |= data[1] << 8
        k |= data[2] << 16
        k |= data[3] << 24

        k = ( k * m ) % 0x100000000
        k ^= k >> r
        k = ( k * m ) % 0x100000000

        h = ( h * m ) % 0x100000000
        h ^= k

        len -= 4
      end
    end

    if len == 3 then
      h ^= string[-1] << 16
      h ^= string[-2] << 8
      h ^= string[-3]
    end
    if len == 2 then
      h ^= string[-1] << 8
      h ^= string[-2]
    end
    if len == 1 then
      h ^= string[-1]
    end

    h = ( h * m ) % 0x100000000
    h ^= h >> 13
    h = ( h * m ) % 0x100000000
    h ^= h >> 15

    return h
  end
end

To use it copy the above into a separate .rb file (say murmurhash2.rb) and:

require 'murmerhash2.rb'

string = "the string to be hashed"
seed = an integer
result = Digest::murmur_hash2( string, seed )

I note that MurmurHash 3.0 is currently in beta so I shall have a go at coding that up once it becomes stable. I also would quite like to get this into a gem but before I do if anyone has any comments on the above code please let me know!

Published on 2010/12/14 at 18:18 by Toby, tags , , , ,

If you liked this article you can add me to Twitter

comment Pure ruby version of MurmurHash 2.0

Powered by Publify – Thème Frédéric de Villamil | Photo Glenn