Toblog

1. v. The act of writing a weblog or 2. n. Toby’s weblog.

Pure ruby version of MurmurHash 2.0

I needed a 32-bit hash function and there was no obvious one in ruby. With a bit of help I found MurmurHash 2.0, which appeared to fit the job. Below is the code for a pure ruby version of the endian-neutral variant:

module Digest
  def self.murmur_hash2( string, seed )
    # seed _must_ be an integer; anything else is converted with to_i
    # where possible.
    # m and r are mixing constants generated offline.
    # They are not really magic, they just happen to work well.

    raise "seed isn't an integer, and I can't convert it either." unless
      seed.is_a?( Integer ) or seed.respond_to?( 'to_i' )

    seed = seed.to_i unless seed.is_a?( Integer )

    m = 0x5bd1e995
    r = 24

    # Work on raw bytes: String#[] returns an integer on ruby 1.8 but a
    # one-character string on 1.9 and later, and /..../ does not match
    # across newlines.
    bytes = string.unpack( 'C*' )
    len = bytes.length

    # Ruby integers don't overflow, so "% 0x100000000" keeps values 32-bit.
    h = ( seed ^ len ) % 0x100000000

    i = 0
    while len >= 4
      # Assemble four bytes little-endian (the endian-neutral read).
      k = bytes[i]
      k |= bytes[i + 1] << 8
      k |= bytes[i + 2] << 16
      k |= bytes[i + 3] << 24

      k = ( k * m ) % 0x100000000
      k ^= k >> r
      k = ( k * m ) % 0x100000000

      h = ( h * m ) % 0x100000000
      h ^= k

      i += 4
      len -= 4
    end

    # Mix in the last 1-3 bytes. The reference C code only multiplies
    # by m here when there actually is a tail.
    if len > 0 then
      h ^= bytes[i + 2] << 16 if len == 3
      h ^= bytes[i + 1] << 8 if len >= 2
      h ^= bytes[i]
      h = ( h * m ) % 0x100000000
    end

    h ^= h >> 13
    h = ( h * m ) % 0x100000000
    h ^= h >> 15

    return h
  end
end

To use it copy the above into a separate .rb file (say murmurhash2.rb) and:

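# The leading ./ is needed on ruby 1.9.2 and later, where the current
# directory is no longer on the load path.
require './murmurhash2.rb'

string = "the string to be hashed"
seed = 42   # any integer
result = Digest::murmur_hash2( string, seed )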
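
Two details in the listing above are easy to miss, so here is a small standalone sketch of them. It is only an illustration: the helper name mul32 and the constant MASK32 are made up for this example and are not part of the hash code. It shows that taking a product modulo 0x100000000 gives the same result as masking to the low 32 bits (which is what a C uint32_t multiply does), and that assembling four bytes with shifts is the same as a little-endian unpack.

```ruby
MASK32 = 0xffffffff   # hypothetical name, for illustration only

# Multiply as a C uint32_t would: keep only the low 32 bits.
def mul32( a, b )
  ( a * b ) & MASK32
end

m = 0x5bd1e995
big = 0xffffffff * m               # a plain ruby product, wider than 32 bits

wrapped_mod  = big % 0x100000000   # the form used in the listing
wrapped_mask = mul32( 0xffffffff, m )

# Little-endian assembly of four bytes, by hand and via unpack( 'V' ).
bytes  = "abcd".unpack( 'C*' )
manual = bytes[0] | ( bytes[1] << 8 ) | ( bytes[2] << 16 ) | ( bytes[3] << 24 )
packed = "abcd".unpack( 'V' ).first

puts wrapped_mod == wrapped_mask   # the two wrapping forms agree
puts manual == packed              # so do the two ways of reading 4 bytes
```

Either form of wrapping works; the modulo version simply reads closer to "this is 32-bit arithmetic" when compared against the C source.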

I note that MurmurHash 3.0 is currently in beta, so I shall have a go at coding that up once it becomes stable. I would also quite like to get this into a gem, but before I do, if anyone has any comments on the above code please let me know!

Published on 2010/12/14 at 18:18 by Toby

ruby net-sftp uninitialized constant

I’m currently writing some code that fetches a file from an sftp server. Using ruby 1.8.7 / net-sftp 2.0.4 the following code triggers an exception:

Net::SFTP.start( sftpHost, sftpUser, :password => sftpPass ) do |sftp|
  $stderr.puts "Downloading: #{sftpPath}#{sftpFile}"
  xml = sftp.download!( "#{sftpPath}#{sftpFile}" )
end

The exception is:

/usr/lib64/ruby/gems/1.8/gems/net-sftp-2.0.4/lib/net/sftp/session.rb:123:in `download!': uninitialized constant Net::SFTP::Session::StringIO (NameError)

The solution is to add:

require 'stringio'

to the top of the session.rb file mentioned in the exception message.
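
If editing an installed gem is unappealing, requiring stringio from the calling script before the download should have the same effect, because a constant that is not found inside Net::SFTP::Session is then looked up at the top level. The sketch below only demonstrates that lookup rule: FakeNet and its download! method are hypothetical stand-ins written for this example, not the real net-sftp classes, and I have not re-tested this against net-sftp 2.0.4.

```ruby
# Load StringIO up front; in the real script this line would sit
# next to require 'net/sftp'.
require 'stringio'

# Hypothetical stand-in for the gem's Net::SFTP::Session, NOT the real class.
module FakeNet
  module SFTP
    class Session
      def download!( contents )
        # Not defined inside FakeNet::SFTP::Session, so ruby falls back
        # to the top-level StringIO loaded above.
        io = StringIO.new
        io.write( contents )
        io.string
      end
    end
  end
end

result = FakeNet::SFTP::Session.new.download!( "<xml/>" )
puts result
```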

Update: 2010-08-19 16:00 I’ve just heard back from the developer having emailed him this blog entry. He is hopefully pushing a fix this evening. Now that’s service!

Update: 2010-08-20 Last night version 2.0.5 was released which has the fix in it.

Published on 2010/08/19 at 14:25 by Toby

Ruby BigDecimal Performance

I have a query against an Oracle database that returns a decimal value for a price. Ruby’s Oracle connector returns that as a BigDecimal object.

Once I have got all the items I have to aggregate them. As part of the aggregation I run this code:

@hash[id].price = ( ( @hash[id].price * @hash[id].fill_volume ) +
                    ( item.price * item.fill_volume ) ) /
                  ( @hash[id].fill_volume + item.fill_volume )


Each time it runs this code it gets slower. After about 250 loops of this code it gets really slow. Here’s the time it takes to do the 269th loop (the times are UNIX time with microseconds):

269
1281710074.568992
1281710075.574573
270

…just over a second! This seems to happen more if the id is the same over many loops.

A very simple solution is to make the price be a Float rather than a BigDecimal. When I do that, I get the following times:

269
1281710378.886902
1281710378.886920
270

Significantly faster, and given that I have tens of thousands of lines to parse this is a big thing: it is worth the potential loss of accuracy. What I’d like to know is why BigDecimal behaves like this (at least in ruby 1.8.6). Any ideas?

UPDATE 20101221: Matt Patterson has had a look into it and it looks like BigDecimal arithmetic is O(n²) (or worse) in the size of the number, so it slows down as the BigDecimal gets larger. It looks like I’m going to have to have a look at BigDecimal#limit and BigDecimal#round.
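
As a rough sketch of the BigDecimal#round idea: rounding the running average after every step should stop the number of digits from growing without bound. The helper name weighted_price, the choice of 8 decimal places and the sample prices are all assumptions made for this example, not values from the original code, and I have not measured how much time it saves.

```ruby
require 'bigdecimal'

PLACES = 8   # assumed number of decimal places to keep

# Hypothetical helper: volume-weighted average price, rounded so the
# result never carries more than PLACES decimal places.
def weighted_price( price_a, volume_a, price_b, volume_b )
  total = ( price_a * volume_a ) + ( price_b * volume_b )
  ( total / ( volume_a + volume_b ) ).round( PLACES )
end

price  = BigDecimal( '10' )
volume = BigDecimal( '1' )

# Fold in 300 made-up items, as in the slow aggregation loop.
300.times do
  price   = weighted_price( price, volume, BigDecimal( '10.3333' ), BigDecimal( '3' ) )
  volume += 3
end

puts price.to_s( 'F' )
```

Rounding at each step does introduce a small error of its own, so the number of places wants choosing with the required accuracy in mind.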

Published on 2010/08/13 at 15:31 by Toby

The return of the blogroll

I’ve brought the blogroll back since it disappeared during the upgrade. It’s now generated from my google reader opml feed, which I export and run through some ruby, which means it’s going to be up-to-date more frequently.

Here is the code that takes the opml and outputs the html fragment:

#!/usr/bin/ruby

require 'rexml/document'
require 'pp.rb'

class FeedData
  attr_accessor :name, :url, :category

  def initialize( name, url, category )
    ( @name, @url, @category ) = name, url, category
  end
end

class FeedDataList
  def initialize()
    @array = Array::new
  end

  def add( name, url, category )
    tmpFeed = FeedData::new( name, url, category )
    @array.push( tmpFeed )
  end

  def eachCategory
    catArray = Array::new
    @array.each do |feed|
      if !catArray.member?( feed.category ) then
        catArray.push( feed.category )
      end
    end

    catArray.sort.each do |result|
      yield result
    end
  end

  def eachItemInCategory( category )
    @array.each do |feed|
      if feed.category == category then
        yield feed
      end
    end
  end
end

def parse_opml( opml_node, feeds, parents_names=[] )
  opml_node.elements.each('outline') do |element|
    if element.elements.size != 0 then
      feeds = parse_opml( element, feeds, parents_names + [ element.attributes[ 'text' ] ] )
    end
    if element.attributes['xmlUrl'] then
      feeds.add( element.attributes['title'], element.attributes['htmlUrl'], parents_names.last )
    end
  end

  return feeds
end

opml = REXML::Document.new( STDIN )
feeds = FeedDataList::new
feeds = parse_opml( opml.elements['opml/body'], feeds )

feeds.eachCategory do |category|
  print "<br /><strong>#{category}</strong> "
  feeds.eachItemInCategory( category ) do |item|
    print "<a href=\"#{item.url}\">#{item.name}</a> "
  end
  print "\n"
end

To make it work:

$ ./opml2html.rb < <opml file>
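
For anyone without an export to hand, here is a small sketch of the input shape the script expects and of the recursive walk over it. The sample OPML is made up for this example (a category outline wrapping feed outlines that carry xmlUrl and htmlUrl attributes, which is the layout I am assuming the export uses), and collect_feeds is a hypothetical, cut-down cousin of parse_opml rather than the code above.

```ruby
require 'rexml/document'

# Made-up sample: two categories with one feed each.
SAMPLE_OPML = <<-XML
<opml version="1.0">
  <body>
    <outline title="Ruby" text="Ruby">
      <outline text="Example Feed" title="Example Feed" type="rss"
               xmlUrl="http://example.com/feed.xml" htmlUrl="http://example.com/" />
    </outline>
    <outline title="News" text="News">
      <outline text="Another Feed" title="Another Feed" type="rss"
               xmlUrl="http://example.org/rss" htmlUrl="http://example.org/" />
    </outline>
  </body>
</opml>
XML

# Walk the outline tree: an outline with an xmlUrl is a feed, anything
# else is treated as a category whose children are visited in turn.
def collect_feeds( node, category = nil, found = [] )
  node.elements.each( 'outline' ) do |element|
    if element.attributes['xmlUrl'] then
      found << [ category, element.attributes['title'], element.attributes['htmlUrl'] ]
    else
      collect_feeds( element, element.attributes['text'], found )
    end
  end
  found
end

doc = REXML::Document.new( SAMPLE_OPML )
feeds = collect_feeds( doc.elements['opml/body'] )

feeds.each do |category, title, url|
  puts "#{category}: #{title} (#{url})"
end
```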

I got the inspiration for the recursive opml parsing code from the Dekstop blog, so thanks are due to them!

Published on 2008/08/13 at 16:03 by Toby

Twitter to facebook status ruby script

UPDATE 2007-09-30: You can now make the twitter facebook application update your status, so this is now redundant!

UPDATE 2007-09-09 (Cribbed from Christian’s site as I couldn’t put it any better): It is with great disappointment that I must make this announcement. Facebook has requested that I remove the code from my website. They have also contacted everyone else who has found my code and publicly mentioned that they are using it.

I am saddened at this turn of events because the idea behind the code was to extend Facebook’s current service and fill in the gap that their API had. The API still does not provide a means for updating one’s status.

I’ve been wanting to update my facebook status with my twitterings for some time now. Unfortunately facebook do not provide an API for setting the status so I have been thwarted. Until now. Yesterday I accidentally found Christian Flickinger’s blog entry where he has found a way to update facebook status using php. At the end of his blog he writes:

Anyone with some experience could easily use the above code to check Twitter and (if updated) push to Facebook. Happy mashing!

I know how the twitter api works and I’m a big fan of ruby so I thought I would pick up the gauntlet and hack up a quick ruby script to do just that.

You’ll need to install the json and curb gems for it to work. Then copy and paste the code below into a file called fbTwit.rb in a suitable place on your unix box (it writes state to disk, so you may want it in its own directory) and edit the variables so that they are correct for your accounts. Run it by hand to check it works for you, then set it to run every (say) 10 minutes via cron with something like the following in your crontab:

00,10,20,30,40,50 * * * * cd [directory]; ./fbTwit.rb > /dev/null 2>&1

[Code removed at the request of facebook.]

The quality of the code probably isn’t that great: it only took a short time to write, but it works (for me).

[2007-08-01] Yesterday this stopped working, along with the SSL login for m.facebook.com. I have changed it to use the http login which still works, although whether you want your password sent in the clear is up to you.

Published on 2007/07/13 at 11:50 by Toby

TCSOTD 2007-01-05

Iris use dropped in ID Card plans

Fusion reactors in 10 years claim

Internet Explorer unsafe for 284 days in 2006

Part of Russian rocket lands in Wyoming

The problem with OpenXML
… ‘not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year old version of Microsoft Word, it must also do the same thing with a 16-year old version of WordPerfect.’

15 tips to choose a good text type

Quit [Smoking] Counter

Humble Little Ruby Book

Riverbend’s take on Saddam’s hanging / lynching

Lord Levy in abuse of tax payer’s money allegation

New Scientist compiles a list of fun materials
