27 June 2005

Hashing strings with Python

A fast hash function for good distribution of strings is "hashpjw". Written by P.J. Weinberg, and published in "Compilers: Principles, Techn iques, and Tools" (Addison-Wesley 1986 by Alfred Aho, Ravi Sethi, and Jeff Ullman). I've used hashpjw for some purpose in nearly every compiler or interpreter I've written (and that's quite a few). I've brewed up a lot of different symbol table packages for languages like Pascal, C, Java, and some of my own invention and at base I've always come back to this little method.

Here's an implementation in Python. Assign a value to tablesize to represent the number of buckets you want to create.

    def hash_pjw (self, aString, tablesize):
unsignedIntMask = 0xffffffffL
chr_index = 0
str_len = len(aString)
g = 0L
h = 0L
while (chr_index < h =" ((h" chr_index =" chr_index" g =" h" h =" (h">> 24)) ^ g
return (h % tablesize)

No comments: