Thursday, November 02, 2006

Using MemoryStream and BinaryFormatter for reusable GetHashCode and DeepCopy functions

Here are a couple of techniques I learnt a while back to add two important capabilities to your objects: computing a hash code and executing a deep copy. I can't find the original source for the hash code example, but the deep copy comes from Rockford Lhotka's CSLA. Both examples are my implementation of the basic idea. Both techniques utilise the MemoryStream and BinaryFormatter by getting the object to serialize itself to a byte array. To compute the hash code I simply use SHA1CryptoServiceProvider to create a 20 byte hash of the serialized object and then xor its bytes together into an integer value.

public override int GetHashCode()
{
    byte[] thisSerialized;
    using(System.IO.MemoryStream stream = new System.IO.MemoryStream())
    {
        new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter().Serialize(stream, this);
        thisSerialized = stream.ToArray();
    }
    byte[] hash = new System.Security.Cryptography.SHA1CryptoServiceProvider().ComputeHash(thisSerialized);
    uint hashResult = 0;
    for(int i = 0; i < hash.Length; i++)
    {
        // Shift by whole bytes (0, 8, 16 or 24 bits) so the 20 hash bytes fold across all 32 bits.
        hashResult ^= (uint)hash[i] << ((i % 4) * 8);
    }
    return (int)hashResult;
}
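
The xor step can also be pulled out into a small helper so it can be looked at on its own. This is only a sketch: HashFolding and FoldToInt32 are names made up for illustration, and it assumes the intent is to spread the hash bytes across all four bytes of the integer.

```csharp
using System;

public static class HashFolding
{
    // Hypothetical helper: xor-folds a byte array of any length into 32 bits.
    public static int FoldToInt32(byte[] hash)
    {
        uint result = 0;
        for (int i = 0; i < hash.Length; i++)
        {
            // Byte i goes into byte lane (i % 4), i.e. a shift of 0, 8, 16 or 24 bits.
            result ^= (uint)hash[i] << ((i % 4) * 8);
        }
        // Reinterpret the 32 bits as a signed int without an overflow check.
        return unchecked((int)result);
    }
}
```

For example, FoldToInt32(new byte[] { 1, 2, 3, 4 }) gives 0x04030201, and feeding the same four bytes in twice gives 0 because each lane cancels itself out.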

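The most common uses for a hash code are to make hash tables efficient and to implement Equals(). Note that a 32-bit hash code has only 4,294,967,296 possible values, so for any two different objects there's at best a one in 4,294,967,296 chance that they share a hash code and this provides a false equals (thanks to Richard for pointing that out to me):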

public override bool Equals(object obj)
{
    if(!(obj is MyClass)) return false;
    return this.GetHashCode() == obj.GetHashCode();
}
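
If a false equals matters, one option is to compare the serialized bytes themselves whenever the hash codes match. The following is a minimal sketch, assuming both objects have already been serialized to byte arrays the same way GetHashCode does it, and assuming BinaryFormatter writes identical bytes for identical object graphs. SerializedComparer and BytesEqual are hypothetical names, not part of any library.

```csharp
public static class SerializedComparer
{
    // Hypothetical helper: true only when both arrays hold exactly the same bytes.
    public static bool BytesEqual(byte[] left, byte[] right)
    {
        if (object.ReferenceEquals(left, right)) return true;
        if (left == null || right == null) return false;
        if (left.Length != right.Length) return false;
        for (int i = 0; i < left.Length; i++)
        {
            // The first differing byte settles it.
            if (left[i] != right[i]) return false;
        }
        return true;
    }
}
```

Equals() could then return false straight away when the hash codes differ, and only pay for the byte-by-byte comparison when they are the same.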

To do a deep copy I simply get the object to serialize itself and deserialize it as a new instance. Be careful: this technique will serialize everything in this object's graph, so make sure you're aware of what is referenced by it and that all the objects in the graph are marked as [Serializable]. Here's a generic example that you can reuse in any object that needs deep copy:

public T DeepCopy<T>()
{
    T snapshot;
    using(System.IO.MemoryStream stream = new System.IO.MemoryStream())
    {
        System.Runtime.Serialization.Formatters.Binary.BinaryFormatter formatter = 
            new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        formatter.Serialize(stream, this);
        stream.Position = 0;
        snapshot = (T)formatter.Deserialize(stream);
    }
    return snapshot;
}
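
To see what "deep" means in practice, here is a small usage sketch. The Order class and DeepCopyDemo are made up for illustration, and the code assumes the .NET Framework, where BinaryFormatter is available; newer .NET versions mark it obsolete or disable it. Note that the caller has to supply the type argument, as in DeepCopy<Order>().

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical example type; every object in its graph must be [Serializable].
[Serializable]
public class Order
{
    public string Customer;
    public List<string> Items = new List<string>();

    public T DeepCopy<T>()
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryFormatter formatter = new BinaryFormatter();
            formatter.Serialize(stream, this);
            stream.Position = 0; // rewind before reading the bytes back
            return (T)formatter.Deserialize(stream);
        }
    }
}

public static class DeepCopyDemo
{
    public static void Run()
    {
        Order original = new Order();
        original.Customer = "Alice";
        original.Items.Add("widget");

        Order copy = original.DeepCopy<Order>();
        copy.Items.Add("gadget"); // only the copy's list changes

        Console.WriteLine(original.Items.Count); // 1
        Console.WriteLine(copy.Items.Count);     // 2
        Console.WriteLine(object.ReferenceEquals(original.Items, copy.Items)); // False
    }
}
```

A shallow copy such as MemberwiseClone() would have left both objects pointing at the same Items list, so the original would have ended up with two items as well.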

2 comments:

Anonymous said...

I realize this is a rather old post but I stumbled on it while looking up the BinaryFormatter.

Implementing GetHashCode in this way is ridiculously inefficient (and hash codes are all about efficiency), and basing Equals exclusively on the hash code is a terrible idea and will likely fail in some horrible and unpredictable way in production code.

This advice is bad, and you should feel bad.

Anonymous said...

^ The above comment doesn't explain why or how to improve on your solution and is in fact an inefficient use of reading time. Looking up solutions via Google and the web is all about efficiency, and your post helped shed light on binary formatting, albeit not all I was looking for. The comment above, however, is a ridiculously inefficient addition to the blog post, and basing my time and results on it is a terrible idea and has already failed me into writing a response to help negate the comment for future readers, even if it means inefficiently spending their time to efficiently avoid Anonymous.