Friendly Reminder: GetHashCode Isn’t an Identifier

tl;dr

Don’t use GetHashCode to identify an object. Microsoft has a big warning section in the GetHashCode docs about this.

The story

We store around 70 million unique objects in Azure Table Storage and we access them through an API that we built.

Our API takes in a list of object identifiers and returns those objects from table storage if they exist.

To get the performance we wanted, we partition the data by a key derived from the object’s identifier and use the identifier itself as the row key. Our table looks a bit like this:

PartitionKey (generated)   RowKey (object’s id)   Data
0000000192                 123456                 I have data
0000000182                 123457                 me too!
0000000156                 123458                 yay! data

When we try to access the data, we query with the generated partition key and the object’s identifier as the row key.

To generate the partition, we have code that looks like this:

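static int numberOfPartitions = 1000;

public static string GeneratePartition(string id)
{
  // Map the id's hash code onto one of 1000 partitions (0 to 999).
  // Taking the remainder before Math.Abs keeps the value small, so Abs cannot overflow.
  var partition = (Math.Abs(id.GetHashCode() % numberOfPartitions)).ToString();
  // Left-pad with zeros to a fixed width of 10 characters.
  return partition.PadLeft(10, '0');
}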

And this is where we ran into trouble.

This week I’m moving some of our services from one Azure subscription to another, so I’ve had to create new resources to create, update and access the data.

Everything was running fine. The logs looked good and the other files generated in the process looked right. Then I tried accessing the data through the API.

No Results.

I couldn’t get results even when running the API locally. I could see the data in Azure Storage Explorer. But the data didn’t match up.

After a couple of hours poking and prodding, I found the issue by picking a partition/row key pair that I knew existed. With that, I stepped through the code and lo and behold, my code was generating a completely different partition key.

The easy “fix” was to set up the new app service just like the existing one. The difference between the two was that the existing app was 64-bit and the new one was 32-bit. Flipping that switch made it run properly.
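If you want to see this kind of mismatch for yourself, a small diagnostic along these lines can help. It is only a sketch: the HashDiagnostics class and Describe method are names made up for illustration, not part of our service. The GetHashCode docs warn that hash codes for identical strings can differ across platforms such as 32-bit and 64-bit, and on newer .NET runtimes string hash codes can also differ from one process run to the next, so the number reported here should not be expected to match between environments.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class HashDiagnostics
{
    // Reports the process bitness next to the hash code the current
    // runtime produces for the given id.
    public static string Describe(string id)
    {
        return $"64-bit process: {Environment.Is64BitProcess}, GetHashCode(\"{id}\") = {id.GetHashCode()}";
    }
}
```

Logging HashDiagnostics.Describe("123456") from both app services would be one way to compare what each environment computes for the same id.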

While that got it running now, the better fix will be to update the code to not rely on GetHashCode.
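Here is a minimal sketch of what that could look like, assuming a 32-bit FNV-1a hash over the id's UTF-8 bytes stands in for GetHashCode. FNV-1a is my pick for illustration, not something we have deployed, and the StablePartitioner class name is made up. Because the hash is computed by our own code from the bytes of the string, it does not depend on the runtime or on whether the process is 32-bit or 64-bit.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical replacement for illustration only.
public static class StablePartitioner
{
    private const int NumberOfPartitions = 1000;

    public static string GeneratePartition(string id)
    {
        if (id == null) throw new ArgumentNullException(nameof(id));

        // 32-bit FNV-1a constants.
        const uint offsetBasis = 2166136261;
        const uint prime = 16777619;

        uint hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(id))
        {
            unchecked
            {
                hash ^= b;      // mix in the next byte
                hash *= prime;  // multiply, wrapping around at 32 bits
            }
        }

        // hash is unsigned, so the remainder is already in the range 0 to 999.
        int partition = (int)(hash % NumberOfPartitions);

        // Zero-pad to 10 digits, independent of the current culture.
        return partition.ToString("D10", CultureInfo.InvariantCulture);
    }
}
```

One caveat: this produces different partition keys than the GetHashCode version, so rows that are already stored would have to be rewritten under the new keys before a switch like this could go live.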
