
Handling concurrency in KeyValue stores

Key-value stores might seem simple from the outside: you set values and you get them back, which is what I thought too. However, things get tricky as we dive deeper. The problem is: what happens if a few processes write a key at the same time, while a few other processes read that same key?

We can actually simulate this using goroutines in golang.

numGoRoutines := 1000
wg := sync.WaitGroup{}
wg.Add(numGoRoutines)

for i := 0; i < numGoRoutines; i++ {
	kv := &nimbusdb.KeyValuePair{
		Key:   []byte(fmt.Sprintf("%d", i)),
		Value: []byte(fmt.Sprintf("testvalue%d", i)),
	}
	go func() {
		defer wg.Done()
		// each goroutine writes its own key concurrently
		_, err := d.Set(kv)
		if err != nil {
			panic(err)
		}
	}()
}
wg.Wait()

The above code creates 1000 goroutines, each setting a different key. The problem is that all of these goroutines end up touching the same underlying structure where the key-value pairs are stored. Why is that a problem? Because we want to be 100% sure that only one goroutine is modifying that structure at any given moment, and, as you might have guessed, that means we'll use locks.

func (b *BTree) Set(key []byte, value KeyDirValue) *KeyDirValue {
	b.mu.Lock()
	defer b.mu.Unlock()
	i := b.tree.ReplaceOrInsert(&item{key: key, v: value})
	if i != nil {
		return &i.(*item).v
	}
	return nil
}

The b.mu is of type sync.RWMutex, meaning it offers both shared and exclusive locks. In the Set function above we use .Lock(), which takes the exclusive lock: no other goroutine can enter that region until we call .Unlock().

func (b *BTree) Get(key []byte) *KeyDirValue {
	b.mu.RLock()
	defer b.mu.RUnlock()
	i := b.tree.Get(&item{key: key})
	if i != nil {
		return &i.(*item).v
	}
	return nil
}

In the .Get() function, however, we use .RLock(), the shared lock, since we want to allow multiple goroutines to read a value at the same time; readers only block writers, not each other.
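
To make the difference concrete, here is a minimal, self-contained sketch of the same pattern. This is not nimbusdb's actual code; the store type, its fields and the key counts are just illustrative. Writes take the exclusive lock, reads take the shared one, and the whole thing runs cleanly under the race detector.

package main

import (
	"fmt"
	"sync"
)

// store is a toy map-backed key-value store guarded by a sync.RWMutex,
// mirroring the locking in the BTree code above.
type store struct {
	mu   sync.RWMutex
	data map[string]string
}

func (s *store) Set(key, value string) {
	s.mu.Lock() // exclusive lock: blocks all other readers and writers
	defer s.mu.Unlock()
	s.data[key] = value
}

func (s *store) Get(key string) string {
	s.mu.RLock() // shared lock: many readers can hold it at once
	defer s.mu.RUnlock()
	return s.data[key]
}

func main() {
	s := &store{data: map[string]string{}}
	wg := sync.WaitGroup{}

	// concurrent writers and readers; with the RWMutex in place this
	// runs cleanly under `go run -race`
	for i := 0; i < 1000; i++ {
		wg.Add(2)
		go func(i int) {
			defer wg.Done()
			s.Set(fmt.Sprintf("%d", i), fmt.Sprintf("testvalue%d", i))
		}(i)
		go func(i int) {
			defer wg.Done()
			_ = s.Get(fmt.Sprintf("%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println("wrote", len(s.data), "keys")
}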

To check for race conditions, we can run the tests with the race detector enabled: go test ./tests -v -race.
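
For contrast, here is a deliberately unsynchronized sketch (hypothetical, not from nimbusdb) of the same write loop against a bare map. Running it under the race detector, with go run -race or the equivalent test under go test -race, makes it report a data race.

package main

import (
	"fmt"
	"sync"
)

func main() {
	store := map[string]string{} // plain map, no mutex guarding it

	wg := sync.WaitGroup{}
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// unsynchronized concurrent writes: the race detector flags this
			store[fmt.Sprintf("%d", i)] = fmt.Sprintf("testvalue%d", i)
		}(i)
	}
	wg.Wait()
	fmt.Println(len(store))
}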

The Set and Get code above is from nimbusdb, a key-value store I'm writing from scratch. You can find it here.

#database   #internals