go语言ConcurrentMap的性能问题

Question

go语言ConcurrentMap的性能问题

发布于
2016-05-03

新手上路，请多包涵

我最近在开发一个go语言的tcp框架（https://github.com/leesper/tao），因为之前做过Java服务器开发，对java.util.concurrent库提供的并发数据结构印象深刻，于是在自己的项目中也想写一个类似的ConcurrentMap。思路类似于Java标准库ConcurrentHashMap，但没那么复杂，采用的仍然是“分段锁”，ConcurrentMap定义如下（https://github.com/leesper/tao/blob/master/concurrent.go）：

type ConcurrentMap struct {
  shards []*syncMap
}

而syncMap就是加锁的Map类型：

type syncMap struct {
  shard map[interface{}]interface{}
  sync.RWMutex
}

ConcurrentMap的Put(k, v)操作很简单，就是用k算一个hashCode，散列到某一个syncMap，然后在这个syncMap上做真正的put(k, v)操作：

func (cm *ConcurrentMap)Put(k, v interface{}) error {
  if isNil(k) {
    return ErrorNilKey
  }
  if isNil(v) {
    return ErrorNilValue
  }
  if shard, err := cm.shardFor(k); err != nil {
    return err
  } else {
    shard.put(k, v)
  }
  return nil
}

func (sm *syncMap) put(k, v interface{}) {
  sm.Lock()
  defer sm.Unlock()
  sm.shard[k] = v
}

func (cm *ConcurrentMap)shardFor(k interface{}) (*syncMap, error) {
  if code, err := hashCode(k); err != nil {
    return nil, err
  } else {
    return cm.shards[code & uint32(INITIAL_SHARD_SIZE - 1)], nil
  }
}

func hashCode(k interface{}) (uint32, error) {
  var code uint32
  var err error
  h.Reset()
    switch v := k.(type) {
    case bool:
        h.Write((*((*[1]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case int:
        h.Write((*((*[intSize]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case int8:
        h.Write((*((*[1]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case int16:
        h.Write((*((*[2]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case int32:
        h.Write((*((*[4]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case int64:
        h.Write((*((*[8]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case uint:
        h.Write((*((*[intSize]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case uint8:
        h.Write((*((*[1]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case uint16:
        h.Write((*((*[2]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case uint32:
        h.Write((*((*[4]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case uint64:
        h.Write((*((*[8]byte)(unsafe.Pointer(&v))))[:])
        code = h.Sum32()
    case string:
        h.Write([]byte(v))
        code = h.Sum32()
  case Hashable:
    c := v.HashCode()
    h.Write((*((*[4]byte)(unsafe.Pointer(&c))))[:])
        code = h.Sum32()
  default:
    err = ErrorNotHashable
    }
    return code, err
}

我跑了一下benchmark，与全局加锁的lockMap做对比，发现性能并不高，加了-benchmem参数后发现，性能开销的很大一部分原因是因为分配内存的操作过多，于是我优化了一下hashCode()函数（https://github.com/leesper/tao/blob/master/util.go），采取了以下三种措施：
1）不要每次都new一个bytes.Buffer，减少了约60w次内存分配操作；
2）不要每次都fnv.New32a()，减少了60w次内存分配操作；
3）不使用byte.Buffer和encoding/binary库，直接使用unsafe.Pointer操作内存地址，又减少了60w次内存分配操作；
但是性能跑分仍然不佳，还不如人家全部加锁的lockMap呢，性能测试代码在https://github.com/leesper/tao/blob/master/benchmark_test.go
git上已经有人将Java的ConcurrentHashMap移植到了go中（https://github.com/fanliao/go-concurrentMap），我这部分的代码受到了https://github.com/DeanThompson/syncmap 的启发

我想请教的问题是，接下来我该怎么优化呢？因为我刚开始学习Go语言，对这块还没有思路，请赐教，谢谢。

附：性能测试跑分（4核多处理器）
bench