Optimizing calls to Windows DLL functions in Go

The Go team wrote the golang.org/x/sys/windows package for calling functions in Windows DLLs. Its approach wastes memory and startup time; this article describes a better one.

The sys/windows way

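To call a function in a DLL, for example kernel32.dll, you must:

Load the dll into memory with LoadLibrary.
Get the address of the function in the dll (with GetProcAddress).
Call the function at that address.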

Here is what that looks like with the sys/windows library:

var (
    libole32 *windows.LazyDLL

    coCreateInstance *windows.LazyProc
)

func init() {
    libole32 = windows.NewLazySystemDLL("ole32.dll")
    coCreateInstance = libole32.NewProc("CoCreateInstance")
}

func CoCreateInstance(rclsid *GUID, pUnkOuter *IUnknown, dwClsContext uint32, riid *GUID, ppv *unsafe.Pointer) HRESULT {
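    // SyscallN takes the function address followed by the arguments themselves;
    // unlike the older Syscall6, it needs no argument count or zero padding.
    ret, _, _ := syscall.SyscallN(coCreateInstance.Addr(),
        uintptr(unsafe.Pointer(rclsid)),
        uintptr(unsafe.Pointer(pUnkOuter)),
        uintptr(dwClsContext),
        uintptr(unsafe.Pointer(riid)),
        uintptr(unsafe.Pointer(ppv)),
    )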
    return HRESULT(ret)
}

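The problem

The problem is that this approach is memory-inefficient. For each function we only need:

The function name, used to get its address in the dll. It is a string, so it costs 8 bytes (pointer to the string data) + 8 bytes (string length) + the string contents.
The address of the function, which is 8 bytes on a 64-bit CPU.

Unfortunately, in sys/windows each function requires: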

type LazyProc struct {
    Name string

    mu   sync.Mutex
    l    *LazyDLL
    proc *Proc
}

type Proc struct {
    Dll  *DLL
    Name string
    addr uintptr
}

// sync.Mutex
type Mutex struct {
    _ noCopy

    mu isync.Mutex
}

// isync.Mutex
type Mutex struct {
    state int32
    sema  uint32
}

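Estimating the size of these structures on a 64-bit CPU:

LazyProc: 16 + sizeof(Mutex) + 8 + 8 = 32 + sizeof(Mutex)
Proc: 8 + 16 + 8 = 32
Mutex: 8

Total: 32 + 32 + 8 = 72 bytes per function, not counting possible padding. Windows has a lot of functions, so this adds up. In addition, at startup we call NewProc for every function, even ones the program never uses, which increases startup time.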

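A better way

What we ultimately need is a uintptr holding the function's address, looked up lazily. Say we use 8 functions from ole32.dll. We can store the function pointers in a single array of uintptr values: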

var oleFuncPtrs [8]uintptr
var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", /* ... */}

const kCoCreateInstance = 0
const kCoGetClassObject = 1
// etc.

// kFuncMissing marks a function that was already looked up and not found.
const kFuncMissing = 1

func funcAddrInDLL(dll *windows.LazyDLL, funcPtrs []uintptr, funcIdx int, funcNames []string) uintptr {
  addr := funcPtrs[funcIdx]
  if addr == kFuncMissing {
    // we already tried to look it up and did not find it;
    // this can happen because older versions of Windows might not implement the function
    return 0
  }
  if addr != 0 {
    return addr
  }
  // look up the function in the dll by name
  name := funcNames[funcIdx]
  // ...
  return addr
}

In real code this needs to be made thread-safe, for example with a mutex.
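
One way the locking could look is sketched below. This is an illustration under stated assumptions, not the article's actual implementation: procCache and its resolve field are hypothetical names, and resolve is a stand-in for the real by-name lookup in the dll so that the sketch runs on any platform. It uses one mutex per DLL rather than one per function, so the synchronization cost stays at 8 bytes per DLL.

```go
package main

import (
	"fmt"
	"sync"
)

// funcMissing is the sentinel for "looked up before and not found".
const funcMissing = 1

// procCache lazily caches function addresses for a single DLL.
type procCache struct {
	mu    sync.Mutex // one lock for the whole DLL, not one per function
	ptrs  []uintptr  // 0 = not looked up yet, funcMissing = not found
	names []string
	// resolve is a hypothetical stand-in for the real lookup by name;
	// it returns 0 when the function does not exist.
	resolve func(name string) uintptr
}

// addr returns the address of function idx, or 0 if the DLL lacks it.
// The lookup by name runs at most once per function.
func (c *procCache) addr(idx int) uintptr {
	c.mu.Lock()
	defer c.mu.Unlock()

	cached := c.ptrs[idx]
	if cached == funcMissing {
		return 0
	}
	if cached != 0 {
		return cached
	}
	found := c.resolve(c.names[idx])
	if found == 0 {
		c.ptrs[idx] = funcMissing
		return 0
	}
	c.ptrs[idx] = found
	return found
}

func main() {
	// Fake export table standing in for a real DLL (illustration only).
	exports := map[string]uintptr{"CoCreateInstance": 0x1000}
	c := &procCache{
		ptrs:    make([]uintptr, 2),
		names:   []string{"CoCreateInstance", "NoSuchFunction"},
		resolve: func(name string) uintptr { return exports[name] },
	}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.addr(0)
			c.addr(1)
		}()
	}
	wg.Wait()
	fmt.Printf("%#x %#x\n", c.addr(0), c.addr(1))
}
```

A lock-free variant using atomic loads and stores on the array elements would also work, since resolving the same name twice is harmless; the mutex version is shown because it is the simplest to reason about.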

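Saving on strings

The following is not very efficient:

var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", /* ... */}

In addition to the text of each string, Go needs 16 bytes per string: 8 bytes for the pointer to the string data and 8 bytes for its length.

We can be more efficient by storing all the names in a single string: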

var oleFuncNames = `
CoCreateInstance
CoGetClassObject
`

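Only when looking up a function by name do we need to construct a temporary string by slicing oleFuncNames. For that we need the offset and size of the name within oleFuncNames, which we can cleverly encode in a single number:

// Auto-generated shell procedure identifiers: cache index | string start | string end.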
const (
    _PROC_SHCreateItemFromIDList            _PROC_SHELL = 0 | (9 << 16) | (31 << 32)
    _PROC_SHCreateItemFromParsingName       _PROC_SHELL = 1 | (32 << 16) | (59 << 32)
  //...
)

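We pack the information into a single number:

Bits 0-15: index of the function in the array of function pointers.
Bits 16-31: start of the function name within the multi-name string.
Bits 32-47: end of the function name within the multi-name string.

This technique requires code generation; writing these numbers by hand is too difficult.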
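
To make the encoding concrete, here is a minimal sketch of what the core of such a generator might do. The helper names packProc, unpackProc, buildTable and nameOf are hypothetical and chosen for illustration. The sketch assumes names are separated by newlines, as in the oleFuncNames example, and that the joined string stays under 65536 bytes so that offsets fit in 16 bits.

```go
package main

import (
	"fmt"
	"strings"
)

// packProc packs a cache index and a [start, end) byte range into one number:
// bits 0-15 hold the index, bits 16-31 the start, bits 32-47 the end.
func packProc(idx, start, end int) uint64 {
	return uint64(idx) | uint64(start)<<16 | uint64(end)<<32
}

// unpackProc reverses packProc.
func unpackProc(id uint64) (idx, start, end int) {
	return int(id & 0xffff), int((id >> 16) & 0xffff), int((id >> 32) & 0xffff)
}

// buildTable joins the names into one newline-separated string and
// returns it together with one packed identifier per name.
func buildTable(names []string) (string, []uint64) {
	var sb strings.Builder
	ids := make([]uint64, len(names))
	for i, name := range names {
		sb.WriteByte('\n')
		start := sb.Len()
		sb.WriteString(name)
		ids[i] = packProc(i, start, sb.Len())
	}
	sb.WriteByte('\n')
	return sb.String(), ids
}

// nameOf slices the function name out of the joined string.
// Slicing a Go string does not copy the underlying bytes.
func nameOf(table string, id uint64) string {
	_, start, end := unpackProc(id)
	return table[start:end]
}

func main() {
	table, ids := buildTable([]string{"CoCreateInstance", "CoGetClassObject"})
	for _, id := range ids {
		idx, start, end := unpackProc(id)
		fmt.Printf("idx=%d start=%d end=%d name=%s\n", idx, start, end, nameOf(table, id))
	}
}
```

A real generator would print the joined string and the identifiers as Go source (a raw string literal and a const block) instead of computing them at run time, so that the program itself carries only the constants.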