A few months ago, while working on dissident, I started looking around for guidance on how I should manage encryption keys. I found a few references here and there, but the best plan I could put together from the limited information available was something like:
- Call mlock(2) (or VirtualLock) on any sensitive resources.
- Overwrite the resources when finished with them.
- Allow them to run out of scope.
- Hope for the garbage-collector to clean them up, or force it with
A short time later, I realised that beef was kicking off. As it turned out, the aforementioned approach was fundamentally flawed because one thing hadn’t been accounted for: the garbage-collector. It goes around doing whatever it feels like: making a copy here, moving something around there. It’s a real pain from a security standpoint.
I really didn’t have a choice at that point: I had to dedicate time to the project for the ten days it took to develop and release the fix.
A few people had mentioned wrapping libsodium, but I wanted a pure-Go solution, so that wasn’t ideal. Instead, dotcppfile and I began analysing how libsodium actually worked. He began auditing it while I researched some protection strategies and implemented APIs for the relevant system calls.
Within a few days, we had a pretty solid understanding of libsodium and we were ready with a new and improved plan. I think the best way to explain it is to introduce you to the end product: memguard.
Alright, now say you need to generate an encryption key and store it securely. What’s the process?
Well, first we need some memory from the OS, so we need to determine the number of pages that we have to allocate. In this case the length of the buffer is 32 bytes, and we can assume the system page size to be 4096 bytes. The data is stored between two guard pages and is prepended with a random canary of length 32 bytes (more on these later). So, since the data and the canary together (64 bytes) comfortably fit into a single page, we need to allocate just three pages: that one page plus the two guard pages.
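The page arithmetic can be sketched in Go as follows. This is a minimal illustration rather than memguard's code: `pagesNeeded` and `roundToPageSize` are hypothetical helpers, and the page size is passed in explicitly (in practice it would come from `os.Getpagesize()`).

```go
package main

const (
	canaryLen  = 32 // length of the random canary placed before the data
	guardPages = 2  // one guard page before the data, one after
)

// roundToPageSize rounds n up to the next multiple of pageSize.
func roundToPageSize(n, pageSize int) int {
	return (n + pageSize - 1) / pageSize * pageSize
}

// pagesNeeded returns how many pages must be allocated to hold the canary
// plus dataLen bytes of data, with a guard page on either side.
func pagesNeeded(dataLen, pageSize int) int {
	inner := roundToPageSize(dataLen+canaryLen, pageSize)
	return inner/pageSize + guardPages
}
```

With a 32-byte key and 4096-byte pages, `pagesNeeded(32, 4096)` gives 3. Only once the canary and data no longer fit in one page (for example with 4065 bytes of data) does the inner region grow to two pages, for four in total.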
But we can’t ask the Go runtime for the memory, since it would then be free to mess around with it, so how do we do it? Well, there are a few ways to accomplish this, but we decided to go with Joseph Richey’s suggestion of using mmap(2) (or VirtualAlloc on Windows), since the system call is implemented natively in Go, which allowed us to avoid a dirty cgo solution.
Of course, only the Unix system calls were actually implemented natively; the Windows ones were not. Luckily, there was a library by Alex Brainman that we could vendor instead, and it proved invaluable. (I later added the missing system calls to the standard library to remove the dependency.)
Now that our pages are allocated, we should configure the guard pages. We tell the kernel to disallow all reads and writes to the first and last pages, so if anything tries to access them, a segmentation fault (SIGSEGV, or an access violation on Windows) is raised and the process crashes. This way, buffer overflows are detected immediately instead of silently leaking or corrupting neighbouring memory.
The remaining page, the one sandwiched between the guard pages, needs to be protected too. You see, as system memory runs low, the kernel may swap memory pages out to disk, and that is something we would like to avoid. So, we call mlock(2) (VirtualLock on Windows) to tell the kernel to keep this middle page in RAM.
The last thing is the canary: a random value placed just before the data. If it ever changes, we know that something went wrong, probably a buffer underflow. A global canary value is generated once when the program starts, so we just copy it into the canary bytes, and the container is pretty much ready for use.
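The canary handling might look like the sketch below. Beyond what is stated above, it assumes the data is aligned to the end of the middle page, so that an overflow runs straight into the trailing guard page while an underflow runs into the canary. `layout` and `canaryIntact` are hypothetical names, and the check uses `crypto/subtle` so that comparing the canary does not leak timing information.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
)

const canaryLen = 32

// globalCanary is generated once when the program starts; every container
// copies it in front of its data.
var globalCanary = mustRandom(canaryLen)

// mustRandom returns n bytes from the operating system's CSPRNG.
func mustRandom(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return b
}

// layout places dataLen bytes of data at the very end of page and writes the
// global canary into the canaryLen bytes immediately before it. It assumes
// dataLen+canaryLen <= len(page).
func layout(page []byte, dataLen int) (data, canary []byte) {
	end := len(page)
	data = page[end-dataLen : end : end]
	canary = page[end-dataLen-canaryLen : end-dataLen]
	copy(canary, globalCanary)
	return data, canary
}

// canaryIntact reports whether canary still matches the global value,
// comparing in constant time.
func canaryIntact(canary []byte) bool {
	return subtle.ConstantTimeCompare(canary, globalCanary) == 1
}
```

For brevity the sketch accepts any `[]byte`; in the scheme described here, `page` would be the locked middle page.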
We can also instruct the kernel to disallow writes to the data pages, so that any attempt to modify the contents of the container triggers a SIGSEGV and the process crashes.
Note that however hard we try, we will only ever be lowering the likelihood of sensitive data being exposed, not eliminating the possibility altogether.