I can see the appeal on top of a hypervisor. You already have an actual kernel running, which you can use to administer the machine. Why pay the performance price of a full kernel in your VM if you are only going to run one application?
On bare metal, however, you will probably need to bundle a ton of things if you don't want to end up with an unmanageable device. That leaves the performance gain of compiling everything together, or truly disposable devices (maybe for embedded?), as the potential use cases I can envision. Are there more?
I'm not the author, but running on bare metal is more fun than running in a hypervisor. And removing layers from your stack lets you see what the cost and value of the layers are. I've only done bootstrapping on x86 and 8-bit processors (where you just set your reset vector), but x86 has a lot of esoteric setup stuff, and learning about the magic is interesting, if not totally useful.
Also, some runtime environments have a lot of manageability within; probably less than an OS, but maybe enough. Not sure about Go.
Using virtualization provides a simple common layer, so the host operating system can deal with drivers and the guest operating system can deal with more interesting things.
Things like kernel malloc to set up pages and whatnot, mapping the device address, allocating interrupts, DMA (maybe), PCI, logging, maybe locking, basic C library stuff (memcpy etc.). Some of those you can probably just stub out, but some of it you have to do (badly or not). Some of that you probably need or want to build anyway.
It's a fair bit of work, but if you want to support a lot of hardware, it's probably less work than porting a bunch of drivers.
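To make the list above a bit more concrete, here is a rough sketch of what the cheapest version of such a shim layer might look like in C. Every name in it (shim_kmalloc, shim_kfree, shim_memcpy, shim_lock, shim_unlock, SHIM_HEAP_SIZE) is made up for illustration and does not come from any real kernel or unikernel. It assumes a single core with interrupts off, so locking can be a no-op, and it uses a bump allocator that never frees, which is the "badly" option.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shim layer: all names here are illustrative only. */
#define SHIM_HEAP_SIZE (64 * 1024)
#define SHIM_ALIGN     16

/* Static arena standing in for memory a real kernel would hand out. */
static _Alignas(16) uint8_t shim_heap[SHIM_HEAP_SIZE];
static size_t shim_heap_used = 0;

/* "Kernel malloc" done badly: a bump allocator that only moves forward. */
void *shim_kmalloc(size_t size)
{
    /* Round the request up to the alignment boundary. */
    size_t rounded = (size + SHIM_ALIGN - 1) & ~(size_t)(SHIM_ALIGN - 1);

    /* Reject zero-size requests, overflow, and requests that do not fit. */
    if (size == 0 || rounded < size ||
        rounded > SHIM_HEAP_SIZE - shim_heap_used)
        return NULL;

    void *p = &shim_heap[shim_heap_used];
    shim_heap_used += rounded;
    return p;
}

/* Stubbed out: memory is never returned to the arena. */
void shim_kfree(void *p)
{
    (void)p;
}

/* Basic C library piece: byte-wise copy for non-overlapping buffers. */
void *shim_memcpy(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Locking stubs: assumed safe only on one core with interrupts disabled. */
void shim_lock(void) {}
void shim_unlock(void) {}
```

The parts that actually touch hardware (mapping device addresses, interrupts, DMA, PCI) are left out on purpose, since those are the ones you can't stub and they depend entirely on the platform.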
Depending on the unikernel design, though, you can significantly reduce user/kernel context switches beyond what you can with a tuned general-purpose OS, potentially to the point of always sitting in ring 0. How much difference that makes, of course, depends on your application and the quality of your various kernels.
https://en.wikipedia.org/wiki/IBM_CP-40
I wonder how long it'll take the rest of the world to achieve that.
As an example, if you have an image inferencing project, you might have a compute stick co-processor plugged into one slot and a camera in another. You might have a webapp as one application and ffmpeg as another. No reason not to isolate them as individual unikernels. There's also the case where one software team wrote one of those apps and a different team wrote the other.