Running OSv on Firecracker


Firecracker

Firecracker is a new lightweight KVM-based hypervisor written in Rust and announced at AWS re:Invent in 2018. Unlike QEMU, Firecracker specializes in hosting Linux guests only and is able to boot micro VMs in ~125 ms. Firecracker itself runs only on Linux, on bare-metal machines with Intel 64-bit CPUs or on i3.metal and other Nitro-based EC2 instances.

Firecracker implements a device model with the following I/O devices:

  • paravirtual VirtIO block and network devices over MMIO transport
  • serial console
  • partial keyboard controller
  • PICs (Programmable Interrupt Controllers)
  • IOAPIC (I/O Advanced Programmable Interrupt Controller)
  • PIT (Programmable Interval Timer)
  • KVM clock

Firecracker also exposes a REST API over a UNIX domain socket and can be confined through the so-called jailer to improve security. For more details, look at the design doc and the specification.

If you want to hear more about what it took to enhance OSv to make it boot in 5 ms on Firecracker (10 ms total including the host side), which is ~20 times faster than Linux on the same hardware (a 5-year-old MacBook Pro running Ubuntu 18.10), please read the remaining part of this article. In the next section I describe the implementation strategy I arrived at. The following three sections focus on what I had to change in the relevant areas: the booting process, VirtIO and ACPI. Finally, in the epilogue I describe the outcome of this exercise and possible improvements we can make and benefit from in the future.

If you want to try OSv on Firecracker before reading this article, follow this wiki.

Implementation Strategy

OSv implements VirtIO drivers and is very well supported on QEMU/KVM. Given that Firecracker is based on KVM and exposes VirtIO devices, at first it seemed OSv might boot and run on it out of the box with some small modifications. As the first experiments and further research showed, the task in reality was not that trivial: the initial attempts to boot OSv on Firecracker caused a KVM exit, and OSv did not even print its first boot message.

For starters I had to identify which OSv artifact to use as an argument to the Firecracker /boot-source API call. It could not be the plain usr.img (or its derivative) used with QEMU, as Firecracker expects a 64-bit ELF (Executable and Linking Format) vmlinux kernel. The closest thing in OSv-land is loader.elf (enclosed inside usr.img) - a 64-bit ELF file with a 32-bit entry point, start32. Finally, given that it is not possible to connect to OSv running on Firecracker with gdb (as it is with QEMU), I could not use this technique to figure out where things break.

It became clear to me that I should first focus on making OSv boot on Firecracker without block and network devices. Luckily, OSv can be built with RAM-FS, where application code is placed in the bootfs part of loader.elf.

Then I would enhance the VirtIO layer to support block and network devices over the MMIO transport. Initially these changes seemed straightforward to implement, but they turned out to be much more involved in the end.

Finally, I had to tweak some parts of OSv to make it work when ACPI (Advanced Configuration and Power Interface) is unavailable.

The next three sections describe each step of this plan in detail.

Booting

In order to make OSv boot on Firecracker, first I had to understand how the current OSv booting process works.

Originally OSv had been designed to boot in 16-bit mode (aka real mode), where it expects the hypervisor to load the MBR (Master Boot Record) - the first 512 bytes of the OSv image - at address 0x7c00 and execute it by jumping to that address. At this point the OSv bootloader (the code in these 512 bytes) loads the command line found in the next 63.5 KB of the image using interrupt 0x13. Then it loads the remaining part of the image, which is lzloader.elf (loader.elf plus fastlz decompression logic), at address 0x100000 in 32 KB chunks, again using interrupt 0x13 and switching back and forth between real and protected mode. Next it reads the size of available RAM using interrupt 0x15 and jumps to the code at the beginning of the 1st MB, which decompresses lzloader.elf in 1 MB chunks, starting from the tail and going backwards. Eventually, after loader.elf is placed in memory at address 0x200000 (the 2nd MB), the logic in boot16.S switches to protected mode and jumps to start32 to prepare the switch to long mode (64-bit). Please note that start32 is a 32-bit entry point of the otherwise 64-bit loader.elf. For more details please read this Wiki.

Firecracker, on the other hand, expects the image to be a vmlinux 64-bit ELF file and loads its LOAD segments into RAM at the addresses specified by the ELF program headers. Firecracker also sets the VM to long mode (aka 64-bit mode), sets the state of the relevant registers, and sets up paging tables mapping virtual memory to physical memory as expected by Linux. Finally it passes memory information and the boot command line in the boot_params structure and jumps to the vmlinux entry point startup_64 to let the Linux kernel continue its booting process.

So the challenge is: how do we modify the booting logic to support booting OSv as a 64-bit vmlinux-format ELF while retaining the ability to boot in real mode using the traditional usr.img image file? For sure we need to replace the current 32-bit entry point start32 of loader.elf with a 64-bit one - vmlinux_entry64 - that will be called by Firecracker (which will also load loader.elf into memory at 0x200000 as the ELF header demands). At the same time we also need to place start32 at a fixed offset so that boot16.S knows where to jump to.

So what exactly should the new vmlinux_entry64 do? Firecracker sets up the VM in a 64-bit state, and OSv already provides a 64-bit start64 function, so one could ask: why not simply jump to it and be done with it? Unfortunately this does not work (as I tested), because of slight differences between the memory paging tables and CPU setup Linux expects (and Firecracker provides for Linux) and what OSv expects. So possibly vmlinux_entry64 needs to reset the paging tables and CPU the OSv way? Alternatively, vmlinux_entry64 could switch back to protected mode and jump to start32, letting it set up the VM the OSv way. I tried that as well, but it did not work for some reason either.

Luckily we do not need to worry about segmentation, which Firecracker sets up as a flat memory model - typical in long mode and exactly what OSv expects.

In the end, based on many trial-and-error attempts, I came to the conclusion that vmlinux_entry64 should do the following:

  1. Extract the command line and memory information from the Linux boot_params structure, whose address is passed in by Firecracker in the RSI register, and copy them to another place structured the same way as if OSv had booted through boot16.S (please see extract_linux_boot_params for details).
  2. Reset the CR0 and CR4 control registers to reset global CPU features the OSv way.
  3. Reset the CR3 register to point to the OSv PML4 table, mapping the first 1 GB of memory one-to-one with 2 MB medium-size pages (for more information about memory paging please read this article).
  4. Finally, jump to start64 to complete the boot process and start OSv.

The code below is a slightly modified version of vmlinux_entry64 in vmlinux-boot64.S that implements the steps described above in GAS (GNU Assembler) syntax.

    # Call extract_linux_boot_params with the address of
    # boot_params struct passed in RSI register to
    # extract cmdline and memory information
    mov %rsi, %rdi
    call extract_linux_boot_params

    # Reset paging tables and other CPU settings the way
    # OSv expects it
    mov $BOOT_CR4, %rax
    mov %rax, %cr4

    lea ident_pt_l4, %rax
    mov %rax, %cr3

    # Enable long mode by writing to EFER register by setting
    # LME (Long Mode Enable) and NXE (No-Execute Enable) bits
    mov $0xc0000080, %ecx
    mov $0x00000900, %eax
    xor %edx, %edx
    wrmsr

    mov $BOOT_CR0, %rax
    mov %rax, %cr0

    # Continue 64-bit boot logic by jumping to start64 label
    mov $OSV_KERNEL_BASE, %rbp
    mov $0x1000, %rbx
    jmp start64

As you can see, making OSv boot on Firecracker was the trickiest part of the whole exercise.

Virtio

Unlike the booting process, enhancing the virtio layer in OSv was not as tricky or hard to debug, but it was the most labor-intensive part and required a lot of research, including reading the spec and the Linux code for comparison.

Before diving in, let us first get a glimpse of VirtIO and its purpose. The VirtIO specification defines standard virtual (sometimes called paravirtual) devices, including network, block, scsi and other device types. It effectively dictates how a hypervisor (host) should expose those devices as well as how a guest should detect, configure and interact with them at runtime in the form of a driver. The objective is to define devices that can operate in the most efficient way and minimize the number of performance-costly exits from guest to host.

Firecracker implements virtio MMIO block and net devices. MMIO (Memory-Mapped IO) is one of the three VirtIO transport layers (MMIO, PCI, CCW); it was modeled after PCI and differs mainly in how MMIO devices are configured and initialized. Unfortunately, OSv only implemented the PCI transport and was missing an MMIO implementation. To make things worse, it implemented the legacy (pre-1.0) version of virtio, from before the spec was finalized in 2016. So two things had to be done: refactor the OSv virtio layer to support both legacy and modern PCI devices, and implement virtio MMIO.

In order to design and implement the right changes, first I had to understand the existing implementation of the virtio layer. OSv has two orthogonal but related abstraction layers here - driver and device classes. The virtio::virtio_driver serves as a base class with common driver logic and is extended by the virtio::blk, virtio::net, virtio::scsi and virtio::rng classes to provide implementations for the relevant device types. For better illustration please look at this ASCII art:


 hw_device <|---
               | 
       pci::function <|--- 
                         |
                  pci::device
                         ^                 |-- virtio::net
                  (uses) |                 |
                         |                 |-- virtio::blk
 hw_driver <|--- virtio::virtio_driver <|--|
                                           |-- virtio::scsi
                                           |
                                           |-- virtio::rng

As you can tell from the graphic above, virtio_driver interacts directly with pci::device, so in order to add support for MMIO devices I had to refactor it to make it transport-agnostic. Of all the options I took into consideration, the least invasive and most flexible one involved creating a new abstraction to model a virtio device. To that end I ended up heavily refactoring the virtio_driver class and defining the following new virtual device classes:

  • virtio::virtio_device - abstract class modeling the interface of a virtio device, intended to be used by the refactored virtio::virtio_driver
  • virtio::virtio_pci_device - base class extending virtio_device and implementing common virtio PCI logic that delegates to pci::device
  • virtio::virtio_legacy_pci_device - class extending virtio_pci_device and implementing the legacy PCI device
  • virtio::virtio_modern_pci_device - class extending virtio_pci_device and implementing the modern PCI device; most differences between modern and legacy PCI devices lie in the initialization and configuration phase with its special configuration registers
  • virtio::mmio_device - class extending virtio_device and implementing the MMIO device

The method is_modern(), declared in the virtio_device class and overridden in its subclasses, is used in a few places in virtio_driver and its subclasses, mostly to drive the slightly different initialization logic of legacy and modern virtio devices.

For a better illustration of the changes and the relationship between the new and old classes, please see the ASCII-art UML-like class diagram below:


               |-- pci::function <|--- pci::device
               |                              ^
               |               (delegates to) |
               |                              |        |-- virtio_legacy_pci_device
 hw_device <|--|             --- virtio_pci_device <|--|
               |             |                         |-- virtio_modern_pci_device
               |             |
               |             v
               |-- virtio::virtio_device <|--- virtio::mmio_device
                   ---------------------
                   | bool is_modern()  |
                   ---------------------
                             ^             |-- virtio::net
                      (uses) |             |
                             |             |-- virtio::blk
 hw_driver <|--- virtio::virtio_driver <|--|
                                           |-- virtio::scsi
                                           |
                                           |-- virtio::rng

To recap, most of the coding went into a major refactoring of the virtio_driver class to make it transport-agnostic and delegate to virtio_device, extracting the PCI logic out of virtio_driver into virtio_pci_device and virtio_legacy_pci_device, and finally implementing the new virtio_modern_pci_device and virtio::mmio_device classes. Thanks to this approach the changes to the subclasses of virtio_driver (virtio::net, virtio::blk, etc.) were pretty minimal, and one of the critical classes - virtio::vring - stayed pretty much intact.

A big motivation for implementing the modern virtio PCI device (as opposed to implementing the legacy one only) was to have a way to exercise and test a modern virtio device with QEMU. That way I could have extra confidence that the heavy refactoring in virtio_driver was correct even before testing it with Firecracker, which exposes modern MMIO devices. There is also a great chance it will make it easier to enhance the virtio layer to support the new VirtIO 1.1 spec once it is finalized (for a good overview see here).

Lastly, given that MMIO devices cannot be detected in a similar fashion to PCI ones and instead are passed by Firecracker as part of the command line in the format the Linux kernel expects, I also had to enhance the OSv command line parsing logic to extract the relevant configuration bits. On top of that I added a boot parameter to skip PCI enumeration and that way save an extra 4-5 ms of boot time.

ACPI

The last and simplest part of the exercise was to fill in the gaps in OSv so that it can deal with the situation when ACPI is unavailable.

Firecracker does not implement ACPI, which OSv uses for power handling and to discover CPUs. Instead, OSv had to be changed to boot without ACPI and read CPU information from the MP table. For more information about the MP table read here or there. All in all, I had to enhance OSv in the following ways:

  • modify the ACPI-related logic to detect whether ACPI is present
  • modify the relevant places (CPU detection, power off) that rely on ACPI to continue with an alternative mechanism instead of aborting if ACPI is not present
  • modify the pvpanic probing logic to skip it if ACPI is not available

Epilogue

With all the changes described above implemented, OSv can boot on Firecracker.

OSv v0.53.0-6-gc8395118
2019-04-17T22:28:29.467736397 [anonymous-instance:WARN:vmm/src/lib.rs:1080] Guest-boot-time =   9556 us 9 ms,  10161 CPU us 10 CPU ms
	disk read (real mode): 0.00ms, (+0.00ms)
	uncompress lzloader.elf: 0.00ms, (+0.00ms)
	TLS initialization: 1.13ms, (+1.13ms)
	.init functions: 2.08ms, (+0.94ms)
	SMP launched: 3.43ms, (+1.35ms)
	VFS initialized: 4.12ms, (+0.69ms)
	Network initialized: 4.45ms, (+0.33ms)
	pvpanic done: 5.07ms, (+0.62ms)
	drivers probe: 5.11ms, (+0.03ms)
	drivers loaded: 5.46ms, (+0.35ms)
	ROFS mounted: 5.62ms, (+0.17ms)
	Total time: 5.62ms, (+0.00ms)
Hello from C code

The console log above, with bootchart information from an example run of OSv with Read-Only-FS on Firecracker, shows that it took slightly less than 6 ms to boot. As you can notice, OSv spent no time loading its image in real mode and decompressing it, which is expected because OSv is booted as an ELF and these two phases are completely bypassed.

Even though 5 ms is already a very low number, one can see that the TLS initialization and 'SMP launched' phases could be looked at to see if we can optimize them further. Other areas of interest are memory utilization - OSv needs a minimum of 18 MB to run on Firecracker - and network performance, which suffers a little compared to QEMU/KVM and might need to be optimized in Firecracker itself.

On the other hand, it is worth noting that the block device seems to work much faster - for example, mounting a ZFS filesystem is more than four times faster on Firecracker: on average 60 ms on Firecracker vs 260 ms on QEMU.

Looking toward the future, the Firecracker team is working on ARM support, and given that OSv already unofficially supports this platform and used to boot on XEN/ARM at some point, it might not be that difficult to make OSv boot on a future Firecracker ARM version.

Finally, this work might make it easier to boot OSv on NEMU and QEMU 4.0 in Linux direct-kernel mode. It might also make it easier to implement support for the new VirtIO 1.1 spec.

 


Original source: https://www.cnblogs.com/dream397/p/14187620.html
Posted by 完美者 on 2020-12-30.
