Goal
To improve inter-VM communication performance in data centers
Focusing point
I/O overheads
Existing technology does not assume
inter-VM communication exceeding 10 Gbps
Faster inter-VM communication is needed
Suppressing the hypervisor's load when sharing network devices
Importance of para-virtualized drivers
Some implementations such as virtio-net and vhost-net are well known
Para-virtualized drivers are useful
- Flexibility
- No hardware requirements
- Higher performance than fully virtualized drivers
Focusing point
I/O overheads of para-virtualized drivers
- VM Entry/Exit raised by interrupts
Memory copy from hypervisor to VM
A para-virtualized driver needs a memory copy from the HV
- The VM's memory region is allocated in user-land
- Even if the VM's I/O is handled in kernel-land on the hypervisor, a memory copy occurs when the VM does I/O
Why InfiniBand?
- Zero-copy
- InfiniBand can easily achieve zero-copy inter-VM communication
- Ethernet cannot easily achieve zero-copy inter-VM communication when multiple VMs share a device
- Low cost
- An InfiniBand HCA is cheaper than an Ethernet NIC in the same link-speed range
Approach
- Requirements
- Maximize bandwidth
- Delay is less than or comparable to the existing implementation
- CPU load of the HV is less than or comparable to the existing implementation
Approach of this study
- Use basic techniques to mitigate I/O overheads
- In addition, achieve zero-copy inter-VM communication
- Use Ethernet as the pseudo medium
Summary of InfiniBand
- Identifiers (see the sketch below)
- Globally Unique Identifier (GUID)
- Local Identifier (LID)
- Queue Pair Number (QPN)
- Reliable Connection (RC)
- Reliable Datagram (RD)
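As a concrete illustration, the sketch below reads these identifiers through libibverbs. It assumes a single HCA, port 1, and GID index 0; it is only an illustration, not part of the driver.

/* Sketch: query LID and GUID of the first HCA via libibverbs.
 * Assumptions: one HCA, port 1, GID index 0. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0])
        return 1;

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx)
        return 1;

    /* LID: assigned by the subnet manager, valid within the local subnet */
    struct ibv_port_attr port_attr;
    ibv_query_port(ctx, 1, &port_attr);

    /* GUID: the interface-ID half of GID index 0 is the port GUID */
    union ibv_gid gid;
    ibv_query_gid(ctx, 1, 0, &gid);

    printf("LID : 0x%04x\n", port_attr.lid);
    printf("GUID: 0x%016llx (network byte order)\n",
           (unsigned long long)gid.global.interface_id);

    /* A QPN is assigned by the HCA when a Queue Pair is created (qp->qp_num) */

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}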
Key points of InfiniBand
- Memory Registration (MR) for DMA
- Asynchronous communication using Queue Pairs (QP) and Completion Queues (CQ)
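The following sketch shows this asynchronous model in isolation: a buffer is registered for DMA, a send work request is posted to the QP, and the completion is reaped from the CQ. The protection domain, QP, and CQ are assumed to exist already, and the function name is illustrative.

/* Sketch of the asynchronous send path (assumptions: pd, qp, cq already
 * created and the QP ready to send; buf is an ordinary buffer). */
#include <stdint.h>
#include <stddef.h>
#include <infiniband/verbs.h>

int send_async(struct ibv_pd *pd, struct ibv_qp *qp, struct ibv_cq *cq,
               void *buf, size_t len)
{
    /* Memory Registration: pins the buffer and makes it DMA-able by the HCA */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad_wr;

    /* Post the work request to the send queue of the QP ... */
    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* ... and reap the completion asynchronously from the CQ */
    struct ibv_wc wc;
    int n;
    do {
        n = ibv_poll_cq(cq, 1, &wc);     /* busy-poll, for the example only */
    } while (n == 0);

    ibv_dereg_mr(mr);
    return (n > 0 && wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}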
API
- OpenFabrics Enterprise Distribution (OFED)
- By the OpenFabrics Alliance (OFA)
- Merged into the Linux kernel
- libibverbs
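For reference, a minimal libibverbs resource chain (protection domain, completion queue, queue pair) might look like the sketch below. The UD transport type and all sizes are assumptions chosen for illustration, not the actual configuration of this driver.

/* Sketch of the libibverbs resource hierarchy (assumption: ctx is an
 * already-opened device context, e.g. from ibv_open_device()). */
#include <infiniband/verbs.h>

struct ibv_qp *create_ud_qp(struct ibv_context *ctx)
{
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                      /* protection domain */
    struct ibv_cq *cq = ibv_create_cq(ctx, 256, NULL, NULL, 0); /* completion queue */

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .qp_type = IBV_QPT_UD,            /* datagram transport (illustrative choice) */
        .cap = {
            .max_send_wr  = 256,
            .max_recv_wr  = 256,
            .max_send_sge = 1,
            .max_recv_sge = 1,
        },
    };
    return ibv_create_qp(pd, &attr);      /* qp_num of the result is the QPN */
}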
Design
- Pseudo PCI device module
- Pointers to packet buffers are shared with the guest OS through a shared ring buffer (see the sketch below)
- Techniques to mitigate I/O overheads are applied
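A hypothetical view of such a shared ring is sketched below: the guest publishes pointers (guest-physical addresses) to its packet buffers instead of copying the frames, and the hypervisor-side module consumes them. All names, sizes, and synchronization details are illustrative, not the actual implementation.

/* Hypothetical sketch of a shared descriptor ring between the guest and the
 * pseudo PCI device; only pointers are exchanged, not packet data. */
#include <stdint.h>

#define RING_SIZE 256                     /* illustrative; power of two */

struct pkt_desc {
    uint64_t guest_addr;                  /* guest-physical address of the packet buffer */
    uint32_t len;                         /* length of the Ethernet frame */
    uint32_t flags;
};

struct shared_ring {
    volatile uint32_t head;               /* written by the guest (producer) */
    volatile uint32_t tail;               /* written by the hypervisor (consumer) */
    struct pkt_desc   desc[RING_SIZE];
};

/* Guest side: publish a packet buffer pointer instead of copying the data */
static inline int ring_push(struct shared_ring *r, uint64_t gaddr, uint32_t len)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_SIZE)
        return -1;                        /* ring full */
    r->desc[head % RING_SIZE] = (struct pkt_desc){ .guest_addr = gaddr, .len = len };
    __sync_synchronize();                 /* make the descriptor visible before head update */
    r->head = head + 1;
    return 0;
}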
Packet sending procedure
Summary of constructed network
- Ethernet is used as the pseudo medium on the InfiniBand network
- Ethernet frames are tunneled through the InfiniBand network (see the sketch below)
- InfiniBand's Q_Key (32 bit) is used to separate Ethernet segments
- Each VM execution process has its own FDB
- The FDB is updated only when the Q_Key of a received packet is correct
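The tunneling step could look roughly like the sketch below: an already-registered Ethernet frame is posted as a datagram send, and the 32-bit Q_Key carries the Ethernet segment ID. The address handle and destination QPN would come from the FDB described on the next slide; the helper and its parameters are assumptions, not the driver's actual code.

/* Hedged sketch: tunnel one Ethernet frame over an InfiniBand UD send,
 * using the Q_Key as the Ethernet segment identifier. */
#include <stdint.h>
#include <infiniband/verbs.h>

int tunnel_eth_frame(struct ibv_qp *qp, struct ibv_ah *dest_ah,
                     uint32_t dest_qpn, uint32_t segment_qkey,
                     struct ibv_mr *frame_mr, void *frame, uint32_t frame_len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)frame,       /* the Ethernet frame itself, no copy */
        .length = frame_len,
        .lkey   = frame_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
        .wr.ud = {
            .ah          = dest_ah,       /* address handle built from the destination LID/GUID */
            .remote_qpn  = dest_qpn,      /* destination QPN taken from the FDB */
            .remote_qkey = segment_qkey,  /* 32-bit Q_Key = Ethernet segment ID */
        },
    };
    struct ibv_send_wr *bad_wr;
    return ibv_post_send(qp, &wr, &bad_wr);
}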
FDB lookup
- The FDB is implemented as a hash table whose key is the MAC address (see the sketch below)
- The LID/GUID/QPN of the destination HV are stored in the FDB
- If there is no matching entry in the FDB when a packet is sent
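A hypothetical sketch of such an FDB is shown below: a chained hash table keyed by MAC address whose entries hold the LID/GUID/QPN of the destination HV, with learning gated on a matching Q_Key. All names are illustrative.

/* Hypothetical FDB sketch: MAC-keyed hash table storing LID/GUID/QPN. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FDB_BUCKETS 1024

struct fdb_entry {
    uint8_t  mac[6];                  /* key: destination MAC address */
    uint16_t lid;                     /* LID of the destination HV */
    uint64_t guid;                    /* GUID of the destination HV */
    uint32_t qpn;                     /* QPN of the destination VM process */
    struct fdb_entry *next;           /* chaining for hash collisions */
};

static struct fdb_entry *fdb[FDB_BUCKETS];

static unsigned fdb_hash(const uint8_t mac[6])
{
    unsigned h = 0;
    for (int i = 0; i < 6; i++)
        h = h * 31 + mac[i];
    return h % FDB_BUCKETS;
}

/* Lookup on the send path; NULL means "no matching entry". */
struct fdb_entry *fdb_lookup(const uint8_t mac[6])
{
    for (struct fdb_entry *e = fdb[fdb_hash(mac)]; e; e = e->next)
        if (memcmp(e->mac, mac, 6) == 0)
            return e;
    return NULL;
}

/* Learning on the receive path: update only if the packet's Q_Key matches
 * the Q_Key of this Ethernet segment. */
void fdb_learn(const uint8_t mac[6], uint16_t lid, uint64_t guid, uint32_t qpn,
               uint32_t pkt_qkey, uint32_t segment_qkey)
{
    if (pkt_qkey != segment_qkey)
        return;                       /* wrong segment: do not touch the FDB */
    struct fdb_entry *e = fdb_lookup(mac);
    if (!e) {
        e = calloc(1, sizeof(*e));
        if (!e)
            return;
        memcpy(e->mac, mac, 6);
        unsigned b = fdb_hash(mac);
        e->next = fdb[b];
        fdb[b] = e;
    }
    e->lid = lid; e->guid = guid; e->qpn = qpn;
}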
Evaluation
- Points of measurement
- Packet processing performance of inter-VM communication
Evaluation target
- Two HVs directly connected with InfiniBand QDR
- Evaluation targets
- A system constructed with vhost-net and eIPoIB
- vhost-net is a kernel-land implementation of the virtio-net backend
- eIPoIB is a protocol that realizes Ethernet communication over an InfiniBand network
- The proposed para-virtualized driver using InfiniBand
Traffic is generated from kernel-land on the guest OS
Measured with ping between two guest OSs (each VM is placed on a different HV)
Result - CPU load of HV
- Measured the HV's CPU load while two guest OSs communicate at a fixed rate
- Traffic generation runs for 60 seconds; the HV's CPU load is measured over 20 seconds, excluding about 20 seconds at the start and end
Conclusion
Designed and implemented a para-virtualized driver using InfiniBand
Zero-copy and other techniques for mitigating I/O overheads are applied
Met the requirements in terms of performance
- Packet processing rate is about 800 kpps
- RTT is about 0.8 msec
- CPU load of the HV is acceptable
Challenges for the future
- Further improvement of the performance
InfiniBand is a high-speed I/O bus architecture and interconnect for mission-critical and HPC servers/clusters, offering very high RAS (reliability, availability, serviceability). As a system interconnect, it is also characterized by lower latency than other mechanisms, in addition to its RAS features.