by shigemk2

For the time being, I will only write about technical topics

(LinuxCon Staging) Design & Implementation of Para-Virtualized Driver For Faster Inter-VM Communication Using Infiniband #linuxcon

Goal

To improve inter-VM communication performance in the data center

Focus point

I/O overheads

Existing technology does not assume

inter-VM communication at over 10 Gbps

Faster inter-VM communication is needed

Suppressing the load on the hypervisor, sharing network devices

Importance of para-virtualized drivers

Some implementations, such as virtio-net and vhost-net, are well known

A para-virtualized driver is useful

  • Flexibility
  • No hardware requirement
  • Higher performance than a fully virtualized driver

Focus point

I/O overheads of the para-virtualized driver

  • VM Entry/Exit raised by interrupts
  • Memory copy from hypervisor to VM

  • The para-virtualized driver needs a memory copy from the HV

  • The VM's memory region is allocated in user-land
  • Even if the VM is executed in kernel-land on the hypervisor, a memory copy occurs when the VM does I/O
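The memory-copy overhead in the list above can be illustrated with a toy user-space example. This is only a sketch, not the driver's code: a `bytearray` named `vm_buffer` (a hypothetical name) stands in for a packet buffer in VM memory, and a `memoryview` stands in for sharing that buffer instead of duplicating it.

```python
# Toy illustration only: a bytearray stands in for a packet buffer in VM memory.
vm_buffer = bytearray(b"packet-payload")

# Copy path: the receiver gets its own duplicate of the bytes.
copied = bytes(vm_buffer)

# Zero-copy path: the receiver gets a view onto the same memory.
shared = memoryview(vm_buffer)

# A later write by the sender is visible through the view but not in the copy.
vm_buffer[0:6] = b"PACKET"

print(copied)         # the stale duplicate made before the write
print(bytes(shared))  # reflects the write, because no duplicate was made
```

The copy costs CPU time proportional to the packet size on every I/O, which is the overhead the talk is trying to avoid.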

Why Infiniband?

  • zero-copy
  • Infiniband can easily achieve zero-copy inter-VM communication
  • Ethernet cannot easily achieve zero-copy inter-VM communication when multiple VMs share a device

  • Low-cost

  • Infiniband HCA is cheaper than Ethernet NIC at the same link speed range

Approach

  • Requirement
  • Maximize bandwidth
  • Delay is less than or comparable to the existing implementation
  • CPU load of HV is less than or comparable to the existing implementation

  • Approach of this study

  • Use basic techniques to mitigate I/O overheads
  • In addition, achieve zero-copy inter-VM communication
  • Use Ethernet as pseudo media

Summary of Infiniband

  • Identifier
  • Globally Unique Identifier(GUID)
  • Local Identifier(LID)
  • Queue Pair Number (QPN)
  • Reliable Connection(RC)
  • Reliable Datagram(RD)

  • Key points of infiniband

  • Memory Registration(MR) for DMA
  • Asynchronous method for communication using Queue Pair(QP) and Completion Queue(CQ)
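The asynchronous QP/CQ model above can be sketched as a small simulation. This is not the verbs API: `ToyQueuePair`, `WorkRequest`, `Completion` and `hardware_step` are hypothetical names, and only the method names `post_send` and `poll_cq` are chosen to echo `ibv_post_send` and `ibv_poll_cq` from libibverbs. Memory registration is not modelled here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int      # caller-chosen tag, handed back in the completion
    payload: bytes


@dataclass
class Completion:
    wr_id: int
    status: str


class ToyQueuePair:
    """Hypothetical stand-in for a QP with an attached CQ (not the verbs API)."""

    def __init__(self):
        self.send_queue = deque()
        self.completion_queue = deque()

    def post_send(self, wr):
        # Posting only enqueues the request and returns immediately.
        self.send_queue.append(wr)

    def hardware_step(self):
        # Stands in for the HCA finishing one queued request on its own.
        if self.send_queue:
            wr = self.send_queue.popleft()
            self.completion_queue.append(Completion(wr.wr_id, "success"))

    def poll_cq(self, max_entries):
        # Return up to max_entries finished completions without blocking.
        done = []
        while self.completion_queue and len(done) < max_entries:
            done.append(self.completion_queue.popleft())
        return done


qp = ToyQueuePair()
qp.post_send(WorkRequest(wr_id=1, payload=b"frame-1"))
before = qp.poll_cq(4)   # nothing has completed yet
qp.hardware_step()
after = qp.poll_cq(4)    # the completion for wr_id 1 is now available
```

The point of the split is that the sender never waits inside `post_send`; it learns the outcome later by polling the completion queue.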

  • API

  • OpenFabrics Enterprise Distribution(OFED)
  • By the OpenFabrics Alliance(OFA)
  • Merged in Linux kernel
  • libibverbs

Design

  • Pseudo PCI device module
  • Sharing pointers of packet buffer with shared ring buffer on Guest OS
  • Techniques to mitigate I/O overheads are applied
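The idea of sharing packet-buffer pointers through a shared ring buffer can be sketched as follows. This is a minimal illustration under assumptions, not the driver's layout: `DescriptorRing`, `packet_memory` and the `(offset, length)` descriptor format are hypothetical, with an offset into a shared `bytearray` playing the role of a pointer.

```python
class DescriptorRing:
    """Fixed-size ring of (offset, length) descriptors; names are hypothetical."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.head = 0  # count of descriptors pushed by the producer
        self.tail = 0  # count of descriptors popped by the consumer

    def push(self, offset, length):
        if self.head - self.tail == self.size:
            return False  # ring is full
        self.slots[self.head % self.size] = (offset, length)
        self.head += 1
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # ring is empty
        desc = self.slots[self.tail % self.size]
        self.tail += 1
        return desc


# Shared packet memory: only descriptors travel through the ring.
packet_memory = bytearray(64)
ring = DescriptorRing(4)

# Producer side: write the frame once, then publish where it lives.
frame = b"hello"
packet_memory[8:8 + len(frame)] = frame
ring.push(8, len(frame))

# Consumer side: read the descriptor and view the payload in place.
offset, length = ring.pop()
received = memoryview(packet_memory)[offset:offset + length]
```

Only two small integers cross the ring per packet, so the payload itself is never duplicated between producer and consumer.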

Packet sending procedure

Summary of constructed network

  • Ethernet is used as pseudo media on the Infiniband network
  • Ethernet frames are tunneled through the Infiniband network
  • Infiniband's Qkey(32bit) is used to separate Ethernet segments
  • Each VM execution process has its own FDB
  • FDB is updated only when Qkey of received packet is correct
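The Qkey-gated FDB update can be sketched like this. It is an illustration under assumptions: the notes do not say which field is learned, so the sketch follows the usual learning-bridge convention of keying on the source MAC of the tunneled frame, and `Fdb`, `HvAddress` and `learn` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HvAddress:
    lid: int
    guid: int
    qpn: int


class Fdb:
    """Per-VM forwarding database; hypothetical sketch, not the driver's code."""

    def __init__(self, expected_qkey):
        self.expected_qkey = expected_qkey
        self.entries = {}  # MAC address (6 bytes) -> HvAddress

    def learn(self, frame, sender, qkey):
        # Packets from another Ethernet segment carry a different Qkey
        # and must not touch this segment's table.
        if qkey != self.expected_qkey:
            return False
        src_mac = frame[6:12]  # bytes 6..11 of an Ethernet header
        self.entries[src_mac] = sender
        return True


dst = bytes.fromhex("ffffffffffff")
src = bytes.fromhex("525400000001")
frame = dst + src + b"\x08\x00" + b"data"
sender = HvAddress(lid=2, guid=0x0002C90300001111, qpn=0x40)

fdb = Fdb(expected_qkey=0x1234)
rejected = fdb.learn(frame, sender, qkey=0x9999)  # wrong segment, ignored
accepted = fdb.learn(frame, sender, qkey=0x1234)  # same segment, learned
```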

FDB lookup

  • FDB is implemented as a hashtable whose key is the MAC address
  • LID/GUID/QPN of the destination HV are stored in FDB
  • If there is no matching entry in FDB at packet sending
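A lookup along these lines might look like the sketch below, using a Python `dict` as the hashtable. The names `HvAddress` and `lookup_destination` and all address values are hypothetical. The notes do not record what the driver does on a miss, so the sketch simply returns `None` and leaves that decision to the caller.

```python
from typing import NamedTuple, Optional


class HvAddress(NamedTuple):
    lid: int
    guid: int
    qpn: int


# Hashtable keyed by MAC address; the values here are made up.
fdb = {
    bytes.fromhex("525400000002"): HvAddress(lid=3, guid=0x0002C90300001234, qpn=0x48),
}


def lookup_destination(table, frame: bytes) -> Optional[HvAddress]:
    # The destination MAC is the first 6 bytes of an Ethernet frame.
    dst_mac = frame[0:6]
    return table.get(dst_mac)  # None when there is no matching entry


known = bytes.fromhex("525400000002") + bytes.fromhex("525400000001") + b"\x08\x00"
unknown = bytes.fromhex("5254000000ff") + bytes.fromhex("525400000001") + b"\x08\x00"

hit = lookup_destination(fdb, known)
miss = lookup_destination(fdb, unknown)
```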

Evaluation

  • Points of measurement
  • Packet processing performance of inter-vm communication

Evaluation target

  • 2 HVs directly connected with Infiniband QDR
  • Evaluation target
  • A system constructed with vhost-net and eIPoIB
  • vhost-net is a kernel-land implementation of virtio-net
  • eIPoIB is a protocol that realizes Ethernet communication over an Infiniband network
  • This para-virtualized driver using Infiniband

  • Traffic is generated from kernel-land on the Guest OS

  • Measured with ping between 2 Guest OSs (each VM is placed on a different HV)

Result - CPU load of HV

  • Measured the load of the HV's CPU when 2 Guest OSs communicate at a fixed rate
  • Traffic is generated for 60 seconds, and the HV's CPU load is measured for 20 seconds, excluding about 20 seconds on either side

Conclusion

Designed and implemented a para-virtualized driver using Infiniband

  • Zero-copy and other techniques for mitigating I/O overheads are applied
  • Met the requirements in terms of performance
  • Packet processing rate is about 800 kpps
  • RTT is about 0.8 msec
  • CPU load of HV is acceptable

Challenges for the future

  • Further improvement of the performance

edenden · GitHub

InfiniBand - Wikipedia

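InfiniBand is a high-speed I/O bus architecture and interconnect for mission-critical and HPC servers and clusters, offering very high RAS (reliability, availability, serviceability). As a system-to-system interconnect, in addition to its RAS features, it is also characterized by lower latency than other mechanisms.

Chapter 15: KVM Para-virtualized Drivers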