by shigemk2


(LinuxCon Staging) Design & Implementation of Para-Virtualized Driver For Faster Inter-VM Communication Using Infiniband #linuxcon


To improve inter-VM communication performance in the data center

focusing point

i/o overheads

Existing technology does not assume

more than 10 Gbps of inter-VM communication

Faster inter-VM communication is needed

Suppressing the load of the hypervisor while sharing network devices

importance of para-virtualized driver

Some implementations such as virtio-net and vhost-net are well known

para-virtualized driver is useful

  • Flexibility
  • No hardware requirement
  • Higher performance than a fully virtualized driver

Focusing point

I/O overheads of the para-virtualized driver

  • VM Entry/Exit raised by interrupts
  • Memory copy from the hypervisor to the VM

  • A para-virtualized driver needs memory copies from the HV

  • The VM's memory region is allocated in user-land
  • Even if the VM is executed in kernel-land on the hypervisor, a memory copy occurs when the VM does I/O

Why Infiniband?

  • Zero-copy
  • InfiniBand can easily achieve zero-copy inter-VM communication
  • Ethernet cannot easily achieve zero-copy inter-VM communication when multiple VMs share a device

  • Low-cost

  • An InfiniBand HCA is cheaper than an Ethernet NIC in the same link-speed range


  • Requirements
  • Maximize bandwidth
  • Delay is less than or comparable to the existing implementation
  • CPU load of the HV is less than or comparable to the existing implementation

  • Approach of this study

  • Use basic techniques to mitigate I/O overheads
  • In addition, achieve zero-copy inter-VM communication
  • Use Ethernet as the pseudo media

Summary of InfiniBand

  • Identifiers
  • Globally Unique Identifier (GUID)
  • Local Identifier (LID)
  • Queue Pair Number (QPN)
  • Reliable Connection (RC)
  • Reliable Datagram (RD)

  • Key points of InfiniBand

  • Memory Registration (MR) for DMA
  • Asynchronous communication using a Queue Pair (QP) and a Completion Queue (CQ)

  • API

  • OpenFabrics Enterprise Distribution (OFED)
  • By the OpenFabrics Alliance (OFA)
  • Merged into the Linux kernel
  • libibverbs


  • Pseudo PCI device module
  • Sharing pointers to packet buffers via a shared ring buffer on the Guest OS
  • Techniques to mitigate I/O overheads are applied
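A minimal sketch (in Python, for illustration only) of the pointer-sharing idea above: the guest posts packet-buffer descriptors into a fixed-size shared ring, and the host side consumes the pointers without copying the payload. All names here are hypothetical; the talk does not show the actual ring layout.

```python
class SharedRing:
    """Single-producer/single-consumer ring of packet-buffer descriptors.

    Models a ring shared between a guest and the pseudo PCI device:
    only (address, length) descriptors move through the ring, not data.
    """

    def __init__(self, size=256):
        self.size = size                  # number of slots
        self.slots = [None] * size        # descriptors: (guest_addr, length)
        self.head = 0                     # next slot the guest fills
        self.tail = 0                     # next slot the host consumes

    def post(self, guest_addr, length):
        """Guest side: publish a pointer to a packet buffer (no copy)."""
        if (self.head - self.tail) == self.size:
            return False                  # ring full, caller must retry
        self.slots[self.head % self.size] = (guest_addr, length)
        self.head += 1
        return True

    def poll(self):
        """Host side: take the next descriptor, or None if the ring is empty."""
        if self.tail == self.head:
            return None
        desc = self.slots[self.tail % self.size]
        self.tail += 1
        return desc
```

Because only descriptors cross the ring, the host can hand the referenced guest memory directly to the InfiniBand HCA, which is what makes the zero-copy path possible.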

Packet sending procedure

Summary of constructed network

  • Ethernet is used as the pseudo media on the InfiniBand network
  • Ethernet frames are tunneled through the InfiniBand network
  • InfiniBand's Q_Key (32 bit) is used to split Ethernet segments
  • Each VM execution process has its own FDB
  • The FDB is updated only when the Q_Key of a received packet is correct
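The receive-side rule above can be sketched as follows (a hypothetical Python model, not the actual driver code): the 32-bit Q_Key carried with each datagram identifies the Ethernet segment, and the per-VM FDB learns a source MAC only when the Q_Key matches.

```python
def on_receive(fdb, my_qkey, pkt):
    """Accept a tunneled Ethernet frame and update the FDB.

    pkt is a dict with 'qkey', 'src_mac', and the sender's InfiniBand
    address ('lid', 'guid', 'qpn'). Returns True if the frame is accepted.
    """
    if pkt["qkey"] != my_qkey:
        return False        # different segment: drop, and do NOT update the FDB
    # Learn/update where this source MAC lives (MAC -> LID/GUID/QPN).
    fdb[pkt["src_mac"]] = (pkt["lid"], pkt["guid"], pkt["qpn"])
    return True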

FDB lookup

  • The FDB is implemented as a hash table whose key is the MAC address
  • The LID/GUID/QPN of the destination HV are stored in the FDB
  • If there is no matching entry in the FDB at packet sending
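The send-side lookup described above amounts to a hash-table probe keyed by the destination MAC (a sketch in Python; the real driver presumably implements this in kernel code):

```python
def fdb_lookup(fdb, dst_mac):
    """Return the (lid, guid, qpn) of the HV hosting dst_mac, or None on a miss.

    The notes leave the miss behavior unstated, so this sketch just
    reports the miss to the caller.
    """
    return fdb.get(dst_mac)
```

On a hit, the returned LID/GUID/QPN is everything needed to address the InfiniBand datagram to the destination HV directly, with no centralized switch lookup.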


  • Points of measurement
  • Packet processing performance of inter-VM communication

Evaluation target

  • Directly connected 2 HVs with InfiniBand QDR
  • Evaluation targets
  • A system constructed with vhost-net and eIPoIB
  • vhost-net is a kernel-land implementation of virtio-net
  • eIPoIB is a protocol that realizes Ethernet communication over an InfiniBand network
  • This para-virtualized driver using InfiniBand

  • Traffic is generated from kernel-land on the Guest OS

  • Measured with ping between 2 Guest OSs (each VM is placed on a different HV)

Result - CPU load of HV

  • Measured the load of the HV's CPU when 2 Guest OSs communicate at a fixed rate
  • Traffic generation runs for 60 seconds, and the HV's CPU load is measured over 20 seconds of that window, excluding around 20 seconds at each end


Designed and implemented a para-virtualized driver using InfiniBand

Zero-copy and other techniques for mitigation of I/O overheads are applied

Met the requirements in terms of performance

  • Packet processing rate is about 800 kpps
  • RTT is about 0.8 msec
  • CPU load of the HV is acceptable

Future work: further improvement of the performance

edenden · GitHub

InfiniBand - Wikipedia


Chapter 15: KVM Para-virtualized Drivers