
Latency vs. Queue Length

Next we examine how IO latency varies with overall load (queue length) at the array. We experimented with one to five hosts accessing the same array. Each host generates a uniform workload of 16 KB IOs, 67% reads and 70% random, keeping 32 IOs outstanding. Figure 6 shows the aggregate throughput and average latency observed in the system as contention at the array increases. Throughput peaks at three hosts, but overall latency continues to grow with load. Ideally, we would like to operate at the lowest latency at which bandwidth remains high, so that the array is fully utilized without excessive queuing delay.
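A back-of-the-envelope application of Little's law (our framing for this discussion, not a measurement from the experiment) explains the trend: with $n$ hosts each keeping 32 IOs outstanding, the array sees roughly $32n$ pending IOs, so the average latency is approximately $L \approx 32n/T$, where $T$ is the aggregate throughput in IOPS. Once $T$ saturates (here, around three hosts), latency grows nearly linearly with the number of hosts, which matches the behavior in Figure 6.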

For uniform workloads, we also expect a good correlation between queue size and overall throughput. To verify this, we configured seven hosts to access a 400 GB volume on a 5-disk RAID-5 disk group. Each host runs one VM with an 8 GB virtual disk. We report data for a workload of 32 KB IOs with 67% reads, 70% random and 32 IOs pending. Figure 7 presents results for two different static host-level window size settings: (a) 32 for all hosts and (b) 16 for hosts 5, 6 and 7.
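For concreteness, the static host-level window settings above act as a cap on the number of IOs a host keeps outstanding at the array; IOs beyond the window wait in a host-side queue. The sketch below is a minimal illustrative model we wrote for this discussion; the names (HostIoWindow, issue_fn) are hypothetical and not taken from the actual host implementation.

from collections import deque

class HostIoWindow:
    """Per-host issue window: at most window_size IOs outstanding at the array."""

    def __init__(self, window_size, issue_fn):
        self.window_size = window_size   # e.g., 32 for hosts 1-4, 16 for hosts 5-7
        self.issue_fn = issue_fn         # callback that actually sends an IO to the array
        self.outstanding = 0             # IOs currently pending at the array
        self.waiting = deque()           # IOs queued at the host, waiting for a window slot

    def submit(self, io):
        # Called when a VM hands an IO to the host.
        self.waiting.append(io)
        self._issue_more()

    def on_completion(self):
        # Called when the array completes an IO; frees one window slot.
        self.outstanding -= 1
        self._issue_more()

    def _issue_more(self):
        # Issue queued IOs while the window has free slots.
        while self.waiting and self.outstanding < self.window_size:
            self.outstanding += 1
            self.issue_fn(self.waiting.popleft())

# Example: a throttled host (window 16) and an unthrottled one (window 32).
throttled = HostIoWindow(16, issue_fn=lambda io: None)
unthrottled = HostIoWindow(32, issue_fn=lambda io: None)

With 32 IOs always pending in each VM, a window of 16 leaves roughly half of them queued at the host, which helps explain the roughly doubled end-to-end latency the throttled VMs see below.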

We observe that the VMs on the throttled hosts receive roughly half the throughput ($\sim$42 IOPS) of the other VMs ($\sim$85 IOPS), and their latency ($\sim$780 ms) is roughly double that of the others ($\sim$360 ms). The reduced throughput is a direct result of throttling, and the higher latency arises because a VM's IOs are queued at its host. The device latency measured at the hosts (as opposed to in the VM, which would include time spent in host queues) is similar for all hosts in both experiments. The overall latency decreases when one or more hosts are throttled, since there is less load on the array. For example, in the second experiment, the overall average latency at each host drops from $\sim$470 ms to $\sim$375 ms when the window size is 16 for hosts 5, 6, and 7.
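These figures are consistent with each VM keeping 32 IOs outstanding: by Little's law, the product of a VM's IOPS and its end-to-end latency should stay near 32 whether or not its host is throttled. As a quick sanity check (ours, not part of the original measurements), $42 \times 0.78 \approx 33$ for the throttled VMs and $85 \times 0.36 \approx 31$ for the others.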

Figure 6: Overall bandwidth and latency observed by multiple hosts as the number of hosts is increased from 1 to 5. (a) Aggregate bandwidth (MB/s); (b) average latency (ms).


Table 2: Throughput ($T$, in IOPS) and latency ($L$, in ms) observed by four hosts for different workloads and queue lengths ($Q$).

Workload                       Phase 1             Phase 2
Size    Read    Random         Q    T      L       Q    T      L
16 KB   70%     60%            32   1160   26      16   640    24
16 KB   100%    100%           32   880    35      32   1190   27
8 KB    75%     0%             32   1280   25      16   890    17
8 KB    90%     100%           32   900    36      32   1240   26


Figure 7: VM bandwidth and latency observed when the queue length is $Q=32$ for all hosts, and when $Q=16$ for some hosts. (a) Average IOPS; (b) average latency (ms).

We also experimented with four hosts sending different workloads to the array while varying their queue lengths in two phases. Table 2 reports the workload descriptions and the corresponding throughput and latency values observed at the hosts. In phase 1, every host has a queue length of 32; in phase 2, we lowered the queue length of two hosts to 16. This experiment demonstrates two important properties. First, a host's throughput decreases roughly in proportion to its queue length (e.g., the first workload drops from 1160 to 640 IOPS when its queue length is halved). Second, if a host receives higher throughput at some queue length $Q$ because the array treats its workload preferentially, it continues to receive preferential treatment at the smaller queue length $Q/2$. This is desirable because overall efficiency improves when higher throughput is given to request streams that are less expensive for the array to process.
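As an added consistency check (ours, not part of the original analysis), Little's law predicts a per-host throughput of roughly $Q/L$ IOPS. The short script below, using the values copied from Table 2, shows that this prediction matches the measured throughput to within about 10% in both phases.

# Sanity-check Table 2 against Little's law: predicted IOPS ~= Q / L.
# Each tuple is (label, Q, measured_T_iops, L_ms), taken from the table above.
rows = [
    ("16 KB, 70% read, 60% random, phase 1", 32, 1160, 26),
    ("16 KB, 70% read, 60% random, phase 2", 16,  640, 24),
    ("16 KB, 100% read, 100% random, phase 1", 32,  880, 35),
    ("16 KB, 100% read, 100% random, phase 2", 32, 1190, 27),
    ("8 KB, 75% read, 0% random, phase 1", 32, 1280, 25),
    ("8 KB, 75% read, 0% random, phase 2", 16,  890, 17),
    ("8 KB, 90% read, 100% random, phase 1", 32,  900, 36),
    ("8 KB, 90% read, 100% random, phase 2", 32, 1240, 26),
]

for label, q, t_measured, l_ms in rows:
    t_predicted = q / (l_ms / 1000.0)   # Little's law: N = T * L, so T = N / L
    error = 100.0 * (t_predicted - t_measured) / t_measured
    print(f"{label}: measured {t_measured} IOPS, predicted {t_predicted:.0f} IOPS ({error:+.1f}%)")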
