A key performance attribute of a protocol is its scalability with respect to the number of clients that can be supported by the server. If the network paths or I/O channels are not the bottleneck, the scalability is determined by the server CPU utilization for a particular benchmark.
Table 9 depicts the percentile of the server CPU utilization reported every 2 seconds by vmstat for the various benchmarks. The table shows that the server utilization for iSCSI is lower than that of NFS. The server utilization is governed by the processing path and the amount of processing for each request. The lower utilization of iSCSI can be attributed to the shorter processing path seen by iSCSI requests. In the case of iSCSI, a block read or write request at the server traverses the network layer, the SCSI server layer, and the low-level block device driver. In the case of NFS, an RPC call received by the server traverses the network layer, the NFS server layer, the VFS layer, the local file system, the block layer, and the low-level block device driver. Our measurements indicate that the server processing path for NFS requests is twice as long as that for iSCSI requests. This is confirmed by the server CPU utilization measurements for the data-intensive TPC-C and TPC-H benchmarks. In these benchmarks, the server CPU utilization for NFS is twice that of iSCSI.
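The two server-side paths described above can be written down as ordered lists of layers, as in the minimal Python sketch below. The layer names are taken from the text; treating every layer as equally costly is an assumption made here for illustration only, since the factor of two reported above comes from measurements rather than from counting layers. The helper names `path_length_ratio` and `layers_only_in` are hypothetical.

```python
# Server-side layers traversed by one request, in order, as listed in the text.
ISCSI_PATH = [
    "network layer",
    "SCSI server layer",
    "low-level block device driver",
]
NFS_PATH = [
    "network layer",
    "NFS server layer",
    "VFS layer",
    "local file system",
    "block layer",
    "low-level block device driver",
]


def path_length_ratio(path_a, path_b):
    """Ratio of layer counts; assumes (for illustration) equal cost per layer."""
    return len(path_a) / len(path_b)


def layers_only_in(path_a, path_b):
    """Layers that path_a traverses and path_b does not, in traversal order."""
    return [layer for layer in path_a if layer not in path_b]


if __name__ == "__main__":
    print(path_length_ratio(NFS_PATH, ISCSI_PATH))
    print(layers_only_in(NFS_PATH, ISCSI_PATH))
```

Under the equal-cost assumption the layer count alone is consistent with the measured factor of two, but the per-layer costs in a real server will differ.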
The difference is exacerbated for meta-data intensive workloads. An NFS request that triggers a meta-data lookup at the server can greatly lengthen the processing path, because meta-data reads require multiple traversals of the VFS layer, the file system, the block layer, and the block device driver. The number of traversals depends on the degree of meta-data caching in the NFS server. The longer processing path explains the large disparity in the observed CPU utilizations for PostMark. The PostMark benchmark tends to defeat the meta-data caching on the NFS server because of the random nature of transaction selection. This causes the server CPU utilization to increase significantly, since multiple block reads may be needed to satisfy a single NFS data read.
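The dependence on meta-data caching can be illustrated with a toy model, sketched below in Python. It is not taken from the measurements: it assumes that one NFS data read costs one block read for the data itself plus one block read for each meta-data lookup that misses the server cache. The function name `expected_block_reads` and both parameters are hypothetical.

```python
def expected_block_reads(metadata_lookups, cache_hit_rate):
    """Expected block reads needed to satisfy one NFS data read (toy model).

    Assumptions (illustrative only): the data block is always read from disk,
    and every meta-data lookup that misses the cache costs exactly one
    additional block read.
    """
    if metadata_lookups < 0:
        raise ValueError("metadata_lookups must be non-negative")
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must lie in [0, 1]")
    # One read for the data, plus one per meta-data lookup that misses.
    return 1 + metadata_lookups * (1.0 - cache_hit_rate)


if __name__ == "__main__":
    # A fully effective cache versus a fully defeated one, for 3 lookups.
    print(expected_block_reads(3, 1.0))
    print(expected_block_reads(3, 0.0))
```

In this model a workload that defeats the cache, as PostMark tends to, pushes the cost of each data read toward the upper bound of one block read per lookup plus one for the data.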
While iSCSI shows a better server CPU utilization profile, it is worthwhile to investigate the effect of the two protocols on client CPU utilization. If the client CPU utilization of one protocol has a better profile than that of the other protocol, then the first protocol will be able to scale to a larger number of servers per client.
Table 10 depicts the percentile of the client CPU utilization reported every 2 seconds by vmstat for the various benchmarks. For the data-intensive TPC-C and TPC-H benchmarks, the clients are CPU-saturated for both the NFS and iSCSI protocols, and thus there is no difference in the client CPU utilizations for these macro-benchmarks. However, for the meta-data intensive PostMark benchmark, the NFS client CPU utilization is an order of magnitude lower than that of iSCSI. This is not surprising, because the bulk of the meta-data processing is done at the server in the case of NFS, while the reverse is true in the case of the iSCSI protocol.
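A percentile of this kind can be derived from periodic vmstat output along the lines of the following Python sketch. It assumes the usual Linux vmstat layout, in which a header row names the columns and `id` is the idle percentage, so that utilization is taken as 100 minus idle. The nearest-rank percentile definition, the function names, and the sample numbers are assumptions made for illustration, not the procedure or data behind the tables.

```python
import math


def cpu_utilization_samples(vmstat_text):
    """Return one CPU utilization value (100 - idle) per vmstat data row."""
    samples = []
    idle_col = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "id" in fields:
            # Column-name header row: remember where the idle column is.
            idle_col = fields.index("id")
            continue
        if idle_col is None or not fields[0].isdigit():
            # Banner row, or data seen before any header: skip it.
            continue
        samples.append(100 - int(fields[idle_col]))
    return samples


def percentile(samples, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of a non-empty list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up sample output, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 100000  20000 300000    0    0     5    10  100  200  5  5 90  0  0
 2  0      0 100000  20000 300000    0    0   500   900  800  900 30 20 50  0  0
 1  0      0 100000  20000 300000    0    0   300   400  500  600 20 10 70  0  0
"""

if __name__ == "__main__":
    utilization = cpu_utilization_samples(SAMPLE)
    print(utilization, percentile(utilization, 95))
```

When vmstat is run with an interval, its first data row reports averages since boot, so it may be preferable to drop that row before computing a percentile.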