We faced the following issues in implementing a layered device driver in Linux:
Blocking in Interrupt Context

For interrupt-driven block drivers, the strategy routine (request_fn) may be called from interrupt context and therefore must not block. On Solaris this restriction does not arise, because interrupts run as threads that are allowed to block. In a layered implementation, the request_fn needs to call the ll_rw_block routine so that the buffers can be placed on the request queue of the underlying device.
The ll_rw_block routine in Linux, however, can block: request structures are allocated from a global array, and if every slot in the array is in use the routine must wait for one to be freed. One solution would be to modify the ll_rw_block code so that, when no request structure is available, it returns immediately and queues a task on the scheduler task queue to be executed later.
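A minimal sketch of that deferral mechanism, assuming the task-queue interface of that era (queue_task() on tq_scheduler, which runs the task in process context at the next schedule()); tss_retry_pending() and tss_pending_task are illustrative names, not part of any existing interface:

#include <linux/tqueue.h>
#include <linux/sched.h>

/*
 * Hypothetical sketch: instead of letting ll_rw_block() sleep for a free
 * request structure, remember the work and retry it from a task queued on
 * tq_scheduler, which runs in process context at the next schedule().
 */
static void tss_retry_pending(void *data)
{
    /* re-issue the buffers that could not get a request structure earlier */
}

static struct tq_struct tss_pending_task = {
    routine: tss_retry_pending,
    data:    NULL,
};

/* Called when no request structure is free: defer instead of blocking. */
static void tss_defer_io(void)
{
    /* safe from interrupt context: just queue the retry and return */
    queue_task(&tss_pending_task, &tq_scheduler);
}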
A better solution is to ensure that the strategy routine never needs to be called from interrupt context. This can be achieved by consuming all the requests on the device queue in a single invocation of the request_fn, since the kernel invokes the request_fn from process context only when a request is added to an empty device queue.
The request_fn() is therefore designed to keep executing until all the requests on the device queue have been consumed, so that it always runs from process context. One drawback of this scheme is that a process may be delayed or blocked on I/O requested by some other process, but this is acceptable: it can happen only when all the request structures are exhausted, which should be infrequent. The pseudocode for request_fn() is given below:
tss_strategy()
{
    while (1) {
        if (no request in queue)
            return;
        remove first request from queue;
        get tss dev corresponding to minor number in request;
        call personality-specific strategy;
        if (error in delegating I/O)
            call end_request with buffers not uptodate;
    }
}
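For concreteness, the loop could be rendered roughly as follows against the 2.2-style block layer, assuming the usual drivers/block/blk.h environment with MAJOR_NR defined, where CURRENT names the head of this major's request queue. The tss_devs[] table and the per-personality strategy hook are hypothetical names for the driver's own state, tss_end_request() stands for the driver's own completion routine (see the end_request discussion below), and real code would also protect the queue manipulation (cli()/sti() in that era):

/*
 * Sketch only: drain the whole device queue in one invocation so that the
 * strategy routine never has to be re-entered from interrupt context.
 */
static void tss_strategy(void)
{
    struct request *req;
    struct tss_dev *dev;

    while ((req = CURRENT) != NULL) {              /* loop until queue is empty */
        CURRENT = req->next;                       /* remove first request */

        dev = &tss_devs[MINOR(req->rq_dev)];       /* tss dev for this minor */
        if (dev->personality->strategy(dev, req))  /* delegate the I/O */
            tss_end_request(req, 0);               /* error: buffers not uptodate */
    }
}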
Fixed Size Buffer

The buffer size for a device is fixed in Linux, unlike Solaris, where buffers can be of variable size. For example, to implement RAID5 efficiently we need to distinguish a full-stripe write from a partial-stripe write, since the latter involves a read-modify-write cycle. In Solaris this is easier because a single buffer can span stripes. In Linux, each logical buffer is already split into smaller fixed-size buffers, so the driver has to rediscover the logical buffer in order to distinguish the two cases and process them accordingly.
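As an illustration, the stripe coverage could be tracked along the following lines; STRIPE_SECTORS, struct tss_stripe, tss_full_stripe_write() and tss_rmw_write() are names invented for this sketch, not existing interfaces:

/*
 * Hypothetical sketch of rediscovering the logical write from the stream of
 * fixed-size buffers, so that a full-stripe write can be told apart from a
 * partial one.
 */
#define STRIPE_SECTORS 64                  /* e.g. 32 KB stripe, 512-byte sectors */

struct tss_stripe {
    unsigned long start;                   /* first sector of the stripe */
    unsigned long covered;                 /* sectors of this stripe seen so far */
};

/* Account one fixed-size buffer against its stripe. */
static void tss_note_buffer(struct tss_stripe *s, unsigned long nr_sectors)
{
    s->covered += nr_sectors;
}

/* Decide the write path once no more buffers are pending for the stripe. */
static void tss_issue_stripe(struct tss_stripe *s)
{
    if (s->covered == STRIPE_SECTORS)
        tss_full_stripe_write(s);          /* parity from new data only */
    else
        tss_rmw_write(s);                  /* partial stripe: read-modify-write */
}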
In addition, errors have to be reported at buffer granularity: we can track errors only per individual buffer and therefore cannot report them at the stripe level.

end_request

If we need to use multiple queues, the current end_request does not work, and a new implementation is needed.
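One possible shape for such a replacement is sketched below: the request is passed in explicitly instead of being taken from the static CURRENT pointer, so any queue can complete it. The body mirrors what the stock 2.2-era end_request() in drivers/block/blk.h does for each buffer; the field and helper names follow that era's struct request and buffer_head and should be checked against the target kernel.

/*
 * Hypothetical per-request completion routine for a driver that keeps its
 * own request queues.
 */
static void tss_end_request(struct request *req, int uptodate)
{
    struct buffer_head *bh;

    while ((bh = req->bh) != NULL) {
        req->bh = bh->b_reqnext;             /* unlink buffer from the request */
        bh->b_reqnext = NULL;
        mark_buffer_uptodate(bh, uptodate);  /* record success or failure */
        unlock_buffer(bh);                   /* completes this fixed-size buffer */
    }

    req->rq_status = RQ_INACTIVE;            /* return the request structure... */
    wake_up(&wait_for_request);              /* ...and wake ll_rw_block() waiters */
}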