To improve disk write performance while a snapshot is active, we can cache snapshot maps and snapshot blocks. The following methods can be used:
Method A -
Each node that does a COW (copy-on-write) push first takes a cluster wide lock
and logs the transaction onto the log disk. Then it broadcasts the
updated map information along with the original disk block to all the nodes.
Each node updates its map and caches the original block. Whenever a node
tries to read a snapshot block (for backup), it first checks its local cache for the block's mapping. If there is a mapping and the original block is in the cache,
then that is used. If the original block is not in the
cache, the node broadcasts a message requesting the snapshot
block. Nodes that have this block in their cache will respond to this message.
If no node responds to this message, it means that the block is already written
to the snapshot disk and can be read from the disk. If there is no mapping for that block, then the node can take a cluster wide lock on the map section and read the block from the original disk. Similarly, if the map section
is not in the cache, then the node can broadcast a request for it.
If no node responds to that request, the map section can again be read from the snapshot disk.
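The write and read paths of Method A can be sketched as a small single-process simulation. This is only an illustration under stated assumptions: the names Node, Cluster, write, flush and snapshot_read are hypothetical, the cluster wide lock is implicit because the simulation is single-threaded, map sections are never evicted, and flush stands in for whatever mechanism eventually writes cached blocks to the snapshot disk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One cluster node with its local caches (hypothetical model)."""
    name: str
    map_cache: Dict[int, int] = field(default_factory=dict)      # original block -> snapshot block
    block_cache: Dict[int, bytes] = field(default_factory=dict)  # original block -> pre-write data


class Cluster:
    """In-memory stand-in for the nodes, the disks and the log disk."""

    def __init__(self, names: List[str], original: Dict[int, bytes]):
        self.nodes = {n: Node(n) for n in names}
        self.original = dict(original)            # original disk
        self.snapshot: Dict[int, bytes] = {}      # snapshot disk
        self.log: List[Tuple[str, int, int]] = []  # log disk
        self.broadcasts = 0                       # broadcast messages sent so far

    def write(self, writer: str, block: int, data: bytes) -> None:
        node = self.nodes[writer]
        if block not in node.map_cache:
            # First write to this block: lock (implicit here), log, then broadcast.
            snap_no = len(self.log)
            self.log.append((writer, block, snap_no))
            old = self.original[block]
            self.broadcasts += 1  # one message carries the map update and the original block
            for peer in self.nodes.values():
                peer.map_cache[block] = snap_no
                peer.block_cache[block] = old
        self.original[block] = data

    def flush(self, block: int) -> None:
        """Assumed helper: write the cached block to the snapshot disk and evict it everywhere."""
        holders = [n for n in self.nodes.values() if block in n.block_cache]
        if holders:
            self.snapshot[holders[0].map_cache[block]] = holders[0].block_cache[block]
        for n in holders:
            del n.block_cache[block]

    def snapshot_read(self, reader: str, block: int) -> bytes:
        node = self.nodes[reader]
        if block not in node.map_cache:
            # No mapping: the block was never COW pushed, so the original disk is still valid.
            return self.original[block]
        if block in node.block_cache:
            return node.block_cache[block]
        self.broadcasts += 1  # ask the other nodes for the snapshot block
        for peer in self.nodes.values():
            if peer is not node and block in peer.block_cache:
                return peer.block_cache[block]
        # Nobody answered: the block must already be on the snapshot disk.
        return self.snapshot[node.map_cache[block]]
```

In this sketch a second write to an already pushed block skips the COW push entirely, which corresponds to the case where no cluster wide lock is needed.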
Advantages
If a block is already COW pushed, there is no need to take a cluster wide lock on the snapshot map section to check whether to do a COW push.
Similarly, a snapshot-read need not take a cluster wide lock to read the snapshot
block if it is already COW pushed. This method performs better when a large
number of nodes in the cluster do a large number of disk writes in a small,
concentrated portion of the original disk.
Disadvantages
We need one broadcast message for each COW push. In addition, one broadcast
message is needed whenever the map section is not in the cache, and one more
whenever the snapshot block is not in the cache.
Augmentations to Method A -
We can keep additional information in each map entry: the node that COW pushed the block, along with the snapshot block number to which it was copied. For example, if some node A
does a COW push of block x, the map entry for x also records A, and
the snapshot block is in A's cache. Later,
if some node B tries to read x when it is not
in its cache, it gets the node number from the map and contacts
that node (in this case, node A). If the map is not
in its cache, B first broadcasts a request for it. Node B then requests
snapshot block x from node A. If A has it in its cache, it
can reply with that block. Otherwise, the read of snapshot
block x can be done in two ways:
1. Node A reads block x from the snapshot disk and sends it to node B.
2. Node B reads block x from the snapshot disk and broadcasts
that it has block x, so that all other nodes
can update the node information in the map entry for
snapshot block x.
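The lookup described above, including both fallback options, can be sketched as follows. This is a minimal illustration, not a definitive design: MapEntry, AugNode, read_snapshot_block, the stats counters and the reader_reads flag are all hypothetical names, the map is assumed to be already cached on the reader, and message passing is modeled as direct access to the other node's cache.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MapEntry:
    snap_block: int  # snapshot block number that holds the original data
    owner: str       # node believed to have the block in its cache


class AugNode:
    """Node with a map cache and a block cache (hypothetical model)."""

    def __init__(self, name: str):
        self.name = name
        self.map: Dict[int, MapEntry] = {}
        self.cache: Dict[int, bytes] = {}


def read_snapshot_block(reader: AugNode, block: int, nodes: Dict[str, AugNode],
                        snapshot_disk: Dict[int, bytes], stats: Dict[str, int],
                        reader_reads: bool = True) -> bytes:
    """Read a snapshot block, counting unicast and broadcast messages in stats."""
    if block in reader.cache:
        return reader.cache[block]
    entry = reader.map[block]
    owner = nodes[entry.owner]
    stats["unicast"] += 1  # ask only the recorded owner instead of broadcasting
    if block in owner.cache:
        return owner.cache[block]
    data = snapshot_disk[entry.snap_block]
    if not reader_reads:
        # Option 1: the owner reads the block from disk and sends it to the reader.
        owner.cache[block] = data
        return data
    # Option 2: the reader reads the block and announces itself as the new holder.
    reader.cache[block] = data
    stats["broadcast"] += 1
    for node in nodes.values():
        if block in node.map:
            node.map[block].owner = reader.name
    return data
```

Option 2 costs one broadcast on a miss but lets later readers reach the new holder with a single unicast; option 1 avoids the broadcast and leaves the recorded owner unchanged.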
Advantages
The broadcast messages in Method A become unicast messages
when the map or snapshot block is not in the cache. This method is useful if many nodes write to the original disk, but the writes are not concentrated in
one region.
Disadvantages
The map size increases because each map entry also stores node
information. This in turn increases the number of locks that are used to
serialize access to different sections of the maps.
Method B -
To do a write, a node x takes a cluster wide lock on the map section and logs its COW push onto its log disk. Whenever some other node y asks for the lock on that map section, node x transfers
its log along with the lock to y. After transferring the log,
x records an entry in its own log disk stating that the log has been
transferred to y. If x later fails, y
need not replay x's log, since it has already been transferred to y's log disk.
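The lock and log handoff can be sketched as below. This is a sketch under assumptions rather than the actual protocol: LogNode, MapSectionLock, acquire, cow_push and pending_entries are hypothetical names, log disks are plain lists, and log checkpointing or truncation is not modeled, so transferred entries simply accumulate on the current lock holder.

```python
from typing import List, Optional, Tuple


class LogNode:
    """Node with its own log disk, modeled as a list of tuples (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.log_disk: List[Tuple] = []


class MapSectionLock:
    """Cluster wide lock on one map section; only tracks the current holder."""

    def __init__(self) -> None:
        self.holder: Optional[LogNode] = None


def pending_entries(node: LogNode) -> List[Tuple]:
    """Entries logged after the last transfer marker; only these would need replay."""
    pending: List[Tuple] = []
    for entry in node.log_disk:
        if entry[0] == "TRANSFERRED":
            pending = []
        else:
            pending.append(entry)
    return pending


def acquire(lock: MapSectionLock, node: LogNode) -> None:
    prev = lock.holder
    if prev is not None and prev is not node:
        # The dirty log travels with the lock to the new holder's log disk.
        node.log_disk.extend(pending_entries(prev))
        # Only after the transfer does the old holder record that it happened.
        prev.log_disk.append(("TRANSFERRED", node.name))
    lock.holder = node


def cow_push(lock: MapSectionLock, node: LogNode, block: int, snap_block: int) -> None:
    if lock.holder is not node:
        raise RuntimeError("node must hold the map section lock before a COW push")
    node.log_disk.append(("COW", block, snap_block))
```

After a handoff from x to y, pending_entries(x) is empty in this model, which mirrors the statement that x's log need not be replayed if x fails.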
Advantages
There are no broadcast messages. This method is useful if only a few nodes in the cluster write to the original disk.
Disadvantages
A node has to take a cluster wide lock on the snapshot map section
to check whether to do a COW push, even if the block is already COW pushed.
Similarly, a snapshot-read needs a cluster wide lock to read the snapshot
block even if it is already COW pushed. When a cluster wide lock is transferred from one node to another, the dirty log must be transferred with it.