

2.1 Synchronization

The master-slave approach relies on process synchronization at the start and end of each BLAS call. Under Linux, process synchronization can usually be done in two ways: with IPC semaphores or with thread semaphores. A third way is to use a shared memory variable (synchronization variable): the slave spins, testing this variable until its value changes. In this case the slave is always running, even when it does no useful work, and the normal (observed) behavior of the Linux scheduler is to place each running process on a different processor, so when the master process needs synchronization the slave is immediately ready and running.
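The spinning slave can be illustrated with a minimal sketch. It is not the code used in this work: it assumes C11 atomics, and a POSIX thread stands in for the slave process so that the example is self-contained. The names `sync_var`, `slave_main` and `run_with_spinning_slave` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical synchronization variable:
 * 0 = idle, 1 = master requests start, 2 = slave has finished. */
static atomic_int sync_var;
static long slave_result;

static void *slave_main(void *arg)
{
    long n = *(long *)arg;

    /* Spin until the master changes the variable: no system call is made,
     * so the slave stays on its processor, ready to run. */
    while (atomic_load_explicit(&sync_var, memory_order_acquire) != 1)
        ;

    /* Stand-in for the slave's share of the work. */
    long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += i;
    slave_result = sum;

    /* Signal the end of the parallel section to the master. */
    atomic_store_explicit(&sync_var, 2, memory_order_release);
    return NULL;
}

long run_with_spinning_slave(long n)
{
    pthread_t slave;

    atomic_store(&sync_var, 0);
    int rc = pthread_create(&slave, NULL, slave_main, &n);
    assert(rc == 0);

    /* Synchronization at the start of the call. */
    atomic_store_explicit(&sync_var, 1, memory_order_release);

    /* Synchronization at the end of the call: the master spins too. */
    while (atomic_load_explicit(&sync_var, memory_order_acquire) != 2)
        ;

    pthread_join(slave, NULL);
    return slave_result;
}
```

The release/acquire pair ensures that the slave's result is visible to the master once it observes the value 2. With two separate processes, the variable would instead live in a shared memory segment.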

We compare the three alternatives in a ping-pong test: the slave waits for the master and the master "signals" the slave; next the master waits for the slave and the slave "signals" the master. We measure the number of cycles the master needs to get the whole job done. The figure below may help in understanding the ping-pong test. Moreover, each synchronization method must ensure that:

Figure: ping-pong test events.

The synchronization variable method fulfills the previous requirements. For IPC and thread semaphores we must use 2 semaphores (footnote 4), each one indicating when the master and the slave are ready to enter the parallel section; the ping-pong is done with 2 barriers that act like this:
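As an illustration only, and not the barrier code used in this work, a two-semaphore ping-pong can be sketched with POSIX unnamed semaphores on Linux. A thread again stands in for the slave, and the names `to_slave`, `to_master` and `semaphore_pingpong` are hypothetical. Error handling is reduced to assertions, and interrupted waits (`EINTR`) are not retried.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t to_slave;   /* posted by the master, awaited by the slave */
static sem_t to_master;  /* posted by the slave, awaited by the master */
static int rounds_seen;  /* written only by the slave */

static void *slave_loop(void *arg)
{
    int rounds = *(int *)arg;

    for (int i = 0; i < rounds; i++) {
        sem_wait(&to_slave);    /* ping: the slave waits for the master */
        rounds_seen++;
        sem_post(&to_master);   /* pong: the slave signals the master */
    }
    return NULL;
}

int semaphore_pingpong(int rounds)
{
    pthread_t slave;
    int rc;

    rounds_seen = 0;
    /* Second argument 0: semaphores shared between threads of one process. */
    rc = sem_init(&to_slave, 0, 0);
    assert(rc == 0);
    rc = sem_init(&to_master, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&slave, NULL, slave_loop, &rounds);
    assert(rc == 0);

    for (int i = 0; i < rounds; i++) {
        sem_post(&to_slave);    /* the master signals the slave */
        sem_wait(&to_master);   /* the master waits for the slave */
    }

    pthread_join(slave, NULL);
    sem_destroy(&to_slave);
    sem_destroy(&to_master);
    return rounds_seen;
}
```

Each `sem_wait` that finds the semaphore at zero puts the caller to sleep in the kernel, which is the system call and queue-handling cost that the spinning variant avoids.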

Figure: min, average and max times for the ping-pong test (1 cycle = 1/400e6 s).

The experiments were run on a dual Pentium II 400 MHz machine, and time is measured with the time stamp counter (tsc, footnote 5) of Intel Pentium processors; we assume that the tsc of each processor holds roughly the same value. Results are presented in the figure above: for each method we make 1000 runs and present the smallest, average and largest time for the ping-pong. These results show that the synchronization variable is by far the fastest method. As we said previously, the difference between the synchronization variable and the 2 other methods is that the slave process is already ready and running, so synchronization does not pay the cost of a system call and of moving a process from the wait/suspend queue to the run queue. On the other hand, having an active slave may interfere with other multi-threaded libraries.
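The reduction of the 1000 measurements to a smallest, average and largest time, and the conversion from cycles to time at 400 MHz, can be sketched as follows. This is a hypothetical helper, not the measurement code used here: the names `cycle_stats`, `summarize_cycles` and `cycles_to_microseconds` are assumptions, and the samples are taken to be cycle counts already obtained as differences between two tsc reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cycle_stats {
    uint64_t min;   /* smallest ping-pong time, in cycles */
    uint64_t max;   /* largest ping-pong time, in cycles */
    double   avg;   /* average ping-pong time, in cycles */
};

/* Summarize n cycle counts (n must be at least 1). */
struct cycle_stats summarize_cycles(const uint64_t *samples, size_t n)
{
    assert(n > 0);

    struct cycle_stats s = { samples[0], samples[0], 0.0 };
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min)
            s.min = samples[i];
        if (samples[i] > s.max)
            s.max = samples[i];
        total += (double)samples[i];
    }
    s.avg = total / (double)n;
    return s;
}

/* Convert a cycle count to microseconds for a given clock frequency.
 * At 400e6 Hz, 1 cycle = 1/400e6 s = 2.5 ns. */
double cycles_to_microseconds(uint64_t cycles, double clock_hz)
{
    return (double)cycles / clock_hz * 1e6;
}
```

For example, 400 cycles at 400e6 Hz correspond to about 1 microsecond.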


Thomas Guignon
2000-08-24