Stack alignment has been an issue because older gcc versions do not guarantee
that doubles on the stack are aligned on an 8-byte boundary, which makes each
access cost extra cycles.
We faced this problem with level 3 BLAS routines that use fixed-size arrays on
the stack for blocking, which resulted in poor performance.
Recent gcc versions (i.e. 2.95.x) solve this problem and
provide various options to control stack alignment, such as -malign-double and
-mpreferred-stack-boundary=x.
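
A minimal sketch of how the alignment of such a stack buffer can be checked. The block size BLOCK and the function names misalignment8 and stack_buffer_misalignment are hypothetical and chosen for illustration; they are not taken from the BLAS code discussed here.

```c
#include <stdint.h>

#define BLOCK 32  /* hypothetical blocking factor, for illustration only */

/* Offset of p from the previous 8-byte boundary; 0 means p is 8-byte aligned. */
unsigned misalignment8(const void *p)
{
    return (unsigned)((uintptr_t)p & 7u);
}

/* Allocates a fixed-size work array on the stack, as a blocked routine
 * would, and reports how far its start is from an 8-byte boundary. */
unsigned stack_buffer_misalignment(void)
{
    volatile double work[BLOCK * BLOCK];  /* volatile keeps the array on the stack */
    work[0] = 0.0;
    return misalignment8((const void *)work);
}
```

Printing the value returned by stack_buffer_misalignment() from builds made with and without -mpreferred-stack-boundary=x would show whether the option changes where the array is placed. The sketch makes no assumption about the result for any particular gcc version.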
Thread/processor affinity is a general issue in SMP systems: cache
efficiency can be reduced if the task scheduler moves a thread from one processor
to another. At this time there is no way to force thread/processor affinity on
a standard Linux kernel, but as we said in section 2.1,
the normal behavior of the Linux scheduler is to place the two running processes
(master and slave) on different processors. A kernel patch that adds some
control over thread/processor binding is available at
https://isunix.it.ilstu.edu/~thockin/pset/.
Thomas Guignon
2000-08-24