In this section we present some well-known key points of cache use and how
they affect the performance of BLAS subroutines in single- and dual-CPU
execution. We discuss the effects of non-contiguous data,
false sharing, mutual exclusion, data blocking, stack alignment and
thread/processor affinity. All examples assume an Intel P6-class
processor, for which we take L1 cache lines to be 32
bytes long and the L1 cache to be 2-way set associative. The reader may refer
to [1] for full information on optimizing code for Pentium processors.
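
To make these cache parameters concrete, here is a minimal C sketch of how a byte address maps to a cache line and to a set under the assumptions above (32-byte lines, 2 ways). The total cache size of 8 KB is an assumption made only for this illustration, and the names LINE_SIZE, WAYS, CACHE_SIZE, NUM_SETS, cache_line, cache_set and lines_touched are hypothetical helpers, not part of BLASTH.

```c
#include <stddef.h>
#include <stdint.h>

/* Cache geometry. LINE_SIZE and WAYS follow the text;
 * CACHE_SIZE (8 KB) is an assumption made for this sketch only. */
enum {
    LINE_SIZE  = 32,
    WAYS       = 2,
    CACHE_SIZE = 8 * 1024,
    NUM_SETS   = CACHE_SIZE / (LINE_SIZE * WAYS)   /* 128 sets */
};

/* Index of the cache line that contains a given byte address. */
static uintptr_t cache_line(uintptr_t addr)
{
    return addr / LINE_SIZE;
}

/* Set that this address maps to in the set-associative cache. */
static unsigned cache_set(uintptr_t addr)
{
    return (unsigned)(cache_line(addr) % NUM_SETS);
}

/* Number of distinct cache lines touched when reading n doubles
 * separated by `stride` elements, starting at a line-aligned
 * address (offset 0). Offsets only increase, so counting changes
 * of line index is enough. */
static size_t lines_touched(size_t n, size_t stride)
{
    size_t count = 0;
    uintptr_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uintptr_t line = cache_line((uintptr_t)(i * stride * sizeof(double)));
        if (i == 0 || line != prev)
            count++;
        prev = line;
    }
    return count;
}
```

Under these assumptions, addresses that are a multiple of LINE_SIZE * NUM_SETS = 4096 bytes apart fall into the same set, so three such addresses cannot all stay resident in a 2-way cache. Likewise, lines_touched(16, 1) covers 4 lines, while lines_touched(16, 4) covers 16: with a stride of four doubles, every element sits on its own line.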
Thomas Guignon
2000-08-24