A dedicated kernel for multi-threading applications.

Saturday, January 15, 2011

Memory organization in a Multicore system

This paper is part of my final project, "Parallel Algorithm with TORO kernel", for the Electronic Engineering degree at Universidad Nacional de La Plata. In the coming months I will publish more papers about the project. Enjoy!

Today, uniform memory access (UMA) is the most common way to access memory (see SMP). In this kind of architecture, every processor can read every byte of memory, and the processors are independent of each other. A shared bus connects them to memory: the processors compete for it, but only one can read or write at a time. In such environments, only one processor can access a given byte at any given time. For programmers, memory access is transparent.
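The competition for the shared bus can be sketched with a toy simulation. This is only an illustration, not how any real chipset arbitrates: the round-robin policy and the name `simulate_shared_bus` are assumptions made for the example. Each processor has some pending memory accesses, and in every bus cycle exactly one of them is granted the bus.

```python
def simulate_shared_bus(pending):
    """Toy model of a shared memory bus (round-robin policy is an assumption).

    pending[i] is the number of memory accesses CPU i still wants to do.
    Returns the grant order: one CPU id per bus cycle, because only one
    processor can use the bus at a time.
    """
    pending = list(pending)      # work on a copy, keep the caller's list intact
    grants = []
    num_cpus = len(pending)
    cpu = 0
    while any(pending):
        if pending[cpu] > 0:
            pending[cpu] -= 1    # this CPU owns the bus for one cycle
            grants.append(cpu)
        cpu = (cpu + 1) % num_cpus   # move on to the next CPU
    return grants


if __name__ == "__main__":
    # Three CPUs wanting 2, 1 and 2 accesses respectively.
    print(simulate_shared_bus([2, 1, 2]))   # [0, 1, 2, 0, 2]
```

The number of bus cycles is always the sum of all requests, so in this model every processor added to the bus makes the others wait longer.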

In 1995 Intel released the Pentium Pro, a processor built for SMP systems. Its memory bus was called the Front Side Bus (FSB).

The FSB is a bidirectional bus: it is simple and very cheap, and in theory it scales well.

Intel's next step was to split the FSB into two independent buses, but cache coherency became a bottleneck.

In 2007, a dedicated bus per processor was implemented.

This kind of architecture is used by Intel's Atom, Celeron, Pentium and Core 2 processors.

In a system with many cores, the traffic through the FSB is heavy. The FSB doesn't scale, and it has a limit of 16 processors per bus. So the FSB is a wall for the new multicore technology.

We can have a CPU that executes instructions quickly, but we waste time if it cannot fetch and decode them just as quickly. In the best case, we lose one extra cycle reading from memory.
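To get a feeling for what that extra cycle costs, here is a small back-of-the-envelope model. It is a simplification I am assuming for illustration (the usual "base cycles plus stall cycles" estimate), and the function name and the numbers are hypothetical, not measurements.

```python
def effective_cpi(base_cpi, mem_accesses_per_instr, stall_cycles_per_access):
    """Cycles per instruction once memory stalls are added.

    base_cpi: cycles per instruction if memory were never a problem.
    mem_accesses_per_instr: average memory accesses per instruction
        (at least the instruction fetch itself).
    stall_cycles_per_access: extra cycles lost on each access.
    """
    return base_cpi + mem_accesses_per_instr * stall_cycles_per_access


if __name__ == "__main__":
    # Hypothetical best case from the text: one access per instruction,
    # one extra cycle lost on each access.
    cpi = effective_cpi(base_cpi=1.0,
                        mem_accesses_per_instr=1.0,
                        stall_cycles_per_access=1)
    print(cpi)          # 2.0
    print(cpi / 1.0)    # slowdown versus the ideal CPU: 2.0
```

Under these assumptions, even a single lost cycle per access makes the CPU run at half of its ideal speed, which is why the memory bus matters so much.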

Since 2001 the FSB has been replaced by point-to-point interconnects such as HyperTransport or Intel QuickPath Interconnect. This changed the memory model to non-uniform memory access (NUMA).
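In a NUMA system each processor has memory attached directly to it, and reaching memory attached to another processor goes through the interconnect and takes longer. A rough way to see the effect is a weighted average of the two latencies. This is a sketch under that assumption only: the function name and the latency values are hypothetical and do not describe any particular machine.

```python
def average_latency(local_fraction, local_latency, remote_latency):
    """Average memory latency when accesses are split between local and
    remote memory.

    local_fraction: share of accesses that hit the processor's own memory,
        between 0.0 and 1.0.
    local_latency, remote_latency: cost of each kind of access, in the
        same unit (for example nanoseconds).
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0.0 and 1.0")
    return (local_fraction * local_latency
            + (1.0 - local_fraction) * remote_latency)


if __name__ == "__main__":
    # Hypothetical latencies: 100 ns local, 200 ns remote.
    for fraction in (1.0, 0.5, 0.0):
        print(fraction, average_latency(fraction, 100, 200))
```

The closer the data is kept to the processor that uses it, the closer the average gets to the local latency. That is the reason memory placement becomes the programmer's concern on this kind of hardware.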

Matias E. Vara www.torokernel.org