Data Sharing or Resource Contention: Toward Performance Transparency on Multicore Systems

Posted on February 5, 2016

For my CS854 class I have to read a trio of research papers each week and post summaries, which I then update after the class in which a student presents the paper. Don’t rely on my summaries for anything, but they might interest some of you, so I’m posting them here.

Just read the paper.

Data Sharing or Resource Contention: Toward Performance Transparency on Multicore Systems


Idea: Use the hardware counters to see when tasks are slowing each other down through contention over shared data, and put those tasks on the same core to make them go faster. Conversely, put tasks that each want to use a lot of different memory on different cores, to reduce contention for cache space.
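To make the idea concrete, here is a rough sketch of what such a placement decision could look like. This is not the paper's algorithm, just a toy greedy heuristic I made up: `place_tasks`, `sharing_pairs`, `cache_hungry` and `num_domains` are all hypothetical names, and a "domain" stands for whatever unit shares a cache (a core or a socket). It assumes something else has already worked out which tasks share data and which ones use a lot of cache.

```python
from collections import defaultdict


def place_tasks(tasks, sharing_pairs, cache_hungry, num_domains):
    """Toy placement: keep data-sharing tasks together, spread cache-hungry ones."""
    # Union-find: merge tasks that share data into one group.
    parent = {t: t for t in tasks}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in sharing_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for t in tasks:
        groups[find(t)].append(t)

    def hungry_count(group):
        return sum(1 for t in group if t in cache_hungry)

    # Place the groups with the most cache-hungry tasks first, each on the
    # domain that currently holds the fewest cache-hungry tasks.
    hungry_load = [0] * num_domains
    placement = {}
    for group in sorted(groups.values(), key=hungry_count, reverse=True):
        domain = min(range(num_domains), key=lambda d: hungry_load[d])
        for t in group:
            placement[t] = domain
        hungry_load[domain] += hungry_count(group)
    return placement


placement = place_tasks(
    tasks=["a", "b", "c", "d"],
    sharing_pairs=[("a", "b")],   # a and b touch the same data
    cache_hungry={"c", "d"},      # c and d each want lots of cache
    num_domains=2,
)
```

In this example `a` and `b` end up in the same domain while `c` and `d` are split across the two. A real scheduler would also have to weigh the two goals against each other when they conflict, which this sketch ignores.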


Counters

The paper uses 4 counters:

  1. Last Level Cache (LLC) hits,
  2. LLC misses,
  3. Misses at the last private (per socket/core) level of cache,
  4. Remote memory accesses (NUMA).

Note: Last Level Cache means the one that is furthest from the CPU, as in, the last one you check before going to main memory.
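As a way of thinking about what these four numbers can tell you, here is a small sketch that turns one sample of them into a few ratios. The ratios are my own guesses at useful signals, not formulas from the paper, and `CounterSample` and `summarize` are hypothetical names. How the counters relate to each other on real hardware (for example, whether every private-cache miss shows up as an LLC hit or miss) is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    llc_hits: int          # hits in the last level cache
    llc_misses: int        # misses in the last level cache
    private_misses: int    # misses at the last private cache level
    remote_accesses: int   # accesses served by remote (NUMA) memory


def ratio(numerator, denominator):
    # Avoid dividing by zero when a counter did not fire in this sample.
    return numerator / denominator if denominator else 0.0


def summarize(s):
    """Illustrative ratios only; these are assumptions, not the paper's metrics."""
    return {
        # How often an LLC lookup has to go to memory.
        "llc_miss_ratio": ratio(s.llc_misses, s.llc_hits + s.llc_misses),
        # How many private-cache misses are caught by the shared cache.
        "llc_hits_per_private_miss": ratio(s.llc_hits, s.private_misses),
        # How much of the memory traffic crosses to another NUMA node.
        "remote_per_llc_miss": ratio(s.remote_accesses, s.llc_misses),
    }


summary = summarize(CounterSample(llc_hits=75, llc_misses=25,
                                  private_misses=120, remote_accesses=5))
```

A high remote ratio would hint at a NUMA placement problem, while a high LLC miss ratio would hint at cache space or bandwidth pressure, but the thresholds for "high" are something you would have to measure.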

Oh neat, DRAM also benefits from spatial locality. For Random Access Memory, that’s a little surprising, but it makes sense.

Their SAM technique focuses more on making full use of the memory bandwidth for each core, while still keeping intra-core sharing in mind.

I don’t have much more to say about it without going into detail on their algorithm or their results.

Their technique appears to work well, and beats Linux in their benchmarks.

I’m curious whether it will be adopted and, if not, why.