3 votes
3 answers
360 views

How does cache locality impact the performance of ArrayList compared to LinkedList in Java? I've often heard that ArrayList has an advantage in terms of cache locality, but I don't fully understand ...
Marat Tim's user avatar
1 vote
0 answers
82 views

I'm trying to understand the practical value of "cache-friendly" design in lock-free queues. I often see people go to great lengths to pad structures, align data, and avoid false sharing — ...
SpeakX's user avatar
  • 427
0 votes
0 answers
216 views

I'm developing a C++ API and I want to hide private implementation details from the public interface. Currently, I'm employing the Pimpl idiom for this purpose. However, I'm also mindful of minimizing ...
ehopperdietzel's user avatar
0 votes
2 answers
134 views

fn main() {
    let vec0 = vec![0; 10];
    let mut vec1 = vec![];
    for _ in 0..10 {
        vec1.push(0);
    }
    assert_eq!(vec0.len(), vec1.len());
}
In this example, vec0 and vec1 are ...
Rahn's user avatar
  • 5,565
0 votes
2 answers
271 views

This is a rather hypothetical question. I have only limited knowledge of how the CPU cache works. I know a CPU loads subsequent bytes into the cache. Since a list uses pointers/indirection into ...
Raildex's user avatar
  • 5,428
-1 votes
2 answers
1k views

From my understanding the constructs which give rise to the high level concept of "cache locality" are the following: Translation Lookaside Buffer (TLB) for virtual memory translation. ...
gmaggiol's user avatar
3 votes
1 answer
92 views

I was surprised to find that inserting sorted keys into std::set is much faster than inserting shuffled keys. This is somewhat counterintuitive since a red-black tree (I verified ...
Leon Cruz's user avatar
  • 375
2 votes
2 answers
2k views

So I have this question from my professor, and I cannot figure out why vector2 is faster and has fewer cache misses than vector1. Assume that the code below is valid, compilable C code. Vector2: void ...
Pengibaby's user avatar
  • 373
0 votes
1 answer
1k views

I am trying to implement blocked (tiled) matrix multiplication on a single processor. I have read the literature on why blocking improves memory performance, but I just wanted to ask how to determine ...
user avatar
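For readers unfamiliar with the loop structure being asked about, here is a minimal sketch of blocked (tiled) multiplication next to the plain triple loop. The function names and the `block` parameter are hypothetical, and the default tile size of 2 is only a placeholder: the sketch shows the loop order, not how to pick a block size for a given cache. Pure-Python lists will not show the memory benefit either; the point is only that both versions compute the same product.

```python
def matmul_naive(a, b):
    """Plain triple loop: c = a * b for lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            for j in range(m):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile.

    `block` is an assumed tile edge length; the min(...) bounds handle
    matrices whose dimensions are not multiples of the tile size.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for ii in range(0, n, block):          # tile row of c
        for pp in range(0, k, block):      # tile along the shared dimension
            for jj in range(0, m, block):  # tile column of c
                for i in range(ii, min(ii + block, n)):
                    for p in range(pp, min(pp + block, k)):
                        a_ip = a[i][p]
                        for j in range(jj, min(jj + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c
```

In a compiled language the idea is that the three tiles touched by the inner loops stay in cache while they are reused; whether that pays off depends on the actual cache sizes and element type.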
1 vote
0 answers
454 views

Binary search of a sorted array may have poor cache locality, due to random access of memory, but linear search is slow for a large array. Is it possible to design a hybrid algorithm? For example, you ...
felix's user avatar
  • 647
0 votes
0 answers
83 views

I have been trying to get a better understanding of cache locality. I wrote the following two code snippets to compare the cache locality characteristics of each. vector<int> v1(1000, some ...
roulette01's user avatar
  • 2,502
1 vote
0 answers
48 views

Is there a practical tool to detect whether a cache line is reused (a cache miss is avoided) due to either spatial or temporal locality? I could not find a related discussion in cachegrind. I was able ...
Kadir's user avatar
  • 1,715
2 votes
0 answers
73 views

In its simplest form, RDD is merely a placeholder of chained computations that can be arbitrarily scheduled to be executed on any machine: val src = sc.parallelize(0 to 1000) val rdd = src....
tribbloid's user avatar
  • 3,822
2 votes
2 answers
1k views

I am trying to implement a heap (implicit free list with header/footer) and deciding on whether I should add padding to it. What are the tangible benefits of adding pads? I read that it somehow ...
Silver Flash's user avatar
  • 1,121
0 votes
1 answer
585 views

I have been browsing Stack Overflow but could not really find an example of this. I understand the concepts of temporal and spatial locality for the data cache: Temporal locality: address ...
nihulus's user avatar
  • 1,495
2 votes
0 answers
240 views

I was trying to observe the effects of CPU cache spatial locality by benchmarking sequential/random reads to an array with JMH. Interestingly, the results are almost the same. So I wonder, is this ...
pistolPanties's user avatar
6 votes
2 answers
6k views

I'm trying to understand how the hardware cache works by writing and running a test program: #include <stdio.h> #include <stdint.h> #include <x86intrin.h> #define LINE_SIZE 64 #...
xiaogw's user avatar
  • 765
2 votes
1 answer
172 views

In the textbook Computer Systems: a Programmer's Perspective there are some impressive benchmarks for optimizing row-major order access. I created a small program to test for myself if a simple ...
Adam Thompson's user avatar
1 vote
0 answers
109 views

Given we have an application that is heavily polluted with concurrency constructs, multiple techniques are used (different people worked without clear architecture in mind), multiple questionable ...
vach's user avatar
  • 11.5k
0 votes
3 answers
392 views

I am a beginner in operating systems, and I am trying to understand some code snippets. Can you please explain to me the difference between these code snippets?? int sum_array_rows(int a[M][N]) { ...
Agapi's user avatar
  • 107
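A minimal sketch of why traversal order matters for an array like `a[M][N]` in the excerpt above. It assumes row-major layout (as in C), 4-byte elements, 64-byte cache lines, and an array that starts on a line boundary. The column-wise traversal is an assumed counterpart, since the excerpt is cut off before the second snippet, and all function names here are hypothetical. Counting how often consecutive accesses land in a different line is only a crude proxy for cache behaviour, not a miss count.

```python
ELEM_SIZE = 4   # assumed sizeof(int)
LINE_SIZE = 64  # assumed cache line size in bytes


def byte_offset(i, j, n_cols, elem_size=ELEM_SIZE):
    """Byte offset of a[i][j] from the start of a row-major array."""
    return (i * n_cols + j) * elem_size


def row_wise(m, n):
    """Offsets visited by `for i: for j: a[i][j]`."""
    return [byte_offset(i, j, n) for i in range(m) for j in range(n)]


def column_wise(m, n):
    """Offsets visited by `for j: for i: a[i][j]`."""
    return [byte_offset(i, j, n) for j in range(n) for i in range(m)]


def line_changes(offsets, line_size=LINE_SIZE):
    """How many consecutive accesses fall in a different cache line."""
    lines = [off // line_size for off in offsets]
    return sum(1 for prev, cur in zip(lines, lines[1:]) if prev != cur)
```

For a 4 x 16 array of 4-byte elements each row fills exactly one 64-byte line, so `line_changes(row_wise(4, 16))` is 3, while `line_changes(column_wise(4, 16))` is 63: every single step moves to another line.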
3 votes
2 answers
2k views

I was studying for my architecture final and came across the following lines of code: for(i = 0; i <= N ;i++){ a[i] = b[i] + c[i]; } The question is: "How does this code snippet demonstrate ...
Carlos Romero's user avatar
0 votes
1 answer
1k views

I have read this blog and I am still unsure about the importance of locality. Why is locality important for cache performance? Is it because it leads to fewer cache misses? Furthermore, how is a ...
Bab's user avatar
  • 443
5 votes
2 answers
1k views

My understanding of the L1 cache was that a memory fetch loads a cache line. Assuming the cache line size is 64 bytes, if I access memory at address p, it will load the entire block from p to p + 64 ...
user1413793's user avatar
  • 9,427
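A small sketch of the address arithmetic behind the excerpt above, assuming 64-byte lines as the question does. Cache lines are aligned blocks, so the line containing address p starts at p rounded down to a multiple of the line size, which is generally not p itself. The helper names are hypothetical, and the sketch says nothing about prefetchers that may pull in neighbouring lines as well.

```python
LINE_SIZE = 64  # assumed cache line size in bytes (power of two)


def line_base(addr, line_size=LINE_SIZE):
    """Start address of the aligned cache line that contains addr."""
    return addr & ~(line_size - 1)


def line_span(addr, line_size=LINE_SIZE):
    """First and last byte address of the line that contains addr."""
    base = line_base(addr, line_size)
    return base, base + line_size - 1


def same_line(a, b, line_size=LINE_SIZE):
    """True if both addresses fall in the same cache line."""
    return line_base(a, line_size) == line_base(b, line_size)
```

For example, `line_span(70)` is `(64, 127)`: touching address 70 brings in bytes 64 through 127, not 70 through 133.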
0 votes
1 answer
164 views

I have the following nested for loop: int n = 8; int counter = 0; for (int i = 0; i < n; i++) { for (int j = i + 1; j < n; j++) { printf("(%d, %d)\n", i, j); counter++; ...
BodneyC's user avatar
  • 110
0 votes
1 answer
176 views

I have two servers. The first server (A) contains the zookeeper, a mongodb database and a drillbit. The second server (B) contains a hadoop distribution with several hive tables, a postgresql database ...
Ivan's user avatar
  • 1
1 vote
5 answers
2k views

A long time ago, inspired by "Numerical Recipes in C", I started to use the following construct for storing matrices (2D arrays). double **allocate_matrix(int NumRows, int NumCol) { double **x; int ...
John Smith's user avatar
  • 1,109
0 votes
1 answer
2k views

I am looking for a library/solution that will reduce the rather large number of cache misses I am experiencing in my program: class Foo{ std::vector<Foo*> myVec; // Rest of the ...
B. D's user avatar
  • 7,818
17 votes
4 answers
43k views

I was reading this question and wanted to ask more about the code shown there, i.e. for(i = 0; i < 20; i++) for(j = 0; j < 10; j++) a[i] = a[i]*j; The questions are, I understand ...
user avatar