12 votes
4 answers
6k views

When executing a series of _mm_stream_load_si128() calls (MOVNTDQA) from consecutive memory locations, will the hardware prefetcher still kick in, or should I use explicit software prefetching (with ...
BlueStrat's user avatar
  • 2,324
79 votes
5 answers
44k views

Can anyone give an example or a link to an example which uses __builtin_prefetch in GCC (or just the asm instruction prefetcht0 in general) to gain a substantial performance advantage? In particular, ...
Shaun Harker's user avatar
47 votes
4 answers
22k views

I would like to programmatically disable hardware prefetching. From Optimizing Application Performance on Intel® Core™ Microarchitecture Using Hardware-Implemented Prefetchers and How to Choose ...
Carlos's user avatar
  • 1,525
58 votes
2 answers
5k views

I am aware of multiple questions on this topic, however, I haven't seen any clear answers nor any benchmark measurements. I thus created a simple program that works with two arrays of integers. The ...
Daniel Langr's user avatar
  • 24.2k
105 votes
2 answers
91k views

suppose I have this model: class PhotoAlbum(models.Model): title = models.CharField(max_length=128) author = models.CharField(max_length=128) class Photo(models.Model): album = models....
Timmmm's user avatar
  • 99.2k
6 votes
1 answer
1k views

I have a rather non-trivial problem, where my computational graph has cycles and multiple "computational paths". Instead of making a dispatcher loop, where each vertex will be called one-by-one, I had ...
artemonster's user avatar
3 votes
2 answers
5k views

I'm writing a program to analyze a social network graph, which means the program needs a lot of random memory accesses. It seems to me prefetch should help. Here is a small piece of the code of ...
Da Zheng's user avatar
  • 121
41 votes
2 answers
16k views

I am trying to prefetch training data to hide I/O latency. I would like to write custom Python code that loads data from disk and preprocesses the data (e.g. by adding a context window). In other ...
read Read's user avatar
  • 6,113
11 votes
6 answers
13k views

Some CPUs and compilers supply prefetch instructions, e.g. __builtin_prefetch in the GCC documentation. Although there is a comment in GCC's documentation, it's too short for me. I want to know, in practice, when ...
superK's user avatar
  • 4,002
5 votes
2 answers
3k views

The GCC doc here specifies the usage of __builtin_prefetch. The third argument is perfect: if it is 0, the compiler generates a prefetchnta (%rax) instruction; if it is 1, the compiler generates a prefetcht2 (%rax) ...
ANTHONY's user avatar
  • 373
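
To make the locality argument from the excerpt above concrete, here is a minimal sketch, assuming GCC or Clang. The helper name `sum_with_prefetch` and the `PREFETCH_DISTANCE` value are hypothetical and chosen only for illustration. The instruction a compiler actually emits for each locality value depends on the target and compiler version, and a simple sequential loop like this one is usually already handled well by the hardware prefetcher, so no speedup is implied.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sums an array while hinting that upcoming elements will be read.
 * __builtin_prefetch(addr, rw, locality):
 *   rw       = 0 for a read, 1 for a write
 *   locality = 0..3, where 0 means "no temporal locality" and 3 means
 *              "keep in all cache levels"; both must be compile-time
 *              constants. */
long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Read prefetch with locality 0 (non-temporal hint). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

Changing the last argument from 0 to 3 and inspecting the generated assembly (for example with `gcc -O2 -S`) is one way to see which prefetch instruction a given compiler selects for each locality value.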
33 votes
1 answer
27k views

The intrinsics guide says only this much about void _mm_prefetch (char const* p, int i) : Fetch the line of data from memory that contains address p to a location in the cache hierarchy specified ...
Serge Rogatch's user avatar
28 votes
2 answers
6k views

I've realized that Little's Law limits how fast data can be transferred at a given latency and with a given level of concurrency. If you want to transfer something faster, you either need larger ...
Nathan Kurz's user avatar
  • 1,729
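
As a worked illustration of the Little's Law relationship mentioned in the excerpt above, the achievable bandwidth is the amount of data in flight divided by the latency. The numbers below (10 outstanding requests, 64-byte cache lines, 100 ns latency) are assumed for illustration only and do not come from the question.

```latex
\[
\text{bandwidth}
  = \frac{\text{concurrency} \times \text{bytes per request}}{\text{latency}}
  = \frac{10 \times 64\,\text{B}}{100\,\text{ns}}
  = 6.4\,\text{GB/s}
\]
```

Under these assumptions, raising throughput requires either more requests in flight or lower latency, which is the trade-off the question describes.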
19 votes
1 answer
17k views

I am trying to vectorize a loop computing the dot product of large float vectors. I am computing it in parallel, utilizing the fact that the CPU has a large number of XMM registers, like this: __m128* A, B; ...
xakepp35's user avatar
  • 3,332
13 votes
2 answers
4k views

I am reading about the different prefetchers available in an Intel Core i7 system. I have performed experiments to understand when these prefetchers are invoked. These are my findings: L1 IP prefetchers ...
bholanath's user avatar
  • 1,761
4 votes
3 answers
2k views

Let's say I have a function that I plan to execute as part of a benchmark. I want to bring this code into the L1 instruction cache prior to executing since I don't want to measure the cost of I$ ...
BeeOnRope's user avatar
  • 66.3k
17 votes
2 answers
9k views

Let's say I have following models class Foo(models.Model): ... class Prop(models.Model): ... class Bar(models.Model): foo: models.ForeignKey(Foo, related_name='bars', ...) prop: ...
Pavan Kumar's user avatar
  • 1,971
82 votes
5 answers
37k views

preload and prefetch are both used to request resources in advance so that later resource loading can be quick. It seems that I can interchange the two without noticing any difference: <link ...
Chiawen's user avatar
  • 12k
35 votes
9 answers
62k views

I am still learning tensorflow and keras, and I suspect this question has a very easy answer I'm just missing due to lack of familiarity. I have a PrefetchDataset object: > print(tf_test) $ <...
jda's user avatar
  • 536
23 votes
3 answers
9k views

I'm trying out the <link rel="dns-prefetch"> and <link rel="preconnect"> tags and I'm trying to see whether they help for my site. I can't find any online resources about how to verify if ...
webbower's user avatar
  • 786
22 votes
3 answers
11k views

It appears the general logic for prefetch usage is that prefetch can be added, provided the code is busy in processing until the prefetch instruction completes its operation. But, it seems that if too ...
Karthik Balaguru's user avatar
21 votes
1 answer
8k views

The PREFETCHNTA instruction is basically used to bring the data from main memory to caches by the prefetcher, but instructions with the NT suffix are known to skip caches and avoid cache pollution. ...
Abhishek Nikam's user avatar
14 votes
1 answer
707 views

What is the cost of a late prefetch done with a __builtin_prefetch(..., 1) intrinsic (prefetch in preparation for a write)? That is, a prefetch that does not arrive in the L1 cache before the demand ...
Curious's user avatar
  • 21.3k
4 votes
2 answers
5k views

I want to prefetch some code into the instruction cache. The code path is used infrequently but I need it to be in the instruction cache or at least in L2 for the rare cases that it is used. I have ...
Carlos Pinto Coelho's user avatar
21 votes
3 answers
13k views

I'm debugging a web application running in visual studio with some breakpoints on some code that runs on every request to my web application. I find that in Chrome, as I type the URL past the host, it ...
Ryan Mann's user avatar
  • 5,387
17 votes
2 answers
18k views

In my application, at one point I need to perform calculations on a large contiguous block of memory data (100s of MBs). What I was thinking was to keep prefetching the part of the block my program ...
pythonic's user avatar
  • 21.9k
11 votes
4 answers
3k views

I have a very simple UICollectionView that uses compositional layout to easily achieve dynamic cell heights. Unfortunately doing that seems to disable content prefetching using ...
Gereon's user avatar
  • 18k
10 votes
2 answers
6k views

I am working on data prefetch in CUDA (Fermi GPU) through C code. The CUDA reference manual discusses prefetching at the PTX level, not at the C level. Can anyone connect me with some documents ...
user1805482's user avatar
7 votes
2 answers
4k views

I am getting an error while trying to disable the hardware prefetcher in my Core i7 system. I am following the method from the linked question, How do I programmatically disable hardware prefetching? In my system ...
bholanath's user avatar
  • 1,761
7 votes
2 answers
612 views

I have an object 64 bytes in size: typedef struct _object{ int value; char pad[60]; } object; In main I am initializing an array of objects: volatile object * array; int arr_size = 1000000; array =...
Ana Khorguani's user avatar
6 votes
1 answer
3k views

Please help! [+] What I have: A lot of blobs in every bucket. Blobs can vary in size from less than a kilobyte to many gigabytes. [+] What I'm trying to do: I need to be able to ...
user avatar
2 votes
2 answers
3k views

Does the Core i3 support hardware prefetching through a hardware prefetcher? If yes, how do I enable/disable it?
Madhu Sagar's user avatar
0 votes
1 answer
164 views

Is there a 'standard' way to force a C compiler not to skip a 'dummy load' operation that forces a 'load prefetch' into the CPU cache? In assembler it is simply a load operation like mov eax,[ebx] and ...
DTL2020's user avatar
  • 101
20 votes
2 answers
8k views

Given a JSON API endpoint /api/config, we're trying to use <link rel="prefetch" href="/api/config"> in the head of an HTML document. Chrome downloads the data as expected when it hits the link ...
bsa's user avatar
  • 2,811
19 votes
1 answer
17k views

I am trying to use new features of TF, namely the Data API, and I am not sure how prefetch works. In the code below def dataset_input_fn(...) dataset = tf.data.TFRecordDataset(filenames, ...
MPękalski's user avatar
  • 7,104
14 votes
4 answers
14k views

I want to use prefetch and I can't get it working! Here is my code: function initAutocompletion() { $("input[data-autocomplete-prefetch-url]").each(function () { var $this = $(this); ...
boblemar's user avatar
  • 1,163
12 votes
2 answers
2k views

I did a test with this for (i32 i = 0; i < 0x800000; ++i) { // Hopefully this can disable hardware prefetch i32 k = (i * 997 & 0x7FFFFF) * 0x40; _mm_prefetch(...
BlueWanderer's user avatar
  • 2,701
12 votes
2 answers
14k views

<!DOCTYPE html> <meta charset="utf-8"> <title>An HTML Document</title> <link rel="prefetch" href="https://www.apple.com/"> <link rel="prerender" href="https://www....
weilou's user avatar
  • 4,668
11 votes
2 answers
9k views

I have started to load a few key resources and pages using the prefetch/prerender system. Is there a way to ensure that the resources in question are actually being preloaded?
Mild Fuzz's user avatar
  • 31.3k
10 votes
3 answers
6k views

I tried the following code: RootPanel root = RootPanel.get("root"); root.clear(); final FlowPanel p = new FlowPanel(); root.add(p); for (int i=0; i<20; ++i) { String url = "/thumb/"+i; ...
Anthony's user avatar
  • 12.9k
10 votes
1 answer
2k views

CPUs use branch prediction to speed up code, but only if the first branch is actually taken. Why not simply take both branches? That is, assume both branches will be hit, cache both sides, and the ...
AbstractDissonance's user avatar
9 votes
3 answers
8k views

I'm experiencing some severe performance issues with prefetch_related on a Model with 5 m2m fields, and I'm also prefetching a few nested m2m fields. class TaskModelManager(models.Manager): def ...
Lucas Miller's user avatar
9 votes
1 answer
5k views

Using Oracle Java JDBC (ojdbc14 10.2.x), loading a query with many rows takes forever (high latency environment). This is apparently because the default prefetch size in Oracle JDBC is "10" which ...
rogerdpack's user avatar
  • 67.7k
9 votes
1 answer
11k views

I am following TensorFlow's Image Segmentation tutorial. It contains the following lines: train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat() train_dataset = ...
robertspierre's user avatar
7 votes
1 answer
1k views

Here are three tables: product, model, and product_model that maps products and models in N:M relationship. product product_model model id name product_id model_id ...
gypark's user avatar
  • 632
7 votes
0 answers
598 views

Intel hardware prefetcher: Intel's website shows that there are four kinds of hardware prefetchers. The prefetcher controlled by bit 3 is the L1 stride prefetcher. I am running a test code to test what'...
JasperMa's user avatar
7 votes
2 answers
3k views

The gcc docs talk about a difference between prefetch for read and prefetch for write. What is the technical difference?
user1978011's user avatar
  • 3,629
6 votes
1 answer
268 views

I want to prefetch data into the L1 cache and perform other work while waiting for the data to arrive, to avoid stalling the loop. Is there a way to determine if the prefetched data has arrived in the ...
wepajakeg's user avatar
6 votes
2 answers
2k views

In my Django project I have a Profile for each django User, and the Profile is related to an Info model. Both relationships are OneToOne. Since most of the time I am using both the Profile and the ...
George Octavian Rabanca's user avatar
6 votes
2 answers
2k views

Is there a way to programmatically disable the hardware prefetcher on an AMD system like you can on an Intel system, as discussed in this topic? Specifically for the AMD Opteron Barcelona or Istanbul ...
Mark's user avatar
  • 3,215
5 votes
3 answers
4k views

Has anyone had experience using prefetch instructions for the Core 2 Duo processor? I've been using the (standard?) prefetch set (prefetchnta, prefetcht1, etc) with success for a series of P4 ...
Darren Engwirda's user avatar