78 questions
12
votes
4
answers
6k
views
Non-temporal loads and the hardware prefetcher, do they work together?
When executing a series of _mm_stream_load_si128() calls (MOVNTDQA) from consecutive memory locations, will the hardware prefetcher still kick in, or should I use explicit software prefetching (with ...
79
votes
5
answers
44k
views
Prefetching Examples?
Can anyone give an example or a link to an example which uses __builtin_prefetch in GCC (or just the asm instruction prefetcht0 in general) to gain a substantial performance advantage? In particular, ...
47
votes
4
answers
22k
views
How do I programmatically disable hardware prefetching?
I would like to programmatically disable hardware prefetching.
From Optimizing Application Performance on Intel® Core™ Microarchitecture Using Hardware-Implemented Prefetchers and
How to Choose ...
58
votes
2
answers
5k
views
Do current x86 architectures support non-temporal loads (from "normal" memory)?
I am aware of multiple questions on this topic; however, I haven't seen any clear answers or benchmark measurements. I thus created a simple program that works with two arrays of integers. The ...
105
votes
2
answers
91k
views
Why does Django's prefetch_related() only work with all() and not filter()?
Suppose I have this model:
class PhotoAlbum(models.Model):
    title = models.CharField(max_length=128)
    author = models.CharField(max_length=128)
class Photo(models.Model):
    album = models....
6
votes
1
answer
1k
views
X86 prefetching optimizations: "computed goto" threaded code
I have a rather non-trivial problem, where my computational graph has cycles and multiple "computational paths". Instead of making a dispatcher loop, where each vertex will be called one-by-one, I had ...
3
votes
2
answers
5k
views
Why does GCC __builtin_prefetch not improve performance?
I'm writing a program to analyze a social network graph. That means the program needs a lot of random memory accesses. It seems to me that prefetching should help. Here is a small piece of the code of ...
41
votes
2
answers
16k
views
How to prefetch data using a custom Python function in TensorFlow
I am trying to prefetch training data to hide I/O latency. I would like to write custom Python code that loads data from disk and preprocesses the data (e.g. by adding a context window). In other ...
11
votes
6
answers
13k
views
When should we use prefetch?
Some CPUs and compilers supply prefetch instructions, e.g. __builtin_prefetch in the GCC documentation. Although GCC's documentation includes a comment on it, it is too short for me.
I want to know, in practice, when ...
5
votes
2
answers
3k
views
What is the effect of the second argument in __builtin_prefetch()?
The GCC doc here specifies the usage of __builtin_prefetch.
The third argument is clear.
If it is 0, the compiler generates a prefetchnta (%rax) instruction
If it is 1, the compiler generates a prefetcht2 (%rax) ...
33
votes
1
answer
27k
views
What are _mm_prefetch() locality hints?
The intrinsics guide says only this much about void _mm_prefetch (char const* p, int i):
Fetch the line of data from memory that contains address p to a
location in the cache hierarchy specified ...
28
votes
2
answers
6k
views
Does software prefetching allocate a Line Fill Buffer (LFB)?
I've realized that Little's Law limits how fast data can be transferred at a given latency and with a given level of concurrency. If you want to transfer something faster, you either need larger ...
19
votes
1
answer
17k
views
How to properly use prefetch instructions?
I am trying to vectorize a loop that computes the dot product of large float vectors. I am computing it in parallel, utilizing the fact that the CPU has a large number of XMM registers, like this:
__m128* A, B;
...
13
votes
2
answers
4k
views
Under which conditions does the DCU prefetcher start prefetching?
I am reading about the different prefetchers available in Intel Core i7 systems.
I have performed experiments to understand when these prefetchers are invoked.
These are my findings:
L1 IP prefetchers ...
4
votes
3
answers
2k
views
Bring code into the L1 instruction cache without executing it
Let's say I have a function that I plan to execute as part of a benchmark. I want to bring this code into the L1 instruction cache prior to executing since I don't want to measure the cost of I$ ...
17
votes
2
answers
9k
views
Django chaining prefetch_related and select_related
Let's say I have the following models
class Foo(models.Model):
    ...
class Prop(models.Model):
    ...
class Bar(models.Model):
    foo: models.ForeignKey(Foo, related_name='bars', ...)
    prop: ...
82
votes
5
answers
37k
views
What are the differences between preload and prefetch in HTML?
preload and prefetch are both used to request resources in advance so that later resource loading can be quick. It seems that I can interchange the two without noticing any difference:
<link ...
35
votes
9
answers
62k
views
Extract target from Tensorflow PrefetchDataset
I am still learning TensorFlow and Keras, and I suspect this question has a very easy answer I'm just missing due to lack of familiarity.
I have a PrefetchDataset object:
> print(tf_test)
$ <...
23
votes
3
answers
9k
views
How do you test the effects of dns-prefetch and preconnect
I'm trying out the <link rel="dns-prefetch"> and <link rel="preconnect"> tags and I'm trying to see whether they help for my site. I can't find any online resources about how to verify if ...
22
votes
3
answers
11k
views
The prefetch instruction
It appears the general logic for prefetch usage is that a prefetch can be added, provided the code is busy processing until the prefetch instruction completes its operation. But, it seems that if too ...
21
votes
1
answer
8k
views
Difference between PREFETCH and PREFETCHNTA instructions
The PREFETCHNTA instruction is basically used to bring the data from main memory to caches by the prefetcher, but instructions with the NT suffix are known to skip caches and avoid cache pollution.
...
14
votes
1
answer
707
views
Cost of a sub-optimal cacheline prefetch
What is the cost of a late prefetch done with a __builtin_prefetch(..., 1) intrinsic (prefetch in preparation for a write)? That is, a prefetch that does not arrive in the L1 cache before the demand ...
4
votes
2
answers
5k
views
How can I prefetch infrequently used code?
I want to prefetch some code into the instruction cache. The code path is used infrequently but I need it to be in the instruction cache or at least in L2 for the rare cases that it is used. I have ...
21
votes
3
answers
13k
views
How to disable Pre-loading of pages or Prefetch in Google Chrome?
I'm debugging a web application running in Visual Studio with some breakpoints on some code that runs on every request to my web application.
I find that in Chrome, as I type the URL past the host, it ...
17
votes
2
answers
18k
views
Prefetching data to cache for x86-64
In my application, at one point I need to perform calculations on a large contiguous block of memory data (100s of MBs). What I was thinking was to keep prefetching the part of the block my program ...
11
votes
4
answers
3k
views
UICollectionView: compositional layout disables prefetching?
I have a very simple UICollectionView that uses compositional layout to easily achieve dynamic cell heights. Unfortunately doing that seems to disable content prefetching using ...
10
votes
2
answers
6k
views
Prefetch in CUDA (through C code)
I am working on data prefetch in CUDA (Fermi GPU) through C code. The CUDA reference manual discusses prefetching at the PTX level, not at the C level.
Can anyone point me to some documents ...
7
votes
2
answers
4k
views
Unable to disable Hardware prefetcher in Core i7
I am getting an error while trying to disable the hardware prefetcher on my Core i7 system. I am following the method described in the linked question, How do I programmatically disable hardware prefetching?
In my system
...
7
votes
2
answers
612
views
Why does using MFENCE with store instruction block prefetching in L1 cache?
I have an object 64 bytes in size:
typedef struct _object{
    int value;
    char pad[60];
} object;
In main I am initializing an array of objects:
volatile object * array;
int arr_size = 1000000;
array =...
6
votes
1
answer
3k
views
Reading really big blobs without downloading them in Google Cloud (streaming?)
Please help!
[+] What I have:
A lot of blobs in every bucket. Blobs can vary in size from less than a kilobyte to many gigabytes.
[+] What I'm trying to do:
I need to be able to ...
2
votes
2
answers
3k
views
Hardware prefetching in Core i3
Does the Core i3 support hardware prefetching through a hardware prefetcher? If so, how do I enable or disable it?
0
votes
1
answer
164
views
Force an optimizing C compiler to not skip a 'load-prefetch' operation
Is there a 'standard' way to force a C compiler not to skip a 'dummy load' operation that forces a 'load prefetch' into the CPU cache?
In assembly it is simply a load operation like
mov eax,[ebx]
and ...
20
votes
2
answers
8k
views
Can link prefetch be used to cache a JSON API response for a later XHR request?
Given a JSON API endpoint /api/config, we're trying to use <link rel="prefetch" href="/api/config"> in the head of an HTML document. Chrome downloads the data as expected when it hits the link ...
19
votes
1
answer
17k
views
Tensorflow Data API - prefetch
I am trying to use the new features of TF, namely the Data API, and I am not sure how prefetch works. In the code below
def dataset_input_fn(...)
dataset = tf.data.TFRecordDataset(filenames, ...
14
votes
4
answers
14k
views
typeahead, bloodhound : remote works but not prefetch
I want to use prefetch but I can't get it working!
Here is my code:
function initAutocompletion() {
$("input[data-autocomplete-prefetch-url]").each(function () {
var $this = $(this);
...
12
votes
2
answers
2k
views
When will a program benefit from prefetch & non-temporal load/store?
I did a test with this
for (i32 i = 0; i < 0x800000; ++i)
{
    // Hopefully this can disable hardware prefetch
    i32 k = (i * 997 & 0x7FFFFF) * 0x40;
    _mm_prefetch(...
12
votes
2
answers
14k
views
What's the right method to set a new prerender or prefetch in HTML?
<!DOCTYPE html>
<meta charset="utf-8">
<title>An HTML Document</title>
<link rel="prefetch" href="https://www.apple.com/">
<link rel="prerender" href="https://www....
11
votes
2
answers
9k
views
How to test prefetch/prerender
I have started to load a few key resources and pages using the prefetch/prerender system.
Is there a way to ensure that the resources in question are actually being preloaded?
10
votes
3
answers
6k
views
How to prefetch image in GWT?
I tried the following code:
RootPanel root = RootPanel.get("root");
root.clear();
final FlowPanel p = new FlowPanel();
root.add(p);
for (int i=0; i<20; ++i) {
String url = "/thumb/"+i;
...
10
votes
1
answer
2k
views
Why not just predict both branches?
CPUs use branch prediction to speed up code, but only if the first branch is actually taken.
Why not simply take both branches? That is, assume both branches will be hit, cache both sides, and the ...
9
votes
3
answers
8k
views
Django prefetch_related optimize query but still very slow
I'm experiencing severe performance issues with prefetch_related on a model with 5 m2m fields, and I'm also prefetching a few nested m2m fields.
class TaskModelManager(models.Manager):
def ...
9
votes
1
answer
5k
views
Oracle JDBC prefetch: how to avoid running out of RAM/how to make Oracle faster with high latency
Using Oracle Java JDBC (ojdbc14 10.2.x), loading a query with many rows takes forever (high-latency environment). This is apparently because the default prefetch size in Oracle JDBC is "10", which ...
9
votes
1
answer
11k
views
What do the TensorFlow Dataset's functions cache() and prefetch() do?
I am following TensorFlow's Image Segmentation tutorial. It contains the following lines:
train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = ...
7
votes
1
answer
1k
views
DBIx::Class - get all relationships that were used as a condition using prefetch?
Here are three tables: product, model, and product_model that maps products and models in N:M relationship.
product product_model model
id name product_id model_id ...
7
votes
0
answers
598
views
Under which conditions will the L1 IP-based stride prefetcher be triggered?
Intel's website (Intel hardware prefetcher) shows that there are four kinds of hardware prefetchers. The prefetcher controlled by bit 3 is the L1 stride prefetcher. I am running test code to test what'...
7
votes
2
answers
3k
views
Difference between prefetch for read or write
The GCC docs talk about a difference between prefetch for read and prefetch for write. What is the technical difference?
6
votes
1
answer
268
views
Can I read a CPU x86 flag to determine if prefetched data has arrived in the L1 cache?
I want to prefetch data into the L1 cache and perform other work while waiting for the data to arrive, to avoid stalling the loop. Is there a way to determine if the prefetched data has arrived in the ...
6
votes
2
answers
2k
views
Automatically select related for OneToOne field
In my Django project I have a Profile for each Django User, and the Profile is related to an Info model. Both relationships are OneToOne. Since most of the time I am using both the Profile and the ...
6
votes
2
answers
2k
views
Programmatically disable hardware prefetching on AMD systems
Is there a way to programmatically disable the hardware prefetcher on an AMD system, like you can on an Intel system as discussed in this topic?
Specifically for the AMD Opteron Barcelona or Istanbul ...
5
votes
3
answers
4k
views
Prefetch for Intel Core 2 Duo
Has anyone had experience using prefetch instructions for the Core 2 Duo processor?
I've been using the (standard?) prefetch set (prefetchnta, prefetcht1, etc) with success for a series of P4 ...