Frequent 'prefetch' Questions

12 votes

4 answers

6k views

Non-temporal loads and the hardware prefetcher, do they work together?

When executing a series of _mm_stream_load_si128() calls (MOVNTDQA) from consecutive memory locations, will the hardware prefetcher still kick in, or should I use explicit software prefetching (with ...

BlueStrat

2,324

asked Aug 19, 2015 at 19:23

79 votes

5 answers

44k views

Prefetching Examples?

Can anyone give an example or a link to an example which uses __builtin_prefetch in GCC (or just the asm instruction prefetcht0 in general) to gain a substantial performance advantage? In particular, ...

Shaun Harker

963

asked Sep 7, 2011 at 1:37

47 votes

4 answers

22k views

How do I programmatically disable hardware prefetching?

I would like to programmatically disable hardware prefetching. From Optimizing Application Performance on Intel® Core™ Microarchitecture Using Hardware-Implemented Prefetchers and How to Choose ...

Carlos

1,525

asked Apr 23, 2009 at 23:56

58 votes

2 answers

5k views

Do current x86 architectures support non-temporal loads (from "normal" memory)?

I am aware of multiple questions on this topic, however, I haven't seen any clear answers nor any benchmark measurements. I thus created a simple program that works with two arrays of integers. The ...

Daniel Langr

24.2k

asked Oct 17, 2016 at 22:52

105 votes

2 answers

91k views

Why does Django's prefetch_related() only work with all() and not filter()?

Suppose I have this model: class PhotoAlbum(models.Model): title = models.CharField(max_length=128) author = models.CharField(max_length=128) class Photo(models.Model): album = models....

Timmmm

99.2k

asked Oct 19, 2012 at 12:05

6 votes

1 answer

1k views

X86 prefetching optimizations: "computed goto" threaded code

I have a rather non-trivial problem, where my computational graph has cycles and multiple "computational paths". Instead of making a dispatcher loop, where each vertex will be called one-by-one, I had ...

artemonster

773

asked Sep 20, 2017 at 12:01

3 votes

2 answers

5k views

Why does GCC __builtin_prefetch not improve performance?

I'm writing a program to analyze a social network graph, which means the program performs a lot of random memory accesses. It seems to me that prefetching should help. Here is a small piece of the code of ...

Da Zheng

121

asked Mar 23, 2015 at 4:30

41 votes

2 answers

16k views

How to prefetch data using a custom python function in tensorflow

I am trying to prefetch training data to hide I/O latency. I would like to write custom Python code that loads data from disk and preprocesses the data (e.g. by adding a context window). In other ...

read Read

6,113

asked Jan 4, 2016 at 15:14

11 votes

6 answers

13k views

When should we use prefetch?

Some CPUs and compilers supply prefetch instructions, e.g. __builtin_prefetch in the GCC documentation. Although there is a comment in GCC's documentation, it is too short for me. I want to know, in practice, when ...

superK

4,002

asked Dec 20, 2013 at 5:54

5 votes

2 answers

3k views

What is the effect of the second argument in __builtin_prefetch()?

The GCC doc here specifies the usage of __builtin_prefetch. The third argument is clear: if it is 0, the compiler generates a prefetchnta (%rax) instruction; if it is 1, the compiler generates prefetcht2 (%rax) ...

ANTHONY

373

asked Nov 9, 2016 at 18:02

33 votes

1 answer

27k views

What are _mm_prefetch() locality hints?

The intrinsics guide says only this much about void _mm_prefetch (char const* p, int i) : Fetch the line of data from memory that contains address p to a location in the cache hierarchy specified ...

Serge Rogatch

15.3k

asked Oct 2, 2017 at 8:06

28 votes

2 answers

6k views

Does software prefetching allocate a Line Fill Buffer (LFB)?

I've realized that Little's Law limits how fast data can be transferred at a given latency and with a given level of concurrency. If you want to transfer something faster, you either need larger ...

Nathan Kurz

1,729

asked Oct 19, 2013 at 22:54

19 votes

1 answer

17k views

How to properly use prefetch instructions?

I am trying to vectorize a loop that computes the dot product of large float vectors. I am computing it in parallel, utilizing the fact that the CPU has a large number of XMM registers, like this: __m128* A, B; ...

xakepp35

3,332

asked Feb 26, 2018 at 18:04

13 votes

2 answers

4k views

Under which conditions does the DCU prefetcher start prefetching?

I am reading about the different prefetchers available in an Intel Core i7 system. I have performed experiments to understand when these prefetchers are invoked. These are my findings: L1 IP prefetchers ...

bholanath

1,761

asked Nov 28, 2018 at 10:47

4 votes

3 answers

2k views

Bring code into the L1 instruction cache without executing it

Let's say I have a function that I plan to execute as part of a benchmark. I want to bring this code into the L1 instruction cache prior to executing since I don't want to measure the cost of I$ ...

BeeOnRope

66.3k

asked Feb 1, 2018 at 20:32

17 votes

2 answers

9k views

Django chaining prefetch_related and select_related

Let's say I have the following models: class Foo(models.Model): ... class Prop(models.Model): ... class Bar(models.Model): foo: models.ForeignKey(Foo, related_name='bars', ...) prop: ...

Pavan Kumar

1,971

asked Feb 7, 2019 at 8:45

82 votes

5 answers

37k views

What are the differences between preload and prefetch in HTML?

preload and prefetch are both used to request resources in advance so that later resource loading can be quick. It seems that I can interchange the two without noticing any difference: <link ...

Chiawen

12k

asked Oct 11, 2018 at 16:01

35 votes

9 answers

62k views

Extract target from Tensorflow PrefetchDataset

I am still learning tensorflow and keras, and I suspect this question has a very easy answer I'm just missing due to lack of familiarity. I have a PrefetchDataset object: > print(tf_test) $ <...

jda

536

asked Jun 17, 2020 at 18:51

23 votes

3 answers

9k views

How do you test the effects of dns-prefetch and preconnect

I'm trying out the <link rel="dns-prefetch"> and <link rel="preconnect"> tags and I'm trying to see whether they help for my site. I can't find any online resources about how to verify if ...

webbower

786

asked Sep 22, 2016 at 2:33

22 votes

3 answers

11k views

The prefetch instruction

It appears the general logic for prefetch usage is that prefetch can be added, provided the code is busy in processing until the prefetch instruction completes its operation. But, it seems that if too ...

Karthik Balaguru

7,938

asked Jun 26, 2010 at 6:06

21 votes

1 answer

8k views

Difference between PREFETCH and PREFETCHNTA instructions

The PREFETCHNTA instruction is basically used to bring the data from main memory to caches by the prefetcher, but instructions with the NT suffix are known to skip caches and avoid cache pollution. ...

Abhishek Nikam

708

asked Nov 12, 2018 at 21:33

14 votes

1 answer

707 views

Cost of a sub-optimal cacheline prefetch

What is the cost of a late prefetch done with a __builtin_prefetch(..., 1) intrinsic (prefetch in preparation for a write)? That is, a prefetch that does not arrive in the L1 cache before the demand ...

Curious

21.3k

asked Feb 22, 2019 at 7:11

4 votes

2 answers

5k views

How can I prefetch infrequently used code?

I want to prefetch some code into the instruction cache. The code path is used infrequently but I need it to be in the instruction cache or at least in L2 for the rare cases that it is used. I have ...

Carlos Pinto Coelho

199

asked Apr 25, 2013 at 15:24

21 votes

3 answers

13k views

How to disable Pre-loading of pages or Prefetch in Google Chrome?

I'm debugging a web application running in Visual Studio with some breakpoints on some code that runs on every request to my web application. I find that in Chrome, as I type the URL past the host, it ...

Ryan Mann

5,387

asked Jan 19, 2015 at 8:01

17 votes

2 answers

18k views

Prefetching data to cache for x86-64

In my application, at one point I need to perform calculations on a large contiguous block of memory data (100s of MBs). What I was thinking was to keep prefetching the part of the block my program ...

pythonic

21.9k

asked Apr 25, 2012 at 20:40

11 votes

4 answers

3k views

UICollectionView: compositional layout disables prefetching?

I have a very simple UICollectionView that uses compositional layout to easily achieve dynamic cell heights. Unfortunately doing that seems to disable content prefetching using ...

Gereon

18k

asked Jul 4, 2020 at 8:22

10 votes

2 answers

6k views

Prefetch in CUDA (through C code)

I am working on data prefetching in CUDA (Fermi GPU) through C code. The CUDA reference manual discusses prefetching at the PTX level, not at the C level. Can anyone point me to some documents ...

user1805482

101

asked Nov 7, 2012 at 8:42

7 votes

2 answers

4k views

Unable to disable hardware prefetcher in Core i7

I am getting an error while trying to disable the hardware prefetcher on my Core i7 system. I am following the method described in "How do I programmatically disable hardware prefetching?". In my system ...

bholanath

1,761

asked Oct 17, 2013 at 19:43

7 votes

2 answers

612 views

Why does using MFENCE with a store instruction block prefetching in the L1 cache?

I have an object 64 bytes in size: typedef struct _object{ int value; char pad[60]; } object; In main I am initializing an array of objects: volatile object * array; int arr_size = 1000000; array =...

Ana Khorguani

926

asked May 13, 2019 at 17:46

6 votes

1 answer

3k views

Reading really big blobs without downloading them in Google Cloud (streaming?)

Please help! [+] What I have: A lot of blobs in every bucket. Blobs can vary in size from less than a kilobyte to many gigabytes. [+] What I'm trying to do: I need to be able to ...

user9773014

asked May 16, 2018 at 21:34

2 votes

2 answers

3k views

Hardware prefetching in Core i3

Does the Core i3 support hardware prefetching through a hardware prefetcher? If yes, how do I enable/disable it?

Madhu Sagar

29

asked Jul 12, 2011 at 9:38

0 votes

1 answer

164 views

Force optimizing C compiler to not skip 'load-prefetch' operation

Is there a 'standard' way to force a C compiler not to skip a 'dummy load' operation that forces a 'load prefetch' into the CPU cache? In assembler it is simply a load operation like mov eax,[ebx] and ...

DTL2020

101

asked Dec 12, 2021 at 10:33

20 votes

2 answers

8k views

Can link prefetch be used to cache a JSON API response for a later XHR request?

Given a JSON API endpoint /api/config, we're trying to use <link rel="prefetch" href="/api/config"> in the head of an HTML document. Chrome downloads the data as expected when it hits the link ...

bsa

2,811

asked Jul 26, 2016 at 15:05

19 votes

1 answer

17k views

Tensorflow Data API - prefetch

I am trying to use the new features of TF, namely the Data API, and I am not sure how prefetch works. In the code below def dataset_input_fn(...) dataset = tf.data.TFRecordDataset(filenames, ...

MPękalski

7,104

asked Nov 1, 2017 at 22:31

14 votes

4 answers

14k views

typeahead, bloodhound: remote works but not prefetch

I want to use prefetch and I can't get it working! Here is my code: function initAutocompletion() { $("input[data-autocomplete-prefetch-url]").each(function () { var $this = $(this); ...

boblemar

1,163

asked May 9, 2014 at 16:13

12 votes

2 answers

2k views

When will a program benefit from prefetch & non-temporal load/store?

I did a test with this for (i32 i = 0; i < 0x800000; ++i) { // Hopefully this can disable hardware prefetch i32 k = (i * 997 & 0x7FFFFF) * 0x40; _mm_prefetch(...

BlueWanderer

2,701

asked Jun 26, 2013 at 6:15

12 votes

2 answers

14k views

What's the right method to set a new prerender or prefetch in HTML?

<!DOCTYPE html> <meta charset="utf-8"> <title>An HTML Document</title> <link rel="prefetch" href="https://www.apple.com/"> <link rel="prerender" href="https://www....

weilou

4,668

asked Feb 26, 2013 at 21:43

11 votes

2 answers

9k views

How to test prefetch/prerender

I have started to load a few key resources and pages using the prefetch/prerender system. Is there a way to ensure that the resources in question are actually being preloaded?

Mild Fuzz

31.3k

asked Sep 15, 2011 at 13:07

10 votes

3 answers

6k views

How to prefetch image in GWT?

I tried the following code: RootPanel root = RootPanel.get("root"); root.clear(); final FlowPanel p = new FlowPanel(); root.add(p); for (int i=0; i<20; ++i) { String url = "/thumb/"+i; ...

Anthony

12.9k

asked Nov 11, 2010 at 14:27

10 votes

1 answer

2k views

Why not just predict both branches?

CPUs use branch prediction to speed up code, but only if the first branch is actually taken. Why not simply take both branches? That is, assume both branches will be hit, cache both sides, and the ...

AbstractDissonance

1

asked Apr 3, 2018 at 3:56

9 votes

3 answers

8k views

Django prefetch_related optimizes query but is still very slow

I'm experiencing some severe performance issues with prefetch_related on a Model with 5 m2m fields, and I'm also prefetching a few nested m2m fields. class TaskModelManager(models.Manager): def ...

Lucas Miller

123

asked Mar 16, 2017 at 15:13

9 votes

1 answer

5k views

Oracle JDBC prefetch: how to avoid running out of RAM/how to make Oracle faster under high latency

Using Oracle Java JDBC (ojdbc14 10.2.x), loading a query with many rows takes forever (high-latency environment). This is apparently because the default prefetch size in Oracle JDBC is "10", which ...

rogerdpack

67.7k

asked Jan 27, 2015 at 0:00

9 votes

1 answer

11k views

What do the TensorFlow Dataset's functions cache() and prefetch() do?

I am following TensorFlow's Image Segmentation tutorial. It contains the following lines: train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat() train_dataset = ...

robertspierre

5,386

asked Dec 7, 2019 at 17:59

7 votes

1 answer

1k views

DBIx::Class - get all relationship that was used as a condition using prefetch?

Here are three tables: product, model, and product_model, which maps products and models in an N:M relationship. product product_model model id name product_id model_id ...

gypark

632

asked Dec 18, 2017 at 2:28

7 votes

0 answers

598 views

Under which conditions will the L1 IP-based stride prefetcher be triggered?

Intel hardware prefetcher: Intel's website shows that there are four kinds of hardware prefetchers. The prefetcher controlled by bit 3 is the L1 stride prefetcher. I am running test code to test what'...

JasperMa

71

asked Feb 24, 2021 at 2:48

7 votes

2 answers

3k views

Difference between prefetch for read or write

The GCC docs talk about a difference between prefetch for read and prefetch for write. What is the technical difference?

user1978011

3,629

asked May 2, 2015 at 14:20

6 votes

1 answer

268 views

Can I read a CPU x86 flag to determine if prefetched data has arrived in the L1 cache?

I want to prefetch data into the L1 cache and perform other work while waiting for the data to arrive, to avoid stalling the loop. Is there a way to determine if the prefetched data has arrived in the ...

wepajakeg

63

asked Jan 7 at 19:02

6 votes

2 answers

2k views

Automatically select related for OneToOne field

In my Django project I have a Profile for each django User, and the Profile is related to an Info model. Both relationships are OneToOne. Since most of the time I am using both the Profile and the ...

George Octavian Rabanca

765

asked Oct 30, 2013 at 4:02

6 votes

2 answers

2k views

Programmatically disable hardware prefetching on AMD systems

Is there a way to programmatically disable the hardware prefetcher on an AMD system, like you can on an Intel system as discussed in this topic? Specifically, for the AMD Opteron Barcelona or Istanbul ...

Mark

3,215

asked Feb 16, 2010 at 19:12

5 votes

3 answers

4k views

Prefetch for Intel Core 2 Duo

Has anyone had experience using prefetch instructions for the Core 2 Duo processor? I've been using the (standard?) prefetch set (prefetchnta, prefetcht1, etc) with success for a series of P4 ...

Darren Engwirda

7,035

asked Nov 16, 2009 at 13:12
