2

When calloc is used, pointers to newly allocated memory are aligned so that at least a certain number of their least significant bits are zero. Those bits can therefore carry a tag (tagged pointers), which is a common technique in lock-free algorithms. I was testing memory management on an Ubuntu Linux server (x86_64 GNU/Linux, 3.10.23-xxxx-std-ipv6-64-vps), and in my experiments the 4 least significant bits were always 0. From what I have read, however, a pointer expressed as uintptr_t is only guaranteed to be divisible by 4 (that is, its 2 least significant bits are 0).

What is the minimum number of least significant bits that are always 0 in pointers to newly allocated memory obtained from the memory management system on POSIX (Linux)?

What is the maximum number of least significant bits that can be used for tagging pointers on Linux systems (e.g. in lock-free algorithms)?

How can I force the compiler to align newly allocated pointers to an exact number of least significant bits?

Does pointer alignment affect overall system performance, and if so, how?

4
  • There's no "minimum" or "maximum" number of least significant bits that are set to 0. Memory alignment is highly platform dependent. All you know is that the system typically has one type that constrains memory alignment, and that malloc() and other memory allocation functions always return a pointer that is suitably aligned for the most stringent type (and thus can be used with any type) Commented Jun 8, 2015 at 13:40
  • And why do you think there's a relation between memory alignment and lock-free algorithms? Commented Jun 8, 2015 at 13:42
  • I think you might want to read man7.org/linux/man-pages/man3/posix_memalign.3.html. Typically memory alignment is based on sizeof(double). If you call the functions in the man pages, it can be changed. Commented Jun 8, 2015 at 13:51
  • @Robert Jacobs The relation between memory alignment and lock-free algorithms is that alignment is what makes tagged pointers possible. Commented Jun 8, 2015 at 13:52

3 Answers

4

Alignment matters for optimization for several related reasons:

  • efficient use of cache lines
  • avoiding access patterns that defeat the hardware prefetching logic
  • best use of vector registers/instructions (SSE, AVX)
  • memory page alignment, which can matter especially where I/O is concerned

You can find very good references for Intel architecture here: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

Quick answers to your questions:

What is the minimum number of least significant bits that are always 0 in pointers to newly allocated memory obtained from the memory management system on POSIX (Linux)?

It actually depends on the CPU/architecture you are speaking of.
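Rather than hard-coding a number, the guarantee can be queried at compile time. The following is a minimal sketch, assuming a C11 compiler; the helper names `low_zero_bits`, `guaranteed_tag_bits` and `pointer_low_bits` are made up for illustration. It derives the number of always-zero low bits from `_Alignof(max_align_t)`, which is the alignment the standard promises for allocations large enough to hold such an object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of low bits that are zero in every address aligned to `alignment`.
   `alignment` must be a power of two. */
static unsigned low_zero_bits(size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    unsigned bits = 0;
    while (alignment > 1) {
        alignment >>= 1;
        ++bits;
    }
    return bits;
}

/* Low bits that malloc/calloc results are guaranteed to have clear,
   based on the strictest fundamental alignment of this platform. */
static unsigned guaranteed_tag_bits(void)
{
    return low_zero_bits(_Alignof(max_align_t));
}

/* Value of the lowest `nbits` bits of a pointer (0 means they are all clear). */
static uintptr_t pointer_low_bits(const void *p, unsigned nbits)
{
    return (uintptr_t)p & (((uintptr_t)1 << nbits) - 1);
}
```

An allocator may well hand out stricter alignment than this in practice, as the 4 zero bits observed in the question suggest, but only the value derived from `max_align_t` is something portable code can rely on.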

What is the maximum number of the least significant bits that can be used as tagged pointers on linux systems (eg. lock-free algorithms)?

The same answer applies: it depends on the platform. If C++ is an option, use std::atomic or boost::atomic to get some degree of portability.

On Intel architectures, loads and stores of properly aligned data are atomic up to 32 bits on x86_32 and up to 64 bits on x86_64.
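To illustrate how a tag and a pointer can share one atomically updated word, here is a minimal sketch in C11 using `<stdatomic.h>` (the C counterpart of std::atomic). Everything here is an assumption for illustration: the names `tagged_ptr`, `tp_pack`, `tp_pointer`, `tp_tag` and `tp_try_set_tag` are hypothetical, and `TAG_BITS` is set to 2, which presumes the pointed-to objects are at least 4-byte aligned.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_BITS 2u   /* assumption: pointers are at least 4-byte aligned */
#define TAG_MASK ((((uintptr_t)1) << TAG_BITS) - 1)

typedef _Atomic uintptr_t tagged_ptr;

/* Combine an aligned pointer and a small tag into one word. */
static uintptr_t tp_pack(void *p, unsigned tag)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);  /* low bits must be free */
    assert(tag <= TAG_MASK);
    return (uintptr_t)p | tag;
}

/* Recover the pointer by clearing the tag bits. */
static void *tp_pointer(uintptr_t v) { return (void *)(v & ~TAG_MASK); }

/* Recover the tag from the low bits. */
static unsigned tp_tag(uintptr_t v) { return (unsigned)(v & TAG_MASK); }

/* Atomically replace the tag, but only if the slot still holds `expected`
   (same pointer and same tag). Returns true on success. */
static bool tp_try_set_tag(tagged_ptr *slot, uintptr_t expected, unsigned new_tag)
{
    uintptr_t desired = tp_pack(tp_pointer(expected), new_tag);
    return atomic_compare_exchange_strong(slot, &expected, desired);
}
```

A caller would load the slot with `atomic_load`, decide on a new tag, and retry `tp_try_set_tag` if another thread changed the word in the meantime. The default sequentially consistent ordering is used here for simplicity; weaker orderings are a separate design decision.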

If you enjoy this kind of low-level work, also look into memory ordering semantics, memory fences and so on (see "Fence instructions" in the manual above).


1 Comment

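To add to the excellent answer from @Sigismondo: some SSE/AVX instructions raise a fault if their memory operand is not aligned (on x86 this is a general-protection exception, which Linux typically reports as a segmentation fault).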
2

I'm afraid I can't answer your whole question, but I can make a start:

Pointer alignment might not only affect performance; it can also be necessary for your code to work at all. On some processors, for example certain ARM cores, you can't read values larger than 1 byte through an unaligned pointer. Doing so results in an error.

If I work with a big data stream, for example, I prefer to have my data aligned so that I can read several bytes at once instead of reading byte by byte, which costs more time and CPU.

2

On the x86/x86_64 architecture, reading from or writing to unaligned memory can carry a performance cost, because it may take two memory operations instead of one: the bus operations to and from memory are always aligned. On GNU/Linux you can use posix_memalign and related functions (see man memalign) to get aligned heap memory in user space.
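As a rough sketch of the heap case, assuming a POSIX system: the wrapper below mimics calloc with a caller-chosen alignment. The name `calloc_aligned` is hypothetical; only `posix_memalign` itself is the real API, and it requires the alignment to be a power of two and a multiple of `sizeof(void *)`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* exposes posix_memalign in <stdlib.h> */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate `size` zeroed bytes whose address is a multiple of `alignment`.
   `alignment` must be a power of two and a multiple of sizeof(void *);
   `size` must be non-zero. Returns NULL on failure; release with free(). */
static void *calloc_aligned(size_t alignment, size_t size)
{
    assert(size > 0);
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;          /* EINVAL (bad alignment) or ENOMEM */
    memset(p, 0, size);       /* posix_memalign does not zero the memory */
    return p;
}
```

For example, `calloc_aligned(64, 256)` would return a pointer whose 6 least significant bits are zero, which also answers the "how to force an exact number of bits" part for heap memory.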

Some compilers also support an attribute for aligning objects on the stack, for instance

/* GCC align declarator */
#if defined(__GNUC__)
#define MYMEMALIGN(x, y) x __attribute__((aligned(y)))
#endif
/* usage: MYMEMALIGN(char buf[64], 16); */

but I guess these are non-portable solutions.
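If a C11 compiler is available, the standard `alignas` specifier from `<stdalign.h>` may serve as a more portable alternative to the attribute. This is a sketch only: the type `aligned_buffer`, the helper `is_aligned`, and the choice of 64 bytes (an assumed cache-line size) are illustrative and not taken from the answer above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* A buffer whose storage starts on a 64-byte boundary.
   64 is an assumed cache-line size, not something the standard promises. */
struct aligned_buffer {
    alignas(64) unsigned char bytes[256];
};

/* Non-zero if `p` is a multiple of `alignment` (a power of two). */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Declaring `struct aligned_buffer buf;` as a local variable then gives stack storage for which `is_aligned(buf.bytes, 64)` is expected to hold, provided the compiler supports that over-alignment for automatic objects.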

1 Comment

On recent Intel CPUs, the penalty for unaligned loads/stores is zero, except when the data crosses a cache-line boundary, with a potentially even larger penalty for crossing a page boundary. So it's still a good idea to try to align your data, but there is good hardware support for unaligned data in cases where you need different offsets into your data.
