
The Intel intrinsic functions have the subtype of the vector built into their names. For example, _mm_set1_ps has the suffix ps, which stands for packed single-precision, i.e. float. Although the meaning of most suffixes is clear, their "full name", such as packed single-precision, isn't always clear from the function descriptions. I have created the following table, but unfortunately some entries are missing. What are the missing values? Additional questions follow the table.

abbreviation | full name                     | C/++ equivalent
ps           | packed single-precision       | float
ph           | packed half-precision         | None**
pd           | packed double-precision       | double
pch          | packed half-precision complex | None**
pi8          | ???                           | int8_t
pi16         | ???                           | int16_t
pi32         | ???                           | int32_t
epi8         | ???                           | int8_t
epi16        | ???                           | int16_t
epi32        | ???                           | int32_t
epi64        | ???                           | int64_t
epi64x       | ???                           | int64_t

Additional questions:

  1. Have I missed any?
  2. What is the difference between epiX and piX?
  3. Why does no pi64 exist?
  4. What is the difference between epi64 and epi64x?

** I have found this, but there seems to be no standard way to represent a half-precision (complex) value in C/++. Please correct me if this has changed.

  • This is off-topic as either C or C++: any answers related to those will be very specific to particular compilers (e.g. Intel compilers). To increase the chances of getting a useful reply, I suggest removing those tags and finding tag(s) related specifically to Intel. Commented Jan 30, 2022 at 4:41
  • I have added the C/++ tags because of the question in the footnote. Is this still regarded as off-topic? Commented Jan 30, 2022 at 4:43
  • I'd argue it is. The C and C++ tags are related to standard C or standard C++ respectively, and your question is not relevant to that. Your question will be specific to particular compilers (Intel compilers?), so it is better to tag accordingly. Commented Jan 30, 2022 at 4:44
  • @Peter Actually, Intel intrinsics are supported by the "3 big ones": Clang, GCC, MSVC. However, as they are not standard, I see your point. I have removed the tags. Commented Jan 30, 2022 at 4:47
  • @Brotcrunsher What about the fourth? ICC, the Intel compiler. Does the Intel compiler not support Intel intrinsics? Feels weird. Commented Jan 30, 2022 at 4:49

1 Answer

  1. The missing versions are at least si128 and si64, used in bitwise operations, and [e]pu{8,16,32,64}, used for unsigned operations.

  2. epi and pi differ in the e, which probably means extended; epi targets a 128-bit XMM register, while pi targets a 64-bit MMX register.

  3. pi64 does not exist because the original MMX instruction set was limited to 32-bit elements; si64 is still available.

  4. The main argument for using epi64x instead of epi64 has to do with the lack of function overloading in C. There was a need to provide set/conversion functions both for __m128i _mm_set1_epi64(__m64), which moves from MMX to XMM, and for __m128i _mm_set1_epi64x(int64_t), which works with integers. Additionally, it seems that in the remaining cases the 64x suffix is reserved for modes requiring a 64-bit architecture, as in movq between a general-purpose register and the low half of an __m128i (which could be emulated by multiple instructions), and in something like __int64 _mm_cvtsd_si64x (__m128d a), which converts a double to a 64-bit register target (not to memory directly).
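To make point 4 concrete, here is a minimal sketch of the integer-taking x variant. The helper names broadcast_i64 and lane_i64 are made up for illustration, and the lane is read back through memory rather than with a 64-bit-only conversion intrinsic so the same code should also build for 32-bit targets (assuming the compiler provides _mm_set1_epi64x there, which some older ones reportedly did not).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: broadcast a plain 64-bit integer to both lanes.
   The "x" variant takes an integer argument; _mm_set1_epi64 (no x)
   takes an __m64, i.e. a value coming from an MMX register. */
static __m128i broadcast_i64(int64_t v) {
    return _mm_set1_epi64x(v);
}

/* Hypothetical helper: read lane i (0 or 1) by storing the whole
   register to memory, which avoids any 64-bit-only intrinsic. */
static int64_t lane_i64(__m128i v, int i) {
    int64_t out[2];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

Calling lane_i64(broadcast_i64(-5), 1) should give back -5, since both lanes hold the same value.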

What I would speculate is that si64 and si128 mean scalar integer of width 64/128. Notice that _mm_add_si64 exists (it is not an original SSE intrinsic; it is an SSE2 intrinsic that extends the original MMX instruction set and uses MMX registers). It's si64, not pi64, because only one element of the same size as the whole register is involved.
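As a small illustration of the "whole register as one wide integer" reading of si128, the sketch below uses only SSE2 intrinsics. The function names and_128 and shift_left_one_byte are hypothetical; the point is that neither operation has any notion of element boundaries.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: AND two 16-byte buffers as single 128-bit values.
   A bitwise AND gives the same bits for any element width, hence si128. */
static void and_128(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(va, vb));
}

/* Hypothetical helper: shift the whole register left by one byte (pslldq).
   In memory order, byte i moves to position i + 1 and byte 0 becomes zero. */
static void shift_left_one_byte(const uint8_t in[16], uint8_t out[16]) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_slli_si128(v, 1));
}
```

The byte shift is written with _mm_slli_si128 here; newer headers also offer the name _mm_bslli_si128 for the same instruction.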

Lastly, piN means packed integer of element size N targeting MMX (__m64), and epiN means packed integer of element size N targeting XMM (__m128i).

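The N in epiN decides where carries stop. The following sketch (SSE2 only; the helper names add_as_epi8 and add_as_epi32 are made up for illustration) adds the same bit patterns once as 8-bit elements and once as 32-bit elements and returns the lowest 32 bits of the result.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add x and y as packed 8-bit elements.
   Each byte wraps around on its own; no carry crosses into the next byte. */
static uint32_t add_as_epi8(uint32_t x, uint32_t y) {
    __m128i r = _mm_add_epi8(_mm_set1_epi32((int)x), _mm_set1_epi32((int)y));
    return (uint32_t)_mm_cvtsi128_si32(r);  /* lowest 32 bits of the register */
}

/* Hypothetical helper: add x and y as packed 32-bit elements.
   Carries propagate across all four bytes of each element. */
static uint32_t add_as_epi32(uint32_t x, uint32_t y) {
    __m128i r = _mm_add_epi32(_mm_set1_epi32((int)x), _mm_set1_epi32((int)y));
    return (uint32_t)_mm_cvtsi128_si32(r);
}
```

With x = 0x000000FF and y = 1, the epi8 version wraps the low byte to zero and yields 0, while the epi32 version carries into the next byte and yields 0x100.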

12 Comments

Pretty close to accepting this as the answer. I am only missing what the s in si stands for, same for the p in pi. Also, what is the difference between piX and siX?
@Brotcrunsher: si is I think scalar integer, just like ss is scalar single-precision vs. ps packed single. e.g. _mm_loadu_si32(void*) and _mm_cvtsi32_si128(int) are intrinsics for movd, and _mm_cvtsi32_sd is cvtsi2sd (int32 -> FP conversion). si128 like bitwise booleans and integer loads are the whole vector as a notional scalar integer that's really wide, because there aren't any meaningful element boundaries. Also with byte-shift shuffles like pslldq = _mm_bslli_si128
I would guess the epi/pi distinction follows the earlier naming convention, where ax, bx, ... were extended to eax, ebx, ....
@Aki: 4: epi64x exists because they already used up the sensible names for MMX -> XMM stuff, like SSE2 __m128i _mm_set1_epi64 (__m64 a). I have no clue why they used the x name for the plain int64_t version (or __int64 as Intel would have it); seems very shortsighted from our perspective with MMX being long obsolete and int64_t being highly relevant especially with SSE4 providing even more stuff you can do with them, and wider vectors to make it worthwhile. The two intrinsics involving epi64x at all (set and set1) were new in SSE2, along with the same-named epi64 versions.
You might be partially right: I think some (versions of) compilers chose not to provide _mm_set_epi64x for 32-bit builds for some reason, even though . But the same-named intrinsic always did the same thing if it existed at all.