0

I have a hypothesis that speculative execution on Intel Nehalem (1 gen) causing a crash. Is it possible or I completely wrong? If this is possible what can I do to prevent this? Maybe disable speculative execution for just one function or whole translation unit?

For the compilation of the cpp file that has problematic code, clang is used with flags -mavx2 -mxsave all other files compiled without these flags. This code works fine on any available contemporary mac book and windows laptop/desktop.

Testers mac book has Intel(R) Core(TM) i5 CPU 760. This CPU doesn't support AVX2 instruction set. There is a code that checks if the AVX2 supported and if not it is not executed. I can't have direct access to this device for debugging to know which exactly code causing the crash. But now I have two hypotheses:

  • code that checks if AVX2 supported is wrong and returns true when should return false
  • even though check returns false speculative execution actually run the AVX2 code causing the crash

I have already replaced/"fixed" the checking code as the primary hypothesis but tester still reports the crash. So I don't know for sure that isAvx2Supported is false.

Code that checks if AVX2 supported

void cpuid(int info[4], int InfoType) noexcept
{
#ifdef _WIN32
  __cpuidex(info, InfoType, 0);
#else
  __cpuid_count(InfoType, 0, info[0], info[1], info[2], info[3]);
#endif
}

bool check_xcr0_ymm() noexcept
{
  uint32_t xcr0;
#if defined(_MSC_VER)
  xcr0 = (uint32_t)_xgetbv(0);
#else
  __asm__ __volatile__("xgetbv" : "=a" (xcr0) : "c" (0) : "%edx");
#endif
  // checking if xmm and ymm state are enabled in XCR0
  return (xcr0 & 6) == 6;
}

bool check_4th_gen_intel_core_features() noexcept
{
  // see original article
  // https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
  int cpuInfo[4] = {};

  cpuid(cpuInfo, 1);
  //    CPUID.(EAX=01H, ECX=0H):ECX.FMA[bit 12]==1
  // && CPUID.(EAX=01H, ECX=0H):ECX.MOVBE[bit 22]==1
  // && CPUID.(EAX=01H, ECX=0H):ECX.OSXSAVE[bit 27]==1
  constexpr uint32_t fma_movbe_osxsave_mask = ((1 << 12) | (1 << 22) | (1 << 27));
  if((cpuInfo[2] & fma_movbe_osxsave_mask) != fma_movbe_osxsave_mask)
    return false;

  if(!check_xcr0_ymm())
    return false;

  cpuid(cpuInfo, 7);
  //    CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]==1
  // && CPUID.(EAX=07H, ECX=0H):EBX.BMI1[bit 3]==1
  // && CPUID.(EAX=07H, ECX=0H):EBX.BMI2[bit 8]==1
  constexpr uint32_t avx2_bmi12_mask = (1 << 5) | (1 << 3) | (1 << 8);
  if((cpuInfo[1] & avx2_bmi12_mask) != avx2_bmi12_mask)
    return false;

  cpuid(cpuInfo, 0x80000001);
  // CPUID.(EAX=80000001H):ECX.LZCNT[bit 5]==1
  if((cpuInfo[2] & (1 << 5)) == 0)
    return false;

  return true;
}

const auto isAvx2Supported = check_4th_gen_intel_core_features();

actual code that uses AVX2

int findCharFast(const char* data, size_t dataSize, char c, unsigned int& offset)
{
  if(isAvx2Supported)
  {
    const auto mask = _mm256_set1_epi8(c);
    auto it = data + offset;
    for(const auto end = data + dataSize - 31; it < end; it += 32)
    {
      if(const auto result = _mm256_movemask_epi8(_mm256_cmpeq_epi8(mask, _mm256_loadu_si256(reinterpret_cast<const __m256i*>(it)))))
      {
        return it - data + get_first_bit_set(result);
      }
    }
    offset = it - data;
  }
  return -1;
}

crash report says

Crashed Thread: 43 Queue(0x60400005b270)[16]

Exception Type: EXC_BAD_INSTRUCTION (SIGILL) Exception Codes: 0x0000000000000001, 0x0000000000000000

Thread 43 Crashed:: Queue(0x60400005b270)[16] 0 0x000000010d54b06b findCharFast(char const*, unsigned long, char, unsigned int&) + 91

Thread 43 crashed with X86 Thread State (64-bit): rax: 0x00000000ffffffff rbx: 0x0000000000000000 rcx: 0x000070000982bbd0 rdx: 0x0000000000000000 rdi: 0x00007f9ac044fa0b rsi: 0x000000000000000d rbp: 0x000070000982bc00 rsp: 0x000070000982bbc8 r8: 0x0000000000000001 r9: 0x0000000000000001 r10: 0x000060c000334030 r11: 0xfffffffffffc0fde r12: 0x0000000000000001 r13: 0x0000600000016fb0 r14: 0x00007f9ac044fa0b r15: 0x00007f9ac044fa18 rip: 0x000000010d54b06b rfl: 0x0000000000010246 cr2: 0x000000010d529cf0

3
  • If your program is crashing, it's almost certainly a bug in your program and not a bug in the CPU. Commented Sep 4, 2019 at 22:46
  • I didn't say that this is a bug in CPU. It may be the case that I need to handle somehow speculative execution in the findCharFast. Other words it may be the case that it is not enough to just put an if statement. Commented Sep 5, 2019 at 1:46
  • Looks like I found a similar issue in rust cargo. Looks like they follow exactly the same intel article which does not fully describe how to do a proper check. Looks like bit 26 is also has to be checked before calling xgetbv. Will update this question when tester confirm that it is fixed. Commented Sep 5, 2019 at 2:54

1 Answer 1

2

There was a bug in AVX2 support detection code. Intel article which describes how to do it is basically wrong. Before calling xgetbv implementation MUST check that ECX.XSAVE[bit 26]==1. Checking only OSXSAVE flag is not sufficient.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.