I'm really interested in the answer if you find one!
Unfortunately, plenty of things can change the numerical results...
To be efficient, some LAPACK algorithms iterate over sub-matrix blocks. For optimal efficiency, the block size has to roughly match the size of the CPU L1/L2/L3 caches...
The block size is controlled by the LAPACK routine ILAENV; see http://www.netlib.org/lapack/lug/node120.html
Of course, if the block sizes differ, the results will differ numerically... It is possible that the LAPACK/BLAS DLLs provided by Matlab were compiled with a differently tuned version of ILAENV on the two machines, or that ILAENV was replaced with a customized, optimized version that takes the cache size into account. You could check for yourself by writing a small C program that calls ILAENV and linking it against the DLL provided by Matlab...
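A minimal sketch of such a check might look like this. It assumes the common Fortran calling convention (trailing underscore, hidden string-length arguments appended at the end, 32-bit integers); Matlab's own build may differ (some versions use 64-bit integers), and the library name to link against (e.g. libmwlapack) needs to be checked in your installation:

```c
#include <stdio.h>

/* Fortran ILAENV, assuming gfortran/f2c-style mangling: trailing
   underscore, hidden string lengths passed by value at the end.
   Adjust the declaration to match how Matlab's DLL was built. */
extern int ilaenv_(const int *ispec, const char *name, const char *opts,
                   const int *n1, const int *n2, const int *n3, const int *n4,
                   int name_len, int opts_len);

int main(void)
{
    int ispec = 1;                 /* ISPEC = 1: optimal block size NB */
    int m = 1000, n = 1000;        /* problem dimensions               */
    int unused = -1;               /* dimensions not used by DGEQRF    */
    int nb = ilaenv_(&ispec, "DGEQRF", " ", &m, &n, &unused, &unused,
                     6, 1);        /* hidden lengths of NAME and OPTS  */
    printf("ILAENV block size for DGEQRF (%dx%d): %d\n", m, n, nb);
    return 0;
}
```

Run it on both machines, linked against each machine's Matlab DLL; if the printed block sizes differ, you have found one source of the discrepancy.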
For the underlying BLAS, it's even worse: if an optimized version is used, fused multiply-add FPU instructions might be used where available, for example, and they are not available on every FPU. AFAIK, Matlab uses ATLAS http://math-atlas.sourceforge.net/, and you'll have to inquire about how the library was produced... You would have to track down differences in the results of basic linear algebra operations (like matrix*vector or matrix*matrix ...).
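To see why a fused multiply-add changes the result, here is a small sketch (standard C99, nothing Matlab-specific): the fused operation rounds once, while the separate multiply and add round twice, so the low-order bits can differ:

```c
/* Compile with e.g.: gcc fma_demo.c -lm -ffp-contract=off */
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* a*a = 1 + 2^-26 + 2^-54 exactly; the 2^-54 term is lost when
       a*a is rounded to double before the subtraction, but kept by
       fma(), which rounds only the final result. */
    double a = 1.0 + 0x1p-27;
    double separate = a * a - 1.0;      /* two roundings */
    double fused    = fma(a, a, -1.0);  /* one rounding  */
    printf("separate: %.17g\n", separate);
    printf("fused   : %.17g\n", fused);
    return 0;
}
```

Note that some compilers contract `a * a - 1.0` into a fused multiply-add on their own (see `FP_CONTRACT` / `-ffp-contract`), which is exactly the kind of build-option difference that can make two binaries of the same BLAS disagree.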
UPDATE: Even if ILAENV were the same, QR uses elementary rotations, so it obviously depends on the sin/cos implementation. Unfortunately, no standard specifies exactly how sin and cos should behave bitwise; they can be off by a few ulp from the correctly rounded exact result, differ from one library to another, and give different results on different architectures/compilers (they are hardwired in the x87 FPU). So unless you provide your own version of these functions (or work in Ada) and compile them with specially crafted compiler options, and maybe finely control the FPU modes, there's close to no chance of getting exactly the same results on different architectures... You will also have to ask Matlab whether they took special care to ensure floating-point-deterministic results when they compiled those libraries.
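You can observe this directly: the following sketch prints the exact bits of sin() for an argument that stresses range reduction, and two machines (different libm versions, or x87 FSIN vs. a software implementation) can legitimately print different bits here:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Huge argument: the result depends on how many bits of pi the
       library's argument reduction uses, so implementations differ. */
    double x = 1e22;
    printf("sin(%g) = %.17g = %a\n", x, sin(x), sin(x));
    return 0;
}
```

Once the rotation coefficients differ in the last ulp, every entry they touch differs, and the discrepancy propagates through the rest of the factorization.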