I recently updated an old project from DX9(ShaderModel3) to DX11 (ShaderModel5). One of the things that I noticed immediately was that there were suddenly a lot of NaNs showing up which didn’t exist earlier. This was a little surprising since the shaders hadn’t changed. So, the difference must be in the HLSL intrinsics. A simple google search didn’t return what I was looking for. So, I decided to look at the generated assembly for differences.

The following sample programs where compiled with fxc’s /Fc option for the specific profile such as

fxc test.ps /T ps_5_0 /Fc test_sm5.asm

Here is a shader that takes in the XY components of a vector and reconstructs Z assuming that the vector was normalized in the first place. This is usually used when decoding normals.

float2 NormalXY;

float4 main() : SV_Target
{
	float NormalZ = sqrt(1.f - dot(NormalXY,NormalXY));
	return float4(NormalXY, NormalZ, 0.0);
}

Here is the SM5 assembly. This is mostly expected - take the dot product of the NormalXY vector, subtract from 1.0, and take the square root to get NormalZ.

ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB0[1], immediateIndexed
dcl_output o0.xyzw
dcl_temps 1
dp2 r0.x, cb0[0].xyxx, cb0[0].xyxx
add r0.x, -r0.x, l(1.000000)
sqrt o0.z, r0.x
mov o0.xy, cb0[0].xyxx
mov o0.w, l(0)
ret

and here is the corresponding SM3 assebly. The dot product and subraction are still there (although it is rolled into one dp2add instruction). However, instead of taking the sqare root, it computes the inverse square root (rsq) and then the reciprocal (rcp) of that. Whoa!

ps_3_0
def c1, 1, 0, 0, 0
mov r0.xy, c0
dp2add r0.z, r0, -r0, c1.x
rsq r0.z, r0.z
rcp oC0.z, r0.z
mul oC0.xyw, r0.xyzx, c1.xxzy

Turns out there is a good reason for that - you can compute an approximate inverse quare root much faster than an actual square root. See ¹ and ².

Additionally, SM3 takes the absolute value of the argument ³ before computing rsq() which swallows up any NaNs resulting from when the argument is negative.

Note that the dot product of two normalized vectors can be greater than 1.0 due to floating point inaccuracy. So, sqrt( 1.f - dot(A,B)) can produce NaNs even if A and B are normalized. Use sqrt(1.f - saturate(dot(A,B))) instead.

Another source of NaN I ran into was pow(), which is used all over the place in lighting code. Take this program as an example.

float x,y;

float4 main() : SV_Target
{
	float res = pow(x,y);
	return float4(res, res, res, 0.0);
}

This is what the generated assembly looks like

ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB0[1], immediateIndexed
dcl_output o0.xyzw
dcl_temps 1
log r0.x, cb0[0].x
mul r0.x, r0.x, cb0[0].y
exp o0.xyz, r0.xxxx
mov o0.w, l(0)
ret

The thing to note here is that pow(x,y) has been changed to exp(y * log(x)). This is because exp() and log() are quarter rate instructions⁴,⁵. Which means that x needs to be a positive non-zero value, otherwise the result is undefined. Turns out SM3 implementation of log() already performs this high level logic ⁶ for us which suppresses any NaNs.

float v = abs(src);
if (v != 0)
{
    dest.x = dest.y = dest.z = dest.w = 
        (float)(log(v)/log(2));  
}
else
{
    dest.x = dest.y = dest.z = dest.w = -FLT_MAX;
}

In short, for SM5 the intrinsic functions do just what you ask them to do - nothing more. It is upto you to feed correct values to these functions or make sure you saturate() or clamp() your values to the correct range.

Tracking Down NaNs in Shader Code

References