Selecting the correct frustum split
I have been working on my Parallel Split Shadow Mapping implementation for a while.. a while? Hm.. almost five weeks. Yesterday I proved that there is still some room for optimization. My implementation renders four frustum splits into the ARGB channels of a single texture instead of using one shadow map per split. This saves three textures and thus a whole bunch of texture memory, but it makes selecting the correct split channel and matrices within the fragment shader a bit of a mess.
This is what the splits actually look like:
Where Black = 0, Red = 1, Yellow = 2, White = 3
Actually, selecting the proper split is an easy problem to solve. Generally, we need a function that satisfies the following equation:
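Written out in my own notation (an assumption, chosen to match how the second approach further down uses the constants), with z the fragment depth fDepth and d_1 < d_2 < d_3 the boundaries stored in g_fSplitDistances[1] to g_fSplitDistances[3], the function would look roughly like this:

```latex
\[
\operatorname{split}(z) =
\begin{cases}
0, & z < d_1 \\
1, & d_1 \le z < d_2 \\
2, & d_2 \le z < d_3 \\
3, & z \ge d_3
\end{cases}
\]
```

Three boundaries separate four splits, which is where the three tests come from.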
As you see, this function needs to perform at least three tests to output the proper index. But encoding this in HLSL is a bit more complicated when you want it optimized. My first approach was very stupid, but see for yourself:
half GetSplitByDepth(float fDepth)
{
    half nSplitID = 3;
    while( fDepth >= g_fSplitDistances[nSplitID] )
        nSplitID--;
    return nSplitID;
}
Note that asymmetric returns are not supported by my old GeForce 7800 GTX… I don’t even know if newer cards support them, but it doesn’t matter anyway, because they would break the rules of well-structured programming. Then again, breaking the rules is a good thing when it ends up with a performance boost. But let me stop the dumb talk; here are the results of this method:
ps_3_0
def c1, 1, 0, -1, 0
dcl_texcoord1 v0.z
add r0, -c0.wzyx, v0.z
cmp r0, r0, c1.x, c1.y
mul r0.x, r0.y, r0.x
mul r0.x, r0.z, r0.x
mul r0.y, r0.w, r0.x
cmp_pp r0.x, -r0.x, c1.x, c1.y
cmp_pp oC0, -r0.y, r0.x, c1.z
// approximately 7 instruction slots used
ps_2_0
def c1, 1, 0, -1, 0
dcl t1.xyz
add r0.w, t1.z, -c0.w
cmp r0.y, r0.y, c1.x, c1.y
mul r0.x, r0.x, r0.y
add r0.y, t1.z, -c0.y
cmp r0.y, r0.y, c1.x, c1.y
mul r0.x, r0.x, r0.y
add r0.y, t1.z, -c0.x
cmp r0.y, r0.y, c1.x, c1.y
mul r0.y, r0.x, r0.y
cmp_pp r0.x, -r0.x, c1.x, c1.y
cmp_pp r0, -r0.y, r0.x, c1.z
mov_pp oC0, r0
// approximately 14 instruction slots used
So this is the crappiest solution; 14 instruction slots is probably about the worst result possible. Let’s just forget this one and take a look at my second approach:
half GetSplitByDepth(float fDepth)
{
    // Default to the nearest split; the compiled output below also falls back to 0.
    half nSplitID = 0;
    if( fDepth >= g_fSplitDistances[3] )
        nSplitID = 3;
    else if( fDepth >= g_fSplitDistances[2] )
        nSplitID = 2;
    else if( fDepth >= g_fSplitDistances[1] )
        nSplitID = 1;
    return nSplitID;
}
So this should be logically the same as approach number one, but you never know what the compiler does with it. Actually, it’s very different:
ps_3_0
def c1, 1, 0, 2, 3
dcl_texcoord1 v0.z
add r0.xyz, -c0.wzyw, v0.z
cmp_pp r0.z, r0.z, c1.x, c1.y
cmp_pp r0.y, r0.y, c1.z, r0.z
cmp_pp oC0, r0.x, c1.w, r0.y
// approximately 4 instruction slots used
ps_2_0
def c1, 1, 0, 2, 3
dcl t1.xyz
add r0.w, t1.z, -c0.y
cmp_pp r0.x, r0.w, c1.x, c1.y
add r0.y, t1.z, -c0.z
cmp_pp r0.x, r0.y, c1.z, r0.x
add r0.y, t1.z, -c0.w
cmp_pp r0, r0.y, c1.w, r0.x
mov_pp oC0, r0
// approximately 7 instruction slots used
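Four instructions on SM 3.0 and seven on SM 2.0. Neither listing contains a real branch, by the way: the compiler flattens the if/else chain into cmp selects, and the SM 3.0 version wins because it does all three subtractions in one swizzled vector add and writes straight to oC0. On good ol’ vanilla SM 2.0.. it’s not perfect. But I was able to get it (IMHO) perfect: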
half GetSplitByDepth(float fDepth)
{
    float4 fTest = fDepth > g_fSplitDistances;
    return dot(fTest, fTest);
}
ps_3_0
def c1, 0, 1, 0, 0
dcl_texcoord1 v0.z
add r0, c0, -v0.z
cmp r0, r0, c1.x, c1.y
dp4_pp oC0, r0, r0
// approximately 3 instruction slots used
ps_2_0
def c1, 0, 1, 0, 0
dcl t1.xyz
add r0, -t1.z, c0
cmp r0, r0, c1.x, c1.y
dp4 r0, r0, r0
mov_pp oC0, r0
// approximately 4 instruction slots used
So this is THE solution, isn’t it? Think a bit about it.
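For anyone who wants the reasoning spelled out, here is a short sketch in my own notation: every component of fTest is either 0 or 1, so squaring changes nothing and the dot product simply counts how many split distances the depth has passed. Below, t_i is the i-th component of fTest, d_i the i-th component of g_fSplitDistances and z the fragment depth.

```latex
\[
t_i =
\begin{cases}
1, & z > d_i \\
0, & \text{otherwise}
\end{cases}
\qquad\Longrightarrow\qquad
\operatorname{dot}(t, t) = \sum_{i=0}^{3} t_i^{2} = \sum_{i=0}^{3} t_i
= \#\{\, i : z > d_i \,\}
\]
```

Note that all four components take part in the count, so the result only stays within 0 to 3 if a visible fragment can exceed at most three of the stored distances, for example when one component holds the far plane. That layout is an assumption on my part; nothing above says what g_fSplitDistances contains. The test is also a strict > here, while the second approach uses >=, so the two can disagree for a depth that sits exactly on a boundary.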



2 comments
1.
Kyle Hayward wrote at Monday, September 8, 2008 / 06:43
Ha, nice little trick there
I’ll have to remember this when I get around to implementing PSSM.
tschüss
2.
KrisBelucci wrote at Tuesday, June 2, 2009 / 19:12
Hi, cool post. I have been wondering about this topic, so thanks for writing.