Oh My Gosh

Source: http://www.lrs-auditiv.de

I recently had a cool idea about Parallel Split Shadow Mapping and the light projection on the different frustum splits. By default, every frustum split is handled separately, either with uniform shadow mapping or with a special projection algorithm such as PSM or LiSPSM. This ensures that every split frustum is enclosed as tightly as possible to maximize the texel ratio of the shadow/depth map on that volume. But since we have several splits - I use four splits encoded in an ARGB 32-bit floating point texture - we also have different light far planes used for the orthographic projection on each split.
I have done a small and nasty sketch to visualize this issue and since my sketching skills are lim => 0.. eh.. you’ll get the point:
Keeping that thing in mind, you may notice that the depth representation differs between the splits, since each one is “normalized” to its own far plane while using a linearized depth representation. However, this effect will also occur while using the default logarithmic depth representation. To point this gizmo out, see attached:
When visualizing this by simply outputting the projected shadow map, that is, the depth values, on a 3D scene, it becomes clearer:
Indiana Jones told me some time ago, that a cross never marks an important spot, but a red mark does
So what we actually want is a depth representation that is constant over the whole view frustum and thus all splits. My idea was to use the far plane of the last frustum split’s light for all the other lights’ far planes as well. You can simply do this by computing/rendering the frustum splits in reversed order, which also saves the far plane computation on n-1 splits. Cool, eh?
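A minimal sketch of that idea in Python, as a CPU-side illustration and not the actual shader or engine code. The far plane values, the near plane at 0 and all function names are assumptions made up for this example:

```python
# Hypothetical light far planes, one per split, nearest split first.
SPLIT_FAR_PLANES = [40.0, 90.0, 200.0, 450.0]


def normalized_depth(light_z, far_plane):
    """Linear depth in [0, 1] for an orthographic light (near plane assumed at 0)."""
    return light_z / far_plane


def per_split_depths(light_z, far_planes):
    """Default behaviour: every split normalizes against its own far plane."""
    return [normalized_depth(light_z, far) for far in far_planes]


def shared_far_depths(light_z, far_planes):
    """Walk the splits in reversed order, take the far plane of the last
    split once and reuse it for the n-1 splits before it."""
    shared_far = None
    depths = [0.0] * len(far_planes)
    for i in reversed(range(len(far_planes))):
        if shared_far is None:
            shared_far = far_planes[i]  # the only far plane that is needed
        depths[i] = normalized_depth(light_z, shared_far)
    return depths


if __name__ == "__main__":
    # The same light-space distance, seen through each split's projection.
    print(per_split_depths(20.0, SPLIT_FAR_PLANES))
    print(shared_far_depths(20.0, SPLIT_FAR_PLANES))
```

With per-split far planes the same distance ends up as four different depth values. With the shared far plane it is the same value in every channel, which is what a blur that runs over all channels at once needs.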
But why do I do this? It’s because of blurring the depth map in some way. Most common blur algorithms, like the good ol’ Gaussian one, have the property, in combination with a modern GPU, of working on all components of a non-scalar type at the same time. But this causes the blurred shadow map to suck, because it applies the same filter kernel to all splits at the same time, resulting in a larger blur on the shadow maps in the distance. It’s a good idea to scale the filter kernel by the distance for each split, but having a different depth representation in each split is a problem too. So basically, we want a depth representation like this:
Having the depth representation across the splits as nice as this, you don’t even notice the split borders anymore if you simply project the shadow map onto the scene (and you would notice them even less if you scaled your blur filter’s kernel size, which I don’t at the moment):
And here is the respective shadow/depth map for that picture above:
Note that the alpha channel is not visible… for some reason you may guess.
Hope this helps
This one goes out for all of you guys who know what tangent space normal mapping is and how it works
This is a working sample of a single directional light without color (=pure white).
Edit: I have been pleased to publish the code. There it is:
float4 PS(uniform struct google) { return google.output; }
Just a depressing post. My blog seems to have evolved from a small one… to a SPAM blog.
You have 381 total comments, 15 approved, 366 spam and 0 awaiting moderation.
Dear Mister Spam Bot, I don’t need any pills nor am I interested in enlarging something.
After some time of playing around with my PSSM stuff, I decided that I can’t improve any single thing. So hooray, it’s done! I’ve put my M5 project to sleep (the monkey thing, M5 = Massive Micro Mutant Monkey Madness) and joined the Iceforge team as a (lead?) programmer two weeks ago. You may know them from Darkmana. They’re currently working on a viking-like multiplayer game called Valhöll Warriors that I was very interested in. Fortunately, the project was at the very beginning, so I was able to bring my default framework into the code layer. Years of experience and thousands of lines of my finished code enabled me to build up a complete game framework in only one week (woot!).
So what do I mean by a framework? I mean a combination of:
Combining Truevision3D with NVIDIA’s PhysX is a challenging one! The first thing is that PhysX doesn’t support managed languages out of the box, but there are a few (more or less) complete wrappers around. Still, it’s not that easy, but I made it to.. eh.. see for yourself:
Don’t mind the FPS rate
There are about 50k 3D lines rendered in realtime. And the interesting thing is: Truevision3D’s built-in Newton Game Dynamics functionalities are about 40% slower than PhysX! Bye, bye, Newton!
I just released the source code, an example and a .NET library of my PSSM baby. You can find the respective post in the Truevision 3D forum here.
Hope you like it!
I finally had the time to set up my new system (4 GB DDR3, GTX 280, 3.2 GHz dual core) and just downloaded the latest version of Visual Studio from my MSDNAA access. I’m a bit gutted about Microsoft at the moment. You guys know that Microsoft loves to flood the world with copyrights (which is actually a good thing) and to always prefix the names of their software with “Microsoft”, like “Microsoft Office”, “Microsoft Windows” and “Microsoft Visual Studio”, but this time… look at the picture
And ya: this is a nerdy post, I know
Some guys asked me how we were able to get the monkeys (actually they’re not chimps, ’cause chimps don’t have a tail) to look so cartoony. The effects for this are very trivial and easy to explain, but here we go:
The basic combination of the shader is a classic “albedo * diffuse + specular” term without any emissive properties. Since we only have one directional light, we can fake as much as we want to
The diffuse part is a simple lambertian term, but with a small addition. You guys know what I’m talking about:
The standard lambertian term results in values from -1 to 1, which is bad for lighting, ’cause 50% of the lit surface will be black (since 0 and everything below it is totally black). I came across the idea to use the “half lambertian term”:
This unofficial variant - first used by Valve Software in 1998 in Half-Life - has the property of resulting in values from 0 to 1. That means the lit surface appears brighter and not overdarkened. However, this will only result in standard diffuse lighting from white to black:
Of course you can tint it by multiplying with the light’s color, but that does not look cartoonish enough. Since the diffuse lighting is in the range [0, 1], we can use it as the coordinate into a 1D texture for tinting the result:
This looks far more cartoonish and still semi-realistic. Furthermore, it has the important advantage of being artist-tunable. Can’t wait to see it?
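For reference, here is a rough Python sketch of the diffuse part described above, meant as a CPU-side illustration and not the shader we ship. The scale-and-bias form of the half lambertian term (dot * 0.5 + 0.5) is the commonly quoted one and is assumed here; the ramp colors and all function names are made up for the example:

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def lambert(normal, light_dir):
    """Standard lambertian term: values from -1 to 1 for unit vectors."""
    return dot(normal, light_dir)


def half_lambert(normal, light_dir):
    """Half lambertian term: scale and bias the result into 0..1."""
    return dot(normal, light_dir) * 0.5 + 0.5


# Hypothetical 1D ramp texture: a handful of RGB texels from dark to bright.
RAMP = [(0.15, 0.10, 0.25), (0.55, 0.35, 0.30), (1.00, 0.85, 0.60), (1.00, 1.00, 0.90)]


def sample_ramp(ramp, u):
    """Sample the 1D ramp with clamped addressing and linear filtering."""
    u = min(max(u, 0.0), 1.0)
    pos = u * (len(ramp) - 1)
    i = min(int(pos), len(ramp) - 2)
    t = pos - i
    return tuple(a + (b - a) * t for a, b in zip(ramp[i], ramp[i + 1]))


def toon_diffuse(normal, light_dir, ramp=RAMP):
    """Use the half lambertian value as the lookup coordinate into the ramp."""
    return sample_ramp(ramp, half_lambert(normal, light_dir))


if __name__ == "__main__":
    up = (0.0, 1.0, 0.0)
    print(toon_diffuse(up, (0.0, 1.0, 0.0)))   # facing the light
    print(toon_diffuse(up, (0.0, -1.0, 0.0)))  # facing away from the light
```

The ramp is what the artist tunes: changing its texels changes the look without touching the code.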
But diffuse lighting alone doesn’t make good lighting. And then the monkey still has a helmet that should look more like metal than the skin of a monkey. I have to admit that the helmet doesn’t look metallic in the final result, but we can live with that. We use a separate “sfx” texture for scaling different factors on the whole model, like the intensity of specular and rim lighting. Oh, did I mention rim lighting? Rim lighting has become a must in non-realistic renderings. Refer to Team Fortress 2 or Battlefield Heroes: They all use rim lighting. And so do we:
Note that there is specular lighting on the helmet only
This is due to our magic sfx texture. And one more time - this can be artist-tuned. And my artist likes it
In the end we came up with a little hacky solution. Since we use Parallel Split Shadow Maps for getting some semi-uber-shadows, the whole monkey should be shadowed too. That means we have to scale the specular and rim influence by taking care of the shadowness. In the end this looks satisfying. When the player enters a shadowed area, the specular and rim lighting disappears - guess why?
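A hedged Python sketch of how such a combination could look, as a CPU-side illustration only. The rim formula (one minus N·V, raised to a power), the parameter names and the convention that shadow = 1 means fully lit are assumptions for the example, not the exact shader:

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def rim_term(normal, view_dir, power=3.0):
    """Assumed rim formula: strongest where the surface turns away from the viewer."""
    return (1.0 - max(dot(normal, view_dir), 0.0)) ** power


def shade(albedo, diffuse, specular, rim, sfx_specular, sfx_rim, shadow):
    """albedo * diffuse + highlights, with the highlights scaled by the shadow term.

    sfx_specular / sfx_rim stand for the per-texel factors from the sfx texture.
    shadow is assumed to be 1.0 when fully lit and 0.0 when fully shadowed.
    """
    highlight = (specular * sfx_specular + rim * sfx_rim) * shadow
    return tuple(a * d + highlight for a, d in zip(albedo, diffuse))


if __name__ == "__main__":
    albedo = (0.5, 0.4, 0.3)
    diffuse = (1.0, 0.9, 0.8)
    rim = rim_term((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
    print(shade(albedo, diffuse, 0.4, rim, 1.0, 0.5, shadow=1.0))  # lit
    print(shade(albedo, diffuse, 0.4, rim, 1.0, 0.5, shadow=0.0))  # shadowed
```

With shadow at 0 the highlight term vanishes and only albedo * diffuse is left, which is why specular and rim lighting disappear in shadowed areas.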
We also don’t use any filtering method on the shadows dropped onto the monkey. This saves a bit of performance, though not enough to be worth mentioning, and you don’t even notice the missing filtering while playing. The white box shows what I mean.
Please, if you have any further questions or if you’re interested in the code, don’t hesitate to drop us a line
Indies for the win!
Some of you guys may know this lil video from the Truevision3D forums, where I was visualizing the different frustum splits in a tinted way:
The split distances are recalculated every time the maximum frustum depth changes. In other words: The shadow maps cover the whole visible area to push more shadow map texels onto the geometry. This happens when the player looks “outside” the map, that is, when the frustum depth is further away than the world’s boundaries. Note that the near red split disappears for some reason. In addition, the split borders move nearer to the player, and when they’re too close to the camera, the borders and mainly the resolution/quality differences are noticeable. To avoid this problem/artefact, I rewrote the split distance calculation. The original code uses a logarithmic distribution to adapt to the view depth. I ended up with a small addition, which “snaps” the nearest split distance to a static border. The effect is quite nice and offers far better shadows:
So this one has a better split scheme applied to it. Take a look at the near red layer. It will always cover the near area around the player to ensure the highest detail - regardless of the maximum frustum size.
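A small Python sketch of one way to read the “snapping”, as an illustration and not the released code. It assumes the logarithmic scheme is the usual near * (far / near)^(i / count) and that the snap pins the first border to a fixed distance, with the remaining borders distributed logarithmically behind it. All names and numbers are made up:

```python
def log_split_distances(near, far, count):
    """Far border of each split, distributed logarithmically between near and far."""
    return [near * (far / near) ** (i / count) for i in range(1, count + 1)]


def snapped_split_distances(near, far, count, first_border):
    """Pin the nearest split to a static border (assumed interpretation).

    The remaining count - 1 borders are distributed logarithmically between
    the static border and the far distance.
    """
    if far <= first_border:
        # Frustum is shorter than the static border: nothing to snap.
        return log_split_distances(near, far, count)
    return [first_border] + log_split_distances(first_border, far, count - 1)


if __name__ == "__main__":
    for far in (200.0, 1000.0, 5000.0):
        print(far, log_split_distances(1.0, far, 4))
        print(far, snapped_split_distances(1.0, far, 4, first_border=10.0))
```

In the plain logarithmic version the first border moves whenever the far distance changes. In the snapped version it stays put, so the near split always covers the same area around the player.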
…is what we would accept with pleasure. We found it difficult to continue development for a while, since the sunshine (and the hot babes) of Crete were so tempting
But stay tuned… our powerup HUD element already consists of 75 textures and we keep it growing. We’ll keep you up to date - there’s nothing to be said against it
What is a monkey game without bananas? One “powerup” you can get from our dispenser is the banana. Once it’s been loaded into your helmet, you can drop it wherever you like. But be careful! When you step onto a banana peel, you slip away and probably fall down if you’re not fast enough.
This is a preview of the banana peel slipping:
The stand-up animation is missing, but it already works very well. While slipping away, you lose control of your movement and in the worst case you might slip off the platforms and fall to your death - being a frag for the “owner” of the peel.
It’s a maddening feature and a lot of fun to place a whole mine field of banana peels!
As with many games, ours will also ship with its own editor. It’s not exactly trivial, but easy to use after a few minutes of watching what’s going on there. We decided to make the map design as easy and intuitive as possible, without BSPs or octrees. We came up with the idea of using puzzle pieces. This is something like.. eh.. ya.. 3D tiles. You can choose between the available puzzle pieces within the editor and simply drag and drop them into the world. There is also a grid system which perfectly fits the puzzle pieces for easier and symmetric placement.
You can’t imagine what can be done with a small set of pieces! This is our first test set:
And that’s what you can actually do with it:
So this kinda rocks!
But I have to admit that ChimpED is only a What-You-See-Is-What-You-Might-Get editor, ’cause there are no shadows. After a few secs of compiling that babe (grouping meshes, welding vertices, calculating better normals etc.), it’s ready for the game:
Et voilà, there are shadows! Compare that screenshot with the one of ChimpED. It’s hard to tell the heights of the platforms apart without shadows, isn’t it?
If you wanna try out this editor, feel free to drop me a line
PS: While writing this post, the impossible happened:
At least there is no “Send debugging shit to Microsoft”.. lol
I have been working on my Parallel Split Shadow Mapping implementation for a while.. a while? Hm.. almost five weeks. Yesterday, I proved that there is still some room for optimizations. My implementation renders four frustum splits into the ARGB channels of a single texture instead of using one shadow map per split. This saves three textures and thus a whole bunch of texture memory, but selecting the correct split channel and matrices within the fragment shader became a mess.
This is what the splits actually look like:
Where Black = 0, Red = 1, Yellow = 2, White = 3
Actually, selecting the proper split is very easy - easy to solve. Generally, we need a function that satisfies the following equation:
As you see, this function needs to perform at least three tests to output the proper index. But encoding this in HLSL is a bit more complicated when you want it optimized. My first approach was very stupid, but see for yourself:
half GetSplitByDepth(float fDepth)
{
    half nSplitID = 3;
    while( fDepth >= g_fSplitDistances[nSplitID] )
        nSplitID--;
    return nSplitID;
}
Note that asymmetric returns are not supported by my old GeForce 7800 GTX… I don’t even know if they are by newer ones, but it doesn’t matter anyway, ’cause they would break the rules of well-structured programming. Then again, breaking the rules is a good thing when it ends up with a performance boost. But let me stop the dumb talk, here are the results of this method:
ps_3_0
def c1, 1, 0, -1, 0
dcl_texcoord1 v0.z
add r0, -c0.wzyx, v0.z
cmp r0, r0, c1.x, c1.y
mul r0.x, r0.y, r0.x
mul r0.x, r0.z, r0.x
mul r0.y, r0.w, r0.x
cmp_pp r0.x, -r0.x, c1.x, c1.y
cmp_pp oC0, -r0.y, r0.x, c1.z
// approximately 7 instruction slots used
ps_2_0
def c1, 1, 0, -1, 0
dcl t1.xyz
add r0.w, t1.z, -c0.w
cmp r0.y, r0.y, c1.x, c1.y
mul r0.x, r0.x, r0.y
add r0.y, t1.z, -c0.y
cmp r0.y, r0.y, c1.x, c1.y
mul r0.x, r0.x, r0.y
add r0.y, t1.z, -c0.x
cmp r0.y, r0.y, c1.x, c1.y
mul r0.y, r0.x, r0.y
cmp_pp r0.x, -r0.x, c1.x, c1.y
cmp_pp r0, -r0.y, r0.x, c1.z
mov_pp oC0, r0
// approximately 14 instruction slots used
So this is the crappiest solution. 14 instruction slots is probably the shittiest solution even possible. Let’s just forget this gimp and take a look at my second approach:
half GetSplitByDepth(float fDepth)
{
    // Start at the nearest split; the compiled output below also falls back to 0.
    half nSplitID = 0;
    if( fDepth >= g_fSplitDistances[3] )
        nSplitID = 3;
    else if( fDepth >= g_fSplitDistances[2] )
        nSplitID = 2;
    else if( fDepth >= g_fSplitDistances[1] )
        nSplitID = 1;
    return nSplitID;
}
So this should logically be the same as approach no. one, but you never know what the compiler does with it. Actually, it’s very different:
ps_3_0
def c1, 1, 0, 2, 3
dcl_texcoord1 v0.z
add r0.xyz, -c0.wzyw, v0.z
cmp_pp r0.z, r0.z, c1.x, c1.y
cmp_pp r0.y, r0.y, c1.z, r0.z
cmp_pp oC0, r0.x, c1.w, r0.y
// approximately 4 instruction slots used
ps_2_0
def c1, 1, 0, 2, 3
dcl t1.xyz
add r0.w, t1.z, -c0.y
cmp_pp r0.x, r0.w, c1.x, c1.y
add r0.y, t1.z, -c0.z
cmp_pp r0.x, r0.y, c1.z, r0.x
add r0.y, t1.z, -c0.w
cmp_pp r0, r0.y, c1.w, r0.x
mov_pp oC0, r0
// approximately 7 instruction slots used
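Four instructions on SM 3.0 and seven on SM 2.0. There is no branch instruction in either listing, so the SM 3.0 version seems to win through its arbitrary swizzles (one add handles all three tests) and the direct write to oC0 rather than through dynamic branching. On the good ol’ vanilla SM 2.0.. it’s not perfect. But I was able to get it (IMHO) perfect: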
half GetSplitByDepth(float fDepth)
{
    float4 fTest = fDepth > g_fSplitDistances;
    return dot(fTest, fTest);
}
ps_3_0
def c1, 0, 1, 0, 0
dcl_texcoord1 v0.z
add r0, c0, -v0.z
cmp r0, r0, c1.x, c1.y
dp4_pp oC0, r0, r0
// approximately 3 instruction slots used
ps_2_0
def c1, 0, 1, 0, 0
dcl t1.xyz
add r0, -t1.z, c0
cmp r0, r0, c1.x, c1.y
dp4 r0, r0, r0
mov_pp oC0, r0
// approximately 4 instruction slots used
So this is THE solution, isn’t it? Think a bit about it
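To see why the dot product trick works, here is a small Python sketch that compares an if-chain with the counting version on the CPU. It is an illustration and not the shader: the border values are made up, and it assumes that only the three inner split borders are tested (in the float4 version the fourth component would then have to never pass the test):

```python
# Hypothetical inner split borders in view-space depth, ascending.
BORDERS = [10.0, 40.0, 150.0]


def split_by_branches(depth, borders):
    """If-chain version: test the farthest border first."""
    if depth >= borders[2]:
        return 3
    if depth >= borders[1]:
        return 2
    if depth >= borders[0]:
        return 1
    return 0


def split_by_count(depth, borders):
    """Counting version: every passed test contributes a 1.0.

    The dot product of a 0/1 vector with itself is the number of ones,
    which equals the split index for ascending borders.
    """
    tests = [1.0 if depth > border else 0.0 for border in borders]
    return int(sum(t * t for t in tests))


if __name__ == "__main__":
    for depth in (5.0, 25.0, 90.0, 400.0):
        print(depth, split_by_branches(depth, BORDERS), split_by_count(depth, BORDERS))
```

Both versions agree for every depth that does not sit exactly on a border. On a border they differ by one, because the if-chain uses >= while the counting version uses >.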
Don’t be confused about this blog - it’s just another one in the wild. And… I have to admit that this is just a stupid test post, but hopefully there will be more heady stuff soon!