ShaderX7

ShaderX7 has been out for a few months now, but due to its size (at 773 pages, it is by far the largest of the series) I only recently finished going through it. Here are the chapters I found most interesting:

2.4 Fast Skin Shading

This article discusses a skin rendering method based on texture-space diffusion, where irradiance is rendered into texture space and then blurred before being applied to the final rendered image. This method was first used in The Matrix Reloaded (George Borshukov, one of the authors of the sketch describing that work, is now at Electronic Arts and is one of the authors of this article). We discuss texture-space diffusion in detail in Section 9.7.4 of our book; a high-quality real-time implementation was published in GPU Gems 3.

Although the technique in the GPU Gems 3 article produced very realistic results, it was only real-time in a demo sense (a very simple scene on a top-end graphics card). Games have much stricter performance requirements; techniques need to run on older hardware, in large and complex scenes. The authors needed to make the technique much faster so it could be used in games, and they managed to do so without reducing quality (much). The blur is performed in a single pass, with fewer taps. Hierarchical depth culling is also used to avoid computations on backfacing triangles. The authors are experienced game developers, and it shows; practical considerations are thoroughly taken into account. All in all, this is a very good article, although there are some minor inaccuracies in the discussion of specular terms (the specular reflectivity of skin is about 20 times lower than that of metal, not close to it as they claim).

2.5 An Efficient and Physically Plausible Real-Time Shading Model

This article takes a similar tack to Section 7.6 in our book, by pointing out the drawbacks of existing shading models and deriving one which is both fast and physically grounded. The author, Christian Schüler, is a programmer at Replay Studios, and the shading model described here was used in the game Velvet Assassin. It is refreshing to see a treatment of this subject informed by game development practicalities as well as correct rendering physics and math. An interesting feature of this shading model is that it includes support for a two-color hemisphere ambient light, with both diffuse and specular response. The author even includes a table of real-world specular reflectances derived from spectral index of refraction information (similar to Tables 7.3 and 7.4 in our book; since each table has materials the other lacks, they are nicely complementary). Reflection models are treated (quite rightly) in a modular fashion; too many articles treat them as “all or nothing” choices. The “cosine power texture” used actually stores a smoothness factor, from which the cosine power is derived via a power function. This representation is similar to one I have used in the past, and it works very well. Mathematical derivations of the model are given in the appendices.
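Two small pieces of such a model can be sketched in Python. The smoothness-to-cosine-power mapping below (`MAX_POWER ** smoothness`, with a hypothetical `MAX_POWER` of 8192) is one plausible choice, not necessarily the one used in the chapter; the hemisphere function is the classic two-color hemisphere lighting formula with diffuse response only, rather than the author's full diffuse-plus-specular model.

```python
# Hypothetical mapping from a [0, 1] smoothness texel to a cosine power.
# MAX_POWER is an assumed constant, not a value taken from the chapter.
MAX_POWER = 8192.0


def cosine_power_from_smoothness(smoothness):
    """Map smoothness in [0, 1] to a cosine power in [1, MAX_POWER]."""
    return MAX_POWER ** smoothness


def hemisphere_ambient(normal_y, sky_color, ground_color):
    """Classic two-colour hemisphere light, diffuse term only.

    normal_y is the world-space "up" component of the unit surface normal.
    """
    t = 0.5 + 0.5 * normal_y  # 1 when facing straight up, 0 when facing straight down
    return tuple(g + (s - g) * t for s, g in zip(sky_color, ground_color))
```

Storing smoothness instead of the power itself spreads the limited texture precision evenly over several orders of magnitude of cosine power, which is one reason this kind of representation tends to work well.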

The article contains some very nice insights, like this one explaining the difference between original Phong and Blinn-Phong: “While the Phong model generates perfect reflections of fuzzed light sources, the Blinn model generates fuzzy reflections of perfect point lights.”  Quite true, and it had never occurred to me to think of it like that.
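Both specular terms fit in a few lines of Python (vectors are unit-length 3-tuples; the function names are illustrative). Original Phong compares the view vector with the mirrored light direction, while Blinn-Phong compares the normal with the half vector. When the viewer sits exactly in the mirror direction both terms reach 1; away from it, the half-vector term falls off more slowly for the same exponent.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def phong_specular(n, l, v, power):
    """Original Phong: reflect the light direction about n, compare with the view."""
    n_dot_l = dot(n, l)
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    return max(dot(r, v), 0.0) ** power


def blinn_phong_specular(n, l, v, power):
    """Blinn-Phong: compare the normal with the half vector between l and v."""
    h = normalize(tuple(lc + vc for lc, vc in zip(l, v)))
    return max(dot(n, h), 0.0) ** power
```

The slower falloff is why Blinn-Phong exponents usually need to be roughly four times larger than Phong exponents to produce a highlight of similar size.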

I do have some nitpicks. I think the derivation of the energy-conserving specular term is slightly wrong, and the Fresnel approximations given in the appendices are a bit strange: the first looks like the Schlick approximation but has an unexplained “min(60k_specular, 1)” term which doesn’t make any sense, and the second (basically dividing the normal reflectance color by N dot H) goes to infinity at grazing angles instead of going to white. I also think that some discussion of environment maps would have produced a more useful ambient model (although the hemispherical model is quite cool). However, these are all minor issues that do not detract from an excellent article.
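For comparison, here is Schlick's approximation next to the second appendix formula as characterized above. Here `cos_theta` is the cosine of the angle between the light (or view) vector and the half vector, and the function names are illustrative. Schlick's form starts at the normal reflectance `f0` and reaches 1 (white) at grazing angles, while dividing by N dot H grows without bound as N dot H approaches 0.

```python
def schlick_fresnel(f0, cos_theta):
    """Schlick's approximation: f0 at normal incidence, 1 (white) at grazing."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5


def divided_fresnel(f0, n_dot_h):
    """Normal reflectance divided by N dot H: unbounded as N dot H -> 0."""
    return f0 / n_dot_h
```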

2.6 Graphics Techniques in Crackdown

This article details some of the graphics techniques used by game developer Realtime Worlds in their highly regarded game Crackdown. Five techniques are described: sky rendering (with variable time of day, light scattering, cloud shadows, etc.), ground clutter, stylized object outlining, deferred lighting (done in a stylized way which probably would not work for most other games), and vehicle reflections. Of these, the vehicle reflections are the most interesting.

It is important that vehicle reflections contain crisp edges (this gives a visual cue that the vehicle surface is smooth), and that they change in a way that roughly shows the motion of the car relative to the surrounding world. It is much less important that the reflections have exactly the right colors and details in them. At each pixel, the vehicle reflections in Crackdown have one of two colors. One color corresponds to the reflection ray being occluded by a building; the other corresponds to an unoccluded ray (reflecting the sky). Both colors are constant over the scene. In theory, a raycast is performed against a heightfield representing the city to determine whether the reflected ray is occluded. In practice, the “raycast” is done at a fixed distance from the car, carefully chosen to be large enough to capture the buildings on the far side of a street but not so large as to capture the next building over on the near side.

The heightfield representation is also quite interesting; it includes two heights (a ground height and a building top height), as well as a third value used to select between the first two. This third value works in a way very similar to a signed distance field. As a presentation by Valve has shown, signed distance fields are very useful for representing abrupt transitions with low-resolution textures. The article goes into quite a bit of detail on the generation of this heightfield texture.
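The two-color lookup can be sketched in Python under some loudly stated assumptions: the heightfield is a small grid with hypothetical `ground`, `roof` and `select` channels, the selector is bilinearly filtered and then thresholded at 0.5 (which is what gives the distance-field-like sharp footprint edges), and `probe_distance`, the color constants and all function names are illustrative rather than taken from the article.

```python
# Hypothetical constant colours; in Crackdown both colours are constant over the scene.
SKY_COLOR = (0.55, 0.65, 0.85)
BUILDING_COLOR = (0.08, 0.08, 0.10)


def bilinear(grid, u, v):
    """Bilinearly filter a 2D list at texel coordinates (u, v), clamped to the edges."""
    rows, cols = len(grid), len(grid[0])
    u = min(max(u, 0.0), cols - 1.0)
    v = min(max(v, 0.0), rows - 1.0)
    i0, j0 = int(u), int(v)
    i1, j1 = min(i0 + 1, cols - 1), min(j0 + 1, rows - 1)
    fu, fv = u - i0, v - j0
    top = grid[j0][i0] * (1.0 - fu) + grid[j0][i1] * fu
    bottom = grid[j1][i0] * (1.0 - fu) + grid[j1][i1] * fu
    return top * (1.0 - fv) + bottom * fv


def height_at(hf, x, z):
    """Pick the ground or roof height using the filtered selector channel."""
    u = x / hf["cell"] - 0.5  # world units -> texel coordinates (texel centres)
    v = z / hf["cell"] - 0.5
    rows, cols = len(hf["select"]), len(hf["select"][0])
    i = min(max(int(round(u)), 0), cols - 1)
    j = min(max(int(round(v)), 0), rows - 1)
    if bilinear(hf["select"], u, v) >= 0.5:  # thresholding keeps footprint edges crisp
        return hf["roof"][j][i]
    return hf["ground"][j][i]


def reflection_color(pos, refl_dir, hf, probe_distance=20.0):
    """Two-colour reflection from a single heightfield probe at a fixed distance.

    probe_distance is a hypothetical value standing in for the carefully chosen
    distance that reaches across a street but not past the near-side building.
    """
    px, py, pz = (p + probe_distance * d for p, d in zip(pos, refl_dir))
    return BUILDING_COLOR if py < height_at(hf, px, pz) else SKY_COLOR
```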

All the described methods are great examples of game rendering techniques; they yield good visual results for low cost, and are tailored to the game requirements.

GDC 2009 also had an excellent presentation on Crackdown rendering techniques by the same author (Hugh Malan). The techniques discussed are different from the ones in the ShaderX7 article, so the two are complementary and I highly recommend reading both.

2.8 Deferred Shading with Multisampling Anti-Aliasing in DirectX 10

Deferred shading and lighting approaches have difficulty with MSAA, especially on PC (direct access to individual samples is straightforward on consoles).  This article explains the problem, and shows how to use features in Direct3D 10.0 and 10.1 to overcome it.  The implementation details are mostly D3D10-specific, but the optimization techniques (such as only performing lighting calculations once on non-edge pixels) are also interesting for other platforms.
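The edge-detection optimization can be sketched independently of Direct3D: if all MSAA samples of a pixel agree in depth and normal, light once; otherwise light every sample and average. The tolerances, the `(depth, normal)` sample layout and the `shade` callback are hypothetical stand-ins for the G-buffer reads and the lighting shader.

```python
def is_edge_pixel(samples, depth_eps=1e-3, normal_eps=1e-3):
    """True if the MSAA samples of a pixel disagree in depth or normal."""
    depth0, normal0 = samples[0]
    for depth, normal in samples[1:]:
        if abs(depth - depth0) > depth_eps:
            return True
        if any(abs(a - b) > normal_eps for a, b in zip(normal, normal0)):
            return True
    return False


def resolve_lighting(samples, shade):
    """Light once for interior pixels; on edges, light every sample and average."""
    if not is_edge_pixel(samples):
        return shade(*samples[0])
    return sum(shade(depth, normal) for depth, normal in samples) / len(samples)
```

Since edge pixels are usually a small fraction of the screen, most pixels pay for a single lighting evaluation.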

2.9 Light-Indexed Deferred Rendering

This technique is in some sense the opposite of deferred shading/lighting approaches.  The technique has been mentioned before on Jeremy Shopf’s blog; the basic idea is to render light volumes into a light index buffer.  This buffer contains indices into a global array of lights.  The number of lights affecting a single pixel is limited (typically to 4) by the buffer format chosen.  In some ways, this resembles the technique described in the article Better Geometry Batching Using Light Buffers in ShaderX4, where the light properties were directly written into a buffer (there was no clear way to support multiple lights per pixel).

This technique has some of the advantages of deferred shading and lacks some of its disadvantages; however, it adds a disadvantage of its own. Every pixel pays the cost of the maximum number of overlapping lights (4 in most cases), regardless of the actual number of lights affecting it. Dynamic branching can be used to ameliorate this, but such branches are not particularly fast on most hardware.
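A hypothetical packing of up to four 8-bit light indices into one 32-bit RGBA8 texel illustrates where the per-pixel light limit comes from. This is a CPU-side sketch with plain integer bit operations; reserving index 0 as "empty" and dropping lights beyond the fourth are assumptions, and how the indices actually get written on the GPU is not reproduced here.

```python
MAX_LIGHTS_PER_PIXEL = 4  # one light index per 8-bit channel of an RGBA8 target
EMPTY = 0                 # hypothetical convention: index 0 means "no light"


def add_light(packed, light_index):
    """Insert a light index (1..255) into the first free 8-bit slot."""
    if not 1 <= light_index <= 255:
        raise ValueError("light index must fit in 8 bits and be non-zero")
    for slot in range(MAX_LIGHTS_PER_PIXEL):
        if (packed >> (8 * slot)) & 0xFF == EMPTY:
            return packed | (light_index << (8 * slot))
    return packed  # all slots taken: further overlapping lights are dropped


def light_indices(packed):
    """Decode the non-empty light indices stored in a packed texel."""
    slots = [(packed >> (8 * s)) & 0xFF for s in range(MAX_LIGHTS_PER_PIXEL)]
    return [index for index in slots if index != EMPTY]
```

A shader reading this buffer would typically loop over all four slots and fetch each light's properties from the global light array, which is the fixed per-pixel cost described above.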

The author has posted both an article and a demo (with source code!) online.

3.1 Efficient Post-Processing with Importance Sampling

Importance sampling is commonly used in offline ray-tracing renderers, but less often in real-time applications.  This article explains the basic idea behind importance sampling, and shows how it applies to bloom, depth of field, and SSAO.

Many rendering operations amount to a weighted integral over a spatial quantity.  This is typically performed in real-time applications by sampling an area with uniform density, and weighting each sample by the weighting function.  With importance sampling, the density of the sampling is proportional to the weighting function, and no weighting factor is applied.  This makes the best use of a limited number of samples.
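A small Python comparison of the two strategies for a 1D Gaussian kernel makes the difference concrete; the kernel, the 4-sigma truncation and the function names are illustrative assumptions. The first function draws uniformly distributed positions and weights each sample by the kernel, the second draws positions from the kernel itself with `random.gauss` and simply averages. With f(x) = x², both should estimate the kernel's variance, σ².

```python
import math
import random


def gaussian(x, sigma):
    """Unnormalised Gaussian weighting function."""
    return math.exp(-0.5 * (x / sigma) ** 2)


def uniform_weighted(f, sigma, n, rng):
    """Uniformly distributed positions, each weighted by the kernel; weights normalised."""
    half_width = 4.0 * sigma  # kernel truncated at 4 sigma
    total, weight_sum = 0.0, 0.0
    for _ in range(n):
        x = rng.uniform(-half_width, half_width)
        w = gaussian(x, sigma)
        total += w * f(x)
        weight_sum += w
    return total / weight_sum


def importance_sampled(f, sigma, n, rng):
    """Positions drawn with density proportional to the kernel; no weights applied."""
    return sum(f(rng.gauss(0.0, sigma)) for _ in range(n)) / n
```

In a bloom or depth-of-field shader, f would be the image fetch at offset x, and the importance-sampled offsets could be precomputed once.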

Importance sampling is a general technique which can be used to enhance many rendering methods besides the three shown in the article.

3.2 Efficient Real-Time Motion Blur for Multiple Rigid Objects

Motion blur implementations often make use of a velocity buffer. In cases where only camera motion is important, this can be avoided by using a combination of the depth and the previous frame’s camera matrix. This technique has been around for a while, but it was first published in an article in GPU Gems 3. It is typically used in racing games. Often other cars are assumed to be moving with the camera, and are masked off from the effect.

This article generalizes this approach to handle multiple rigid objects.  The method is quite straightforward; a per-object index is stored in the stencil buffer, and an array of previous frame matrices for all objects is passed to the pixel shader performing the blur.  This should work well in games where most fast-moving objects are rigid (such as driving games).
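The reprojection at the heart of both variants can be sketched with plain 4x4 row-major matrices. Assumptions: NDC depth is available per pixel, `inv_view_proj` is the current frame's inverse view-projection matrix, and each entry of the hypothetical `prev_matrices` array maps current world space to the previous frame's clip space (for a rigid object, the previous view-projection times the previous world matrix times the inverse of the current world matrix; for static geometry, just the previous view-projection). `object_id` plays the role of the per-object index read from the stencil buffer.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def screen_velocity(ndc_x, ndc_y, ndc_depth, inv_view_proj, prev_matrices, object_id):
    """Screen-space velocity of a pixel, reconstructed from depth alone."""
    # Reconstruct the current world-space position from the depth buffer value.
    world = mat_vec(inv_view_proj, [ndc_x, ndc_y, ndc_depth, 1.0])
    world = [c / world[3] for c in world]
    # Reproject with the (hypothetical) per-object "current world -> previous clip" matrix.
    prev_clip = mat_vec(prev_matrices[object_id], world)
    prev_x = prev_clip[0] / prev_clip[3]
    prev_y = prev_clip[1] / prev_clip[3]
    return ndc_x - prev_x, ndc_y - prev_y
```

The blur pass would then sample along the returned velocity vector.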

4.2 A Hybrid Method for Interactive Shadows in Homogeneous Media

This article describes an interesting way to render volumetric shadows. In a nutshell, stencil volumes are used to determine start and end points for raymarching. This technique does have some limitations when applied to complex scenes, but in cases where volumetric shadows are cast from relatively simple, bounded geometry it could work well. The technique is slower than depth-buffer-based techniques such as the one published in GPU Gems 3, but it does produce more correct results in some cases. It might make sense to use both: the depth-buffer-based technique for distant geometry, and this one for nearby geometry.
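The benefit of bounding the raymarch can be illustrated with a minimal single-scattering sketch. It assumes a homogeneous medium with scattering coefficient `sigma_s` and extinction coefficient `sigma_t`, unit light intensity with no attenuation along the light path, and an isotropic phase function folded into `sigma_s`. The interval [`t_start`, `t_end`] stands in for the span bounded by the stencil volumes; outside it the ray is assumed fully lit and integrated in closed form, and inside it the code raymarches with a hypothetical `visible(t)` shadow test. This illustrates the general idea only, not the authors' algorithm.

```python
import math


def inscatter_lit(sigma_s, sigma_t, a, b):
    """Closed-form single scattering over [a, b] for a fully lit homogeneous medium."""
    return sigma_s / sigma_t * (math.exp(-sigma_t * a) - math.exp(-sigma_t * b))


def inscatter(sigma_s, sigma_t, t_max, t_start, t_end, visible, steps=64):
    """In-scattered light along a view ray of length t_max.

    Only [t_start, t_end] is raymarched (midpoint rule, one shadow test per step);
    the rest of the ray is assumed to be lit and handled analytically.
    """
    total = inscatter_lit(sigma_s, sigma_t, 0.0, t_start)
    total += inscatter_lit(sigma_s, sigma_t, t_end, t_max)
    dt = (t_end - t_start) / steps
    for i in range(steps):
        t = t_start + (i + 0.5) * dt
        if visible(t):
            total += sigma_s * math.exp(-sigma_t * t) * dt
    return total
```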

The article appears to be very similar to a paper in the 2008 Symposium on Interactive Ray Tracing, Interactive Volumetric Shadows in Participating Media with Single-Scattering, by the same authors (Chris Wyman and Shaun Ramsey).  The paper, as well as demos and videos, can be found on Chris Wyman’s publication page.

4.4 Facetted Shadow Mapping for Large Dynamic Game Environments

This article details the shadow mapping approach used in Rockstar Games’ Grand Theft Auto IV. Since GTA IV is one of the most successful games ever made (over 13 million copies sold!), the rendering techniques it uses are of interest.

The technique described is a solution to the pixel-texel mapping problem.  The main difficulty with classic shadow maps is that shadow map texels do not map well to screen pixels, resulting in artifacts (if the shadow map resolution is low enough for some screen areas to be under-sampled), poor performance (if the resolution is high enough that the entire screen is well-sampled), or both.  A few years ago, approaches which applied various warps to the shadow map were popular, but it is hard to get good results with a single warped map.  Now most games use cascaded shadow map approaches, with multiple shadow maps ensuring more uniform screen-pixel-to-shadow-texel ratios.  These typically involve multiple tiles of varying resolution.

The facetted shadow maps used in GTA IV divide the square shadow map into multiple wedges or facets, each using a different perspective transform. The primary advantage of this approach is that it allows for filtering between facets; this is not possible between the tiles of a cascaded shadow map.

5.1 Dynamic Weather Effects

This article describes the rendering of precipitation effects (such as rain and snow) in Bizarre Creations’ racing game, Project Gotham Racing 4. Although I have seen most of these techniques used in games before (such as simulating particles inside a cube with wrap-around and then tiling the cube for rendering, and using a heightfield to determine occlusion), they haven’t been published previously. Besides, it is always useful to see how techniques are combined into a complete system in a shipping (and in this case, successful and well-regarded) game.
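The wrap-around cube trick mentioned above can be sketched as follows; the cube size, the 3x3x3 tiling around the camera and the function names are assumptions for illustration, not details from the chapter. Each particle is simulated once inside a cube of side `cube_size` and wrapped with a modulo, and the same cube is then drawn at several offsets around the camera so the precipitation appears to fill the world.

```python
import math


def particle_local_position(initial, velocity, t, cube_size):
    """Advance a particle and wrap it back into the [0, cube_size) cube."""
    return tuple((p + v * t) % cube_size for p, v in zip(initial, velocity))


def tiled_positions(local, camera, cube_size):
    """Replicate one wrapped particle into the 3x3x3 block of cubes around the camera."""
    base = [math.floor(c / cube_size) * cube_size for c in camera]
    positions = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                offset = (dx, dy, dz)
                positions.append(tuple(b + o * cube_size + l
                                       for b, o, l in zip(base, offset, local)))
    return positions
```

Because the simulation never leaves the cube, the particle count stays fixed no matter how far the camera travels.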

6.1 Screen-Space Ambient Occlusion

Crytek first presented their screen-space ambient occlusion (SSAO) technique in a SIGGRAPH 2007 course. Although a somewhat similar technique had been published a few months before, this was a working implementation, part of a highly anticipated follow-up (Crysis) to a very well-regarded game (Far Cry). This presentation immediately made SSAO the graphics buzzword of the day, and hundreds of game graphics programmers went feverishly to work implementing this technique. However, Crytek’s presentation left some details unclear, so people were reading between the lines and trying to guess the missing details (including myself, when I was writing the section on SSAO in our book).

Finally, in this ShaderX7 chapter, the SSAO method used by Crytek is detailed by Vladimir Kajalin (the developer of the technique). So how well did I do? It looks like I got most of the details right, but one of them is quite wrong.

In Kajalin’s SSAO technique, 16 samples are taken in a 3D sphere surrounding the shaded point (the diameter of the sphere is such that it would fill about 10% of the screen width if projected to the screen). These samples are projected into the depth buffer, and for each one it is determined whether it is unoccluded (in front of the depth in the depth buffer) or occluded (behind it). The AO value is simply the percentage of unoccluded samples. This will cause samples on a flat plane to be considered as 50% occluded, and corners and edges will be brighter than the rest of the object. The book description included a technique that I thought they were using to fix this. It turns out that Crytek liked this “edge-brightening” look, and did not use any techniques to avoid it. In my defense, I think this technique is still useful, even if Crytek did not use it. Oh well…
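A minimal Python sketch of the sampling scheme described above follows. It assumes an orthographic view space where x and y index the depth buffer directly and larger z is farther from the camera; the uniform distribution inside the sphere is a simplification (the chapter uses importance sampling), and `ssao` and `depth_at` are hypothetical names. On a flat plane it returns roughly 0.5, and at a convex edge it returns a brighter value, which is the edge-brightening look.

```python
import random


def ssao(point, depth_at, radius, num_samples=16, rng=None):
    """Fraction of unoccluded samples in a sphere around `point`.

    point: (x, y, z) in an orthographic view space (a simplifying assumption)
    where x, y map directly to depth-buffer coordinates and +z points away from
    the camera. depth_at(x, y) returns the depth stored in the depth buffer.
    """
    if rng is None:
        rng = random.Random(0)
    unoccluded = 0
    taken = 0
    while taken < num_samples:
        # Rejection-sample an offset uniformly inside the unit sphere.
        ox, oy, oz = (rng.uniform(-1.0, 1.0) for _ in range(3))
        if ox * ox + oy * oy + oz * oz > 1.0:
            continue
        taken += 1
        sx, sy, sz = point[0] + radius * ox, point[1] + radius * oy, point[2] + radius * oz
        if sz <= depth_at(sx, sy):  # sample lies in front of the stored depth
            unoccluded += 1
    return unoccluded / num_samples
```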

The article goes into a lot more depth, with several interesting implementation details (such as the fact that they used importance sampling), as well as shader code (with optimizations removed to improve readability).

This article represents the earliest work in the area; a lot has been done since. Anyone thinking about implementing SSAO should also read Blizzard’s description of the method used in Starcraft 2, as well as a nifty paper which computes the occluded percentage of a sphere above the shaded point. Chapter 6.7 of ShaderX7 (the next one discussed) might also be of interest.

6.7 Variance Methods for Screen-Space Ambient Occlusion

Depth-buffer unsharp-masking is a cheap method sometimes used as an approximation to SSAO.  The results have a superficial resemblance to SSAO, but don’t look quite as good.  In a nutshell, the difference between the two methods could be expressed thus: SSAO does a bunch of comparisons with the depth buffer, treating each comparison result as having a value of 1 (unoccluded) or 0 (occluded), and then averages those comparison results.  Unsharp masking the depth buffer does a “comparison” (actually a subtraction) between the depth at the shaded point and the average depth around it.  In other words, SSAO compares, then averages, and depth-buffer unsharp-masking averages, then compares.  The “average first” approach can be made much faster, but the “compare first” approach is more correct.
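The difference between the two orders of operations shows up even on a 1D row of depths (smaller values are closer to the camera). The function names, the `bias` and `scale` parameters and the linear mapping from depth difference to darkening in the unsharp-mask version are assumptions for illustration. A single neighbor that is much closer than the others counts as only one occluder in the compare-first version, but drags the average far enough to darken the pixel completely in the average-first version.

```python
def neighbours(depths, i, radius):
    """Depths within `radius` of index i, excluding i itself."""
    lo, hi = max(0, i - radius), min(len(depths), i + radius + 1)
    return [depths[j] for j in range(lo, hi) if j != i]


def compare_then_average(depths, i, radius, bias=0.0):
    """SSAO-style: a binary test per neighbour, then average the results."""
    d = depths[i]
    near = neighbours(depths, i, radius)
    unoccluded = sum(1.0 for n in near if not n < d - bias)
    return unoccluded / len(near)


def average_then_compare(depths, i, radius, scale=1.0):
    """Unsharp-mask style: average the neighbours' depth, then a single comparison."""
    d = depths[i]
    near = neighbours(depths, i, radius)
    mean = sum(near) / len(near)
    return min(max(1.0 - scale * max(0.0, d - mean), 0.0), 1.0)
```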

The author of this article, Angelo Pesce, realized that this is very similar to shadow mapping. Percentage-closer filtering (PCF) compares the shaded (light-space) depth to a depth buffer, and then averages the compared results. Variance shadow maps (VSM) store two channels (depth and squared depth) which describe the distribution of depth values. Using variance maps, a fast “average first” approach can produce something close to the PCF “compare first” result. The article shows how variance mapping can be applied to SSAO; as a bonus, the light leaking artifacts which plague VSMs don’t seem to appear as often when used for SSAO.
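The variance trick relies on the one-tailed Chebyshev inequality: from the mean and mean square of the depths in a filter region, it gives an upper bound on the fraction of depths at least as far as a test depth t, which is exactly what PCF computes by comparing first. The sketch below computes both so they can be compared; the function names are hypothetical, and `min_variance` is a common numerical clamp rather than anything from the chapter.

```python
def moments(depths):
    """First two moments of a set of depths, as a filtered variance map would store."""
    n = len(depths)
    return sum(depths) / n, sum(d * d for d in depths) / n


def chebyshev_upper_bound(mean, mean_sq, t, min_variance=1e-6):
    """One-tailed Chebyshev bound on the fraction of depths >= t."""
    if t <= mean:
        return 1.0
    variance = max(mean_sq - mean * mean, min_variance)
    d = t - mean
    return variance / (variance + d * d)


def pcf_fraction(depths, t):
    """Exact "compare first" result: fraction of depths >= t."""
    return sum(1 for d in depths if d >= t) / len(depths)
```

For depths [1, 1, 3, 3] and t = 2.5, the bound is 0.8 while the exact fraction is 0.5; this overestimate is what shows up as light leaking in shadow mapping.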

It’s hard to evaluate the technique without seeing it implemented, but it looks interesting enough to merit further investigation.

8.5 Designing a Renderer for Multiple Lights: The Light Pre-Pass Renderer

I’ve discussed this article in a previous blog post.

3 thoughts on “ShaderX7”

  1. Mauricio

    Naty,

    An excellent overview! It certainly helped sell me on the book. For SSAO, I would also recommend looking at NVIDIA’s horizon-based ambient occlusion (HBAO). It is heavier than the Crytek or Blizzard solutions, but potentially more accurate. That’s good for the CAD folks, who favor quality over speed (compared to the games folks).

  2. Naty Post author

    There is actually a chapter on HBAO as well; I didn’t mention it since other approaches seem more appropriate for games, as Mauricio says. There are several chapters like that (techniques that might not be appropriate for games but may be good for CAD or other real-time applications); my summary was from a somewhat games-centric POV.

  3. wrice127

    Your post answered my question about unsharp masking. I was wondering why it sounds similar but simpler. Thanks.

    Where can I find more detail on the difference between “original Phong and Blinn-Phong”? I still don’t get that part.

    In addition, I’d like to ask why, in chapter 3.1 (importance sampling), the vertical and horizontal passes are separable for bloom. I believe they shouldn’t be.

    ShaderX7 was a very hard but useful book for me. I liked CHC++ and Real-time ray-tracing (chapters 8.4 and 6.6 respectively). I also liked chapter 4.1 about improving some problems of cascaded shadow mapping.
