Since then, Stephen has contacted me – it turns out I got some details wrong – and he has also given me more information about the techniques in his talk. I'll give the corrections and extra details here.
- What I described in the post as a “software hierarchical Z-Buffer occlusion system” actually runs completely on the GPU. It was directly inspired by the GPU occlusion system used in ATI’s “March of the Froblins” demo (described here), and indirectly by the original (1993) hierarchical z-buffer paper. Stephen describes his original contribution as “mostly scaling it up to lots of objects on DX9 hardware, piggy-backing other work and the 2-pass shadow culling”. Stephen promises more details on this “in a book chapter and possibly… a blog post or two” – I look forward to it.
- The rigid body AO volumes were initially inspired by the Ambient Occlusion Fields paper, but the closest research is an INRIA tech report that was developed in parallel with Stephen’s work (though he did borrow some ideas from it afterwards).
- The character occlusion was not performed using capsules, but via nonuniformly-scaled spheres. I’ll let Stephen speak to the details: “we transform the receiver point into ‘ellipsoid’-local space, scale the axes and lookup into a 1D texture (using distance to centre) to get the zonal harmonics for a unit sphere, which are then used to scale the direction vector. This works very well in practice due to the softness of the occlusion. It’s also pretty similar to Hardware Accelerated Ambient Occlusion Techniques on GPUs although they work purely with spheres, which may simplify some things. I checked the P4 history, and our implementation was before their publication, so I’m not sure if there was any direct inspiration. I’m pretty sure our initial version also predated Real-time Soft Shadows in Dynamic Scenes using Spherical Harmonic Exponentiation since I remember attending SIGGRAPH that year and teasing a friend about the fact that we had something really simple.”
- My statement that the downsampled AO buffer is applied to the frame using cross-bilateral upsampling was incorrect. Stephen just takes the most representative sample by comparing the full-resolution depth and object IDs against the surrounding down-sampled values. This is a kind of “bilateral point-sampling” which apparently works surprisingly well in practice, and is significantly cheaper than a full bilateral upsample. Interestingly, Stephen did try a more complex filter at one point: “Near the end I did try performing a bilinearly-interpolated lookup for pixels with a matching ID and nearby depth but there were failure cases, so I dropped it due to lack of time. I will certainly be looking at performing more sophisticated upsampling or simply increasing the resolution (as some optimisations near the end paid off) next time around.”
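To make the ellipsoid occlusion from the list above concrete, here is a rough sketch of how such an evaluation could look. This is my own reconstruction, not Stephen's code: the `unit_sphere_zh` function stands in for his 1D texture (I use only bands 0 and 1 of the solid angle blocked by a unit sphere), and integrating the rotated coefficients against a clamped-cosine lobe around the normal is one plausible way to turn them into an occlusion value.

```python
import math

def unit_sphere_zh(d):
    """Bands 0 and 1 zonal harmonics of the solid angle blocked by a unit
    sphere whose centre is at distance d > 1 (stand-in for the 1D texture)."""
    sin_a = 1.0 / d                       # half-angle of the blocked cap
    cos_a = math.sqrt(1.0 - sin_a * sin_a)
    z0 = math.sqrt(math.pi) * (1.0 - cos_a)
    z1 = 0.5 * math.sqrt(3.0 * math.pi) * sin_a * sin_a
    return z0, z1

def ellipsoid_occlusion(receiver, normal, centre, radii):
    """Approximate occlusion of `receiver` (with unit `normal`) by an
    axis-aligned ellipsoid at `centre` with semi-axes `radii`."""
    # Transform into ellipsoid-local space, scaling it into a unit sphere.
    local = [(receiver[i] - centre[i]) / radii[i] for i in range(3)]
    d = math.sqrt(sum(c * c for c in local))
    if d <= 1.0:
        return 1.0                        # receiver inside: fully occluded
    z0, z1 = unit_sphere_zh(d)
    # Rotate the ZH lobe toward the occluder and integrate it against a
    # clamped-cosine lobe around the normal (an SH dot product).
    to_sphere = [-c / d for c in local]
    cos_n = sum(normal[i] * to_sphere[i] for i in range(3))
    a0, a1 = math.sqrt(math.pi) / 2.0, math.sqrt(math.pi / 3.0)
    return max(0.0, (z0 * a0 + z1 * a1 * cos_n) / math.pi)
```

Note that for simplicity the normal is left in world space; a nonuniform scale does not preserve angles, so a more careful version would transform it into the same space. The softness of the occlusion hides a lot of this sort of approximation error, which is presumably why the technique holds up so well.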
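The "bilateral point-sampling" upsample might look something like the following sketch. Again this is my reconstruction rather than Stephen's shader, and the error metric (depth difference plus a large penalty for an object-ID mismatch) is illustrative:

```python
def representative_sample(full_depth, full_id, low_samples):
    """Pick the low-res AO sample whose depth and object ID best match the
    full-resolution pixel. `low_samples` holds (ao, depth, obj_id) tuples
    for the surrounding down-sampled texels."""
    best_ao, best_err = low_samples[0][0], float("inf")
    for ao, depth, obj_id in low_samples:
        # Mismatched object IDs are heavily penalised; among matching IDs,
        # the closest depth wins.
        err = abs(depth - full_depth) + (0.0 if obj_id == full_id else 1e6)
        if err < best_err:
            best_ao, best_err = ao, err
    return best_ao
```

The appeal over a full cross-bilateral upsample is clear: one winner-take-all comparison per pixel instead of a weighted blend, at the cost of some blockiness that the soft AO signal mostly hides.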
A recent blog post on Jeremy Shopf’s excellent Level of Detail blog mentions similarities between the sphere technique and one used for AMD’s ping-pong demo (the technique is described in the article “Deferred Occlusion from Analytic Surfaces” in ShaderX7). To me, the basic technique is reminiscent of Inigo Quilez’s article on analytical sphere ambient occlusion; an HPG 2010 paper by Morgan McGuire does something similar with triangles instead of spheres.
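For reference, the core of the analytic sphere approach is tiny. A sketch in the spirit of Inigo Quilez's article (my paraphrase of the idea, not his exact code): the occlusion is the sphere's solid-angle ratio scaled by the clamped cosine with the receiver normal.

```python
import math

def sphere_occlusion(pos, normal, centre, radius):
    """Analytic AO of a point `pos` with unit `normal` due to one sphere:
    roughly (r^2 / d^2) * max(0, cos(angle to sphere centre))."""
    to_centre = [centre[i] - pos[i] for i in range(3)]
    d = math.sqrt(sum(c * c for c in to_centre))
    cos_n = sum(normal[i] * to_centre[i] for i in range(3)) / d
    return max(0.0, cos_n) * (radius * radius) / (d * d)
```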
Although the technique builds upon previous ones, it adds several new elements, and it works well in the game. It does suffer from over-darkening where multiple occluders overlap; I wonder if something like the 1D “compensation map” used by Morgan McGuire might help.