Monthly Archives: July 2010

Update on Splinter Cell: Conviction Rendering

In my recent post about Gamefest 2010, I discussed Stephen Hill’s great presentation on the rendering techniques used in Splinter Cell: Conviction.

Since then, Stephen contacted me – it turns out I got some details wrong, and he also provided me with some additional details about the techniques in his talk. I will give the corrections and additional details here.

  1. What I described in the post as a “software hierarchical Z-Buffer occlusion system” actually runs completely on the GPU. It was directly inspired by the GPU occlusion system used in ATI’s “March of the Froblins” demo (described here), and indirectly by the original (1993) hierarchical z-buffer paper. Stephen describes his original contribution as “mostly scaling it up to lots of objects on DX9 hardware, piggy-backing other work and the 2-pass shadow culling”. Stephen promises more details on this “in a book chapter and possibly… a blog post or two” – I look forward to it.
  2. The rigid body AO volumes were initially inspired by the Ambient Occlusion Fields paper, but the closest research is an INRIA tech report that was developed in parallel with Stephen’s work (though he did borrow some ideas from it afterwards).
  3. The character occlusion was not performed using capsules, but via nonuniformly-scaled spheres. I’ll let Stephen speak to the details: “we transform the receiver point into ‘ellipsoid’-local space, scale the axes and lookup into a 1D texture (using distance to centre) to get the zonal harmonics for a unit sphere, which are then used to scale the direction vector. This works very well in practice due to the softness of the occlusion. It’s also pretty similar to Hardware Accelerated Ambient Occlusion Techniques on GPUs although they work purely with spheres, which may simplify some things. I checked the P4 history, and our implementation was before their publication, so I’m not sure if there was any direct inspiration. I’m pretty sure our initial version also predated Real-time Soft Shadows in Dynamic Scenes using Spherical Harmonic Exponentiation since I remember attending SIGGRAPH that year and teasing a friend about the fact that we had something really simple.”
  4. My statement that the downsampled AO buffer is applied to the frame using cross-bilateral upsampling was incorrect. Stephen just takes the most representative sample by comparing the full-resolution depth and object IDs against the surrounding down-sampled values. This is a kind of “bilateral point-sampling” which apparently works surprisingly well in practice, and is significantly cheaper than a full bilateral upsample. Interestingly, Stephen did try a more complex filter at one point: “Near the end I did try performing a bilinearly-interpolated lookup for pixels with a matching ID and nearby depth but there were failure cases, so I dropped it due to lack of time. I will certainly be looking at performing more sophisticated upsampling or simply increasing the resolution (as some optimisations near the end paid off) next time around.”
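To illustrate the core idea behind the hierarchical-Z occlusion test in item 1, here is a minimal CPU sketch; the real system runs on the GPU, and the toy 8×8 buffer, names, and depth convention here are my own illustration, not the game's code. The idea: build a max-depth mip pyramid over the depth buffer, then test an object's screen rectangle and nearest depth against a coarse level.

```c
#include <math.h>

#define W 8  /* toy screen width/height (power of two) */

/* Max-depth mip pyramid: level 0 is the full-res depth buffer; each
 * higher level stores the max depth of the 2x2 block below it. */
static float hiz[4][W * W];

void build_hiz(const float depth[W * W])
{
    for (int i = 0; i < W * W; ++i) hiz[0][i] = depth[i];
    for (int lv = 1, w = W / 2; w >= 1; ++lv, w /= 2) {
        for (int y = 0; y < w; ++y)
            for (int x = 0; x < w; ++x) {
                const float *src = hiz[lv - 1];
                int pw = w * 2;  /* width of the level below */
                float a = src[(2*y)   * pw + 2*x], b = src[(2*y)   * pw + 2*x+1];
                float c = src[(2*y+1) * pw + 2*x], d = src[(2*y+1) * pw + 2*x+1];
                float m = a > b ? a : b;
                if (c > m) m = c;
                if (d > m) m = d;
                hiz[lv][y * w + x] = m;
            }
    }
}

/* An object covering screen rect [x0,x1]x[y0,y1] with nearest depth
 * min_z is occluded if min_z lies behind the max occluder depth over
 * its whole footprint. Pick the mip level where the rect spans at most
 * ~2x2 texels so only a few fetches are needed. */
int is_occluded(int x0, int y0, int x1, int y1, float min_z)
{
    int span = (x1 - x0) > (y1 - y0) ? (x1 - x0) : (y1 - y0);
    int lv = 0, w = W;
    while ((span >> lv) > 1 && w > 1) { ++lv; w /= 2; }
    for (int y = y0 >> lv; y <= y1 >> lv; ++y)
        for (int x = x0 >> lv; x <= x1 >> lv; ++x)
            if (min_z <= hiz[lv][y * w + x])
                return 0;  /* nearer than some occluder: potentially visible */
    return 1;              /* behind all occluders in its footprint */
}
```

Because the pyramid stores the *maximum* depth per tile, the test is conservative: it can report "visible" for a hidden object, but never culls a visible one.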

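The "bilateral point-sampling" from item 4 can be sketched as follows. This is my own guess at a plausible implementation — the struct layout and the mismatch-penalty scheme are assumptions, not the shipped code: among the low-resolution neighbors of a full-resolution pixel, pick the single AO sample whose object ID matches and whose depth is closest.

```c
#include <math.h>

/* One sample from the downsampled AO buffer (hypothetical layout). */
typedef struct {
    float ao;        /* ambient occlusion value */
    float depth;     /* view-space depth at the low-res sample */
    int   object_id;
} AOSample;

/* Pick the "most representative" of the four nearest low-res samples:
 * prefer samples whose object ID matches the full-res pixel, and among
 * those take the one with the closest depth. Falls back to the nearest
 * depth when no ID matches. */
float pick_ao(const AOSample neighbors[4], float full_depth, int full_id)
{
    int best = 0;
    float best_err = INFINITY;
    for (int i = 0; i < 4; ++i) {
        float err = fabsf(neighbors[i].depth - full_depth);
        /* Heavily penalize ID mismatches so matching IDs always win. */
        if (neighbors[i].object_id != full_id)
            err += 1e6f;
        if (err < best_err) {
            best_err = err;
            best = i;
        }
    }
    return neighbors[best].ao;
}
```

A single well-chosen sample avoids the blending-across-edges artifacts that make full bilateral upsampling necessary in the first place, at a fraction of the cost.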
A recent blog post on Jeremy Shopf’s excellent Level of Detail blog mentions similarities between the sphere technique and one used for AMD’s ping-pong demo (the technique is described in the article Deferred Occlusion from Analytic Surfaces in ShaderX7). To me, the basic technique is reminiscent of Inigo Quilez’s article on analytical sphere ambient occlusion; an HPG 2010 paper by Morgan McGuire does something similar with triangles instead of spheres.

Although the technique builds upon previous ones, it does add several new elements, and works well in the game. The technique does suffer from the multiple-occlusion problem (overlapping occluders have their contributions double-counted, over-darkening the result); I wonder if a technique similar to the 1D “compensation map” used by Morgan McGuire might help.

SIGGRAPH Scheduler & Course Update

For anyone still working on their SIGGRAPH 2010 schedule, SIGGRAPH now has an online scheduler available. They are also promising an iPhone app, but this has not yet materialized. Most courses (sadly, only one of mine) now have detailed schedules. These reveal some more detail about two of the most interesting courses for game and real-time rendering developers:

Advances in Real-Time Rendering in 3D Graphics and Games

The first half, Advances in Real-Time Rendering in 3D Graphics and Games I (Wednesday, 28 July, 9:00 AM – 12:15 PM, Room 515 AB) starts with a short introduction by Natalya Tatarchuk (Bungie), and continues with four 45 to 50-minute talks:

  • Rendering techniques in Toy Story 3, by John Ownby, Christopher Hall and Robert Hall (Disney).
  • A Real-Time Radiosity Architecture for Video Games, by Per Einarsson (DICE) and Sam Martin (Geomerics)
  • Real-Time Order Independent Transparency and Indirect Illumination using Direct3D 11, by Jason Yang and Jay McKee (AMD)
  • CryENGINE 3: Reaching the Speed of Light, by Anton Kaplanyan (Crytek)

The second half, Advances in Real-Time Rendering in 3D Graphics and Games II (Wednesday, 28 July, 2:00 PM – 5:15 PM, Room 515 AB) continues with five more talks (these are more variable in length, ranging from 25 to 50 minutes):

  • Sample Distribution Shadow Maps, by Andrew Lauritzen (Intel)
  • Adaptive Volumetric Shadow Maps, by Marco Salvi (Intel)
  • Uncharted 2: Character Lighting and Shading, by John Hable (Naughty Dog)
  • Destruction Masking in Frostbite 2 using Volume Distance Fields, by Robert Kihl (DICE)
  • Water Flow in Portal 2, by Alex Vlachos (Valve)

The course concludes with a short panel (Open Challenges for Rendering in Games and Future Directions) and a Q&A session with all the course speakers.

Beyond Programmable Shading

The first half, Beyond Programmable Shading I (Thursday, 29 July, 9:00 AM – 12:15 PM, Room 515 AB) includes seven 20- to 30-minute talks:

  • Looking Back, Looking Forward, Why and How is Interactive Rendering Changing, by Mike Houston (AMD)
  • Five Major Challenges in Interactive Rendering, by Johan Andersson (DICE)
  • Running Code at a Teraflop: How a GPU Shader Core Works, by Kayvon Fatahalian (Stanford)
  • Parallel Programming for Real-Time Graphics, by Aaron Lefohn (Intel)
  • DirectCompute Use in Real-Time Rendering Products, by Chas. Boyd (Microsoft)
  • Surveying Real-Time Beyond Programmable Shading Rendering Algorithms, by David Luebke (NVIDIA)
  • Bending the Graphics Pipeline, by Johan Andersson (DICE)

The second half, Beyond Programmable Shading II (Thursday, 29 July, 2:00 PM – 5:15 PM, Room 515 AB) starts with a short “re-introduction” by Aaron Lefohn (Intel) and continues with five 20- to 35-minute talks:

  • Keeping Many Cores Busy: Scheduling the Graphics Pipeline, by Jonathan Ragan-Kelley (MIT)
  • Evolving the Direct3D Pipeline for Real-Time Micropolygon Rendering, by Kayvon Fatahalian (Stanford)
  • Decoupled Sampling for Real-Time Graphics Pipelines, by Jonathan Ragan-Kelley (MIT)
  • Deferred Rendering for Current and Future Rendering Pipelines, by Andrew Lauritzen (Intel)
  • PantaRay: A Case Study in GPU Ray-Tracing for Movies, by Luca Fascione (Weta) and Jacopo Pantaleoni (NVIDIA)

and closes with a 15-minute wrap-up (What’s Next for Interactive Rendering Research?) by Mike Houston (AMD), followed by a 45-minute panel (What Role Will Fixed-Function Hardware Play in Future Graphics Architectures?) featuring course speakers Mike Houston, Kayvon Fatahalian, and Johan Andersson, joined by Steve Molnar (NVIDIA) and David Blythe (Intel). (Thanks to Aaron Lefohn for the update.)

Both of these courses look extremely strong, and I recommend them to any SIGGRAPH attendee interested in real-time rendering (I definitely plan to attend them!).

Four presentations by DICE is an unusually large number for a single game developer, but that isn’t the whole story; they are actually doing two additional presentations in the Stylized Rendering in Games course, for a total of six!

“Video Game Optimization” – a good book

I had the chance to spend some quality time with Preisz & Garney’s recent book “Video Game Optimization” a few weeks back, as I was trapped on a 14-hour plane flight. I hardly spent all that time with it, though I probably should have spent more. Instead, “Shutter Island” and “It’s Complicated” (with bad audio) are four hours out of my life I’ll never get back.

This book goes from soup to nuts on the topic: types of optimization, how to set and achieve goals, discussion of specific tools (VTune, PIX, PerfHUD, etc.), where bottlenecks can occur and how to test for them, and in-depth coverage of CPU and GPU issues. Graphics and engine performance are the focus, including multicore and networking optimization, plus a chapter on consoles and another on managed languages. Some of the information is in the “obvious if you’ve done it before” category, but critical knowledge if you haven’t, e.g., the first thing to do when optimizing is to create some good benchmark tests and lay down the baselines.

There are many specific tips, such as turning on the DirectX Debug runtime and seeing if any problems are found. Even if your application appears to run fine with problems flagged, the fact that they’re being flagged is a sign of lost performance (the API has to recover from your problem) or of possible bugs. I hadn’t really considered that aspect (“the code works even with the warnings, why fix it?”), so I plan to go back to work with renewed vigor in eliminating these when seen.

I also liked reading about how various optimizing compilers work nowadays. The main takeaway for me was to not worry about little syntactic tricks any more; most modern optimizers are good enough to make the code quite fast.

There’s very little in this book with which I can find fault. I tested a few terms against the index. About the only lack I found was for the “small batch problem”, where it pays to merge small static meshes into a single large mesh when possible. This topic does turn out to be covered (Chapter 8), but the index has nothing under “batch”, “batching”, “small batch”, etc. There is also no index entry for “mesh”. So the index, while present (and 12 pages long), does have at least one hole I could detect. There are other little index mismatches, like “NVIDIA PerfHUD Tool” and “NvPerfHud Tool” being separate entries, with different pages listed. Typo-wise, I found one small error: on page 123, the first line should say “stack” instead of “heap”, I believe.

Executive summary: it’s a worthwhile book for just about anyone interested in optimization. These guys are veteran experts in this field, and the book gives specific advice and practical tips in many areas. A huge range of topics are covered, the authors like to run various experiments and show where problems can occur (sometimes the cases are a bit pathological, but still interesting), and there are lots of bits of information to mull over. Long and short, recommended if you want to know about this subject.

To learn more: first, look inside the book on Amazon. We previously mentioned Eric Preisz’s worthwhile Gamasutra article on videogame optimization here. A very early outline of the book appears on vertexbuffer.com. For me, it’s great to see that this is a passion for the first author – that comes through loud and clear in this book. I’ve added it to our recommended books section.

One little update: Carmack’s inverse sqrt trick, mentioned in the book on page 155, is dated for the PC. According to Ian Ameline, “It has been obsolete since the Pentium 3 came out with SSE. The rsqrtss/rsqrtps instructions are faster still and have better and more predictable accuracy. Rsqrtss + one iteration of Newton/Raphson gives 22 (of 23) bits of accuracy, guaranteed.”
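For the curious, the SSE approach Ian describes looks roughly like this (a sketch, not production code): the rsqrtss estimate is accurate to about 12 bits, and one Newton-Raphson step, y′ = y(1.5 − 0.5·x·y²), refines it to roughly 22 bits.

```c
#include <xmmintrin.h>  /* SSE: _mm_rsqrt_ss */

/* Reciprocal square root via the SSE rsqrtss estimate (~12 bits of
 * accuracy) refined with one Newton-Raphson iteration, which brings
 * the result to roughly 22 of 23 mantissa bits. */
float rsqrt_sse(float x)
{
    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return y * (1.5f - 0.5f * x * y * y);
}
```

Unlike the old integer bit-twiddling trick, the hardware estimate has a documented, predictable error bound, and the scalar/packed forms (rsqrtss/rsqrtps) pipeline well.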