You are currently browsing articles tagged CUDA.

I’ve been more than a bit busy working on the (newly renamed) Interactive 3D Graphics course for Udacity, so have been tardy pointing out that this cool course is out now: Introduction to Parallel Programming. Taught by John Owens, David Luebke, and others, it uses CUDA as its basis for teaching. I look forward to taking it myself! It’s a free online course with some serious content, graded exercises, and much else to recommend it – a lot of time & effort (& money) was put into making it, so I expect it will be worth my while.

Oh, and I should also mention another new course, HTML5 Game Development, from experts at Google. It’s more 2D graphics related, but again looks like quite a serious course with a lot of chew.

Tags: , , ,

Here goes:

  • The Journal of Computer Graphics Techniques has published its first accepted article: “Importance Sampling of Reflection from Hair Fibers”. Free for download, of course.
  • Mauricio Vives pointed out that the thirteenth article in a series going back to 2010 is now up: Fluid Simulation for Video Games. Not my particular interest, but my gosh this collection is impressive, it’s practically book-length at this point, and includes code snippets and a demo with source (you’ll need to install TBB to compile).
  • A new book has been announced, coming out in October: The CUDA Handbook. The author, Nicholas Wilt, was a software architect at NVIDIA who worked on CUDA since its inception.
  • John Owens pointed out an interesting way to search Google Scholar, by publications with the word graphics. This gives an interesting weighing of influence – no real surprises, and note that some conferences are not included (I highly doubt that EGSR, I3D, and HPG don’t make the cut).
  • Sebastien Lagarde gives an in-depth analysis showing how the Phong and Blinn specular highlight models are related by a factor of 4 in the power. This is an old result from Fisher & Woo in Graphics Gems IV, but it’s nice to see an independent verification and analysis.
  • Thinking of Graphics Gems, I wanted to mention this old piece of news now: Jim Arvo (editor of Graphics Gems II) passed away back on October 19th of last year. He did seminal work in ray tracing, Monte Carlo sampling, light transport, and many other areas, and was also just a great guy. See his homepage while it still is there.
  • There are mutant women who live amongst us with a fourth type of cone cell. There’s more information on Wikipedia.

Tags: , , , , , ,

Inspired by Bing (a person, not a search engine) and by the acrobatics I saw tonight in Shanghai, time for a blog post.

So what’s up with graphics APIs? I’ve been working on a project for a fast 3D graphics system for Autodesk for about 4 years now; the base level (which hides the various flavors of DirectX and OpenGL) is used by Maya, Max, AutoCAD, Inventor, and other products. There are various higher-level optimizations we’ve added (and why Microsoft’s fxc effect compiler suddenly got a lot slower is a mystery), with some particularly nice work by one person here in the area of multithreading. Beyond these techniques, minimizing the raw number of calls to the API is the primary way to increase performance. Our rule of thumb is that you get about 1000-1500 calls a frame (CAD isn’t held to a 60 FPS rule, but we still need to be interactive). The usual tricks are to sort by state, and to shove as much geometry and processing as possible into a single draw call and so avoid the small batch problem. So, how silly is that? The best way to make your GPU run fast is to call it as little as possible? That’s an API with a problem.

This is old news, Tim Sweeney railed against API limitations 3 years ago (sadly, the article’s gone poof). I wrote about his ideas here and added my own two cents. So where are we since then? DirectX 11 has been out awhile, adding three more stages to the pipeline for efficient tessellation of higher-order surfaces. The pipeline’s feeling a bit unwieldy at this point, with a lot of (admittedly optional) stages. There are still some serious headaches for developers, like having to somehow manage to put lighting and material shading in the same pixel shader (one good argument for deferred lighting and similar techniques). Forget about optimization; the arcane API knowledge needed to get even a simple rendering on the screen is considerable.

I haven’t heard anything of a DirectX 12 in the works (except maybe this breathless posting, which I feel obligated to link to since I’m in China this month), nor can I imagine what they’d add of any significance. I expect there will be some minor XBox 72o (or whatever it will be called) -related tweaks specific to that architecture, if and when it exists. With the various CPU+GPU-on-a-chip products coming out – AMD’s Fusion family, NVIDIA’s Tegra 2, and similar from other companies (I think I counted 5, all totaled) – some access costs between the two processors become much cheaper and so change the rules. However, the API still looks to be the bottleneck.

Marketwise, and this is based entirely upon my work in scapulimancy, I see things shifting to mobile. If that isn’t at least the 247th time you’ve heard that, you haven’t been wasting enough time on the internet. But, it has some implications: first, DirectX 12 becomes mostly irrelevant. The GPU pipeline is creaky and overburdened enough right now, PC games are an important niche but not the focus, and mobile (specifically, iPad and other tablets) is fine with the functionality defined thus far by existing APIs. OpenGL ES will continue to evolve, but I doubt we’ll see for a good long while any algorithmically (vs. data-slinging) new elements added to the API that the current OpenGL 4.x and DX11 APIs don’t offer.

Basically, API development feels stalled to me, and that’s how it should be: mobile’s more important, PCs are a (large but slowly evolving) niche, and the current API system feels warped from a programming standpoint, with peculiar constructs like feeding text strings to the API to specify GPU shader effects, and strange contortions performed to avoid calling the API in order to coax the GPU to run fast.

Is there a way out? I felt a glimmer while attending HPG 2011 this year. The paper “High-Performance Software Rasterization on GPUs” by Samuli Laine and Tero Karras was one of my (and many attendees’) favorites, talking about how to efficiently implement a basic rasterizer using CUDA (code’s open sourced). It’s not as fast as dedicated hardware (no surprise there), but it’s at least in the same ball-park, with hardware being anywhere from 1.5x to 8.1x faster for their test cases, median being 3.6x. What I find exciting is the idea that you could actually program the pipeline, vs. it being locked away. They discuss ideas for optimization such as loosening the “first in, first out” rule for triangles currently enforced by all APIs. With its “yet another language” dependency, I can’t say I hope GPGPU is the future (and certainly CUDA isn’t, since it cuts out non-NVIDIA hardware vendors, but from all reports it’s currently the best way to experiment with GPGPU). Still, it’s nice to see that the fixed-function bits of the GPU, while important, are not an insurmountable limit in considering more flexible and general interactive rasterization programming models. Or, ray tracing – always have to stick that in there.

So it’s “forward to the past”, looking at traditional algorithms like rasterization and ray tracing and how to gain efficiency (both in raw speed and in development time) on various modern architectures. That’s ultimately what it’s about for me, at least: spending lots of time fighting the API, gluing together strings to make shaders, and all the other craziness is a distraction and a time-waster. That said, there’s a cost/benefit calculation implicit in all of this. For example, using C# or Java is way more productive than C++, I’d say about 2x, mostly because you’re not tracking down memory problems like leaks and access uninitialized or non-existent values. But, there’s so much legacy C++ code around that it’s still the language of graphics, as previously discussed here. Which means I expect none of the API weirdness to change for a solid decade, at the minimum. Please do go ahead and prove me wrong – I’d be thrilled!

Oh, and acrobatics? Hover your cursor over the image. BTW, the ERA show in Shanghai is wonderful, unlike current APIs.

Tags: , , , ,

Last month I mentioned gDEBugger being free and the joys of cppcheck. Here are some others that have crossed my path for one reason or another. Please do let me know (and so let us all know) about any worthwhile tools and libraries I haven’t blogged about – part of the reason for putting out this list is in hopes of learning of tools I haven’t heard of yet.

  • There is now a free version of AQTime, a commercial application that finds memory leaks and performance bottlenecks.
  • The Intel Graphics Performance Analyzers are supposed to be good stuff, and free – you just sign up for the Visual Adrenaline Program. I haven’t used them, but know people that have (hey, there’s Dan Baker on Intel’s page – nice).
  • Intel’s Parallel Inspector, despite its name, is particularly strong at finding memory leaks in any programs. Free month trial.
  • NVIDIA’s Parallel Nsight, also despite its name and focus of its advertising, is not just for CUDA and DirectCompute debugging and analysis, it also works on DirectX 10 and 11 shaders – you’ll need two machines networked together, one to run the shader and the other to control it. The Standard version is free, though when you sign up for it you also get a time-limited “we hope you get hooked” Professional license. Due to a currently-goofy pair of machines in my office (on different networks, and one’s a Mac I use purely as a Windows box), I haven’t gotten to try it out yet, but the demos look pretty great.
  • The Windows Performance Analysis Tools are evidently worthwhile for checking coarse-grained performance and bottlenecks for Windows programs. Again, free. I’ve heard that a number of groups have used xperf to good effect.
  • On an entirely different subject, HLSL2GLSL does a good job of translating most DirectX 9 (only) HLSL shaders to – wait for it – GLSL. Open source, and more info here, which discusses related efforts (like Mojoshader) and translation in the other direction.
  • Not really a tool per se, but still cool to see: here’s a way to find out how much free GPU memory is left for your OpenGL application. Anyone know any way to do this sort of thing with DirectX and Vista/Windows 7?
  • Will WebGL take off? Beats me, but it’s nice to see there’s an inspector, similar to gDEBugger and PIX.
  • GLM is a C++ math library particularly well-suited for use with (but not at all dependent on) OpenGL.
  • Humus points out that the old workhorse PIX now has new functionality that lets you assign names to objects, making debugging easier.
  • While I was messing with his binvox and viewvox programs, Patrick Min pointed out there’s a free 3DS file format library out there, lib3ds. I tried it out and it did the job well, taking very little time for me to integrate into my own private copy of binvox.

Tags: , , , , , , , ,

From Mauricio Vives, our first guess blogger; I thank him for this valuable detailed report.

Written February 26, 2010.

This past weekend I attended the 2010 Symposium on Interactive 3D Graphics and Games, known more simply as “I3D.” It is sponsored by ACM SIGGRAPH, and was held this year in Bethesda, Maryland, just outside Washington. Disclaimer: I work for Autodesk, so much of this report comes from the perspective of a design software developer, but any opinions expressed are my own.


I3D is a small conference of about 100 people that covers computer graphics and interaction research, principally as it applies to games. I also attended the conference in 2008 near San Francisco, when it was co-chaired by my colleague at Autodesk, Eric Haines.

About half of the attendees are students or professors from universities all over the world, and the rest are from industry, typically game developers. As far as I could tell, I was the only attendee from the design software industry. NVIDIA was well represented both in attendees and presentations, and the other company with significant representation was Firaxis, a local game developer most well known for the Civilization series.

The program has a single track, with all presentations given in the same room. Unlike SIGGRAPH, this means that you can literally see everything the conference has to offer, though it is necessarily more focused. As you will see below, I was impressed with the quality and quantity of material presented.

Since this conference is mostly about games, all of the presented research has a focus on a real-time implementation, often for games running at 60 frames per second. Games have a very low tolerance for low frame rates, but they often have static environments and constrained movement which allows for precomputation and hence high performance and convincing results. Conversely, customers of design software like Autodesk’s products produce arbitrary and changing data, and want the most accurate possible results, so precomputation and approximations are less useful, though a frame rate as low as 5 or 10 fps is often tolerable.

However, an emerging trend in graphics research for games is to remove limitations while maintaining performance, and that was very evident at I3D. The papers and posters generally made a point to remove limitations, in particular so that geometry, lighting, and viewpoints can be fully dynamic, without lengthy precomputation. This is great news for leveraging these techniques beyond games.

In terms of technology, this is almost all about doing work on GPUs, preferably with parallel algorithms. NVIDIA’s CUDA was very well-represented for “GPGPU” techniques that could not use the normal graphics pipeline. With the wide availability of CUDA, a theme in problem-solving is to express as much as possible with uniform grids and throw a lot of threads at it! As far as I could tell, Larrabee was entirely absent from the conference. Direct3D 11 was mentioned only in passing; almost all of the papers used D3D9, D3D10, or OpenGL for rendering.

And a random statistic: a bit more than half of the conference budget was spent on food!


The conference web site, which includes a list of papers and posters, is here.

The Real-Time Rendering blog has a recent post by Naty Hoffman that discusses many of the papers and has links to the relevant author web sites.

Photos from the conference are available at Flickr here. I also took photos at I3D 2008, held at the Redwood City campus of Electronic Arts, which you can find here.


The bulk of the conference program consisted of paper presentations, divided into a few sessions with particular themes. I have some comments on each paper below, with more on the ones of greater personal interest.

Physics Simulation

Fast Continuous Collision Detection using Deforming Non-Penetration Filters

There is discrete collision detection, where CD is evaluated at various time intervals, and continuous CD, where an exact, analytic result is computed. This paper is about quickly computing continuous CD using some simple expressions that vastly reduce the number of tests between primitives.

Interactive Fluid-Particle Simulation using Translating Eulerian Grids

This was authored by NVIDIA researchers. The goal is a fluid simulation that looks better as processors get more and faster cores, i.e., scalable physics. This is actually a combination of techniques implemented primarily with CUDA, and rendered with a particle system. It allows for very dense and detailed results, and uses a simple trick to have the results continue outside the simulation “box.”

Character Animation

Here there was definitely a theme of making it easier for artists to prepare and animate characters.

Learning Skeletons for Shape and Pose

This is about creating skeletons (bones and weights) automatically from a few starting poses and shapes. The author noted that this was likely the only paper developed almost entirely with MATLAB (!).

Frankenrigs: Building Character Rigs From Multiple Sources

This paper has a similar goal: use existing artist-created character rigs to automatically create rigs for new characters, with some artist control to adjust the results. This relies on a database of rigged parts that an art team probably already has, thus it is a data-driven solution for the time-consuming tasks in character rigging.

Synthesis and Editing of Personalized Stylistic Human Motion

This is about taking a walk animation for a single character, and using that to generate new walk animations for the same character, or transfer them to new characters.

Fast Rendering Representations

Real-Time Multi-Agent Path Planning on Arbitrary Surfaces

Path finding in games is a huge problem, but it is normally constrained to a planar surface. This paper implements path planning on any surface, and does it interactively on both the CPU and GPU using CUDA.

Efficient Sparse Voxel Octrees

Is it time for voxel rendering to make a comeback? These researchers at NVIDIA think so. Here they want to represent a 3D scene similar using voxels with as little memory as possible, and render it efficiently with ray casting. In this case, the voxels contain slabs (they call them contours) that better define the surface. Ray casting through the generated octree is done with using special coordinates and simple bit manipulation. LOD is pretty easy: voxels that are too small are skipped, or the smallest level is constrained, similar to MIP biasing.

This paper certainly had some of the most impressive results from the conference. The demo has a lot of detail, even for large environments, where you think voxels wouldn’t work that well. One of the statistics about storage was that the system uses 5-8 bytes per voxel, which means an area the size of a basketball court could be covered with 1 mm resolution on a high-end NVIDIA GPU. This comprises a lot of techniques that could be useful in other domains, like point cloud rendering. Anyway, I recommend looking at the demo video and if you want to know more, see the web site, which has code and the compiled demo.

On-the-Fly Decompression and Rendering of Multiresolution Terrain

This paper targets GIS and sci-vis applications that want lossless compression, instead of more-common lossy compression. The technique offers variable rate compression, with 3-12x compression in practice. The decoding is done entirely on the GPU, which means no bus bottleneck, and there are no conditionals on decoding, so it can be very parallel. Also of interest is that decoding is done right in the rendering path, in the geometry shader (not in a separate CUDA kernel), and it is thus simple to perform lighting with dynamically generated normals. This is another paper that has useful ideas, even if you aren’t necessarily dealing with terrain.

GPU Architectures & Techniques

A Programmable, Parallel Rendering Architecture for Efficient Multi-Fragment Effects

The problem here is rendering effects that require access to multiple fragments, especially order-independent transparency, which the current hardware graphics pipeline does not handle well. The solution is impressive: build a entirely new rendering pipeline using CUDA, including transforms, culling, clipping, rasterization, etc. (This is the sort of thing Larrabee has promised as well, except the system described here runs on available hardware.)

This pipeline is used to implement a multi-layer depth buffer and color buffer (A-buffer), both fixed size, where fragments are inserted in depth-sorted order. Compared to depth peeling, this method saves on rendering passes, so is much faster and has very similar results. The downside is that it is a slower than the normal pipeline for opaque rendering, and sorting is not efficient for scenes with high depth complexity. Overall, it is fast: the paper quoted frame rates in the several hundreds, but really they should be getting their benchmark conditions complex enough to measure below 100 fps, in order to make the results relevant.

Parallel Banding Algorithm to Compute Exact Distance Transform with the GPU

The distance transform, used to build distance maps like Voronoi diagrams, is useful for a number of image processing and modeling tasks. This has already been computed approximately on GPUs, and exactly on CPUs. This claims to the first exact solution that runs entirely on GPUs. The big idea, as you might expect, is to implement all phases of the solution in a parallel way, so that it uses all available GPU threads. This uses CUDA, and the results are quite fast, even faster than the existing approximate algorithms.

Spatio-Temporal Upsampling on the GPU

The results of this paper are almost like magic, at least to my eyes. Upsampling is about rendering at a smaller resolution or fewer frames, and interpolating the in-between results somehow, because the original data is not available or slow to obtain. Commonly available 120 Hz / 240 Hz TVs now do this in the temporal space. There is a lot of existing research on leveraging temporal or spatial coherence, but this work uses both at once. It takes advantage of geometry correlation within images, e.g. using normals and depths, to generate the new useful information.

I didn’t follow all of the details, but the results were surprisingly free of artifacts, at least for the scenes demonstrated. This could be useful any place where you might want progressive rendering, real-time ray tracing, because rendering full-resolution is very expensive. This technique or some of the ones it references (like this one) could offer much better results than just rendering at a lower resolution and doing simple filtering like is often done for progressive rendering.

Scattering and Light Propagation

Cascaded Light Propagation Volumes for Real-Time Indirect Illumination

This paper almost certainly had the most “street cred” by virtue of being developed by game developer Crytek. Simply put, this is a lattice-based technique for real-time indirect lighting. The most important features are that it is fully dynamic, scalable, and costs around 5 ms per frame. A very quick overview of how it works: render reflective shadow map for each light, initialize the grid with this information to define many secondary light sources, then propagate light through the grid in 30 directions (faces) from each cell into the adjacent 6 cells, approximate the results with spherical harmonics, and render.

To manage performance and storage, this uses cascades (several levels of detail) relative to the viewer, hence the use of the term “cascaded” in the title. The same data and technique can be used to render secondary occlusion, multiple bounces, glossy reflections, participating media using ray marching… just a crazy amount of nice rendering stuff. The use of a lattice has some of its own quality limitations, which they discuss, but nothing too bad for a game. This was a lot to take in, and I did not follow all of the details, but the results were very inspiring. Apparently this will appear in the next version of their game engine, which means consumers will soon come to expect this. Crytek apparently also discussed this at SIGGRAPH last year.

Interactive Volume Caustics in Single-Scattering Media

Caustics is basically “light focusing,” and scattering media is basically “fog / smoke /water,” so this is about rendering them together interactively, e.g. stage lights at a concert with a fog machine, or sunlight under water.  It is fully dynamic, and offers surprisingly good quality under a variety of conditions. It is perhaps too slow for games, but would be fine for design software or a hardware renderer which can take a few seconds to render.

Epipolar Sampling for Shadows and Crepuscular Rays in Participating Media with Single Scattering

This paper has a really long title, but what it is trying to do is simple: render rays of light, a.k.a. “god rays.” Normally this is done with ray marching, but this is still too slow for reasonable images, and simple subsampling doesn’t represent the rays well. The authors observed that radiance along the ray “lines” don’t change much, except for occlusions, which leads to the very clever idea of the paper: construct the (epipolar) lines in 2D around the light source, and sparsely sample along the lines, adding more samples at depth changes. The sampling data is stored as a 2D texture, one row per line, with samples are in columns. It’s fast, and looks great.

NPR and Surface Enhancement

Interactive Painterly Stylization of Images, Videos and 3D Animations

This is another title that direct expresses its goal. Here the “painterly” results are built by a pipeline for stroke generation, with many thousands of strokes per image, which also leverages temporal coherence for animations. It can be used on videos or 3D models, and runs entirely on the GPU. If you are working with NPR, you should definitely look at their site, the demo video, and the referenced papers.

Simple Data-Driven Modeling of Brushes

A lot of drawing programs have 2D brushes, but real 3D brushes can represent and replace a large number of 2D brushes. However, geometrically modeling the brush directly can lead to bending extremes that you (as an artist) usually want to avoid. In this paper from Microsoft Research, the modeling is data-driven, based on measuring how real brushes deform in two key directions. The brush is geometrically modeled with only a few spines having a variable number of segments as bones.

This has some offline precomputation, but most of the implementation is computed at run time. This was one of the few papers with a live demo, using a Wacom tablet, and it was made available for attendees to play with. See an example from an attendee at the Flickr gallery here.

Radiance Scaling for Versatile Surface Enhancement

This is about rendering geometry in such a way the surface contours are not obscured by shading. This is the problem that techniques like the “Gooch” style try to solve. However, the technique in this paper does it without changing the perceived material, sort of like an advanced sharpening filter for 3D models.

It describes a scaling function based on curvature, reflectance, and some user controls, which is then trivially multiplied with the normally rendered image. The curvature part is from a previous paper by the authors, and reflectance is based on BRDF, where you can enhance BRDF components independently. You should definitely have a quick look at the results here.

Shadows and Transparency

Volumetric Obscurance

This is yet-another screen-space ambient occlusion (SSAO) technique. Instead of point sampling, it samples lines (or beams of area) to estimate the volume of sample spheres that are obscured by surrounding geometry. It claims to get smoother results than point sampling, without requiring expensive blurring, and with the performance (or even better) of point sampling. You can see some results at the author’s site here. While it has a few interesting ideas, this may or may not be much better than an existing SSAO implementation you may already have. I found the AO technique in one of the posters (see below) more compelling.

Stochastic Transparency

This was selected as the best paper of the conference. Like one of the earlier papers, this tries to deal with order-independent transparency, but it does it very differently. The author described it as “using random numbers to approximate order-independent transparency.” It has a nice overview of existing techniques (sorting, depth peeling, A-buffer). The new technique does away with any kind of sorting, is fast, and requires fixed memory, but is only approximate. It was demonstrated interactively on some very challenging scenes, e.g. thousands of transparent strands of hair and blades of grass.

The idea is to collect rough statistics about pixels, similar to variance shadow maps, using a combination of screen-door transparency, multisampling (MSAA), and random masks per fragment (with D3D 10.1). This can generate a lot of noise, so much of the presentation was devoted to mitigating that, such as using per-primitive random number seeding to look OK in motion. This is also extended to shadow maps for transparent shadows. Since this takes advantage of MSAA and is parallel, quality and performance will increase with normal trends in hardware. It was described as not quite fast enough for games (yet), but (again) it might be fast enough for other applications.

Fourier Opacity Mapping

The goal of this work is to add self-shadowing to smoke effects, but it needs to be simple to integrate, scalable, and execute in just a few milliseconds. The technique is based on opacity shadow mapping (2001), which stores a transmittance function per texel, but has significant visual artifacts. Here a Fourier basis is used to encode the function, and you can adjust the number of coefficients (samples) to determine the quality / performance tradeoff. Using just a few coefficients results in “ringing” of the function, but it turns out that OK for smoke and hair. The technique was apparently implemented successfully in last year’s Batman: Arkham Asylum.

Normals and Textures

Assisted Texture Assignment

This paper is about making it much easier and faster for artists to assign textures to game environments (levels). It is an ambiguous problem, with limited input to make decisions. The solution relies on adjacency and shape similarities, e.g. two surfaces that are parallel are likely to have the same texture. The artist picks a surface, and related surfaces are automatically chosen. After a few textures are assigned, the system produces a list of candidate textures based on previous choices. There is some preprocessing that has to be performed, but once ready, the system seems to work great. Ultimately this is not about textures; rather, this is an advanced selection system.

LEAN Mapping

“LEAN” is a long acronym for what is essentially antialiasing of bump maps. Without proper filtering, minified bump maps provide incorrect specular highlights: the highlights change intensity and shape as the bump maps gets small in screen space. The paper implements a technique for filtering bump maps using some additional data on the distribution of bumped normals, that can be filtered like color textures. The math to derive this is not trivial, but the implementation is simple and inexpensive.

The results look great in motion, at glancing angles, minified, magnified, and with layered maps. It also has the distinctive property of turning grooves into anisotropy under minification, something I have never seen before.

Efficient Irradiance Normal Mapping

There are a few well-known techniques in games for combining light mapping and normal mapping, but they are very rough approximations of the “ground truth” results. This paper introduces an extension based on spherical harmonics, but only over a hemisphere, that significantly improves the quality of irradiance normal mapping. Strangely, no mention was made of performance, so I would have to assume that it runs as fast as the existing techniques, just with different math.


The posters session was preceded by a brief “fast forward” presentation with each author having a minute to describe their work. There were about 20 posters total, and I have comments on a few of them.

Ambient Occlusion Volumes (link)

This is a geometric solution to the problem of rendering convincing ambient occlusion, compared to the screen-space (SSAO) techniques which are faster, but less accurate. The results are very close to ray-traced results, and while it appears to be too slow for games right now (about 30 ms to render), that will change with faster hardware.

Real Time Ray Tracing of Point-based Models (link)

The title says it all. I didn’t look into this too much, but I wanted to highlight it because it is getting cheaper to get point cloud data, and it would be great to be able to render that data with better materials and lighting.

Asynchronous Rendering

This poster has an awfully generic name, but it is really about splitting rendering work between a server and a low-spec client, like a mobile phone. In this case, the author demonstrated precomputed radiance transfer (PRT) for high-quality global illumination, where the heavy processing was done on the server, while still allowing the client (here it was an iPhone) to render the results and allow for interactive lighting adjustments. For me the idea alone was interesting: instead of just having the server or client do all the work, split it in a way that leverages the strengths of each.


A few academic and industry speakers were invited to give 90-minute presentations.

Biomechanical and Artificial Life Simulation of Humans for Computer Animation and Games

The keynote address was given by Demetri Terzopoulos of UCLA. I was not previously familiar with his work, but apparently he has a very long resume of work in computer graphics, including one of the most cited papers ever. The talk was an overview of his research from the last 15 years on modeling human geometry, motion, and behavior. He started with the face, then the neck, and then the entire body, each modeled in extensive detail. His most recent model has 75 bones, 846 muscles, and 354,000 soft tissue elements.

The more recent work is in developing intelligent agents in urban settings, each with a set of social behaviors and goals, though with necessarily simple physical models. The eventual and very long-term goal is to have a full-detail physical model coupled with convincing and fully autonomous behavior.

Interactive Realism: A Call to Arms

The dinner talk was given by Peter Shirley of NVIDIA. This was the “motivational” talk, with his intended goal of having computer graphics that are both pleasing and predictive. Some may think that we have already reached the point of graphics that are “good enough,” but he disagrees. He referenced recent games and research to point out the areas that he feels needs the most work. From his slides, these are:

  • Volume lighting / shadowing
  • Indoor-outdoor algorithms
  • Coarse / fine lighting
  • Artist / designer-in-the-loop
  • Motion blur and defocus blur
  • Material models
  • Polarization
  • Tone mapping

He concluded with some action items for the attendees, which includes reforming the way computer graphics research is done, and lobbying for more funding. From the talk and subsequent Q&A, it looks like a lot of people are not happy with the way SIGGRAPH handles papers, a world I know very little about.

The Evolution of Precomputed Lighting for Games

The capstone address was given by Peter-Pike Sloan of Disney Interactive Studios. He presented essentially a history of precomputed lighting for games from Quake, to Halo 3, and beyond. Such lighting trades off flexibility for quality and performance, i.e. you can get very convincing and fast lighting with some important restrictions. This turned out to be a surprisingly large topic, split mostly between techniques for static and dynamic elements, like environments and characters, respectively.

You may wonder why this is relevant beyond a history lesson, the trend in research being for techniques to not require precomputation, and that includes lighting. But precomputed lighting is still relevant for low-end hardware, like mobile devices, and cases where artist control is more important than automated results.

Wrap It Up!

Thanks for making it this far. As you can see, it was a very busy weekend! Like the 2008 conference, this was a great opportunity to see the state-of-the-art in computer graphics and interaction research in a more intimate setting. I hope this was useful, and please reply here if you have any comments.

Tags: , , , , , , ,

Made me laugh

I noticed Kirk & Hwu’s “Programming Massively Parallel Processors” book is back in stock at Amazon (and now with 2 mixed reviews), after being unavailable for a number of days. The part that made me laugh is Amazon’s ranking listing:

#1 in  BooksComputers & InternetHardwareMainframes & Minicomputers

If GPU computing isn’t the antithesis of mainframes and minicomputers, I’m not sure what is…

Tags: ,

We’ve been reworking our books page to take longer to download, I mean, to be more visually interesting and readable. Honestly, the old one was a dense, hard to view pile of book titles. Just adding whitespace between titles is a plus. We’ve also added one book to the recommended list, Eric Lengyel’s math book. Anyway, go check it out. On our main resources page we’ve put all the free books online into one section.

There are some new books coming out that look interesting. For those of you going to GDC, there should be some worthwhile offerings to check out on the floor.

The Programming Massively Parallel Processors book by Kirk (Chief Scientist at NVIDIA) and Hwu (professor at U. of Illinois) is out by now, as of 3 days ago, and is currently sold out on Amazon. It’s undoubtedly derived from the course they co-taught at Illinois. CUDA and Tesla are the keywords here. Hwu’s current course lectures are here and here; I don’t know how they compare to the book, but these newer (non-Kirk) lectures seem more general. I look forward to learning more about this volume—if you have it, please do leave a comment (or better yet, a review on Amazon).

Wolfgang Engel and all have a new book out, GPU Pro. He’s using a new publisher, so it does not have the ShaderX name, but effectively is ShaderX 8. Finally, the book is color throughout vs. previous ShaderX’s. I’ve skimmed some of the articles, and it’s in the same vein as others in the series: a range from practical advice to wild ideas. I can just about guarantee that professional interactive graphics programmers will find something of interest—I found about 5 articles off the bat I want to read through, and plenty of others I should at least skim. More info at the blog for this book.

Game Programming Gems 8 adds to this long-lived series. I haven’t seen it yet, so no comments; Adam Lake’s blog may give updates on status, contents, etc. This series has slowly drifted to including much more non-graphical material over the years. Understandable, but Adam’s someone I think as a graphics guy, so I’m selfishly hoping for more graphics and less the other stuff. My view on collection books like ShaderX and this is simple: an hour of a programmer’s time is about the same as the cost of a book, so if the book saves an hour, it’s paid for itself. Of course, there’s the time cost of reading the articles of interest, but still…

Second editions have been announced for Physically Based Rendering Techniques and High Dynamic Range Imaging. PBRT is more offline rendering oriented, but is a great book because it takes a stand; the authors say what they do for a real system and why they made that choice, vs. listing all possible techniques. It also presents about the longest literate programming presentation published. I have a short review of the first edition. The HDRI book is nice in that it pulls together the various research articles out there into one place, with a coherent thread to it all. The second edition’s new material is described on its Amazon page.

Tags: , ,

Today at the GPU Technology Conference (the successor to last year’s NVISION), NVIDIA announced Fermi, their new GPU architecture (exactly one week after AMD shipped the first GPU from their new Radeon HD 5800 architecture).  NVIDIA have published a Fermi white paper, and writeups are popping up on the web.  Of these, the ones from Real World Technologies and AnandTech seem most informative.

With this announcement, NVIDIA is focusing firmly on the GPGPU market, rather than on graphics.  No details of the graphics-specific parts of the chip (such as triangle rasterizers and texture units) were even mentioned.  The chip looks like it will be significantly more expensive to manufacture than AMD’s chip, and at least some of that extra die area has been devoted to things which will not benefit most graphics applications (such as improved double-precision floating-point support and more general programming models).  With full support for indirect branches, a unified address space, and fine-grained exception handling, Fermi is as general purpose as it gets.  NVIDIA is even adding C++ support to CUDA (the first iterations of OpenCL and DirectCompute will likely not enable the most general programming models).

Compared to their previous architecture, NVIDIA has shuffled around the allocation of ALUs, thread scheduling units, and other resources.  To make sense of the soup of marketing terms such as “warps”, “cores”, and “SMs”,  I again recommend Kayvon Fatahalian’s SIGGRAPH 2009 presentation on GPU architecture.

Tags: , , ,

  • LLVM compiler. A number of people at the High Performance Graphics 2009 symposium were impressed, or even using, this new compiler. It’s new, based on recent research on compilers and optimization, and is supposed to be darn good. More here, with page 3 talking about Apple’s use of it for GLSL code optimization.
  • Very Sleepy CPU Profiler. Free, of course, and works directly on any Windows app with PDBs. Sounds pretty convenient if you don’t have access to a reasonable profiler, or just want to try a different one (I’ve found profilers sometimes have blind spots or peculiar biases). Bonus link at the same site: summaries and links to classic graphics papers. The first sentence on this page made me laugh.
  • Vista Gadgets. I use the NVIDIA temperature gadget, the memory monitor’s also handy. An alternate temperature gadget is here.
  • NVIDIA NEXUS. Debugging GPU code with PIX is flakey at best; I have high hopes that this product from NVIDIA will be much better. It’s something NVIDIA will charge for (a first for NVIDIA, I think), and that’s fine by me if it does a noticeably better job.
  • NVPP. A CUDA library of functions from NVIDIA. I haven’t tried CUDA, but this library looks worthwhile. To be honest, in the long-term OpenCL or compute shaders look like the popular future for commercial products vs. research, since those two are multi-platform. CUDA is much more developed at this point, however, and I’ve heard that whatever techniques you learn using CUDA can almost always be applied to the other two. So, I’m on the fence waiting for a winner, since I have no personal reason to use any of them at this point.
  • VMMap. A little free application that shows where all the memory went.
  • OverClock Checking Tool. I kinda forgot people still overclock. This utility is interesting even if you don’t, if nothing else than to check if things are working. It’s a bit exciting to hear my GPU’s fan kick into overdrive as the temperature climbs to 87 degrees Celsius (188.6 Fahrenheit). I also learnt a little more about my Intel Core 2 Quad CPU: it “idles” at 2.0 GHz, but jumps up to 2.66 GHz when running something serious. I wimped out on going ahead with the Power Supply test, as their warning kept me away.

Tags: , , , ,

I love the movie sequel title “2 Fast 2 Furious”. How clever, and a great way to guarantee there will never be a third movie. Well, there was, but they had to go the colon route, “… : Tokyo Drift”.

Which is indicative of nothing, as I don’t think I’ve ever actually seen any of these movies. I was reminded of the title as my goal today is to whip through the backlog of 72 potential blog resource links I’ve been gathering on [Well, as it turns out, I got through 39 of them (the fresher ones), 33 to go...]

ShaderX^7 has been published. We hope to give it an overview sometime soon (mine’s on backorder from

From various source I heard that OnLive got a bit of notice at GDC. Think: pure server-side computation of all graphics for a game, i.e., a cloud computing model. Now even your grandma’s computer or even a rigged-out TV can play Crysis, assuming the net bandwidth is there. Which of course makes me think: what about latency? Lag for how other players see your action is always there, and causes mismatches (“how did I instantly die?”). But increasing lag for you seeing the consequences of your own actions seems like a non-starter for shooters, at least.

Mark DeLoura has a great two-part article on what game engines are licensed for titles. First part is a general survey, second is about the technology involved. I found it interesting to see what people cared about, e.g. multicore is on people’s minds. Nothing too shocking here, but it’s fantastic to see what is getting used, and why, in this marketplace.

Related to this, I happened across a list of game engines on wikipedia. Not massively useful (e.g. no sense of what’s popular), but a starting place.

John Ratcliff has a graphics math library available for download with an unrestrictive reuse license. He recently added best fit methods for AABB’s and OBB’s.

I was interested to look at the open source, cross-platform (!) model viewer GLC. I’ve wanted something like this for doing some experiments with mesh manipulation. Not a bad viewer, but that’s all it is at this point, unfortunately: you can’t even export to a different 3D format. The search continues… If you know a reasonable open source 3D file viewer/converter out there, please tell me. I should probably bite the bullet and just use Blender, but this application is way overkill.

CUDA voxel rendering – pretty impressive!

I liked this post on optimization mainly because of the line “I went in and found out that some title bar was getting rendered 140 times every time you refreshed the screen”. I can entirely relate (though 140 must be some kind of record): too many times I’ve put output debugging statements showing updates, only to see 2,3,6 updates happening. I once started on a project and in the first few weeks increased performance by 100%, simply by noting the main draw path was being executed twice each frame.

Speaking of performance, there’s an article on volume rendering optimizations when using a ray-casting approach on the GPU.

Wolfenstein source code for the free iPhone version, along with Carmack’s documentation on the project, is available.

Software patents are only slightly dumber than business method patents, which are patently absurd. I hadn’t noticed until now, but there was recently a ruling on a business method patent, In re Bilski, which has been used to strike down software patents.

A detailed data and execution flow diagram for the new DirectX 11 pipeline front-end is available from Jolly Jeffers.

People are still making ray-tracing specific hardware; witness Caustic Graphics. They have a rather amazing claim: “The CausticOne, however, thrives in incoherent raytracing situations: encouraging the use of multiple secondary rays per pixel. Its level of performance is not affected by the degree of incoherence.” Good trick. That said, I can’t say I see any large customer base for such a product. This seems like a company designed for acquisition, similar to Ageia. Fine by me, best of luck to them.

I’m happy to learn that the Humus site now has a news blog. This is a great site for demos of advanced techniques, and for honest comments about strengths and limitations of various approaches.

Another blog: The Geeks of 3D. Tracks demos, APIs, SDKs, and graphics card releases. Handy – some of the links here I found there.

There was a nice little article on data alignment on Gamasutra. Proper alignment is a key element in getting high performance.

I was trying to find the name of the projection of equidistant latitude and longitude lines for a surrounding spherical environment. From this interesting page (click on the “Wall Maps of the World” text) I found it: Plate Carrée.

Predicting the future is so much more interesting than predicting the past. I love this: MIPS per $1000. It’s entertaining to equate raw computing power with structured processing. By the same equivalence, I should be able to hook up 1700 mice in parallel to get a human brain.

A great line from a GPU review: “Nvidia’s new line of unbelievably expensive cards will block out the sun, and ray-trace its own shadow in real time.”

Faber College’s motto is “Knowledge is Good”. Learning about the idea of metamers would have saved this article from confusion. Coming back to this article now, I see all the comments have been removed, and an apologia trying to convert confusion into enlightenment added, but I think this still misses the point. Sure, there is a color associated with a single wavelength of light. But, my guess is that 99.99% of the colors we perceive arrive at any location on the eye as light with a spectral mix of wavelengths, not a single wavelength (Naty will correct me if I’m wrong). Unless you’re Dr. Evil and deal with sharks with frickin’ laser beams on their heads on a daily basis. Hmmm, I’m probably forgetting some other single-wavelength phenomena, like fluorescence. Anyway, the article did lead me to look up more information on metamers on Wikipedia, where I learnt about metameric failure, a term I hadn’t heard before. One more reason a simple RGB representation of color isn’t sufficient.

Cute thing: Snapily lets you turn some set of images or video into lenticular prints.

I don’t have a lot to say about what I do at Autodesk. Here’s a tidbit.

Art for the day, crayons as pixels.

Tags: , , , , , , , , , , , , , , , , ,

« Older entries