Good list of classic graphics papers

Old graphics papers don’t get enough respect nowadays; for example, Porter and Duff’s original paper is still the best place to get a good understanding of alpha blending (which too many people get wrong nowadays). There are many more gems to be found in papers from the 70s and 80s. A while ago, I pointed out Pixar’s online paper library, which includes a lot of “golden oldies” (as well as good new stuff). I just saw this great list of old papers on the codersnotes blog. I heartily concur with Kayamon’s assessment of the value of an ACM digital library subscription, though I wish ACM would find a way to go the Open Access route. It’s not just a matter of expense; the registration wall adds a huge amount of friction to the process of finding information.

Exploiting coherence at GDC 2009

A few months back, I wrote a blog post discussing techniques which exploit coherence, either spatial (like multiresolution rendering) or temporal (like reprojection caching).

Both of these were represented at GDC this year. Jeremy Shopf presented a talk on Mixed Resolution Rendering, and the ambient occlusion technique presented in the talk Rendering Techniques in Gears of War 2 (available on the GDC Vault site) made use of both methods. The ambient occlusion factors were rendered at a downsampled resolution. In addition, reprojection caching was used to reduce temporal aliasing. This is the first use I have seen of reprojection caching in a shipping game.

In my previous blog post, I was skeptical of reprojection approaches, since it seemed to me that as an optimization method they did not address the worst case (where the camera angle changes abruptly). Using such approaches to improve quality instead (as Epic did) makes more sense.

Fast and Furious

Given my last links post referenced “The Fast and the Furious”, I might as well call this post by the 4th movie in that series. Which is bizarrely titled by simply removing two “the”s from the original title. So the 5th movie will just be “Fast Furious”? I can imagine this subtract-a-the for other movies: “Fellowship of Ring”, “Silence of Lambs”, “Singin’ in Rain”, “Back to Future”. Anyway, the goal of this post is to whip through the rest of my links backlog.

I’m still catching up with reading the post-GDC flurry of resources and blog posts and whatnot – you’re on your own. Well, mostly. One or two things: watch the last half of the Unreal 3 new features demo – some nice-looking stuff. Also, the GDC tutorials are available for download; the first set of 7 are what you want. Lots of DirectX 10 and 11 material, from my quick skim. The third talk, by the DICE guys, looked to have some interesting things to say about cascaded shadow maps. Here’s another older presentation, about the Frostbite rendering engine, parallelism, software occlusion culling, ray tracing, and other nice tidbits. What’s also interesting about this one is that it uses a slide hosting site, SlideShare, to hold the presentation. Speaking of slidesets, there are also these from the parallel graphics computing course at SIGGRAPH Asia 2008.

But, you found you’re required to attend a conference between GDC and SIGGRAPH (If so, I want your job). In that case, EGSR 2009 is coming up at the end of June, in Girona, Spain. This is the conference for rendering research in general. There’s still a week before abstracts are due, so get cracking.

In my last links post I asked for open source that loaded and exported a variety of model types and allowed mesh manipulation. Two people answered back: MeshLab. The blog about this package is also worth skimming through.

Also in the previous links article I mentioned the server-side graphics computation model presented by OnLive. I should also mention AMD’s Fusion Render Cloud project, in the same space. Hmmm, maybe this really could work, with compression, and if you don’t mind some lag.

In Gamasutra is a “sponsored” article, but a good one, on Intel’s Threading Building Blocks. I can attest that this component truly does help you take advantage of multiple cores. Knowing at least a bit about TBB is worth your while. Also on Gamasutra is part two of the data alignment article.

There’s a nice rundown of Killzone 2’s graphical features on Brian Karis’ blog.

The sIBL site has some HDR environment maps and manipulation software for download.

Paul Merrell has made plugins available for Max and Blender for his city synthesis procedural modeling research at UNC Chapel Hill.

This diagram of Windows’ graphics makes me think, “it’s just that easy.”

NVScale is an OpenGL-based SDK that lets you use up to four GPUs to store and render extremely large models. It’s nice to see NVIDIA supporting this (non-gaming) area of rendering.

Well-produced tutorial on volume rendering, along with demo code, by Kyle Hayward: part 1, part 2.

There are lots of articles about XNA graphics programming getting put at Ziggyware.

Nothing to do with computer graphics, but this seems like the best computer science class ever.

When nerds and lace-making meet: fractal doilies.

Left-Handed vs. Right-Handed World Coordinates

Two years ago I read a blog entry by Pete Shirley about left-handed vs. right-handed coordinates. I started to have a go at explaining these as simply as I could, but kept putting it off, to avoid saying something stupid or confusing. Having just dealt with this issue yet again at work, it’s time to write down my mental model.

One problem in thinking about this area is that there are two places where we care about them: world coordinates (where stuff is in space) and view coordinates (what we use for the view and perspective matrices). So, this first post will be just about world coordinates, as a start. Basic, but let’s get it down to begin.

The way I think about RH vs. LH for world space is there’s an objective reality out there. You are trying to define where stuff is in that reality. You stand on a plane and decide that looking East is the X+ axis, looking North is the Y+ axis (typical Cartesian coordinates). For the Z+ axis you decide that altitudes are positive numbers. That’s an RH coordinate system, and that’s why it’s the one used by most modeling packages, AFAIK (please do let me know if there are any LH modelers). We all likely know about how the right hand is used to explain the counterclockwise twist the three axes form, the right-hand rule. I was also happy to see on this same page how to label your two fingers and thumb to show the coordinate system.

You meet with Marvin the Moleman. He likes Y+ North, X+ East, just like you, but Z+ for him is downwards, his numbers increase as he digs his holes. He’s LH. So he hands you a model of his mole-lair, fully modeled in 3D. Fine, the transform to the RH space you like is a Z axis reflection, i.e., negate the Z coordinate and, as needed, normal values. He also gives you a 2D textured rectangle showing the floor plan, a 2D object. Viewing his dataset from above, your and his XYZ coordinates (and UV coordinates) happen to exactly match, the Z flip does nothing to these coordinates since Z is 0. You flip only the rectangle’s normal direction.

There are an infinite number of ways to transform between LH and RH, not just negate Z. A plane with any orientation and location can be used to mirror the vertices of the model; some planes are just more useful and convenient than others. A quick rule is that negating one axis or swapping two axes transforms from one coordinate system to the other.

One interesting property of such conversions is that, even though the normals get flipped along some plane (or perhaps I should say, because the normals get flipped), clockwise order of the vertices is not affected by conversion between these two model coordinate systems. Which is counterintuitive, on the face of it: if, for example, you do a planar mirror transform of a model so that you can render it again as an object reflected in a mirror (p. 386 on in RTR3), the mirroring matrix most definitely does change the ordering of the vertices. A clock seen in a mirror is reversed in the direction its hands travel.

The point is simply this: LH and RH are indeed just two ways of describing the same underlying world space. Conversion between the two does not change the world. A clock in the real world will always have its hands move in a clockwise direction, regardless of how you describe that world. I find this “conversion that does not modify clockwise” operation to be like the Zen koan, “It is not the wind that moves. It is not the flag that moves. It is your mind that moves.”

One last bit I found interesting: latitude/longitude. Typically we describe a location on the earth as lat/long/altitude, with North positive for latitude, East positive for longitude. So lat/long/altitude is left-handed, assigning them in this XYZ order. But, I’ve also seen such coordinates listed in longitude/latitude order, e.g., TerraServer USA uses this order. Which is right-handed, since the X and Y coordinates are swapped. In this case all the values are the same, but it’s simply the ordering that changes the handedness.

My next posting on this subject will be about LH vs. RH for viewing vs. world coordinates, which is where the real confusion comes in.

Connections: Larrabee, Michael Abrash, Intel, Dr. Dobb’s and me

There has been a spate of Larrabee information during the last two weeks. Two GDC talks (slides near the bottom of this page), a prototype library, and an article by Michael Abrash on the Dr. Dobb’s website.

Dr. Dobb’s Journal has been out of print since February, but for many years it was one of the leading software publications. When initially published in 1976 (as Dr. Dobb’s Journal of Computer Calisthenics & Orthodontia) it was the first journal focusing on software development for microcomputers. Michael Abrash wrote many articles for Dr. Dobb’s over the years, including a series on the Quake software renderer in the mid 90’s. This series made a great impression on me; when it was published I was considering a career change from microprocessor design to graphics programming. At the time, I was working on Intel’s P55C processor, publicly known as “Pentium with MMX Technology”. This chip was notable both for being the first X86 processor with a SIMD (single instruction multiple data) instruction set, and for being the last CPU to use the in-order Pentium micro-architecture.

When Michael Abrash wrote the Quake articles game rendering was 100% software, mostly written in assembly language. Abrash was the uber-game programmer, having worked on DOOM, written the Quake renderer, and published (in addition to his Dr. Dobb’s articles) many influential books about graphics programming, assembly and optimization (the last of which is available online).

Within a few years (around the time I finally made the jump from CPU design to game graphics programming), it seemed to many that graphics hardware and compiler improvements had made software rendering and hand-coded assembly obsolete. This was mirrored by my own experience; I was hired to my first game industry job on the strength of a software rasterization demo (written mostly in assembler) and by the time the game shipped, it required graphics hardware and contained very little assembly (none written by me). Abrash started applying his considerable skills to what he saw as the next unsolved hard problem: natural language processing.

But he couldn’t stay away from graphics for long; when Microsoft started working on the XBox console he got involved in its design. In the early 2000’s, he figured out that there was a market for software renderers after all, mostly due to the mess of caps bits, unorthogonal feature support, and flaky compliance that characterized low-end graphics hardware at the time (Intel was among the greatest offenders; compounding the problem, its graphics chips sold very well so there were a lot of them out there). With Mike Sartain (another XBox designer), he wrote Pixomatic, a software renderer published by RAD Game Tools (until then mostly known for the Miles sound library, perhaps the most widely-used middleware in the games industry). Of course, he published another series of articles in Dr. Dobb’s about the experience, where he discussed how he made use of SIMD instruction sets such as MMX and SSE when optimizing Pixomatic.

I found this particularly interesting due to my personal involvement with these instruction sets. After working on the first MMX hardware implementation I helped define its successor, which was twice as wide (128 bits instead of 64) and added support for floating-point SIMD. This instruction set was at first called MMX2, then VX, and finally split into two separate instruction sets: SSE and SSE2. By this time SIMD instruction set extensions were becoming quite popular; AMD had their own version called 3DNow!, and PowerPC had the AltiVec instruction set. Intel kept on adding new SIMD extensions: SSE3, SSSE3, SSE4.1, SSE4.2, and AVX.

As Abrash details in the Larrabee article, Larrabee got started when he decided to talk to Intel about some ideas for SIMD instructions to accelerate software rasterization. As a result, Larrabee includes a powerful set of SIMD instructions. Much wider than previous instruction sets (512 bits instead of 128, or 256 in the case of AVX), Larrabee’s instruction set contains several instructions tailored to software rasterization. It is also general enough to allow for automatic code vectorization of a wide variety of loops. Abrash had a key role in the design of the instruction set, bringing software rasterization back into the mainstream.

Besides a good instruction set, Larrabee also needed an efficient hardware design with a large number of cores. Each of these cores needed to be very efficient in terms of performance-per-Watt and per-transistor. Since the Larrabee team started out as a skunkworks, they couldn’t afford to design a brand-new core so they looked at previous Intel cores, and the old in-order Pentium core (almost the same one I used in the P55C) was the one chosen.

What I find fascinating about this story is that Abrash managed to follow rasterization all the way around the Wheel of Reincarnation. This term refers to the common process where a piece of computing functionality is first implemented in software, then moves to special-purpose hardware which gradually becomes more general until it rivals a CPU in complexity, at which point the functionality is folded back into software. It was coined in a 1968 article by T. H, Myer and Ivan Sutherland (the latter is widely considered the father of computer graphics).

ACMR and ATVR

ACMR is Average Cache Miss Ratio, which is used to measure the effectiveness of a vertex reordering scheme to see how it performs. That is, the GPU has a certain number of transformed vertices it keeps in its post-transformed vertex cache. This cache will be more or less effective, depending on the order you feed triangles into the GPU. ACMR is the sum of the number of times each vertex must be transformed and loaded into the cache, divided by the number of triangles in the mesh. Under perfect conditions it is 0.5, since the most a cached vertex can be shared on average in a (non-bizarre) mesh is 6 times and each triangle has 3 vertices.

Ignacio Castaño has an excellent point: a better measure is to ignore the number of triangles in the mesh but instead divide by the number of vertices. He calls this the ATVR (average transform to vertex ratio) of a scheme. The problem with using the number of triangles is that this number varies with the mesh’s topology. Optimal ACMR, vertices over triangles, gives a sense of the amount of shared data in a mesh. ATVR is a better measure of cache performance, as it provides a number that can be judged by itself: 1.0 is always the optimum, so if your caching scheme is giving 1.05 ATVR, you’re doing pretty good. The worst ATVR can get is 6.0 (or just shy of 6.0).

I think the reason ACMR is used is that we evolved from triangle strips to connected meshes. Individual triangles have an ACMR of 3.0. Triangle strips have an optimum ACMR of 1.0, since each vertex can be shared with a maximum of 3 triangles. The way to think about triangle strips is how they increase sharing of vertices with triangles. Gluing strips together in a certain order allowed a better ACMR, since the vertices in one tristrip could be then reused by the next tristrip(s). So the ACMR as an optimum measure was a way to show why general meshes were worthwhile. But, once general meshes came to the fore and became the norm, this vertex/triangle ratio became less meaningful. At this point ACMR just makes it difficult to compare algorithms, as you alway need to know the optimum ACMR. The optimum ACMR changes from mesh to mesh. The optimum ATVR is always 1.0, so different meshes can be compared on the same scale.

That said, this comparison of different meshes using just ATVR can be a bit bogus: if a “mesh” was actually a set of disconnected triangles, no sharing, the ATVR would be 1.0 no matter what, since each vertex would always be transformed once and only once. ACMR would always be 3.0. So, the optimum ACMR is also good to know: a lower optimum ACMR means more sharing is possible and can be exploited.

Ignacio has a followup article on optimal grid rendering, showing the ACMR and ATVR for various schemes.

I3D 2009 Report

I was holding out in hopes that Jeremy Shopf would do a summary of days 2 and 3 of I3D 2009, since he did such a nice piece for day 1. No such luck, so here’s mine. What follows is a brief overview of I3D, mostly the papers that I cared about most, i.e., those on rendering and maybe a little modeling. The goal is to put enough information to let you skim through many papers and decide which you want to read.

There were about 100 attendees. As usual, you can find the paper titles and links to many of them on Ke-Sen Huang’s website.

Day 1:

Pat Hanrahan’s keynote was on do-it-yourself UI. He encouraged people to get their hands dirty and try making some dirt-cheap UI hardware, just to see where it might lead. How to summarize an hour-long talk? “Just Do It”, Arduino, Maker Faire, and this video, which is brilliant.

Multiscale 3D Navigation – The interesting nugget was the idea of rendering a cube map around the viewer to get the depths at pixels, then use this depth data to adjust the near and far planes, adjust the amount of distance traveled when moving forward, and perform collision detection and object avoidance when moving forward.

Gigavoxels – Make everything voxels. They use an octree at the upper levels, 3D textures at the nodes. Compress. Render.

A Novel Paged-Based Data Structure for Interactive Walkthroughs – A “what if our model has 110 million triangles?” paper. Usual idea of textures for far away stuff. Key idea is to divide the scene into coherent chunks that each fit into a disk page, with a k-d tree atop. Lots of preprocessing. Nice result of 20 FPS of the scene at good quality, on a laptop.

Terrain Sketching – UI and algorithms for artist-controlled creation of heightfields. Various ways to create them (silhouette, then “spine”, or vice versa). I liked where he uses spectral noise analysis of real terrain to fill in the artist’s shaky silhouette to make the final result more convincing.

Animation’s not my forte, so I’m skipping reporting that session. That said, Kavan’s paper seems to offer a nice thing, his (or any) higher quality non-linear skinning can be automatically turned into linear skinning, which CAD tools and GPUs support well.

Soft Irregular Shadow Mapping – similar to an earlier paper by Sintorn, now for Larrabee. Wild stuff: take the scene from the eye, take those samples that are seen and transform them into light space as a group. Cells in the light’s “grid” view of these samples are processed. Samples are compared to geometry and an approximation of the coverage of the occluder of the sample is made. Quite involved, but imaginable that this could be “the future”. Table 3 is particularly valuable to anyone interested in shadows, as it’s a summary of previous work and what features each has (more or less).

Hair Self Shadowing and Transparency Depth … – a different way of quickly creating a deep shadow map (like a shadow buffer, but with an opacity function at each pixel instead of a depth). Interesting use of buckets to count hairs at each depth, an “occupancy map”. Good error analysis & corrections done. Seems to work pretty well.

Approximating Dynamic Global Illumination in Image Space – SSAO has no view-dependent component, so as the lights move this self-shadowing component never changes. With a bit more work and record-keeping you can also get shadows that are more affected by directional lights and can also get color bleeding. Some cool effects, some artifacts.

Multiresolution Splatting for Indirect Illumination – a paper where I understood most of it while watching the presentation, but coming back to it I realize I’d have to read the article carefully to know what it all means when put together. Virtual point lights, min-max mipmaps, all sorts of stuff. If you know reflective shadows maps, this extends those by using multiple resolutions to save on bandwidth. Seems a bit tricksy, but amazing that it works.

Day 2:

Started with the fluids session; I’m not fluid-oriented, so no descriptions for you.

Fast High-Quality Line Visibility – very nice work and a good presentation, giving a number of techniques for computing visible lines. 50x faster than using an item buffer, for complex models.

Dynamic Solid Textures… – solid textures for NPR. Problem: how to balance between 3D textures that stay in place on their surfaces while having a 2D look with a constant frequency of pattern on the screen? Basic idea is to use octaves of noise and fade depending on zoom level.

Laplacian Lines for Real-Time Shape Illustration – A new NPR line type. I have “Suggestive contours are only in concave areas. Laplacian lines are a bit faster.” Hmmm, I didn’t write down much else about this one. As is often the case, some things looked great, some things did not.

Real-Time View-Dependent Rendering of Parametric Surfaces – the summary, “screen-space flatness”. Use CUDA to tessellate patches dependent on how much the curve diverges on the screen from a straight line segment. I didn’t understand fully how cracking was ameliorated, but it came out in questioning that cracking was not fully eliminated (though almost so).

MLS-based Scalar Fields… – wild deformations. Magic.

Real-Time Creased Approximate Subdivision Surfaces – how to keep creases on Catmull-Clark subdivision surfaces. Valve is starting to use this in their pipeline, as they would like to have just one art asset be able to be used both in the game and for offline animation. Has some little glitches at concave corners, and there are fixes for these. Valve seemed to feel this method had some staying power for their modeling in the future. Given the limitations of Catmull-Clark surfaces (e.g. these can’t be concave) it allows a lot.

Posters session followed. To be honest, the one that most sticks in my mind is where the guy analyzed a large number of games to see how artists simulate first-person walking. Does your head bob, does your gun bob, do both? Would something else work better? Bob patterns included up and down, U-shaped, infinity-symbol-shaped – the latter two were slightly better. This one won best poster presentation, in fact. Other one that was of interest was combining SSAO and toon shading, which seemed to give a bit more detail to objects.

End of day 2; I already covered the NVIRT announcement at the banquet that night.

Day 3:

Granular Visibility Queries on the GPU – about occlusion culling. First gave a summed area table way of tracking occluders, but that method seemed fussy and complex. The hierarchical item buffer presented seemed like a winner.

Parallel View-Dependent Refinement of Progressive Meshes – indeed, how to do this in parallel. Some very nice visualizations during this talk.

Efficient … Audio-Visual Rendering … – If you have a CPU budget for rendering images plus generating sounds, which pays off best? Nice to see someone do something different.

Don Greenberg gave an enjoyable capstone talk about the history of perspective and its use in architecture. He focussed on historical Italian architects playing tricks with right angles in buildings to make corridors look longer, trompe l’oeil painting, etc.

… human motion papers… crowd patches… egocentric affordance fields… – not my areas, and I faded a bit as conference paper overload started to set in.

The last paper I attended to more carefully, since it was from my company and I’m working in this area (but have nothing to do with the paper):

Multiscale 3D Reference Visualization – Infinite multiscale grids and how to render them well when zooming in and out. Also, put objects on blue stalks (red if two blue stalks merge) with the stalk bases forming circles on the ground plane grid, to give clearer visual cues as to the location and size of objects. New word for the day: exocentric. Egocentric rotation is where the viewer rotates his head, exocentric is where the object stays still and the viewer orbits (or for you gamers, strafes) around it.

And then home the next day through a snowstorm. My car survived, I survived, life’s good.

ShaderX^8 CFP – proposals due May 17

Will there be a GPU Gems 4? I don’t know. But I do know there will be a ShaderX^7 and, with your help, a ShaderX^8. The timeline and information about this next volume is at the ShaderX^8 site. If you’re interested in submitting, one detail (currently) missing from this site is that an example ShaderX proposal, writing guidelines, and a FAQ can be downloaded from here. The key bit: proposals are due May 17th. I’m not currently associated with this series (though I was for volumes 3 & 4), I just like to see them get good submissions.

The existence of these book series – Game Programming Gems, ShaderX, GPU Gems – is a fascinating phenomenon. Conferences like SIGGRAPH are heavy on theory and cutting-edge research, light on practical advice. Books like ours can be more applied, but are survey-oriented by their nature, not spending a lot of time on any given topic. Code samples and white papers on the web from NVIDIA, AMD/ATI, etc., and independents such as Humus, they’re great stuff, but are produced by particular groups of people with specific interests. Also, sometimes just finding relevant code samples on these sites can be a serious challenge (“search” sometimes works less well than I would like).

These book series fill the gap: they go through a review and editing process, improving quality and presentation. This in turn makes them of higher average interest to the reader, vs. a random article on the web of unknown quality. They won’t disappear if someone’s domain expires or interest wanes. They can be easily accessed years later, unlike material published in ephemeral venues such as Game Developer Magazine or GDC proceedings. The titles, at least, can be surveyed in one place by sites such as IntroGameDev (though this one appears to no longer receiving updates, unfortunately, e.g. ShaderX^6 is not listed).

The major downside of these books is that they’re only available on paper, not as searchable PDFs (except the first few ShaderX books). Well, almost the entire GPU Gems series is, wonderfully, online for free, but is still not easily searchable. Now if someone could just figure out a Steam-like system that let people buy books in electronic form while protecting publishers’ monetary interests. Hmmm, maybe eye-implanted bar-code readers that check if you have access to a given piece of digital content, that’ll be non-intrusive… Anyway, this is the challenge ahead for publishers. Maybe the Kindle is the best solution, but I like the Steam games model better, where something you’ve purchased is available on any computer attached to the Internet.

Best of all for consumers is free & digital, of course, but this does trim back the pool of authors pretty drastically, as a royalty percentage of 0% is not much of an incentive (I’ve been reading too many popularized economics books late, e.g. Naked Economics, so have been thinking more in economics-speak, like “incentives”). We wrote our book for the love of the subject, but I can’t complain about also, to my surprise, earning a bit of money (enough to allow me to, what else, upgrade my computer and graphics card on a regular basis). Enough rambling, but the subject of electronic publication is one that’s been on my mind for a few decades now. I expect a solution from you all by the end of the week, then let’s create a startup and we’ll sell out by next March and make a mint.

Real-Time Rendering

Tracking the latest developments in interactive rendering techniques