Tag Archives: NVIDIA

I3D 2011 Report – Part III: Banquet Talk

GDC has put a bit of a hiatus in my I3D posts; I better get them done soon so I can move on to the GDC posts.

This post describes a talk that David Luebke (Director of Research at NVIDIA) gave during the I3D banquet titled GPU Computing: Past, Present, and Future. Although the slides for the I3D talk are not available, parts of this talk appear to be similar to one David gave a few months ago, which does have video and slides available.

I’ll summarize the talk here; anyone interested in more detail can view the materials linked above.

The first part of the talk (which isn’t in the earlier version) covered the “New Moore’s Law”: computers no longer get faster, just wider, so algorithms must be re-thought to be parallel. David showed examples of several scientists who got profound speedups – from days to minutes. He covered several different techniques; I’ll summarize the four most notable:

  1. A “photonic fence” that zaps mosquitoes with lasers, to reduce the incidence of malaria in third world countries. This application needs fast computer vision combined with low power consumption, which was achieved by using GPUs.
  2. A military vehicle which detects Improvised Explosive Devices (IEDs) using computer vision techniques. The speedup afforded by using GPUs enables the vehicle to drive much faster (an obvious advantage when surrounded by hostile insurgents) while still reliably detecting IEDs.
  3. A method for processing CT scans that enables much reduced radiation exposure for the patient. When running on CPUs, the algorithm was impractically slow; GPUs enabled it to run fast enough to be used in practice.
  4. A motion compensation technique that enables surgery on a beating heart. The video of the heart is motion-compensated to appear static to the surgeon, who operates through a surgical robot that translates the surgeon’s motions into the moving frame of the heart.

David started the next part of the talk (which is very similar to the earlier version linked above) by tracing the heritage of GPU computing through three separate historical threads: graphics hardware, supercomputing, and finally GPU computing itself.

The “history of graphics hardware” section started with a brief mention of a different kind of hardware: Dürer‘s perspective machine. The history of electronic graphics hardware started with Ivan Sutherland’s SketchPad and continues through the development of the graphics pipeline by SGI: Geometry Engine (1982), RealityEngine (1993), and InfiniteReality (1997). In the early days, the graphics pipeline was an actual description of the physical hardware structure: each stage was a separate chip or board, with the data flow fixed by the routing of wires between them. Currently, the graphics pipeline is an abstraction; the stages are different threads running on a shared pool of cores, as seen in modern GPU designs such as the GeForce 8, GT200, and Fermi.

The second historical thread was the development of supercomputers. David covered the early development of three ways to build a parallel machine: SIMD (Goddard MPP, Maspar MP-1, Thinking Machines CM-1 and CM-2), hardware multithreading (Tera MTA) and symmetric multiprocessing (SGI Challenge, Sun Enterprise) before returning to Fermi as an example of a design that combines all three.

“GPU computing 1.0” was the use (or abuse) of graphics pipelines and APIs to do general-purpose computing, culminating with BrookGPU. CUDA ushered in “GPU computing 2.0” with an API designed for that purpose. The hardware supported branching and looping, and hid thread divergence from the programmer. David claimed that now GPU computing is in a “3.0” stage, supported by a full ecosystem (multiple APIs, languages, algorithms, tools, IDEs, production lines, etc.). David estimated that there are about 100,000 active GPU compute developers in the world. Currently CUDA includes features such as “GPU Direct” (direct GPU-to-GPU transfer via a unified address space), full C++ support, and a template library.
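
To make the “GPU Direct” point concrete, here is a minimal sketch of a direct GPU-to-GPU copy using the CUDA runtime’s peer-access calls. The device indices and buffer size are arbitrary placeholders, and error checking is omitted for brevity:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can device 0 access device 1?
        if (!canAccess) { printf("no peer access between devices 0 and 1\n"); return 1; }

        const size_t bytes = 1 << 20;               // 1 MB, arbitrary
        float *buf0 = nullptr, *buf1 = nullptr;

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);           // flags argument must be 0
        cudaMalloc(&buf0, bytes);

        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);

        // Copy device 1 -> device 0 directly, with no staging through the host.
        cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);
        return 0;
    }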

The “future” part of the talk discussed the workloads that will drive future GPUs. Besides current graphics and high performance computing workloads, David believes a new type of workload, which he calls computational graphics, will be important. In some cases this will be the use of GPU compute to improve (via better performance or flexibility) algorithms typically performed using the graphics pipeline (image histogram analysis for HDR tone mapping, depth of field, bloom, texture-space diffusion for subsurface scattering, tessellation), and in others it will be to perform algorithms for which the graphics pipeline is not well-suited: ray tracing, stochastic rasterization, or dynamic object-space ambient occlusion.

David believes that the next stage of GPU computing (“4.0”) poses challenges to APIs (such as CUDA), to researchers, and to the education community. CUDA needs to be able to elegantly express programming models beyond simple parallelism, it needs to better express locality, and the development environment needs to improve and mature. Researchers need to foster new high-level libraries, languages, and platforms, as well as rethinking their algorithms. Finally, computer science curricula need to start teaching parallel computing in the first year.

I3D 2010 Report

From Mauricio Vives, our first guest blogger; I thank him for this valuable, detailed report.

Written February 26, 2010.

This past weekend I attended the 2010 Symposium on Interactive 3D Graphics and Games, known more simply as “I3D.” It is sponsored by ACM SIGGRAPH, and was held this year in Bethesda, Maryland, just outside Washington. Disclaimer: I work for Autodesk, so much of this report comes from the perspective of a design software developer, but any opinions expressed are my own.

Overview

I3D is a small conference of about 100 people that covers computer graphics and interaction research, principally as it applies to games. I also attended the conference in 2008 near San Francisco, when it was co-chaired by my colleague at Autodesk, Eric Haines.

About half of the attendees are students or professors from universities all over the world, and the rest are from industry, typically game developers. As far as I could tell, I was the only attendee from the design software industry. NVIDIA was well represented both in attendees and presentations, and the other company with significant representation was Firaxis, a local game developer best known for the Civilization series.

The program has a single track, with all presentations given in the same room. Unlike SIGGRAPH, this means that you can literally see everything the conference has to offer, though it is necessarily more focused. As you will see below, I was impressed with the quality and quantity of material presented.

Since this conference is mostly about games, all of the presented research has a focus on a real-time implementation, often for games running at 60 frames per second. Games have a very low tolerance for low frame rates, but they often have static environments and constrained movement which allows for precomputation and hence high performance and convincing results. Conversely, customers of design software like Autodesk’s products produce arbitrary and changing data, and want the most accurate possible results, so precomputation and approximations are less useful, though a frame rate as low as 5 or 10 fps is often tolerable.

However, an emerging trend in graphics research for games is to remove limitations while maintaining performance, and that was very evident at I3D. The papers and posters generally made a point to remove limitations, in particular so that geometry, lighting, and viewpoints can be fully dynamic, without lengthy precomputation. This is great news for leveraging these techniques beyond games.

In terms of technology, this is almost all about doing work on GPUs, preferably with parallel algorithms. NVIDIA’s CUDA was very well-represented for “GPGPU” techniques that could not use the normal graphics pipeline. With the wide availability of CUDA, a theme in problem-solving is to express as much as possible with uniform grids and throw a lot of threads at it! As far as I could tell, Larrabee was entirely absent from the conference. Direct3D 11 was mentioned only in passing; almost all of the papers used D3D9, D3D10, or OpenGL for rendering.

And a random statistic: a bit more than half of the conference budget was spent on food!

Links

The conference web site, which includes a list of papers and posters, is here.

The Real-Time Rendering blog has a recent post by Naty Hoffman that discusses many of the papers and has links to the relevant author web sites.

Photos from the conference are available at Flickr here. I also took photos at I3D 2008, held at the Redwood City campus of Electronic Arts, which you can find here.

Papers

The bulk of the conference program consisted of paper presentations, divided into a few sessions with particular themes. I have some comments on each paper below, with more on the ones of greater personal interest.

Physics Simulation

Fast Continuous Collision Detection using Deforming Non-Penetration Filters

There is discrete collision detection, where collisions are checked at fixed time intervals, and continuous collision detection, where an exact, analytic result is computed. This paper is about quickly computing the continuous variety, using some simple expressions that vastly reduce the number of tests between primitives.

Interactive Fluid-Particle Simulation using Translating Eulerian Grids

This was authored by NVIDIA researchers. The goal is a fluid simulation that looks better as processors get more and faster cores, i.e., scalable physics. This is actually a combination of techniques implemented primarily with CUDA, and rendered with a particle system. It allows for very dense and detailed results, and uses a simple trick to have the results continue outside the simulation “box.”

Character Animation

Here there was definitely a theme of making it easier for artists to prepare and animate characters.

Learning Skeletons for Shape and Pose

This is about creating skeletons (bones and weights) automatically from a few starting poses and shapes. The author noted that this was likely the only paper developed almost entirely with MATLAB (!).

Frankenrigs: Building Character Rigs From Multiple Sources

This paper has a similar goal: use existing artist-created character rigs to automatically create rigs for new characters, with some artist control to adjust the results. This relies on a database of rigged parts that an art team probably already has, thus it is a data-driven solution for the time-consuming tasks in character rigging.

Synthesis and Editing of Personalized Stylistic Human Motion

This is about taking a walk animation for a single character, and using that to generate new walk animations for the same character, or transfer them to new characters.

Fast Rendering Representations

Real-Time Multi-Agent Path Planning on Arbitrary Surfaces

Path finding in games is a huge problem, but it is normally constrained to a planar surface. This paper implements path planning on any surface, and does it interactively on both the CPU and GPU using CUDA.

Efficient Sparse Voxel Octrees

Is it time for voxel rendering to make a comeback? These researchers at NVIDIA think so. Here they want to represent a 3D scene using voxels with as little memory as possible, and render it efficiently with ray casting. In this case, the voxels contain slabs (they call them contours) that better define the surface. Ray casting through the generated octree is done using special coordinates and simple bit manipulation. LOD is pretty easy: voxels that are too small are skipped, or the smallest level is constrained, similar to MIP biasing.
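
To give a flavor of what such a compact node might look like, here is a hedged C++ sketch; the actual layout in the paper differs in detail, but bitmasks plus a popcount to locate children are the essence of the “simple bit manipulation”:

    #include <cstdint>

    struct SvoNode {
        uint32_t childOffset;  // index of this node's first child in a big node pool
        uint8_t  validMask;    // bit i set: child i exists
        uint8_t  leafMask;     // bit i set: child i is a leaf voxel
        uint16_t contour;      // index of optional slab ("contour") data
    };                         // 8 bytes, consistent with the 5-8 bytes/voxel figure

    // Find the pool index of child i: children are stored contiguously, so
    // count the valid children that precede it. (Host-side builtin shown;
    // device code would use __popc.)
    inline uint32_t childIndex(const SvoNode& n, int i) {
        uint32_t before = __builtin_popcount(n.validMask & ((1u << i) - 1u));
        return n.childOffset + before;
    }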

This paper certainly had some of the most impressive results from the conference. The demo has a lot of detail, even for large environments, where you would think voxels wouldn’t work that well. One of the statistics about storage was that the system uses 5-8 bytes per voxel, which means an area the size of a basketball court could be covered at 1 mm resolution on a high-end NVIDIA GPU. It combines a lot of techniques that could be useful in other domains, like point cloud rendering. Anyway, I recommend looking at the demo video, and if you want to know more, see the web site, which has code and the compiled demo.

On-the-Fly Decompression and Rendering of Multiresolution Terrain

This paper targets GIS and sci-vis applications that want lossless compression, instead of more-common lossy compression. The technique offers variable rate compression, with 3-12x compression in practice. The decoding is done entirely on the GPU, which means no bus bottleneck, and there are no conditionals on decoding, so it can be very parallel. Also of interest is that decoding is done right in the rendering path, in the geometry shader (not in a separate CUDA kernel), and it is thus simple to perform lighting with dynamically generated normals. This is another paper that has useful ideas, even if you aren’t necessarily dealing with terrain.

GPU Architectures & Techniques

A Programmable, Parallel Rendering Architecture for Efficient Multi-Fragment Effects

The problem here is rendering effects that require access to multiple fragments, especially order-independent transparency, which the current hardware graphics pipeline does not handle well. The solution is impressive: build an entirely new rendering pipeline using CUDA, including transforms, culling, clipping, rasterization, etc. (This is the sort of thing Larrabee has promised as well, except the system described here runs on available hardware.)

This pipeline is used to implement a multi-layer depth buffer and color buffer (A-buffer), both fixed size, where fragments are inserted in depth-sorted order. Compared to depth peeling, this method saves on rendering passes, so it is much faster and has very similar results. The downside is that it is slower than the normal pipeline for opaque rendering, and sorting is not efficient for scenes with high depth complexity. Overall, it is fast: the paper quoted frame rates in the several hundreds, though really they should have chosen benchmark conditions complex enough to bring the rate below 100 fps, to make the results more relevant.
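
As a rough illustration of the A-buffer idea (not the paper’s actual code), here is a CUDA-style sketch of inserting a fragment into a fixed-size, depth-sorted per-pixel list; a real software pipeline also has to serialize fragments landing on the same pixel, which is omitted here:

    #define MAX_LAYERS 8

    struct Fragment { float depth; unsigned int color; };

    // Insert one fragment into a pixel's depth-sorted list (nearest first),
    // dropping the farthest fragment once the list is full.
    __device__ void insertFragment(Fragment* list, int* count, Fragment frag) {
        int n = *count;                                  // fragments stored so far
        int i = n < MAX_LAYERS ? n : MAX_LAYERS;
        while (i > 0 && list[i - 1].depth > frag.depth) {
            if (i < MAX_LAYERS) list[i] = list[i - 1];   // shift farther ones back
            --i;
        }
        if (i < MAX_LAYERS) {                            // otherwise frag is culled
            list[i] = frag;
            *count = (n + 1 < MAX_LAYERS) ? n + 1 : MAX_LAYERS;
        }
    }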

Parallel Banding Algorithm to Compute Exact Distance Transform with the GPU

The distance transform, used to build distance maps like Voronoi diagrams, is useful for a number of image processing and modeling tasks. This has already been computed approximately on GPUs, and exactly on CPUs. This claims to be the first exact solution that runs entirely on GPUs. The big idea, as you might expect, is to implement all phases of the solution in a parallel way, so that it uses all available GPU threads. This uses CUDA, and the results are quite fast, even faster than the existing approximate algorithms.
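
The banded details of the paper are beyond this summary, but for readers unfamiliar with exact distance transforms, here is the classic sequential 1D building block (the lower envelope of parabolas, after Felzenszwalb and Huttenlocher) that row/column decompositions of the exact transform rest on; methods like this paper’s reorganize such sweeps so that thousands of GPU threads run them in parallel:

    #include <vector>
    #include <limits>

    // Given per-cell costs f (0 at sites, large elsewhere), compute
    // d[q] = min_p ( (q - p)^2 + f[p] ) in one dimension.
    void dt1d(const std::vector<float>& f, std::vector<float>& d) {
        const int n = (int)f.size();
        const float INF = std::numeric_limits<float>::infinity();
        std::vector<int> v(n);        // sites of parabolas in the lower envelope
        std::vector<float> z(n + 1);  // boundaries between adjacent parabolas
        int k = 0;
        v[0] = 0; z[0] = -INF; z[1] = INF;
        for (int q = 1; q < n; ++q) {
            float s;
            for (;;) {
                // Intersection of the parabola at q with the envelope's last one.
                s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k]))
                    / (2.0f * (q - v[k]));
                if (s > z[k]) break;
                --k;                  // last parabola is fully hidden; discard it
            }
            ++k; v[k] = q; z[k] = s; z[k + 1] = INF;
        }
        k = 0;
        d.resize(n);
        for (int q = 0; q < n; ++q) {
            while (z[k + 1] < q) ++k;
            d[q] = (q - v[k]) * (q - v[k]) + f[v[k]];
        }
    }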

Spatio-Temporal Upsampling on the GPU

The results of this paper are almost like magic, at least to my eyes. Upsampling is about rendering at a lower resolution or frame rate, and somehow interpolating the in-between results, because the original data is unavailable or slow to obtain. Commonly available 120 Hz / 240 Hz TVs now do this in the temporal domain. There is a lot of existing research on leveraging temporal or spatial coherence, but this work uses both at once. It takes advantage of geometric correlation within images, e.g. using normals and depths, to generate the missing information.
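
A hedged sketch of the general flavor of such geometry-aware weights (not the paper’s exact formulation): a coarse sample only contributes to a full-resolution pixel if its depth and normal agree with the pixel’s own, so interpolated shading does not bleed across silhouettes. The falloff constants here are arbitrary tuning knobs; in practice this would live in a pixel shader:

    #include <cmath>

    struct Vec3 { float x, y, z; };

    inline float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    // Weight of one low-resolution sample for a given high-resolution pixel.
    inline float geometryWeight(float pixelDepth, Vec3 pixelNormal,
                                float sampleDepth, Vec3 sampleNormal,
                                float depthScale = 32.0f, float normalPower = 8.0f) {
        float dz      = fabsf(pixelDepth - sampleDepth) * depthScale;
        float wDepth  = 1.0f / (1.0f + dz * dz);     // falls off across depth gaps
        float nDot    = dot3(pixelNormal, sampleNormal);
        float wNormal = powf(nDot > 0.0f ? nDot : 0.0f, normalPower);
        return wDepth * wNormal;  // multiply by a spatial filter weight in practice
    }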

I didn’t follow all of the details, but the results were surprisingly free of artifacts, at least for the scenes demonstrated. This could be useful any place where you might want progressive rendering, such as real-time ray tracing, where rendering at full resolution is very expensive. This technique or some of the ones it references (like this one) could offer much better results than just rendering at a lower resolution and doing simple filtering, as is often done for progressive rendering.

Scattering and Light Propagation

Cascaded Light Propagation Volumes for Real-Time Indirect Illumination

This paper almost certainly had the most “street cred” by virtue of being developed by game developer Crytek. Simply put, this is a lattice-based technique for real-time indirect lighting. The most important features are that it is fully dynamic, scalable, and costs around 5 ms per frame. A very quick overview of how it works: render a reflective shadow map for each light, initialize the grid with this information to define many secondary light sources, then propagate light through the grid in 30 directions (faces) from each cell into the adjacent 6 cells, approximate the results with spherical harmonics, and render.
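
As a sketch of the injection step only (assumptions: 2-band spherical harmonics and the standard cosine-lobe coefficients; sign conventions vary between implementations, and this is not Crytek’s code):

    struct Vec3f { float x, y, z; };
    struct SH4   { float c[4]; };    // 2-band SH: (Y00, Y1-1, Y10, Y11)

    // Clamped cosine lobe around direction n, projected into 2-band SH.
    inline SH4 cosineLobeSH(Vec3f n) {
        SH4 sh;
        sh.c[0] = 0.8862269f;        // sqrt(pi)/2
        sh.c[1] = 1.0233267f * n.y;  // sqrt(pi/3) times the linear basis
        sh.c[2] = 1.0233267f * n.z;
        sh.c[3] = 1.0233267f * n.x;
        return sh;
    }

    // Accumulate one reflective-shadow-map texel into its grid cell as a
    // virtual point light (one SH4 per color channel in a real implementation;
    // a single channel is shown here).
    inline void injectVPL(SH4& cell, Vec3f texelNormal, float texelFlux) {
        SH4 lobe = cosineLobeSH(texelNormal);
        for (int i = 0; i < 4; ++i) cell.c[i] += texelFlux * lobe.c[i];
    }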

To manage performance and storage, this uses cascades (several levels of detail) relative to the viewer, hence the use of the term “cascaded” in the title. The same data and technique can be used to render secondary occlusion, multiple bounces, glossy reflections, participating media using ray marching… just a crazy amount of nice rendering stuff. The use of a lattice has some of its own quality limitations, which they discuss, but nothing too bad for a game. This was a lot to take in, and I did not follow all of the details, but the results were very inspiring. Apparently this will appear in the next version of their game engine, which means consumers will soon come to expect this. Crytek apparently also discussed this at SIGGRAPH last year.

Interactive Volume Caustics in Single-Scattering Media

Caustics is basically “light focusing,” and scattering media is basically “fog / smoke / water,” so this is about rendering them together interactively, e.g. stage lights at a concert with a fog machine, or sunlight under water. It is fully dynamic, and offers surprisingly good quality under a variety of conditions. It is perhaps too slow for games, but would be fine for design software or a hardware renderer which can take a few seconds to render.

Epipolar Sampling for Shadows and Crepuscular Rays in Participating Media with Single Scattering

This paper has a really long title, but what it is trying to do is simple: render rays of light, a.k.a. “god rays.” Normally this is done with ray marching, but that is still too slow for reasonable images, and simple subsampling doesn’t represent the rays well. The authors observed that radiance along the ray “lines” doesn’t change much, except at occlusions, which leads to the very clever idea of the paper: construct the (epipolar) lines in 2D around the light source, and sparsely sample along the lines, adding more samples at depth changes. The sampling data is stored as a 2D texture, one row per line, with the samples in columns. It’s fast, and looks great.
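
The sample-placement idea is simple enough to sketch; this hedged version just merges a sparse regular pattern with extra samples at depth discontinuities along one epipolar line (the step and threshold values are arbitrary, not the paper’s):

    #include <cmath>
    #include <vector>

    // Pick sample positions along one epipolar line: a sparse regular pattern,
    // plus extra samples wherever the depth buffer shows a discontinuity.
    std::vector<int> chooseSamples(const std::vector<float>& depths,
                                   int coarseStep, float depthThreshold) {
        std::vector<int> samples;
        const int n = (int)depths.size();
        for (int i = 0; i < n; ++i) {
            bool coarse = (i % coarseStep == 0) || (i == n - 1);
            bool edge   = i > 0 &&
                          fabsf(depths[i] - depths[i - 1]) > depthThreshold;
            if (coarse || edge) samples.push_back(i);
        }
        return samples;  // ray-march only here; interpolate radiance in between
    }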

NPR and Surface Enhancement

Interactive Painterly Stylization of Images, Videos and 3D Animations

This is another title that directly expresses its goal. Here the “painterly” results are built by a pipeline for stroke generation, with many thousands of strokes per image, which also leverages temporal coherence for animations. It can be used on videos or 3D models, and runs entirely on the GPU. If you are working with NPR, you should definitely look at their site, the demo video, and the referenced papers.

Simple Data-Driven Modeling of Brushes

A lot of drawing programs have 2D brushes, but real 3D brushes can represent and replace a large number of 2D brushes. However, geometrically modeling the brush directly can lead to bending extremes that you (as an artist) usually want to avoid. In this paper from Microsoft Research, the modeling is data-driven, based on measuring how real brushes deform in two key directions. The brush is geometrically modeled with only a few spines having a variable number of segments as bones.

This has some offline precomputation, but most of the implementation is computed at run time. This was one of the few papers with a live demo, using a Wacom tablet, and it was made available for attendees to play with. See an example from an attendee at the Flickr gallery here.

Radiance Scaling for Versatile Surface Enhancement

This is about rendering geometry in such a way that the surface contours are not obscured by shading. This is the problem that techniques like the “Gooch” style try to solve. However, the technique in this paper does it without changing the perceived material, sort of like an advanced sharpening filter for 3D models.

It describes a scaling function based on curvature, reflectance, and some user controls, which is then trivially multiplied with the normally rendered image. The curvature part is from a previous paper by the authors, and reflectance is based on BRDF, where you can enhance BRDF components independently. You should definitely have a quick look at the results here.

Shadows and Transparency

Volumetric Obscurance

This is yet another screen-space ambient occlusion (SSAO) technique. Instead of point sampling, it samples lines (or beams of area) to estimate the volume of sample spheres that are obscured by surrounding geometry. It claims to get smoother results than point sampling, without requiring expensive blurring, and with performance equal to or better than point sampling. You can see some results at the author’s site here. While it has a few interesting ideas, this may or may not be much better than an existing SSAO implementation you may already have. I found the AO technique in one of the posters (see below) more compelling.

Stochastic Transparency

This was selected as the best paper of the conference. Like one of the earlier papers, this tries to deal with order-independent transparency, but it does it very differently. The author described it as “using random numbers to approximate order-independent transparency.” It has a nice overview of existing techniques (sorting, depth peeling, A-buffer). The new technique does away with any kind of sorting, is fast, and requires fixed memory, but is only approximate. It was demonstrated interactively on some very challenging scenes, e.g. thousands of transparent strands of hair and blades of grass.

The idea is to collect rough statistics about pixels, similar to variance shadow maps, using a combination of screen-door transparency, multisampling (MSAA), and random masks per fragment (with D3D 10.1). This can generate a lot of noise, so much of the presentation was devoted to mitigating that, such as using per-primitive random number seeding to look OK in motion. This is also extended to shadow maps for transparent shadows. Since this takes advantage of MSAA and is parallel, quality and performance will increase with normal trends in hardware. It was described as not quite fast enough for games (yet), but (again) it might be fast enough for other applications.
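
The heart of the method can be sketched in a few lines: a fragment of opacity alpha writes to a random subset of the pixel’s MSAA samples, and averaging the samples then estimates the blended result with no sorting at all. This hedged version uses an arbitrary integer hash; the per-primitive seed is what keeps the noise stable in motion:

    #include <cstdint>

    inline uint32_t hashInt(uint32_t x) {        // any decent integer hash works
        x ^= x >> 16; x *= 0x7feb352du;
        x ^= x >> 15; x *= 0x846ca68bu;
        x ^= x >> 16; return x;
    }

    // Build a coverage mask over S MSAA samples (S <= 32): each sample is kept
    // independently with probability alpha, so about alpha*S samples survive.
    uint32_t stochasticMask(float alpha, int S,
                            uint32_t primitiveSeed, uint32_t pixelId) {
        uint32_t rng  = hashInt(primitiveSeed ^ hashInt(pixelId));
        uint32_t mask = 0;
        for (int s = 0; s < S; ++s) {
            rng = hashInt(rng);
            if ((rng & 0xFFFFu) < (uint32_t)(alpha * 65536.0f))
                mask |= 1u << s;
        }
        return mask;  // in a shader this would feed gl_SampleMask / SV_Coverage
    }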

Fourier Opacity Mapping

The goal of this work is to add self-shadowing to smoke effects, but it needs to be simple to integrate, scalable, and execute in just a few milliseconds. The technique is based on opacity shadow mapping (2001), which stores a transmittance function per texel, but has significant visual artifacts. Here a Fourier basis is used to encode the function, and you can adjust the number of coefficients (samples) to determine the quality / performance tradeoff. Using just a few coefficients results in “ringing” of the function, but it turns out that is OK for smoke and hair. The technique was apparently implemented successfully in last year’s Batman: Arkham Asylum.
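
For the curious, here is a hedged sketch of the reconstruction side: the map stores Fourier coefficients of the extinction function along the light direction, and transmittance at a given depth is the exponential of the analytically integrated series. The accumulation rule in the comment follows the same Fourier projection, simplified relative to the paper:

    #include <cmath>

    const int   NUM_COEFFS = 4;          // coefficient pairs; more = less ringing
    const float PI_F       = 3.14159265f;

    // During map rendering, each blocker of opacity alpha_i at normalized
    // depth d_i adds (via additive blending), with sigma_i = -log(1 - alpha_i):
    //   a0   += 2 * sigma_i
    //   a[k] += 2 * sigma_i * cos(2*pi*k * d_i)
    //   b[k] += 2 * sigma_i * sin(2*pi*k * d_i)

    // At shading time, transmittance to normalized depth d in [0,1].
    float transmittance(float a0, const float* a, const float* b, float d) {
        float integral = 0.5f * a0 * d;
        for (int k = 1; k <= NUM_COEFFS; ++k) {
            float w = 2.0f * PI_F * k;
            integral += (a[k - 1] / w) * sinf(w * d)
                      + (b[k - 1] / w) * (1.0f - cosf(w * d));
        }
        return expf(-integral);
    }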

Normals and Textures

Assisted Texture Assignment

This paper is about making it much easier and faster for artists to assign textures to game environments (levels). It is an ambiguous problem, with limited input to make decisions. The solution relies on adjacency and shape similarities, e.g. two surfaces that are parallel are likely to have the same texture. The artist picks a surface, and related surfaces are automatically chosen. After a few textures are assigned, the system produces a list of candidate textures based on previous choices. There is some preprocessing that has to be performed, but once ready, the system seems to work great. Ultimately this is not about textures; rather, this is an advanced selection system.

LEAN Mapping

“LEAN” is a long acronym for what is essentially antialiasing of bump maps. Without proper filtering, minified bump maps produce incorrect specular highlights: the highlights change intensity and shape as the bump map gets small in screen space. The paper implements a technique for filtering bump maps using some additional data on the distribution of bumped normals, which can be filtered like color textures. The math to derive this is not trivial, but the implementation is simple and inexpensive.
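
A hedged sketch of the core of the technique (the paper’s full derivation adds a base roughness term and more careful normalization): the extra textures store the mean and second moments of the bump slopes, both of which filter linearly, and shading recovers a covariance matrix for an anisotropic Beckmann-style highlight:

    #include <cmath>

    struct Vec2 { float x, y; };
    struct Vec3 { float x, y, z; };

    // Baked per texel from the bump normal nb (nb.z > 0), then mipmapped
    // like any color texture:
    //   B = (nb.x/nb.z, nb.y/nb.z)           -- mean slope
    //   M = (B.x*B.x,  B.y*B.y,  B.x*B.y)    -- second moments
    // After filtering, M - B*B gives the covariance of the slope distribution.
    float leanSpecular(Vec2 B, Vec3 M, Vec2 halfSlope /* (h.x/h.z, h.y/h.z) */) {
        float sxx = M.x - B.x * B.x;
        float syy = M.y - B.y * B.y;
        float sxy = M.z - B.x * B.y;
        float det = sxx * syy - sxy * sxy;
        if (det <= 0.0f) return 0.0f;          // degenerate; needs the base term
        float hx = halfSlope.x - B.x, hy = halfSlope.y - B.y;
        // Anisotropic Beckmann-style lobe using the filtered covariance.
        float e = (hx * hx * syy - 2.0f * hx * hy * sxy + hy * hy * sxx) / det;
        return expf(-0.5f * e) / (2.0f * 3.14159265f * sqrtf(det));
    }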

The results look great in motion, at glancing angles, minified, magnified, and with layered maps. It also has the distinctive property of turning grooves into anisotropy under minification, something I have never seen before.

Efficient Irradiance Normal Mapping

There are a few well-known techniques in games for combining light mapping and normal mapping, but they are very rough approximations of the “ground truth” results. This paper introduces an extension based on spherical harmonics, but only over a hemisphere, that significantly improves the quality of irradiance normal mapping. Strangely, no mention was made of performance, so I would have to assume that it runs as fast as the existing techniques, just with different math.

Posters

The posters session was preceded by a brief “fast forward” presentation with each author having a minute to describe their work. There were about 20 posters total, and I have comments on a few of them.

Ambient Occlusion Volumes (link)

This is a geometric solution to the problem of rendering convincing ambient occlusion, compared to the screen-space (SSAO) techniques which are faster, but less accurate. The results are very close to ray-traced results, and while it appears to be too slow for games right now (about 30 ms to render), that will change with faster hardware.

Real Time Ray Tracing of Point-based Models (link)

The title says it all. I didn’t look into this too much, but I wanted to highlight it because it is getting cheaper to get point cloud data, and it would be great to be able to render that data with better materials and lighting.

Asynchronous Rendering

This poster has an awfully generic name, but it is really about splitting rendering work between a server and a low-spec client, like a mobile phone. In this case, the author demonstrated precomputed radiance transfer (PRT) for high-quality global illumination, where the heavy processing was done on the server, while still allowing the client (here it was an iPhone) to render the results and allow for interactive lighting adjustments. For me the idea alone was interesting: instead of just having the server or client do all the work, split it in a way that leverages the strengths of each.

Speakers

A few academic and industry speakers were invited to give 90-minute presentations.

Biomechanical and Artificial Life Simulation of Humans for Computer Animation and Games

The keynote address was given by Demetri Terzopoulos of UCLA. I was not previously familiar with his work, but apparently he has a very long resume of work in computer graphics, including one of the most cited papers ever. The talk was an overview of his research from the last 15 years on modeling human geometry, motion, and behavior. He started with the face, then the neck, and then the entire body, each modeled in extensive detail. His most recent model has 75 bones, 846 muscles, and 354,000 soft tissue elements.

The more recent work is in developing intelligent agents in urban settings, each with a set of social behaviors and goals, though with necessarily simple physical models. The eventual and very long-term goal is to have a full-detail physical model coupled with convincing and fully autonomous behavior.

Interactive Realism: A Call to Arms

The dinner talk was given by Peter Shirley of NVIDIA. This was the “motivational” talk, with his intended goal of having computer graphics that are both pleasing and predictive. Some may think that we have already reached the point of graphics that are “good enough,” but he disagrees. He referenced recent games and research to point out the areas that he feels need the most work. From his slides, these are:

  • Volume lighting / shadowing
  • Indoor-outdoor algorithms
  • Coarse / fine lighting
  • Artist / designer-in-the-loop
  • Motion blur and defocus blur
  • Material models
  • Polarization
  • Tone mapping

He concluded with some action items for the attendees, which included reforming the way computer graphics research is done and lobbying for more funding. From the talk and subsequent Q&A, it looks like a lot of people are not happy with the way SIGGRAPH handles papers, a world I know very little about.

The Evolution of Precomputed Lighting for Games

The capstone address was given by Peter-Pike Sloan of Disney Interactive Studios. He presented essentially a history of precomputed lighting for games, from Quake to Halo 3 and beyond. Such lighting trades flexibility for quality and performance, i.e. you can get very convincing and fast lighting with some important restrictions. This turned out to be a surprisingly large topic, split mostly between techniques for static and dynamic elements, like environments and characters, respectively.

You may wonder why this is relevant beyond a history lesson, given that the trend in research is for techniques not to require precomputation, lighting included. But precomputed lighting is still relevant for low-end hardware, like mobile devices, and for cases where artist control is more important than automated results.

Wrap It Up!

Thanks for making it this far. As you can see, it was a very busy weekend! Like the 2008 conference, this was a great opportunity to see the state-of-the-art in computer graphics and interaction research in a more intimate setting. I hope this was useful, and please reply here if you have any comments.

7 Things for February 10

  • The first three are from Geeks3D, which is a worthwhile site I frequently reference. First: some noise textures, in case you don’t feel like making some yourself.
  • Next, a night-vision filter in GLSL, developed with their GeeXLab tool for prototyping shaders.
  • Finally, PyOpenGL_Lab, which calls OpenGL from Python. Interpreted languages like Python are lovely in that there’s no compilation step, making experimentation much more rapid. If you’re a Perl person, there’s this module.
  • Daniel Rákos has an article about how to perform instance culling using the GPU, using OpenGL 3.2. The basic idea is to run the bounding volumes through the geometry shader for frustum culling and pipe out the results as transform feedback, which is then used in a second pass to determine which instances to actually render. This type of technique has been done using DirectX (e.g., Froblins); Daniel shows how to do it in OpenGL and provides source. A compute-style sketch of the culling test appears after this list.
  • Aras Pranckevičius has a worthwhile post on deferred rendering and mipmap bugs, along with some good follow-up comments.
  • John Ratcliff’s Code Suppository has lots of little handy graphics code tidbits and chunks. It’s moving here and here on Google Code, but the original page is much easier to skim.
  • Wolfgang Engel provides a nice little page of books and resources he recommends for upcoming graphics programmers, with some good follow-up comments. I hadn’t heard of the 3D Math Primer before. It gets high ratings on Amazon, and you can use Look Inside. Skimming it over, it does look like a good book, covering many topics with the space they deserve (vs. our sometimes quick zoom through them in our own book). Code snippets are also given throughout. The book mentions “The First Law of Computer Graphics,” but unfortunately the pages explaining it are blocked. Happily, I found it on Google Books: “If it looks right, it is right”. Whew, good, I honestly was concerned there was some law I had been breaking all these years.
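
Here is the promised sketch of the culling pass from Daniel’s article, recast in CUDA compute terms rather than his geometry-shader / transform-feedback pipeline (so this is an analogue, not his implementation): each thread tests one instance’s bounding sphere against the six frustum planes and stream-compacts the surviving indices for the second (rendering) pass:

    #include <cuda_runtime.h>

    struct Vec4f { float x, y, z, w; };

    // spheres: xyz = bounding-sphere center, w = radius (one per instance).
    // planes:  six frustum planes, xyz = inward normal, w = offset.
    __global__ void cullInstances(const Vec4f* spheres, const Vec4f* planes,
                                  int numInstances,
                                  int* visibleIds, int* visibleCount) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numInstances) return;
        Vec4f s = spheres[i];
        for (int p = 0; p < 6; ++p) {
            float dist = planes[p].x * s.x + planes[p].y * s.y
                       + planes[p].z * s.z + planes[p].w;
            if (dist < -s.w) return;           // sphere entirely outside: culled
        }
        int slot = atomicAdd(visibleCount, 1); // compact the surviving indices
        visibleIds[slot] = i;
    }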

… and I’m all caught up, my queue is empty! Well, there will be a special post tomorrow.

7 Things for February 9

Some news, and some olds.

  • HPG has a CFP. In slow motion, this means the High Performance Graphics conference, June 25-27 in Saarbrucken, Germany, has a call for participation. Naty talked about this conference in his post two months ago; now the HPG website and CFP are up. In case you don’t recognize the conference’s name, this is the combination of the Graphics Hardware and Interactive Ray Tracing symposia. HPG was fantastic last year, with more useful (to me) papers than SIGGRAPH (where it was co-located). Potential submitters please note: because HPG 2010 is co-located with EGSR this year, the deadlines after SIGGRAPH notification are very tight and quite rigid. In other words, if your SIGGRAPH submission is rejected, you will have a very short time to revise and submit to HPG (i.e., by April 2nd).
  • NVIDIA has put up a list of talks at GDC in which it is participating, which will undoubtedly appear soon after on the web. In other NVIDIA news, there’s an interesting press release about NVIDIA and Avatar and how GPUs were used in precomputation of occlusion using ray tracing, for scenes with billions of polygons.
  • A handy tool for showing frame rate and capturing screenshots and video that is worth a mention again (it’s buried on the Resources page): FRAPS. It’s been around forever, continues to improve, and the basic version is free.
  • Crytek made an updated version of the famous Sponza model (used in many global illumination papers) available in OBJ and 3DS Max formats, along with textures. If you have the time, in theory 99 lines of code will make a picture for you.
  • Stefan Gustavson has a nice little demo of using distance fields for “perfect” text rendering. This type of technique has been used for a number of years in various games, such as Valve’s Team Fortress 2. The demo unfortunately falls apart when you rotate the scene off-axis, but otherwise is lovely.
  • SUBSTANCE is an application for making 3D evolutionary art. I really need more time on my hands to check this sort of tool out…
  • Theory for the day: we don’t have fur because our skin can show our emotions, which we pick up with our improved color perception.

7 things for January 22

There’s been some great stuff lately:

  • Gustavo Oliveira has an article in Gamasutra about writing an efficient cross-platform SIMD vector library and the tradeoffs involved. The last page was of particular interest, as I had wondered how effective the Intel C++ Compiler (ICC) was vs. Microsoft’s. He also provides downloadable source code and in-depth statistics.
  • NVIDIA has given some information about Fermi, their next GPU. Warning: their page will automatically start some audio – annoying. You could just skip to the white paper. One big deal about Fermi is its support of doubles, which means it can be used for more science & engineering number-crunching. The Tech Report has a good overview article of other interesting features, and also presents benchmarking results.
  • Tests of OpenCL, the platform-independent parallel programming standard, have started to appear for AMD and NVIDIA GPUs.
  • Speaking of NVIDIA, their PhysX engine is getting some attention. The first video clip in this article gives a sense of the sorts of effects it can add. Pretty stuff, but the funny thing about PhysX is that it must accelerate computations that do not actually affect gameplay (i.e. it should not move around any objects in the scene differently than non-PhysX machines). This limits its use to particle systems and other eye candy. Not a diss—heck, most game graphics are about eye candy—but something to keep in mind.
  • Naty pointed out an article about how increasing the number of megapixels in a camera is just salesmanship and gains no actual benefit. The author later gives more explanation of his argument, which is that diffraction puts a physical limit on the useful size of a pixel for a given camera size.
  • Sony Pictures Imageworks has released a draft describing their Open Shading Language (OSL). While aimed at high-end rendering for films, it’s interesting to see what is built-in (e.g. deferred ray tracing) and what they consider important. Read the introduction for more information, or the draft itself.
  • My favorite infographic of the week: Avatar vs. Modern Warfare 2. Ignore the weird chartjunk concentric circles, focus on the numbers. The most amazing stat to me is the $200M advertising budget for MW2.

… and that’s seven; more later.

NVIDIA Optix ray-tracing API available – kind of

We’ve written about the NVIDIA Optix ray-tracing API (which used to be called NVIRT) once or twice before.  Well, today it is finally available – for free.  While it’s very nice of NVIDIA to make this available, there are a few caveats.

We already knew Optix would only work on NVIDIA hardware (duh), but the system requirements reveal another unwelcome fact; it does not even run on GeForce cards, only Tesla and Quadro (which are significantly more expensive than GeForce despite being based on exactly the same chips).  They say GeForce will be supported on their new Fermi architecture – I call shenanigans.

NVIDIA Jumps on the Cloud Rendering Bandwagon

In January, AMD and OToy announced Fusion Render Cloud, a centralized rendering server system which would perform rendering tasks for film and even games, compressing the resulting video and sending it over the internet.  In March, OnLive announced a similar system, but for the entire game, not just rendering.  Now NVIDIA has announced another cloud rendering system, called RealityServer, running on racks of Tesla GPUs (presumably using Fermi in future iterations).  This utilizes the iray ray tracing system developed by mental images, who also make mental ray (mental images has been owned by NVIDIA since 2007).

The compression is going to be key, since it has to be incredibly fast, extremely low bit rate and very high quality for this to work well.  I’m a bit skeptical of cloud rendering at the moment but maybe all these companies (and investors) know something I don’t…

NVIDIA Announces Fermi Architecture

Today at the GPU Technology Conference (the successor to last year’s NVISION), NVIDIA announced Fermi, their new GPU architecture (exactly one week after AMD shipped the first GPU from their new Radeon HD 5800 architecture).  NVIDIA have published a Fermi white paper, and writeups are popping up on the web.  Of these, the ones from Real World Technologies and AnandTech seem most informative.

With this announcement, NVIDIA is focusing firmly on the GPGPU market, rather than on graphics.  No details of the graphics-specific parts of the chip (such as triangle rasterizers and texture units) were even mentioned.  The chip looks like it will be significantly more expensive to manufacture than AMD’s chip, and at least some of that extra die area has been devoted to things which will not benefit most graphics applications (such as improved double-precision floating-point support and more general programming models).  With full support for indirect branches, a unified address space, and fine-grained exception handling, Fermi is as general purpose as it gets.  NVIDIA is even adding C++ support to CUDA (the first iterations of OpenCL and DirectCompute will likely not enable the most general programming models).

Compared to their previous architecture, NVIDIA has shuffled around the allocation of ALUs, thread scheduling units, and other resources. To make sense of the soup of marketing terms such as “warps”, “cores”, and “SMs”, I again recommend Kayvon Fatahalian’s SIGGRAPH 2009 presentation on GPU architecture.

7 Things for July 20th

While at SIGGRAPH I like to look at new books at the booths. One you may wish to check out is Graphics Shaders: Theory and Practice, from AK Peters (or just use “Look Inside” on Amazon). I received a review copy and skimmed through it. If you’re interested in programming in GLSL 1.2 (part of OpenGL 2.1), consider looking at this one. A minor problem is that it’s not quite as up-to-date as the Orange Book (now on OpenGL 3.1), but the difference in core concepts between language versions is not large. The Graphics Shaders book is full color and comes with a lot of GLSL code examples. It has a bias towards scientific visualization, though not so much that it neglects the basics. I particularly enjoyed the chapter on noise, as it gave one of the clearest explanations I’ve seen on the differences between various types of basic interactive noise functions. One or two elements in the book are a little weak – the flowcharts for pipelines are often too small and difficult to read, for example – but all in all this looks like a solid contribution to the field. Don’t expect more elaborate effects, e.g., shadows are not touched upon. It does cover the basics, plus some additional topics like image post-processing (not normally covered in texts I’ve seen). One of the authors wrote a nice learning tool for GLSL, glman, free for download. If you find you like this tool, definitely consider the book.

Another book I noticed recently is Fluid Simulation for Computer Graphics. This is a topic I know little about; I was just interested to see that there’s a book at all. It looks pretty equation-filled, so is definitely for the serious practitioner.

Speaking of fluid simulation, Intel has an article on this topic for games. One of the chief strengths of any publication is that its staff makes a decision based on merit as to what is published and what is culled. So, I have to admit to being leery of anything that says, “Sponsored Feature”, as that means editorial review and decision-making are gone. I tend to err on the side of ignoring such articles (there’s plenty to read already). That said, Intel’s had quite a number of these articles recently, including such topics as instancing, ocean fog, FFTs for image processing, and quite a few on parallelism.

In the “clearing the queue” category of links, I don’t think I ever pointed out this handy page, which presents all AMD/ATI and NVIDIA presentations at GDC 2009.

There’s now a (not very active, but at least it exists) Microsoft DirectX blog.

On the OpenGL front, NVIDIA has introduced bindless graphics to help avoid L2 cache misses. I will be interested to see how APIs evolve, as the elements in the current APIs that are bottlenecks are not so much CPU or GPU limitations as due to the API constructs themselves.

Thing for the day: an advertisement with interesting stippling.