
2011 Color and Imaging Conference, Part II: Courses A

CIC traditionally includes a strong course program, with a two-day course on fundamentals (a DVD of this course presented by Dr. Hunt can be purchased online) and a series of short courses on more specialized topics. Since I attended the fundamentals course last year, this year I only went to short courses. This blog post will detail three of these courses, with the others covered by a future post.

Color Pipelines for Computer Animated Features

The first part of the course was presented by Rod Bogart. Rod is the lead color science expert at Pixar, and worked on color-related issues at ILM before that.

The animated feature pipeline has many steps, some of which are color-critical and some of which aren’t: Story, Art, Layout, Animation, Shading, Lighting, Mastering, and Exhibition. The color-critical stages are Art, Shading, Lighting, Mastering, and Exhibition; the people working on them are the ones with color-critical monitors on their desks. Rod’s talk went through the color-critical stages of the pipeline, discussing related topics on the way.

Art

In this stage people look at reference photos, establish color palettes, and do look development. Accurate color is important. Often, general studies are done on how exteriors, characters, etc. might look. This is mostly done in Photoshop on a Mac.

Art is the first stage where people make color-critical images. In general, all images made in animated feature production exist for one of two reasons – for looking at directly, or to be used for making more images (e.g., textures). The requirements for image processing will vary depending on which group they belong to. During the Art stage the images generated are intended for viewing.

Images for viewing can be quantized as low as 8 bits per channel, and even (carefully) compressed. Pixel values tend to be encoded to the display device (output referred). In the absence of a color management system, the encoding just maps to frame buffer values, which feed into a display response curve. However, it is better to tag the image with an assumed display device (ICC tagging to a target like sRGB; other metadata attributes can be stored with the image as well). It’s important to minimize color operations done on such images, since they have already been quantized and have no latitude for processing. These images contain low dynamic range (LDR) data.

During the Art phase, images are typically displayed on RGB additive displays calibrated to specific reference targets. Display reference targets include specifications for properties such as the chromaticity coordinates of the RGB primaries and white point, the display response curve, the display peak white luminance and the contrast ratio or black level.

Shading

Shading and antialiasing operations need to occur on linear light values – values that are proportional to physical light intensity. Other operations that require linear values include resizing, alpha compositing, and filtering. Rendered buffers are written out as HDR values and later used to generate the final image.
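
To see why linear values matter, consider averaging a black pixel and a white pixel, as a resize filter would. This is a minimal sketch that assumes a pure 2.2 power-law display encoding (chosen only for illustration; the talk did not name a curve at this point), and the function names are made up:

```python
GAMMA = 2.2  # assumed display encoding exponent, for illustration only

def decode(v):
    """Display-encoded value in [0, 1] -> linear light."""
    return v ** GAMMA

def encode(l):
    """Linear light in [0, 1] -> display-encoded value."""
    return l ** (1.0 / GAMMA)

def average_encoded(a, b):
    """Naive average performed directly on the encoded values."""
    return (a + b) / 2.0

def average_linear(a, b):
    """Average performed on linear light, then re-encoded for display."""
    return encode((decode(a) + decode(b)) / 2.0)

black, white = 0.0, 1.0
naive = average_encoded(black, white)   # 0.5 encoded
correct = average_linear(black, white)  # about 0.73 encoded
```

The naive result is displayed at well under a quarter of the white level rather than half of it, which is why filtered edges and resized images come out too dark when the math is done on encoded values.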

Lighting

Lighting is sometimes done with special light preview software, and sometimes using other methods such as “light soloing”. “Light soloing” is a common practice where a buffer is written out for the contribution of each light in the scene (all other lights are set to black) and then the lighters can use compositing software to vary individual light colors and intensities and combine the results.
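
Light soloing works because light transport is linear: the full image is the sum of the per-light images, so each buffer can be rescaled and tinted after rendering. Here is a minimal sketch of the compositing step, assuming each buffer is a flat list of linear (r, g, b) pixels; the function name and data layout are hypothetical and not taken from any particular compositing package:

```python
def combine_solo_lights(solo_buffers, gains):
    """Sum per-light buffers, each scaled by an RGB gain.

    solo_buffers: list of images, each a list of linear (r, g, b) pixels
    gains: list of (r, g, b) multipliers, one per light
    """
    num_pixels = len(solo_buffers[0])
    result = [(0.0, 0.0, 0.0)] * num_pixels
    for buffer, gain in zip(solo_buffers, gains):
        result = [
            tuple(acc + value * g for acc, value, g in zip(pixel_acc, pixel, gain))
            for pixel_acc, pixel in zip(result, buffer)
        ]
    return result

# Two tiny two-pixel "solo" renders: halve the key light, double the fill light.
key = [(1.0, 0.8, 0.6), (0.2, 0.2, 0.2)]
fill = [(0.1, 0.1, 0.3), (0.4, 0.4, 0.5)]
relit = combine_solo_lights([key, fill], [(0.5, 0.5, 0.5), (2.0, 2.0, 2.0)])
```

This only gives correct results when the buffers hold linear light values, which is one reason these intermediate images are stored as linear HDR data.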

For images such as these “solo light buffers” which are used to assemble viewable images, Pixar uses the OpenEXR format. This format stores linear scene values with a logarithmic distribution of numbers – each channel is a 16-bit half-float. The range of finite values is -65504.0 to +65504.0. The positive range can be thought of as roughly 32 stops (powers of 2) of data, with 1024 steps in each of the stops.
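
The half-float behaviour is easy to poke at from Python, whose standard `struct` module supports the IEEE 754 16-bit format through the `e` format code. The sketch below (helper names are made up) shows that the largest finite half value is 65504 and that the step size doubles with each stop, leaving 1024 steps per stop:

```python
import struct

def to_half(value):
    """Round-trip a Python float through IEEE 754 binary16."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Largest finite half: exponent field 30, all ten mantissa bits set.
HALF_MAX = struct.unpack("<e", struct.pack("<H", 0x7BFF))[0]

# Between 1.0 and 2.0 the step size is 1/1024 ...
one_step_above_one = to_half(1.0 + 1.0 / 1024.0)
rounded_back_to_one = to_half(1.0 + 1.0 / 4096.0)

# ... while between 1024 and 2048 the step size has grown to 1.0.
snapped = to_half(1024.4)
```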

After images are generated, they need to be viewed. This is done in various review spaces: monitors (CRT or calibrated LCD) on people’s desks, as well as various special rooms (review rooms, screening rooms, grading suites) where images are typically shown on DLP projectors. In review rooms the projector is usually hooked up directly to a workstation, while screening rooms use special digital cinema playback systems or “dailies” software. Pixar try not to have any monitors in the screening rooms – screening rooms are dark and the monitors are intended (and calibrated) for brighter rooms.

Mastering

The mastering process includes in-house color grading. This covers two kinds of operations: shot-to-shot corrections and per-master operations. An example of a shot-to-shot correction: in “Cars” in one of the shots the grass ended up being a slightly different color than in other shots in the sequence – instead of re-rendering the shot, it was graded to make the grass look more similar to the other shots. In contrast, per-master operations are done to make the film fit a specific presentation format.

Mastering for film: film has a different gamut than digital cinema projection. Neither is strictly larger – each has colors the other can’t handle. Digital is good for bright, saturated colors, especially primary colors – red, green, and blue. Film is good for dark, saturated colors, especially secondary colors – cyan, magenta, and yellow. Pixar doesn’t generate any film gamut colors that are outside the digital projection gamut, so they just need to worry about the opposite case – mapping colors from outside the film gamut so they fit inside it, and previewing the results during grading. Mapping into the film gamut is complex. Pixar try to move colors that are already in-gamut as little as possible (the ones near the gamut border do need to move a little to “make room” for the remapped colors). For the out-of-gamut colors, first Pixar tried a simple approach – moving to the closest point in the gamut boundary. However, this method doesn’t preserve hue. An example of the resulting problems: in the “Cars” night scene where Lightning McQueen and Mater go tractor-tipping, the closest-point gamut mapping made Lightning McQueen’s eyes go from blue (due to the night-time lighting) to pink, which was unacceptable. Pixar figured out a proprietary method which involves moving along color axes. This sometimes changes the chroma or lightness quite a bit, but tends to preserve hue and is more predictable for the colorist to tweak if needed. For film mastering Pixar project the content in the P3 color space (originally designed for digital projection), but with a warmer white point more typical of analog film projection.

Mastering for digital cinema: color grading for digital cinema is done in a tweaked version of the P3 color space – instead of using the standard P3 white point (which is quite greenish) they use D65, which is the white point people have been using on their monitors while creating the content. Finally a Digital Cinema Distribution Master (DCDM) is created – this stores colors in XYZ space, encoded at 12 bits per channel with a gamma of 2.6.
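
As a rough illustration of that last step, here is a sketch of a 12-bit, gamma 2.6 encoding of one XYZ component. The 52.37 cd/m² normalizing constant and the 48 cd/m² reference white come from the commonly quoted form of the DCDM encoding equation and should be treated as assumptions here, since the talk did not go into them; the function name is made up.

```python
def dcdm_encode(component_cd_m2, peak=52.37, gamma=2.6, bits=12):
    """Encode one absolute XYZ component (in cd/m^2) as an integer code value.

    peak: assumed normalizing constant from the DCDM encoding equation
    gamma: encoding exponent quoted in the talk
    """
    max_code = (1 << bits) - 1                      # 4095 for 12 bits
    normalized = min(max(component_cd_m2 / peak, 0.0), 1.0)
    return int(max_code * normalized ** (1.0 / gamma))

white_y_code = dcdm_encode(48.0)  # luminance of an assumed 48 cd/m^2 reference white
```

Because the normalizing constant is a little above reference white, white lands somewhat below the top code value, leaving a small amount of headroom.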

Mastering for HD (Blu-ray and HDTV broadcast): color grading for HD is done in the standard Rec.709 color space. The Rec.709 green and red primaries are much less saturated than the P3 ones; the blue primary has similar saturation to the P3 blue but is darker. The HD master is stored in RGB, quantized to 10 bits. Rod talked about the method Pixar use for dithering during quantization – it’s an interesting method that might be relevant for games as well. The naïve approach would be to round to the closest quantized value. This is the same as adding 0.5 and rounding down (truncating). Instead of adding 0.5, Pixar add a random number distributed uniformly between 0 and 1. This gives the same result on average, but dithers away a lot of the banding that would otherwise result.
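
The dithering trick fits in a few lines. This is a sketch of the idea as described, not Pixar’s actual code: the function names, the clamp, and the seeded random generator are my own choices for illustration.

```python
import math
import random

def quantize_round(value, levels):
    """Round to nearest: add 0.5, then truncate."""
    return math.floor(value * levels + 0.5)

def quantize_dither(value, levels, rng):
    """Add uniform noise in [0, 1) instead of 0.5, then truncate."""
    noisy = math.floor(value * levels + rng.random())
    return min(noisy, levels)  # guard against floating-point rounding at the top

rng = random.Random(1)   # seeded so the example is repeatable
levels = 1023            # 10-bit code values run from 0 to 1023
value = 0.50025          # lands between code values 511 and 512

rounded = quantize_round(value, levels)
samples = [quantize_dither(value, levels, rng) for _ in range(10000)]
average = sum(samples) / len(samples)
```

Plain rounding sends every pixel with this value to the same code, which is what produces visible bands across a smooth gradient. With dithering, neighbouring pixels land on 511 or 512 in proportion to how close the true value is to each, so the local average stays close to the unquantized value.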

Exhibition

Exhibition for digital cinema: this uses a Digital Cinema Package (DCP) in which each frame is compressed using JPEG2000. The compression is capped to 250 megabits per second – this limit was set during the early days of digital cinema, and any “extra features” such as stereo 3D, 4K resolution, etc. still have to fit under the same cap.

Exhibition for HD (Blu-ray, HDTV broadcast): the 10-bit RGB master is converted to YCbCr, chroma subsampled (4:2:2) and further quantized to 8 bits. This is all done with careful dithering, just like the initial 10-bit quantization. MPEG-4 AVC compression is used for Blu-ray, with a 28-30 megabits per second average bit rate, 34 megabits per second peak.
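
For reference, the RGB to YCbCr step looks roughly like this. The sketch uses the Rec.709 luma coefficients and the usual 8-bit video-range scaling (16 to 235 for luma, 16 to 240 for chroma); it leaves out the chroma subsampling and the dithering mentioned above, uses plain rounding instead, and the function name is made up.

```python
KR, KG, KB = 0.2126, 0.7152, 0.0722  # Rec.709 luma coefficients

def rgb_to_ycbcr_8bit(r, g, b):
    """Gamma-encoded R'G'B' in [0, 1] -> 8-bit video-range Y'CbCr."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))   # scaled to the range [-0.5, 0.5]
    cr = (r - y) / (2.0 * (1.0 - KR))   # scaled to the range [-0.5, 0.5]
    return (round(16 + 219 * y), round(128 + 224 * cb), round(128 + 224 * cr))

white = rgb_to_ycbcr_8bit(1.0, 1.0, 1.0)
black = rgb_to_ycbcr_8bit(0.0, 0.0, 0.0)
red = rgb_to_ycbcr_8bit(1.0, 0.0, 0.0)
```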

Disney’s Digital Color Workflow – Featuring “Tangled”

The second part of the course was presented by Stefan Luka, a senior color science engineer at Walt Disney Animation Studios. Disney uses various display technologies, including CRT, LCD and DLP projectors. Each display has a gamut that defines the range of colors it can show. Disney previously used CRT displays, which have excellent color reproduction but are unstable over time and have a limited gamut. They now consider LCD color reproduction to finally be good enough to replace CRTs (several in the audience disputed this), and primarily use HP Dreamcolor LCD monitors. These are very stable, can support wide gamuts (due to their RGB LED backlights), and include programmable color processing.

Disney considered using Rec.709 calibration for the working displays, but the artists really wanted P3-calibrated displays, mostly to see better reds. Rec.709’s red primary is a bit orangish, while P3’s red primary is very pure; it’s essentially on the spectral locus. Disney calibrate the displays with P3 primaries, a D65 white point, and a 2.2 gamma (which Stefan says matches the CRTs used at that time). The viewing environment in the artists’ rooms is not fully controlled, but the lighting is typically dim.

Disney calibrate their displays by mounting them in a box lined with black felt in front of a spectroradiometer. They measure the primaries and ramps on each channel to build lookup tables. For software Disney use a custom-tweaked version of a tool from HP called “Ookala” (the original is available on SourceForge). When calibrating they make sure to let the monitor warm up first, since LEDs are temperature dependent. The HP DreamColor has a temperature sensor which can be queried electronically, so this is easy to verify before starting calibration. Disney uses a spectroradiometer for calibration – Stefan said that colorimeters are generally not good enough to calibrate a display like this, though perhaps the latest one from X-Rite (the i1Display Pro) could work. Only people doing color-critical work have DreamColor monitors – Disney couldn’t afford to give them to everyone. People with non-color-critical jobs use cheaper displays.

During “Tangled” production, the texture artists painted display encoded RGB, saved as 16-bit (per channel) TIFF or PSD. They used sRGB encoding (managed via ICC or external metadata/LUT) since it makes the bottom bits go through better than a pure power curve. Textures were converted to linear RGB for rendering. Rendering occurred in linear light space; the resulting images had a soft roll-off applied to the highlights and were written to 16-bit TIFF (if they were saving to OpenEXR – which they plan to do for future movies – they wouldn’t have needed to roll-off the highlights). Compositing inputs and final images were all 16-bit TIFFs.
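
The point about the bottom bits comes from the short linear segment at the dark end of the sRGB curve. Below is a sketch of the standard sRGB transfer functions, compared against a pure 2.2 power curve at the lowest 8-bit code value (the function names are mine):

```python
def srgb_decode(v):
    """sRGB-encoded value in [0, 1] -> linear light."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

def srgb_encode(l):
    """Linear light in [0, 1] -> sRGB-encoded value."""
    if l <= 0.0031308:
        return 12.92 * l
    return 1.055 * l ** (1.0 / 2.4) - 0.055

lowest_code = 1.0 / 255.0
linear_via_srgb = srgb_decode(lowest_code)   # on the linear segment
linear_via_power = lowest_code ** 2.2        # pure power curve, for comparison
```

Near black the pure power curve maps the lowest codes to linear values dozens of times smaller than sRGB does, so its slope approaches zero and the darkest code values are much harder to carry through a conversion to linear and back.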

During post production final frames are conformed and prepared for grading. The basic grade is done for digital cinema, with trim passes for film, stereoscopic, and HD.

The digital cinema grade is done in a reference room with a DLP projector using P3 primaries, D65 white point, 2.2 gamma, and 14 foot-Lamberts reference white. The colorist uses “video” style RGB grading controls, and the result is encoded in 12-bit XYZ space with 2.6 gamma, dithered, and compressed using JPEG2000.

For the film deliverable, Disney adjust the projector white point and view the content through the same film gamut mapping that Pixar uses. They then do a trim pass. White point compensation is also needed; the content was previously viewed at D65 but needs to be adjusted for the native D55 film white point to avoid excessive brightness loss. A careful process needs to be done to bridge the gap between the two white points. At the output, film gamut mapping as well as an inverse film LUT is applied to go from the projector-previewed colors to values suitable for writing to film negative. Finally, Disney review the content at the film lab and call printer lights.

Stereo digital cinema – luminance is reduced to 4.5 foot-Lamberts (in the field there will be a range of stereo luminances; Disney assume that 4.5 is a reasonable target). They do a trim pass, boosting brightness, contrast, and saturation to compensate for the greatly reduced luminance. The colorist works with one stereo eye at a time (working with stereo glasses constantly would cause horrible headaches). Afterwards the result is reviewed with glasses, then output and encoded in the same way as the mono digital cinema deliverable.

HD mastering – Disney also use a DLP projector for HD, but view it through a Rec.709 color-space conversion and with reference white set to 100 nits. They do a trim pass (mostly global adjustments needed due to the increase in luminance), output and bake the values into Rec.709 color space. Then Disney compress and review final deliverables on an HD monitor in a correctly set up room with proper backlight etc.
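
The three luminance targets above are quoted in mixed units, so a quick conversion helps when comparing them. One foot-Lambert is 1/π candela per square foot, and a nit is one candela per square meter; the helper names below are made up.

```python
import math

SQUARE_METERS_PER_SQUARE_FOOT = 0.3048 ** 2

def foot_lamberts_to_nits(fl):
    """Convert foot-Lamberts to cd/m^2 (1 fL = 1/pi candela per square foot)."""
    return fl / (math.pi * SQUARE_METERS_PER_SQUARE_FOOT)

def nits_to_foot_lamberts(nits):
    """Convert cd/m^2 to foot-Lamberts."""
    return nits * math.pi * SQUARE_METERS_PER_SQUARE_FOOT

mono_cinema = foot_lamberts_to_nits(14.0)    # about 48 nits
stereo_cinema = foot_lamberts_to_nits(4.5)   # about 15 nits
hd_reference = nits_to_foot_lamberts(100.0)  # about 29 fL
```

So the HD reference white is roughly twice as bright as the mono digital cinema one, and more than six times as bright as the stereo target.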

After finishing “Tangled”, Disney wanted to determine whether it was really necessary for production to work in P3; could they instead work in Rec.709 and have the colorist tweak the digital cinema master to the wider P3 gamut? Stefan said that this question depends on the distribution of colors in a given movie, which in turn depends a lot on the art direction. Colors can go out of gamut due to saturation, or due to brightness, or both. Stefan analyzed the pixels that went out of Rec.709 gamut throughout “Tangled”. Most of the out-of-gamut colors were due to brightness – most importantly flesh tones. A few other colors went out of gamut due to saturation: skies, forests, dark burgundy velvet clothing on some of the characters, etc.

Stefan showed four example frames on a DreamColor monitor, comparing images in full P3 with the same images gamut-mapped to Rec.709. Two of the four barely changed. Of the remaining two, one was a forest scene with a cyan fog in the background which shifted to green when gamut-mapped. Another shot, with glowing hair, had colors out of Rec.709 gamut due to both saturation & brightness.

At the end of the day, the artists weren’t doing anything in P3 that couldn’t have been produced at the grading stage, so Stefan doesn’t think doing production in P3 had much of a benefit. P3 was mostly used to boost brightness, so working in 709 space with additional headroom (e.g. OpenEXR) would be good enough.

After “Tangled”, Disney moved from 16-bit TIFFs to OpenEXR, helped by their recent adoption of Nuke (which has fast floating-point compositing – “Tangled” was composited on Shake). They also eliminated the sRGB encoding curve, and now just use a 2.2 gamma without any LUTs. Disney no longer need to do a soft roll off of highlights when rendering since OpenEXR can contain the full highlight detail. They are doing some experiments with HDR tone mapping, especially tweaking the saturation. Disney have also moved to working in Rec.709 instead of P3 for production (for increased compatibility between formats) and are using non-wide-gamut monitors (still HP, but not DreamColor).

In the future, Disney plan to do more color management throughout the pipeline, probably using the open-source OpenColorIO library. They also plan to investigate improvements in gamut mapping, including local contrast preservation (taking account of which colors are placed next to each other spatially, and not collapsing them to the same color when gamut mapping).

Color in High-Dynamic Range Imaging

This course was presented by Greg Ward. Greg is a major figure in the HDR field, having developed various HDR image formats (LogLuv TIFF and JPEG-HDR, as well as the first HDR format, RGBE), the first widely-used HDR rendering system (RADIANCE), and the first commercially available HDR display, as well as various pieces of software relating to HDR (including the Photosphere HDR image builder and browsing program). He’s also done important work on reflectance models, but that’s outside the scope of this course.

HDR Color Space and Representations

Images can be scene-referred (data encodes scene intensities) or output-referred (data encodes display intensities). Since human visual abilities are (pretty much) known, and future display technologies are mostly unknown, scene-referred images are more useful for long-term archival. Output-referred images are useful in the short term, for a specific class of display technology. Human perceptual abilities can be used to guide color space encoding of scene-referred images.

The human visual system is sensitive to luminance values over a range of about 1:10¹⁴, but not in a single image. The human simultaneous range is about 1:10,000. The range of sRGB displays is about 1:100.

The HDR imaging approach is to render or capture floating-point data in a color space that can store the entire perceivable gamut. Post-processing is done in the extended color space, and tone mapping is applied for each specific display. This is the method adopted in the Academy Color Encoding Specification (ACES) used for digital cinema. Manipulation of HDR data is much preferred because then you can adjust exposure and do other types of image manipulation with good results.

HDR imaging isn’t new – black & white film can hold at least 4 orders of magnitude, and the final print has much less. Much of the talent of photographers like Ansel Adams was darkroom technique – “dodging” and “burning” to bring out the dynamic range of the scene on paper. The digital darkroom provides new challenges and opportunities.

Camera RAW is not HDR; the number of bits available is insufficient to encode HDR data. A comparison of several formats which are capable of encoding HDR follows (using various metrics, including error on an “acid test” image covering the entire visible gamut over a 1:10⁸ dynamic range).

  • Radiance RGBE & XYZE: a simple format (three 8-bit mantissas and one 8-bit shared exponent) with open source libraries. Supports lossless (RLE) compression (20% average compression ratio). However, it does not cover the visible gamut, the large dynamic range comes at the expense of accuracy, and the color quantization is not perceptually uniform. RGBE had visible error on the “acid test” image, XYZE performed much better but still had some barely perceptible error.
  • IEEE 96-bit TIFF (IEEE 32-bit float for each channel) is the most accurate representation, but the files are enormous (even with compression – 32-bit IEEE floats don’t compress very well).
  • 16-bit per channel TIFF (RGB48) is supported by Photoshop and the TIFF libraries including libTIFF. 16 bits each of gamma-compressed R G and B; LZW lossless compression is available. However, does not cover the visible gamut, and most applications interpret the maximum as “white”, turning it into a high-precision LDR format rather than an HDR format.
  • SGI 24-bit LogLuv TIFF Codec: implemented in libTIFF. 10-bit log luminance, and a 14-bit lookup into a ‘rasterized human gamut’ in CIE (u’,v’) space. It just covers the visible gamut and range, but the dynamic range doesn’t leave headroom for processing and there is no compression support. Within its dynamic range limitations, it had barely perceptible errors on the “acid test” image (but failed completely outside these limits).
  • SGI 32-bit LogLuv TIFF Codec: also in libTIFF. A sign bit, 16-bit log luminance, and 8 bits each for CIE (u’,v’). Supports lossless (RLE) compression (30% average compression). It had barely perceptible errors on the “acid test” image.
  • ILM OpenEXR Format: 16-bit float per primary (sign bit, 5-bit exponent, 10-bit mantissa). Supports alpha and multichannel images, as well as several lossless compression options (2:1 typical compression – compressed sizes are competitive with other HDR formats). Has a full-featured open-source library as well as massive support by tools and GPU hardware. The only reasonably-sized format (i.e. excluding 96-bit TIFF) which could represent the entire “acid test” image with no visible error. However, it is relatively slow to read and write. Combined with CTL (Color Transformation Language – a similar concept to ICC, but designed for HDR images), OpenEXR is the foundation of the Academy of Motion Picture Arts & Sciences’ IIF (Image Interchange Framework).
  • Dolby’s JPEG-HDR (one of Greg’s projects): backwards-compatible JPEG extension for HDR. A tone-mapped sRGB image is stored for use by naïve (non-HDR-aware) applications; the (monochrome) ratio between the tone-mapped luminance and the original HDR scene luminance is stored in a subband. JPEG-HDR is very compact: about 1/10 the size of the other formats. However, it only supports lossy encoding (so repeated I/O will degrade the image) and has an expensive three-pass writing process. Dolby will soon release an improved version of JPEG-HDR on a trial basis; the current version is supported by a few applications, including Photoshop (through a plugin – not natively) and Photosphere (which will be detailed later in the course).
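
To make the first entry in the list concrete, here is a sketch of a shared-exponent encoding in the style of RGBE, modeled on the commonly published conversion routine. Rounding details vary between implementations, and the function names and the non-negative-input assumption are mine.

```python
import math

def float_to_rgbe(r, g, b):
    """Pack non-negative linear RGB floats into three 8-bit mantissas
    and one shared 8-bit exponent."""
    v = max(r, g, b)
    if v < 1e-32:
        return (0, 0, 0, 0)
    mantissa, exponent = math.frexp(v)  # v = mantissa * 2**exponent, 0.5 <= mantissa < 1
    scale = mantissa * 256.0 / v
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)

def rgbe_to_float(rm, gm, bm, e):
    """Unpack an RGBE quadruple back to linear RGB floats."""
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e - (128 + 8))
    return (rm * f, gm * f, bm * f)

neutral = float_to_rgbe(1.0, 0.5, 0.25)      # survives the round trip exactly
saturated = float_to_rgbe(1000.0, 1.0, 0.001)
restored = rgbe_to_float(*saturated)         # the two small channels are lost
```

Because the exponent is set by the largest channel, the other two channels get progressively fewer useful mantissa bits as the color becomes more saturated, and negative values (needed for colors outside the RGB primaries) cannot be stored at all. This matches the accuracy and gamut limitations listed above.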

HDR Capture and Photosphere

Standard digital cameras capture about 2 orders of magnitude in sRGB space. Using multiple exposures enables building up HDR images, as long as the scene and camera are static. In the future, HDR imaging will be built directly into camera hardware, allowing for HDR capture with some amount of motion.

Multi-exposure merge works by using a spatially-variant weighting function that depends on where the values sit within each exposure. The camera’s response function needs to be recovered as well.
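
A minimal sketch of such a merge for a single pixel follows. It assumes the camera response has already been recovered and removed (so pixel values are linear and normalized to [0, 1]) and uses a simple "hat" weighting that trusts mid-range values most. Both the weighting shape and the function names are illustrative assumptions, not the specific functions used in Photosphere.

```python
def hat_weight(z):
    """Trust mid-range pixel values most; near-black and near-white least."""
    return max(0.0, 1.0 - abs(2.0 * z - 1.0))

def merge_exposures(pixel_values, exposure_times):
    """Estimate relative radiance for one pixel from several exposures.

    pixel_values: linearized values in [0, 1], one per exposure
    exposure_times: matching exposure times in seconds
    Returns None if every exposure is fully clipped or fully black.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for z, t in zip(pixel_values, exposure_times):
        w = hat_weight(z)
        weighted_sum += w * (z / t)   # each exposure's own radiance estimate
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0.0 else None

# A pixel whose true relative radiance is 2.0, shot at three exposure times.
# The longest exposure clips to 1.0 and so receives zero weight.
radiance = merge_exposures([0.5, 0.125, 1.0], [0.25, 0.0625, 1.0])
```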

The Photosphere application (available online) implements the various algorithms discussed in this section. Exposures need to be aligned – Photosphere does this by generating median threshold bitmaps (MTBs) which are constant across exposures (unlike edge maps). MTBs are generated from a grayscale image pyramid built from the original image, and alignments are propagated up the pyramid. Rotational as well as translational alignments are supported. This technique was published by Greg in a 2003 paper in the Journal of Graphics Tools.
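
The reason MTBs are stable across exposures is that thresholding at the median depends only on the ordering of the pixel values, and changing the exposure (short of clipping) preserves that ordering. A small sketch of just the bitmap step, leaving out the pyramid, the shift search, and the noise handling of the full method; the function name and test data are made up:

```python
import statistics

def median_threshold_bitmap(gray):
    """gray: list of rows of grayscale values -> rows of 0/1 bits,
    where a bit is 1 if the pixel is brighter than the image median."""
    median = statistics.median(v for row in gray for v in row)
    return [[1 if v > median else 0 for v in row] for row in gray]

dark = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
# A brighter "exposure" of the same scene through a nonlinear but monotonic response.
bright = [[(v ** 0.5) * 20 for v in row] for row in dark]

mtb_dark = median_threshold_bitmap(dark)
mtb_bright = median_threshold_bitmap(bright)
```

The two bitmaps come out identical even though the pixel values differ, so they can be compared with cheap bitwise operations at each candidate offset.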

Photosphere also automatically removes “ghosts” (caused by objects which moved between exposures) and reconstructs an estimate of the point-spread function (PSF) for glare removal.

Greg then gave a demo of the new Windows version of Photosphere, including its HDR image browsing and cataloging abilities. Its merging capabilities also include the unique option of outputting absolute HDR values for all pixels, if the user inputs an absolute value for a single patch (this would typically be a grey card measured by a separate device). This only needs to be done once per camera.

Image-Based Lighting

The basic workflow: take an HDR (bracketed exposure) image of a mirrored ball and use it for lighting. Use a background plate to fill in the “pinched” region at the back of the ball. Render synthetic objects with the captured lighting and composite them into the real scene, optionally adding shadows. Greg’s description of HDR lighting capture is a bit out of date – most VFX houses no longer use mirrored balls for this (they still use them for reference); instead, panoramic cameras or DSLRs with a nodal mount are typically used.

Tone-Mapping and Display

A renderer is like an ideal camera. Tone mapping is medium-specific and goal-specific. The user needs to consider display gamut, dynamic range, and surround. What do we wish to simulate – cinematic camera and film, or human visual abilities and disabilities? Possible goals include colorimetric reproduction, matching visibility, or optimizing contrast & color sensitivity.

Histogram tone-mapping is a technique that generates a histogram of log luminance for the scene, and creates a curve that redistributes luminance to fit the output range.
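
In its most naive form this is histogram equalization in the log-luminance domain. The sketch below shows only that core idea, with made-up function and parameter names; a practical operator would also limit how steep the curve can get so that contrast is never exaggerated, and that refinement is left out here.

```python
import math

def histogram_tone_map(luminances, bins=100, out_min=0.0, out_max=1.0):
    """Map scene luminances to normalized display values using the
    cumulative histogram of log luminance as the tone curve."""
    logs = [math.log10(max(l, 1e-6)) for l in luminances]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [out_max for _ in luminances]  # flat image: nothing to redistribute

    def bin_index(v):
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    counts = [0] * bins
    for v in logs:
        counts[bin_index(v)] += 1

    # The cumulative distribution is the tone curve: heavily populated
    # luminance ranges receive a larger share of the output range.
    cumulative = []
    running = 0
    for c in counts:
        running += c
        cumulative.append(running / len(logs))

    return [out_min + (out_max - out_min) * cumulative[bin_index(v)] for v in logs]

scene = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]
display = histogram_tone_map(scene)
```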

Greg discussed various other tone mapping methods. He mentioned a SIGGRAPH 2005 paper that used an HDR display to compare many different tone-mapping operators.

HDR Display Technologies

  • Silicon Light Machines Grating Light Valve (GLV) – amazing dynamic range, widest gamut, still in development. Promising for digital cinema.
  • Dolby Professional Reference Monitor PRM-4200. It’s an LED-based 42″ production unit based on technology that Greg worked on. He says this is extended dynamic range, but not true HDR (it goes up to 600 cd/m²).
  • SIM2 Solar Series HDR display: this is also based on the (licensed) Dolby tech. Greg says this is closer to what Dolby originally had in mind. It’s a 47″ display with a 2,206-LED backlight that goes up to 4000 cd/m².

As an interesting example, Greg also discussed an HDR transparency (slide) viewer that he developed back in 1995 to evaluate tone mapping operators. It looks similar to a ViewMaster but uses much brighter lamps (50 Watts for each eye, necessitating a cooling fan and heat-absorbing glass) and two transparency layers – a black-and-white (blurry) “scaling” layer as well as a color (sharp) “detail” layer. Together these layers yield 1:10,000 contrast. The principles used are similar to other dual-modulator displays; the different resolution of the two layers avoids alignment problems. Sharp high-contrast edges work well despite the blurry scaling layer – scattering in the eye masks the artifacts that would otherwise result.

New displays based on RGB LED backlights have the potential to achieve not just high dynamic range but greatly expanded gamut – the new LEDs are spectrally pure and the LCD filters can select between them easily, resulting in very saturated primaries.

HDR Imaging in Cameras, Displays and Human Vision

The course was presented by Prof. Alessandro Rizzi from the Department of Information Science and Communication at the University of Milan. With John McCann, he co-authored the book “The Art and Science of HDR Imaging” on which this course is based.

HDR Issues

The imaging pipeline starts with scene radiances generated from the illumination and objects. These radiances go through a lens, a sensor in the image plane, and sensor image processing to generate a captured image. This image goes through media processing before being shown on a print or display, to generate display radiances. These go through the eye’s lens and intraocular medium, form an image on the retina, which is then processed by the vision system’s image processing to form the final reproduction appearance. Prof. Rizzi went over HDR issues relating to various stages in the pipeline.

The dynamic range issue relates to the scene radiances. Is it useful to define HDR based on a specific threshold number for the captured scene dynamic range? No. Prof. Rizzi defines HDR as “a rendition of a scene with greater dynamic range than the reproduction media”. For prints, this is almost always the case, since print media has an extremely low dynamic range. Renaissance painters were the first to successfully do HDR renditions – example paintings were shown and compared to similar photographs. The paintings were able to capture a much higher dynamic range while still appearing natural.

A table was shown of example light levels, each listed with luminance in cd/m². Note that these values are all for the case of direct observation, e.g. “sun” refers to the brightness of the sun when looking at it directly (not recommended!) as opposed to looking at a surface illuminated by the sun (that is a separate entry).

  • Xenon short arc: 200,000 – 5,000,000,000
  • Sun: 1,600,000,000
  • Metal halide lamp: 10,000,000 – 60,000,000
  • Incandescent lamp: 20,000,000 – 26,000,000
  • Compact fluorescent lamp: 20,000 – 70,000
  • Fluorescent lamp: 5,000 – 30,000
  • Sunlit clouds: 10,000
  • Candle: 7,500
  • Blue sky: 5,000
  • Preferred values for indoor lighting: 50 – 500
  • White paper in sun: 10,000
  • White paper in 500 lux illumination (typical office lighting): 100
  • White paper in 5 lux illumination (very dim lighting, similar to candle-light): 1

The next issue, range limits and quantization, refers to the “captured image” stage of the imaging pipeline. A common misconception is that the problem involves squeezing the entire range of intensities which the human visual system can handle, from starlight at 10⁻⁶ cd/m² to a flashbulb at 10⁸ cd/m², into the 1-100 cd/m² range of a typical display. The fact is that the 10⁻⁶ to 10⁸ cd/m² range is only obtainable with isolated stimuli – humans can’t perceive a range like that in a single image. Another common misconception is to think of precision and range as being linked; e.g. 8-bit framebuffers imply a 1:255 contrast. Prof. Rizzi used a “salami” metaphor – the size of the salami represents the dynamic range, and the number of slices represents the quantization. Range and precision are orthogonal.

In most cases, the scene has a larger dynamic range than the sensor does. So with non-HDR image acquisition you have to give up some dynamic range in the highlights, the shadows, or both. The “HDR idea” is to bracket multiple acquisitions with different exposures to obtain an HDR image, and then “shrink” during tone mapping. But how? Tone mapping can be general, or can take account of a specific rendering intent. Naively “squeezing” all the detail into the final image leads to the kind of unnatural “black velvet painting”-looking “HDR” images commonly found on the web.

As an example, the response of film emulsions to light can be mapped via a density-exposure curve, commonly called a Hurter-Driffield or “H&D” curve. These curves map negative density vs. log exposure. They typically show an s-shape with a straight-line section in the middle where density is proportional to log exposure, with a “toe” on the underexposed part and a “shoulder” on the overexposed part. In photography, exposure time should be adjusted so densities lie on the straight-line portion of the curve. With a single exposure, this is not possible for the entire scene – you can’t get both shadow detail and highlight detail, so in practice only midtones are captured with full detail.

History of HDR Imaging

Before the Chiaroscuro technique was introduced, it was hard to convey brightness in painting. Chiaroscuro (the use of strong contrasts between bright and dark regions) allowed artists to convey the impression of very high scene dynamic ranges despite the very low dynamic range of the actual paintings.

HDR photography dates back to the 1850s; a notable example being the photograph “Fading Away” by H. P. Robinson, which combined five exposures. In the early 20th century, C. E. K. Mees (director of research at Kodak) worked on implementing a desirable tone reproduction curve in film. Mees showed a two-negative photograph in his 1920 book as an example of desirable scene reproduction, and worked to achieve similar results with single-negative prints. Under Mees’ direction, the Kodak Research Laboratory found that an s-shaped curve produced pleasing image reproductions, and implemented it photochemically.

Ansel Adams developed the zone system around 1940 to codify a method for photographers to expose their images in such a way as to take maximum advantage of the negative and print film tone reproduction curves. Soon after, in 1941, L. A. Jones and H. R. Condit published an important study measuring the dynamic range of various real-world scenes. The range was between 27:1 and 750:1, with 160:1 being average. They also found that flare is a more important limit on camera dynamic range than the film response.

The Retinex theory of vision developed around 1967 from the observation that luminance ratios between adjacent patches are the same in the sun and the shade. While absolute luminances don’t always correspond to lightness appearance (due to spatial factors), the ratio of luminances at an edge does correspond strongly to the ratio in lightness appearance. Retinex processing starts with ratios of apparent lightness at all edges in the image and propagates these to find a global solution for the apparent lightness of all the pixels in the image. In the 1980s this research led to a prototype “Retinex camera” which was actually a slide developing device. Full-resolution digital electronics was not feasible, so a low-resolution (64×64) CCD was used to generate a “correction mask” which modulated a low-contrast photographic negative during development. This produced a final rendering of the image which was consistent with visual appearance. The intent was to incorporate this research in a Polaroid instant camera but this product never saw the light of day.

Measuring the Dynamic Range

The sensor’s dynamic range is limited but slowly getting better – Prof. Rizzi briefly went over some recent research into HDR sensor architectures.

Given limited digital sensor dynamic range, multiple exposures are needed to capture an HDR image. This can be done via sequential exposure change, or by using multiple image detectors at once.

There have been various methods developed for composing the exposures. Before Paul Debevec’s 1997 paper “Recovering High Dynamic Range Radiance Maps from Photographs”, the emphasis was on generating pleasing pictures. From 1997 on, research focused primarily on accurately measuring scene radiance values. Combined with recent work on HDR displays, this holds the potential of accurate scene reproduction.
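As a rough illustration of what “composing the exposures” means when the goal is measuring radiance, here is a minimal Python sketch for a single pixel. It assumes a linear sensor response (Debevec’s paper also recovers a non-linear response curve, which is omitted here); the function name, the “hat” weighting, and the sample values are illustrative assumptions, not taken from the course.

```python
def merge_exposures(pixel_values, exposure_times, max_value=255):
    """Estimate relative scene radiance for one pixel from several exposures.

    Assumes a linear sensor, so pixel value ~ radiance * exposure_time.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for value, t in zip(pixel_values, exposure_times):
        # Hat weighting (an assumption): trust mid-range values, distrust
        # values near black (noise) or near white (clipping).
        weight = min(value, max_value - value)
        weighted_sum += weight * (value / t)
        weight_total += weight
    if weight_total == 0:
        raise ValueError("pixel is clipped in every exposure")
    return weighted_sum / weight_total


# One pixel seen in three exposures: clipped at 1/4 s, usable in the others.
radiance = merge_exposures([255, 160, 40], [1 / 4, 1 / 16, 1 / 64])
print(radiance)
```

The clipped 1/4 s sample gets zero weight, and the two remaining exposures both imply the same relative radiance (160 × 16 = 40 × 64), which is the consistency a linear model predicts when glare is absent.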

However, veiling glare is a physical limit on HDR image acquisition and display. At acquisition time, glare is composed of various scattered light in the camera – air-glass reflections at the various lens elements, camera wall reflections, sensor surface reflections, etc. The effect of glare on the lighter regions of the image is small, but darker regions are affected much more strongly, which limits the overall contrast (dynamic range).

Prof. Rizzi described an experiment which measured the degree to which glare limits HDR acquisition, for both digital and film cameras. A test target with almost 19,000:1 dynamic range was assembled out of Kodak Print Scale step-wedges (circles divided into 10 wedges which transmit different amounts of light, ranging from 4% to 82%) and neutral density filters. This target was photographed against different surrounds to vary the amount of glare.

In moderate-glare scenes, glare reduced the dynamic range at the sensor or film image plane to less than 1,000:1; in high-glare scenes, to less than 100:1. This limited the range that could be measured via multiple digital exposures (negative film has more dynamic range – about 10,000:1 – than the camera glare limit, so in the case of film multiple exposures were pointless).
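To see why a small amount of stray light has such a large effect, here is a back-of-the-envelope Python sketch. It assumes glare adds a uniform veil of light to the whole image plane, expressed as a fraction of the peak luminance; real glare varies spatially, and the glare fractions below are illustrative values, not measurements from the experiment.

```python
def range_with_glare(scene_range, glare_fraction):
    """Dynamic range at the image plane after adding uniform veiling glare.

    scene_range: ratio of brightest to darkest scene luminance (e.g. 19000).
    glare_fraction: stray light as a fraction of the peak luminance (assumed
    uniform across the image, which is a simplification).
    """
    l_max = 1.0
    l_min = l_max / scene_range
    glare = glare_fraction * l_max
    # The veil barely changes the highlights but swamps the shadows.
    return (l_max + glare) / (l_min + glare)


for fraction in (0.0, 0.001, 0.01):
    print(fraction, round(range_with_glare(19000, fraction)))
```

Under this simple model, a veil of only 0.1% of peak luminance already pulls a 19,000:1 target below 1,000:1 at the image plane, because the darkest patch is itself only about 0.005% of peak.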

While camera glare limits the amount of scene dynamic range that can be captured, glare in the eye limits the amount of display dynamic range which is useful to have.

Experiments were also done with observers estimating the brightness of the various sectors on the test target. There was a high degree of agreement between the observers. The perceived brightness was strongly affected by spatial factors; the brightness differences between the segments of each circle were perceived to be very large, and the differences between the individual circles were perceived to be very small. Prof. Rizzi claimed that a global tone scale cannot correctly render appearance, since spatial factors predominate.

Spatial factors also required designing a new target, so that glare could be separated from neural contrast effects. For this target, both single-layer and double-layer projected transparencies were used, allowing them to vary the dynamic range from about 500:1 to about 250,000:1 while keeping glare and surround constant.

For low-glare images (average luminance = 8% of maximum luminance), the observers could detect appearance changes over a dynamic range of a little under 1000:1. For high-glare images (average luminance = 50% max luminance), this decreased to about 200:1. Two extreme cases were also tested: with a white surround (extreme glare) the usable dynamic range was about 100:1 and with black surround (almost no glare at all) it increased to 100,000:1. The black surround case (which is not representative of the vast majority of real images) was the only one in which the high-dynamic range image had a significant advantage, and even there the visible difference only affected the shadow region – the bottom 30% of perceived brightnesses. These results indicate that dramatically increasing display dynamic range has minor effects on the perceived image; glare inside the eye limits the effect.

Separating Glare and Contrast

Glare inside the eye reduces the contrast of the image on the retina, but neural contrast increases the contrast of the visual signal going to the brain. These two effects tend to act in opposition (for example, brightening the surround of an image will increase both effects), but they vary differently with distance and do not cancel out exactly.

It is possible to estimate the retinal image based on the CIE Glare Spread Function (GSF). When doing so for the images in the experiment above, the high-glare target (where observers could identify changes over a dynamic range of 200:1) formed an image on the retina with a dynamic range of about 100:1. With white surround (usable dynamic range of 100:1) the retinal image had a dynamic range of about 25:1 and with black surround (usable dynamic range of 100,000:1) the retinal image had a dynamic range of about 3000:1. It seems that neural contrast partially compensates for the intra-ocular glare; both effects are scene dependent.

Scene Content Controls Appearance

The appearance of a pixel cannot be predicted from its intensity values – no global tone mapping operator can mimic human vision. An image dependent, local operator is needed. The human visual system performs local range compression. It is important to choose a rendering intent – reproduce the original scene radiances, scene reflectances, scene appearance, a pleasing image, etc. If the desire is to predict appearance then Retinex processing does a pretty good job in many cases.

Color in HDR

Two different data sets can be used to describe color: CMF (color matching functions – low-level sensor data) or UCS (uniform color space – high-level perceptual information).

CMF are used for color matching and metamerism preservation. They are linear transforms of cone sensitivities modified by pre-retinal absorptions. They have no spatial information, and cannot predict appearance.

UCS – for example CIEL*a*b*. Lightness (L*) is a cube root of luminance, which compresses the visible range. 99% of possible perceived lightness values fall in a 1000:1 region of scene dynamic range. This fits well with visual limitations caused by glare.
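The “99% within 1000:1” figure can be checked directly from the CIE 1976 lightness formula. The Python sketch below uses the standard definition, which is a cube root over most of the range with a short linear segment near black; the function name is my own.

```python
def cie_lightness(y_relative):
    """CIE 1976 L* for luminance relative to the reference white (Y / Yn)."""
    delta = 6 / 29
    if y_relative > delta ** 3:
        return 116 * y_relative ** (1 / 3) - 16
    # Linear segment near black, part of the standard definition.
    return (29 / 3) ** 3 * y_relative


for y in (1.0, 0.18, 0.01, 0.001):
    print(y, round(cie_lightness(y), 2))
```

A luminance 1000 times darker than white (Y/Yn = 0.001) maps to an L* of about 0.9 on the 0 to 100 scale, so everything darker than that shares the bottom 1% of lightness values.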

There are some discrepancies between data from appearance experiments with observers and measurements of retinal cone response.

First discrepancy: the peaks of the color-matching functions do not line up with the peaks of the cone sensitivity functions. This is addressed by including pre-retinal absorptions, which shift peak sensitivities to longer wavelengths.

Second discrepancy: retinal cones have a logarithmic response to light, but observers report a cube-root response. This is addressed by taking account of intra-ocular glare; it turns out that due to glare, a cube-root variation in light entering the eye turns into a logarithmic variation in light at the retina.

HDR Image Processing

Around 2002-2006, Robert Sobol developed a variant of Retinex which was implemented in a (discontinued) line of Hewlett-Packard cameras; the feature was marketed as “Digital Flash”. This produced very good results and could even predict certain features of well-known perceptual illusions such as “Adelson’s Checkerboard and Tower”, which were commonly thought to be evidence of cognitive effects in lightness perception.

ACE (Automatic Color Equalization) (which Prof. Rizzi worked on) and STRESS (Spatio-Temporal Retinex-inspired Envelope with Stochastic Sampling) are other examples of spatially-aware HDR image processing algorithms. Several examples were shown to demonstrate that spatially-aware (local) algorithms produce superior results to global tone mapping operators.

Prof. Rizzi described an experiment made with a “3D Mondrian” model – a physical scene with differently colored blocks, under different illumination conditions. Various HDR processing algorithms were run on captured images of the scene, and compared with observers’ estimations of the colors as well as a painter’s rendition (attempting to reproduce the perceptual appearance as closely as possible). The results were interesting – appearance does not appear to correlate specifically to reflectance vs. illumination, but rather to edges vs. gradients. The results appeared to support the goals of Retinex and similar algorithms.

Prof. Rizzi finished the course with some “take home” points:

  • HDR works well, because it preserves image information, not because it is more accurate (accurate reproduction of scene luminances is not possible in the general case).
  • Dynamic range acquisition is limited by glare, which cannot be removed.
  • Our vision system is also limited by glare, which is counteracted to some degree by neural contrast.
  • Accurate reproduction of scene radiance is not needed; reproduction of appearance is important and possible without reproducing the original stimulus.
  • Appearances are scene-dependent, not pixel-based.
  • Edges and gradients generate HDR appearance and color constancy.
http://www.graphics.cornell.edu/online/formats/rgbe/

2011 Color and Imaging Conference, Part I: Introduction

A few weeks ago, I attended the 2011 Color and Imaging Conference (CIC). CIC is a small conference (a little under 200 attendees) that nevertheless commands an important role in the fields of color science and digital imaging, similar to SIGGRAPH’s importance to computer graphics. CIC is co-sponsored by the Society for Imaging Science and Technology (IS&T) and the Society for Information Display (SID); it has been held annually in various US locations since 1993.

I attended this conference for the first time last year. In both years I attended, most of the conference attendees were academic color science researchers (the field appears to be dominated by a handful of institutions, most notably the color labs at the Rochester Institute of Technology and the University of Leeds), with the remainder primarily representing the R&D divisions of various camera, printer, display, and mobile phone manufacturers. There are typically also a few color experts from film companies such as Technicolor, ILM, Pixar, and Disney. I didn’t see any other game developers – I hope this will change in future years, as our industry starts paying more attention to this critical area.

Despite its modest attendance numbers, CIC boasted an impressive array of sessions, including courses, papers, short papers, and several keynotes. The content was of very high quality. The conference organizers are currently in the process of posting video of most of the conference content for free streaming and download in a variety of formats – a step which organizers of other conferences (such as SIGGRAPH) would do well to emulate.

I’ll be putting up several other posts with details of the conference content. They will be coming in rapid succession since I’m editing them down from an existing document (a report I did for work).

Do you spell these two words correctly?

We all have dumb little blind spots. As a kid, I thought “Achilles” was pronounced “a-chi-elz” and, heaven knows how, “etiquette” was somehow “eh-teak”. When you say goofy things to other people, someone eventually corrects you. However, if most of the people around you are making the same mistake (I’m sorry, “nuclear” is not pronounced “new-cue-lar”, it just ain’t so), the error never gets corrected. I’ve already mentioned the faux pas of pronouncing SIGGRAPH as “see-graph”, which seems to be popular among non-researchers (well, admittedly there’s no “correct” pronunciation on that one, it’s just that when the conference was small and mostly researchers, “sih-graph” was the way to say it. If the majority now say “see-graph”, so be it – you then identify yourself as a general attendee or a sales person and I can feel superior to you for no valid reason, thanks).

Certain spelling errors persist in computer graphics, perhaps because it’s more work to give feedback on writing mistakes. We also see others make the same mistakes and assume they’re correct. So, here are the two I believe are the most popular goofs in computer graphics (and I can attest that I used to make them myself, once upon a time):

Tesselation – that’s incorrect, it’s “tessellation”. By all rules of English, this word truly should have just one “l”: relation, violation, adulation, ululation, emulation, and on and on, they have just one “l”. The only exceptions I could find with two “l”s were “collation”, “illation” (what the heck is that?), and a word starting with “fe” (I don’t want this post to get filtered).

The word “tessellation” is derived from “tessella” (plural “tessellae”), which is a small piece of stone or glass used in a mosaic. It’s the diminutive of “tessera”, which can also mean a small tablet or block used as a ticket or token (but “tessella” is never a small ticket). Whatever. In Ionic Greek “tesseres” means “four”, so “tessella” makes sense as being a small four-sided thing. For me, knowing that “tessella” is from the ancient Greek word for a piece in a mosaic somehow helps me to catch my spelling of it – maybe it will work for you. I know that in typing “tessella” in this post I still first put a single “l” numerous times, that’s what English tells me to do.

Google test: searching on “tessellation” on Google gives 2,580,000 pages. Searching on “tesselation -tessellation”, which gives only pages with the misspelled version, gives 1,800,000 pages. It’s nice to see that the correct spelling still outnumbers the incorrect, but the race is on. That said, this sort of test is accurate to within, say, plus or minus 350%. If you search on “tessellation -tesselation”, which should give a smaller number of pages (subtracting out those that I assume say “‘tesselation’ is a misspelling of ‘tessellation’” or that reference a paper with “tesselation” in the title), you get 8,450,000! How you can get more than 3 times as many pages as just searching on “tessellation” is a mystery. Finally, searching on “tessellation tesselation”, both words on the same page, gives 3,150,000 results. Makes me want to go count those pages by hand. No it doesn’t.

One other place to search is the ACM Digital Library. There are 2,973 entries with “tessellation” in them, 375 with “tesselation”. To search just computer graphics publications, GRAPHBIB is a bit clunky but will do: 89 hits for “tessellation”, 18 hits for the wrong one. Not terrible, but that’s still roughly 17% incorrect.

Frustrum – that’s incorrect, it’s “frustum” (plural “frusta”, which even looks wrong to me – I want to say “frustra”). The word means a (finite) cone or pyramid with the tip chopped off, and we use it (always) to mean the pyramidal volume in graphics. I don’t know why the extra “r” got into this word for some people (myself included). Maybe it’s because the word then sort-of rhymes with itself, the “ru” from the first part mirrored in the second. But “frustra” looks even more correct to me, no idea why. Maybe it’s that it rolls off the tongue better.

Morgan McGuire pointed this one out to me as the most common misspelling he sees. As a professor, he no doubt spends more time teaching about frusta than tessellations. Using the wildly-inaccurate Google test, there are 673,000 frustum pages and 363,000 “frustrum -frustum” pages. And, confusingly, again, 2,100,000 “frustum -frustrum” pages, more than three times as many pages as just “frustum”. Please explain, someone. For the digital library, 1,114 vs. 53. For GRAPHBIB I was happy to see 42 hits vs. just 1 hit (“General Clipping on an Oblique Viewing Frustrum”).

So the frustum misspell looks like one that is less likely at the start and is almost gone by the time practitioners are publishing articles, vs. the tessellation misspell, which appears to have more staying power.

Addenda: Aaron Hertzmann notes that the US and Britain double their letters differently (“calliper”? That’s just unnatural, Brits). He also notes the Oxford English Dictionary says about tessellate: “(US also tesselate)”. Which actually is fine with me, except for the fact that Microsoft Word, Google’s spellchecker, and even this blog’s software flags “tesselate” as a misspelling. If only we had the equivalent of the Académie française to decide how we all should spell (on second thought, no).

Spike Hughes notes: “I think the answer for ‘frustrum’ is that it starts out like ‘frustrate’ (and indeed, seems logically related: the pyramid WANTS to go all the way to the eye point, but is frustrated by the near-plane).” This makes a lot of sense to me, and would explain why “frustra” feels even more correct. Maybe that’s the mnemonic aid, like how with “it’s” vs. “its” there’s “It’s a wise dog that knows its own fleas”. You don’t have to remember the spelling of each “its”, just remember that they differ; then knowing “it’s” is “it is” means you can derive that the possessive “its” doesn’t have an apostrophe. Or something. So maybe, “Don’t get frustrated when drawing a frustum”, remembering that they differ. Andrew Glassner offers: “There’s no rum in a frustum,” because the poor thing has the top chopped off, so all the rum we poured inside has evaporated.

Update on SIGGRAPH 2011 Beyond Programmable Shading Course

I have recently been notified by Aaron Lefohn that there have been some changes to the Beyond Programmable Shading course since I last described it here.

The new schedule is below. I’m especially interested to see the presentation by Raja Koduri (former CTO of AMD’s graphics division and now a graphics architect at Apple) – according to Aaron, it’s “an introduction to reasoning about power for rendering researchers”. Power is a very important constraint which is little-understood by most algorithm researchers and software developers. We are not too far from regularly having to take account of power consumption in graphics algorithm design (since an algorithm which causes the GPU to burn too much power may force clock speed reduction, negatively affecting performance). The topic of the closing panel is also an interesting one – graphics APIs have undergone some interesting changes, and I suspect will undergo more profound ones in the near future.

Beyond Programmable Shading I

9:00 Introduction [Aaron Lefohn, Intel]

9:20 Research in Games [Peter-Pike Sloan, Disney Interactive]

9:45 The “Power” of 3D Rendering [Raja Koduri, Apple]

10:15 Real-Time Rendering Architectures [Mike Houston, AMD]

10:45 Scheduling the Graphics Pipeline [Jonathan Ragan-Kelley, MIT]

11:15 Parallel Programming for Real-Time Graphics [Aaron Lefohn, Intel]

11:45 Software rasterization on GPUs [Samuli Laine and Jacopo Pantaleoni, NVIDIA]

Beyond Programmable Shading II

14:00 Welcome and Re-Introduction [Mike Houston, AMD]

14:05 Toward a blurry rasterizer (state of the art) [Jacob Munkberg, Intel]

14:45 Order-independent transparency (state of the art) [Marco Salvi, Intel]

15:15 Interactive global illumination (state of the art) [Chris Wyman, Univ. of Iowa]

15:45 User-defined pipelines for ray tracing [Steve Parker, NVIDIA]

16:30 Panel: “What Is the Right Cross-Platform Abstraction Level for Real-Time 3D Rendering?”

  • Peter-Pike Sloan, Disney Interactive (Moderator)
  • David Blythe, Intel (Panelist)
  • Raja Koduri, Apple (Panelist)
  • Henry Moreton, NVIDIA (Panelist)
  • Mike Houston, AMD (Panelist)
  • Chas Boyd, Microsoft (Panelist)

SIGGRAPH 2011 Talks – Part 3

This is the third and last in a series of posts about the SIGGRAPH 2011 Talk program – see Part 1 and Part 2. If you found these useful you may also want to check out my previous series of posts about the SIGGRAPH 2011 Courses program (see Part 1, Part 2, Part 3, and Part 4). These posts are not intended as a general SIGGRAPH survey – they are focused on content related to real-time rendering and game development.

Show Me The Pixels

Three of the talks in this session have possibly relevant content:

  • Slow Art With a Trillion Frames Per Second Camera – I guess this one stretches the definition of “relevant” somewhat, but I just find it extremely cool and interesting. The talk describes some research done at MIT (in collaboration between the Media Lab and Department of Chemistry) in which a “trillion frames per second camera” captures how pulses of light travel within a scene, including bouncing off surfaces and scattering inside objects. Besides the general coolness factor, this may impart some insight into light behavior which could be useful when working on shading and lighting models.
  • Device-Independent Imaging System for High-Fidelity Colors – color management (including display calibration, color space management of data, etc.) is important for both game and film production. It turns out that getting good device-independent color reproduction is far from simple. This talk covers some advances in this field by SHARP Corporation and Shizuoka University.
  • Who Do You Think You Really Are? – augmented reality is becoming an important technology for handheld games (see examples on the Nintendo 3DS and iPhone); this talk discusses an interactive media installation at London’s Natural History Museum (in partnership with BBC Television) which includes augmented reality elements.

Hiding Complexity

This entire session consists of game industry talks:

  • Occlusion Culling in Alan Wake – occlusion culling is a key technology for many games, especially first-person shooters. This talk discusses the occlusion culling system (developed by Umbra Software) used in the game Alan Wake by Remedy Entertainment. Topics include visibility culling as well as shadow-caster culling for dynamic light sources.
  • Increasing Scene Complexity: Distributed Vectorized View Culling – another talk on visibility culling, this time focusing on the technical issues involved in parallelizing culling computations on current game platforms. The talk is given by Electronic Arts Blackbox.
  • Practical Occlusion Culling in Killzone 3 – the third occlusion culling talk of the session focuses on the implementation used by Guerrilla Games for the game Killzone 3. This implementation uses PlayStation 3 SPUs to rasterize a conservative depth buffer, against which occlusion queries are performed.
  • High-Quality Previewing of Shading and Lighting for Killzone 3 – another Killzone 3 talk but unrelated to occlusion culling, this talk by Guerrilla Games covers a content creation framework which supports high-fidelity previews of assets in Autodesk Maya.
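To make the idea of occlusion queries against a conservative depth buffer (as in the Killzone 3 occlusion culling talk above) a bit more concrete, here is a minimal Python sketch. It is not Guerrilla’s implementation; the buffer layout, the function name, and the convention that smaller depth means closer are all assumptions for illustration.

```python
def is_occluded(depth_buffer, rect, nearest_depth):
    """Return True if an object is hidden behind the rasterized occluders.

    depth_buffer: rows of per-pixel occluder depths, stored conservatively
        (each value is the farthest the occluder could be at that pixel).
    rect: (x0, y0, x1, y1) inclusive screen-space bounds of the object.
    nearest_depth: depth of the object's closest point to the camera.
    """
    x0, y0, x1, y1 = rect
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if nearest_depth <= depth_buffer[y][x]:
                return False  # the object may be visible at this pixel
    return True


FAR = float("inf")  # no occluder rasterized at this pixel
depth_buffer = [
    [FAR, FAR, FAR, FAR],
    [FAR, 5.0, 5.0, FAR],
    [FAR, 5.0, 5.0, FAR],
    [FAR, FAR, FAR, FAR],
]

print(is_occluded(depth_buffer, (1, 1, 2, 2), 8.0))  # fully behind the occluder
print(is_occluded(depth_buffer, (0, 1, 2, 2), 8.0))  # pokes out past its edge
```

Being conservative in both directions (occluders drawn no closer than they really are, objects tested at their nearest point over their full screen bounds) means the test can wrongly report “visible” but should never wrongly cull something.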

Smokin’ Fluids

The talks in this session (three from the film industry and one from the academic research community) cover topics related to smoke and fluid simulation. Such simulations are currently too costly to be feasible for most games, though games such as the LittleBigPlanet and PixelJunk Shooter series (both featured at SIGGRAPH this year) include two-dimensional versions. In VFX and CG animation work smoke, fluid and fire simulations are common, forming one of the key elements differentiating film and game visuals. I firmly believe that as game platforms increase in computational power, we will start seeing full 3D simulations of this kind in games.

  • DB+Grid: A Novel Dynamic Blocked Grid For Sparse High-Resolution Volumes and Level Sets – The author, Ken Museth, has a history of developing novel data structures for level set and volumetric data and applying them for VFX, first at Digital Domain and now at DreamWorks Animation. His data structures have been constantly improving, from DT-Grid to DB-Grid and now DB+Grid, which is described in this talk.
  • Capturing Thin Features in Smoke Simulations – In production simulation work, there is a constant tension between the need to speed up simulation times for faster iteration (which implies reducing the resolution of the simulation grid) and the desire to simulate finer detail (which implies increasing the resolution). This talk covers a system developed by Sony Pictures Imageworks that allows thin smoke features to be captured even with low resolution simulation grids.
  • Implicit FEM and Fluid Coupling on GPU for Interactive Multiphysics Simulation – typically distinct simulation methods are used for fluids, rigid objects, deformable objects, etc. This can pose problems when different types of objects can affect each other, which requires coupling different simulation methods. This talk from INRIA and Université de Grenoble covers a GPU-based method for coupled simulation of deformable objects and fluids – interestingly “screen-space collision” is mentioned as one of the techniques employed.
  • Correcting Low-Frequency Impulses in Distributed Simulations – production rendering is typically distributed over a large number of machines. It is desirable to do the same for simulations, but often this is difficult since the simulation domain is not easily separable – each part of the simulation affects all other parts. This talk from Side Effects Software (developers of Houdini) describes a method for distributing level-set fluid simulations while keeping them coupled via a shared low-resolution pressure projection.

Volumes and Rendering

All four talks in this session (three from the film industry and one from the academic research community) contain potentially relevant content:

  • Gaussian Quadrature for Photon Beams in “Tangled” – rendering lighting effects in participating media (often called “light beams” or “god rays”) is a common problem in games and film, typically solved with various hacks. A recent Transactions on Graphics (ToG) paper presented a comprehensive analysis of the problem as well as a new rendering approach called “photon beams” which is both physically correct and efficient – it appears potentially feasible for real-time implementation. This talk (with authors from the University of Central Florida, Disney Research Zürich, and Walt Disney Animation Studios – including the first author of the aforementioned ToG paper) presents an efficient implementation of the photon beams technique in RenderMan, extending it to artist-specified non-physical light attenuation curves. A broader overview of the artist-driven volumetric lighting in Tangled (of which this work is a part) is given in a Technical Paper.
  • Importance Sampling of Area Lights in Participating Media – in principle, ray tracers like the Arnold rendering engine (developed by Solid Angle SL and used by Sony Pictures Imageworks, among others) solve the participating media lighting problem in a straightforward manner by sampling the underlying integrals. In practice, achieving noise-free images in reasonable time requires a lot of engineering effort, mostly relying on various forms of importance sampling. This talk (with authors from both Solid Angle and Imageworks) presents an importance sampling method for single scattering of light from arbitrary area lights in homogeneous participating media.
  • Decoupled Ray Marching of Heterogeneous Participating Media – after two talks on the relatively easy problem of lighting homogeneous participating media, this talk (also from Sony Pictures Imageworks) covers heterogeneous media such as smoke. It covers a method for speeding up ray marching by decoupling lighting calculations from the sampling of volume properties. Ray marching is amenable to real-time implementation since it is easy to scale down (albeit with reduced visual quality) by reducing the number of samples – several companies have demonstrated real-time implementations (though I’m not sure if any shipping games yet use it). The technique presented in this talk can make ray marching for volumetric lighting even faster, so is definitely of interest.
  • Demand-Driven Volume Rendering of Terascale EM Data – unlike the other talks in this session which focus on volumetric lighting, this talk (from King Abdullah University of Science and Technology and Harvard University) focuses on a different issue – rendering volumetric datasets which are too large to fit in memory. Given a good solution to this problem, games should be able to precompute volumetric effects in certain situations and stream them from disk, so this looks interesting.
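For readers unfamiliar with ray marching, the sample-count trade-off mentioned in the decoupled ray marching item above can be seen in a few lines of Python. This is plain fixed-step ray marching of transmittance, not the decoupled technique from the talk; the density function and all parameter values are made up for illustration.

```python
import math


def march_transmittance(density_at, ray_length, num_samples):
    """Fraction of light surviving along a ray through a non-uniform medium.

    density_at: function giving the extinction coefficient at distance t.
    num_samples: the quality/cost knob; fewer samples is cheaper but coarser.
    """
    step = ray_length / num_samples
    optical_depth = 0.0
    for i in range(num_samples):
        t = (i + 0.5) * step  # sample at the midpoint of each segment
        optical_depth += density_at(t) * step
    return math.exp(-optical_depth)


def smoke_density(t):
    # Hypothetical smoke puff centered 2 units along the ray.
    return 1.5 * math.exp(-((t - 2.0) ** 2))


for n in (2, 8, 64):
    print(n, march_transmittance(smoke_density, 4.0, n))
```

Each sample requires a lookup of the volume properties, and in a full renderer a lighting calculation too, which is why reducing the number of samples or decoupling the two kinds of work pays off.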

Heads or Tails

Rigging game or movie characters for animation is a very tricky problem – the rig needs to be powerful enough to handle all needed motions and deformations, while also being easy to control either via hand-keying or motion capture. This session includes two CG feature animation talks and one research talk, all covering the rigging problem from different angles (note that the game industry talk Modular Rigging in Battlefield 3 has been cancelled). Character rigging is one of the areas where film and game production are quite similar – there are differences in scale and complexity, but even these are not so large as differences in, say, triangle count or shader instructions.

  • Building the Birds of “Rio” – this talk covers the process and technology used at Blue Sky Studios to build control systems for the bird characters in the movie Rio – using the main character “Blu” as a case study.
  • “Kung Fu Panda 2”: Rigging a Peacock Tail – this talk describes the approach DreamWorks Animation used to create the tail rig for the peacock character in the film Kung Fu Panda 2.
  • Optimized Local Blendshape Mapping for Facial-Motion Retargeting – this talk from the Graphics Lab at the USC Institute for Creative Technologies details an automatic facial-motion retargeting method for blendshapes.

Speed of Light

Three of the talks in this session contain potentially relevant content:

  • Run-Time Implementation of Modular Radiance Transfer – Precomputed Radiance Transfer is a powerful rendering technique which has spun off many variations. This talk from Disney Interactive Studios, Disney Research Zürich, the University of Utah and the University of North Carolina at Chapel Hill covers a modular variant which enables warping and combining precomputed transport from a small library of simple shapes. The technique was implemented for platforms from mobile devices to high-end GPUs – the talk discusses various implementation issues involved.
  • Next-Generation Image-Based Lighting Using HDR Video – image-based lighting is becoming a key rendering technique in both film and games. This talk from Linköping University and Spheron VR describes a system for high-dynamic-range video capture, reconstruction, and modeling of real-world scenes for use in image-based lighting of synthetic objects placed in the scene.
  • Triple Depth Culling – real-time rendering applications such as games rely heavily upon hardware features such as hierarchical Z-culling for performance. However, this has some drawbacks – it requires either depth sorting or a previous depth prepass, and it doesn’t work well with shaders that modify depth. This talk proposes a technique to avoid these drawbacks – the authors show a pixel shader implementation, though for best performance they suggest that the technique be implemented in hardware. The talk abstract and video are both available online.

Capture and Construction

This session has one film talk of relevance: Building and Animating Cobwebs for Antique Sets. It describes a workflow used at DreamWorks Animation to model and animate cobwebs, including a specialized modeling tool, a physics-based solver, and a procedural-modeling engine. These types of specialized asset workflows can be extremely effective for games or movies which require many examples of a given kind of asset.

Light My Fire

This session has one game talk, as well as three relevant film talks:

  • Simulating Massive Dust in “Megamind” – in film production, there is a constant push for fluid simulations to continually increase in size and complexity, but the need for fine control by artists implies fast turnaround times. For this reason a lot of research and development is spent on making these simulations faster – research that I hope will eventually benefit real-time applications as well. This talk from DreamWorks Animation covers a fast fluid simulation framework used for the movie Megamind. The presentation covers the specific numerical methods used to ensure efficiency and quality, as well as the setup and control framework that allowed artists to work efficiently.
  • “Megamind”: Fire, Smoke, and Data – another Megamind talk, this time focusing on the specific case study of an especially large and involved explosion effect. I like attending such “war story” talks – the most interesting film and game work is done when trying to push boundaries, and the solutions are often a mixture of technical cleverness and artistic inspiration.
  • Volumetric Effects in a Snap – grid-based simulation and volumetric rendering frameworks have become a staple of VFX and CG feature animation work; every studio has its own system with different strengths. I suspect similar systems will start cropping up in game studios when the hardware becomes a bit faster and memory capacities increase a bit more. This talk describes the creation of the “Snap” system developed at Animal Logic and used in the films Legend of the Guardians: The Owls of Ga’Hoole and Sucker Punch.
  • Fluid Dynamics and Lighting Implementation in PixelJunk Shooter 2 – games (including 2D games) rarely incorporate fluid simulations, even though current platforms can run two-dimensional simulations quite quickly. LittleBigPlanet notably incorporated 2D fluid simulations in its fire and smoke effects, but these did not affect gameplay. The game PixelJunk Shooter incorporated some very nice fluid-simulation-driven gameplay, including several types of fluids and gases that affected each other in different ways. The recent sequel expanded this gameplay element, adding some novel light/darkness gameplay as well. This talk from independent developer Q-Games covers the technical aspects of these elements.

Now that I’ve finished the courses and talks, my next few blog posts will cover the remaining SIGGRAPH 2011 programs.

SIGGRAPH 2011 Talks – Part 2

This is the second in a series of posts on the SIGGRAPH 2011 Talk program – Part 1 can be found here. These posts focus on talks with relevant content for real-time rendering researchers and practitioners, including game developers.

Building Blocks

One of the talks in this session looks relevant – KinectFusion: Real-Time Dynamic 3D Surface Reconstruction and Interaction describes the use of a Kinect camera to acquire real-time dense 3D models of an entire room and its contents, enabling some interesting augmented reality and interaction possibilities. The reconstruction appears to require a high-end GPU to achieve real-time performance so this isn’t something for current generation consoles, but it definitely could be feasible on future platforms. It may also be interesting in the context of digitizing real-world objects as part of the film or game modeling process. The authors are from Microsoft Research Cambridge, except for one from Imperial College London.

Walk the Line

Two academic research talks in this session are potentially relevant for games and other real-time applications that use stylized rendering or deformations:

  • Parameterizing Animated Lines for Stylized Rendering – this talk describes a paper from the 2011 NPAR (Non-Photorealistic Animation and Rendering) conference (which is co-located with SIGGRAPH this year). It shows a way to have details along an outline track the geometry cleanly as the scene animates in 3D. Material from the NPAR paper can be found here. The authors are from École d’Ingénieurs Télécom ParisTech, except for one from Adobe.
  • Multiperspective Rendering for Anime-Like Exaggeration of Joint Models – this talk describes a more unusual type of stylization, where the model deforms in a stylized way as it animates, inspired by anime visual conventions. The authors are from Hitachi, except for one from The University of Tokyo.

1000 Points of Light

This session contains one game talk, as well as two relevant CG feature animation talks:

  • Lighting Tokyo for Pixar’s “Cars 2” – rendering cities at night is challenging (definitely for games, but even for CG feature animation) due to the extreme dynamic range and large number of lights. Tokyo, with its massive quantities of illuminated billboards and neon signs, is one of the most famous and extreme examples of this type of lighting situation. This talk covers the techniques used by Pixar Animation Studios to light a stylized version of nighttime Tokyo for the movie Cars 2 – note that the speaker will also present a Studio Talk on a similar topic.
  • “Megamind” – Lighting Metro City at Night – this talk covers a challenge similar to the previous one, but with a distinct set of solutions from a different company (DreamWorks Animation) for a different film (Megamind).
  • Deferred Shading Techniques Using Frostbite in Need for Speed The Run – this talk will cover the tile-based deferred lighting architecture used in the Frostbite 2 engine, with emphasis on the PS3 implementation as used in the Electronic Arts game Need for Speed The Run (the talk was originally intended to cover the Xbox 360 and Battlefield 3 as well, but has been refocused – the removed material will be covered in more depth in this course). It makes for an interesting combination with the previous two talks since it will show how the current state-of-the-art in game technology solves a similar problem as film (albeit at smaller scale) in real time.

Fur and Feathers

The three CG feature animation talks in this session cover fur and feather techniques which are too computationally costly to be feasible for most real-time applications today. They also don’t seem amenable to “animation baking” precomputation approaches since the resulting data would most likely be too heavy. However, these techniques should be able to run in real-time on future hardware platforms, making these talks of interest to forward-looking real-time researchers:

  • Quill: Birds of a Feather Tool – this talk describes a specialized pipeline developed by Animal Logic to procedurally model, animate and simulate feathers while avoiding intersections and rendering at various levels of detail.
  • Dynamic, Penetration-Free Feathers in “Rango” – somewhat similar to the previous talk, but focusing more narrowly on interpenetration avoidance and from the perspective of a different company (Industrial Light and Magic).
  • Accurate Contact Resolution for Interpolated Hairs – another ILM / Rango talk, but focusing on a different problem – handling collision between hairs and other geometry. The solution needed to be very fast and cheap since it was intended for use on interpolated hairs (it is common in CG feature animation and VFX to fully simulate a relatively small number of “guide hairs” and then interpolate a much larger number of cheap “interpolated hairs” between them).
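
The guide-hair idea mentioned in the last item can be sketched in a few lines. This is a generic illustration, not ILM's method: it assumes all guide hairs have the same number of vertices, that per-guide blend weights are already known, and that each interpolated hair simply reuses the weighted average of the guides' shapes relative to their roots. The function name and arguments are hypothetical.

```python
def interpolate_hair(root, guides, weights):
    """Build one cheap hair at `root` by blending simulated guide hairs.

    guides:  list of guide hairs, each a list of (x, y, z) vertices
             (assumed to share the same vertex count).
    weights: one non-negative blend weight per guide; normalized here.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    normalized = [w / total for w in weights]

    hair = []
    for i in range(len(guides[0])):
        # Blend each guide's offset from its own root, then re-apply that
        # offset at the new hair's root.
        offset = [
            sum(w * (g[i][k] - g[0][k]) for g, w in zip(guides, normalized))
            for k in range(3)
        ]
        hair.append(tuple(root[k] + offset[k] for k in range(3)))
    return hair


guide_a = [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
guide_b = [(2, 0, 0), (3, 1, 0), (4, 2, 0)]
blended = interpolate_hair((1, 0, 0), [guide_a, guide_b], [1, 1])
```

Because only the guides are simulated, any collision handling for the blended hairs has to be very cheap, which is the constraint the talk addresses.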

Mixed Grill

This session contains two film talks, one game talk, and one academic research talk; all four are relevant:

  • The Power of Atomic Assets: An Automated Approach to Pipeline on “Legend of the Guardians: The Owls of Ga’Hoole” – games and movies share the challenge of structuring a production pipeline (software tools as well as workflow practices) to handle large numbers of assets. This talk will describe the system used at Animal Logic to handle the assets for the film Legend of the Guardians: The Owls of Ga’Hoole.
  • Animation Workflow in Killzone 3: A Fast Facial Retargeting System for Game Characters – handling facial motion capture data is tricky, especially retargeting to (possibly multiple) in-game models. This talk describes a technique used by Guerrilla Games to animate a large number of different faces for the extensive cut-scenes in the game Killzone 3.
  • Adaptive Importance Sampling for Multi-Ray Gathering – importance sampling (basically sampling a function more densely in areas that are estimated to have higher impact on the result) has recently become a key technology for production rendering. There was a whole SIGGRAPH course about it last year, and Pixar has added native support to the latest version of RenderMan. Importance sampling is typically thought of as a ray tracing technique, but it is also important for image-based lighting (IBL) sources such as environment maps. Importance-sampled IBL is currently useful for game light baking tools, and is likely to be done in real-time on future platforms. This talk describes importance sampling improvements developed at Rhythm & Hues. Talk materials including an abstract and movie are available here.
  • High-Resolution Relightable Buildings From Photographs – efficient digitization of real-world scenes and objects is useful for both film and game development. Tools such as CrazyBump are widely used in the game industry to infer relightable surface details from photographs, but do not always work as well as could be hoped. This research talk looks like it could offer some improvements in this area, making it of wide interest. The authors are from The University of Manchester, Loughborough University, and Dolby Canada.
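
For readers unfamiliar with importance sampling, the basic idea can be sketched as follows. This is a generic textbook-style example, not the adaptive method from the Rhythm & Hues talk: it treats an environment map as a flat list of texel luminances (made-up values), picks texels with probability proportional to luminance by inverting a discrete CDF, and divides each sample by its probability to keep the estimate unbiased. All function names are hypothetical.

```python
import bisect
import random


def build_cdf(values):
    """Return (cdf, total) for a list of non-negative sampling weights."""
    running = 0.0
    partial = []
    for v in values:
        if v < 0:
            raise ValueError("weights must be non-negative")
        running += v
        partial.append(running)
    if running <= 0:
        raise ValueError("at least one weight must be positive")
    return [p / running for p in partial], running


def sample_index(cdf, u):
    """Map a uniform number u in [0, 1) to an index by inverting the CDF."""
    return bisect.bisect_right(cdf, u)


def estimate_sum(values, integrand, num_samples, rng):
    """Estimate sum(integrand(i)) by sampling i proportionally to values[i].

    Assumes values[i] > 0 wherever integrand(i) is non-zero.
    """
    cdf, total = build_cdf(values)
    accumulated = 0.0
    for _ in range(num_samples):
        i = sample_index(cdf, rng.random())
        probability = values[i] / total
        accumulated += integrand(i) / probability
    return accumulated / num_samples


luminance = [0.5, 8.0, 0.5, 1.0]  # one bright texel dominates
estimate = estimate_sum(luminance, lambda i: luminance[i], 16, random.Random(1))
```

When the integrand is exactly proportional to the sampling weights, as in this toy case, every sample returns the same value and the variance vanishes; in practice the integrand also includes terms such as the surface response, so the match is only approximate and better sampling strategies still pay off.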

From the Ground Up

All three CG feature animation talks in this session are relevant for game developers:

  • We Built This City: Big City Design and Implementation in “Kung Fu Panda 2” – games and movies sometimes contain large urban environments, which are very difficult to construct within reasonable time and staffing constraints. This talk will detail how DreamWorks Animation solved this problem for the film Kung Fu Panda 2.
  • The Visual Style of “Legend of the Guardians: The Owls of Ga’Hoole” – finding a good visual style is another difficult task shared by film and games; my feeling is that films tend to have more established processes for look and style development. This talk will detail the visual style established by Animal Logic for the movie Legend of the Guardians: The Owls of Ga’Hoole. I saw a similar presentation at FMX 2011, and it was full of interesting and relevant content.
  • Clouds in the Skies of Rio – in most games and films, clouds are off in the distance and can be handled with straightforward methods. But sometimes the camera needs to get up close and personal with the clouds, which can pose some interesting modeling and rendering challenges. Although cloud rendering techniques used in film can rarely run in real-time on current platforms, the way in which the clouds are art-directed and authored can be of interest. This talk discusses how Blue Sky Studios handled cloud authoring and rendering for the movie Rio.

Directing Destruction

Mixing simulation with manual control to create large, physically-believable and art-directed effects is a tough challenge which VFX and CG feature animation professionals have been focusing on for some time. The techniques used rarely lend themselves to real-time computation on current hardware. However, in many cases these effects can be pre-computed, and on future hardware they are likely to run in real time (perhaps with some reduction in scale). The four talks in this session discuss various case studies of this type:

  • End of Line: Character Destruction in “Tron: Legacy” – this talk discusses the tools developed by Digital Domain for the character destruction effects in Tron: Legacy.
  • Kali: High-Quality FEM Destruction in Zack Snyder’s “Sucker Punch” – in this talk, The Moving Picture Company discusses a finite-element simulation toolkit developed in partnership with Pixelux, with examples of its use in the film Sucker Punch. It is interesting to note that the tool is based on the same Digital Molecular Matter technology used in the games Star Wars: The Force Unleashed and Star Wars: The Force Unleashed II.
  • Directing Hair Motion on “Tangled” – this talk discusses the system developed by Walt Disney Animation Studios to animate the main character’s hair (almost a character in itself) in the movie Tangled.
  • Choreographing Destruction: Art Directing a Dam Break in “Tangled” – another Tangled talk from Walt Disney Animation, this one describes the way in which a complex water and rigid body simulation was art-directed for the “dam break” sequence.

Crowds

Scenes with large crowds are another differentiating factor between film and games. Sufficiently large crowds pose authoring and rendering challenges even for film; the solutions to these may be of interest to game developers working with smaller real-time crowds on next generation platforms. The three talks in this session discuss crowd case studies from three CG animated feature films:

  • Crowds on “Cars 2” – this talk discusses how Pixar Animation Studios improved their production pipeline to enable higher productivity when managing assets and controlling agent behaviors for Cars 2 crowd shots.
  • Synthesizing Complexity for Characters and Landscapes in “Rio” – this talk covers the systems used at Blue Sky Studios to procedurally generate large varied crowds of people and flocks of birds for the movie Rio, as well as the renderer enhancements done to efficiently ray-trace the resulting massive geometric detail.
  • Staging Carnival: Ray Tracing Crowds in “Rio” – another Blue Sky Studios talk about Rio, this time focusing on a specific case study (the carnival crowds).

There are eight more Talk sessions with relevant content, which I will cover in a subsequent blog post.

SIGGRAPH 2011 Talks – Part 1

After summarizing the course program, I’ll continue going over content in other SIGGRAPH 2011 programs which may be of interest to game developers or real-time rendering researchers. Next up is the Talks program; this post will also be a multi-parter, since there is a LOT of content to cover in this program.

Update July 16, 2011: Added link to “Coherent Out-of-Core Point-Based Global Illumination” EGSR 2011 paper.

Talks (which used to be called “Sketches” a few years ago) are short presentations – 20 minutes long (rare “long talks” are 40 minutes). Talks are a lot “leaner” than Technical Papers, which require detailed analysis, comprehensive citations of previous work, and comparisons to competing techniques. For this reason, SIGGRAPH Talks tend to be the venue of choice for industry practitioners, who often have limited time to spend on writing publications.

The SIGGRAPH Talk program has historically been dominated by talks from the fields of VFX and CG feature animation – many of these contain relevant information for game developers, but the game industry itself has been under-represented. SIGGRAPH 2011 has a record number of game industry Talks, but there is still a long way to go before we match the film people (I hope to get a lot closer in future years!)

I will now summarize relevant Talks regardless of speaker affiliation. Since Talks are scheduled in sessions of four I will organize my summary along the same lines, skipping sessions without any relevant Talks and using the session order from the SIGGRAPH 2011 Talks page.

Pushing Production Data

This session contains four film talks, all of potential interest:

  • Coherent Out-of-Core Point-Based Global Illumination describes a system used at DreamWorks Animation for computing global illumination and ambient occlusion – the details may be of interest to game developers working on “baking” precomputation systems. There is also an EGSR 2011 paper by the authors on this topic.
  • Similarly, the information in Destroying Metro City: An Artist-Friendly and Efficient Demolition Pipeline for “Megamind” (also from DreamWorks Animation) could be relevant for precomputation of destroyed and fractured versions of game assets.
  • The efficient digital acquisition of real-world props is a problem facing games as well as film; PhotoSpace: A Vision-Based Approach for Digitizing Props describes an interesting system used at Weta Digital for this.
  • Games and film development are not “one size fits all” – individual games and films often require specific assets which can benefit from specialized authoring and rendering systems. Artistic Rendering of Feathers for Animated Films (yet another DreamWorks Animation talk) describes such a system.

Facing Hairy Production Problems

This session contains one game talk and three relevant film talks:

  • Extensive use of geometry instancing is important in both games and film to save on asset authoring time and memory. The talk Kami Geometry Instancer: Putting the “Smurfy” in Smurf Village describes an instancing pipeline developed by Sony Pictures Imageworks which allows for distinct deformation of individual instances.
  • The talk Making Faces: Eve Online’s New Portrait Rendering describes the impressive new avatar portrait system developed by CCP Games for the Eve Online space MMO.
  • SpeedFur: A GPU-Based Procedural Hair and Fur Modeling System describes a hair modeling system (developed by Fido). The procedural authoring system and the GPU-accelerated preview mode both appear relevant for hair and fur in games.
  • The talk GPU Fluids in Production: A Compiler Approach to Parallelism details a specialized CPU/GPU parallel compiler for fluid simulation developed by Double Negative Visual Effects. New parallelism approaches are always interesting, and I suspect fluid simulation will be a major differentiating feature for games on the next generation of platforms.

Eye on the Road

Two of the talks in this session (one by game developers, and one by academic researchers) appear relevant:

  • MotorStorm Apocalypse: Creating Urban Off-Road Racing – this talk by Evolution Studios presents rendering and tools advances which enabled adding large-scale dynamic events to MotorStorm Apocalypse, the latest entry in the MotorStorm racing game franchise (also showcased in The Sandbox).
  • Facial scanning has been a topic of heightened interest in the game industry since its highly publicized use in L.A. Noire. The talk R&D Facial Cartography: Interactive High-Resolution Scan Correspondence (by Paul Debevec’s graphics lab at the USC Institute for Creative Technologies) covers some interesting advances in this area.

Tiles and Textures and Faces Oh My!

This session contains talks by game developers, CG feature animation professionals, and hardware vendors; all four are relevant:

  • Artist-guided procedural authoring systems can help with the asset creation issues faced by both game and film production. The talk Procedural Mosaic Arrangement In “Rio” details Blue Sky Studios’ art-directable procedural pipeline for sidewalk and street tile mosaics.
  • Programmable tessellation is one of the primary features of DirectX11, but authoring content for it can be challenging. NVIDIA’s talk Generating Displacement From Normal Map for Use in 3D Games describes one possible solution to this problem.
  • The film industry has found the open-source Ptex (per-face texture mapping) technology developed by Walt Disney Animation extremely useful for getting rid of UV layout issues. The talk Per-Face Texture Mapping for Real-Time Rendering (jointly presented by an NVIDIA developer technology engineer and the first author of the original Ptex paper) presents a real-time implementation of this technology.
  • Skinning is one of the most fundamental technologies in game rendering and has not changed much in the last twenty years. The talk Spherical Skinning With Dual Quaternions and QTangents presents some skinning improvements achieved by Crytek during development of the Crysis franchise.

Let There Be Light

This session contains three CG feature animation case studies, all with interesting information for game developers:

  • I find Rango to be an intriguing case of live-action and VFX methods being used by Industrial Light and Magic to make a CG animated feature with a unique photorealistic style. The talk “Rango”: A Case of Lighting and Compositing a CG-Animated Feature in an FX-Oriented Facility appears to have some interesting information on the methods used by the lighting and compositing artists.
  • Ocean Mission on “Cars 2” – this talk describes how Pixar addressed several multi-disciplinary challenges involving the ocean in the opening sequence of Cars 2.
  • Hair is another area where games lag noticeably behind film, so learning about film methods is valuable. The talk Untangling Hair Rendering at Disney details technology, tools and workflow advances adopted by Walt Disney Animation for the film Tangled.

Out Of Core

The four talks in this session are all relevant for game developers or real-time rendering researchers:

  • Google Body: 3D Human Anatomy in the Browser – this talk describes how Google used the WebGL API in an innovative way to create an impressive in-browser application. Browser-based games are a rapidly increasing market, making APIs such as WebGL important to many game developers. The Google Body application is also showcased in The Sandbox.
  • As a possible future alternative to the traditional rendering pipeline, ray tracing sparse voxel octrees has attracted some interesting research work, including GigaVoxels (several publications on which can be found on Cyril Crassin’s webpage). The talk Interactive Indirect Illumination Using Voxel Cone Tracing: An Insight builds on the GigaVoxels work to compute indirect lighting and ambient occlusion for complex scenes in real time. A preview was presented as an I3D 2011 poster, and various materials relating to the SIGGRAPH Talk can be found on a dedicated web page.
  • Rendering the Interactive Dynamic Natural World of the Game: From Dust – in this talk, Ubisoft Montpellier discusses the simulation and rendering techniques used for the dynamic world of the game From Dust.
  • Out-of-Core GPU Ray Tracing of Complex Scenes – this talk covers the CentiLeo GPU ray tracer (based on Kirill Garanzha’s PhD research at the Keldysh Institute of Applied Mathematics), which can render models composed of several hundred million polygons in real-time. More information on CentiLeo (including a video of it in action) can be found here.

SIGGRAPH 2011 Courses – Part 4

This is the fourth and last of a series of posts on the SIGGRAPH 2011 course program, each describing several of the courses that will be presented at the conference. Links to previous posts in the series: Part 1, Part 2, Part 3.

Cinematography: The Visuals & the Story

Cinematography is the art of communicating a story via camera and lighting choices. As a game developer, I find it fascinating for several reasons. One is that it is such a well-established art; over a century old, and based upon many of the principles of still older arts such as photography and painting. The maturity of the field can be seen in the way that the practice is codified – there are clear roles in film production, everyone knows what a director of photography, first camera assistant, etc. do from film to film. The field’s most prominent professional organization, the American Society of Cinematographers, was created in 1919 and its magazine American Cinematographer has been discussing tips and tricks of the trade since 1920. It’s an interesting contrast to game development – an extremely young discipline where most of the fundamentals are still being figured out.

Another reason I’m interested in cinematography is its relevance to game visuals; the primary problem (turning three-dimensional scenes into compelling screen images that carry a narrative) is the same. While issues of camera placement may be less relevant for some game genres (e.g. first person shooters), lighting, color, and scene composition considerations are relevant for almost any game.

The third reason is that most game developers (including myself until fairly recently) are either unaware of this vast wealth of relevant knowledge, or are indifferent to it. CG animated features have made great strides by incorporating principles of live-action cinematography; not many videogames are doing the same.

For these reasons, I’m glad to see a SIGGRAPH course covering cinematographic fundamentals. The speaker, Bruce Block, has had a lot of experience working in film (albeit not in the camera department) and has written a well-regarded and influential book (The Visual Story) about how visual structure is used to present story in film.

Storytelling With Color

The way in which color choices are applied throughout production is another area where I think games have a lot to learn from film. In film, the colors of almost every costume and piece of set decoration are part of a conscious choice to drive the narrative, establish a mood, or support character development. This was brought home to me last year when I visited Pixar and saw the “color script” for Toy Story 3 – a wall covered by postcard sized sketches, one for each shot in the film. Each rough sketch blocked out the shapes and colors in the shot, and when they were put together, you could clearly see how the carefully chosen color palette helped drive the story and emotional tone of the movie. Two of the Toy Story 3 color script images can be seen here, and the entire color script for a different Pixar film (Up) can be seen here.

This course will cover exactly these kinds of color choices, and will be presented by Kathy Altieri (Production Designer, DreamWorks Animation) and Dave Walvoord (Digital FX Supervisor, DreamWorks Animation). Kathy’s career in TV and film spans three decades; after working on backgrounds for multiple animated TV shows as well as classic animated feature films such as The Little Mermaid, Aladdin and The Lion King, she moved to DreamWorks, where she was Art Director on The Prince of Egypt and Production Designer on Spirit: Stallion of the Cimarron, Over the Hedge, and How to Train Your Dragon. Dave has 15 years of experience in VFX and CG feature animation, working at Blue Sky on films such as Fight Club and Ice Age before joining DreamWorks, where he was CG Supervisor on Shark Tale, Over the Hedge and Kung Fu Panda and Digital FX Supervisor on Kung Fu Panda 2.

Applying Color Theory to Digital Media and Visualization

This is another course on color, but focused more on theory and on non-entertainment applications, such as scientific visualization. The course is presented by Theresa-Marie Rhyne, a prominent visualization expert with three decades of experience as a researcher, educator, designer and artist. She has taught several courses on this topic, most recently at IEEE Visualization 2010 (a video of her slides from that talk is available online), and has a blog on the topic as well. Interestingly, she has already put up a video of the slides from the upcoming SIGGRAPH 2011 course.

Liquid Simulation With Mesh-Based Surface Tracking

While most fluid rendering and simulation work over the years has focused on level-set approaches, an important recent trend in this area consists of tracking a mesh over the surface of the fluid, thus enabling more detailed surfaces. This advanced course (prior knowledge of fluid simulation techniques is assumed) covers the current state of the art in this important area, and is presented by Chris Wojtan (Assistant Professor, Institute of Science and Technology Austria), Matthias Müller-Fischer (Research Lead, NVIDIA), and Tyson Brochu (PhD Candidate, University of British Columbia). Having performed much of the leading research in this area, the speakers are uniquely qualified to speak about the topic.

Although complex fluid simulations are used extensively in film VFX and animated features, they are currently too computationally expensive for games. As game platforms become more powerful, I believe this will change. There are already some impressive real-time demonstrations, for example the Raging Rapids Rides demo which will be shown at the SIGGRAPH 2011 Real-Time Live! program and the SIGGRAPH 2011 paper Real-Time Eulerian Water Simulation Using a Restricted Tall-Cell Grid, which has an impressive video here (check out the lighthouse part at the end). Note that one of the course speakers (Matthias) was involved with both of these examples.

Introduction to Modern OpenGL Programming

Dave Shreiner (co-author of the famous OpenGL Red Book, which has a new edition coming out this November) has taught an introductory course on OpenGL (almost) every year at SIGGRAPH since 1998. He was accompanied by various co-lecturers – most often Edward Angel – and evolved the course content to keep up with changes in the OpenGL API. The only two years Dave didn’t do this course were 2003 (when he did a “performance OpenGL” course instead of an introductory course – in some other years he did both), and 2010 (when there was no OpenGL course for the first time since 1992). Dave and Edward are back this year with an updated course, which should be of great interest to beginning graphics programmers, OpenGL programmers who have been using older versions of the API, or experienced graphics programmers with plans to start working with OpenGL.

A course on this topic couldn’t hope for better speakers. Besides his highly influential books and courses, Dave Shreiner also had an important role in the development of OpenGL (and its spinoff OpenGL ES) in the 15 years he worked at SGI (where OpenGL evolved from the proprietary IRIS GL library) and since, as Technical Advisory Panel Chair for The Khronos Group and Director of Graphics Technology at ARM. Edward Angel has taught at the University of New Mexico for over 30 years; he holds the positions of Professor Emeritus of Computer Science and Founding Director of the Art, Research, Technology and Science Laboratory (ARTS Lab). Edward has written several influential books on computer graphics, most notably the OpenGL Primer and Interactive Computer Graphics.

Modeling 3D Urban Spaces Using Procedural and Simulation-Based Techniques

As scene complexity increases, the amount of artist work (and thus the expense) required to create these scenes increases commensurately, a problem that afflicts both film and game production. Audience expectations are always increasing, and budgets cannot keep pace – more efficient ways to model large, complex scenes must be found. While most natural scenes are very complex, techniques for procedurally modeling them have been used in production for some time; see off-the-shelf products such as Vue and Speedtree, or in-house tools such as were used to model trees in Tangled. Urban scenes can be as complex, but tools for modeling them procedurally have been less widely used (the creation of 1930’s New York City in the 2005 remake of King Kong is a notable example – more details here). The last few years have seen a flourishing of research into procedural modeling of buildings and cities, and the fruits of this research are finding their way into production. This course will cover procedural as well as image-based and simulation-based modeling techniques, and is targeted at applications including computer games, movies, architecture, and urban planning.

This course will have five speakers, each extremely well-suited to teach a course of this type: Peter Wonka (Associate Professor, Arizona State University), Daniel Aliaga (Associate Professor, Purdue University), Carlos Vanegas (Research Assistant, Purdue University), Pascal Mueller (Founder & CEO, Procedural Inc.), and Michael Frederickson (Technical Director, Pixar). The first four speakers have, between them, performed or led most of the notable academic research in this area. Pascal Mueller has founded a company (Procedural Inc.) based on his research, which sells a commercial software package (CityEngine) for procedural urban modeling (Peter Wonka serves on Procedural’s advisory board). The last speaker, Michael Frederickson, was responsible for modeling the 40,000 buildings in the city of London as seen in the movie Cars 2, and it appears that this will be the topic of his presentation. Presumably (given his participation in this course, and also given the magnitude of the task) this was done procedurally. While watching Cars 2 (story issues aside) I was struck by the film’s visuals, in particular the urban environments and especially London; I look forward to finding out how this was done.

3D Spatial Interaction: Applications for Art, Design, and Science

This course will be taught by Joseph LaViola (Assistant Professor, University of Central Florida) and Daniel Keefe (Assistant Professor, University of Minnesota). Last year at SIGGRAPH 2010, Prof. LaViola taught a course about spatial interaction with videogame motion controllers (together with Richard Marks, the primary researcher behind Sony’s EyeToy and Move peripherals). This year’s course, judging by its abstract, appears to be focused on applications other than videogames. These novel interfaces surely have interesting applications in many fields, and this course will be of interest to many. Both Prof. LaViola and Prof. Keefe have done important research in this field, and Prof. LaViola has authored a book on the subject.

Build Your Own Glasses-Free 3D Display

Last year, two of this course’s speakers, Douglas Lanman (Postdoctoral Associate, MIT Media Lab) and Matthew Hirsch (PhD Student, MIT Media Lab), taught a SIGGRAPH 2010 course called Build Your Own 3D Display. This year, they are joined by Gregg Favalora (Principal, Optics for Hire) and are focusing the course specifically on autostereoscopic displays, which do not require glasses. Douglas and Matthew have done important research in this area (most notably this SIGGRAPH Asia 2010 paper), and have taught versions of this course not only at SIGGRAPH 2010 (as mentioned), but also at SIGGRAPH Asia 2010. Gregg has 15 years of experience as an entrepreneur, inventor and researcher, and has authored multiple key publications and patents relating to autostereoscopic display design.

Advances in New Interfaces for Musical Expression

This course is presented by Michael Lyons (Professor, Ritsumeikan University) and Sidney Fels (Professor, University of British Columbia), who in 2001 organized the first workshop on New Interfaces for Musical Expression (NIME). This workshop, dedicated to scientific research on the development of new technologies for musical expression and artistic performance, has since blossomed into a full-fledged international conference. This course will summarize the content of the last several years of NIME, covering both theory and practice, and will present several case studies.

SIGGRAPH 2011 Courses – Part 3

Third post in a series about the SIGGRAPH 2011 courses (Part 1 and Part 2).

Stereoscopy From XY to Z

Although there had been fits and starts since the mid-1950s, stereoscopic (“3D”) feature films really kicked off in 2009. This was primarily due to the convergence of two factors: CG animation and Avatar. CG animated features are easier for stereoscopy since they don’t require bulky and expensive stereoscopic cameras; Disney Animation had been doing all their CG animated films in 3D since Chicken Little (2005), joined in 2009 by Pixar and Dreamworks with Up and Monsters vs. Aliens respectively. Avatar’s huge box-office success in the same year goosed studio executives into mandating stereoscopic releases of VFX-heavy live-action films as well. Although somewhat controversial among experts (mostly due to brightness issues), the increase in stereoscopic theatrical content resulted in a push for compatible televisions, Blu-ray players and game consoles at home. Around the same time, the PC side of the game market also saw an increase in stereoscopic support (mostly led by NVIDIA). By 2011, stereoscopy had become a dominant trend in computer graphics, with implications in areas ranging from videogame user interfaces to feature shot editing. Many of these implications are as yet not commonly understood, which increases the need for courses like this one.

The course is presented by Samuel Gateau (3D Software Engineer, NVIDIA) and Robert Neuman (Stereoscopic Supervisor, Walt Disney Animation Studios) who have presented earlier versions of it at SIGGRAPH Asia 2010 and at FMX 2011. This time Samuel and Robert are joined by Marc Salvati (R&D Software Engineer, OLM Digital). It appears that the course will cover both the technical and aesthetic aspects of stereoscopy, for games as well as film. The speaker lineup is well-suited for this scope; Samuel has helped many game developers integrate stereoscopy into their titles, Marc has worked on tools for converting Japanese animation to 3D (the topic of a separate talk this year), and Robert has supervised stereoscopy for several films at Disney Animation, most recently working on the stereoscopic conversions of classic hand-animated Disney films (also the topic of a separate talk).

Production Volume Rendering (Part I and Part II)

The SIGGRAPH 2010 course Volumetric Methods in Visual Effects was a great look into an important and little-understood area of production rendering, so I was happy to see that an updated and expanded version will be presented this year. Both courses are organized by Magnus Wrenninge (Senior Technical Director, Sony Pictures Imageworks) and Nafees bin Zafar (Senior Production Engineer, Dreamworks Animation). Magnus has been working on visual effects software at Imageworks (and previously at Digital Domain) for almost a decade, in later years mostly focusing on volumetric modeling and rendering. He is currently in the process of writing a book on the topic, which will include source code for a fully functional volume renderer. Nafees has worked on simulation and volumetrics tools (at Dreamworks and previously at Digital Domain) for over ten years, winning a Scientific and Engineering Academy Award in 2007. The course is divided into two parts. Part I (“Fundamentals”) is presented by Magnus and Nafees, and is an overview of the fundamental technologies behind computer generated volumetric elements such as clouds, fire, and whitewater. At 90 minutes, Part I is an expansion of the first hour of last year’s course, and includes an introduction to the subject, followed by in-depth explanations of how volumetric effects are modeled and rendered.

Over three hours long, Part II (“Systems”) is a greatly expanded version of the second half of last year’s course. It will focus on specific VFX volumetric technologies, tools, workflows and case studies. Nafees and Magnus will each give a presentation on the systems used at their respective studios. In addition, there will be presentations by speakers from the following companies:

  • Double Negative: presented by Ollie Harding (R&D Programmer) and Gavin Graham (CG Supervisor). I wasn’t able to find out much about Ollie; Gavin has worked at Double Negative for over ten years, during which he did various shot-based effects work, assisted R&D in battle-testing in-house volumetric rendering and fluid simulation tools, and CG-supervised several effects-heavy feature films.
  • Rhythm & Hues: Jerry Tessendorf (former Principal Graphics Scientist) and Victor Grant (FX Supervisor). Jerry Tessendorf is currently Director of the Digital Production Arts Program at Clemson University, following an extensive and highly influential body of work in simulation and VFX production spanning three decades. Notable achievements include a Technical Achievement Academy Award and a series of hugely influential SIGGRAPH presentations on ocean wave simulation (the latest version of the notes and slides is well worth reading). Victor Grant has worked on VFX for many feature films over the past decade, specializing in volumetric modeling and rendering as well as particle and fluid simulation.
  • Side Effects Software: Andrew Clinton (Software Developer). Side Effects’ Houdini software is used extensively in the VFX industry; Andrew is responsible for the research and development of Houdini’s Mantra renderer. He has worked on improvements to the volumetric rendering engine, a micropolygon-like approach to volume rendering, a physically-based renderer, and a port of the renderer to the Cell processor.
  • Weta Digital: Antoine Bouthors (R&D Engineer). Weta is a new addition relative to last year’s course. Before joining Weta, Antoine worked on research including realistic rendering of clouds in real-time.

Volumetric effects are one of the areas where the gap between game and film visuals is biggest; as game platforms become more powerful, game developers will start focusing R&D efforts on this topic. In parallel, VFX houses will develop ways to rapidly previsualize feature film volumetric effects, to allow for better artist control and directability. I predict that in the next few years these converging lines of research will “meet in the middle”, enabling unprecedented scale and quality of volumetric effects in games. Attending this course is a good way for game developers and real-time rendering researchers to get a head start on this process.

Compiler Techniques for Rendering

This course is a bit more specialized than the others I’ve discussed. It is focused on the uses of advanced compiler technology for rendering, covering five different projects which are on the cutting edge of this technology trend. Most of the techniques use LLVM and/or involve the compilation of shading languages. The course comprises five talks:

  • Intro to LLVM, and Native RSL Shader Compilation, presented by Mark Leone (Researcher, Weta Digital): Before joining Weta, Mark led the development at Intel of a new shading language for native rendering on Larrabee, and previously worked on the RenderMan shading system at Pixar. His talk will begin with an overview of LLVM (useful background for several of the other talks), and continue with a description of the implementation of the PostHaste system, which analyzes RenderMan shaders and automatically identifies kernels within them that can be compiled for x86 native execution using LLVM.
  • Open Shading Language, presented by Larry Gritz (Principal Engineer, Sony Pictures Imageworks): Larry Gritz is the chief architect of the Imageworks in-house renderer, as well as the designer and open source administrator of the Open Shading Language (OSL) and OpenImageIO projects. Other rendering systems for which he’s had a leading architectural role include NVIDIA’s Gelato GPU-accelerated film-quality renderer, Exluna’s Entropy renderer, Pixar’s PhotoRealistic RenderMan, and BMRT. Larry’s talk describes the design and implementation of OSL, which was developed by Imageworks for use in its in-house renderer, and released as open source software. OSL is specifically designed for advanced rendering algorithms and has a number of key technologies whose implementations will be discussed: radiance closures, light path expressions, automatic differentiation, and LLVM just-in-time compilation.
  • AnySL: Efficient Portable Multi-Language Shading, presented by Philipp Slusallek (Scientific Director, German Research Center for Artificial Intelligence – DFKI): Philipp leads the “Agents and Simulated Reality” research lab at DFKI. He is also a full professor for Computer Graphics at Saarland University, where he holds the additional positions of Director of Research at the Intel Visual Computing Institute, principal investigator at the Cluster of Excellence in Multimodal Computing and Interaction, and founding speaker of the Competence Center for Computer Science. Philipp’s talk will describe the AnySL system, which compiles shaders from different languages into a common, portable representation, using a generic shading library. AnySL also incorporates an embedded compiler based on LLVM that instantiates this generic code in terms of the renderer’s native types and operations. In addition, AnySL supports programmable kernels for tasks other than shading, such as animation, geometry processing, tessellation, and image processing.
  • Automatic Shader Bounding for Efficient Global Illumination, presented by Bruce Walter (Research Associate, Cornell University Program of Computer Graphics): Bruce’s research focuses on expanding the capabilities of physically-based rendering and global illumination algorithms with respect to robustness, scalability, and generality. He has published many related research papers at SIGGRAPH and elsewhere, including my favorite BRDF paper. This talk will discuss research that was published in a SIGGRAPH Asia 2009 paper, which uses a compiler to automatically generate interval versions of programmable shaders. These interval versions can be used to provide the high level query functions needed by physically-based rendering systems (such as ray tracers).
  • Compilation for GPU Accelerated Ray Tracing in OptiX, presented by Steven Parker (Director of HPC & Computational Graphics, NVIDIA): Steven also leads the OptiX ray tracing team; prior to joining NVIDIA he developed a long history of research and publication in interactive ray tracing and scientific computing. Steven’s talk will discuss the domain-specific just-in-time compiler that lies at the core of the NVIDIA OptiX ray tracing engine. This compiler generates custom ray tracing kernels by combining user-supplied programs for ray generation, material shading, object intersection, and scene traversal. The CUDA C compiler is used for writing shader programs with function overloading, templates, and full pointer support, while a just-in-time compiler provides ray-tracing-specific optimizations. Steven will discuss some of the compiler analysis techniques that enable a natural programming model, support a rich object model designed for compact scene representation, and provide dynamic dispatch for complex scenes and continuations for recursion, all while executing efficiently on a CUDA-enabled GPU.
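
To make the idea of an “interval version” of a shader (from the Automatic Shader Bounding talk) a bit more concrete, here is a minimal Python sketch. It is not the method from the paper, which generates such versions with a compiler; the Interval class, clamp01 and toy_shader are hypothetical names made up for illustration. The same shader code runs either on plain numbers or on ranges of inputs, and in the second case returns conservative bounds on its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; a hypothetical helper for this sketch."""
    lo: float
    hi: float

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __mul__(self, other):
        other = _as_interval(other)
        # The product range is spanned by the four endpoint products.
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    __rmul__ = __mul__


def _as_interval(x):
    """Promote a plain number to a degenerate interval."""
    return x if isinstance(x, Interval) else Interval(float(x), float(x))


def clamp01(x):
    """Clamp to [0, 1]; works on numbers and on intervals (clamp is monotonic)."""
    if isinstance(x, Interval):
        return Interval(min(max(x.lo, 0.0), 1.0), min(max(x.hi, 0.0), 1.0))
    return min(max(x, 0.0), 1.0)


def toy_shader(n_dot_l, albedo):
    """Hypothetical diffuse shader: albedo * clamp(N.L) plus a small ambient term."""
    return albedo * clamp01(n_dot_l) + 0.05


# Point evaluation, as a renderer would normally do.
value = toy_shader(0.3, 0.7)

# Interval evaluation: bounds over a whole range of inputs at once.
bounds = toy_shader(Interval(-0.2, 0.6), Interval(0.5, 0.8))
print(value, bounds)
```

Any point evaluation whose inputs lie inside the input intervals is guaranteed to fall inside the returned bounds, which is the kind of query a global illumination algorithm can use to bound a shader’s contribution without evaluating it everywhere.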

Another project which seems to fit in with this “compilers for rendering” trend (though not covered in this course) is Microsoft’s recent work to enable symbolic differentiation in HLSL.
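
Automatic differentiation, listed among OSL’s key technologies above, is a different technique from symbolic differentiation. As a rough illustration of the general idea only (forward-mode differentiation with dual numbers, not OSL’s or Microsoft’s actual implementation), here is a small Python sketch; Dual, dsin and pattern are hypothetical names. Each value carries its derivative along with it, so the derivative of a procedural pattern comes out of the same code path that computes the pattern itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative with respect to one input."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (fg)' = f'g + fg'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Treat a plain number as a constant (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def dsin(x):
    """Sine with the chain rule applied to the derivative part."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def pattern(u):
    """Hypothetical procedural pattern: 0.5 + 0.5 * u * sin(8u)."""
    return 0.5 + 0.5 * dsin(8.0 * u) * u


# Seed the input with derivative 1 to get d(pattern)/du alongside the value.
result = pattern(Dual(0.25, 1.0))
print(result.val, result.der)
```

Derivatives of this kind are commonly used in shading for things like texture filtering, where the renderer needs to know how fast a quantity changes across a pixel.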

SIGGRAPH 2011 Courses – Part 2

This is a continuation of the series of posts started here.

Character Rigging, Deformations, and Simulations in Film and Game Production

Character animation is one of those areas where film and game production have intriguing similarities as well as differences, especially in the ways that the character meshes deform in response to animation and simulation. This course includes three talks, each covering a different application domain: games, visual effects, and feature animation. These talks will be presented by:

  • David Coleman (Senior CG Supervisor, Electronic Arts Canada). David (who has worked at Electronic Arts for 15 years and is currently responsible for the central team that provides rigging for many of EA’s sports titles) will present the games portion of the course. He will discuss character rigging, deformations and simulations in game production, emphasizing the technical restrictions imposed due to the real-time and interactive nature of games. This talk will also cover some strategies for setting up procedural secondary rigging systems in Maya, MotionBuilder and at run-time in games.
  • Tim McLaughlin (Department Head and Associate Professor, Department of Visualization at Texas A&M University). Tim (who had 13 years of experience at ILM – most of it on digital creatures – before heading the Texas A&M Department of Visualization) will discuss rigging for visual effects. He will cover the unique requirements brought on by integration with live action, but also the affordances offered by the narrower scope of performance requirements relative to feature animation and games. Tim will discuss rigging modularity, provisions for animator control, non-linear deformations, areas of highest importance for deformations, and the efficient use of muscle systems.
  • Larry Cutler (Supervising Character TD, DreamWorks Animation). Larry (who has worked at DreamWorks Animation for 10 years, and at Pixar for four years previously) will be discussing rigging issues for feature animation. Larry’s talk will deal with the impact of character design, modeling, and scalability for thousands of shots on rigging, deformation, and simulation. He will discuss the issues arising from the unique needs of feature animation: accommodating an extreme range of motion, and an increased emphasis on art directability and animator control. Larry will also cover hair, cloth, and facial animation systems.

Destruction and Dynamics for Film and Game Production

Another “X for film and games” course, this time focusing on rigid body dynamics and destruction / fracturing methods. The course will cover production aspects such as authoring tools and game engine integration, in addition to the computational and algorithmic aspects. Like the last course, this one will highlight interesting commonalities and differences between film and game practice. There are areas where each can learn from the other: the film techniques can point the way to future methods for games running on more powerful platforms; and the efficient game methods are useful for fast prototyping, previsualization and even speeding up final shots in film.

The course will start with a 30-minute presentation by the course organizer, Erwin Coumans (Principal Physics Engineer, AMD). Erwin has worked on physics in games for over a decade, and is also the main author of the open-source Bullet Physics Library. Although Bullet was originally designed for game use, it has been used on many films as well, including big-budget Hollywood blockbusters such as How to Train Your Dragon, Sherlock Holmes and 2012. Erwin will give an overview of the course, as well as a brief introduction to the basic theory of rigid body dynamics and destruction/fracturing methods. He will also cover collision detection and handling contacts, approximate methods for the modeling of stress and strain, and how to decide when and where to break rigid bodies into several parts. The course will continue with the following talks:

  • Authoring Destruction With the Dynamica Bullet Maya Plugin (15 minutes), by Michael Baker (Faculty, Art Institute of Las Vegas): Michael has worked on Las Vegas casino games, visual effects for various short films and games, and the Bullet Physics Library (in particular the Dynamica Maya plugin which is the primary topic of his talk). Michael will discuss the development and use of Dynamica to support choreographed rigid body behavior such as progressive crumbling of pre-shattered objects, sequential structural failure and timed directional explosions.
  • Destruction and Dynamics Artist Tools for Film (45 minutes), by Nafees Bin Zafar (Senior Production Engineer, Dreamworks Animation) and Mark T. Carlson (Lead Engineer, Dreamworks Animation): Nafees has worked on simulation and volumetrics tools (both at Dreamworks and in his previous job at Digital Domain) for over ten years, winning a Scientific and Engineering Academy Award in 2007. Mark has worked on cloth, fluid and crowd simulation for six years at DNA Productions, Walt Disney Animation and Dreamworks Animation. This talk will cover 3rd party software integration in the movie pipeline, building artist tools with Bullet, and authoring of destruction using Maya and Houdini. Examples from recent Dreamworks Animation movies will showcase the techniques described.
  • Deformable Rigid Bodies and Fragment Clustering for Film (45 minutes), by Brice Criswell (Senior Software Engineer, Industrial Light & Magic): Brice has been developing production-related software for 12 years with ILM, and specializes in rigid body and crowd dynamics. Brice’s talk is divided into three presentations. The first discusses a deformable rigid system which efficiently simulates on-impact bending and denting of normally rigid bodies. The second covers a fragment clustering system which allows artists to initialize sets of geometry as a single rigid body, then dynamically break the objects during the progression of the simulation. The third presentation covers the challenges involved in animating, simulating, and deforming the tentacle beard of the Davy Jones character in the Pirates of the Caribbean movies. Each of the presentations will detail algorithms as well as production issues, and will include VFX production examples from prominent feature films.
  • Procedurally Generating Fragmented Meshes for Games (15 minutes), by Phil Knight (Lead Programmer, Avalanche Software – a division of Disney Interactive Studios): Phil has 13 years of game development experience, working most recently on Cars 2, Toy Story 3, and Bolt, and previously on the Links and Amped series. His talk will cover a procedural technique for automatically generating fragmented meshes, especially useful for modeling large explosions with lots of fragmentation and debris. Besides detailing the technique itself, Phil will also describe the fragmentation tool (‘Frag’) which implements it, and its use in game production at Disney Interactive Studios.
  • Accelerating Rigid Body Simulation and Fracture Using the GPU (30 minutes), by Takahiro Harada (Researcher, AMD): Takahiro Harada has performed research and development into physics simulation at The University of Tokyo and Havok as well as his current position at AMD (where he focuses on the use of GPU computing for physics simulation). He will present a GPU-based rigid body simulation which can be used to quickly simulate the large numbers of rigid bodies typically created by object destruction. The talk starts with an overview of the simulation and proceeds to the detailed GPU implementation of each stage of the simulation.
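
The fragment clustering idea described above (sets of geometry start out as a single rigid body and break apart during the simulation) can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not ILM’s system or anything from Bullet: fragments are joined by bonds with a strength, a bond breaks when the impulse across it exceeds that strength, and each group of still-connected fragments is treated as one rigid body. The function names and the bond/impulse representation are hypothetical.

```python
def connected_clusters(num_fragments, bonds):
    """Group fragments into clusters joined by surviving bonds (union-find)."""
    parent = list(range(num_fragments))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in bonds:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = {}
    for i in range(num_fragments):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def break_bonds(bonds, strengths, impulses):
    """Keep only bonds whose impulse this step stays within their strength."""
    return [bond for bond in bonds
            if impulses.get(bond, 0.0) <= strengths[bond]]


# Four fragments glued in a chain: 0-1-2-3.
bonds = [(0, 1), (1, 2), (2, 3)]
strengths = {bond: 10.0 for bond in bonds}

before = connected_clusters(4, bonds)

# A (made-up) collision puts a large impulse across the middle bond.
impulses = {(1, 2): 25.0}
surviving = break_bonds(bonds, strengths, impulses)
after = connected_clusters(4, surviving)

print(before, after)
```

In this example the chain starts as one cluster and splits into two once the middle bond fails; a real system would then hand each cluster to the rigid body solver as a separate body.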

PhysBAM: Physically Based Simulation

Like the previous course, this one is targeted at physics simulation and has strong ties to film production. However, its structure is very different; instead of covering a variety of production examples, it focuses on one code library – PhysBAM, initially developed by Ronald Fedkiw and continued by him and many others at Stanford. PhysBAM is used by many VFX and feature animation houses including ILM, Disney Animation, and Pixar; large portions were recently released under an open-source license. This course is presented by Craig Schroeder (PhD Student, Stanford Computer Science Department); it will cover information on the PhysBAM library release: how to obtain the source code, set up the library, and use it to run example smoke and water simulations, as well as descriptions of visualization and rendering tools included in the release. In addition to the PhysBAM library, the course will explain the underlying techniques that make these simulations possible, in particular level set methods such as fast marching, fast sweeping, and the particle level set method. It will also address the important aspects of a fluid simulation, including advection, viscosity, and projection.
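
For readers unfamiliar with the term, advection simply means carrying a quantity (smoke density, a level set function) along with the flow. The sketch below shows one common way to do it, a semi-Lagrangian step, on a periodic 1D grid with constant velocity. It is a generic textbook-style illustration in Python and does not use or reflect PhysBAM’s API; the function name and parameters are my own.

```python
import math


def advect_semi_lagrangian(phi, velocity, dx, dt):
    """One semi-Lagrangian advection step on a periodic 1D grid.

    For each grid point, trace backwards along the (constant) velocity to
    find where the value came from, then linearly interpolate the old field.
    """
    n = len(phi)
    out = []
    for i in range(n):
        x = i - velocity * dt / dx      # departure point, in grid units
        i0 = math.floor(x)
        t = x - i0                      # interpolation weight in [0, 1)
        a = phi[i0 % n]                 # periodic wrap-around
        b = phi[(i0 + 1) % n]
        out.append((1.0 - t) * a + t * b)
    return out


# A single "blob" of density in cell 2.
phi = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Moving exactly one cell per step shifts the blob without smearing.
shifted = advect_semi_lagrangian(phi, velocity=1.0, dx=1.0, dt=1.0)

# Moving half a cell spreads it over two cells (numerical diffusion).
smeared = advect_semi_lagrangian(phi, velocity=1.0, dx=1.0, dt=0.5)

print(shifted)
print(smeared)
```

The half-cell case shows the smearing that simple interpolation introduces, which is one motivation for more careful schemes such as the particle level set method mentioned above.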

There are 12 courses left to cover; I’ll do so over my next few blog posts.