CIC traditionally includes a strong course program, with a two-day course on fundamentals (a DVD of this course presented by Dr. Hunt can be purchased online) and a series of short courses on more specialized topics. Since I attended the fundamentals course last year, this year I only went to short courses. This blog post will detail three of these courses, with the others covered by a future post.
Color Pipelines for Computer Animated Features
The animated feature pipeline has many steps: Story, Art, Layout, Animation, Shading, Lighting, Mastering, and Exhibition. Some of these are color-critical and some aren’t; the people working on the color-critical stages are the ones with color-critical monitors on their desks. Rod’s talk went through the color-critical stages of the pipeline (Art, Shading, Lighting, Mastering, and Exhibition), discussing related topics on the way.
In the Art stage, people look at reference photos, establish color palettes, and do look development. Accurate color is important. Often, general studies are done on how exteriors, characters, etc. might look. This is mostly done in Photoshop on a Mac.
Art is the first stage where people make color-critical images. In general, all images made in animated feature production exist for one of two reasons – for looking at directly, or to be used for making more images (e.g., textures). The requirements for image processing will vary depending on which group they belong to. During the Art stage the images generated are intended for viewing.
Images for viewing can be quantized as low as 8 bits per channel, and even (carefully) compressed. Pixel values tend to be encoded to the display device (output referred). In the absence of a color management system, the encoding just maps to frame buffer values, which feed into a display response curve. However, it is better to tag the image with an assumed display device (ICC tagging to a target like sRGB; other metadata attributes can be stored with the image as well). It’s important to minimize color operations done on such images, since they have already been quantized and have no latitude for processing. These images contain low dynamic range (LDR) data.
During the Art phase, images are typically displayed on RGB additive displays calibrated to specific reference targets. Display reference targets include specifications for properties such as the chromaticity coordinates of the RGB primaries and white point, the display response curve, the display peak white luminance and the contrast ratio or black level.
Shading and antialiasing operations need to occur on linear light values – values that are proportional to physical light intensity. Other operations that require linear values include resizing, alpha compositing, and filtering. Rendered buffers are written out as HDR values and later used to generate the final image.
Lighting is sometimes done with special light preview software, and sometimes using other methods such as “light soloing”. “Light soloing” is a common practice where a buffer is written out for the contribution of each light in the scene (all other lights are set to black) and then the lighters can use compositing software to vary individual light colors and intensities and combine the results.
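Light soloing works because light transport is linear: the final image is the sum of each light's contribution, so scaling a solo buffer is equivalent to changing that light's color or intensity. The following is a minimal sketch of that recombination, assuming linear RGB buffers stored as lists of tuples; the function name and the per-light gain representation are hypothetical, not Pixar's tooling.

```python
def combine_solo_buffers(solo_buffers, gains):
    """Sum per-light buffers, each scaled by an RGB gain chosen by the lighter.

    solo_buffers: one buffer per light, each a list of linear (r, g, b) pixels.
    gains: one (r, g, b) multiplier per light.
    """
    num_pixels = len(solo_buffers[0])
    result = [(0.0, 0.0, 0.0)] * num_pixels
    for buffer, (gain_r, gain_g, gain_b) in zip(solo_buffers, gains):
        result = [
            (r0 + r * gain_r, g0 + g * gain_g, b0 + b * gain_b)
            for (r0, g0, b0), (r, g, b) in zip(result, buffer)
        ]
    return result


# Two lights, two pixels each, in linear scene values (made-up numbers).
key_light = [(1.0, 0.8, 0.6), (0.2, 0.2, 0.2)]
fill_light = [(0.1, 0.1, 0.3), (0.4, 0.4, 0.5)]

# Double the key light and tint the fill light toward red.
image = combine_solo_buffers(
    [key_light, fill_light],
    [(2.0, 2.0, 2.0), (1.0, 0.5, 0.5)],
)
```

This only holds for buffers in linear light; applying the same gains to display-encoded values would not match a re-render.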
For images such as these “solo light buffers”, which are used to assemble viewable images, Pixar uses the OpenEXR format. This format stores linear scene values with a logarithmic distribution of numbers – each channel is a 16-bit half-float. The range of possible finite values is -65504.0 to +65504.0. The positive range can be thought of as 32 stops (powers of 2) of data, with 1024 steps in each of the stops.
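The half-float numbers can be checked directly with Python's standard `struct` module, which supports IEEE half precision via the `e` format code. This is a small sketch for illustration; the helper names are hypothetical. The "32 stops" figure corresponds to the 32 codes of the 5-bit exponent field (in IEEE half precision two of those codes are reserved, one for zero and subnormals and one for infinity and NaN).

```python
import struct


def half_from_bits(bits):
    """Interpret a 16-bit integer as an IEEE half-precision float."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]


def bits_from_half(value):
    """Return the 16-bit pattern of a value stored as a half float."""
    return struct.unpack('<H', struct.pack('<e', value))[0]


# Largest finite half: sign 0, exponent 11110, mantissa all ones.
max_half = half_from_bits(0x7BFF)

# Number of representable steps in one stop, here from 1.0 up to 2.0.
steps_per_stop = bits_from_half(2.0) - bits_from_half(1.0)
```

Because the 10-bit mantissa restarts at every power of two, each stop gets the same 1024 steps regardless of how bright or dark it is.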
After images are generated, they need to be viewed. This is done in various review spaces: monitors (CRT or calibrated LCD) on people’s desks, as well as various special rooms (review rooms, screening rooms, grading suites) where images are typically shown on DLP projectors. In review rooms the projector is usually hooked up directly to a workstation, while screening rooms use special digital cinema playback systems or “dailies” software. Pixar try not to have any monitors in the screening rooms – screening rooms are dark and the monitors are intended (and calibrated) for brighter rooms.
The mastering process includes in-house color grading. This covers two kinds of operations: shot-to-shot corrections and per-master operations. An example of a shot-to-shot correction: in “Cars” in one of the shots the grass ended up being a slightly different color than in other shots in the sequence – instead of re-rendering the shot, it was graded to make the grass look more similar to the other shots. In contrast, per-master operations are done to make the film fit a specific presentation format.
Mastering for film: film has a different gamut than digital cinema projection. Neither is strictly larger – each has colors the other can’t handle. Digital is good for bright, saturated colors, especially primary colors – red, green, and blue. Film is good for dark, saturated colors, especially secondary colors – cyan, magenta, and yellow. Pixar doesn’t generate any film gamut colors that are outside the digital projection gamut, so they just need to worry about the opposite case – mapping colors from outside the film gamut so they fit inside it, and previewing the results during grading. Mapping into the film gamut is complex. Pixar try to move colors that are already in-gamut as little as possible (the ones near the gamut border do need to move a little to “make room” for the remapped colors). For the out-of-gamut colors, first Pixar tried a simple approach – moving to the closest point in the gamut boundary. However, this method doesn’t preserve hue. An example of the resulting problems: in the “Cars” night scene where Lightning McQueen and Mater go tractor-tipping, the closest-point gamut mapping made Lightning McQueen’s eyes go from blue (due to the night-time lighting) to pink, which was unacceptable. Pixar figured out a proprietary method which involves moving along color axes. This sometimes changes the chroma or lightness quite a bit, but tends to preserve hue and is more predictable for the colorist to tweak if needed. For film mastering Pixar project the content in the P3 color space (originally designed for digital projection), but with a warmer white point more typical of analog film projection.
Mastering for digital cinema: color grading for digital cinema is done in a tweaked version of the P3 color space – instead of using the standard P3 white point (which is quite greenish) they use D65, which is the white point people have been using on their monitors while creating the content. Finally a Digital Cinema Distribution Master (DCDM) is created – this stores colors in XYZ space, encoded at 12 bits per channel with a gamma of 2.6.
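As a rough illustration of the DCDM encoding, the sketch below maps an absolute XYZ component (in cd/m²) to a 12-bit code value with a 2.6 gamma. The normalizing constant of 52.37 and the use of rounding are assumptions taken from the DCI specification rather than from the talk, and the function name is hypothetical.

```python
def dcdm_encode(value, peak=52.37, bits=12, gamma=2.6):
    """Encode one XYZ component as an integer code value.

    value: absolute component value in cd/m^2.
    peak:  normalizing constant (assumed, per the DCI specification).
    """
    code_max = (1 << bits) - 1            # 4095 for 12 bits
    normalized = min(max(value / peak, 0.0), 1.0)
    return round(code_max * normalized ** (1.0 / gamma))


# Reference white luminance of 48 cd/m^2 (14 foot-Lamberts).
white_code = dcdm_encode(48.0)
```

The headroom between the reference white code and 4095 exists because X and Z can exceed the luminance value for some white points.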
Mastering for HD (Blu-ray and HDTV broadcast): color grading for HD is done in the standard Rec.709 color space. The Rec.709 green and red primaries are much less saturated than the P3 ones; the blue primary has similar saturation to the P3 blue but is darker. The HD master is stored in RGB, quantized to 10 bits. Rod talked about the method Pixar use for dithering during quantization – it’s an interesting method that might be relevant for games as well. The naïve approach would be to round to the closest quantized value. This is the same as adding 0.5 and rounding down (truncating). Instead of adding 0.5, Pixar add a random number distributed uniformly between 0 and 1. This gives the same result on average, but dithers away a lot of the banding that would otherwise result.
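The dithering idea described above can be sketched in a few lines. This is a minimal illustration, assuming input values normalized to [0, 1] and Python's standard `random` module as the noise source; the function names are hypothetical and the choice of random number generator in production is not something the talk specified.

```python
import math
import random


def quantize_round(value, levels):
    """Naive quantization: add 0.5 and truncate (round to nearest)."""
    return math.floor(value * levels + 0.5)


def quantize_dither(value, levels, rng=random):
    """Dithered quantization: add uniform noise in [0, 1) and truncate."""
    return math.floor(value * levels + rng.random())


levels = 1023            # 10-bit code values run from 0 to 1023
value = 0.50024          # falls between codes 511 and 512

rng = random.Random(0)   # seeded only to make the example repeatable
samples = [quantize_dither(value, levels, rng) for _ in range(20000)]
mean_code = sum(samples) / len(samples)
```

Every pixel with this value rounds to the same code under `quantize_round`, which is what produces banding in smooth gradients. Under `quantize_dither` the pixels are split between the two neighboring codes in proportion to the fractional part, so the local average preserves the original value.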
Exhibition for digital cinema: this uses a Digital Cinema Package (DCP) in which each frame is compressed using JPEG2000. The compression is capped to 250 megabits per second – this limit was set during the early days of digital cinema, and any “extra features” such as stereo 3D, 4K resolution, etc. still have to fit under the same cap.
Exhibition for HD (Blu-ray, HDTV broadcast): the 10-bit RGB master is converted to YCbCr, chroma subsampled (4:2:2) and further quantized to 8 bits. This is all done with careful dithering, just like the initial 10-bit quantization. MPEG-4 AVC compression is used for Blu-ray, with a 28-30 megabits per second average bit rate, 34 megabits per second peak.
Disney’s Digital Color Workflow – Featuring “Tangled”
The second part of the course was presented by Stefan Luka, a senior color science engineer at Walt Disney Animation Studios. Disney uses various display technologies, including CRT, LCD and DLP projectors. Each display has a gamut that defines the range of colors it can show. Disney previously used CRT displays, which have excellent color reproduction but are unstable over time and have a limited gamut. They now consider LCD color reproduction to finally be good enough to replace CRTs (several in the audience disputed this), and primarily use HP Dreamcolor LCD monitors. These are very stable, can support wide gamuts (due to their RGB LED backlights), and include programmable color processing.
Disney considered using Rec.709 calibration for the working displays, but the artists really wanted P3-calibrated displays, mostly to see better reds. Rec.709’s red primary is a bit orangish – P3’s red primary is very pure, it’s essentially on the spectral locus. Disney calibrate the displays with P3 primaries, a D65 white point, and a 2.2 gamma (which Stefan says matches the CRTs used at that time). The viewing environment in the artists’ rooms is not fully controlled, but the lighting is typically dim.
Disney calibrate their displays by mounting them in a box lined with black felt in front of a spectroradiometer. They measure the primaries and ramps on each channel to build lookup tables. For software Disney use a custom-tweaked version of a tool from HP called “Ookala” (the original is available on SourceForge). When calibrating they make sure to let the monitor warm up first, since LEDs are temperature dependent. The HP DreamColor has a temperature sensor which can be queried electronically, so this is easy to verify before starting calibration. Disney uses a spectroradiometer for calibration – Stefan said that colorimeters are generally not good enough to calibrate a display like this, though perhaps the latest one from X-Rite (the i1Display Pro) could work. Only people doing color-critical work have DreamColor monitors – Disney couldn’t afford to give them to everyone. People with non-color-critical jobs use cheaper displays.
During “Tangled” production, the texture artists painted display encoded RGB, saved as 16-bit (per channel) TIFF or PSD. They used sRGB encoding (managed via ICC or external metadata/LUT) since it makes the bottom bits go through better than a pure power curve. Textures were converted to linear RGB for rendering. Rendering occurred in linear light space; the resulting images had a soft roll-off applied to the highlights and were written to 16-bit TIFF (if they were saving to OpenEXR – which they plan to do for future movies – they wouldn’t have needed to roll-off the highlights). Compositing inputs and final images were all 16-bit TIFFs.
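To see why the sRGB curve treats the darkest code values better than a pure power curve, compare the two decodings near black. The sketch below uses the standard sRGB decoding formula (a linear segment below 0.04045, then a 2.4 exponent with offset) against a plain 2.2 power; the function names are hypothetical.

```python
def srgb_to_linear(v):
    """Decode a display-encoded sRGB value in [0, 1] to linear light."""
    if v <= 0.04045:
        return v / 12.92                     # linear segment near black
    return ((v + 0.055) / 1.055) ** 2.4


def gamma22_to_linear(v):
    """Decode using a pure 2.2 power curve."""
    return v ** 2.2


# Compare the lowest few 8-bit code values under each curve.
for code in (1, 2, 4, 8):
    v = code / 255.0
    print(code, srgb_to_linear(v), gamma22_to_linear(v))
```

With the pure power curve the lowest codes decode to values extremely close to zero, so they carry almost no usable information; the linear segment of sRGB spreads them evenly instead.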
During post production final frames are conformed and prepared for grading. The basic grade is done for digital cinema, with trim passes for film, stereoscopic, and HD.
The digital cinema grade is done in a reference room with a DLP projector using P3 primaries, D65 white point, 2.2 gamma, and 14 foot-Lamberts reference white. The colorist uses “video” style RGB grading controls, and the result is encoded in 12-bit XYZ space with 2.6 gamma, dithered, and compressed using JPEG2000.
For the film deliverable, Disney adjust the projector white point and view the content through the same film gamut mapping that Pixar uses. They then do a trim pass. White point compensation is also needed; the content was previously viewed at D65 but needs to be adjusted for the native D55 film white point to avoid excessive brightness loss. A careful process needs to be done to bridge the gap between the two white points. At the output, film gamut mapping as well as an inverse film LUT is applied to go from the projector-previewed colors to values suitable for writing to film negative. Finally, Disney review the content at the film lab and call printer lights.
Stereo digital cinema – luminance is reduced to 4.5 foot-Lamberts (in the field there will be a range of stereo luminances; Disney make an assumption here that 4.5 is a reasonable target). They do a trim pass, boosting brightness, contrast, and saturation to compensate for the greatly reduced luminance. The colorist works with one stereo eye at a time (working with stereo glasses constantly would cause horrible headaches). Afterwards the result is reviewed with glasses, then output and encoded in the same way as the mono digital cinema deliverable.
HD mastering – Disney also use a DLP projector for HD, but view it through a Rec.709 color-space conversion and with reference white set to 100 nits. They do a trim pass (mostly global adjustments needed due to the increase in luminance), output and bake the values into Rec.709 color space. Then Disney compress and review final deliverables on an HD monitor in a correctly set up room with proper backlight etc.
After finishing “Tangled”, Disney wanted to determine whether it was really necessary for production to work in P3; could they instead work in Rec.709 and have the colorist tweak the digital cinema master to the wider P3 gamut? Stefan said that this question depends on the distribution of colors in a given movie, which in turn depends a lot on the art direction. Colors can go out of gamut due to saturation, or due to brightness, or both. Stefan analyzed the pixels that went out of Rec.709 gamut throughout “Tangled”. Most of the out-of-gamut colors were due to brightness – most importantly flesh tones. A few other colors went out of gamut due to saturation: skies, forests, dark burgundy velvet clothing on some of the characters, etc.
Stefan showed four example frames on a DreamColor monitor, comparing images in full P3 with the same images gamut-mapped to Rec.709. Two of the four barely changed. Of the remaining two, one was a forest scene with a cyan fog in the background which shifted to green when gamut-mapped. Another shot, with glowing hair, had colors out of Rec.709 gamut due to both saturation & brightness.
At the end of the day, the artists weren’t doing anything in P3 that couldn’t have been produced at the grading stage, so Stefan doesn’t think doing production in P3 had much of a benefit. P3 was mostly used to boost brightness, so working in 709 space with additional headroom (e.g. OpenEXR) would be good enough.
After “Tangled”, Disney moved from 16-bit TIFFs to OpenEXR, helped by their recent adoption of Nuke (which has fast floating-point compositing – “Tangled” was composited on Shake). They also eliminated the sRGB encoding curve, and now just use a 2.2 gamma without any LUTs. Disney no longer need to do a soft roll off of highlights when rendering since OpenEXR can contain the full highlight detail. They are doing some experiments with HDR tone mapping, especially tweaking the saturation. Disney have also moved to working in Rec.709 instead of P3 for production (for increased compatibility between formats) and are using non-wide-gamut monitors (still HP, but not DreamColor).
In the future, Disney plan to do more color management throughout the pipeline, probably using the open-source OpenColorIO library. They also plan to investigate improvements in gamut mapping, including local contrast preservation (taking account of which colors are placed next to each other spatially, and not collapsing them to the same color when gamut mapping).
Color in High-Dynamic Range Imaging
This course was presented by Greg Ward. Greg is a major figure in the HDR field, having developed various HDR image formats (LogLuv TIFF and JPEG-HDR, as well as the first HDR format, RGBE), the first widely-used HDR rendering system (RADIANCE), and the first commercially available HDR display, as well as various pieces of software relating to HDR (including the Photosphere HDR image builder and browsing program). He’s also done important work on reflectance models, but that’s outside the scope of this course.
HDR Color Space and Representations
Images can be scene-referred (data encodes scene intensities) or output-referred (data encodes display intensities). Since human visual abilities are (pretty much) known, and future display technologies are mostly unknown, scene-referred images are more useful for long-term archival. Output-referred images are useful in the short term, for a specific class of display technology. Human perceptual abilities can be used to guide color space encoding of scene-referred images.
The human visual system is sensitive to luminance values over a range of about 1:10¹⁴, but not in a single image. The human simultaneous range is about 1:10,000. The range of sRGB displays is about 1:100.
The HDR imaging approach is to render or capture floating-point data in a color space that can store the entire perceivable gamut. Post-processing is done in the extended color space, and tone mapping is applied for each specific display. This is the method adopted in the Academy Color Encoding Specification (ACES) used for digital cinema. Manipulation of HDR data is much preferred because then you can adjust exposure and do other types of image manipulation with good results.
HDR imaging isn’t new – black & white film can hold at least 4 orders of magnitude, and the final print has much less. Much of the talent of photographers like Ansel Adams was darkroom technique – “dodging” and “burning” to bring out the dynamic range of the scene on paper. The digital darkroom provides new challenges and opportunities.
Camera RAW is not HDR; the number of bits available is insufficient to encode HDR data. A comparison of several formats which are capable of encoding HDR follows (using various metrics, including error on an “acid test” image covering the entire visible gamut over a 1:10⁸ dynamic range).
- Radiance RGBE & XYZE: a simple format (three 8-bit mantissas and one 8-bit shared exponent) with open source libraries. Supports lossless (RLE) compression (20% average compression ratio). However, it does not cover the visible gamut, the large dynamic range comes at the expense of accuracy, and the color quantization is not perceptually uniform. RGBE had visible error on the “acid test” image; XYZE performed much better but still had some barely perceptible error.
- IEEE 96-bit TIFF (IEEE 32-bit float for each channel) is the most accurate representation, but the files are enormous (even with compression – 32-bit IEEE floats don’t compress very well).
- 16-bit per channel TIFF (RGB48) is supported by Photoshop and the TIFF libraries including libTIFF. 16 bits each of gamma-compressed R G and B; LZW lossless compression is available. However, does not cover the visible gamut, and most applications interpret the maximum as “white”, turning it into a high-precision LDR format rather than an HDR format.
- SGI 24-bit LogLuv TIFF Codec: implemented in libTIFF. 10-bit log luminance, and a 14-bit lookup into a ‘rasterized human gamut’ in CIE (u’,v’) space. It just covers the visible gamut and range, but the dynamic range doesn’t leave headroom for processing and there is no compression support. Within its dynamic range limitations, it had barely perceptible errors on the “acid test” image (but failed completely outside these limits).
- SGI 32-bit LogLuv TIFF Codec: also in libTIFF. A sign bit, 16-bit log luminance, and 8 bits each for CIE (u’,v’). Supports lossless (RLE) compression (30% average compression). It had barely perceptible errors on the “acid test” image.
- ILM OpenEXR Format: 16-bit float per primary (sign bit, 5-bit exponent, 10-bit mantissa). Supports alpha and multichannel images, as well as several lossless compression options (2:1 typical compression – compressed sizes are competitive with other HDR formats). Has a full-featured open-source library as well as massive support by tools and GPU hardware. The only reasonably-sized format (i.e. excluding 96-bit TIFF) which could represent the entire “acid test” image with no visible error. However, it is relatively slow to read and write. Combined with CTL (Color Transformation Language – a similar concept to ICC, but designed for HDR images), OpenEXR is the foundation of the Academy of Motion Picture Arts & Sciences’ IIF (Image Interchange Framework).
- Dolby’s JPEG-HDR (one of Greg’s projects): backwards-compatible JPEG extension for HDR. A tone-mapped sRGB image is stored for use by naïve (non-HDR-aware) applications; the (monochrome) ratio between the tone-mapped luminance and the original HDR scene luminance is stored in a subband. JPEG-HDR is very compact: about 1/10 the size of the other formats. However, it only supports lossy encoding (so repeated I/O will degrade the image) and has an expensive three-pass writing process. Dolby will soon release an improved version of JPEG-HDR on a trial basis; the current version is supported by a few applications, including Photoshop (through a plugin – not natively) and Photosphere (which will be detailed later in the course).
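To make the shared-exponent idea behind the Radiance RGBE format in the list above concrete, here is a minimal sketch of encoding and decoding a single pixel. It follows the commonly published approach of taking the exponent from the largest component via `frexp`; the function names are hypothetical, non-negative inputs are assumed, and the decoder reconstructs at the bottom of each quantization interval, which is one of several conventions in use.

```python
import math


def float_to_rgbe(r, g, b):
    """Pack three non-negative floats into (R, G, B, E) bytes."""
    largest = max(r, g, b)
    if largest < 1e-32:
        return (0, 0, 0, 0)
    mantissa, exponent = math.frexp(largest)   # largest = mantissa * 2**exponent
    scale = mantissa * 256.0 / largest
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)


def rgbe_to_float(r, g, b, e):
    """Unpack (R, G, B, E) bytes back into three floats."""
    if e == 0:
        return (0.0, 0.0, 0.0)
    factor = math.ldexp(1.0, e - (128 + 8))
    return (r * factor, g * factor, b * factor)


balanced = float_to_rgbe(1.0, 0.5, 0.25)
saturated = float_to_rgbe(1000.0, 3.0, 0.01)
```

The second pixel illustrates the accuracy trade-off mentioned in the list: because all three channels share the exponent of the largest one, the much smaller green and blue components quantize to zero.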
HDR Capture and Photosphere
Standard digital cameras capture about 2 orders of magnitude in sRGB space. Using multiple exposures enables building up HDR images, as long as the scene and camera are static. In the future, HDR imaging will be built directly into camera hardware, allowing for HDR capture with some amount of motion.
Multi-exposure merge works by using a spatially-variant weighting function that depends on where the values sit within each exposure. The camera’s response function needs to be recovered as well.
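A heavily simplified sketch of the merge for a single pixel follows. It assumes the camera response is already linear (so the response-recovery step is skipped), pixel values are normalized to [0, 1], and a simple "hat" function is used as the weight; all of these are illustrative assumptions rather than what Photosphere actually does, and the function names are hypothetical.

```python
def hat_weight(pixel_value):
    """Weight mid-range values most; clipped values (0 or 1) get zero weight."""
    return 1.0 - abs(2.0 * pixel_value - 1.0)


def merge_exposures(pixel_values, exposure_times):
    """Estimate relative scene radiance for one pixel from several exposures.

    pixel_values:   the pixel's value in each exposure, in [0, 1].
    exposure_times: the matching exposure times in seconds.
    Returns None if the pixel is clipped in every exposure.
    """
    numerator = 0.0
    denominator = 0.0
    for value, time in zip(pixel_values, exposure_times):
        weight = hat_weight(value)
        numerator += weight * (value / time)   # per-exposure radiance estimate
        denominator += weight
    if denominator == 0.0:
        return None
    return numerator / denominator


# A pixel with relative radiance 2.0, shot at 1/8 s, 1/4 s and 1 s.
# The longest exposure clips at 1.0 and is ignored by the weighting.
radiance = merge_exposures([0.25, 0.5, 1.0], [0.125, 0.25, 1.0])
```

The weighting is what makes the merge robust: each exposure only contributes where its values are well exposed, so clipped highlights and noisy shadows are taken from a different frame.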
The Photosphere application (available online) implements the various algorithms discussed in this section. Exposures need to be aligned – Photosphere does this by generating median threshold bitmaps (MTBs) which are constant across exposures (unlike edge maps). MTBs are generated based on a grayscale image pyramid version of the original image; alignments are propagated up the pyramid. Rotational as well as translational alignments are supported. This technique was published by Greg in a 2003 paper in the Journal of Graphics Tools.
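The key property of a median threshold bitmap is that it stays the same when the exposure changes, as long as nothing clips. The sketch below shows only that core step on a tiny grayscale image; the image pyramid, the shift search, and the handling of pixels close to the median in the published method are omitted, and the function names are hypothetical.

```python
import statistics


def median_threshold_bitmap(gray):
    """Return a bitmap with 1 where a pixel is brighter than the image median."""
    median = statistics.median(v for row in gray for v in row)
    return [[1 if v > median else 0 for v in row] for row in gray]


def bitmap_difference(a, b):
    """Count pixels where two bitmaps disagree (an XOR popcount)."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


dark = [[10, 20, 30],
        [40, 50, 60],
        [70, 80, 90]]
# The same scene exposed four times longer (no clipping in this toy case).
bright = [[4 * v for v in row] for row in dark]

mismatch = bitmap_difference(median_threshold_bitmap(dark),
                             median_threshold_bitmap(bright))
```

An alignment search would evaluate `bitmap_difference` for candidate offsets between two exposures and keep the offset with the fewest mismatches.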
Photosphere also automatically removes “ghosts” (caused by objects which moved between exposures) and reconstructs an estimate of the point-spread function (PSF) for glare removal.
Greg then gave a demo of the new Windows version of Photosphere, including its HDR image browsing and cataloging abilities. Its merging capabilities also include the unique option of outputting absolute HDR values for all pixels, if the user inputs an absolute value for a single patch (this would typically be a grey card measured by a separate device). This only needs to be done once per camera.
HDR capture can also provide lighting for rendering: take an HDR (bracketed exposure) image of a mirrored ball and use it for lighting. A background plate is used to fill in the “pinched” region at the back of the ball. Synthetic objects are then rendered with the captured lighting and composited into the real scene, with optional addition of shadows. Greg’s description of HDR lighting capture is a bit out of date – most VFX houses no longer use mirrored balls for this (they still use them for reference); instead, panoramic cameras or DSLRs with a nodal mount are typically used.
Tone-Mapping and Display
A renderer is like an ideal camera. Tone mapping is medium-specific and goal-specific. The user needs to consider display gamut, dynamic range, and surround. What do we wish to simulate – cinematic camera and film, or human visual abilities and disabilities? Possible goals include colorimetric reproduction, matching visibility, or optimizing contrast & color sensitivity.
Histogram tone-mapping is a technique that generates a histogram of log luminance for the scene, and creates a curve that redistributes luminance to fit the output range.
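A bare-bones version of this idea is histogram equalization on log luminance, sketched below. This is a simplification for illustration: it omits the contrast ceiling and human-vision modeling that published histogram tone-mapping operators add, it outputs normalized display values rather than display luminances, and the function name and parameters are hypothetical.

```python
import math


def histogram_tone_map(luminances, bins=64, out_min=0.0, out_max=1.0):
    """Map scene luminances to display values via a log-luminance histogram."""
    logs = [math.log10(max(lum, 1e-6)) for lum in luminances]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [out_max] * len(logs)
    width = (hi - lo) / bins

    def bin_index(v):
        return min(int((v - lo) / width), bins - 1)

    # Histogram of log luminance.
    counts = [0] * bins
    for v in logs:
        counts[bin_index(v)] += 1

    # Cumulative distribution: the tone curve sampled at each bin.
    cdf = []
    running = 0
    for count in counts:
        running += count
        cdf.append(running / len(logs))

    return [out_min + (out_max - out_min) * cdf[bin_index(v)] for v in logs]


scene = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]
display = histogram_tone_map(scene)
```

Because the curve is a cumulative distribution, heavily populated luminance ranges receive more of the output range while sparsely populated ranges are compressed.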
Greg discussed various other tone mapping methods. He mentioned a SIGGRAPH 2005 paper that used an HDR display to compare many different tone-mapping operators.
HDR Display Technologies
- Silicon Light Machines Grating Light Valve (GLV) – amazing dynamic range, widest gamut, still in development. Promising for digital cinema.
- Dolby Professional Reference Monitor PRM-4200. It’s an LED-based 42″ production unit based on technology that Greg worked on. He says this is extended dynamic range, but not true HDR (it goes up to 600 cd/m²).
- SIM2 Solar Series HDR display: this is also based on the (licensed) Dolby tech; Greg says this is closer to what Dolby originally had in mind. It’s a 47″ display with a 2,206-LED backlight that goes up to 4000 cd/m².
As an interesting example, Greg also discussed an HDR transparency (slide) viewer that he developed back in 1995 to evaluate tone mapping operators. It looks similar to a ViewMaster but uses much brighter lamps (50 Watts for each eye, necessitating a cooling fan and heat-absorbing glass) and two transparency layers – a black-and-white (blurry) “scaling” layer as well as a color (sharp) “detail” layer. Together these layers yield 1:10,000 contrast. The principles used are similar to other dual-modulator displays; the different resolution of the two layers avoids alignment problems. Sharp high-contrast edges work well despite the blurry scaling layer – scattering in the eye masks the artifacts that would otherwise result.
New displays based on RGB LED backlights have the potential to achieve not just high dynamic range but greatly expanded gamut – the new LEDs are spectrally pure and the LCD filters can select between them easily, resulting in very saturated primaries.
HDR Imaging in Cameras, Displays and Human Vision
The course was presented by Prof. Alessandro Rizzi from the Department of Information Science and Communication at the University of Milan. With John McCann, he co-authored the book “The Art and Science of HDR Imaging” on which this course is based.
The imaging pipeline starts with scene radiances generated from the illumination and objects. These radiances go through a lens, a sensor in the image plane, and sensor image processing to generate a captured image. This image goes through media processing before being shown on a print or display, to generate display radiances. These go through the eye’s lens and intraocular medium, form an image on the retina, which is then processed by the vision system’s image processing to form the final reproduction appearance. Prof. Rizzi went over HDR issues relating to various stages in the pipeline.
The dynamic range issue relates to the scene radiances. Is it useful to define HDR based on a specific threshold number for the captured scene dynamic range? No. Prof. Rizzi defines HDR as “a rendition of a scene with greater dynamic range than the reproduction media”. In the case of prints, this is almost always the case since print media has an extremely low dynamic range. Renaissance painters were the first to successfully do HDR renditions – example paintings were shown and compared to similar photographs. The paintings were able to capture a much higher dynamic range while still appearing natural.
A table was shown of example light levels, each listed with luminance in cd/m². Note that these values are all for the case of direct observation, e.g. “sun” refers to the brightness of the sun when looking at it directly (not recommended!) as opposed to looking at a surface illuminated by the sun (that is a separate entry).
- Xenon short arc: 200,000 – 5,000,000,000
- Sun: 1,600,000,000
- Metal halide lamp: 10,000,000 – 60,000,000
- Incandescent lamp: 20,000,000 – 26,000,000
- Compact fluorescent lamp: 20,000 – 70,000
- Fluorescent lamp: 5,000 – 30,000
- Sunlit clouds: 10,000
- Candle: 7,500
- Blue sky: 5,000
- Preferred values for indoor lighting: 50 – 500
- White paper in sun: 10,000
- White paper in 500 lux illumination (typical office lighting): 100
- White paper in 5 lux illumination (very dim lighting, similar to candle-light): 1
The next issue, range limits and quantization, refers to the “captured image” stage of the imaging pipeline. A common misconception is that the problem involves squeezing the entire range of intensities which the human visual system can handle, from starlight at 10⁻⁶ cd/m² to a flashbulb at 10⁸ cd/m², into the 1-100 cd/m² range of a typical display. The fact is that the 10⁻⁶ to 10⁸ cd/m² range is only obtainable with isolated stimuli – humans can’t perceive a range like that in a single image. Another common misconception is to think of precision and range as being linked; e.g. 8-bit framebuffers imply a 1:255 contrast. Prof. Rizzi used a “salami” metaphor – the size of the salami represents the dynamic range, and the number of slices represents the quantization. Range and precision are orthogonal.
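The salami metaphor can be put into numbers. The sketch below assumes code values spaced logarithmically across the chosen range (an illustrative choice, not something from the course) and reports the ratio between adjacent codes: the range sets the size of the salami, the bit depth sets the number of slices, and together they determine how thick each slice is. The function name is hypothetical.

```python
def step_ratio(dynamic_range, bits):
    """Ratio between adjacent code values for log-spaced codes.

    dynamic_range: ratio of the brightest to the darkest encoded value.
    bits:          number of bits per code value.
    """
    steps = (1 << bits) - 1
    return dynamic_range ** (1.0 / steps)


# The same 8 bits can span very different ranges, with coarser or finer slices.
coarse = step_ratio(10000.0, 8)   # 8 bits stretched over 1:10,000
fine = step_ratio(100.0, 8)       # 8 bits over 1:100
finer = step_ratio(100.0, 12)     # 12 bits over the same 1:100
```

An 8-bit encoding is therefore not limited to 1:255 contrast; it simply takes larger steps when asked to cover a larger range.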
In most cases, the scene has a larger dynamic range than the sensor does. So with non-HDR image acquisition you have to give up some dynamic range in the highlights, the shadows, or both. The “HDR idea” is to bracket multiple acquisitions with different exposures to obtain an HDR image, and then “shrink” during tone mapping. But how? Tone mapping can be general, or can take account of a specific rendering intent. Naively “squeezing” all the detail into the final image leads to the kind of unnatural “black velvet painting”-looking “HDR” images commonly found on the web.
As an example, the response of film emulsions to light can be mapped via a density-exposure curve, commonly called a Hurter-Driffield or “H&D” curve. These curves map negative density vs. log exposure. They typically show an s-shape with a straight-line section in the middle where density is proportional to log exposure, with a “toe” on the underexposed part and a “shoulder” on the overexposed part. In photography, exposure time should be adjusted so densities lie on the straight-line portion of the curve. With a single exposure, this is not possible for the entire scene – you can’t get both shadow detail and highlight detail, so in practice only midtones are captured with full detail.
History of HDR Imaging
Before the Chiaroscuro technique was introduced, it was hard to convey brightness in painting. Chiaroscuro (the use of strong contrasts between bright and dark regions) allowed artists to convey the impression of very high scene dynamic ranges despite the very low dynamic range of the actual paintings.
HDR photography dates back to the 1850s; a notable example being the photograph “Fading Away” by H. P. Robinson, which combined five exposures. In the early 20th century, C. E. K. Mees (director of research at Kodak) worked on implementing a desirable tone reproduction curve in film. Mees showed a two-negative photograph in his 1920 book as an example of desirable scene reproduction, and worked to achieve similar results with single-negative prints. Under Mees’ direction, the Kodak Research Laboratory found that an s-shaped curve produced pleasing image reproductions, and implemented it photochemically.
Ansel Adams developed the zone system around 1940 to codify a method for photographers to expose their images in such a way as to take maximum advantage of the negative and print film tone reproduction curves. Soon after, in 1941, L. A. Jones and H. R. Condit published an important study measuring the dynamic range of various real-world scenes. The range was between 27:1 and 750:1, with 160:1 being average. They also found that flare is a more important limit on camera dynamic range than the film response.
The Retinex theory of vision developed around 1967 from the observation that luminance ratios between adjacent patches are the same in the sun and the shade. While absolute luminances don’t always correspond to lightness appearance (due to spatial factors), the ratio of luminances at an edge does correspond strongly to the ratio in lightness appearance. Retinex processing starts with ratios of apparent lightness at all edges in the image and propagates these to find a global solution for the apparent lightness of all the pixels in the image. In the 1980s this research led to a prototype “Retinex camera” which was actually a slide developing device. Full-resolution digital electronics was not feasible, so a low-resolution (64×64) CCD was used to generate a “correction mask” which modulated a low-contrast photographic negative during development. This produced a final rendering of the image which was consistent with visual appearance. The intent was to incorporate this research in a Polaroid instant camera but this product never saw the light of day.
Measuring the Dynamic Range
The sensor’s dynamic range is limited but slowly getting better – Prof. Rizzi briefly went over some recent research into HDR sensor architectures.
Given limited digital sensor dynamic range, multiple exposures are needed to capture an HDR image. This can be done via sequential exposure change, or by using multiple image detectors at once.
There have been various methods developed for composing the exposures. Before Paul Debevec’s 1997 paper “Recovering High Dynamic Range Radiance Maps from Photographs”, the emphasis was on generating pleasing pictures. From 1997 on, research focused primarily on accurately measuring scene radiance values. Combined with recent work on HDR displays, this holds the potential of accurate scene reproduction.
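As a rough illustration of what composing exposures for radiance measurement involves, here is a minimal sketch that assumes a perfectly linear sensor which clips at 1.0. That assumption is mine: Debevec's method also recovers the camera response curve, which this sketch skips entirely. The weighting function, its cutoffs, and the simulated scene values are hypothetical.

```python
def hat_weight(value, low=0.05, high=0.95):
    """Trust mid-range pixel values; ignore near-black and near-clipped ones."""
    if value <= low or value >= high:
        return 0.0
    mid = 0.5 * (low + high)
    return 1.0 - abs(value - mid) / (mid - low)

def merge_exposures(exposures):
    """exposures: list of (exposure_time, pixel_values) -> relative radiances."""
    n = len(exposures[0][1])
    radiance = []
    for i in range(n):
        num = den = 0.0
        for t, pixels in exposures:
            w = hat_weight(pixels[i])
            num += w * pixels[i] / t  # divide by time to get relative radiance
            den += w
        # None marks pixels that were never well exposed in any frame.
        radiance.append(num / den if den > 0 else None)
    return radiance

def capture(scene, t):
    """Simulate a linear sensor that clips at 1.0 (an idealization)."""
    return [min(1.0, r * t) for r in scene]

scene = [0.002, 0.05, 1.0, 40.0]          # 20,000:1 range, made-up values
times = [0.01, 0.1, 1.0, 10.0, 100.0]
exposures = [(t, capture(scene, t)) for t in times]
merged = merge_exposures(exposures)
print(merged)
```

Note that this idealized simulation has no glare at all, which is exactly the factor discussed next.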
However, veiling glare is a physical limit on HDR image acquisition and display. At acquisition time, glare consists of light scattered inside the camera: air-glass reflections at the various lens elements, camera wall reflections, sensor surface reflections, etc. The effect of glare on the lighter regions of the image is small, but darker regions are affected much more strongly, which limits the overall contrast (dynamic range).
Prof. Rizzi described an experiment which measured the degree to which glare limits HDR acquisition, for both digital and film cameras. A test target with a dynamic range of almost 19,000:1 was assembled from Kodak Print Scale step-wedges (circles divided into 10 wedges which transmit different amounts of light, ranging from 4% to 82%) and neutral density filters. This target was photographed against different surrounds to vary the amount of glare.
In moderate-glare scenes, glare reduced the dynamic range at the sensor or film image plane to less than 1,000:1; in high-glare scenes, to less than 100:1. This limited the range that could be measured via multiple digital exposures (negative film has more dynamic range – about 10,000:1 – than the camera glare limit, so in the case of film multiple exposures were pointless).
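To get a feel for why a small amount of stray light is so damaging, consider a deliberately crude model in which glare adds the same amount of light to every point of the image plane. Real glare varies spatially, and the glare fractions below are illustrative assumptions of mine, not measurements from the experiment.

```python
def range_after_glare(l_max, l_min, glare):
    """Dynamic range at the image plane when uniform glare is added everywhere."""
    return (l_max + glare) / (l_min + glare)

# Target range from the experiment: almost 19,000:1.
scene_max, scene_min = 1.0, 1.0 / 19000.0

# Glare expressed as a fraction of the maximum luminance (assumed values).
for glare in (0.0, 0.0001, 0.001, 0.01):
    dr = range_after_glare(scene_max, scene_min, glare)
    print(f"glare = {glare:.4f} of max -> about {dr:,.0f}:1 at the image plane")
```

Even in this simplified model, stray light equal to a tenth of a percent of the brightest area is enough to pull the range at the image plane below 1,000:1, because it swamps the darkest wedges while barely changing the brightest ones.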
While camera glare limits the amount of scene dynamic range that can be captured, glare in the eye limits the amount of display dynamic range which is useful to have.
Experiments were also done with observers estimating the brightness of the various sectors on the test target. There was a high degree of agreement between the observers. The perceived brightness was strongly affected by spatial factors; the brightness differences between the segments of each circle were perceived to be very large, and the differences between the individual circles were perceived to be very small. Prof. Rizzi claimed that a global tone scale cannot correctly render appearance, since spatial factors predominate.
Spatial factors also required designing a new target, so that glare could be separated from neural contrast effects. For this target, both single-layer and double-layer projected transparencies were used, allowing the experimenters to vary the dynamic range from about 500:1 to about 250,000:1 while keeping glare and surround constant.
For low-glare images (average luminance = 8% of maximum luminance), the observers could detect appearance changes over a dynamic range of a little under 1000:1. For high-glare images (average luminance = 50% max luminance), this decreased to about 200:1. Two extreme cases were also tested: with a white surround (extreme glare) the usable dynamic range was about 100:1 and with black surround (almost no glare at all) it increased to 100,000:1. The black surround case (which is not representative of the vast majority of real images) was the only one in which the high-dynamic range image had a significant advantage, and even there the visible difference only affected the shadow region – the bottom 30% of perceived brightnesses. These results indicate that dramatically increasing display dynamic range has minor effects on the perceived image; glare inside the eye limits the effect.
Separating Glare and Contrast
Glare inside the eye reduces the contrast of the image on the retina, but neural contrast increases the contrast of the visual signal going to the brain. These two effects tend to act in opposition (for example, brightening the surround of an image will increase both effects), but they vary differently with distance and do not cancel out exactly.
It is possible to estimate the retinal image based on the CIE Glare Spread Function (GSF). When doing so for the images in the experiment above, the high-glare target (where observers could identify changes over a dynamic range of 200:1) formed an image on the retina with a dynamic range of about 100:1. With white surround (usable dynamic range of 100:1) the retinal image had a dynamic range of about 25:1 and with black surround (usable dynamic range of 100,000:1) the retinal image had a dynamic range of about 3000:1. It seems that neural contrast partially compensates for the intra-ocular glare; both effects are scene dependent.
Scene Content Controls Appearance
The appearance of a pixel cannot be predicted from its intensity values – no global tone mapping operator can mimic human vision. An image dependent, local operator is needed. The human visual system performs local range compression. It is important to choose a rendering intent – reproduce the original scene radiances, scene reflectances, scene appearance, a pleasing image, etc. If the desire is to predict appearance then Retinex processing does a pretty good job in many cases.
Color in HDR
Two different data sets can be used to describe color: CMF (color matching functions – low-level sensor data) or UCS (uniform color space – high-level perceptual information).
CMF are used for color matching and metamerism preservation. They are linear transforms of cone sensitivities modified by pre-retinal absorptions. They have no spatial information, and cannot predict appearance.
UCS, for example CIE L*a*b*, describe perceptual attributes. Lightness (L*) is a cube root of luminance, which compresses the visible range. 99% of possible perceived lightness values fall in a 1000:1 region of scene dynamic range. This fits well with visual limitations caused by glare.
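The 99% figure can be checked against the standard CIE 1976 L* formula, which is a cube root over most of its range with a short linear segment near black. The function and variable names below are my own.

```python
def cie_lightness(y_rel):
    """CIE 1976 L* for a luminance relative to the white point (0.0 to 1.0)."""
    delta = 6.0 / 29.0
    if y_rel > delta ** 3:
        f = y_rel ** (1.0 / 3.0)
    else:
        # Linear segment near black defined by the CIE formula.
        f = y_rel / (3.0 * delta ** 2) + 4.0 / 29.0
    return 116.0 * f - 16.0

white = cie_lightness(1.0)
dark = cie_lightness(1.0 / 1000.0)   # 1000:1 below white
share_above = (white - dark) / white
print(f"L* at white: {white:.1f}, L* at 1/1000 of white: {dark:.2f}")
print(f"share of the L* scale within 1000:1 of white: {share_above:.1%}")
```

A luminance 1000 times darker than white comes out at an L* just under 1 on a scale of 0 to 100, so roughly 99% of the lightness scale sits inside a 1000:1 range.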
There are some discrepancies between data from appearance experiments with observers and measurements of retinal cone response.
First discrepancy: the peaks of the color-matching functions do not line up with the peaks of the cone sensitivity functions. This is addressed by including pre-retinal absorptions, which shift peak sensitivities to longer wavelengths.
Second discrepancy: retinal cones have a logarithmic response to light, but observers report a cube-root response. This is addressed by taking account of intra-ocular glare; it turns out that due to glare, a cube-root variation in light entering the eye turns into a logarithmic variation in light at the retina.
HDR Image Processing
Around 2002-2006, Robert Sobol developed a variant of Retinex which was implemented in a (discontinued) line of Hewlett-Packard cameras; the feature was marketed as “Digital Flash”. This produced very good results and could even predict certain features of well-known perceptual illusions such as “Adelson’s Checkerboard and Tower”, which were commonly thought to be evidence of cognitive effects in lightness perception.
ACE (Automatic Color Equalization), which Prof. Rizzi worked on, and STRESS (Spatio-Temporal Retinex-inspired Envelope with Stochastic Sampling) are other examples of spatially-aware HDR image processing algorithms. Several examples were shown to demonstrate that spatially-aware (local) algorithms produce superior results to global tone mapping operators.
Prof. Rizzi described an experiment made with a “3D Mondrian” model – a physical scene with differently colored blocks, under different illumination conditions. Various HDR processing algorithms were run on captured images of the scene, and compared with observers’ estimations of the colors as well as with a painter’s rendition (attempting to reproduce the perceptual appearance as closely as possible). The results were interesting – appearance does not appear to correlate specifically to reflectance vs. illumination, but rather to edges vs. gradients. The results appeared to support the goals of Retinex and similar algorithms.
Prof. Rizzi finished the course with some “take home” points:
- HDR works well, because it preserves image information, not because it is more accurate (accurate reproduction of scene luminances is not possible in the general case).
- Dynamic range acquisition is limited by glare, which cannot be removed.
- Our vision system is also limited by glare, which is counteracted to some degree by neural contrast.
- Accurate reproduction of scene radiance is not needed; reproduction of appearance is important and possible without reproducing the original stimulus.
- Appearances are scene-dependent, not pixel-based.
- Edges and gradients generate HDR appearance and color constancy.