
7 things for January 22

There’s been some great stuff lately:

  • Gustavo Oliveira has an article in Gamasutra about writing an efficient cross-platform SIMD vector library and the tradeoffs involved. The last page was of particular interest, as I had wondered how effective the Intel C++ Compiler (ICC) was vs. Microsoft’s. He also provides downloadable source code and in-depth statistics; a small sketch of the kind of wrapper involved appears right after this list.
  • NVIDIA has given some information about Fermi, their next GPU. Warning: their page will automatically start some audio – annoying. You could just skip to the white paper. One big deal about Fermi is its support for doubles, which means it can be used for more science & engineering number-crunching. The Tech Report has a good overview article of other interesting features, and also presents benchmarking results.
  • Tests of OpenCL, the platform-independent parallel programming standard, have started to appear for AMD and NVIDIA GPUs.
  • Speaking of NVIDIA, their PhysX engine is getting some attention. The first video clip in this article gives a sense of the sorts of effects it can add. Pretty stuff, but the funny thing about PhysX is that it must accelerate computations that do not actually affect gameplay (i.e. it should not move around any objects in the scene differently than non-PhysX machines). This limits its use to particle systems and other eye candy. Not a diss—heck, most game graphics are about eye candy—but something to keep in mind.
  • Naty pointed out an article about how increasing the number of megapixels in a camera is just salesmanship and gains no actual benefit. The author later gives more explanation of his argument, which is that diffraction puts a physical limit on the useful size of a pixel for a given camera size.
  • Sony Pictures Imageworks has released a draft describing their Open Shading Language (OSL). While aimed at high-end rendering for films, it’s interesting to see what is built-in (e.g. deferred ray tracing) and what they consider important. Read the introduction for more information, or the draft itself.
  • My favorite infographic of the week: Avatar vs. Modern Warfare 2. Ignore the weird chartjunk concentric circles, focus on the numbers. The most amazing stat to me is the $200M advertising budget for MW2.
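
For that first item, here is a minimal sketch of the core idea behind such a wrapper: one vector type that compiles to SSE intrinsics where they are available and to plain scalar code elsewhere. This is my own illustration (the Vec4 name and its interface are made up, not Oliveira’s API); the tradeoffs his article covers start exactly here, in how much the abstraction hides and how much platform-specific code leaks through.

```cpp
// Minimal cross-platform SIMD wrapper sketch: SSE path on x86, scalar fallback
// elsewhere. Hypothetical names; not the library from the article.
#include <cstdio>

#if defined(__SSE__) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1) || defined(_M_X64)
#include <xmmintrin.h>

struct Vec4 {
    __m128 v;
    Vec4(float x, float y, float z, float w) : v(_mm_set_ps(w, z, y, x)) {}
    explicit Vec4(__m128 m) : v(m) {}
    Vec4 operator+(const Vec4& o) const { return Vec4(_mm_add_ps(v, o.v)); }
    Vec4 operator*(const Vec4& o) const { return Vec4(_mm_mul_ps(v, o.v)); }
    float x() const { return _mm_cvtss_f32(v); }
};
#else
struct Vec4 {
    float f[4];
    Vec4(float x, float y, float z, float w) : f{x, y, z, w} {}
    Vec4 operator+(const Vec4& o) const {
        return Vec4(f[0] + o.f[0], f[1] + o.f[1], f[2] + o.f[2], f[3] + o.f[3]);
    }
    Vec4 operator*(const Vec4& o) const {
        return Vec4(f[0] * o.f[0], f[1] * o.f[1], f[2] * o.f[2], f[3] * o.f[3]);
    }
    float x() const { return f[0]; }
};
#endif

int main() {
    Vec4 a(1.0f, 2.0f, 3.0f, 4.0f), b(5.0f, 6.0f, 7.0f, 8.0f);
    Vec4 c = a * b + a;           // caller never sees which path was compiled
    std::printf("%f\n", c.x());   // prints 6.000000 on either path
    return 0;
}
```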

… and that’s seven; more later.

Connections: Larrabee, Michael Abrash, Intel, Dr. Dobb’s and me

There has been a spate of Larrabee information during the last two weeks: two GDC talks (slides near the bottom of this page), a prototype library, and an article by Michael Abrash on the Dr. Dobb’s website.

Dr. Dobb’s Journal has been out of print since February, but for many years it was one of the leading software publications.  When initially published in 1976 (as Dr. Dobb’s Journal of Computer Calisthenics & Orthodontia), it was the first journal focusing on software development for microcomputers.  Michael Abrash wrote many articles for Dr. Dobb’s over the years, including a series on the Quake software renderer in the mid-90s.  This series made a great impression on me; when it was published I was considering a career change from microprocessor design to graphics programming.  At the time, I was working on Intel’s P55C processor, publicly known as “Pentium with MMX Technology”.  This chip was notable both for being the first x86 processor with a SIMD (single instruction, multiple data) instruction set, and for being the last CPU to use the in-order Pentium micro-architecture.

When Michael Abrash wrote the Quake articles, game rendering was 100% software, mostly written in assembly language.  Abrash was the uber-game programmer, having worked on DOOM, written the Quake renderer, and published (in addition to his Dr. Dobb’s articles) many influential books about graphics programming, assembly, and optimization (the last of which is available online).

Within a few years (around the time I finally made the jump from CPU design to game graphics programming), it seemed to many that graphics hardware and compiler improvements had made software rendering and hand-coded assembly obsolete.  This was mirrored by my own experience; I was hired for my first game industry job on the strength of a software rasterization demo (written mostly in assembler), and by the time the game shipped, it required graphics hardware and contained very little assembly (none written by me).  Abrash started applying his considerable skills to what he saw as the next unsolved hard problem: natural language processing.

But he couldn’t stay away from graphics for long; when Microsoft started working on the XBox console, he got involved in its design.  In the early 2000s, he figured out that there was a market for software renderers after all, mostly due to the mess of caps bits, unorthogonal feature support, and flaky compliance that characterized low-end graphics hardware at the time (Intel was among the greatest offenders; compounding the problem, its graphics chips sold very well, so there were a lot of them out there).  With Mike Sartain (another XBox designer), he wrote Pixomatic, a software renderer published by RAD Game Tools (until then mostly known for the Miles sound library, perhaps the most widely used middleware in the games industry).  Of course, he published another series of articles in Dr. Dobb’s about the experience, where he discussed how he made use of SIMD instruction sets such as MMX and SSE when optimizing Pixomatic.

I found this particularly interesting due to my personal involvement with these instruction sets.  After working on the first MMX hardware implementation, I helped define its successor, which was twice as wide (128 bits instead of 64) and added support for floating-point SIMD.  This instruction set was at first called MMX2, then VX, and finally split into two separate instruction sets: SSE and SSE2.  By this time SIMD instruction set extensions were becoming quite popular; AMD had their own version called 3DNow!, and PowerPC had the AltiVec instruction set.  Intel kept on adding new SIMD extensions: SSE3, SSSE3, SSE4.1, SSE4.2, and AVX.
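
To make the “twice as wide, with floating-point support” point concrete, here is a generic illustration of what that buys you (my example, not code from Pixomatic or from any Intel article): MMX packed at most four 16-bit integers into a 64-bit register, while an SSE register holds four 32-bit floats, so one instruction performs four single-precision operations.

```cpp
// Element-wise multiply of two float arrays, four lanes per instruction using
// 128-bit SSE registers. For brevity, n is assumed to be a multiple of 4;
// real code would handle the remainder (and alignment) as well.
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_*_ps
#include <cstddef>

void multiply_arrays(float* dst, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             // load 4 floats
        __m128 vb = _mm_loadu_ps(b + i);             // load 4 floats
        _mm_storeu_ps(dst + i, _mm_mul_ps(va, vb));  // 4 multiplies at once
    }
}
```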

As Abrash details in the Larrabee article, Larrabee got started when he decided to talk to Intel about some ideas for SIMD instructions to accelerate software rasterization.  As a result, Larrabee includes a powerful set of SIMD instructions.  Much wider than previous instruction sets (512 bits instead of 128, or 256 in the case of AVX), Larrabee’s instruction set contains several instructions tailored to software rasterization.  It is also general enough to allow for automatic code vectorization of a wide variety of loops.  Abrash had a key role in the design of the instruction set, bringing software rasterization back into the mainstream.
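
To give a sense of what “automatic code vectorization of a wide variety of loops” means in practice, here is the kind of loop such a compiler targets (a generic example, not taken from the Larrabee prototype library): every iteration is independent and does the same arithmetic, so a 16-lane unit like Larrabee’s can in principle execute sixteen iterations per vector instruction without the source changing at all.

```cpp
// A trivially vectorizable loop: independent iterations, same operation on
// every element. A vectorizing compiler strip-mines this into wide SIMD
// operations (16 floats at a time on a 512-bit machine); the scalar source
// stays exactly as written.
#include <cstddef>

void scale_and_add(float* dst, const float* a, const float* b,
                   float s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * s + b[i];
    }
}
```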

Besides a good instruction set, Larrabee also needed an efficient hardware design with a large number of cores.  Each of these cores needed to be very efficient in terms of performance per watt and per transistor.  Since the Larrabee team started out as a skunkworks, they couldn’t afford to design a brand-new core, so they looked at previous Intel cores, and the old in-order Pentium core (almost the same one I used in the P55C) was the one chosen.

What I find fascinating about this story is that Abrash managed to follow rasterization all the way around the Wheel of Reincarnation.  This term refers to the common process where a piece of computing functionality is first implemented in software, then moves to special-purpose hardware which gradually becomes more general until it rivals a CPU in complexity, at which point the functionality is folded back into software.  It was coined in a 1968 article by T. H. Myer and Ivan Sutherland (the latter is widely considered the father of computer graphics).