Larrabee

Solid information about Intel’s new Larrabee architecture came out a few days ago; the Level of Detail blog has a good set of links. The major news is that Intel’s SIGGRAPH paper is now available for download from ACM’s Digital Library. Unfortunately, not everyone has access to this site’s resources (it costs money to subscribe). My contribution to the cause:

http://softwarecommunity.intel.com/UserFiles/en-us/File/larrabee_manycore.pdf

Thanks to Tom Forsyth for the link.

I’m excited by Larrabee not because of any particular technical feature (though I’m entirely savoring the paper itself, reading two pages a day at lunch), but rather by the fact that it opens up a whole new ecosystem for implementing graphics algorithms. Regardless of whether Larrabee wins or loses in the long run, it will have a huge effect in increasing our knowledge by helping us explore different hardware and software designs for rendering.

Direct3D 11 Details Part II: Tessellation

Direct3D 11 adds three new pipeline stages, with the goal of enabling efficient tessellation of higher order surfaces. This is the Direct3D 10 pipeline, as shown in “Real-Time Rendering, 3rd Edition”:

Direct3D 10 Pipeline

The color of each stage indicates whether it is fully programmable (green), configurable (yellow) or fixed function (blue). The stages are described more fully in the “Graphics Processing Unit” chapter of the book. Note that the “Geometry Shader” stage is new to Direct3D 10, but the other stages have been in the pipeline for quite a while.

The Direct3D 11 pipeline adds three new stages between the vertex and geometry shader stages (framed in red). Two of the new stages are programmable (the hull and domain shader stages) and one is configurable (the tessellator stage):

Direct3D 11 Pipeline

This pipeline operates on meshes represented as a series of surface patches. Triangle and quad surface patches are primitives in Direct3D 11 (there is also a tessellated line primitive). The shape of each patch is defined by a number of control points. These control points are transformed, skinned and / or morphed one by one in the vertex shader.

The hull shader is called for each patch, using the patch control points from the vertex shader as inputs. The hull shader has two main responsibilities. The first is to (optionally) convert the control points from one representation (basis) to another. For example, it can implement the technique introduced in Loop and Schaefer’s paper “Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches”. The converted control points are sent directly to the domain shader, bypassing the tessellator. The hull shader’s second responsibility is to compute appropriate tessellation factors, which are passed to the tessellator stage. This allows for adaptive tessellation, which can be used for continuous view-dependent LOD (level of detail). The tessellation factors are specified per patch edge, and range from 2 to 64. This means that each edge of the patch may be split into at least 2 (and as many as 64) triangle (or quad) edges.
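To make the second responsibility concrete, here is a rough sketch (in Python rather than shader code) of how per-edge tessellation factors might be chosen for view-dependent LOD. The heuristic, the function names and the two tuning constants are all assumptions for illustration, not anything from the Gamefest slides; only the 2 to 64 clamp comes from the description above.

```python
import math

MIN_FACTOR = 2.0    # range quoted in the post
MAX_FACTOR = 64.0


def edge_tess_factor(p0, p1, eye, pixels_per_unit_at_1=800.0, target_pixels=8.0):
    """Hypothetical heuristic: aim for roughly `target_pixels` per edge segment.

    pixels_per_unit_at_1 is an assumed projection scale: how many pixels a
    unit-length edge covers when it is one unit away from the eye.
    """
    mid = tuple((a + b) / 2.0 for a, b in zip(p0, p1))
    length = math.dist(p0, p1)
    distance = max(math.dist(mid, eye), 1e-6)      # avoid dividing by zero
    projected_pixels = length * pixels_per_unit_at_1 / distance
    factor = projected_pixels / target_pixels
    return min(MAX_FACTOR, max(MIN_FACTOR, factor))


def patch_tess_factors(corners, eye):
    """One factor per patch edge, walking the corners in order."""
    n = len(corners)
    return [edge_tess_factor(corners[i], corners[(i + 1) % n], eye)
            for i in range(n)]
```

Because each factor depends only on the two endpoints of its edge, two patches that share an edge compute the same factor for it, which is what keeps neighboring patches from cracking apart.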

The tessellator is a fixed-function (but highly configurable) stage, which uses the tessellation factors to tessellate (subdivide) the patch into multiple triangle or quad primitives. The tessellator does not have access to the control points – all tessellation decisions are made based on configuration and the tessellation factors passed on from the hull shader. Each vertex resulting from the tessellation is output to the domain shader. Only the patch parametrization coordinates are passed on for each vertex.
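As a simplified picture of what the tessellator produces, the sketch below generates parametrization coordinates and triangle indices for a quad domain. It assumes a single integer factor applied uniformly in both directions; the real stage takes a factor per edge and has other configuration options, none of which are modeled here. The function name is made up for this example.

```python
def tessellate_quad_domain(factor):
    """Return ((u, v) coordinates, triangles) for a uniformly split quad domain.

    Note that no control points are involved: the output depends only on
    the factor, as described for the tessellator stage.
    """
    n = int(factor)
    if not 2 <= n <= 64:
        raise ValueError("factor must be between 2 and 64")

    # (n + 1) x (n + 1) grid of parametrization coordinates in [0, 1].
    coords = [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]

    # Two triangles per grid cell, stored as indices into coords.
    triangles = []
    for j in range(n):
        for i in range(n):
            a = j * (n + 1) + i      # lower-left corner of the cell
            b = a + 1                # lower-right
            c = a + (n + 1)          # upper-left
            d = c + 1                # upper-right
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return coords, triangles
```

Each (u, v) pair is what would be handed to the domain shader as one vertex.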

The domain shader operates on the patch parametrization coordinates of each vertex separately, although it can also access the transformed control points for the entire patch. The domain shader sends the complete data for the vertex (position, texture coordinates, etc.) to the geometry shader (or the clipping stage if no geometry shader is present). Effectively, it evaluates the surface representation at each vertex. Techniques such as displacement mapping can also be applied by this shader stage.
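"Evaluating the surface representation" is easiest to see with a specific representation. The sketch below assumes a bicubic Bézier patch (16 control points in a 4x4 grid, the kind of patch the Loop and Schaefer approach produces) and evaluates its position at one (u, v) pair. The names and the row/column layout are assumptions for illustration; a real domain shader would also produce normals, texture coordinates and so on.

```python
def bernstein3(t):
    """The four cubic Bernstein basis weights at parameter t."""
    s = 1.0 - t
    return (s * s * s, 3.0 * s * s * t, 3.0 * s * t * t, t * t * t)


def eval_bicubic_patch(control, u, v):
    """Evaluate a bicubic Bezier patch position at (u, v).

    control[row][col] is an (x, y, z) control point; u runs along the
    columns and v along the rows (an assumed layout).
    """
    bu = bernstein3(u)
    bv = bernstein3(v)
    pos = [0.0, 0.0, 0.0]
    for r in range(4):
        for c in range(4):
            w = bv[r] * bu[c]
            for k in range(3):
                pos[k] += w * control[r][c][k]
    return tuple(pos)
```

Displacement mapping would slot in right after this evaluation, by offsetting the returned position along the surface normal by a height read from a texture.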

Although Microsoft gave an example using Catmull-Clark subdivision surfaces, the programmability of the pipeline enables other surface representations to be used. Alternatively, the tessellation stages can be turned off and traditional triangle or quad meshes can be used.

Direct3D 11 Details Part I: Intro

I attended Gamefest 2008 last week. Gamefest (formerly called Meltdown) is a Microsoft-run Windows and Xbox 360 game development conference. This year there were two notable announcements: XNA Community games (discussed in a previous blog post) and the first public disclosure of Direct3D 11.

Direct3D is, of course, the API used by most Windows games, but its importance extends beyond Windows. Direct3D features guide the development of graphics hardware in general, so these features are bound to show up in future consoles, as well as in OpenGL.

The announcement that Direct3D 11 would not be tied to the next version of Windows (as many had feared) and would be available on Windows Vista was very significant to Windows developers, many of whom had complained about Direct3D 10 being tied to Windows Vista. Direct3D 11 will also be available on Direct3D 9, 10, and 10.1 level graphics hardware (although the new features will not be available there, with the exception of some multithreading enhancements).

The fact that the Direct3D 11 API is a strict superset of the 10/10.1 API is also cause for relief among game developers. From Direct3D 9 to 10, the API went through extensive changes. These changes were mostly long-overdue cleanups and improvements, but they left developers supporting two very different APIs if they wanted to support the many customers using Windows XP and also expose the new Direct3D 10 hardware features.

This is the first part of a multi-part post which will summarize the essential facts about Direct3D 11, as known from the Gamefest slides. Eventually, the slides should show up on the XNA Presentations page.

Full disclosure of Direct3D 11 should occur later this year – the November 2008 DirectX SDK release will feature a preview version of the API, including full documentation and code samples.

Community-built games on Xbox 360

Many of our readers are not professional game developers, but do graphics or game programming as a hobby or as part of their academic research. Microsoft’s XNA Game Studio is interesting since it allows free development of Xbox 360 games. To be precise, although the software is free, Xbox 360 development does require a $99/year premium membership – still a bargain compared to the many thousands of dollars required for a professional console development kit. However, the resulting games could only be played by other people with premium memberships – not exactly a mass market.

This week, at Gamefest, Microsoft announced that these “homebrew” games could now be sold to Xbox 360 owners in general. Interestingly, the games will not be selected by Microsoft themselves (although I am sure they will do some gatekeeping) but by the community (similarly to the selection of posts at Digg or Slashdot).

If you are a game or graphics hobbyist who is intrigued by the idea of creating games to sell to almost 12 million Xbox 360 owners, then check out Microsoft’s XNA Creators Club website.

TrueSpace Free, iPhone/iPod engines, Cache misses

So you want to play with a 3D modeler, or want to teach a class using one, but have zero budget. TrueSpace is now free. This is pretty darn wonderful; TrueSpace has been around approximately forever – I once wrote an exporter from the Trispectives modeler to its file format back in 1994 – and has grown in capabilities over the years.

The Torque game engine is now available for making games on the iPhone. The licensing terms are of the “email us and we’ll tell you” type, but the standard Torque engine is ridiculously affordable for indie game developers at $150, including all source, etc. If you spent all your spare money on an iPhone, oolong is a free engine for games on the iPhone/iPod, originated by Wolfgang Engel and Erwin Coumans, along with assets from PowerVR – it even has a physics engine.

There’s an interesting performance post on cache misses from Dave Moore. Dave Eberly told me a related tale recently: “I am the PS3 programmer.  I spent a lot of time trying to write code to avoid branching, to remove load-hit-stores, and to avoid cache misses. For example, our physics programmer decided that if one function in a class is virtual, then make them all virtual.  He did not realize that a look-up in the virtual function table invariably causes a cache miss.  Make a lot of function calls (like physics systems tend to do), and now you have a serious performance problem.  I removed all the unnecessary virtual modifiers and reduced frame time by 5 milliseconds.  When your goal is 30 fps (33 millisecond frame time), 5 ms is significant.”
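To put the numbers in that quote in perspective, here is the arithmetic as a tiny sketch (the function names are just for this example). At 30 fps the budget is about 33.3 ms per frame, so a 5 ms saving is roughly 15% of the whole frame.

```python
def frame_budget_ms(fps):
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps


def fraction_of_frame(saved_ms, fps):
    """How much of the frame budget a given saving represents."""
    return saved_ms / frame_budget_ms(fps)
```

For example, `fraction_of_frame(5, 30)` comes out at about 0.15.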

Amazon discount, SIGGRAPH booth time

The book’s not quite shipping yet, but at this point Amazon has it heavily discounted, 33% off. I’m happy about this, as it makes the book cheaper than the second edition, which wasn’t discounted at all by Amazon until recent years. The weird bit is that this discount was available a few weeks back, then was gone when I checked last weekend. Someone let me know today that it’s back, and I just ordered an extra copy (this discount is higher than my author’s discount at AK Peters). I’ve noticed a strong correlation between the discount’s availability and the humidity in Flagstaff multiplied by the average hourly meteor sighting rate in Anchorage. In other words, I have no clue when someone will wake up at Amazon and realize they’re paying more for the book than they’re selling it for (it’s true: my publisher said so).

While I’m thinking of it: Naty and I will be at the AK Peters’ booth at SIGGRAPH from 12:30 to 1:30 pm on Wednesday.

Command buffers, JGT online, workstation cards charts

Vincent Scheib discusses how to implement command buffers (essentially, OpenGL display lists) on DirectX 9 and 10. He notes that DirectX 11 will have display list support in the API, but if you don’t want to wait 4+ years for general adoption of that API, consider Emergent’s method. By having various threads generate command buffers and a single thread execute them, they are able to take advantage of multiple cores. On a dual core they show a 1.4x to 1.9x speedup with multiple threads. Best of all, they provide open source for their system, with a very liberal license.
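The overall shape of that approach (several threads record, one thread submits) can be sketched in a few lines. This is not Emergent's code or API: the command tuples, the function names and the "device log" standing in for the graphics device are all invented for illustration, and Python threads will not show the real speedup, so read it for structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(objects):
    """Worker thread: turn a batch of (name, mesh) pairs into a command buffer.

    The buffer is a plain list private to this call, so recording needs
    no locking.
    """
    buffer = []
    for name, mesh in objects:
        buffer.append(("set_mesh", mesh))
        buffer.append(("draw", name))
    return buffer


def execute(buffers, device_log):
    """Single submitting thread: replay each buffer, in order, on the device."""
    for buffer in buffers:
        for command in buffer:
            device_log.append(command)   # stand-in for a real API call


def render_frame(batches, workers=2):
    """Record all batches in parallel, then submit them from one thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of `batches`, so submission
        # order stays deterministic regardless of which worker finishes first.
        buffers = list(pool.map(record_commands, batches))
    device_log = []
    execute(buffers, device_log)
    return device_log
```

The key design point is that only `execute` touches the device, so the API itself never has to be thread-safe.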

I am happy to see that PDFs of articles in recent issues of the journal of graphics tools are now available online, free to current subscribers.

I’ll be adding this one to the portal and resources page: sets of charts listing the capabilities of professional graphics cards.

Giant displays, better division, NPR, and NVIDIA

In wading through my bookmark collection, there were a few sites that I felt were appropriate for the blog but not the resources pages. Basically, interesting tidbits, but not worth the (semi-)permanence of the website’s other pages.

First, Naty pointed out that NPR is used in the next Prince of Persia. Interesting style, and I look forward to seeing how well it animates. Update: Mikkel Gjøl at Zero Point Software pointed out that, with E3 just having happened, game trailers galore have come out, including an animated trailer for Prince of Persia.

I was trying to find the largest (highest-resolution) commercial, or at least public, display systems available. Two I found: someone’s flight simulator setup, and the Zenview Command Center Elite. If you know of larger ones, please say so. The coolest Death Star-related display system was easily The Emperor.

Tidbit: Intel division is still slow, but will someday be twice as fast.

There’s a quick little article in Forbes on NVIDIA. You already know 80% of it, but there are some new bits. Huang’s education at a reform school is a classic tale (though Wired’s piece is a little more detailed).

OK, my queue is now cleared!