Tag Archives: Larrabee

Some Actual Larrabee Information

Tom Forsyth, one of the many programmers and engineers on Larrabee, passed on this link to a lecture he gave at Stanford on January 6 for their weekly Computer System Colloquium class. At the beginning he gives a bit about Intel’s view of Larrabee and the effect of “cancellation”, i.e., it’s not cancelled, just the first hardware release is off. He notes the day-to-day work of most Larrabee developers is unaffected. I appreciate him walking through the Intel position, as I haven’t been able to find any hard information (press releases, etc.) on their site. In retrospect, rumor-mill articles like this one (which we passed on earlier, lacking any sound data) appear to bear little resemblance to reality.

The rest of his lecture is about Larrabee itself. Early on he talks about the new instructions in Larrabee, something like Abrash’s article but more entertaining. Around minute 37 he gets more into graphics rendering per se. I’ve been listening to it in bits, in the background.

7 things for December 24

Here are 7 for the day:

More on Larrabee

I wrote earlier on Larrabee being delayed. A coworker pointed out this article from Jon Peddie Research, who know (and usually charge) more than I do. It makes a plausible case that cancelling this first version of Larrabee was the correct move by Intel, and that the experience gained is not wasted. JPR argues that the high-performance computing market is also high-margin, so needs fewer sales to be profitable. There are other gains from the project to date – anyway, a worthwhile read. I’ll be interested to see what’s next for Larrabee.

The magic of marketing and price differentials is fascinating to me. Books like The Undercover Economist have some entertaining tales of how prices are set. Here’s a marketing story I heard (elsewhere), and it might even be true: HP had two versions of the series 800 workstation in the late ’80s/early ’90s, the only difference being, literally, one bit on a ROM chip. If the bit was set, then HP-UX could not be run on the workstation. Amazingly, the price for this version of the workstation was higher, even though it was seemingly less capable. This version was marketed to hospital administration, which at the time didn’t use HP-UX (so didn’t care); the workstations that could run HP-UX were sold to engineers. HP could honestly say there was a difference between the two workstations, say that one was tailored to hospital admin and the other to engineers, and so justify the price differential. If anyone wants to confirm or deny, great!

Larrabee Chip Delayed/Cancelled

The news for the day is that the current hardware version of Larrabee, Intel’s new graphics processor for the consumer market, has been delayed (or cancelled, depending on what you mean by “cancelled”). Intel is not commenting on possible future Larrabee hardware, but the Larrabee project itself still exists. I don’t see an official press release (yet) from Intel. The few solid quotes I’ve seen (in CNET) are:

“Larrabee silicon and software development are behind where we hoped to be at this point in the project,” Intel spokesperson Nick Knupffer said Friday. “As a result, our first Larrabee product will not be launched as a standalone discrete graphics product,” he said.

along with this:

Intel would not give a projected date for the Larrabee software development platform and is only saying “next year.”

The Washington Post gives this semi-quote:

Intel now plans its first Larrabee product to be used as a software development platform for both graphic and high performance computing, Knupffer said.

See more from The Inquirer, CNET, ZDNet, Washington Post, and the Wall Street Journal. Many more versions via Google News.

In my opinion, Intel has a tough row to hoe: catch up in the field of high-performance graphics, when all they’ve had before is the ~$2 chip low-end GMA series. This series probably has a larger market share in terms of units sold than NVIDIA and AMD GPUs combined (basically, any Intel computer without a GPU card has one), but I assume it makes pennies per unit and by its nature is limited in a number of ways. Markets like high-performance computing, which make the most sense for Larrabee (since it appears to have the most flexibility vs. NVIDIA or AMD’s GPUs, e.g. it’s programmable in C++), are a tiny piece of the market compared to “I just want DirectX to run as fast as possible”. The people I know on the Larrabee team are highly competent, so I don’t think the problem was there. I’d love to learn what hurdles were encountered in the areas of design, management, algorithms, resources, etc. Even the particulars of Larrabee’s architectural choices are not fully understood (though we have some good guesses), since it’s unreleased. Sadly, we’re unlikely to know most of the story; writing “The Soul of an Unreleased Machine” is not an inspiring tale, though perhaps a fascinating one.

Clearing the Queue

I’ve had a goal this week (it should be clear by now) of clearing my queue of stored-up RTR links by my birthday, today! (Hint: I want a pony.) So excuse the excessively long list o’ links. Next task on my list: update the main RTR page itself.

  • StructureSynth. This looks pretty cool, and I love procedural models (my ancient SPD package was all about this, back in the days when downloading models was oppressively slow). I do wish they just provided an executable – building looks like a pain.
  • That previous link was on Meshula.net, which also blogs about Pixel Bender Fractals. Great stuff, sort of steampunk computer graphics: you must click this link, if no other on this page, and look on in awe.
  • Shapeways has a blog, and it’s not just dull company announcements. I’m glad they find people as pixels as interesting as I do. They also cover exporting Spore characters to Collada files (which is a great addition to Spore) and creating physical models from these.
  • In related news, The Economist has a reasonable summary of some trends in 3D printing. Their Technology Quarterly also has articles on Augmented Reality, 3D displays, and CAPTCHAs, among other topics.
  • This is one more reason the Internet is great: an in-depth article on normal compression techniques, weighing the pros and cons of each. This sort of article would probably not see the light of day in traditional publications, even Game Developer – too long for them, but all the info presented here is worthwhile for a developer making this decision. Aras’ blog has other nice bits such as packing a float into RGBA and SSAO blurring.
  • I need to add a link to the article itself to the object intersection page, but Morgan McGuire recently verified that this ray/box algorithm is indeed super-fast in SIMD. Code’s downloadable from that page, and a free version of the article is downloadable here. (A minimal scalar sketch of the underlying slab test appears after this list.) Morgan uses this test in the ray tracer for his cool photon mapping paper at HPG 2009; if nothing else, you should at least see the video.
  • In related news, I am happy to see that AK Peters is beginning to put past journal of graphics tools articles online. At $15 each, the price of an article is quite high for individuals (or at least this individual), but current journal of graphics (gpu, & game) tools subscribers have full access to this archive for free. The mechanism to get access is a little clunky right now: if you’re a subscriber, you need to register with Metapress, then tell AK Peters your userid and they’ll provide you access.
  • Related to this, I hope Google Books conquers the world (or anyone else doing similar work, as long as it isn’t Apple or Amazon or other overcharging closed-box “we’re just protecting the authors, who get 10% or less for a purely digital sale with nil physical cost to us per unit” retailers – rant over, and I do understand there are fixed start-up costs for the retailer/publisher/etc., but really…). Google Books is so darn handy for looking up short articles in books in Google’s repository, such as this one giving a clean way to build an orthonormal basis given a vector, from Graphics Tools: The JGT Editors’ Choice. (A sketch of one such construction appears after this list.)
  • Humus provides a whole slew of new cubemaps he captured, if you’re getting tired of Grace Cathedral.
  • CUDA itself (vs. others) may or may not be a critical technology, but what it shows about the underlying GPU architecture is fascinating.
  • It should be mentioned: August 2009 DirectX SDK is available. Includes the first official release of DirectX 11.
  • This is hilarious, and possibly even useful!
  • I love seeing things like this: build your own multitouch display. Not that I’ll ever do it, but I hope others will.
  • You might be sick of Larrabee news (ship one, already!), but I found Phil Taylor’s article pleasantly hype-free and informative.
  • ATI’s Eyefinity (cute marketing name, I must admit – now I want to use the word everywhere) allows up to six monitors to be driven by a single GPU. It seems to me to solve a problem that rarely occurs: too much GPU for too few screens. Still, it’s nice to have the option, say for flight-simulator fans running older programs on newer GPUs. At work I find two displays is plenty, one to run, one to debug. Anyway, the sweet spot for the monitor:GPU ratio is 13:1, as can be seen here:
    Flight Simulator - living the dream
  • There’s an article on instancing animated grass using DX10 on Gamasutra.
  • Humus’ post on z interpolation is a good summary of the topic. He gives some of the key tricks, e.g., if you’re using a floating-point depth buffer, use near=1.0 and far=0.0 to help preserve precision. (A small sketch of why this helps appears after this list.)
  • Here’s a basic tutorial on different projection methods used in videogames, with lots of visual examples (add “Zaxxon” and it’s complete, for me). The one new tidbit I learned from it was about reverse perspective, an effect I’ve produced myself every now and then when I screw up a projection matrix.
  • While I’ve been on break (one of the reasons I’ve been posting so much – Autodesk gives wonderful 6 week “sabbaticals”, aka “long vacations”, to U.S. employees every four years you’re there; it’s like being French or Swedish every fourth year), the rest of the company’s been busy: this new sketch application for the iPhone looks pretty cool, at the usual $2.99 “cup of coffee” type price.
  • Caustics can be dangerous. I can attest to this myself; a goofy award Andrew Glassner gave me long ago sat on my windowsill for years (I moved once, as you can discern from the picture), until I noticed what was happening to the base:
    caustics
  • I usually don’t have time to keep up with Slashdot, but SeenOnSlash, the funny bits of SlashDot, is sometimes entertaining. Graphics-related example: AMD’s latest chip.
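As promised above, here is a minimal scalar sketch of the “slab” ray/box test. To be clear, this is my own illustration of the general idea, not the article’s code; the article’s contribution is the branch behavior and the SIMD formulation that Morgan timed.

```cpp
#include <algorithm>
#include <cfloat>

struct Vec3 { float x, y, z; };

// Minimal scalar ray/box "slab" test: intersect the ray against the three
// pairs of axis-aligned planes and keep the overlapping parametric interval.
// invDir is 1/direction per component (precomputed; +/-infinity for zero
// components is fine in IEEE float).
bool rayBoxIntersect(const Vec3& org, const Vec3& invDir,
                     const Vec3& boxMin, const Vec3& boxMax)
{
    float t0 = 0.0f, t1 = FLT_MAX;

    const float o[3]  = { org.x, org.y, org.z };
    const float id[3] = { invDir.x, invDir.y, invDir.z };
    const float lo[3] = { boxMin.x, boxMin.y, boxMin.z };
    const float hi[3] = { boxMax.x, boxMax.y, boxMax.z };

    for (int i = 0; i < 3; ++i) {
        float tNear = (lo[i] - o[i]) * id[i];
        float tFar  = (hi[i] - o[i]) * id[i];
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);   // latest entry across slabs
        t1 = std::min(t1, tFar);    // earliest exit across slabs
        if (t0 > t1) return false;  // intervals no longer overlap: miss
    }
    return true;
}
```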
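Also as promised above, here is one common way to build an orthonormal basis from a single unit vector. This is my own sketch of the general approach (zero the smallest-magnitude component, swap and negate the other two), which may differ in details from the construction in the article linked through Google Books.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}
static Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/len, v.y/len, v.z/len };
}

// Given a unit vector n, produce t and b so that {t, b, n} is orthonormal.
// Zeroing n's smallest component and swapping/negating the other two always
// yields a nonzero vector perpendicular to n; one cross product finishes it.
void buildOrthonormalBasis(const Vec3& n, Vec3& t, Vec3& b)
{
    float ax = std::fabs(n.x), ay = std::fabs(n.y), az = std::fabs(n.z);
    if (ax <= ay && ax <= az)      t = { 0.0f, -n.z, n.y };  // x is smallest
    else if (ay <= ax && ay <= az) t = { -n.z, 0.0f, n.x };  // y is smallest
    else                           t = { -n.y, n.x, 0.0f };  // z is smallest
    t = normalize(t);
    b = cross(n, t);
}
```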
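And the reversed-depth trick from Humus’ post, sketched out. These two little functions are my own illustration of why mapping near to 1.0 and far to 0.0 helps a floating-point depth buffer; the D3D-style [0,1] depth convention is an assumption on my part, not something taken from his post.

```cpp
// Post-projection depth as a function of view-space distance z, for a
// perspective projection with clip distances n (near) and f (far).

// Conventional mapping: near -> 0, far -> 1. Most of the scene ends up with
// depth values crammed just below 1.0, exactly where floats are coarsest.
float depthConventional(float z, float n, float f)
{
    return (f * (z - n)) / (z * (f - n));
}

// Reversed mapping: near -> 1, far -> 0. Distant geometry now lands near
// 0.0, where a float has lots of exponent range, so a floating-point depth
// buffer keeps far more usable precision over the view frustum.
float depthReversed(float z, float n, float f)
{
    return (n * (f - z)) / (z * (f - n));
}
```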

7 Things for July 19th

Seven more:

  • Michael Abrash has an in-depth article on rasterization on Larrabee. Perhaps a little too in-depth at times; just skim past the assembly instructions. I also found myself asking, “why do that?” – the key is to just keep reading. He tries to make his examples simple and comprehensible, but at the cost of sometimes feeling like they’re oversolving the problem. They aren’t; it’s just that the solutions are in fact used in different circumstances in order to be efficient. (A much-simplified sketch of the edge-function test he builds on appears after this list.)
  • SIGGRAPH has an interactive rendering event summary page. This page is more for the art production side of things, though; Naty’s course, talk, and production session summaries are more comprehensive and more useful for programmer attendees.
  • NVIDIA has a number of events they’re involved in at SIGGRAPH 2009. Here’s the list.
  • I love this sort of madness: a business-card ray tracer that does depth of field.
  • Accumulated SSAO: the idea of reprojection, of using previous results by finding where they lie on this frame’s view, is one that seems a tad expensive for interactive rendering. It’s hard to know anything about performance and quality from this page, but I thought it was interesting to see.
  • I mentioned Processing in the last post. Another language-related resource for graphics and game programming is pygame, a set of Python modules for writing games. A friend said he found this system to be pretty great, that he could whip up a fairly involved game idea in a few hours.
  • Scribblenauts sounds like the coolest game that will ever come out, period. Even if it’s only 1/10th as good as the previews read, it looks to be pretty darn entertaining.
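As promised in the first item, here is a much-simplified scalar sketch of the edge-function rasterization idea Abrash’s article builds on. This is my own toy version, not code from the article; Larrabee’s actual rasterizer is hierarchical and evaluates these tests sixteen at a time, but the per-sample math is essentially this.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Signed-area edge function: > 0 when (px,py) is to the left of the directed
// edge (ax,ay) -> (bx,by). A point inside a counterclockwise triangle is on
// the non-negative side of all three edges.
static float edge(float ax, float ay, float bx, float by, float px, float py)
{
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax);
}

// Brute-force scalar rasterizer: test every pixel center inside the
// triangle's screen bounding box against the three edge functions.
void rasterizeTriangle(float x0, float y0, float x1, float y1,
                       float x2, float y2, int width, int height,
                       uint8_t* coverage /* width*height, written as 255 */)
{
    int minX = std::max(0, (int)std::floor(std::min({x0, x1, x2})));
    int maxX = std::min(width - 1, (int)std::ceil(std::max({x0, x1, x2})));
    int minY = std::max(0, (int)std::floor(std::min({y0, y1, y2})));
    int maxY = std::min(height - 1, (int)std::ceil(std::max({y0, y1, y2})));

    for (int y = minY; y <= maxY; ++y)
        for (int x = minX; x <= maxX; ++x) {
            float px = x + 0.5f, py = y + 0.5f;  // sample at pixel center
            bool inside = edge(x0, y0, x1, y1, px, py) >= 0.0f &&
                          edge(x1, y1, x2, y2, px, py) >= 0.0f &&
                          edge(x2, y2, x0, y0, px, py) >= 0.0f;
            if (inside) coverage[y * width + x] = 255;
        }
}
```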

Connections: Larrabee, Michael Abrash, Intel, Dr. Dobb’s and me

There has been a spate of Larrabee information during the last two weeks: two GDC talks (slides near the bottom of this page), a prototype library, and an article by Michael Abrash on the Dr. Dobb’s website.

Dr. Dobb’s Journal has been out of print since February, but for many years it was one of the leading software publications.  When initially published in 1976 (as Dr. Dobb’s Journal of Computer Calisthenics & Orthodontia) it was the first journal focusing on software development for microcomputers.  Michael Abrash wrote many articles for Dr. Dobb’s over the years, including a series on the Quake software renderer in the mid 90’s.  This series made a great impression on me; when it was published I was considering a career change from microprocessor design to graphics programming.  At the time, I was working on Intel’s P55C processor, publicly known as “Pentium with MMX Technology”.  This chip was notable both for being the first X86 processor with a SIMD (single instruction multiple data) instruction set, and for being the last CPU to use the in-order Pentium micro-architecture.

When Michael Abrash wrote the Quake articles, game rendering was 100% software, mostly written in assembly language.  Abrash was the uber-game programmer, having worked on DOOM, written the Quake renderer, and published (in addition to his Dr. Dobb’s articles) many influential books about graphics programming, assembly and optimization (the last of which is available online).

Within a few years (around the time I finally made the jump from CPU design to game graphics programming), it seemed to many that graphics hardware and compiler improvements had made software rendering and hand-coded assembly obsolete.  This was mirrored by my own experience; I was hired to my first game industry job on the strength of a software rasterization demo (written mostly in assembler) and by the time the game shipped, it required graphics hardware and contained very little assembly (none written by me).  Abrash started applying his considerable skills to what he saw as the next unsolved hard problem: natural language processing.

But he couldn’t stay away from graphics for long; when Microsoft started working on the XBox console he got involved in its design.  In the early 2000’s, he figured out that there was a market for software renderers after all, mostly due to the mess of caps bits, unorthogonal feature support, and flaky compliance that characterized low-end graphics hardware at the time (Intel was among the greatest offenders; compounding the problem, its graphics chips sold very well so there were a lot of them out there).  With Mike Sartain (another XBox designer), he wrote Pixomatic, a software renderer published by RAD Game Tools (until then mostly known for the Miles sound library, perhaps the most widely-used middleware in the games industry).  Of course, he published another series of articles in Dr. Dobb’s about the experience, where he discussed how he made use of SIMD instruction sets such as MMX and SSE when optimizing Pixomatic.

I found this particularly interesting due to my personal involvement with these instruction sets.  After working on the first MMX hardware implementation I helped define its successor, which was twice as wide (128 bits instead of 64) and added support for floating-point SIMD.  This instruction set was at first called MMX2, then VX, and finally split into two separate instruction sets: SSE and SSE2.  By this time SIMD instruction set extensions were becoming quite popular; AMD had their own version called 3DNow!, and PowerPC had the AltiVec instruction set.  Intel kept on adding new SIMD extensions: SSE3, SSSE3, SSE4.1, SSE4.2, and AVX.

As Abrash details in the Larrabee article, Larrabee got started when he decided to talk to Intel about some ideas for SIMD instructions to accelerate software rasterization.  As a result, Larrabee includes a powerful set of SIMD instructions.  Much wider than previous instruction sets (512 bits instead of 128, or 256 in the case of AVX), Larrabee’s instruction set contains several instructions tailored to software rasterization.  It is also general enough to allow for automatic code vectorization of a wide variety of loops.  Abrash had a key role in the design of the instruction set, bringing software rasterization back into the mainstream.
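I can’t show Larrabee’s own instructions here, but as a hedged illustration of the kind of loop that vectorizes naturally, here is a trivial operation written scalar and then with 4-wide SSE intrinsics (SSE being one of the instruction sets mentioned above); a 512-bit Larrabee vector unit would follow the same pattern, just processing 16 floats per iteration instead of 4.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Scalar version of a trivially vectorizable loop: y[i] += a * x[i].
void saxpy_scalar(float a, const float* x, float* y, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

// Same loop with 4-wide SSE. Assumes n is a multiple of 4 and the pointers
// are 16-byte aligned; a real implementation would handle the remainder.
void saxpy_sse(float a, const float* x, float* y, int n)
{
    __m128 va = _mm_set1_ps(a);
    for (int i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(x + i);
        __m128 vy = _mm_load_ps(y + i);
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
        _mm_store_ps(y + i, vy);
    }
}
```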

Besides a good instruction set, Larrabee also needed an efficient hardware design with a large number of cores.  Each of these cores needed to be very efficient in terms of performance-per-Watt and per-transistor.  Since the Larrabee team started out as a skunkworks, they couldn’t afford to design a brand-new core, so they looked at previous Intel cores, and the old in-order Pentium core (almost the same one I used in the P55C) was the one chosen.

What I find fascinating about this story is that Abrash managed to follow rasterization all the way around the Wheel of Reincarnation.  This term refers to the common process where a piece of computing functionality is first implemented in software, then moves to special-purpose hardware which gradually becomes more general until it rivals a CPU in complexity, at which point the functionality is folded back into software.  It was coined in a 1968 article by T. H. Myer and Ivan Sutherland (the latter is widely considered the father of computer graphics).

Tim Sweeney Interview

Tim Sweeney is a cofounder of Epic Games and lead developer behind the graphics engines for the Unreal series of games. Jon Stokes has a meaty interview with him, up on Ars Technica; go read it!

    fourth edition might be C++?
Tim talks about how the GPU has become general enough that we will soon be able to get away from rasterization as the only rendering algorithm. Back ten years ago, dealing with an API to do all interactive graphics was limiting. Widening it out with programmable shaders gives more flexibility, but at the cost of complexity of managing the programming environment. Nowadays you’re programming two separate computers that talk to each other. The shift to parallel programming is already a major change in how we need to think about computers, one that hasn’t become a core concept for most of us, yet (myself included; I’m doing my best to wrap my head around Intel’s Threading Building Blocks, for example). Doing such programming in a few different languages is a “feature” we’d all love to see go away.

With Larrabee, CUDA, and compute shaders, the trends of more flexibility continue, though in different flavors. It seems unlikely to me that the pipeline model itself for rendering will fade in popularity any time soon, though rasterization (traditional GPUs) vs. tiling (Larrabee, handhelds) will continue to be a debate. Tim mentions voxel rendering techniques (really, heightfield, in the old games) as something that died once the GPU took over. True. Such techniques are making a return on the GPU even today, via relief mapping and adaptive tessellation. We’re also seeing volume rendering by marching along rays; if an algorithm can be refit to work on a GPU, it will find some use.

So I agree, the increase in flexibility will be all to the good in letting programmers again do much more than render textured opaque triangles via a Z-buffer really fast and most everything else not-so-fast. Frankly, I believe much of the buzz about interactive ray tracing is more an expression of yearning by us graphics programmers that we could actually program again, vs. calling an API. The April Fool’s Day spoof about ray tracing in DirectX 11 fooled a number of people I know, I believe because they wished it were true. Having hacked my fair share of rendering algorithms, I certainly see the appeal.

I think Tim’s a bit overoptimistic on the time frame in which such changes will occur. First, everyone needs to get this future hardware. Sure, NVIDIA points out there are 70 million CUDA-capable graphics cards out there today, but no one is floating CUDA-based programs as alternative interactive renderers at this point (though NVIDIA’s experiments with CUDA ray tracing are wonderful to see). DirectX 9 graphics cards will be around for years to come. Just as significant, making such techniques part of the normal development toolchain also takes a while. I think of how long normal (dot-product) bump mapping, introduced around 2001, took to become a feature that was used in games: first most GPUs had to support it, then tools had to generate and manage the maps, then artists had to be trained to use the tools, etc.

When the second edition of our book came out, it was a few hundred pages longer than the first. I held out the hope to Tomas that our third edition would be shorter. My logic was that, with programmable shaders coming to the fore, we wouldn’t have to cover all the little variants that were possible, but rather could just present pure algorithms and not worry about the implementation details.

This came true to some extent. For example, we could cut out chunks of text about extremely specific ways to efficiently compute the Fresnel term, or give examples showing how assembly instructions are packed together in a pixel shader. There was now plenty of space on the GPU for shader instructions, so such detail was nonsensical. It would be like a programming languages book listing all the programs that could be written in the language. We still do have to spend time dealing with the vagaries of the APIs, such as the relatively space-inefficient ways in which triangles are fed through the pipeline (e.g. a “compact” representation of a cube must use 24 separate vertices, when all that is really needed are 8 points and 6 normals).
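As a toy illustration of that cube example (my own sketch, not anything from the book): the API-friendly layout duplicates each corner once per face that touches it, while the “compact” 8-points-plus-6-normals view is much smaller but cannot be fed directly down a standard vertex stream.

```cpp
// What the API actually wants: one vertex per (position, normal) pair.
// A cube has 8 corner positions and 6 face normals, but each corner belongs
// to 3 faces with different normals, so the vertex buffer ends up with
// 6 faces * 4 corners = 24 distinct vertices (plus an index buffer of
// 36 indices for the 12 triangles).
struct Vertex {
    float position[3];
    float normal[3];
};

// The "8 points and 6 normals" view of the same data, which no standard
// vertex stream layout lets you submit as-is.
struct CubeCompact {
    float points[8][3];
    float normals[6][3];
};

static_assert(sizeof(Vertex) * 24 > sizeof(CubeCompact),
              "the expanded representation is considerably larger");
```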

Counterbalancing such cuts in text, we found we had many more algorithms to write about. With both the increase in abilities in each successive generation of GPUs and APIs, coupled with research into ways to efficiently map algorithms onto new architectures, the book became considerably longer (and certainly heavier, since each illustration’s atoms now needed 3 bytes instead of 1). So, I’m not holding out much hope for a shorter edition next time around; there’s just so much cool stuff that we can now do, and more yet to come.

Incidentally, we had asked Tim for a pithy quote for our new Hardware chapter. He said he didn’t have anything, but passed on one from Billy Zelsnack. This quote was tempting, but instead we used it in our last chapter: “Pretty soon, computers will be fast,” which I just love for some reason. It may sometimes take 20 seconds to open a file folder on Windows today, but I remain hopeful that someday, someday…

Larrabee

Solid information about Intel’s new Larrabee architecture came out a few days ago, the Level of Detail blog has a good set of links. The major news is that Intel’s SIGGRAPH paper is now available for download from ACM’s Digital Library. Unfortunately, not everyone has access to this site’s resources (it costs money to subscribe). My contribution to the cause:

http://softwarecommunity.intel.com/UserFiles/en-us/File/larrabee_manycore.pdf

Thanks to Tom Forsyth for the link.

I’m excited by Larrabee not because of any particular technical feature (though I’m entirely savoring the paper itself, reading two pages a day at lunch), but rather by the fact that it opens up a whole new ecosystem for implementing graphics algorithms. Regardless of whether Larrabee wins or loses in the long run, it will have a huge effect in increasing our knowledge by helping us explore different hardware and software designs for rendering.