                           Ray Tracing News

    bool Box::intersect(const Ray &r, float t0, float t1) const {
        float tmin, tmax, tymin, tymax, tzmin, tzmax;

        tmin = (parameters[r.sign[0]].x() - r.origin.x()) * r.inv_direction.x();
        tmax = (parameters[1-r.sign[0]].x() - r.origin.x()) * r.inv_direction.x();
        tymin = (parameters[r.sign[1]].y() - r.origin.y()) * r.inv_direction.y();
        tymax = (parameters[1-r.sign[1]].y() - r.origin.y()) * r.inv_direction.y();

        if ( (tmin > tymax) || (tymin > tmax) )   // branch 1 and branch 2
            return false;
        if (tymin > tmin)
            tmin = tymin;
        if (tymax < tmax)
            tmax = tymax;

        tzmin = (parameters[r.sign[2]].z() - r.origin.z()) * r.inv_direction.z();
        tzmax = (parameters[1-r.sign[2]].z() - r.origin.z()) * r.inv_direction.z();

        if ( (tmin > tzmax) || (tzmin > tmax) )   // branch 3 and branch 4
            return false;
        if (tzmin > tmin)
            tmin = tzmin;
        if (tzmax < tmax)
            tmax = tzmax;

        // Note: I have changed the end condition here to fit standard practice
        return ( (tmin <= t1) && (tmax >= t0) );  // branch 5 (fail tmin), 6 (fail tmax), and 7 (success)
    }

You pass in a ray and the beginning and end of the ray's extent (i.e. a line segment, essentially; shadow rays, for example, have a maximum distance they can travel, not extending past the light), and get back whether the ray hit the box.

So, looking at that code, can you see any optimizations? I saw three potential minor speedups. But, given that this is inner-loop code, every little optimization helps.

First, we want to test for early outs as soon as possible. So the first "if" test can be split in two, computing only the variables needed for the first test, then those for the second only if needed:

    tmin = (param_list[r.sign_loc[0]] - r.origin.x()) * r.inv_direction.x();
    tymax = (param_list[r.inv_sign_loc[1]] - r.origin.y()) * r.inv_direction.y();
    if ( tmin > tymax )   // branch 1
        return false;

    tmax = (param_list[r.inv_sign_loc[0]] - r.origin.x()) * r.inv_direction.x();
    tymin = (param_list[r.sign_loc[1]] - r.origin.y()) * r.inv_direction.y();
    if ( tymin > tmax )   // branch 2
        return false;

By doing so, there is a fair chance that the computations for tmax and tymin won't have to be done at all. I became interested to know which branches were actually taken when a box was missed; the answer turns out to be less than obvious. In the original code above I've labelled the branches in the comments, e.g. the test "tmin > tymax" is what is called "branch 1". Similarly, branches 3 and 4 can be split up, with tzmax computed only if branch 3 does not give an early exit.

Another optimization avoids a few retrievals and subtractions. The line

    tmax = (parameters[1-r.sign[0]].x() - r.origin.x()) * r.inv_direction.x();

does a subtraction, for example, of "1-r.sign[0]". This is information that is based entirely on the ray, so it could be computed once per ray and reused. Also, the "parameters" value then has its x() value retrieved separately. Instead of retrieving "parameters" and then its x() value, it is more direct to store the box's bounds as an array of 6 floats and access them directly. So the first two lines became:

    tmin = (param_list[r.sign_loc[0]] - r.origin.x()) * r.inv_direction.x();
    tymax = (param_list[r.inv_sign_loc[1]] - r.origin.y()) * r.inv_direction.y();

with "sign_loc" and "inv_sign_loc" being integer arrays precomputed for the ray. The "param_list" is simply an array of the 6 float values of the box's corners. The rest of the code was similarly changed.
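For concreteness, here is a minimal sketch of the per-ray precompute; this is my own reconstruction, not code from the article, and it assumes param_list is laid out as { min.x, min.y, min.z, max.x, max.y, max.z }. Any consistent layout works, as long as the index arrays are built to match it.

    // Sketch of the per-ray index precompute.  Names and layout here are
    // illustrative; only the idea (precompute the slab indices once per
    // ray, index a flat array of 6 box values) comes from the article.
    struct RayPrecomp {
        float origin[3];
        float inv_direction[3];
        int sign_loc[3];       // index of the slab value that yields tmin on each axis
        int inv_sign_loc[3];   // index of the slab value that yields tmax on each axis
    };

    void precomputeRay(RayPrecomp &r, const float origin[3], const float dir[3])
    {
        for (int axis = 0; axis < 3; ++axis) {
            r.origin[axis] = origin[axis];
            r.inv_direction[axis] = 1.0f / dir[axis];  // IEEE: +/-inf for a zero component
            int neg = (r.inv_direction[axis] < 0.0f) ? 1 : 0;
            r.sign_loc[axis]     = neg * 3 + axis;        // min plane if dir >= 0, max plane if dir < 0
            r.inv_sign_loc[axis] = (1 - neg) * 3 + axis;  // the opposite plane
        }
    }

With something like this in place, each slab value is a single flat-array lookup and the "1-sign" subtraction disappears from the inner loop.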
At first I simply slopped these extra arrays into the original Ray class, leaving the original "sign" values there. All of these optimizations gave a whopping 2.25% speed-up - not much. I decided to see if there was some caching effect from having all the various old and new variables sloppily shoved inside Ray and Box.

There was, but not what I expected: suddenly my speed-up dropped below a 1% increase overall, and the code was actually a few percent slower than the original Williams code when all the rays missed the boxes. After some experimentation I found the problem: caching of some sort. By making the "sign_loc" and "inv_sign_loc" variables "unsigned char" instead of "int", my optimized code was faster, in fact about 5.5% faster overall. I'm guessing data cache access is what changed. Amazing how these little details matter.

The other interesting tidbit was about compiler optimization settings. Visual C++ has two major optimizer settings (at least that mattered for this code), "Favor Fast Code" and "Favor Small Code". The take-away here: "Favor Small Code" was the way to go. The "Small Code" setting would give code about 35-60 bytes smaller (e.g. 309 bytes vs. 370 bytes for the original Williams code), and was consistently faster (though not by much, say 1%). However, you can't conclude that shorter code is always faster code: when I optimized the code with "sign_loc" and "inv_sign_loc" as integer arrays, the new code was 13 bytes shorter than the code with just the reordering optimization, but it was slower.

I was first using the test harness at http://jgt.akpeters.com/papers/LofstedtAkenineMoller05/, which analyzes a wide range of ray/triangle algorithms. It generates tests by taking two random points on a unit sphere and creating a ray. For test boxes I again grab two random points on the sphere and create a box. This way of generating data has its problems. For example, it's fairly unlikely that a box will be behind a ray. Also, thin or flat boxes get generated by this method. This was fine for my original purposes, as I was just planning on testing the original Williams algorithm against my own improvements.

Note: if you use this test framework, make sure to set the "results" array to zero when it is allocated. This is a bug, one I hope will soon be fixed in the code distribution. While I'm on the subject of bugs, if you use Williams' code, change the assert in box.h to be "(min <= max)" instead of "<". This fixes a failure when a box's min == max in a dimension, which is unlikely in practice (most people add and subtract an epsilon from the box's dimensions). Also change the return condition of Box::intersect to what is shown in the code above; this fixes an inconsistency with how Mahovsky's test suite, below, works.

What was interesting was the distribution of which branches were taken for a large set of ray/box pairs that did not intersect (everything that intersects always gets to branch 7, of course):

    branch:      1        2        3        4       5       6       7
              33.10%   32.45%   16.62%   17.53%   0.30%   0.00%   0.00%

As can be seen, a ray rarely fails because the box is behind it (branch 5). So my two optimizations (splitting branches 1 & 2 and 3 & 4) should pay off about half the time under these conditions, whenever the box is missed by the ray.

Thinking about different ray directions and how they cause early outs made me try some other variations. There might be some correlation between the ray's primary direction and which slabs to test first. That is, given a ray direction of say (-5,2,3), the ray travels mainly in the X direction, then more in the Z direction than the Y. Call the X direction for this ray the "major axis", the way the ray mostly points. Is it better to test the X and Z slabs and maybe get an early out before trying the Y slab, or better to try the Y and Z slabs first?
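For concreteness, here is a small, purely illustrative sketch of how a per-ray slab visiting order could be computed; the helper and its name are mine, not from the article, and the open question is whether the extra bookkeeping pays for itself:

    #include <cmath>

    // Illustrative only: compute an axis visiting order for the slab tests,
    // sorted by increasing |direction| component, so the ray's major axis is
    // tested last (one of the variants tried below).
    void slabOrder(const float dir[3], int order[3])
    {
        order[0] = 0; order[1] = 1; order[2] = 2;
        // tiny insertion sort on three elements
        for (int i = 1; i < 3; ++i)
            for (int j = i; j > 0 && std::fabs(dir[order[j]]) < std::fabs(dir[order[j-1]]); --j) {
                int tmp = order[j]; order[j] = order[j-1]; order[j-1] = tmp;
            }
    }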
I tried a number of variants, this time testing against cubes rather than random boxes, since a bounding box in an actual bounding volume hierarchy seems more likely to be roughly cubic than entirely random (though whether that's true is another good question!). Specifically, I changed the primitive to a cube 0.4 on a side, centered at the origin. I also decided to ignore the effect of the box being behind the ray, so the early outs for the first four branches in each set of tests are scaled up to sum to 100%. Here are the results:

    branch:   1      2      3      4
             32%    32%    18%    18%    cube boxes, consistent order for slabs
             34%    34%    16%    16%    cube boxes, accessing major and second major axis first
             39%    39%    11%    11%    cube boxes, accessing major axis last

Testing the "easy to hit" slab last seems to be somewhat better in this case at getting early outs. This has a certain logic to it: the slab facing the ray is associated with the cube face that is most likely to be hit. I'm not sure the cost of the slightly longer code needed to access the slabs in the order dictated by the ray is worth it. Anyway, it was interesting to explore, though given the unrealistic test conditions I would have preferred gathering data from a real ray tracer rendering a real scene. But that's also real work.

So the speedup was no big deal, 5% on my Pentium, but the exercise highlighted for me the effect of data caches, how the "Favor Fast Code" setting appears to slow performance down, and how the order in which slabs are accessed appears to matter a bit.

So now the bad news: my optimization didn't do a lot in the grand scheme of things. 5% is useful, but there's another recent paper on the topic: "Fast Ray-Axis Aligned Bounding Box Overlap Tests with Plücker Coordinates," by Jeffrey Mahovsky and Brian Wyvill, JGT 9:1 (2004), pp. 35-46. Code is at http://jgt.akpeters.com/papers/MahovskyWyvill04/, paper at http://pages.cpsc.ucalgary.ca/~blob/ps/jgt04.pdf.

Jeff's Plücker coordinates test is excellent for what it's designed for: testing whether a box is intersected by a ray. The basic version does not return the intersection distance, and is considerably faster. Timing my improved code against it, using Jeff's framework:

    Plücker boolean test:          80.84
    Williams/Haines boolean test:  92.15

If the intersection distance to the box is needed, the Plücker test loses most of its edge but is still slightly faster:

    Plücker intersection test:          92.50
    Williams/Haines intersection test:  94.18

Jeff's framework generates rays in a biased fashion: three random numbers between -1 and 1 are chosen for the direction, which biases towards rays travelling diagonally. Marta and Tomas' framework also biases: points meant to lie on the unit sphere are distributed equally inside a circle in the XY plane, instead of having a higher density of points nearer the edge of this circle. Pete Shirley gave a method for distribution in RTNv17n1#art5. His solution is for a cosine distribution on the hemisphere, which is essentially what Marta and Tomas' code (incorrectly, for this purpose) does: it distributes points equally on a circle, which are then projected onto the hemisphere (the Nusselt analog). What is actually needed is a uniform distribution over the sphere's surface, not a cosine-weighted one. I tried to derive this, failed, and asked the master, Pete Shirley, who writes:

    If you want stratification though, here is a derivation for computing
    spherical coordinates.
[I don't really want stratification (though it's a bonus), but certainly substituting s and r with random values in the range [0,1] gives a random ray direction. - Eric]

    p(phi) = 1/(2*pi)        // this is the angle around the equator
    q(theta) = k*sin(theta)

    The CDF of q is

        INT_0^theta k*sin(theta') dtheta'

    which is k*(1 - cos(theta)).  Since the CDF should be 1 at theta = pi,
    we can conclude that k = 1/2.  So

        Q(theta) = (1 - cos(theta))/2

    So for random seed r, we solve Q(theta) = r, which gives

        cos(theta) = 1 - 2r

    So the whole shebang: use stratified seeds r,s in [0,1]^2

        phi  = 2*pi*s
        cosT = 1 - 2*r
        sinT = sqrt(1 - cosT*cosT)
        x = cos(phi)*sinT
        y = sin(phi)*sinT
        z = cosT

In other words, you can vary "r" and "s" randomly in the formula above to get an unbiased random ray direction. If you want to perform stratified sampling (i.e. with a guarantee that the distribution over the sphere will be more uniform), then you can vary "r" and "s" each over smaller intervals. For example, to get 100 rays that have less clumping than a truly random set would provide, pick a random number from [0,0.1), [0.1,0.2), [0.2,0.3), etc. in two loops for "r" and "s" and generate the direction.

Pete also writes, "Dave Edwards has some nice notes on MC that explain why the general mechanics above work: http://www.cs.utah.edu/classes/cs7650/, see the Monte Carlo notes." Pete notes that this info is also available in his article, "Nonuniform Random Point Sets," Graphics Gems III, pp. 80-83, along with formulations for a number of other commonly used surfaces and volumes.

Before Pete sent this to me, I cheated and used the other (slower, but easy to remember) way to get an unbiased ray direction: generate three random numbers in [-1,1]. If the length of the resulting vector is greater than 1 (or is 0, which is extremely rare), generate three more random numbers and test again; else normalize this vector and use it as the direction. The idea is simple: you're generating random points in space, and if a point is inside or on the unit sphere, the direction from the origin to it is an unbiased sample of the sphere of directions. This trick works only for the sphere, though.

I tried this unbiased direction generator with Mahovsky's test suite (which, by the way, is very easy to use and extend) and it didn't have any appreciable effect, I'm happy to say. Timings were pretty much the same, and I would chalk up any difference to noise.
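For reference, here is a minimal code sketch of both approaches: a direct transcription of the formula above, plus the rejection method. The function names and the use of rand() are mine, purely illustrative.

    #include <cmath>
    #include <cstdlib>

    // Uniform random direction on the unit sphere, straight from the formula
    // above.  r and s are uniform in [0,1); stratify them for less clumping.
    void sphereDirection(float r, float s, float dir[3])
    {
        const float pi = 3.14159265358979f;
        float phi  = 2.0f * pi * s;
        float cosT = 1.0f - 2.0f * r;
        float sinT = std::sqrt(1.0f - cosT * cosT);
        dir[0] = std::cos(phi) * sinT;
        dir[1] = std::sin(phi) * sinT;
        dir[2] = cosT;
    }

    // The slower rejection method: pick points in the cube [-1,1]^3, keep only
    // those inside the unit sphere (and not at the origin), then normalize.
    void sphereDirectionRejection(float dir[3])
    {
        float len2;
        do {
            for (int i = 0; i < 3; ++i)
                dir[i] = 2.0f * (float)rand() / (float)RAND_MAX - 1.0f;
            len2 = dir[0]*dir[0] + dir[1]*dir[1] + dir[2]*dir[2];
        } while (len2 > 1.0f || len2 == 0.0f);
        float inv = 1.0f / std::sqrt(len2);
        for (int i = 0; i < 3; ++i)
            dir[i] *= inv;
    }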
The real conclusion of much of this sort of exploration, to me, is that it's best to use a working ray tracer and some set of real scenes. I have never seen a summary of the average distribution of bounding box dimensions for boxes actually tested, or data about the relative frequency of early-out branching. With this sort of information in hand we could design a random ray and random box generator that would better simulate real-world conditions. Anyone out there have an undergrad looking for a quick project?

But wait, there's more. Armed with the knowledge that Williams' algorithm was one of the best published algorithms out there, I decided to port it to POV-Ray. It's not the absolute fastest, but it's way easier to understand than Mahovsky's Plücker test, so it seemed a good candidate. There was already a slabs-style box intersector in POV-Ray. It's from 1993, ancient code, so all this recent research has to have paid off, right? Looking it up: back on August 17, 1993 I did a bunch of optimizations to the original bounding box test POV-Ray was using, fixing a bug and making the test 25% faster (speeding up overall performance by about 10%). Xander Enzmann put the code into the mainstream distribution. It's old-school code, with no clever use of IEEE floating point math. My basic ideas were:

  * Precompute for each ray bit flags that record whether each ray direction component is positive, negative, or zero.
  * Use the nonzero flag for the special "divide by zero" direction case, and handle that case optimally.
  * Use the positive/negative flag to know in advance whether you are computing tmin or tmax.
  * Think carefully about each branch and determine which quick-exit tests you can do as soon as possible.
  * Test for tmax < 0 after each slab intersection test, i.e. up to three times per box test.

(A rough sketch of this general style of test appears a few paragraphs below.) I'm not sure the last item is really worthwhile; again, an analysis of how often each "early out" condition is reached would help here. Given that in practice in POV-Ray (and other ray tracers I've seen) bounding boxes are actually intersected only about 20% of the time (or less), it seems early outs are worth stressing.

One headache with adding Williams' code was that POV-Ray uses a peculiar definition for a bounding box, storing a lower-left corner and the length of each side of the box. Converting to lower-left/upper-right corners meant changing about 20 files.

I didn't use the SPD tests for timing, as they would render a bit too fast! I could of course shoot more rays, but I admit I didn't want to bother with it - if I saw an encouraging difference in speed, then I'd get more formal. Instead I went with the Balcony and Benchmark POV scenes, as these perform a large number of ray/box tests. It was a wash: the differences in overall times were at most 2% either way, with perhaps a slight advantage going to the existing POV-Ray code.

I went back to Mahovsky's intersection test harness and timed the Williams code with my optimizations against my old POV-Ray box intersector. The timings were:

    Williams, with my optimizations:  117.6s
    POV-Ray:                          118.8s

About a 1% difference. OK, I think we've reached a plateau of optimization here, at least when you need the intersection distance back, as POV-Ray does. Which may in fact be a mistake on POV-Ray's part: it uses a priority queue to order the intersections and attempt to find a close intersection quickly. Not using a priority queue, and so being able to use the faster Plücker boolean test, might serve POV-Ray better, especially given that 80% of all bounding boxes are missed. It would be interesting to know how much time is actually saved by finding a close intersection early on and so discarding the boxes hit beyond this distance. Certainly this optimization is useless for shadow rays.

Anyway, there's not much time difference between the two tests, new and old. That said, I prefer Williams' code, as it's much more elegant and compact. It would make a better pixel shader routine, for example, assuming pixel shader hardware properly follows IEEE specs. Still, I found it interesting that the lowly, ancient POV-Ray code from long ago turns out to be pretty near optimal, even with its goofy bounding box structure. Shame on me, I should have explained this optimized code here long ago! But I think I felt that level of detail would have been boring, just like this article's pretty darn boring (did you actually make it this far?).
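Here, then, is the promised rough sketch of that old-school style of slab test, written from the bullet list above. It is my own illustration of the approach, not the actual POV-Ray code (which also has to deal with POV-Ray's corner-plus-lengths box format).

    #include <cfloat>

    // Per-ray data, precomputed once per ray (illustrative layout).
    struct RayFlags {
        double origin[3], inv_dir[3];
        bool nonzero[3];   // direction component != 0
        bool positive[3];  // direction component > 0
    };

    // Slab test in the spirit of the old POV-Ray code: the precomputed sign
    // tells us directly which plane gives tnear and which gives tfar, the
    // zero-direction case is handled separately, and we try to exit early
    // after every slab.
    bool slabHit(const RayFlags &r, const double lo[3], const double hi[3], double &thit)
    {
        double tmin = 0.0, tmax = DBL_MAX;
        for (int i = 0; i < 3; ++i) {
            if (!r.nonzero[i]) {                  // ray parallel to this slab pair
                if (r.origin[i] < lo[i] || r.origin[i] > hi[i])
                    return false;
                continue;
            }
            double tnear, tfar;
            if (r.positive[i]) {
                tnear = (lo[i] - r.origin[i]) * r.inv_dir[i];
                tfar  = (hi[i] - r.origin[i]) * r.inv_dir[i];
            } else {
                tnear = (hi[i] - r.origin[i]) * r.inv_dir[i];
                tfar  = (lo[i] - r.origin[i]) * r.inv_dir[i];
            }
            if (tnear > tmin) tmin = tnear;
            if (tfar  < tmax) tmax = tfar;
            if (tmin > tmax || tmax < 0.0)        // early out after each slab
                return false;
        }
        thit = tmin;
        return true;
    }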
Perhaps "Small Code" optimization for the bounding box code, "Fast Code" for everything else...? No, in that way lieth madness. So the real take-away for me was: test under real conditions, with real scenes that you care about. Early outs are nice to optimize when possible, as are precomputes. Yeesh, not much to show for all of this. Well, now I've saved others the trouble, I hope. If nothing else, we have the correct formula for distributing rays in an unbiased spherical pattern. But it didn't stop there. Solomon Boulos and I started to correspond on the issue. Solomon likes this test: float tmin = 1e-5f; float tmax = rays.getMinT(ray); for (int c = 0; c < 3; c++) { float t0 = (box[0][c] - rays.getOrigin(ray,c)) * rays.getInverseDirection(ray,c); float t1 = (box[1][c] - rays.getOrigin(ray,c)) * rays.getInverseDirection(ray,c); float near = (t0 < t1) ? t0 : t1; float far = (t0 < t1) ? t1 : t0; tmin = (tmin < near) ? near : tmin; // max of tmin, near tmax = (far < tmax) ? far : tmax; // min of tmax, far } if (tmin <= tmax) // valid intersection return ray; Excerpting what he writes: I believe that this code is faster. It lends itself to allowing the compiler to easily convert it to single scalar SSE code (giving a huge boost). Basically, recent x86 systems (and to some extent PPC systems, but especially the Cell) need you to write code without branching. In any case, it's way faster than all the possible early exits at least for my machine with my compiler (i686-apple-darwin8-g++-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5363) on a 2.0 GHz Macbook -- yeah, the cheap white one). Now since I happened to do an internship at Microsoft a couple of years ago, I'm more than willing to say that I will ignore any performance results that use its compiler (so sorry about the hard work you put in, but the VC compiler is truly awful -- I've got some radix sort code that is 5x faster on the same machine when compiled under g++ even using all sorts of flags for VC). I don't know if you happen to run Cygwin, but the radix sort test I mentioned was g++ under Cygwin. You should give it a try. Also note that I'm not using the Williams clever sign test in my code anymore (I used to think it was important, but in the world of interactive ray tracing it's actually slower because it confuses the compiler and produces branches). ---- I tried his code under Visual C's compiler, with some options he recommended: "It looks like you can pass /arch:SSE2 to your list of optimization flags. You may also have to turn on intrinsics by doing /Oi. Looks like you might want /fp:fast too: http://msdn.microsoft.com/vstudio/tour/vs2005_guided_tour/VS2005pro/Framework/CPlus32BitOptimization.htm Doing so did convert Solomon's test to use SSE2 instructions (I checked the assembly code, it was using SSE - pretty cool to see the optimizer actually unwind the loop, as I hoped it would), but it also converted the other tests to use these, too. 
The results:

     % hit    pluecker   plueckerint   hainesint   hainesint_oct    boulos
    ----------------------------------------------------------------------
        0      13.58s       13.76s       14.21s        13.91s       15.30s
       10      13.39s       13.82s       14.42s        13.67s       15.18s
       20      13.96s       13.90s       14.43s        13.76s       15.31s
       30      13.89s       13.88s       14.19s        13.97s       15.61s
       40      13.64s       13.88s       14.42s        14.25s       15.38s
       50      13.93s       14.02s       14.71s        14.67s       15.11s
       60      14.12s       14.29s       15.98s        14.36s       15.40s
       70      14.32s       14.79s       14.98s        14.23s       15.07s
       80      14.40s       14.34s       14.91s        15.28s       15.86s
       90      14.46s       14.90s       15.16s        14.99s       15.10s
      100      14.38s       15.00s       15.20s        14.58s       14.95s
    ----------------------------------------------------------------------
     Sum:     154.07s      156.59s      162.62s       157.68s      168.29s

The "% hit" column is how many of the rays hit the box; early-out tests should help more when the hit percentage is low. "Pluecker" is the test that returns no intersection distance; "plueckerint" does return one. "Hainesint" is my modified Williams test, returning an intersection distance. Solomon suggested that some speedup could be had by using a switch statement instead of indexing to extract the proper direction values, and "hainesint_oct" is indeed a bit faster for doing so, comparable with "plueckerint" (which uses this same technique). "Boulos" is Solomon's test, which appears to run slower than the others, at least under Visual C's compiler. Definitely a "your mileage may vary" kind of thing - without the compiler flags Solomon recommended, all the code runs maybe 3% faster overall. I'm not sure how it runs under g++, though - Solomon may be right about that compiler.

The thing to notice is that the early-outs do have an effect, lowering the intersection times when the ray misses the box (and in reality the ray hits the box maybe 20% of the time at most, it seems). Boulos' test, with no early-outs, takes longer. I guess my final conclusion is, "stop wasting time on finding a better ray/box test, the ones we have are pretty near optimal". Figuring out the best compiler and compiler settings is certainly worthwhile if you're planning on shooting a lot of rays (like Utah does), but the basic algorithms we currently have appear to be pretty much the cat's meow (or the cat's pajamas, if you prefer).

----------------------------------------------------------------------------

SIGGRAPH 2006 Report, by Eric Haines

Before I start, one little tip in order to be a cool graphics person (oxymoron that that is): "SIGGRAPH" is pronounced "sig-raff" (short "i", rhymes with "big"), not "see-graff" or "seeg-raff". Some highlight pictures (by someone else) are at http://www.pcmag.com/slideshow/0,1206,pg=0&s=1489&a=185055,00.asp

First the fun stuff. I've been convinced for a number of years that if you throw enough money and people at a graphics rendering problem, you can do just about anything. If my mind were made of adamantine or silicon or something, I would care for two solid hours only about the technical excellence of the pieces in the Electronic Theater program. It's not, so now the ET is mostly about entertainment for me. With that in mind, my two favorite pieces happened to be beer commercials. It's the soundtrack that makes The Big Ad funny from the get-go: http://www.youtube.com/watch?v=tZslkC4y2qY; the graphics are secondary (and are done mostly to save money, not to do something undoable with traditional means). Guinness' "noitulovE" is less hilarious, but more technically impressive: http://www.youtube.com/watch?v=9L4MOHpOLWo.
Otherwise it's mostly a blur; I felt like there were fewer long, dark, man's-inhumanity-to-man pieces this year, and, talking with Paul Debevec, he's committed to avoiding these for next year. That's a trend I'd like to see continue.

Oh, there were also the usual red/green paddle games by Cinematrix, http://www.cinematrix.com/, before the show. What was interesting was to realize how limited groupthink can be. If the goal is obvious, like moving the Pong paddle, there's a little overshoot and whatnot, but overall the crowd does well. For other tasks there were two groups, one controlling the vertical, the other the horizontal. A game where the player goes through a maze, in a top-down view, was fun in that most people seemed to immediately shoot off and try to move as quickly as possible to somewhere (who cares where?), at least in our session. Eventually (accompanied by yelling from various "pixels") people stopped moving, figured out the path to the end location, and got it together and moved there. The Etch-a-Sketch game was one where things mostly fell apart. A circle is shown, the nib starts on the outline - draw it. Do you, as a controller, go left or right? Generally anything where either choice was a fine strategy for an individual led to gridlock and confusion, since you didn't know what the "up-and-down" guys were really going to do (or you didn't think much about them; maybe with enough practice and trust you'd be able to build up ways to interact successfully). Fun to think about; I wonder whether we would have done any better if they had drawn an arrow pointing clockwise around the figure. We were also pretty bad at drawing anyway, given how the drawing nib was controlled by velocity.

In Emerging Technologies:

* The plasma display was probably the most talked-about emerging technology. No, not a 2D plasma panel: an Evil Genius style device that focuses a beam of laser light on a point in space, heats it to around 50,000 degrees, and turns that point into glowing plasma. Hellacious racket, and you could get only about 50 "pixels" for drawing things, but pretty amazing. See http://www.gamasutra.com/siggraph2006/news.php?story=10306, and another image at http://www.pcmag.com/slideshow_viewer/0,1205,l=&s=1489&a=185055&po=9,00.asp.

* The spinning house, with Morphovision. Think of how a strobe light can freeze action, making a spinning object appear to be still. Now, instead of a strobe, sweep a vertical beam of light over the house, in sync with its rotation. Each vertical slice of the house is now frozen, but at a different time, so the house looks warped. Now play with the vertical beam of light, making it wavy, or broken into pieces, or whatever, and you get all sorts of great effects. See http://www.youtube.com/watch?v=g0FpgYhlizM. It looks much better in real life, without the video problems and compression artifacts, but it gives you the idea.

* Jeff Han's multitouch wall was great to see in action, after reading about it last year: http://mrl.nyu.edu/~jhan/ftirtouch/. Quite natural to just use, with only the occasional misregister. His Media Mirror was also fun to try: http://mrl.nyu.edu/~jhan/mediamirror/index.html

* The Holovizio 3D display was impressive. Not just a two-image autostereoscopic display, it's more a holographic display; they claim each image is equivalent to about 100 2D images. It worked without glasses over a pretty wide viewing angle.
  Everything in the world is on YouTube now, and it's worth a look: http://www.youtube.com/watch?v=rPWFzq7C-v4

Not at SIGGRAPH, as far as I know, but it should have been: http://www.youtube.com/watch?v=Yd99gyE4jCk - LED fabrics from Philips.

I didn't spend a lot of time on the exhibit hall floor, other than looking at books. The technology that made the most splash was Mova's "Contour Reality System", see http://www.mova.com/. Instead of gluing dots and whatnot onto people, cloth, or other objects for motion capture, you just put on Halloween makeup with phosphorescence. A light flashes rapidly; when it's on, the normal face is captured by a camera. When it's off, other cameras capture the glowing phosphorescent dots. Software then correlates all of these separate dot locations among the cameras, and as the face moves, the dot locations are tracked.

I have to admit to not searching out all the ray tracing options available on the exhibit floor. For production rendering, RenderMan and Mental Ray are the big two, with a number of small fry nipping at their heels. That said, there were ray tracers to be seen, the draw being that they were interactive. One new entry is RealTrace, from Michael McCool's company RapidMind (once known as Serious Hack). RealTrace was running on a number of platforms, including IBM's Cell: http://rapidmind.net/technology.php. Instead of aiming at large model visualization, the stress was on using ray tracing to improve shading quality.

NVIDIA has a pretty impressive scene graph they're offering for free, NVSG: http://developer.nvidia.com/object/nvsg_home.html. My own experience with past scene graphs is that they almost do what I want them to, but there's always some gotcha on down the road. So I'm personally not interested until the source code is provided. But, if you need a scene graph for some quick experimentation, this one might be worth trying out. NVIDIA also now provides their high-end GPU-accelerated renderer Gelato for free: http://www.nvidia.com/page/gz_home.html. You can also buy a professional version, which gives you support, the Sorbetto fast relighter, and some other bits needed for production rendering.

It was interesting to see a Google booth this year, for the first time, mostly for the purpose of recruiting professionals. They were showing Google Earth and SketchUp, a fun architectural modeler from a company that Google purchased, and that's now free. Get it at http://www.sketchup.com/; it's worth trying out. What's clever is that anyone can make a model of a building in the world and upload it to Google, through their 3D Warehouse system: http://sketchup.google.com/product_3dwh.html. This integrates with Google Earth, http://earth.google.com/3d.html, which will show links to buildings that exist and let you download the models. At 100 million downloads by June 12, 2006 (vs. essentially zero downloads on June 12, 2005, when the product launched), Google Earth is perhaps now the most widely used 3D application in existence: http://www.google.com/press/pressrel/geoday.html. Admittedly, of those 100 million downloads, my guess is that the active user base is far smaller, and most are just casual users. For example, my younger son loves to zoom in on cities and see what models are now there. Still, impressive stuff.

As far as books go, there were not a huge number that interested me (hey, I'm still working through the ones I've got; just Shirley's "Fundamentals of Computer Graphics" will keep me busy a good long while).
The most noteworthy, for me, was the book "Real-Time Volume Graphics", by Engel et al., http://www.amazon.com/exec/obidos/tg/detail/-/1568812663/realtimerenderin/. Based on their SIGGRAPH course, it appears to cover the field quite well; it's good to finally have a book about this area of graphics. The second edition of "Advanced Global Illumination" by Dutre et al. is now available: http://www.amazon.com/Advanced-Global-Illumination-Philip-Dutre/dp/1568813074/realtimerenderin. The major additions are sections on lightcuts, precomputed radiance transfer (PRT), and exercises at the end of each chapter. I was also interested to see "The Game Programmer's Guide to Torque", http://www.amazon.com/exec/obidos/tg/detail/-/1568812841/realtimerenderin/, as it's nice to see the extremely inexpensive Torque game engine, http://www.garagegames.com/ ($100 for indie developers), get more exposure.

A book on the Collada 3D file specification is now available: http://www.amazon.com/gp/product/1568812876/realtimerenderin/. Collada is an open 3D file format developed by some researchers at Sony, led by Remi Arnaud. I spent a fair bit of time talking with Remi at the A.K. Peters reception. He says Collada is not officially supported by Sony, since his effort was simply a part of their research group. That said, he estimates that about 15% of the games development community use Collada. Feeling Software provides FeelingViewer, a Collada viewer free for non-commercial use. The book itself goes over the specification in detail, with much more information than I could find online. It's interesting to look at what the format concerns itself with, as it mirrors the concerns and workflow of game developers.

On to courses, sketches, and papers. This isn't comprehensive coverage; rather, I'm using the filter of "what do I remember a month later?" As usual, Tim Rowley has tracked down where the SIGGRAPH papers and demo videos can be found; see http://www.cs.brown.edu/~tor/sig2006.html.

I attended a fair bit of the Computational Photography course. This field is going to have a large effect on many areas in the future, with the most obvious to most of us being consumer products. It was fascinating to see how multiple images of various sorts, taken with various filters, could be captured and used to improve communication of what is in a scene. Some simple examples were high-dynamic-range cameras combined with tone mapping software to give nicely balanced images. But that's the tip of the iceberg; there are all sorts of interesting experiments going on in this area. Shree Nayar (coauthor on 6 SIGGRAPH papers this year!) had, among other things, a catadioptric camera. This is a normal camera with a cylindrical reflector along the view direction. It gives multiple images from different angles of a person's face or an object, allowing a 3D reconstruction of the geometry of the surface. Simple, and I like the idea that a single camera image gets used to capture multiple views. There were tons of other examples, like digital refocusing, deblurring scenes using structured light, extraction of information from diffuse reflections, and many more.

There were many papers presented in the field. I liked Weiss' "Fast Median and Bilateral Filtering" paper, which I appreciated both for its results and for its clarity in explaining the filter and its optimization. The filter can keep sharp edges sharp while smoothing other data, and Weiss gives an implementation that is much faster than Photoshop's.
Another nice technique was presented separately in a paper by Luft et al., "Image Enhancement by Unsharp Masking the Depth Buffer"; see the results at http://www.cgmi.inf.uni-konstanz.de/publications/2006/image_enhancement/results.html. Given an image with depth information, a few simple filtering and blending operations give a stronger differentiation between objects.

Image and video stream manipulation is a growing trend at SIGGRAPH. There's now enough computational power around that quite complex algorithms can be applied without (much) fear of glacial computation times. For example, "Real-time Video Abstraction", http://www.cs.northwestern.edu/~holger/Research/VideoAbstraction/. Rotoscoping, where you take live action and draw an abstract version, is a painstaking process. It was most recently used in "A Scanner Darkly", where even with partial automation it still took 500 hours of work per minute of processed film (see http://en.wikipedia.org/wiki/A_Scanner_Darkly_(film)#Rotoscoping). This paper addresses the temporal coherence problem. It's not the last word on the subject, since each video stream will have its own challenges, but it's the type of research that will make this effect more affordable, and so more commonplace and, eventually, overused and trite. That's OK, I like the effect, and really, better solutions to this problem will help many forms of video stream processing and so lead to a wider range of possibilities.

In ray tracing there were bits of progress here and there. Ingo Wald mentioned that, due to a seminar at the University of Utah, some clever ideas came from students. These were somewhat impractical at first, but got researchers there thinking in new ways. A few papers came out of this collaboration, including the SIGGRAPH paper "Ray Tracing Animated Scenes using Coherent Grid Traversal". This is a clever paper, with a different way of thinking about how to traverse a beam through a grid (3DDDA doesn't work), among other things. Ray tracing continues to be researched, due to the ability to use it for interactive rendering; fast updates to the efficiency structure are an active area, as are new ways of intersecting groups of rays.

Trends: Intel announced a multicore processor with 4 cores. Put two of these on a motherboard and you've got an 8-core system on your desktop. Gordon Stoll (ray tracing researcher at Intel) says that these are quite parallelizable for ray tracing; they're not overwhelmed by memory contention or other communication problems.

The two directions researchers are going with ray tracing systems are improved shading and improved large-scene rendering. Shading means reflection, refraction, photon map, and all sorts of other rays. Some of these sets of rays are pretty coherent (shadows, glossy reflection bundles, etc.), but many have little coherence. Lack of coherence also kills caching schemes. Also, a trend in interactive ray tracing is to track beams of rays, opening the beams up as needed, and this lack of coherence hurts such approaches. How to ameliorate the problems of shooting these secondary rays is the challenge.

Large model visualization is the other direction for research. There's lots more coherence when you limit rendering to eye and shadow rays, and rays do have the intrinsic advantage of O(log N) time to find what a particular ray hits; Z-buffers in theory are O(N).
This is the basis of Kajiya's old argument that ray tracing must be faster than z-buffering in the far future because of the difference in order; at some point the number of primitives overwhelms the z-buffer. It turns out this argument is less relevant in practice, as z-buffer algorithms for large models include using bounding volumes for frustum culling, occlusion culling, and rough front-to-back rendering to avoid reshading. I find it interesting that the trends are that single rays are getting grouped into beams for coherence, and that z-buffering schemes are using structures such as kd-trees and bounding volume hierarchies to increase efficiency. Occlusion cull testing of boxes can be thought of as a way of checking a beam of rays (those that would hit the box) against some previously rendered objects to see if all the rays have hit closer objects, similar to ray tracing. That said, ray tracing does have advantages in areas such as transparency, where Z-buffers have no ability to retain more than one surface at each pixel. Stephens et al. had a demo at last SIGGRAPH showing the 350 million polygon Boeing 777 model at interactive rates (see http://www.cs.utah.edu/~boulos/research.htm).

In the field of GPU rendering, as usual there were some incredible demos. Most impressive to me was Natalya Tatarchuk's talk "Artist-Directable Real-Time Rain Rendering in City Environments", http://www.ati.com/developer/siggraph06/Tatarchuk-Rain.pdf. This talk is maybe seven sketches' worth of ideas and techniques, ways of solving various problems using shaders and tricks of all different sorts. It went on and on, getting more and more involved, like physics models for raindrops streaking on glass, with larger droplets creating caustics in their shadows. Sure, these sorts of elaborate effects are common for movies, but to see them in an interactive context just drives home how far GPU capabilities have come.

I attended David Blythe's talk about the new features coming up in DirectX 10. Short version: sounds good! More speed, more capabilities, more flexibility. The most interesting qualitative change is how geometry can be manipulated (created or destroyed) and written to a buffer for reuse in subsequent passes. But DirectX 10 will only run under Vista, so it'll be a while before it's generally adopted.

One clever sketch was "Single-Pass Wireframe Rendering". The short version: it's hard to render wireframe edges on top of models without artifacts. There is a three-pass algorithm I know of that works perfectly, but ugh, it's three passes. Their technique is to use a pixel shader to decide: if the pixel being drawn is near an edge, draw the edge color, else shade normally. I can't find anything by the original researchers, but a nice summary is at http://www.stoneschool.com/Work/Siggraph/2006/index2.html. I liked this idea because it rethought the problem, making it a pixel shader algorithm instead of a line drawing and offset algorithm.

And here's dessert: http://guir.cs.berkeley.edu/people/yangli/research/monet/ - a pretty cute idea for prototyping UIs: draw simple icons and widgets, glue them up, and you have a demoable application. It's fun to go to the page and watch the animations.

Lots more happened, but this is what I felt was of general interest and worth passing on for now.

----------------------------------------------------------------------------

END OF RTNEWS