                             Ray Tracing News

--------

A nice page on the classic point-in-polygon problem:

    http://www.geometryalgorithms.com/Archive/algorithm_0103/algorithm_0103.htm

This whole site, http://www.geometryalgorithms.com/, has a lot of worthwhile
material and links.

--------

A couple of people suggested that I put my Eurographics talk on "Rendering:
Input/Output" online, so it is now available at:

    http://www.research.ibm.com/people/h/holly/pres.html

It's in PDF, and with all the images it's still pretty bulky, so I divided it
into 2 to 6 Mb pieces. I also put up a couple of other presentations,
including my attempt to be controversial for the "Newton's Nightmare" panel
at SIGGRAPH.

- Holly Rushmeier

--------

lRay, a ray tracer with caustics, soft shadows, and implicit surfaces, is
available at:

    http://www.cfxweb.net/~lyc/lray.html

--------

It had to happen eventually:

    http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=2061898269

This page may have disappeared by the time you look, so here's the gist:

    Selling NURBS Sphere for Maya 4.0!!! Cheap!! That's right, you lucky
    animators! I'm selling a collection of control vertexes in 3D space!
    If you are the lucky bidder I will send you the MEL script file
    containing the power to create the one, the only, the ... NURBS
    Sphere!!! Payment will be accepted by PayPal only. Good Luck! May the
    Best Animator Win!!!!!

With 4 days to go, the auction had 13 bids and the current price was $10.50.

--------

A little real-time ray tracer that raytraces only cubes:

    http://www.perilith.com/~lode/trixel/Files/Trixel002.zip

Seems a bit slow, but it's kind of cute.

--------

A followup on RTNv9n2, which was all about the pitfalls of writing a book, is
this article, sent in by John Peterson:

    http://www.mediachannel.org/views/oped/bookcontract.shtml

I should mention that my original "fear and loathing" column was not
representative of my personal views (I had never dealt with a publisher at
that point), but rather of a number of anonymous authors. My advice to
prospective authors is to ask other authors how they feel about the various
publishers - there are excellent publishers out there (and I'm not going to
tread on any toes by making a list here and inadvertently leaving any out).
That said, it does pay to be careful. Two other pieces of advice are to (a)
get a lawyer to read over the book contract and (b) work your hardest to make
your deadlines. If you're going to miss a deadline, tell your publisher
sooner rather than later; that way they can schedule resources to better meet
everyone's needs.

--------

For an entertaining use of global illumination (especially the Cornell Box)
in computer games, check out this page:

    http://www.sync.dk/~khn/stuff/

- now with extra color bleeding.

----------------------------------------------------------------------------

RealStorm: A Realtime Raytracing Engine, by Michael Piepgras

[You owe it to yourself to visit http://www.realstorm.com, download their
benchmark, and give it a whirl; it's pretty impressive. I find Michael's
approach to be very interesting in that it uses techniques normally used in
the graphics accelerator world, such as view frustum culling, shadow volumes,
and reflecting the viewpoint, instead applying them to the acceleration of
ray tracing. - EAH]

RealStorm is an RTRT engine based on polynomial primitives like planes and
quadrics.
The idea is to build an engine for games, one that can handle raytracing
effects with large scenes at interactive rates. In this way we could improve
the quality of game engines and give them a more realistic look. As the
number of polygons representing a curved surface gets higher and higher, the
model representation converges to polygons smaller than a screen pixel, which
makes no sense at all for a traditional Z-buffer to render.

The raytracing engine is based on adaptive supersampling, with a 4x4 pixel
grid for visibility, shadows, 1st recursion reflection, and shadows in 1st
recursion reflections, each with its own buffer. In other words, only every
16th pixel is rendered, with more pixels sampled as needed.

The primitives used are polynomial objects like polygons, planes and
quadrics, because intersection calculations with these can be solved very
rapidly. Each primitive uses its own optimized intersection code.

In the text that follows, whenever "view frustum" is used, it means to check
the viewcone first (the cone that contains the view frustum), and if the
viewcone is intersected, then check the view frustum (both are tested against
the object's bounding sphere and bounding box). Checking against a cone is
faster than checking against a frustum, so we can often obtain quick object
rejection. (A sketch of one such sphere/cone test appears after the notes on
(a)-(i) below.)

The engine lets each light be given a maximum radius of effect; beyond this
radius the light does not affect any surface's shade. A far plane distance is
also defined, as is normal in most game engines. These scene attributes are
typically set by the world designer.

The per-frame optimisation techniques used are:

  (a) build a local scene out of the global scene objects which have any
      effect on the picture.
  (b) do a visibility check with the viewport and build up an orthogonal
      BSP-tree per frame.
  (c) find the minimum projected viewplane area for each object.
  (d) do an "object receives light" check per light and object.
  (e) global shadow checking: obj<-->obj shadows, light<-->obj shadows.
  (f) shadow checking for 1st recursion shadows by checking the shadow
      volumes against the view frustum and the BSP-tree.
  (g) for 1st recursion reflective surfaces, do a visibility check for
      possible reflected objects by reflecting the view frustum.
  (h) visibility checking for 1st recursion shadows, per object and shadow
      volume, by reflecting the viewport. In other words, reflect the
      viewport on the reflective surface and do a shadow volume test with
      the reflected viewport to see if an object on a reflected surface can
      receive a shadow.
  (i) as needed, tessellate the projection plane into 2^n by 2^n picture
      parts (depending on the actual frame complexity) and do (b)-(g) for
      each part.

Some more details about (a)-(i):

for (a): One possibility for checking an object against the local render
space is:

    if (object.distance + object.radius <
        max_view_distance + max_light_radius)
            add_to_localscene;

In other words, if an object's distance from the eye is less than the far
clipping plane's distance plus the maximum radius a light is allowed to
affect, then that object potentially affects the scene.

for (b): An orthogonal BSP-tree is used because it can be calculated very
quickly.

for (c): This rectangle, i.e., the object's projection onto the near plane of
the view frustum, helps make the visibility check in (i) faster.

for (d): The "object receives light" test simply checks if the object's
bounding sphere is close enough to the light source, as each light is given a
maximum radius of effect.
If not, this reduces the loop-calls for lighting and shadows during
rendering. That is, loops of the form "for (l = 0; l < num_lights; l++)" run
over fewer lights per object.

for (e): "obj<-->obj shadows" means finding which object can cast a shadow on
another object:

    if (distance(object_a, object_b) + BoundingSphereSize(object_a) +
        BoundingSphereSize(object_b) > max_light_radius)

then object_a can't cast a shadow on object_b (for all visibility and
reflection rays).

"light<-->obj shadows" means checking if object_a can cause any shadow in the
picture. For example:

    if (ShadowVolume(object_a, light) == inside viewport)
        object_a can cause a shadow;

(this works for visibility, not for reflection). In other words, take the
shadow volume for object_a and test it for intersection with the viewport
frustum (using, say, bounding spheres around each of these where possible to
try to obtain a quick exit).

for (f): "1st recursion shadows" are those which are directly cast onto a
visible object, not the shadows which can be seen on reflected objects.
Surviving shadow volumes from (e) are then checked against the orthogonal
BSP-tree down to the leaves. If the shadow volume does not overlap any
(visible) receiving object, then the cast shadow has no effect on the scene.
Another approach is to compute the minimum projected shadow volume, which
means to get the minimum shadow volume rectangle on the projection plane to
reduce the shadow tests in (i). In other words, I project the shadow volume
onto the view plane and find the 2D rectangle it covers on the screen. Using
this 2D rectangle, a simple "is pixel in rectangle" test tells if a shadow
test is needed or not.

for (g): This means building up a list of objects which can be seen in the
reflecting object from the camera. This can be done easily for planes by
reflecting the viewcone/frustum at the reflection plane. For quadrics I use
something different: calculate the maximum rectangle inside the quadric
parallel to the projection plane. For example, for a cylinder you can compute
a rectangle formed from the two silhouette edges of the curved sides of the
cylinder, along with two edges for the endcaps. This rectangle is then used
to cull away anything behind the cylinder. In other words, a rectangle inside
the cylinder is used to perform occlusion culling, and any objects behind the
cylinder are thus eliminated from consideration.

for (h): Here the obj<-->obj shadow lists are used.

for (i): The list of visible objects has already been built for the whole
image, as well as lists of possible shadowing objects. The next step is to
find out whether image subdivision is needed. What I do is: subdivide the
image if the number of rays needed for visibility tracing is more than 4
times higher than the resulting view frustum tests for the next subdivision
level. For example:

    if (needed_rays > 4 * view_frustum_tests_for_next_subdivision_level)
        subdivide_subimage;

(I am sure this is not the best way, but it's the best I found so far ;-).
In other words, say I know that 8000 rays will be needed for the image, and I
have a current list of 1700 objects. I know I'll need to do 1700 view
frustum/object tests. As 8000 is greater than 4*1700, the image will be
subdivided. Every subdivision level uses the list of the level above it, so
in the first step you get 4 sublists from the single list, from each of these
4 you get 4 new sublists, and so on. Normally I don't use more than 3
subdivision levels, to avoid incurring too much overhead.
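(The exact viewcone test RealStorm uses isn't given above. As a point of
reference, here is a minimal C++ sketch of one standard sphere-against-cone
rejection test, along the lines of Eberly's well-known formulation; the types
and names are illustrative assumptions, not RealStorm code.)

    struct Vec3 { float x, y, z; };

    static float dot(const Vec3 &a, const Vec3 &b)
    {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    static Vec3 sub(const Vec3 &a, const Vec3 &b)
    {
        return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
    }

    static Vec3 madd(const Vec3 &a, float s, const Vec3 &b)   // a + s*b
    {
        return Vec3{a.x + s * b.x, a.y + s * b.y, a.z + s * b.z};
    }

    // Conservative test: can the sphere (center, radius) touch the cone
    // with apex 'apex', unit axis 'axis', and half-angle whose sine and
    // cosine are sinA and cosA?  The cone is grown by the sphere radius
    // (slide the apex back along the axis by radius/sinA), then the sphere
    // center is tested against the grown cone, with a separate check for
    // the region behind the apex.
    bool sphereHitsCone(const Vec3 &center, float radius,
                        const Vec3 &apex, const Vec3 &axis,
                        float sinA, float cosA)
    {
        Vec3 cmv = sub(center, apex);
        Vec3 d = madd(cmv, radius / sinA, axis);
        float dSqr = dot(d, d);
        float e = dot(axis, d);
        if (e > 0.0f && e * e >= dSqr * cosA * cosA) {
            // Inside the grown cone; handle the region behind the apex.
            dSqr = dot(cmv, cmv);
            e = -dot(axis, cmv);
            if (e > 0.0f && e * e >= dSqr * sinA * sinA)
                return dSqr <= radius * radius;
            return true;
        }
        return false;
    }

If a test of this sort passes, the frustum planes (and the bounding box)
would then be checked; if it fails, the object is rejected with just a
handful of multiplies, which is the point of testing the cone first.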
All these pre-frame checking techniques are optimized with special routines
for each primitive pair (sphere<-->plane, sphere<-->cylinder,
sphere<-->sphere, sphere<-->ellipsoid, ...) and the use of bounding boxes and
bounding spheres.

With our benchmark at 320x180 over 2375 frames, (b)-(i) reduce the number of
intersection calculations by a high ratio (of course, this depends on the
scene to be raytraced):

    No. Of Rays     |  without (b)-(i)    with (b)-(i)      ratio
    ----------------------------------------------------------------
    Visibility      |    8,589,935,274      84,251,189    101.9:1
                    |
    Shadows         |   15,533,963,921      69,563,770    223.2:1
                    |
    Visibility      |    6,444,228,737      52,383,307    123.0:1
    in reflections  |
                    |
    Shadows         |    7,920,241,335     112,368,826     70.4:1
    in reflections  |
    ----------------------------------------------------------------
    Avg. Fps        |             1.18           24.26     20.5:1
                                                        (on AMD1200)

The main code optimization used is to reduce cache misses, so lots of lists
are copied around during rendering. Otherwise the code is plain C, with the
compiler options set to do Pentium II optimizations. Assembler optimizations,
such as using MMX, SSE or 3DNow, are not used yet.

About image quality: to provide a realistic look it is important to handle
different material shaders, so 7 layers of textures are used. These are
color, reflection, bump, diffuse intensity, specular intensity, glow, and
translucency (some of these handle procedural textures). Another improvement
to materials is to handle more than Lambert and Phong (cos, cos()^n) for
shading; in fact, 12 different shading models for diffuse and specular light
are implemented.

To improve the lighting quality, volume lights (and radiosity in the next
benchmark) are implemented, but this is still too slow. Some more work has to
be done there to get it to be realtime with dynamic scenes.

For more information about RealStorm: http://www.realstorm.com

----------------------------------------------------------------------------

Efficient Bounding Box Intersection, by Brian Smits

[Brian commented on my article in the previous issue, RTNv14n1, in which I
give a somewhat crazy way of testing boxes for intersection. Rereading that
article, I realize that I did not stress enough that the approach there was
not meant to be considered efficient or practical. I was more interested in
the fact that an alternate approach was possible. Anyway, the silver lining
was that I received a nice little article from Brian. It's something he's
discussed in passing elsewhere, and I'm glad to have it written out in full
here. - EAH]

I looked over the BBox intersection idea at the end of the last RTN. It seems
way too complex. As I recall:

    intervalMin = ray.T.min;
    intervalMax = ray.T.max;

    t1 = (minX - originX) * invDirX;
    t2 = (maxX - originX) * invDirX;
    if (invDirX > 0) {
        if (t1 > intervalMin) intervalMin = t1;
        if (t2 < intervalMax) intervalMax = t2;
        if (intervalMin > intervalMax) return false;
    } else {
        if (t2 > intervalMin) intervalMin = t2;
        if (t1 < intervalMax) intervalMax = t1;
        if (intervalMin > intervalMax) return false;
    }

    (repeat for each direction)

Note that the first "if" statement is constant for all bboxes intersected by
the ray, so if you want to uglify your code you can create 8 versions of this
function [i.e., one for each octant, dependent on the ray's direction], have
no nested ifs (although you still have ifs that are dependent on operations
done in previous ifs), and each function gets a lot simpler.
bboxPosXDirPosYDirPosZDir:

    t1 = (minX - originX) * invDirX;
    t2 = (maxX - originX) * invDirX;
    if (t1 > intervalMin) intervalMin = t1;
    if (t2 < intervalMax) intervalMax = t2;
    if (intervalMin > intervalMax) return false;

    (repeat for each direction)

It seems unlikely that anything involving projecting points and doing any
sort of polygon containment tests could be faster. A further argument is
that, for me, this bounding box test is either 4 or 10 times (my memory is
going) faster than my fastest ray-triangle intersection (which does
significant precomputation for each triangle). Let me know if this isn't
clear or I made a mistake in the above code.

[While we're on the topic, I wanted to add this note. There is an interesting
little variant on ray/box testing. The article is:

    Schroeder, Tim, "Collision Detection Using Ray Casting," Game Developer,
    vol. 8, no. 8, pp. 50-56, August 2001.

Code at http://www.gdmag.com/code.htm (it's the last routine). It has an odd
variant on ray/box intersection there, actually line-segment/box intersection
(but that's really what we all do, anyway). He takes the endpoints of the
line segment and forms clip codes; you know, are both endpoints to the left
or right of the bounding box, above or below, etc. If they are, he's done,
he's missed. If they aren't, then from the codes he has knowledge of which
faces could possibly be hit. He then tests each possible face for an
intersection using Woo's method, essentially. For example, if he finds the
endpoints are in X.lo, X.middle; Y.middle, Y.hi; and both in Z.middle, he
knows he has to test only the Xmin and Ymax faces of the box. Sounds like too
much work for me, but maybe the clip codes give him an early-on high
rejection rate and that's a win. - EAH]

----------------------------------------------------------------------------

Array Good, Linked List Bad, by Matt Pharr

[My title, not Matt's. Matt wrote me about my article in RTNv12n1, on
allocating grids. - EAH]

I was looking around for something else when I came across that RT News from
1999 that had the suggestion about replacing linked lists in grids with
memory that was allocated contiguously. So I made the corresponding changes
to the little ray tracer project I've been working on here and... blammo!
10-20% faster! Who knew lists were *that* bad? Very exciting!

My code used to be something like:

    struct Mailbox {
        int id;
        Primitive *prim;
    };

    struct Grid {
        ...
        list<Mailbox *> *voxels;
    };

where voxels is indexed by z*XRes*YRes + y*XRes + x. I basically just changed
that STL list<> in Grid to a vector<>, which is an automatically-growing
array, doubling in size whenever you fill it up. So that 10% came from just
keeping the Mailbox *'s contiguous in each voxel, and getting rid of the
linked-list traversal overhead.
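(To make the change concrete, here is a minimal C++ sketch of the
vector-based layout Matt describes; the Grid structure and the names are
illustrative assumptions, not his actual code.)

    #include <cstddef>
    #include <vector>

    struct Primitive;              // defined elsewhere in the ray tracer

    struct Mailbox {
        int id;                    // id of the last ray tested (mailboxing)
        Primitive *prim;
    };

    struct Grid {
        int xRes, yRes, zRes;
        // One contiguous, automatically growing array per voxel.
        // Traversal now walks adjacent memory instead of chasing list
        // nodes, which is where the speedup comes from.
        std::vector<std::vector<Mailbox *> > voxels;

        Grid(int x, int y, int z)
            : xRes(x), yRes(y), zRes(z),
              voxels((std::size_t)x * y * z) {}

        std::vector<Mailbox *> &voxel(int x, int y, int z) {
            return voxels[((std::size_t)z * yRes + y) * xRes + x];
        }
    };

Appending a primitive during grid construction is then just
voxel(x,y,z).push_back(mb), and the doubling growth of vector<> keeps the
amortized cost of those appends constant.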
----------------------------------------------------------------------------

Questions and Answers, by Vlastimil Havran

[Vlastimil Havran had a number of comments on some questions I sent out to
the RTNews subscription list before SIGGRAPH 2001. I've excerpted two of his
responses here. - EAH]

Mailboxing, yea or nay?

I tested mailboxing within the GOLEM project; I can switch it on and off by
recompiling the source code, using macros for conditional compilation
(#ifdef ...). My experience is that for the primitives in GOLEM the use of
mailboxing improves the time required by at most 5-6%, but it can also
increase the time by the same amount. This is heavily dependent on the
surfaces, scene geometry and camera setting used.

For triangles, spheres, cones, cylinders and other simpler primitives (as
used within GOLEM), together with a good ray shooting algorithm, the use of
mailboxing does not make sense, or is at best questionable. For objects where
the ray-object intersection is computationally demanding (such as NURBS and
other spline or algebraic surfaces), it probably makes sense in most cases.
The worse the ray shooting algorithm, the higher the probability of
repeatedly testing the same objects without success, and then logically
mailboxing will be more successful. A possible topic for research: given some
ray shooting algorithm and a scene, predict whether the use of mailboxing
makes sense or not. Even then, the improvements in time may not be large.
Since mailboxing also causes problems when used in a parallel environment -
in particular, it requires writing the result of the first access to an
object back into main memory - my current feeling and all my previous
experience say "no mailboxes," or use them only for some class of complex
objects.

Shared platform:

Another very important question, in the category of RT philosophy, is whether
people in the RT community could use some common rendering framework, at
least in the research field. I am aware of about 25 frameworks that have been
in use for this purpose; some of them are publicly available, some of them
not. In some sense, many of them have features completely in common. It is a
phenomenon that everyone (including me) who learns of the existence of ray
tracing must write his own ray tracer, usually a very simple one the first
time, then another one, etc. But overall this is, in some sense, quite a loss
of time. If people were to collaborate on some good international project for
ray tracing and global illumination, it would be very good to have a common
library which would cover at least the research community (about a common RT
library for commerce I am much more doubtful). Some of the diversity of
research makes this difficult, but I think that this is just a people
problem. If people used a shared library, then their research work would be
100% reproducible (which also has minuses, since it decreases one's advance
over competing researchers). And it also requires a professional approach
from the people involved in such a project. I have thought about this topic a
great deal in the last year, and I think it is a really challenging problem.

----------------------------------------------------------------------------

Scanline vs. Ray Tracing, by Piero Foscari, Paul Kahler, and Matt Pharr

Piero Foscari writes, on the Real-Time Raytracing email list
http://www.onelist.com/community/realtime_raytracing:

But one side question... why do people keep including scanline rendering in
commercial packages when, even with actual medium-sized scenes and
unoptimized engines, raytracing is faster?

Paul Kahler responds:

Because change takes time. First you have to convince people, THEN they can
change the software :-) Generally you must SHOW people in order to convince
them, which often involves showing them with their own system. That means one
of their own developers has to become convinced (on their own, of course) and
have time to do a prototype to show management. There is also the "if it
ain't broke, don't fix it" problem.

Matt Pharr replies:

Frankly, I don't think that's it at all. Rather, the simple reason that
commercial packages still include scanline support is that ray-tracing the
eye rays isn't in fact faster than a good scanline algorithm.
Obviously this depends on how good the ray tracer is versus how good the
scanline algorithm is, but I think that the gap is pretty significant if
you're comparing with a scanline algorithm that does good occlusion culling,
etc. Particularly when you need to sample a pixel area very well, e.g. in
order to get smooth motion blur or good anti-aliasing of fine geometry (think
hair), ray tracing just doesn't make sense, efficiency-wise--tracing hundreds
of rays per pixel just takes too much time.

----------------------------------------------------------------------------

13th Annual Eurographics Workshop on Rendering, by Nelson L. Max

[Nelson has been writing these worthwhile summaries of EGWR for a few years
now. Here's this year's installment. - EAH]

The Eurographics Workshop on Rendering is an established conference, and the
13th annual session was held in Pisa, Italy, on June 26-28. Here are
summaries of all the papers presented.

Fernandez, Bala, and Greenberg computed "Local Illumination Environments,"
which store in octree cells the fully and partially visible light sources and
potential occluders for the cell, accelerating direct lighting computations.

Wald, Kollig, Benthin, Keller, and Slusallek do parallel global illumination
by tracing coherent ray groups with quasi-Monte Carlo integration sampling.

Dmitriev, Brabec, Myszkowski, and Seidel do selective photon tracing for
moving scenes, first tracing "pilot photons" to detect regions of change, and
then tracing "corrective photons" for more accurate integration there.

Coconu and Hege have an octree-based point rendering system, which composites
RGBA splats in hardware, using the recursive back-to-front sort for the
octree cells and the LDI occlusion-compatible ordering for the points in an
octree node.

Botsch, Wiratanaya, and Kobbelt replace the points with the octree cells
themselves, splatting the cell centers. Since most cells will be empty for
surfaces, they suggest an efficient encoding taking about 2 bits per full
cell. They also encode the normals and colors, and shade the normal codes
rather than the individual points.

Yang, Everett, Buehler, and McMillan described a system of 64 unsynchronized
cheap FireWire video cameras connected to 6 PCs, which warp on graphics cards
only the parts of their input images that will be needed in the final novel
image, composited by a seventh PC. With no use of geometry to enhance the
light field reconstruction, multiple exposures are visible outside of the
plane of focus.

Sander, Gortler, Snyder, and Hoppe construct a metric tensor representing the
detail in a color or normal texture, and reparametrize the texture coordinate
charts to minimize the stretch in this metric, putting the texture detail
where it is needed most.

Zelinka and Garland do very fast texture synthesis from examples using a jump
map, which lists for each input pixel a set of matching input pixels. A new
texture pixel is usually synthesized by extending the input patch of one of
its neighbors, but occasionally a jump to a new patch is taken using the jump
map.

Lefebvre and Neyret generate either color or bump-map tree bark textures by
an approximate simulation of the cracking process of the inelastic bark
layers as the tree grows in diameter beneath them.
Ward and Eydelberg-Vileshin render color accurately by defining the surface
colors in CIE XYZ space using spectral integration with the spectrum of the
dominant illuminant (good for direct illumination from a single kind of
source, but not for multiple bounces or spectrally different sources) and
then correcting for white-point balancing in the "Sharp color space," which
is based on super-saturated RGB primaries.

Bekaert, Sbert, and Halton speed up path tracing by reusing all but the first
leg of the ray paths at nearby pixels.

Cammarano and Wann Jensen do time-dependent photon mapping for correct motion
blur of caustics by storing time as well as color, energy, and direction at
each photon hit, and then averaging photons that are near in time as well as
in space when rendering with a distributed ray caster.

Ashikhmin maps high dynamic range input images to a limited output range by
determining a local adaptation level (the average over the largest
neighborhood without high contrast), tone mapping this adaptation level using
linear compression, and then putting back the detailed contrast by
multiplying by the ratio of the input pixel value to the local adaptation
level.

Sawhney, Arpa, Kumar, Samarasekera, Aggarwal, Hsu, Nister, and Hanna map
multiple real-time video surveillance images as textures onto a navigable
urban geometric environment model.

Yamazaki, Sagawa, Kawasaki, Ikeuchi, and Sakauchi extract RGBZ microfacets
from laser-scanner / color-camera input images by clipping the color/range
data to small cubical volumes, and projecting it onto small quadrilaterals
perpendicular to the output viewing directions. The projection and depth
clipping are done in hardware by converting the depth to an opacity, and
using register combiners and alpha clipping.

Jeschke and Wimmer simplify a complex model into textured polygonal meshes
suitable for a specific view cell by scan converting the model into a voxel
representation, using depth-clipped hardware rendering into layers. The layer
spacing is chosen so that parallax from a moving viewpoint within the view
cell is no more than one pixel. The layers are processed in front-to-back
order, in order to delete all occluded voxels, with special treatment for the
one-pixel-wide border between filled and empty pixels in each layer. An
initial complex polygonal mesh is constructed and then simplified, with the
constraint that the simplified mesh must cover exactly all the non-occluded
voxels.

Nirenstein, Blake, and Gain do exact polygon-to-polygon visibility by
representing all stabbing lines connecting the two polygons in a 5D euclidean
space using Pluecker coordinates (with one of the six coordinates normalized
to 1). By computational geometry in this space, volumes are subtracted for
stabbing lines hitting each occluder, leaving the non-occluded stabbing
lines. This is applied to give exact visibility culling for a polyhedral
viewpoint volume.

Baxter, Sud, Govindaraju, and Manocha achieve interactive walkthroughs of
huge complex environments with a parallel pipelined system using two graphics
pipelines and three CPU processes, based on an axis-aligned bounding box
scene hierarchy, hierarchical Z-buffer occlusion culling, and level-of-detail
decisions. The first process drives one pipeline, which renders an item/Z
buffer from the culled geometry for the previous frame; this is read back for
the hierarchical Z-buffer occlusion cull.
Another process performs the view frustum and occlusion culls and
level-of-detail decisions in software, and the third process drives the final
rendering in the second graphics pipeline.

Secord, Heidrich, and Streit place point or stroke primitives for
illustration-style rendering of a grey scale image by redistributing a
predefined Poisson-disc or Halton-sequence 2D point distribution. The
marginal y cumulative distribution of intensity in the input image (found by
integration in x along scan lines) is used to redistribute the y coordinates,
and then the cumulative distribution in x on the scan lines is used to
redistribute the x coordinates. Corrections are made to account for the
probability that multiple strokes are assigned to the same pixel. Stroke
direction can be determined from image gradients, colors, or information from
a 3D model, if available.

Freudenberg, Masuch, and Strothotte use blending operations on graphics
hardware to emulate the halftone screening process traditionally used in the
printing industry.

Hertzmann, Oliver, Curless, and Seitz create line-drawing stroke styles from
examples, by adapting ideas from texture synthesis from examples.

Masselus, Dutre, and Anrys capture the reflectance field of an object by
determining the position of a hand-moved light source from its effects on
four small diffusely reflecting spheres.

Furukawa, Kawasaki, Ikeuchi, and Sakauchi compress a 4D bi-directional
texture function, measured from images of an object under varying
illumination, using a tensor product expansion.

Matusik, Pfister, Ziegler, Ngan, and McMillan acquire low resolution
reflectance information for a reflecting and refracting object using a
traditional set-up of multiple rotating lights, a fixed set of cameras, and a
rotating turntable for the object. But they add two large plasma panel
monitors, a horizontal one below the turntable and a vertical one behind the
object, to obtain high resolution reflection and refraction information, each
represented by a gaussian weight distribution about a reflection or
refraction direction, as determined by fitting the response to 1D sinusoidal
wave patterns in three orientations on the monitors. These monitor views are
also used for detecting an alpha-matted visual hull. The parameters of the
gaussians are determined for a new view using unstructured lumigraph
interpolation, and used to convolve with a new environment map.

Wexler, Fitzgibbon, and Zisserman solve a similar problem by using a sequence
of images of a transparent object moving in front of a fixed planar
background image, but use a weighted combination of a few nearby pixels
instead of a gaussian, and consider only refraction.

Kautz, Sloan, and Snyder convolve a BRDF with an environment map by
representing both with spherical harmonics. The necessary rotations of the
spherical harmonic coefficients for the environment into the local frame for
the surface are done per vertex in software. The rotation of the viewing
vector, the look-up of the spherical harmonic coefficients for expansion in
the lighting direction from a texture function of the viewing vector, and the
dot product of the two aligned sets of spherical harmonic coefficients to do
the convolution in frequency space are done per pixel in hardware.
Akenine-Moller and Assarsson generalize Frank Crow's shadow volume method to
penumbras by following along profile contour polylines, creating a shadow
wedge for each edge, bounded by the umbra plane, the penumbra plane, and two
side planes separating the wedge for the current edge from those of the
preceding and following edges on the contour. If a surface point is found to
lie inside a wedge, a light intensity is found by bilinear interpolation
within the quadrilateral where the plane through the viewing ray, parallel to
the profile edge, intersects the wedge.

----------------------------------------------------------------------------

SSM and OOM Become Enigma, by Mark Manning

In regards to RTNews8b, I'm one of the original programmers for SSM (way back
in 1988 is when I started). I thought I'd bring you up to date a bit.

First - we have no control over what the University of Georgia does with the
software. We write it - they sell it. I've written to them recently, though,
and asked them why the price for the software is so high, especially since
SSM and OOM are now very, very old in terms of 3D software. I've asked them
to reduce the cost to something like $50.00. Only time will tell if they do
this or not. :-)

Next - SSM and OOM went by the wayside back in 1994, when we combined the two
pieces of software into a single package called Enigma. The merger didn't go
all that well for many years but is now beginning to come together at last.
That is to say - we went through the change, then had to redo Enigma so it
worked with X Windows (SSM/OOM were written before X Windows came along),
then OpenGL caused another revamp, and finally C++ appeared and we have had
several problems converting several hundred thousand lines of code from C to
C++'s methods.

Along the way, Enigma has won several awards at NASA, both for its
capabilities and for its general usefulness. We are also revamping several
methodologies within Enigma which will enhance its speed considerably. We
hope to win even more awards by doing this. Our highest achievement was to
come in second place among the thousands of other programs across America
which have been written and used at NASA. The first place program was truly
awesome and deserved first place.

In going to Enigma, we have combined the creation of models with the
animation of those models. This might not seem too big of a deal in today's
world of Everquest, but this capability was put in back before the Internet
even began having an impact on everyone's lives. Remember too that this is an
engineering tool and not really meant for games or movies. It's meant to be
as accurate as possible, so we also sometimes sacrifice speed for precision.

Some of the new additions to Enigma are the capability to adjust the entire
model, one of the objects within the model, or individual surfaces
(polygons). I am presently expanding upon this by making it so someone can
adjust an individual polygon and stretch all of the polygons around that
polygon, snap the polygon off from the rest and move it, or stretch a polygon
to fill holes and the like. We are also enhancing the animation capabilities,
the speed of redrawing scenes, the speed of locating a particular
model/object/surface, and many other things which were not a part of the
original SSM and OOM programs.

If you would like more information about what SSM and OOM have evolved into -
just let me know.
----------------------------------------------------------------------------

A Simple Ray-Triangle Pluecker Optimization, by Jeffrey Mahovsky

[Though I personally prefer methods such as Moeller/Trumbore (see
http://www.ce.chalmers.se/staff/tomasm/code/) for ray/triangle intersection,
there are occasions when using Pluecker coordinates can make sense. The test
is straightforward and elegant. For some previous work on the subject, see
Ray Jones' article in RTNv13n1. - EAH]

There is a simple way to reduce the memory requirements of ray-triangle
intersection by 17% when using Pluecker coordinates. The optimization also
improves performance by roughly 5%.

Ray-triangle tests using Pluecker coordinates will look something like this:

    if (r0 * a4 + r1 * a5 + r2 * a3 + r3 * a2 + r4 * a0 + r5 * a1 < 0)
        return MISS;
    if (r0 * b4 + r1 * b5 + r2 * b3 + r3 * b2 + r4 * b0 + r5 * b1 < 0)
        return MISS;
    if (r0 * c4 + r1 * c5 + r2 * c3 + r3 * c2 + r4 * c0 + r5 * c1 < 0)
        return MISS;
    else
        return HIT;

where r0..r5 are the Pluecker coefficients for the ray, and a0..a5, b0..b5,
and c0..c5 are the Pluecker coefficients for the three triangle edges.

Say the triangle is defined counterclockwise as vertices (x0, y0, z0),
(x1, y1, z1), (x2, y2, z2). Then the equations for generating the Pluecker
edge coefficients are:

    a0 = x0 * y1 - x1 * y0        b0 = x1 * y2 - x2 * y1
    a1 = x0 * z1 - x1 * z0        b1 = x1 * z2 - x2 * z1
    a2 = x0 - x1                  b2 = x1 - x2
    a3 = y0 * z1 - y1 * z0        b3 = y1 * z2 - y2 * z1
    a4 = z0 - z1                  b4 = z1 - z2
    a5 = y1 - y0                  b5 = y2 - y1

    c0 = x2 * y0 - x0 * y2
    c1 = x2 * z0 - x0 * z2
    c2 = x2 - x0
    c3 = y2 * z0 - y0 * z2
    c4 = z2 - z0
    c5 = y0 - y2

Realize that:

    a2 + b2 + c2 = x0 - x1 + x1 - x2 + x2 - x0 = 0
    a4 + b4 + c4 = z0 - z1 + z1 - z2 + z2 - z0 = 0
    a5 + b5 + c5 = y1 - y0 + y2 - y1 + y0 - y2 = 0

therefore:

    c2 = -a2 - b2
    c4 = -a4 - b4
    c5 = -a5 - b5

Substituting into the 'if' statements:

    if (r0 * a4 + r1 * a5 + r2 * a3 + r3 * a2 + r4 * a0 + r5 * a1 < 0)
        return MISS;
    if (r0 * b4 + r1 * b5 + r2 * b3 + r3 * b2 + r4 * b0 + r5 * b1 < 0)
        return MISS;
    if (-r0 * a4 - r0 * b4 - r1 * a5 - r1 * b5 + r2 * c3
        - r3 * a2 - r3 * b2 + r4 * c0 + r5 * c1 < 0)
        return MISS;
    else
        return HIT;

Simplifying, the optimized form is obtained:

    A = r0 * a4 + r1 * a5 + r3 * a2;
    if (A + r2 * a3 + r4 * a0 + r5 * a1 < 0)
        return MISS;
    B = r0 * b4 + r1 * b5 + r3 * b2;
    if (B + r2 * b3 + r4 * b0 + r5 * b1 < 0)
        return MISS;
    if (r2 * c3 + r4 * c0 + r5 * c1 - A - B < 0)
        return MISS;
    else
        return HIT;

Note that we no longer need to precompute and store c2, c4, and c5. Hence
only 15 Pluecker coefficients need to be stored for the triangle instead of
18 - a 17% memory savings. There should be no penalty for separately
computing the values A and B, as the total number of multiplies and adds is
the same for both methods up until the third 'if' statement. The optimized
third 'if' has 3 multiplies and 4 adds/subtracts, instead of 6 multiplies and
5 adds in the unoptimized version. Hence the savings is 3 multiplies and 1
add. This results in a roughly 5% speed increase. Few ray-triangle tests make
it all the way to the third 'if', so the overall speed increase is not as
great as one might think.

This optimization also extends to convex polygons with more sides, although
the percentage savings is not as great. For example, a 4-sided polygon would
require 21 coefficients instead of 24 (a 13% savings), and the fourth 'if'
statement would require 3 multiplies and 5 adds/subtracts, a savings of 3
multiplies. (Note the additional subtract!)
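(For concreteness, here is the optimized test pulled together into a single
compilable C++ routine; the Tri15 structure and the names are illustrative
assumptions, not from the article - a real ray tracer would lay the
coefficients out however best suits it.)

    // The reduced, 15-coefficient triangle record: c2, c4 and c5 are
    // implicit, per the identities above.
    struct Tri15 {
        float a0, a1, a2, a3, a4, a5;   // edge A Pluecker coefficients
        float b0, b1, b2, b3, b4, b5;   // edge B Pluecker coefficients
        float c0, c1, c3;               // surviving edge C coefficients
    };

    enum { MISS = 0, HIT = 1 };

    // r[0..5] are the ray's Pluecker coefficients.
    int rayHitsTriangle(const float r[6], const Tri15 &t)
    {
        // First edge test, with the shared term A factored out.
        float A = r[0] * t.a4 + r[1] * t.a5 + r[3] * t.a2;
        if (A + r[2] * t.a3 + r[4] * t.a0 + r[5] * t.a1 < 0.0f)
            return MISS;

        // Second edge test, with the shared term B factored out.
        float B = r[0] * t.b4 + r[1] * t.b5 + r[3] * t.b2;
        if (B + r[2] * t.b3 + r[4] * t.b0 + r[5] * t.b1 < 0.0f)
            return MISS;

        // Third edge test, rebuilt from c2 = -a2 - b2, c4 = -a4 - b4,
        // and c5 = -a5 - b5, so those three values are never stored.
        if (r[2] * t.c3 + r[4] * t.c0 + r[5] * t.c1 - A - B < 0.0f)
            return MISS;

        return HIT;
    }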
(Note that the Pluecker test shown here doesn't compute an intersection
distance along the ray; a separate calculation is needed for that.)

----------------------------------------------------------------------------

Recovering a Triangle's Surface Normal from the Pluecker Coordinates of its
Edges, by Jeffrey Mahovsky

There is a simple way to recover the un-normalized surface normal of a
triangle from the Pluecker coordinates of its edges. This avoids the need to
store the surface normal separately from the Pluecker coefficients.

Given a triangle with vertices (x0, y0, z0), (x1, y1, z1), (x2, y2, z2), the
Pluecker coordinates of the edges A, B, and C are given by the following 18
coefficients:

    A0 = x0 * y1 - x1 * y0;       B0 = x1 * y2 - x2 * y1;
    A1 = x0 * z1 - x1 * z0;       B1 = x1 * z2 - x2 * z1;
    A2 = x0 - x1;                 B2 = x1 - x2;
    A3 = y0 * z1 - y1 * z0;       B3 = y1 * z2 - y2 * z1;
    A4 = z0 - z1;                 B4 = z1 - z2;
    A5 = y1 - y0;                 B5 = y2 - y1;

    C0 = x2 * y0 - x0 * y2;
    C1 = x2 * z0 - x0 * z2;
    C2 = x2 - x0;
    C3 = y2 * z0 - y0 * z2;
    C4 = z2 - z0;
    C5 = y0 - y2;

The un-normalized surface normal can be recovered with the following
equations:

    ni = A3 + B3 + C3
    nj = -A1 - B1 - C1
    nk = A0 + B0 + C0

Proof: The surface normal components (ni, nj, and nk) of a 3D triangle are
directly proportional to the areas of the 2D triangles that result when the
3D triangle is projected onto each of the x, y, and z planes. The area of the
2D triangle projected onto the z plane (and hence the k component of the
surface normal) is given by the equation:

                          | x0  y0  1 |
    areaZ or nk = 1/2  *  | x1  y1  1 |
                          | x2  y2  1 |

(This assumes the vertices are defined in counter-clockwise fashion. Any good
book on geometry should have this area equation.) Expanding the determinant
gives:

    areaZ or nk = 1/2 * (x0 * y1 - y0 * x1 - x0 * y2 + y0 * x2
                         + x1 * y2 - y1 * x2)

The factor of 1/2 can be removed, as the resulting nk component isn't
normalized anyway:

    areaZ or nk = x0 * y1 - y0 * x1 - x0 * y2 + y0 * x2 + x1 * y2 - y1 * x2

Similarly, for ni and nj:

                          | y0  z0  1 |
    areaX or ni = 1/2  *  | y1  z1  1 |
                          | y2  z2  1 |

    areaX or ni = y0 * z1 - z0 * y1 - y0 * z2 + z0 * y2 + y1 * z2 - z1 * y2

                          | x0  -z0  1 |
    areaY or nj = 1/2  *  | x1  -z1  1 |
                          | x2  -z2  1 |

    areaY or nj = -x0 * z1 + z0 * x1 + x0 * z2 - z0 * x2 - x1 * z2 + z1 * x2

Comparing these equations with the Pluecker coefficient equations, it is easy
to see that:

    ni = A3 + B3 + C3
    nj = -A1 - B1 - C1
    nk = A0 + B0 + C0

This idea will probably extend to any convex n-sided polygon with edges
defined as Pluecker coordinates. The number of addition operations needed to
recover each normal component would be (n-1), where n is the number of sides.

----------------------------------------------------------------------------

Objects per Grid Cell, by Pete Shirley

I wrote Pete about uniform grid efficiency structures, and how many cells
they should have compared to the number of objects in the scene:

    In your book "Realistic Ray Tracing" you mention that the number of
    cells in a grid should be within an order of magnitude of the number of
    objects. True, but interestingly, people seem to find that the number of
    cells should be noticeably larger than the number of objects for best
    performance, e.g. 3x as many cells as objects. Again, mileage varies,
    but if this turns out to be the case, then mailboxing becomes very
    important, as objects will more likely span two or more grid cells.

Pete's reply:

Bruce Walter and I did a bunch of work on analyzing this in an as yet
unpublished paper.
Definitely, if (cost of object intersection) is greater than (cost of cell
advance), then you want more cells than objects. My guess is that between 2
and 10 cells per object has worked best in my code for most models. 8 is my
default.

----------------------------------------------------------------------------

END OF RTNEWS