                            Ray Tracing Roundup

comments: We did some volumetric rendering with raytracing and then "mixed" that with a normal 3d scene (the scene was drawn using 3d acceleration). Fair enough, it looks a bit blurry, but it does give the impression of flying through 3d clouds.

----

There is a quite impressive RT intro, Fresnel 2 by Kolor. You can get it at:

        http://www.kaoz.org/kolor

It has two versions, one for the PIII and the other for less powerful machines. You need an OpenGL capable 3D card.

--------

The web site for Dr. David Rogers' book "An Introduction to NURBS" is at:

        http://www.nar-associates.com/nurbs/nurbs.html

It includes information about the book, errata, source code, etc.

--------

The Graphics Gems site has gone through a major redesign (as well as getting a more memorable URL), with all code now directly linked to the gems, which have been organized by book, category, and author:

        http://www.graphicsgems.org/

--------

Tidbits: an early draft of "Point in Polygon Strategies", and efficient code for doing shaft culling, are now available at the site:

        http://www.erichaines.com/

--------

If you just can't get enough of Pluecker coordinates, i.e. the articles in RTNv10n3 and RTNv11n1 weren't enough for you, try:

        http://www.flipcode.com/tutorials/tut_pluecker.shtml

It is derived from the RTN articles, but has some other information and may be more approachable.

--------

A derivation of the formula for ray/general quadric intersection, from Paul Kahler:

        http://www.oakland.edu/~phkahler/techdoc/RayQuadric.doc

(it's a Word Doc file)

--------

The site:

        http://www.schorsch.com/kbase/glossary/index.html

has a glossary of the various terms used in the area of lighting and related fields. Reasonably done, though nothing deep. There are certainly a few terms I've never heard of before (e.g. "etendue").

--------

Where can you find minimal indirect illumination, no atmospheric effects, perfectly reflective surfaces, and hard-edged shadows (depending on the location and size of the sun)? Yes, space, the real home of classical ray tracing. Check out movies and stills at:

        http://www.spacebattles.com/

----------------------------------------------------------------------------

Octree Implementation and RTRT Speeds, by Paul Kahler, Vlastimil Havran, Ingo Wald, John Stone, and Nick Chirkov

[excerpted from the Realtime Raytracing mailing list at http://groups.yahoo.com/group/realtime_raytracing/. - EAH]

Paul Kahler writes:

I've done some checking, and ran across RTNv12n2, where there is an article covering many (all?) of the various papers on ray-octree traversal. From reading the descriptions, it seems that the method I worked so hard to come up with is #18, by Gargantini and Atkinson from 1993. Yet another algorithm I've independently reinvented :-) There was no link to the actual paper or code, so I haven't compared my implementation to theirs yet.

I must say that this algorithm (and others like it) is VERY sensitive to the details of the implementation. Let me explain... I started with a simple octree using 8 pointers to children at each node, with NULL pointers where there are no children. I use axially aligned boxes, with subdivision always happening in the center (I have good reasons to back up this choice, but that's another story). My first implementation recursed into all 8 children wherever there was a non-NULL pointer. Each node was responsible for determining if the ray hit it. The hit test treated the node as the intersection of 3 slabs and checked for the existence of a non-empty span of the ray passing through all 3.
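Paul's code isn't shown on the list, but the structure he describes is easy to sketch in C. The following is a guess at the shape of that first implementation, not his actual source: a node holding eight child pointers, and a slab test that keeps a running span and rejects as soon as it empties. The precomputed reciprocal direction is my own assumption.

    /* Sketch only - my guess at the layout described above, not Paul's code. */
    typedef struct OctNode {
        struct OctNode *child[8];   /* NULL where a child is empty */
        /* leaf contents would hang off here */
    } OctNode;

    /* Slab test: intersect the ray's t-interval against each pair of
       axis-aligned planes; the box is hit iff a non-empty span survives
       all three axes.  inv_dir[] holds 1/dir, precomputed per ray. */
    int hit_box(const float org[3], const float inv_dir[3],
                const float bmin[3], const float bmax[3],
                float t_min, float t_max)
    {
        int axis;
        for (axis = 0; axis < 3; axis++) {
            float t0 = (bmin[axis] - org[axis]) * inv_dir[axis];
            float t1 = (bmax[axis] - org[axis]) * inv_dir[axis];
            if (t0 > t1) { float t = t0; t0 = t1; t1 = t; }  /* order the pair */
            if (t0 > t_min) t_min = t0;
            if (t1 < t_max) t_max = t1;
            if (t_min > t_max) return 0;   /* spans have no overlap */
        }
        return 1;
    }

Since subdivision always happens at the box center, child boxes need not be stored; they can be rederived while walking down the tree, which keeps a node at just eight pointers.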
I used a table of 8 orders to check the children in front-to-back order based on the direction the ray came from. The biggest problem with this is that the ray can hit at most 4 of the 8 children, so recursing into all of them means potentially many recursive calls to intersection tests with nodes the ray does not hit. Note that the computation lies completely with the child nodes; the parent only checks pointers for NULL. I realized that a little work in the parent node can reduce the number of children checked to 4 (at most), and that helped a LOT. I ultimately reached the algorithm described in the abstract I found.

I'm still left with some open questions. I can still put more work into the parent nodes, but it will be done even for children with NULL pointers - this will slow things down in some cases and speed it up in others. I can also do more work related to sorting things; however, there are only 5 numbers whose order matters, and more complex code may hurt things more than it helps - this one is not dependent on the existence of children, so I should try it. I still need to find the authors' code to compare - since this algorithm is so sensitive to what optimization is done, it's conceivable that I can beat them. A modest 2x performance increase would put it in direct competition with those grids everyone keeps talking about. Don't laugh, I've already seen an increase of more than 4x over my first implementation while keeping the same time complexity.

Also, there are some things that would be best handled by combining sign-bits of floating point numbers as opposed to using conditional code. This could reduce branch prediction errors enormously, but Pentium-type processors really aren't designed with that in mind. BTW, I expect the Athlon to smoke the P4 running this code due to the branching complexity and the 20-stage P4 pipes that will keep getting flushed - if the branches could be predicted, we'd have an even better algorithm :-)
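Paul's table of eight orders isn't reproduced on the list, but the usual trick it points at is compact enough to sketch: index children by three octant bits (bit k set = upper half along axis k), take one fixed front-to-back order valid for an all-positive direction, and XOR it with the ray direction's sign mask to get the order for the other seven octants. Pulling the IEEE sign bits directly, as Paul hints, avoids the conditional code. A sketch of my own, not his implementation:

    #include <stdint.h>
    #include <string.h>

    /* Sign mask of the ray direction: bit k set where dir[k] < 0.
       Reading the IEEE sign bit avoids three compares and branches. */
    unsigned dir_sign_mask(const float dir[3])
    {
        uint32_t bits; unsigned mask = 0; int k;
        for (k = 0; k < 3; k++) {
            memcpy(&bits, &dir[k], sizeof bits);
            mask |= (bits >> 31) << k;
        }
        return mask;
    }

    /* Fill out[0..7] with child indices in front-to-back order.
       base[] is a valid order when all direction components are
       positive: the near child, then children one axis step away,
       then two, then the far corner. */
    void child_order(const float dir[3], unsigned out[8])
    {
        static const unsigned base[8] = { 0, 1, 2, 4, 3, 5, 6, 7 };
        unsigned mask = dir_sign_mask(dir);
        int i;
        for (i = 0; i < 8; i++)
            out[i] = base[i] ^ mask;
    }

XORing the base order with the mask mirrors the traversal along each negative axis, so the sequence stays front-to-back for any direction without a table of eight separate orders.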
Anyway, there's a demo of it at:

        http://www.oakland.edu/~phkahler/rtrt

[There is also an interesting article on doing what is essentially adaptive supersampling of a low resolution version of the image (i.e. render every eighth pixel to start) at Paul's site at http://www.oakland.edu/~phkahler/rtrt/dre.html - EAH]

Select demo 4. It does a big fractal octree thing and lets you subdivide up to 8 levels deep. Most rays from the camera hit the tree root, and with 2 light sources there's a lot more work to do also. Since the octree is fixed in space, I attached the camera AND light sources as children of the same transformation node in the scene graph; this makes it look like the octree is spinning :-) BTW, I can turn off leaf visibility and place objects in the octree, but that starts making speed dependent on things other than tree traversal, which is what I've been trying to optimize.

----

[I asked Paul for some timing numbers, and he tried some tests. - EAH]

My terrain dataset is 511x511 squares, each divided into 2 triangles. The vertices are offset vertically using 2 frequencies of Perlin noise. I have one light source up in the sky, and turning off shadow rays only added 15% to the frame rate, because those rays trace very efficiently.

I ran some more controlled tests: 320x240 images with a ray every 8th pixel, for a total of 9600 eye rays per image. The camera was aimed 0.5 radians down from straight horizontal (it thrashes if you get the horizon in view with too many polys). This means every ray hit the ground and had 1 shadow ray cast. The camera was high enough to see several hundred (perhaps a couple thousand?) triangles at a time. My DRE was turned off, so it was 19200 rays per frame.

So... tracing a terrain grid of size 2047x2047x2 = 8,380,418 triangles, I got 4.0 FPS with some fluctuation up to 4.1 & 4.2 - call it 4.0, for a total of 76800 rays per second. The octree was 11 levels deep (counting the root and leaf levels). It took several minutes to build the terrain, and then another minute for the thrashing to stop after it started rendering. Last time I did this it allocated about 1GB of memory.

Next, with 511x511x2 = 522,242 triangles in a 10 level octree, I got... the same result, 4.0 FPS - not fluctuating up as often. It only takes a few seconds to build the scene, and the speed tops out in a few more. Tree depth is really the determining factor, not polygon count.

Then I pointed the camera up higher, so about 75% of the image was ground and the rest was sky (default blue color). This gave about the same speed as the previous test. While there are fewer rays, they have to do more work - passing over the ground at lower levels of the tree. The horizon looked really bad due to the pixel interpolation. I turned on DRE and it cleared up, but dropped to 3.2 - 3.4 FPS.

Summary: I'm at about 76K-80K rays/second in a 10 level octree, with 50% being shadow rays. If I point the camera straight down it speeds up to 130K rays per second due to fewer intersection tests. All done on an Athlon 700 with 128MB RAM.

Paul gave an update a few months later:

I am now getting 140K rays per second all the time (perhaps a dip to 130K), and going up to 215K per second with the camera pointed down to avoid those over-the-horizon rays.

[Paul got this speed boost from code tweaks like turning C++ into C, inlining, and translating inner loops into assembler. - EAH]

----

Vlastimil Havran comments:

I agree about the sensitivity to implementation issues; I improved the efficiency of my octree code about three times by tuning and trying different versions of the code, etc. It seems that the actual implementation is still very important. Determining which child nodes have to be traversed is the real potential source of inefficiency. What I would recommend is to get the paper by J. Revelles et al., "An Efficient Parametric Algorithm for Octree Traversal", and implement it, since the optimization idea in this paper is just what you are searching for and is very promising, I think. Unfortunately, I have not yet had time to reimplement it in my own source code. The paper is available at:

        http://giig.ugr.es/~curena/papers/

and also includes some pseudocode of the traversal algorithm.

----

Ingo Wald comments:

Two numbers from our interactive ray tracer [see http://graphics.cs.uni-sb.de/rtrt/] may be of particular interest to you:

- about 90,000 rays per second for a 4-million triangle scene in which almost all triangles are visible.

- about 150,000 to 600,000 primary rays per second (depending on view settings) for a 1.5 million triangle scene with lots of occlusion.

(Both scenes are primary rays only, flat shading.) All that on an 800 MHz PIII.

[Part of the difference in speeds is because Ingo et al. are using SIMD (SSE) instructions in a clever way: they intersect four rays at once against a single triangle. Like Paul, they also pay a lot of attention to, and optimize for, cache lines and other architectural elements of their processor. - EAH]
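Ingo et al.'s intersector itself isn't reproduced here. As a rough sketch of the four-rays-at-once idea (not their actual method or code), here is a Moller-Trumbore style triangle test vectorized across four rays with SSE intrinsics, the rays stored in structure-of-arrays form and the triangle splatted across all four lanes. A production version would also mask out near-zero determinants and track each ray's closest hit so far.

    #include <xmmintrin.h>   /* SSE intrinsics */

    typedef struct { __m128 x, y, z; } Vec4;   /* four 3D vectors, SoA */

    static Vec4 vsub(Vec4 a, Vec4 b) {
        Vec4 r = { _mm_sub_ps(a.x, b.x), _mm_sub_ps(a.y, b.y),
                   _mm_sub_ps(a.z, b.z) };
        return r;
    }
    static Vec4 vcross(Vec4 a, Vec4 b) {
        Vec4 r = { _mm_sub_ps(_mm_mul_ps(a.y, b.z), _mm_mul_ps(a.z, b.y)),
                   _mm_sub_ps(_mm_mul_ps(a.z, b.x), _mm_mul_ps(a.x, b.z)),
                   _mm_sub_ps(_mm_mul_ps(a.x, b.y), _mm_mul_ps(a.y, b.x)) };
        return r;
    }
    static __m128 vdot(Vec4 a, Vec4 b) {
        return _mm_add_ps(_mm_mul_ps(a.x, b.x),
                          _mm_add_ps(_mm_mul_ps(a.y, b.y),
                                     _mm_mul_ps(a.z, b.z)));
    }
    static Vec4 vsplat(const float p[3]) {   /* one vector -> all 4 lanes */
        Vec4 r = { _mm_set1_ps(p[0]), _mm_set1_ps(p[1]), _mm_set1_ps(p[2]) };
        return r;
    }

    /* Test 4 rays against one triangle (p0 plus edges e1 = p1-p0,
       e2 = p2-p0).  Returns a 4-bit mask of the rays that hit. */
    int intersect4(Vec4 orig, Vec4 dir,
                   const float p0[3], const float e1[3], const float e2[3])
    {
        Vec4   E1  = vsplat(e1), E2 = vsplat(e2);
        Vec4   pv  = vcross(dir, E2);
        __m128 inv = _mm_div_ps(_mm_set1_ps(1.0f), vdot(E1, pv));
        Vec4   tv  = vsub(orig, vsplat(p0));
        __m128 u   = _mm_mul_ps(vdot(tv, pv), inv);
        Vec4   qv  = vcross(tv, E1);
        __m128 v   = _mm_mul_ps(vdot(dir, qv), inv);
        __m128 t   = _mm_mul_ps(vdot(E2, qv), inv);
        __m128 ok  = _mm_and_ps(_mm_cmpge_ps(u, _mm_setzero_ps()),
                     _mm_and_ps(_mm_cmpge_ps(v, _mm_setzero_ps()),
                     _mm_and_ps(_mm_cmple_ps(_mm_add_ps(u, v), _mm_set1_ps(1.0f)),
                                _mm_cmpgt_ps(t, _mm_setzero_ps()))));
        return _mm_movemask_ps(ok);   /* bit i set if ray i hits */
    }

The payoff is that the four divides, the two cross products, and all the compares are amortized across four rays, and the branchy scalar rejection tests collapse into one mask test at the end.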
----

John Stone notes (in a separate email; worth mentioning here):

There's a heck of a lot that can be done to fit a ray tracer to an architecture. Make sure to have tight loops that avoid branches, precompute what you can, get cache lines filled nicely by making data structures fit them, and find ways to put like data together to maximize coherency. (I've seen some very funky z-buffer culling code from Intel, for example, that keeps a few copies of a mesh's data in different places to get better locality.)

----

Nick Chirkov writes:

My ray tracer works on any polygonal models, builds a BSP tree, and traces 40,000 uniformly distributed rays/sec (Celeron 433) on the well-known Attic scene (~52,000 triangles), with a 99.96% ratio of correct hits. All code is C++; SSE is not used. I can get a speed-up of 15% by using Intel's C++ compiler and rewriting something like if(*(DWORD*)&float_value<0) to if (float_value<0.0f). I can't use occlusion techniques because of shadow generation [I'm not sure what this means. - EAH]. Tracing uniformly distributed rays is slower than tracing primary rays because of CPU cache misses. In real image generation it was ~60,000 rays/sec on the Attic scene with 3 light sources with shadows. I tested a scene with ~480,000 triangles and one light source. It was an old ship, and all triangles were within the view frustum. I got ~30,000 rays/sec.

[See Nick's page at http://www.geocities.com/SiliconValley/Bay/5604/raytrace.htm]

----------------------------------------------------------------------------

Loose Octrees for Dynamic Raytracing, by Eric Haines

In a dynamic animation environment, one problem for ray tracers to solve is updating the spatial ray tracing structure as quickly as possible. This is especially true if multiprocessor machines are being used. Reinhard, Smits, and Hansen give one solution in their work, "Dynamic Acceleration Structures for Interactive Ray Tracing," http://www.cs.utah.edu/~reinhard/egwr/.

In this paper they insert objects into a grid. As objects move outside the grid, they do a modulo operation on the object's location. So, say an object is in the rightmost grid cell, #9, in a 10x10x10 grid. It moves outside the grid - instead of recreating the grid, the object is moved to cell #0, but is noted as being in a "copy" of the grid at X location 1. Another way to express it is that each grid coordinate has a number 0-9, but think of the world as being filled with copies of this grid, in a repeating pattern. This meta-grid is accessed with the higher digits of the object's location number. Everything starts in grid #0, so 10 would mean meta-grid #1 along the axis, 20 is meta-grid #2, etc. Now when a ray travels through a grid cell containing something outside the normal grid, the object's real location is also checked. If the meta-grid location does not match the ray's meta-grid location, the object is not actually there, so it's not tested. As time goes on and more objects move outside the grid, this scheme becomes less efficient, as more objects have to be tested that can never be hit. See the paper for how the authors decide to regenerate the grid when it becomes inefficient. What's clever about their scheme is that when an object moves, it is quick to remove it from the grid and reinsert it. The grid does not have to be regenerated. This scheme can also work with a hierarchy of grids (i.e. nested grids).
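A minimal sketch of this wrap-around addressing as I read the paper (the names and layout are mine, not the authors'): logical cell coordinates are unbounded integers, the physical grid stays N cells per axis, and each object remembers its logical cell so traversal can reject copies that aren't really there.

    #define N 10   /* physical grid resolution per axis (the 10x10x10 example) */

    /* floor-style modulo, so negative logical coordinates wrap correctly */
    static int wrap(int i) { int m = i % N; return m < 0 ? m + N : m; }

    /* physical cell for an unbounded logical cell coordinate (ix,iy,iz) */
    static int cell_of(int ix, int iy, int iz) {
        return (wrap(iz) * N + wrap(iy)) * N + wrap(ix);
    }

    typedef struct Object {
        int lx, ly, lz;          /* the object's logical (meta-grid) cell */
        struct Object *next;     /* the cell's object list */
        /* geometry, bounds, ... */
    } Object;

    /* While walking the grid, the ray tracks its own logical cell; an
       object stored in the matching physical cell is only really there
       if the logical coordinates agree. */
    static int object_present(const Object *ob, int ix, int iy, int iz) {
        return ob->lx == ix && ob->ly == iy && ob->lz == iz;
    }

Insertion and deletion are then constant-time pointer operations on a cell's list, which is exactly the property the authors are after.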
The authors note that normal octrees suffer from non-constant insertion and deletion times, as the tree has to be traversed and an object may get put into two or more nodes.

Thatcher Ulrich's "loose octree" spatial partitioning scheme has some interesting features. Meant for collision detection, it may also have application to real-time ray tracing. The basic idea is that you make each octree node actually enclose twice the space, in each direction, of its location in the octree. That is, normally an octree node does not overlap its neighbors - space is precisely partitioned. In Ulrich's scheme, the octree node's box is extended by half its size in each of its six face directions. Anything listed in this node is inside this extended box. This makes for a less efficient partitioning of space, but has a great advantage for dynamic objects: as an object moves or otherwise changes, it can be removed from and reinserted into the tree "instantly".

Say you want to insert spheres into this structure. The radius of the sphere determines exactly what level of the octree you need to insert it at. For example, if the extended octree node at some level is, say, 12x12x12 units, then a sphere with a radius of 3 or less must fit inside this extended node if the center of the sphere is inside the unextended (6x6x6) octree node. If the radius is 1.5 or less, it can be inserted at the next octree level down (extended size 6x6x6) or deeper, as it must fit there. So just knowing the sphere's center and radius fully determines which octree node to insert it into, without searching or bounds testing (other than walking down the tree to the node itself). Similarly, deletion from the octree is quick: each object exists in one and only one octree node, which can be found immediately and so deleted from. It might even be faster to hash the octree nodes by their level and address (as Glassner did in his original scheme) to more quickly delete from them.

This gives at least some hope that octrees could be used in a dynamic ray tracing system. Another nice feature of octrees is that if an object moves outside the bounding box of the entire octree, this octree can become a sub-node of a larger octree made on the fly that also encloses the space the object moved to.

I have to admit that the loose octree structure seems pretty inefficient to trace rays against, but really I am just brainstorming here, presenting a possible use for a clever, new data structure with some useful properties. I can imagine combining loose octrees with other schemes, e.g. also creating bounding boxes within each populated node lazily, as needed, to further speed intersection testing. See more about the loose octree idea at http://www.tulrich.com/geekstuff/partitioning.html, and read more about it in the book "Game Programming Gems." There are some optimizations I do not mention here, like being able to sometimes push some objects one level deeper into the octree.
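Following the arithmetic above, the insertion level falls straight out of the sphere's radius. This is my own sketch of that computation, not Ulrich's code; it assumes a cubical world of size world_size at the root. A node of unextended size W, whose extended box reaches W/2 beyond each face, holds any sphere of radius up to W/2 whose center lies inside the unextended node.

    #include <math.h>

    /* Deepest octree level whose nodes can hold a sphere of the given
       radius; level 0 is the root, whose unextended size is world_size. */
    int loose_level(float radius, float world_size, int max_depth)
    {
        /* need unextended node size W = world_size / 2^level
           with radius <= W/2, so level <= log2(world_size / (2*radius)) */
        int level = (int)floor(log2(world_size / (2.0 * radius)));
        if (level < 0) level = 0;             /* sphere spans most of the world */
        if (level > max_depth) level = max_depth;
        return level;
    }

The target node itself is then found by walking down from the root, choosing at each step the child octant containing the sphere's center; no bounds tests against the sphere are needed. (The sketch ignores refinements like pushing an object one level deeper when it happens to fit.)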
----------------------------------------------------------------------------

Processor and Compiler Comparison as of August 2001, by John Stone

I have various new info on the compiler front. For x86 machines, I've run a bunch of benchmarks on Intel P4 and AMD Athlon machines, and have had interesting results. Along with this, I've tested gcc, the Portland Group compilers, and the beta Intel compilers for Linux.

On the hardware front, for the testing I've done with Tachyon (John's ray tracer, source at http://jedi.ks.uiuc.edu/~johns/raytracer/), the Athlon machines have consistently annihilated the P4 machines with which I've compared them. My 1.2 GHz Athlon at home is still posting an SPD balls time of 2.6 seconds, versus about 3.0 for a 1.7 GHz P4 we have at work. I tested on a 1.4 GHz Athlon previously and got 2.2 seconds!

From the testing I did recently, the PG compilers were good, but the most recent versions of gcc are now at about the same level of performance. (I haven't tested recently with all combinations on the same box, but I believe my results indicate that if the PG compilers are still better than gcc, then the margin has narrowed to about 3% or so in performance.) In all cases, the new beta Intel compiler has proven to produce code that runs *slower* than what I've been getting out of gcc, though since it is a beta compiler, I can't complain too much yet. This was true when testing the produced binaries on both an Athlon and a P4, with Tachyon. We _have_ seen some performance benefits running some other codes on the older generation Pentiums, but not yet on the P4. I suspect that Intel just needs a few more months of work before their new Linux compiler is as competitive as we'd expect. I think the main advantages it has right now are its much more sophisticated vectorizing features, which benefit linear algebra oriented codes but don't help ray tracers very much. (Tachyon has very few vectorizable loops presently, unfortunately...)

Though the timings I have span over a couple of months, none of the core code in Tachyon has changed for quite a while.

----------------------------------------------------------------------------

Triangles per Vertex, by Eric Haines

I noticed a line in Peter Shirley's book "Realistic Ray Tracing":

        Typically, a large mesh has each vertex being stored by about six
        triangles, although there can be any number for extreme cases.

This is true enough, but the result is stronger and deeper than that, and it is something I only recently realized (well, in 1998), so I'm passing it on here. It turns out that the Euler characteristic precludes there ever being more or less than about six triangles per vertex for large meshes consisting entirely of triangles, surprisingly enough. The formula is:

        V + F - E = 2    (V = vertices, F = faces, E = edges)

for a closed mesh (i.e. a solid object) without any holes, i.e. topologically equivalent to a sphere (a donut merely changes the formula to V + F - E = 0; for an open mesh without holes it's V + F - E = 1).

The formula holds in general, but we can constrain it when dealing with connected triangles in a closed mesh. If every face in the mesh is a triangle, we know that the number of edges is going to be exactly 3/2 times the number of faces, since each face has three edges and each edge in a closed mesh is shared by two faces. Substituting:

        V + F - (3/2)F = 2
        V ~= (1/2)F

(I'm dropping the "2" constant, since it's relatively unimportant for large meshes.) We also know that each triangle has 3 vertices, so for the total number of vertices to be half the number of faces, each vertex *must* be shared by an average of 6 triangles. I love this result, as it applies to any such mesh. You can draw the mesh so that one vertex is shared by many triangles, but then the rest of the vertices are shared by proportionally fewer, balancing out to 6 - there's no way around it.
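As a quick sanity check (a toy program of my own, not from the article): combining E = (3/2)F with V + F - E = 2 gives V = F/2 + 2, so the average triangle count per vertex is 3F/V. Starting from an icosahedron and repeatedly doing 4-to-1 subdivision, which keeps the mesh closed and all-triangle, the ratio marches toward 6:

    #include <stdio.h>

    int main(void)
    {
        long F = 20;   /* icosahedron: 20 faces, 12 vertices */
        int  i;
        for (i = 0; i <= 6; i++) {
            long V = F / 2 + 2;              /* from V + F - (3/2)F = 2 */
            printf("F = %8ld   V = %8ld   tris/vertex = %.4f\n",
                   F, V, 3.0 * F / (double)V);
            F *= 4;                          /* one 4-to-1 subdivision step */
        }
        return 0;   /* the ratio climbs from 5.0 toward 6 */
    }

At F = 20 the ratio is exactly 5.0 (the icosahedron's five triangles per vertex); by F = 81,920 it is already 5.9997, showing how quickly the "2" constant stops mattering.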
If the mesh is open, then the ratio will change, with the average number of triangles per vertex going down. Topologically, with an open mesh (one not forming a solid object, but having some edges that touch only a single triangle) you can think of the mesh as solid, but including a polygon with many edges along the outside border of the mesh. This many-sided polygon changes the ratio.

[I wrote this note to Pete, and he had just learned the same fact the day before receiving my note! I figure if neither of us knew this until recently, then it was worth presenting here.]

----------------------------------------------------------------------------

Bounding Box Intersection via Origin Location, by Eric Haines

Schmalstieg and Tobler had a clever article two years ago:

        Dieter Schmalstieg and Robert F. Tobler, "Fast projected area
        computation for three-dimensional bounding boxes," journal of
        graphics tools, 4(2):37-43, 1999.
        http://jgt.akpeters.com/papers/SchmalstiegTobler99/

The idea is that a bounding box can be thought of as six planes that divide space into 27 parts (nothing new so far; it's used in orthographic view volume clipping). Take the location of the eye and see which of the 27 parts it's in. For each of the 27 positions there is a set of faces visible from that position (1, 2, or 3 of them; essentially, the faces that face that direction). Go a step further, though: in a table of the 27 entries, you can do better than storing the faces that are visible. Instead, you store the list of box vertices (4 or 6 of them) that make up the silhouette outline of the box from that position. This silhouette list does not change within whichever of the 27 volumes of space you're in. You can use this list of vertices to project onto the screen and directly compute the screen area of the convex polygon formed.

I was wondering if this algorithm might be useful for ray/box intersection. I have not tried it out, but the idea is to use these silhouettes for 2D point-in-polygon testing, and also to find the distance along the ray by using the two box corners that bound the ray's direction. There is only one closest (and one farthest) box vertex for a given ray direction, and the index of this vertex can be determined once for any given ray direction. This is an idea from shaft culling (http://www.erichaines.com), where the closest box vertex to a plane is entirely determined by the normal of the plane (and a ray defines such a plane, as the ray's direction can serve as the plane's normal and its origin as a point on the plane).

To begin, the closest and farthest vertices can be projected onto the ray's direction to find out how far these vertices are along the ray. This is simple: the vector from the ray's origin to the vertex, dotted with the ray direction, gives the distance to the vertex. If the ray's extent (and rays often are actually line segments, having a maximum distance to them, but "line segment tracing" doesn't sound as good as "ray tracing") overlaps the box's closest and farthest distances, then the ray is likely to hit the box. This is not a perfect test, but it is conservative: the test will sometimes say there's an overlap when there's not, but it will never miss a box that should be hit.
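To make this conservative first stage concrete, here is a small C sketch (my own code, assuming a normalized ray direction so that dot products measure distance). The nearest and farthest corners come straight from the direction's signs, the once-per-ray lookup mentioned above:

    typedef struct { float min[3], max[3]; } Box;

    /* Conservative reject: does the ray's [t_min,t_max] extent overlap
       the distance interval spanned by the box's nearest and farthest
       corners along the ray direction?  May accept a box the ray misses,
       but never rejects a box the ray hits. */
    int extent_overlaps_box(const float org[3], const float dir[3],
                            float t_min, float t_max, const Box *b)
    {
        float near_t = 0.0f, far_t = 0.0f;
        int a;
        for (a = 0; a < 3; a++) {
            /* where dir is positive, the min coordinate is nearest */
            float cn = (dir[a] >= 0.0f) ? b->min[a] : b->max[a];
            float cf = (dir[a] >= 0.0f) ? b->max[a] : b->min[a];
            near_t += (cn - org[a]) * dir[a];   /* accumulate the dot products */
            far_t  += (cf - org[a]) * dir[a];
        }
        return near_t <= t_max && far_t >= t_min;
    }

For many rays sharing one direction, the six corner choices can be precomputed once as indices, leaving just two dot products per box - and, in keeping with the article's observation, there is no division anywhere.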
If the ray survives this test, say the ray starts in a corner region. You now have a polygon outline. Instead of computing the ray's intersection with 3 slabs or 3 faces (the traditional ray/box intersection methods), form a plane at the closest corner that is perpendicular to the ray's direction. In fact, you have all you need of this plane already, as the ray's direction is the plane's normal, and the plane's distance is actually irrelevant. The idea is to use the ray's direction to project the ray's origin and the silhouette vertices onto a 2D plane. If the ray's 2D position is inside the silhouette vertices' 2D polygon, the ray definitely passes through the box. It doesn't really matter what the frame of reference is for the 2D plane itself; anything will do.

That's about it, and I hope it makes sense. To recap, you first look at the problem as how far along the ray the box lies. If you survive this test, you see whether, looking along the ray's direction, the ray's origin is inside the convex polygon formed by the silhouette vertices. I have my doubts as to whether this test is any faster than traditional tests (the operation of actually projecting onto a 2D plane looks to be slow), but I thought it was interesting that there is another way to skin this cat. This algorithm does have the interesting feature that it appears to avoid division entirely; it is mostly just dot products, projecting vertices onto lines and planes.

----------------------------------------------------------------------------

END OF RTNEWS