_ __ ______ _ __
' ) ) / ' ) )
/--' __. __ , --/ __ __. _. o ____ _, / / _ , , , _
/ \_(_/|_/ (_/_ (_/ / (_(_/|_(__<_/ / <_(_)_ / (_
----
Here's a tech report that covers some of the ray-tracing-on-GPU stuff that
Rodrigo Toledo's been doing:
http://www.loria.fr/~levy/php/article.php?pub=../publications/papers/2004/ray_tracing_gpu
-- John Stone (johns@ks.uiuc.edu)
----
The Graphics Hardware 2005 conference had one paper of relevance to ray
tracing. See http://www.graphicshardware.org/program.html for the full
program, along with links to PDFs and slide sets.
KD-Tree Acceleration Structures for a GPU Raytracer, T. Foley, J. Sugerman,
presentation at:
http://www.graphicshardware.org/presentations/foley-kdtreegpu-gh05.ppt
----
GPGPU.org has lots of great rendering news relating to cutting-edge research
and the GPU. For example, included in the news area
http://www.gpgpu.org/cgi-bin/blosxom.cgi/Advanced%20Rendering/Global%20Illumination/
was the abstract of this master's thesis, among other things:
A Comparison of Acceleration Structures for GPU Assisted Ray Tracing
Recently, ray tracing on consumer level graphics hardware has been
introduced. So far, most published studies on this topic use the uniform
grid spatial subdivision structure for reducing the number of ray/triangle
intersection tests. For many types of scenes, a hierarchical acceleration
structure is more appropriate. This thesis by Lars Ole Simonsen and Niels
Thrane of University of Aarhus compares GPU based traversal of kd-trees
and uniform grids with a novel bounding volume hierarchy traversal scheme.
The three implementations are compared in terms of performance and
usefulness on the GPU. The thesis concludes that on the GPU, the bounding
volume hierarchy traversal technique is up to 9 times faster than its
implementations of uniform grid and kd-tree. Additionally, this technique
proves the simplest to implement and the most memory efficient.
http://www.larsole.com/files/GPU_BVHthesis.pdf
Ulf Assarsson comments, "they also present some GPU source code".
--------
A history of MAGI, the first production house to use ray tracing and CSG:
http://accad.osu.edu/~waynec/history/tree/magi.html
A number of its employees went on to found Blue Sky Studios. Note that this
page is part of a larger site giving a history of computer graphics:
http://accad.osu.edu/~waynec/history/ID797.html
Don't miss the extensive lessons section:
http://accad.osu.edu/~waynec/history/lessons.html
There's lots of other pages around with all sorts of things, e.g. check out
the index:
http://accad.osu.edu/~waynec/history/content-index.html
--------
By the way, there's a wikipedia entry for ray tracing:
http://en.wikipedia.org/wiki/Ray_tracing
Feel free to go fix anything you don't like (I did, cleaning out weirdness
about the definition of "backwards ray tracing"; asking what people think
this term means is like asking them where they put their matrix
translation values, in the right column or the bottom row; it's a
religious thing.)
There's also a nice page on POV-Ray:
http://en.wikipedia.org/wiki/POV-Ray
with plenty of worthwhile links.
--------
A fellow named Bruce Dawson of the Xbox advanced technology group has also
published some interesting papers on good practical alternatives to the
evil wicked bad epsilon tests:
http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
-- Steve Hollasch (hollasch@acm.org)
[I should note that Steve's page,
http://stevehollasch.com/cgindex/index.html, has a bunch of useful links,
though it's starting to suffer a little from url-rot. That said, what is
nice is that Steve has stored away a number of worthwhile articles at his
site itself, so they won't get lost. - Eric]
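Dawson's central trick - reinterpreting an IEEE-754 double's bits as an
integer, so that the integer difference counts the representable floats
lying between two values - can be sketched like this (the helper names are
mine, not from his paper):

```python
import struct

def ordered_int(x: float) -> int:
    """Reinterpret an IEEE-754 double as an integer such that adjacent
    representable floats map to adjacent integers."""
    (i,) = struct.unpack("<q", struct.pack("<d", x))
    # Doubles are sign-magnitude; remap negatives so the whole
    # number line becomes monotonic (two's-complement style).
    return i if i >= 0 else -(i & 0x7FFFFFFFFFFFFFFF)

def almost_equal_ulps(a: float, b: float, max_ulps: int = 4) -> bool:
    """True if a and b differ by at most max_ulps representable doubles."""
    return abs(ordered_int(a) - ordered_int(b)) <= max_ulps
```

Unlike a fixed epsilon test, the tolerance here automatically scales with
the magnitude of the operands.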
--------
I released a small raytracer that now runs on smartphones at:
http://www.yasrt.org
- Emmanuel Viale (emmanuel.viale@wanadoo.fr)
----
It also had to happen someday, a ray tracer in Perl:
http://perl.plover.com/RayTracer/
----
And a ray tracer in six different languages, including LISP:
http://www.ffconsultancy.com/free/ray_tracer/languages.html
--------
As mentioned last issue, the LZW (GIF) patent has expired everywhere, as of
July 7, 2004: http://www.unisys.com/about__unisys/lzw/. I mention it only
because I ran across a wonderful passage at
http://news.com.com/2100-1032-1014236.html, worth recording for posterity:
But Unisys credited its exertion of the LZW patent with the creation of
the PNG format, and whatever improvements the newer technology brought
to bear.
"We haven't evaluated the new recommendation for PNG, and it remains to
be seen whether the new version will have an effect on the use of GIF
images," said Unisys representative Kristine Grow. "If so, the patent
situation will have achieved its purpose, which is to advance
technological innovation. So we applaud that."
--------
So what's your favorite mathematical theorem? Here's a list:
http://personal.stevens.edu/~nkahl/Top100Theorems.html
Personally, I love Pick's Theorem; it applies so nicely to pixel grids and
is surprising and elegant:
http://www.mcs.drexel.edu/~crorres/Archimedes/Stomachion/Pick.html
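Pick's Theorem says that for a simple polygon with vertices on lattice
points, A = I + B/2 - 1, where A is the area, I the count of interior
lattice points, and B the count of boundary lattice points. A quick sketch
of using it to count interior points (the example polygons are my own):

```python
from math import gcd

def picks_interior_count(verts):
    """Interior lattice points of a simple lattice polygon, via Pick's
    Theorem rearranged as I = A - B/2 + 1."""
    n = len(verts)
    # Twice the signed area, from the shoelace formula.
    area2 = sum(verts[i][0] * verts[(i + 1) % n][1]
                - verts[(i + 1) % n][0] * verts[i][1] for i in range(n))
    # Lattice points on each edge (one endpoint excluded) = gcd(|dx|, |dy|).
    boundary = sum(gcd(abs(verts[(i + 1) % n][0] - verts[i][0]),
                       abs(verts[(i + 1) % n][1] - verts[i][1]))
                   for i in range(n))
    # I = (2A - B + 2) / 2; the numerator is always even.
    return (abs(area2) - boundary + 2) // 2
```

For the lattice triangle (0,0)-(4,0)-(0,3), area 6 and 8 boundary points
give 3 interior points, which brute-force counting confirms.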
--------
Dictionary of Algorithms and Data Structures, from NIST:
http://www.nist.gov/dads/
--------
A wiki about computing effects of lighting:
http://lightingwiki.com/FrontPage
--------
http://webexhibits.org/causesofcolor/0.html
I like the concept of "15 causes of color". If we could only stamp out
just one of these causes in our lifetime! Actually, the homepage now talks
about just 3 causes of color: http://webexhibits.org/causesofcolor.
Anyway, a pretty site.
Interestingly enough, the Cornell Box gets used in real-life:
http://webexhibits.org/causesofcolor/1H.html, and you can even change the
lighting (in a javascripty way). Gosh, it seemed so much bigger in virtual
reality. (Actually, I was around when the first physical Cornell box was
made, around 1984, for perceptual experiments; I recall it was maybe two
or three feet high).
--------
A triangle mesh library that is GPL'ed and looks useful:
http://www.cs.princeton.edu/gfx/proj/trimesh2/
--------
I've heard good things about the Farbrausch demo tool, at
http://www.farb-rausch.com/ ("demo" meaning "those tiny graphics effects
programs set to music"). (recommended by Vincent Scheib)
When I tried, this site's download links failed. Some googling turned up
one that worked:
http://www.softonic.de/file.phtml?&id_file=35387&action=view&view=downloads
I looked over the tutorial briefly (run file werkkzeug1.exe). One cool
thing here is that they have a pretty elaborate way of generating
procedural textures (which is the key to having tiny demos, procedural
everything). The program looks quite involved, but just walking through
the tutorial gave me an appreciation of some of the elements that go into
making a demo.
--------
Just cool links (most of these are in my new public bookmarks list at
http://del.icio.us/erich666):
http://www.pacifict.com/Story/ - best skunkworks story ever. Ken Turkowski
comments on the product itself: "It's great! It's not just graphing, you
know. It's also algebra and various kinds of differential calculus. One of
the neatest things is direct manipulation on symbols, i.e. if you have an
equation and want to move a term from one side of the equals sign to
another, you just drag it over!"
http://www.krazydad.com/visco/ - a rainbow of sci-fi magazine covers. Cute.
http://www.michaelbach.de/ot - many great illusions here, old and new. My
current favorite mind-fry:
http://www.michaelbach.de/ot/col_lilacChaser/index.html
http://www.ianrowland.com/MiscPages/Mrangryandmrscalm.html - a new
illusion. Squint. (from Ronen Barzel)
http://www.gigapxl.org/ - Imagine a single camera that takes a photo that's
equivalent to 600 professional photos (or 10,000 TV screens) in terms of
resolution. Pretty wild; check out the website: photos of cityscapes, in
which they then zoom in on a single window. (from Michael Cohen)
http://www.rendermagazine.com/ - Read the URL before clicking. What could it
be? Wrong. (from Neil Gatenby)
----------------------------------------------------------------------------
Newish Books, by Eric Haines
There are books, books, and more new books. Here's an abbreviated list of
ones that caught my eye this past year or so:
"Fundamentals of Computer Graphics" by Peter Shirley et alia,
http://www.amazon.com/exec/obidos/tg/detail/-/1568812698/realtimerenderin,
has come out in a second edition this year. Lots of new material (270
pages worth), and lots of error cleanup of the first edition. This book
seems likely to become the new "Foley & van Dam" for computer graphics
(though the field has grown so much since that book, "Computer
Graphics: Principles and Practice," that it's a different world). It
looked worthwhile from my brief skim of the Table of Contents. Currently
it is on the top of my pile of books, daring me to read it at lunchtime
instead of the backlog of Wireds and Computer Gaming Worlds clogging up
my valise.
"Graphics Tools: The JGT Editors' Choice", edited by Ronen Barzel,
http://www.amazon.com/exec/obidos/tg/detail/-/1568812469/realtimerenderin/,
is a collection of articles by the editors of the "journal of
graphics tools" of what they consider their favorite articles
published over the past nine years. This could be thought of as the
sixth "Graphics Gems" book, in some ways (since the goal of the journal
is to publish tools for computer graphics, like Graphics Gems). Yes, I'm
an editor; no, I don't get a dime on book sales. If you don't have back
issues of JGT available to you, I recommend it, as there are some good
ray tracing articles (and much else) in there.
"Foundations of Multidimensional and Metric Data Structures", by
Hanan Samet,
http://www.elsevier.com/wps/find/bookdescription.cws_home/706187/description#description.
Each time I've run into Hanan at SIGGRAPH he's been working on this book,
year after year. It's finally going to press in November. It looks to be
pretty darn comprehensive, with a number of data structures relevant for
computer graphics and for other multidimensional search and sort areas.
He says this book will actually complement, not replace, his (now out-of-print)
books on the topic from the late 80's.
Those are the ray tracing related books that come to mind. Of course, Pharr
and Humphreys' "Physically Based Rendering" book has been out for more than
a year now, and why don't you have one yet? See last issue, RTNv17n1, for my
brief review.
There were also some other graphics books that came out in the past year
that appealed to me, but since they're not ray tracing related I won't dwell
on them. "GPU Gems 2", edited by Matt Pharr (does he ever sleep?), has a lot
of chewy material. I particularly appreciated seeing articles like John Owens'
"Streaming Architectures and Technology Trends." Samples at
http://developer.nvidia.com/object/gpu_gems_2_home.html.
"Real-Time Collision Detection", by Christer Ericson,
http://www.amazon.com/exec/obidos/tg/detail/-/1558607323/realtimerenderin.
It covers the topic thoroughly, and keeps focused on doing this task
at interactive rates.
"Practical Linear Algebra: A Geometry Toolbox", by Gerald Farin and Dianne
Hansford, http://www.amazon.com/gp/product/1568812345/realtimerenderin.
This is essentially a second edition of "The Geometry Toolbox". It's a
nice, approachable book on linear algebra, with many illustrations. I
appreciate the authors' attempts to build up an intuition of what the
various equations mean. For most graphics professionals the book won't be
anything new until about Chapter 7, "Eigen Things" (and as I've said
before, how can you dislike a book with a chapter title like that?). The
only unfortunate bit is that the postscript files that go along with the
book are no longer available on the web; the URL was dead last I checked.
"High Dynamic Range Imaging", by Erik Reinhard, Greg Ward, Sumanta
Pattanaik, and Paul Debevec,
http://www.amazon.com/gp/product/0125852630/realtimerenderin. Coming out
soon. I've gotten a glimpse at Paul's section. Lots of practical advice
and theory all pulled together, putting in all those useful bits that
can't fit (and normally don't get presented) in articles; just what a book
is for.
I'm interested to page through "Real-Time Cameras", by Mark Haigh-Hutchinson,
http://www.amazon.com/exec/obidos/tg/detail/-/0123116341/realtimerenderin.
It is more for video-game programmers, discussing techniques of
camera control and response to occlusion from nearby walls, etc. I've seen
this done poorly enough times that I'm interested to learn any knowledge he
has to impart.
"Advanced Game Development with Programmable Graphics Hardware",
by A. Watt and F. Policarpo,
http://www.amazon.com/exec/obidos/tg/detail/-/156881240X/realtimerenderin.
The first four chapters look to be a good treatment of theory and practice
concerning the GPU, then about a fifth of the book is focused on Policarpo's
Relief Texture Map research (which does look nice, but you might end up
learning more than you want about the technique). Then a quick effects chapter,
and the last half of the book is on animation and game engine architecture.
"Mathematical Illustrations", by Bill Casselman,
http://www.math.ubc.ca/~cass/graphics/manual/. Pretty wild book, mainly
talking about controlling Postscript by using its programming language,
including how to draw in 3D with hidden surfaces. It looks like a fun
way to learn about Postscript's power. Also, the whole darn book is free
online!
"Production Rendering: Design and Implementation" edited by Ian
Stephenson, http://www.amazon.com/gp/product/1852338210/realtimerenderin.
I haven't seen this one yet, but it's on my wish list; now you know what
to get me for Christmas. From one of the authors: "This is the first book
dedicated to teaching how to write your own RenderMan-compliant (or
similar) production renderer. It includes separate chapters on shader
compilers and shader interpreters, and presents a single architecture for
incorporating ray tracing and global illumination into a micropolygon
framework. It was written more or less equally by seven experienced
graphics software developers, each of whom have published a renderer. The
other authors: Mark Elendt (Mantra), Rick LaMont (RenderDotC), Jacopo
Pantaleoni (Lightflow), Scott Iverson (AIR), Paul Gregory (Aqsis), and
Matthew Bentham (RenderDrive)."
----------------------------------------------------------------------------
More on Randomly Sampling Bounding Volumes, by Mateu Sbert (mateu@ima.udg.es)
I've just happened to browse the Ray Tracing News, Volume 17, Number 1. In
the article "Randomly Sampling Bounding Volumes",
http://www.acm.org/tog/resources/RTNews/html/rtnv17n1.html#art5, is an
algorithm to obtain uniformly distributed lines within a convex volume:
1. Choose a random face based on area
2. Choose a random point uniformly on that face
3. Choose a random direction with respect to normal N
I think, for the sake of clarity, point 3 should be changed to:
3. Choose a cosine distributed random direction with respect to normal N
I'm sure this is what the authors wanted to say, because (if I've checked
well) the line generation they give below the algorithm corresponds to
this cosine generation.
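The corrected step 3 is commonly implemented with Malley's method: sample a
unit disk uniformly and project up onto the hemisphere. A sketch (the
function and helper names are mine, not from the original article):

```python
import math
import random

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def cosine_weighted_direction(n, rng=random):
    """Sample a unit direction about unit normal n with pdf cos(theta)/pi,
    via Malley's method: uniform disk sample projected onto the hemisphere."""
    u1, u2 = rng.random(), rng.random()
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    x, y, z = r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1)
    # Build an orthonormal basis (t, b, n) around the normal.
    a = (0.0, 1.0, 0.0) if abs(n[0]) > 0.9 else (1.0, 0.0, 0.0)
    t = cross(a, n)
    tlen = math.sqrt(sum(c * c for c in t))
    t = tuple(c / tlen for c in t)
    b = cross(n, t)
    # Rotate the local hemisphere sample into world space.
    return tuple(x * t[i] + y * b[i] + z * n[i] for i in range(3))
```

As a sanity check, the expected value of cos(theta) under this density is
2/3, which a Monte Carlo average reproduces.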
My students and I have already published in several places on how to
sample global lines from bounding boxes. You can find it, for example, in:
Francesc Castro, Mateu Sbert, "Application of quasi-Monte Carlo sampling
to the multi-path method for radiosity", International Conference on Monte
Carlo and Quasi Monte Carlo Methods '98, Claremont (USA), 1998, published
by Springer Verlag
where I describe the above method; see also my more recent tutorial (with
Andrey Iones and S. Zhukov) on integral geometry, presented at
Eurographics 2001 in Manchester.
It can be easily proven that the selection of pairs of random points on a
sphere is a very particular case of the above technique.
----------------------------------------------------------------------------
Real-Time GPU Spheres, by John Stone (johns@ks.uiuc.edu)
I've been having fun with ray tracing again recently. For a while now,
I've been using OpenGL programmable shading in the new version of the
molecular visualization tool I develop, as a means of getting higher
quality rendering (fragment lighting and such), and very recently I
implemented a new shader that does ray tracing as a means of rendering
spheres. Rather than rendering the sphere using several hundred
triangles, my code draws the sphere's bounding box, and the vertex and
fragment shaders collectively perform the necessary ray-sphere
intersection tests, Z-buffer depth evaluations, and shading calculations
necessary to mix the ray-traced spheres with polygonal objects drawn with
regular OpenGL operations. Here's a test image from one of my early
versions:
http://www.ks.uiuc.edu/Research/vmd/images/collections/openglshader/2004/spheretrace.jpg
The ray traced spheres outperform polygonal spheres using the fragment
lighting in most cases, and they have the advantage of always looking
spherical in profile regardless of their on-screen size, which is nice
when you're displaying molecules.
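At its core, the per-fragment work John describes is the classic ray/sphere
quadratic. A minimal CPU-side sketch of that test (not his actual shader
code):

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Nearest positive hit distance along origin + t*direction, or None.
    direction is assumed to be unit length."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * e for d, e in zip(direction, oc))   # dot(direction, oc)
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None                  # ray misses the sphere entirely
    s = math.sqrt(disc)
    for t in (-b - s, -b + s):       # try the nearer root first
        if t > 0.0:
            return t
    return None
```

In the GPU version, each fragment of the bounding geometry would run this
test for its eye ray, discard on a miss, and write the hit point's depth.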
One could ray trace any object inside of the bounding box, with just a few
changes to the code, but spheres and cylinders are the main object types
of interest in molecular visualization. With some work, I think I can
gain another 2x or more in performance, which would give the ray traced
spheres a significant advantage over polygonal spheres. My plan is to
change the code to use a billboard rather than a full bounding box. The
billboard would always be oriented so its normal aligns with the view
direction, and with this I should be able to eliminate the 2-times
overdraw that occurs with the axis aligned bounding box method. Beyond
that I think I can also make the shaders generally more efficient with a
little work.
Once I'm done with the spheres, I'll implement cylinders, and after that
I'll probably write a ray-casting volume renderer... Exciting stuff!
[Unlike other similar work I've seen, this one actually has perspective
distortion, i.e., it's not just an orthogonal projection of the spheres. -
EAH]
Yeah, in fact my code has to work for both perspective and orthographic
projections since this is going into a real application and not just a
cute one-time siggraph demo or something :-)
I'd been planning to do some sort of sphere rendering trick using GLSL for
about 6 months, and finally got all of the kinks worked out of my main
GLSL shader (lots of NVidia, ATI, 3DLabs driver bugs to sidestep along the
way). Earlier this fall I was thinking I'd do something much simpler than
this, but I decided to give ray tracing a shot after two of my colleagues
mentioned they'd been working on similar things with decent results.
Russell Taylor wrote a nice short paper on rendering non-polygonal objects
with programmable shading:
ftp://ftp.cs.unc.edu/pub/publications/techreports/04-023.pdf
Directly Rendering Non-Polygonal Objects on Graphics Hardware using Vertex
and Fragment Programs, Russell M. Taylor II, TR04-023, UNC Chapel Hill
Dept. of Computer Science, 2004.
He's got a more sophisticated implementation than I have, though I think
mine is faster (as a result of simplicity) and better suited to my
particular purpose since my implementation is very closely tied to my
application. Russell is using Cg for his stuff. Rodrigo Toledo at
INRIA/Nancy has a cool piece of code he's written for ray tracing large
factory models using the ARB vertex/fragment shader stuff. His
implementation ray traces spheres, cylinders, and tori, and he's spent a
lot of effort on tuning his performance as well as getting good results
with the limited floating point precision in some programmable pipelines. See
http://www.loria.fr/~levy/php/article.php?pub=../publications/papers/2004/ray_tracing_gpu
[I commented: It's a kind of wild approach; does it work for anything but
bunnies and other soft stuff without hard edges?]
Their method definitely works for much more than bunnies. :-) Their
approach is very similar to what I'm doing, but they've done some extra
work exploring the cost of the "empty" parts of the bounding geometry,
point, or billboard. Rodrigo has a nice demo which renders a huge factory
model interactively using this ray tracing technique. His large model is
rendered with spheres, cylinders, tori, and triangles as I recall. He
and I are planning to see if we can create a molecular surface rendering
implementation based on this same approach, though that's a much more
complex problem for various reasons.
----
I asked John if anything has happened since he wrote me (which was around
January 2005). He replied:
In the months that have passed since our emails, the code I wrote about
was released in VMD 1.8.3, and a large number of my VMD users benefit from
the GLSL-based GPU sphere ray tracing when making screenshots of molecules
or quick movie renderings when they don't want to take the time to run one
of the various software renderers that VMD can export molecular scenes to.
As is always seemingly the case, the final version of my GPU sphere ray
tracing code that's in the shipping program runs a bit slower than the
development version did because I had to add various workarounds for bugs
in the GLSL implementations that are in the field. Also, my initial
shader implementations didn't have all of the 3-D texturing code and other
features that the final version has. The performance is good, but could
still be improved if I broke the implementation into separate shaders for
perspective and orthographic projections, 3-D texturing on/off, etc. Now
that there are better shader performance analysis tools coming out for
GLSL and the drivers are much improved, I plan to revisit the performance
issues in my implementation and see what I can do to improve on it.
The fastest ray tracing implementations I've seen to date were hand-coded
GPU assembler programs that take great pains to do just the right things
for the target video board. My implementation doesn't beat these, but
it's flexible and the ray tracing is only part of what the shader is doing
so it's hard to compare apples-to-apples when I look at my implementation
versus the others that are out there.
I've recently gotten various queries from a few CS grad students
interested in working on shaders that draw perspective-correct spheres in
other ways; it'll be interesting to see what other GPU methods people come
up with.
I had hoped to write a GPU-based ray casting volume renderer by this time,
but other development tasks have distracted me from my shader work
recently and I probably won't get that done until next year. Maybe
that'll make an interesting topic for a future RTN if I do anything
interesting/unique there.
----------------------------------------------------------------------------
Image Comparison, by Chris Cason (Chris.Cason@povray.org), Eric Haines, and
Greg Ward (gward@lmi.net)
We're currently part-way through the submission of POV-Ray as a SPEC
benchmark. One of the things the SPEC committee seem to want is a way to
quantify whether the output of the benchmark on a given platform is
'valid', by which I mean not too 'different' from the reference image.
There are some problems with rounding on some platforms, depending on the
compilation options, because we use double-precision to integer
conversions in our 3d noise hash function.
Of course the question is, how do we define 'too different'? We could
simply say that if any pixel is different by any amount, then it's counted
in the total of different pixels.
However I am wondering if there is a more formal way to evaluate the
differences, e.g. by looking at how much a given pixel differs from the
reference pixel, and then somehow summing this into some sort of result.
Do you know of a specific means of doing this which can give us a number
as, e.g., a 'percentage difference' or something, and which has (or would
have, if we used it) reasonable industry acceptance?
----
Eric replied:
Usually a "sum of squares" approach is the simplest method I know,
i.e. take the difference between what's expected and what you got,
square it for each pixel, and add these squared differences up to get a
total. I don't know if there's a formal way of expressing this; Greg, do
you?
Greg might also be able to recommend a better metric, for example the
Visual Difference Predictor, a metric Scott Daly came up with that
seems to have good correlation with the perceptual difference (vs.
pure numerical difference) of two images. I don't think this measure is
appropriate for your task; you probably want something simpler, since
your goal is more conformance testing of the code than matching actual
visual results. Anyway, Greg, is there some "official" way of
expressing or normalizing some simple metric like distance-squared for
image comparison?
----
Greg Ward replied:
Mean squared error (MSE) is frequently used to compare images, though it
has a poor correlation to visible differences. A better pixel-wise
comparison is to use CIE Delta E* 1994, which is a complicated but
manageable formula to compute color differences. Better still is Scott
Daly's VDP metric, though it is rather challenging from an implementation
standpoint. (I don't have an implementation myself, else I'd share it
with you.) As Eric points out, this may be overkill for conformance
testing, anyway, though you might consider taking the maximum error rather
than the mean if you are looking for bugs. The only problem is if the
round-off concerns the locations of samples, in which case your maximum
error could be enormous. The eye doesn't care if an object moves one
pixel to the left, but any pixel-wise comparison will show this up as a
huge difference.
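As a concrete baseline, the mean squared error and maximum error Greg
mentions are both trivial to compute in one pass (a sketch over grayscale
pixel sequences; for color, apply it per channel):

```python
def compare_images(ref, test):
    """Return (mean squared error, maximum absolute error) over two
    equal-length sequences of pixel values."""
    assert len(ref) == len(test)
    diffs = [r - t for r, t in zip(ref, test)]
    mse = sum(d * d for d in diffs) / len(diffs)
    max_err = max(abs(d) for d in diffs)
    return mse, max_err
```

Thresholding on max_err catches a localized bug that a scene-wide MSE
would average away - subject to the sample-position caveat above.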
Your biggest problem may be with your random number generator, which you
may wish to disable or replace with a low-discrepancy sequence. Even then,
if POV-Ray is like Radiance, round-off error in a test to determine which
rays to trace can throw everything off if the random numbers are a simple
sequence, since you will get out of step between machines. All
correlations are lost at that point. That's why I would recommend either
turning the random number generator off (i.e., returning 0.5 every time)
or keying the sequence off pixel location or something repeatable.
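Keying the sequence off pixel location might look like the following: hash
(x, y, sample index) through an integer finalizer, so every machine
reproduces the same "random" value for the same pixel. The mixing constants
below are from a standard 32-bit hash finalizer, not anything from POV-Ray:

```python
def pixel_rand(x, y, sample=0):
    """Deterministic pseudo-random float in [0, 1) keyed to a pixel."""
    # Combine the inputs with large primes, then mask to 32 bits.
    h = ((x * 73856093) ^ (y * 19349663) ^ (sample * 83492791)) & 0xFFFFFFFF
    # Bijective 32-bit avalanche mixing (xorshift/multiply finalizer).
    h ^= h >> 16
    h = (h * 0x7FEB352D) & 0xFFFFFFFF
    h ^= h >> 15
    h = (h * 0x846CA68B) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 2.0 ** 32
```

Because the value depends only on its arguments, adding or removing rays
elsewhere in the image can never shift the sequence out of step.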
----------------------------------------------------------------------------
Plane Equations of a Tetrahedron, by Paul Kahler (phkahler@wowway.com)
How to find the plane equations of a tetrahedron given the vertices.
There is an extremely elegant way to solve the problem. Please put this
in RTN.
1) A point/vertex can be represented by a column vector [x,y,z,1].
2) A plane equation ax+by+cz+d=0 can be represented by a row vector
[a,b,c,d].
3) A point can be plugged into (tested in) a plane equation by taking the
product of these (plane*point).
4) A tetrahedron can be represented by a 4x4 matrix whose columns are its
vertices.
5) A tetrahedron in this point-form can be transformed by multiplication
with standard transform matrices.
6) A tetrahedron can also be represented by a 4x4 matrix whose rows are
the plane equation vectors.
7) A tetrahedron in this plane-matrix form can be transformed similarly to
the point version; as I recall, you need to use the inverse transform and
perhaps multiply on the other side.
Now the magic:
8) The inverse of the point-tetrahedron-matrix is the plane-tetrahedron-
matrix. You can verify this by plugging the points into the resulting
plane equations. All 16 combinations can be done by multiplication of the
plane and point matrices. Since the result is an identity matrix (by
definition of the inverse) we see that 3 points lie on each plane. We also
see that the plane equations are scaled such that the point not contained
in the plane produces a value of 1 when taking the product. This indicates
that the plane normals (a,b,c of each row) are pointing inward.
I believe two more things happen that are really nice:
9) The barycentric coordinates (w,x,y,z) of a point inside the tetrahedron
can be determined by multiplying the plane-tetrahedron-matrix by the
point. Obviously if the point is on one of the faces, one of these
coordinates is zero. I believe the plane equations are such that it all
just works out - you get your point as a linear combination of the 4
vertices.
10) All of this works in 2D as well.
I suspect the points can be found from plane equations in a similar way,
but they would not have a W coordinate of 1 unless you scaled the plane
equations properly first - or divide by W after. Obviously with the
scaling above, the points are simply the inverse of the planes.
So my solution to the original question: Throw the points into a matrix
and invert. Beautiful? Profound? Decide for yourself, but I think so.
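Paul's result is easy to verify numerically. The sketch below (with my own
Gauss-Jordan helper) builds the point matrix of the unit tetrahedron,
inverts it to get the plane matrix, and reads off barycentric coordinates:

```python
def invert4(m):
    """Invert a 4x4 matrix (list of rows) by Gauss-Jordan elimination."""
    n = 4
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: bring the largest entry into the pivot slot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Unit tetrahedron: vertices as homogeneous *columns* of Tv.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
Tv = [[v[i] for v in verts] for i in range(3)] + [[1.0] * 4]
Tp = invert4(Tv)            # rows of Tp are the plane equations [a, b, c, d]
# Barycentric coordinates of the homogeneous point (0.25, 0.25, 0.25, 1):
bary = [sum(row[i] * p for i, p in enumerate((0.25, 0.25, 0.25, 1.0)))
        for row in Tp]
```

Multiplying Tp by any vertex column reproduces a column of the identity,
which is exactly the "3 points lie on each plane" observation in step 8.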
--------
I passed this on to Pat Hanrahan (hanrahan@cs.stanford.edu), who comments:
That's a neat result that I am well aware of. I use the 2D version all the
time. Projective geometry and homogeneous coordinates are sooo cool!
One minor point. Wherever he says "inverse" you can use the adjoint. That
makes the geometry particularly elegant since the adjoint is a co-factor
and that has a simple geometric interpretation. Also, the adjoint always
exists.
Here is a slide that I go over in class that somewhat illustrates these
ideas:
http://www.graphics.stanford.edu/courses/cs348b-04/lectures/lecture2/walk013.html
--------
Paul comments:
Pat is correct that you can use the adjoint instead of the inverse; the
division just multiplies through all 4 plane equations by 1/determinant.
However, the scaling that you get from the inverse is very interesting. I
believe you'll only have a zero determinant if the tetrahedron is
degenerate (all points in a plane), and I suspect the determinant is
actually equal to (or at least closely related to) the volume of the
tetrahedron, which means an inverse will be fine for any meaningful
tetrahedron. It's not much more math when you have the adjoint already.
I have considered some interesting uses for ones that are not well formed.
For example, it should be possible to negate the '1' of one vertex to
create a triangular viewing frustum - truncated by the tetrahedron. This
is really why I wanted to find a nice tetra-tetra intersection method.
Beam tracing then becomes a bunch of tetrahedron intersection tests.
----------------------------------------------------------------------------
Ray/Tetrahedron Intersection, by Paul Kahler (phkahler@wowway.com)
[Paul followed up with this article. It's fairly standard, basically the idea
is to consider the tetrahedron as a set of planes. My old ray/polyhedron code,
at http://www.acm.org/pubs/tog/GraphicsGems/gemsii/RayCPhdron.c, does the same
thing. However, this article dovetails nicely with Paul's previous article, so
here it is. - Eric]
The ray tetrahedron intersection can be done as follows:
1) Put all vertices in as columns in a matrix. (Tv for tetrahedron
vertices as columns)
2) Invert to get Tp (Tetrahedron plane equations as rows)
3) Using the ray equation R=P+Lt plug R into the plane equations:
Tp*R = 0 (where * is product and R is a column vector)
Expand:
Tp*P + Tp*Lt = 0
t = -(Tp*P) / (Tp*L)
Now this is really not quite standard notation; t in this case is a
4-vector containing the distances to the 4 planes. You need to do the
divide as 4 separate scalar divisions; the matrix notation is just a handy
way to do 4 dot products at a time.
Now given the 4 distances, you may treat the tetrahedron as the
intersection of 4 half-spaces defined by the planes. The ray will enter or
exit each half space at the point of intersection. We can determine
whether it enters or exits by taking the dot product of the surface normal
and the ray direction; this has already been done as Tp*L. Because it is
a convex object, we can take the farthest distance to an entry point and
the closest distance to an exit point (this defines the extent of the
intersection). If that exit lies in front of that entry point, there is no
intersection. Equivalently, if ANY exit point lies in front of ANY entry
point, there is no intersection. This test feels like there should be some sort
of handy matrix method behind it, but I don't know.
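[To make the recipe concrete, here is a quick NumPy sketch of the method
as I read it. The function name, the epsilon tolerance, and the choice of
returning the (t_enter, t_exit) span are my own; the vertices are
homogenized so the 4x4 matrix can be inverted. - EAH]

```python
import numpy as np

def ray_tetrahedron(P, L, verts, eps=1e-12):
    """verts: the 4 tetrahedron vertices, one per row.  Returns the
    (t_enter, t_exit) span of the ray inside the tetrahedron, or None."""
    # Tv: homogeneous vertices as columns; its inverse Tp has the plane
    # equations as rows, so Tp @ R gives each plane's value at a point R.
    Tv = np.vstack([np.asarray(verts, float).T, np.ones(4)])
    Tp = np.linalg.inv(Tv)
    num = Tp @ np.append(P, 1.0)   # Tp*P: plane values at the ray origin
    den = Tp @ np.append(L, 0.0)   # Tp*L: rate of change along the ray
    t_enter, t_exit = -np.inf, np.inf
    for nv, dv in zip(num, den):
        if abs(dv) < eps:          # ray parallel to this plane
            if nv < 0.0:           # origin outside the half-space: miss
                return None
            continue
        t = -nv / dv               # t = -(Tp*P) / (Tp*L), one plane at a time
        if dv > 0.0:               # ray entering the half-space
            t_enter = max(t_enter, t)
        else:                      # ray exiting the half-space
            t_exit = min(t_exit, t)
    # The farthest entry must come before the closest exit.
    return (t_enter, t_exit) if t_enter <= t_exit else None
```

For example, shooting the ray P=(-1, 0.2, 0.2), L=(1, 0, 0) at the unit
tetrahedron (0,0,0), (1,0,0), (0,1,0), (0,0,1) returns the span (1.0, 1.6).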
I believe this method requires fewer calculations than Pluecker coordinates
(not by much) but it does use division, which I suspect can be
eliminated. Storing the inverse matrix requires less storage than Pluecker
coordinates, too. In fact, because of the relation between the matrices,
you may choose to only store one of them (point or plane) depending on how
you expect them to be used the most.
I didn't like the Pluecker coordinates method - it is what I expected after
reading the ray-triangle intersection. Very mechanical/procedural. This
matrix method suffers from the same ugliness in the final part, where you
compare a lot of distances and check a bunch of signs. Not only does all
that condition checking run slowly, it's not very pretty. I like things like
the ray-sphere intersection where the sign of a calculation tells you if
they intersect or not. I still think ray-tetrahedron may have such a test,
and that it will involve a matrix representation of the tetrahedron. But I
have not found any such result... yet.
----------------------------------------------------------------------------
K-d Tree Comments, by Ingo Wald (wald@sci.utah.edu), Gordon Stoll
(gordon.stoll@intel.com), Vlastimil Havran (havran@mpi-sb.mpg.de), and
Eric Haines
[I wrote about k-d trees last issue, having had fun with Perl programs to
see how these things are best split under various circumstances (see
RTNv17n1,
http://www.acm.org/tog/resources/RTNews/html/rtnv17n1.html#art8). I did
this mostly for my own entertainment, as it's a nice little problem that's
fun to think about. It turns out I had been reinventing the wheel (though
one or two of my results are new, AFAIK). Vlastimil Havran let me know he
wrote about this extensively in his thesis, which can be downloaded at
http://www.cgg.cvut.cz/~havran/phdthesis.html. Check out pages 49-91 in
particular. I admit I haven't read his thesis, and I should! (Well, for
this article I at least did a skim...). There's lots of interesting bits
in there, judging from talking with him and others at SIGGRAPH. - EAH]
Ingo Wald (wald@mpi-sb.mpg.de) writes:
There's a thick chapter about the cost function for splitting a BSP
node in Vlastimil Havran's PhD thesis, and more recently (though thinner)
in mine ;-) (http://graphics.cs.uni-sb.de/~wald/phd.pdf).
The same ideas can also be used for accelerating photon mapping, see
http://www.sci.utah.edu/~wald/Publications/webgen/2004/VVH/download//vvh.pdf,
a paper we've just had accepted at this year's (2004) Eurographics.
There is a more optimal algorithm for building the k-d tree. Sorting in
every step - as you correctly mention - is O(n log n), so the total
complexity is O(huge). You can, however, build the entire tree in O(n log
n), which is asymptotically optimal. The algorithm for that is sketched in
the vvh paper I've pointed you to - it's for points only, but applies
(almost) the same for non-point primitives.
On the article itself:
I personally find the argumentation with the four probabilities (Pa, Pb,
Pab, Pba) a little bit confusing, and overly complex. The original
argumentation with C = C_trav + C_isec( P(a)C(a) + P(b)C(b)) is much
easier to understand and implement. So I hope nobody decides to implement
the approach based on the much more complex formulae used in the
text. On the other hand, that way of calculating has brought up some
interesting facts that I did not know myself before, either. Like, for
example, that the probability for hitting the plane doesn't change
wherever you put it, or that in the one example it doesn't matter where
exactly between 0.4 and 0.6 you place the split plane (which was
completely surprising to me, and I'm still sure it won't hold if you
consider the next-level split).
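[For reference, here is a small sketch of the simpler form Ingo prefers,
approximating each child's cost C(a), C(b) by its primitive count, which is
the usual greedy practice. The cost constants and the box-as-pair-of-tuples
representation are my own assumptions, not taken from any of the cited
papers. - EAH]

```python
def sah_split_cost(split, node_lo, node_hi, prims, axis,
                   c_trav=1.0, c_isec=1.5):
    """Cost of splitting an axis-aligned node at 'split' along 'axis'.
    prims: list of (lo, hi) AABBs, each a pair of 3-tuples."""
    def surface_area(lo, hi):
        dx, dy, dz = (hi[i] - lo[i] for i in range(3))
        return 2.0 * (dx*dy + dy*dz + dz*dx)

    # Count primitives overlapping each child (straddlers count twice).
    n_left = sum(1 for lo, hi in prims if lo[axis] < split)
    n_right = sum(1 for lo, hi in prims if hi[axis] > split)

    # The children share the node's bounds except along the split axis.
    left_hi = list(node_hi); left_hi[axis] = split
    right_lo = list(node_lo); right_lo[axis] = split

    sa = surface_area(node_lo, node_hi)
    p_left = surface_area(node_lo, left_hi) / sa    # P(a), hit probability
    p_right = surface_area(right_lo, node_hi) / sa  # P(b)
    return c_trav + c_isec * (p_left * n_left + p_right * n_right)
```

Sweeping this over candidate planes and keeping the minimum, then comparing
against the cost of leaving the node unsplit, gives the usual greedy
top-down builder.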
----
I have a bunch of other things to say, some of them even interesting:
At SIGGRAPH 2005 in the Real-Time Ray Tracing course, Gordon Stoll gave a
fascinating talk about how to make an extremely efficient ray tracer.
This presentation, and many others, is available at the OpenRT project
site: http://www.openrt.de/Siggraph05/UpdatedCourseNotes/course.php. I
didn't expect too much from the talk: something about using SSE for
shooting four rays at a polygon, some talk about cache misses, the usual
new tricks for wringing more performance from the CPU. These techniques
*were* covered, and it was impressive how much they could help. For
example, due to ray coherence, about 95% of the rays in one of these SSE
packets actually all had the same result (all hits, or all misses), so the
gain from SSE was 3.5x or more. The Saarbrucken folks have said about the
same thing for some time, but it was nice to hear this independently
verified. Squishing a node of a k-d tree into 8 bytes resulted in a nice
speed-up due to fitting the cache nicely. But he went on from there, with
tip after tip. Philipp Slusallek (who has done much of this original
research in the Saarbrucken group with Ingo Wald, Vlastimil Havran, and
others, and who has a company on the side, http://www.intrace.com/, that
is selling such a ray tracer to VolksWagen and others) half-jokingly
commented to me, "He's telling all the secrets!" Other than more CPU
tricks like stressing memory coherence when making the k-d tree
acceleration structure, avoiding recursion and stack operations, etc., the
interesting thing to me was how he stressed doing locally optimal splits
of each node in the tree.
Stoll confirms what Havran and others have researched: this sort of local
optimization makes a huge difference in the speed of ray traversal. This
is a non-obvious result, however: on a ray tracer that hasn't had a lot of
optimization of its various elements, it's a bit hard to tell what's
really worth doing. The cost function's fixed cost values (cost to
intersect a node, to traverse a child, to intersect a polygon, etc.)
affect the cost function itself. So faster results on an experimental ray
tracer can sometimes be misleading. For example, say your ray/polygon test
is terrible compared to the rest of your code. In this case, algorithms
stressing avoiding testing a ray against a polygon will be better for your
system, but only because your original code stinks.
Stoll's group has worked very hard to optimize all the critical elements
of their ray tracer. So hearing that a better k-d tree really makes a
difference is exciting to me; this was supposed to be a solved problem
back in 1992, that you just use some sort of simple BSP-tree (or nested
grids, or bounding volume hierarchy) and it'll be good enough. Not true:
more optimization can really matter; it gives a tree that's several times
faster than a mediocre tree. Nice! The locally optimal trees created are
quite deep (50-100 levels). Many levels go by where the various splitting
planes don't so much split as chop hunks of space off a node, making it
bound the stuff inside more tightly; this gives "stringy" trees where only
one child continues to get subdivided, and leaf nodes end up with about 2
triangles per cell. Another interesting tidbit of Gordon Stoll's
talk: lazy evaluation is not such a good idea, since it ruins the cache's
performance.
My observation on this optimized process is that it reminds me of bounding
volume hierarchies. You can sort-of think of each splitting plane being
generated as adding a plane of a bounding box. For example, imagine a
trivial scene: two identical blobs some distance apart along a single
axis, say the X axis. The global bounding box, in this case, has four
sides, the Y and Z faces, that touch both blobs; each blob touches a
single X face:
+-------------------------+
| # # |
|#### #### |
|# ## # ##|
| ### ### |
+-------------------------+
X axis-->
With bounding volume hierarchies say a bounding box is put around each
blob. So the bounding volume hierarchy is one global box containing two.
What optimized k-d trees do in this situation is to find the edges of
these two blob objects, which is where you want to test to put planes.
There are only two planes that need to be checked, with these planes
corresponding to the "new" faces of the blob bounding box hierarchy:
+-------------------------+
| # | | # |
|#### | |#### |
|# ##| |# ##|
| ### | | ### |
+-------------------------+
X axis-->
A trivial example, but what's interesting here is that the k-d tree and
the bounding box hierarchy have test planes in the *same* locations. The
k-d tree is more memory efficient, as you've had to add just two
nodes, each with one splitting plane. For the bounding volume hierarchy
you've had to add two complete bounding boxes. From an efficiency
standpoint, k-d trees appear to have an advantage. Both k-d trees and BV
hierarchy have to test the topmost global box. But then the computations
for k-d trees consist of (at most) two cutting planes; for BV hierarchy
you have to test two whole bounding boxes. Essentially, k-d trees, in this
(contrived) case, have the advantage of coherency: 5 of the 6 planes
forming each child bounding box are shared by the global bounding box. So
k-d trees just have to test the sixth bounding box face of two boxes,
while bounding volumes don't get to share intersection results with their
parent.
You could do a similar analysis on this situation if the two blobs were
displaced by an X, Y, and Z offset. In this case we've had to make six k-d
tree cutting planes. Now it's less clear that k-d trees beat bounding
boxes: sure, the two child bounding boxes have 12 planes total to test,
and k-d only 6, but there's a bit more overhead in dealing with each plane
with k-d trees.
Anyway, there are certainly many differences between k-d trees and
bounding volume hierarchies, but I thought it interesting how they're
similar in that the optimal k-d trees put planes in the same places that
bounding boxes do, etc.
There's still more to be done in the area of efficiently forming the k-d
tree, e.g. how to best form the upper levels of a BSP tree quickly is not
solved (and I think there's probably no one good answer; it all
depends...). Stoll pointed out that using level-of-detail representations
in ray-tracing is another tantalizing area that turns out to be hard to do
in some use cases.
To address this first question, efficient k-d tree creation, there are
basically two approaches I know of: sorting and binning. Say you have a
million-object database. It's a fair amount of work to sort the million
objects, then test to find which cutting plane (along each of three axes)
gives the best cost function. Recall that every object has two possible
cutting planes to test for it, at its extents along each axis, so this
gives 6 million planes that would need to be tested. Once you split, you
now have two nodes with around half a million objects each, and each of
these nodes then potentially needs 3 million plane tests to find the
optimal split; 6 million tests again for a single level. Even if you use a
rule of thumb (which Havran shows is worthwhile) of testing for splits
only along the longest of the three axes, this is still 2 million plane
tests per level. Havran also points out that you can reduce this by
another factor of two by finding the object median plane (the plane that
splits the set of objects' centroids in the middle) and then testing only
the side of the object that is closer to this median. You're down to a
full sort and then a million plane cost tests per level, still a
considerable number.
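To see where the per-candidate counts come from, here is a sketch (mine,
not Havran's code) of the per-axis sweep: after one O(n log n) sort of the
primitive extents, the left/right counts for every candidate plane fall out
incrementally, so each candidate costs O(1) to score. A straddling object
is counted on both sides, matching the usual bookkeeping:

```python
def sweep_counts(prims, axis):
    """prims: list of (lo, hi) AABBs, each a pair of 3-tuples.  Returns
    (plane, n_left, n_right) for every candidate plane along 'axis', where
    a candidate is any primitive's min or max extent."""
    mins = sorted(p[0][axis] for p in prims)
    maxs = sorted(p[1][axis] for p in prims)
    n = len(prims)
    out = []
    i = j = 0
    for plane in sorted(set(mins) | set(maxs)):
        while i < n and mins[i] < plane:    # prims starting left of plane
            i += 1
        while j < n and maxs[j] <= plane:   # prims ending at or left of it
            j += 1
        # i prims overlap the left side; n - j overlap the right side
        # (straddlers are included in both counts).
        out.append((plane, i, n - j))
    return out
```

A real builder would feed each (plane, n_left, n_right) triple into the
cost function and keep the cheapest plane.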
So one alternative is binning, where you simply run through the objects'
centroids and toss them into a regularly spaced set of bins. For example,
you can record each object's extents and what bins are overlapped. You can
then check just the boundaries between the bins to see which of these
planes is best. You've limited the number of plane tests you do, and
you've avoided a sort. Gordon Stoll talked about this idea briefly during
the talk and afterwards in discussions. My discussion of binning here is
an "off the top of my head" idea, and I'm guessing others have studied
this more thoroughly (though I don't know of any published results).
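Here is roughly what I have in mind, as an unoptimized sketch; the bin
count and the balance-based score (a stand-in for plugging in the real cost
function at each boundary) are arbitrary choices of mine:

```python
def binned_split(prims, axis, node_lo, node_hi, n_bins=16):
    """Pick a split plane by binning primitive centroids: only the
    n_bins - 1 interior bin boundaries are tried as candidate planes, and
    no sort is needed.  prims: list of (lo, hi) AABBs (pairs of 3-tuples).
    Returns (plane, n_left, n_right), counting primitives by centroid."""
    lo, hi = node_lo[axis], node_hi[axis]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for p_lo, p_hi in prims:
        c = 0.5 * (p_lo[axis] + p_hi[axis])              # centroid
        b = min(n_bins - 1, max(0, int((c - lo) / width)))
        counts[b] += 1
    # Sweep the interior boundaries, scoring each candidate.  The score
    # here is just how unbalanced the two sides are; a real builder would
    # plug in the cost function instead.
    best = None
    n_left = 0
    for b in range(n_bins - 1):
        n_left += counts[b]
        n_right = len(prims) - n_left
        plane = lo + (b + 1) * width
        score = abs(n_left - n_right)
        if best is None or score < best[0]:
            best = (score, plane, n_left, n_right)
    return best[1], best[2], best[3]
```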
Anyway, I believe more research could be done here. The tradeoff in any of
these systems is what the cost is in forming the tree vs. the speedup
gained from the tree. This obviously depends on the system's use: if you
plan on shooting more rays, then more time spent forming a good BSP tree
may be of use. Having not done any research in this area, I'm sure I'm at
the "baby steps, don't know much" stage of understanding what's sensible to
do.
----
The original version of this text had a discussion about Smits' great
article on ray tracing optimization techniques, at
http://www.cs.utah.edu/~bes/papers/fastRT/. The idea itself is at
http://www.cs.utah.edu/~bes/papers/fastRT/paper-node12.html. The concept
for bounding volume hierarchies is that you want to not bother storing the
hierarchy as a hierarchy in memory, but rather as a long list of boxes and
surfaces to be intersected. This list contains the whole hierarchy and all
objects, in depth-first order. If you hit the box, you simply go test the
next object on the list, be it a box or object, as this is the first child
of the box. Each box has a single skip pointer: if you miss a box, you use
the skip pointer to find out what the next object is in the list that
needs to be tested. By arranging the hierarchy in this way you avoid
recursion, you minimize storage, and you get excellent memory coherence
among nearby objects. The cost is that you have to zip through the
original tree and convert it. But this is done only once, vs. having to
access parts of the original, inefficient tree for every ray.
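The traversal itself ends up almost trivially simple. Here is my own toy
reconstruction of the scheme (not Smits' code), with 1-D intervals standing
in for bounding boxes and the ray/box test abstracted into a callback:

```python
def traverse(nodes, hits):
    """Walk a BVH flattened in depth-first order.  On hitting an inner
    node's box, the first child is simply the next list entry; on a miss,
    the 'skip' index jumps past the whole subtree.  No recursion, no
    stack -- just one index marching down the list."""
    i, found = 0, []
    while i < len(nodes):
        n = nodes[i]
        if n["leaf"]:
            if hits(n["box"]):
                found.append(n["id"])
            i += 1                   # leaves always advance to the next entry
        elif hits(n["box"]):
            i += 1                   # hit: descend into the first child
        else:
            i = n["skip"]            # miss: skip the entire subtree
    return found

# Toy hierarchy in depth-first order: a root interval with two leaves.
nodes = [
    {"leaf": False, "box": (0, 10), "skip": 3},
    {"leaf": True,  "box": (1, 3),  "id": "A"},
    {"leaf": True,  "box": (6, 9),  "id": "B"},
]

def inside(x):
    # Stand-in for a ray/box test: does point x lie in the interval?
    return lambda box: box[0] <= x <= box[1]
```

A point query at 2 visits the root and then leaf A; a query at 11 misses
the root, and the skip pointer ends the walk immediately.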
About two minutes after sending my text out for comments, I realized this
sort of thing does not apply to k-d trees, which (if created depth-first)
already should have memory coherence of nearby nodes, and which have an
order of evaluation which is view dependent (vs. bounding volumes, which
normally do not, and so can always be evaluated in a particular order).
----
[...Much give and take among Ingo Wald, Gordon Stoll, Vlastimil Havran,
and myself omitted here...]
----
Eric writes:
Here's my other random thought for the day: let's say when you find that a
ray pierces the cutting plane of a kd-tree, you always traverse the left
child first (regardless of whether it's a min or max). This of course makes
no sense for eye/reflection/refraction rays, since the power of the kd-tree
is that ray walking is implicit. But for shadow rays, which can often make
up 80% or more of the rays shot in a scene for a classical ray tracer, you
just want to find any hit, not the closest hit, so traversal order doesn't
matter. I don't know if there's any possible advantage to doing this with
kd-trees - it really seems like there isn't - but thought I'd throw it out
there. Maybe one less compare is needed? Anyway, this is then more like
Smits' "same order each time" traversal idea.
----
Gordon Stoll replies:
I've thought about this a little, mostly thinking about cache performance
for shadow rays in general. There are a bunch of different possible
orders that you could use for shadow rays. You could do light-to-
hitpoint, hitpoint-to-light, fixed order, or even SAH-based probability
order. Unfortunately I can think of good and bad cases for all of those,
depending on the scene and whether you do smart light culling.
Hitpoint to light is probably a must if you have a big scene with
occlusion and don't have smart light culling. Otherwise light to hitpoint
might be good if you have a lot of occlusion by the lights (like
fixtures). SAH (surface area heuristic) probability order might be
interesting, but who the heck knows how that would behave or how you would
do it, or how much overhead it would have.
Oh, one other comment about the flattened BVH traversal that might not
come through from Smits's writeup. The traversal is *so* simple that you
might profitably eliminate at least one of the unpredictable branches from
the inner loop (converting into indexed or conditional moves). Could be a
significant win, and maybe more so for the SSE packet tracing stuff. I
vaguely remember that the Saarland guys tried to do this for kD-tree
traversal way back when, but there was too much overhead for it to be
profitable.
-----
In the rundown above I wrote:
> lazy evaluation is not such a good idea, since it ruins the cache's
> performance. ...
Gordon Stoll comments:
Hmm...don't know if I'd put it quite that way. I guess the real point is
that the effect of lazy evaluation is...non-obvious :).
Looking at a straightforward depth-first kD-tree builder, think about the
"frontier" of nodes where the data needed to pick a split is equal to the
size of the cache (whatever cache). During the course of building, there
are piles of cache misses when you're working on nodes above that frontier
in the tree. Whenever you cross that frontier, there are *zero* cache
misses for the entire sub-tree until you pop back above the frontier.
Deciding to be lazy and stop what you're doing when you're below the
frontier can only hurt your total memory traffic (can't get better than
zero :).
It might not hurt as much as that implies, though. Say a ray hits a chunk
of the tree that's below the frontier (i.e. smaller than the cache) for
the first time in the course of the tracing. In the non-lazy scheme, the
data is probably not in cache because the whole tree was built up front.
So you get some misses pulling in tree data. In the lazy scheme, the data
needed for building is not in cache, so you get some misses pulling that
in -- but when you're done the finished tree data is right there in the
cache :).
This also still leaves the possibility of being lazy above the frontier,
or changing the way the builder accesses memory in general, or doing lazy
evaluation for computation rather than bandwidth purposes.
----
Ingo Wald (wald@sci.utah.edu) replies:
There are two important points about building kd-trees lazily.
1.) Assuming high occlusion / depth complexity and/or lots of triangles
outside the view frustum (in other words: we're seeing only a fraction of
the triangles), then lazy building will build only a fraction of the
kd-tree, and this will most likely more than counter the effects of a few
cache misses here and there, even *if* those should increase. Remember one
important point: the finest level of the kd-tree has as many nodes as all
the other ones above summed together! Thus, the impact of not even
building parts of the kd-tree (which will work best at the leaves) could
potentially be pretty high.
1.b.) The same actually applies to distributed rendering: in lazy
building, each client could/would build different parts of the tree, so
this would even kind of parallelize the building process (including
distributing the cache misses over all clients).
2.) You have more information about the number of rays. Thus, you can not
only build the kd-tree *lazily*, but even *adaptively*. E.g., in the usual
cost function, one argues about the "probability" of rays hitting a voxel,
and uses that probability to compute an estimated cost. Once you build it
hierarchically you actually have some hands-on data on where the rays
actually go. And of course, you could use that to decide a.) whether to
subdivide a node at all (for a single active ray that probably won't make
sense), and/or b.) where to put the split plane(s).
Thus, I think there is some really high potential in building kd-trees
lazily. I just didn't find the time to really implement it :-(
----
We talked a fair bit about ray tracing efficiency research progress in the
past 15 years. Here are some excerpted comments:
Vlastimil Havran (havran@mpi-sb.mpg.de) wrote:
In between there are more than 15 years of hardware improvements, so the
estimated speedup from hardware advancements should be about 1000,
according to Moore's law.
----
Gordon Stoll replied:
Ah, this is something I've thought about a lot. Fifteen years ago, 1990,
would be (I think) a 33MHz 80486. That had onboard floating-point, which
is obviously important for RT. The actual semiconductor scaling trend
(including both the number of transistors and their switching speed)
predicts an improvement of 3,125 times since then. Do you think that the
core CPU performance of RT has actually gone up that much?
My guess is that it didn't, and that instead RT fell off of the
performance curve right around then because RT breaks pretty much
everything that we've put into CPUs since then (deep pipelines with branch
prediction, superscalar, out-of-order, SIMD/vector operations, etc, etc,
etc). I've been thinking about trying an experiment or doing a literature
survey to try to support this, but for now it's just a crackpot theory.
Maybe I'll try to convince Eric to figure it out. :)
Things are really good right now as far as this goes. The code is getting
friendlier for the architecture (things like the packet tracing
technique), and the architecture is getting friendlier for the code
(shorter pipelines, multi-core, multi-threaded).
----
My wrap-up:
One other topic of conversation was about MacDonald and Booth's paper
"Heuristics for Ray Tracing Using Space Subdivision" in Graphics Interface
'89. The two bits that were interesting to me are related, in that they
both involve the dissemination of information. First, this paper spends
some time establishing the fact that the surface area of a convex object
is directly related to the chance it will be hit by a ray. They use
empirical testing to show that there is a correlation. This was
unnecessary work, from a god's-eye view, as this result was known for some
years. For example, in 1987 Goldsmith and Salmon cite L. Stone, "Theory of
Optimal Search", Academic Press, 1975, pp. 27-28, as their justification
for using the surface area. However, this fact hadn't really percolated
out into the field of computer graphics until some time later. For
example, Jim Arvo presented this fact and its proof, usually referred to
nowadays as the surface area heuristic (SAH), in some SIGGRAPH 90 course
notes (in http://www.ics.uci.edu/~arvo/papers/MetaHierarchies.ps), see
http://www.acm.org/tog/resources/RTNews/html/rtnv10n1.html#art4. As
mentioned earlier in this issue, the Discrete Differential Geometry course
presents this result in a formal framework of measure.
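The correlation is easy to check empirically nowadays. The Monte Carlo
sketch below (my own, with arbitrary box sizes) throws uniform random
lines - a uniform direction plus a uniform point on a perpendicular disk -
at two nested boxes; by Cauchy's mean-projected-area result, the fraction
of outer-box hits that also hit the inner box should approach the surface
area ratio, here 24/96 = 0.25:

```python
import math, random

def line_hits_box(p, d, lo, hi):
    # Slab test over the whole line (both signs of t - these are lines,
    # not half-rays).
    t0, t1 = -math.inf, math.inf
    for i in range(3):
        if abs(d[i]) < 1e-12:
            if not (lo[i] <= p[i] <= hi[i]):
                return False
            continue
        a, b = (lo[i] - p[i]) / d[i], (hi[i] - p[i]) / d[i]
        if a > b:
            a, b = b, a
        t0, t1 = max(t0, a), min(t1, b)
    return t0 <= t1

def surface_area_ratio(n=100000, seed=1):
    rng = random.Random(seed)
    outer = ((0, 0, 0), (4, 4, 4))    # surface area 6*4^2 = 96
    inner = ((1, 1, 1), (3, 3, 3))    # surface area 6*2^2 = 24
    center, radius = (2, 2, 2), 4.0   # disk covers the outer box's projection
    hit_outer = hit_inner = 0
    for _ in range(n):
        # Uniform direction on the sphere.
        z = rng.uniform(-1, 1)
        phi = rng.uniform(0, 2 * math.pi)
        s = math.sqrt(1 - z * z)
        d = (s * math.cos(phi), s * math.sin(phi), z)
        # Orthonormal basis (u, v) perpendicular to d.
        ax = (1, 0, 0) if abs(d[0]) < 0.9 else (0, 1, 0)
        u = (d[1]*ax[2] - d[2]*ax[1],
             d[2]*ax[0] - d[0]*ax[2],
             d[0]*ax[1] - d[1]*ax[0])
        ul = math.sqrt(u[0]*u[0] + u[1]*u[1] + u[2]*u[2])
        u = (u[0]/ul, u[1]/ul, u[2]/ul)
        v = (d[1]*u[2] - d[2]*u[1],
             d[2]*u[0] - d[0]*u[2],
             d[0]*u[1] - d[1]*u[0])
        # Uniform point on the perpendicular disk through the center.
        r = radius * math.sqrt(rng.random())
        th = rng.uniform(0, 2 * math.pi)
        p = tuple(center[i] + r*math.cos(th)*u[i] + r*math.sin(th)*v[i]
                  for i in range(3))
        if line_hits_box(p, d, *outer):
            hit_outer += 1
            if line_hits_box(p, d, *inner):
                hit_inner += 1
    return hit_inner / hit_outer
```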
The point is that SAH had been known by some people for some time, but it
took a good long while for this fact to filter out to the general graphics
community, so it's hardly MacDonald's fault that he did not know of it at
that time. The other interesting bit in MacDonald's paper relating to
information dissemination is his own presentation of results. On the ninth
page of his article he writes, "the arbitrary acyclic algorithm performed
best, providing up to three orders of magnitude reduction in the number of
objects tested for intersection." He even shows, in a log graph, how much
better this algorithm is for one test scene. How significant is that?!
He's found an algorithm (essentially what Gordon Stoll and others have
also found to be great: placing kd-tree cutting planes in locally optimal
locations) that works up to a thousand times faster on a particular type
of scene (many small objects uniformly distributed). But then he gets
distracted by his other test scenes where this algorithm does not work as
well. However, these scenes are entirely unlikely, pathological cases:
large overlapping objects filling a space. So he doesn't really realize
he's found a critical result, and in fact spends some time trying to find
a hybrid algorithm that will handle all his test scenes well, reasonable
and pathological alike.
My point here is not to beat on MacDonald; he was a master's student in
math who was doing his best to complete a thesis project, so did not have
time to create interesting test scenes (ahhh, too bad he didn't know about
the Standard Procedural Database...). He found a great result, but didn't
fully realize what he had found. What's interesting is how long it has
taken for this result to percolate out. Nowadays if a new ray tracing
algorithm shows a 2x speedup, it's a worthwhile result. Here MacDonald
gets up to a 1000x speedup in some cases and buries the result on the
ninth page of a sixteen-page article, because it worked for only two of his
five test scenes (though for two of the remaining scenes he was getting up
to a 10x improvement). The conference was relatively unknown and the
proceedings not easily available, there were no pretty pictures in the
article, and the most significant result is well-buried, never mentioned
in the abstract, introduction, or conclusions. Amazing, really, that this
result ever surfaced again at all. Heck, I certainly missed it! I mention
his paper in
http://www.acm.org/tog/resources/RTNews/html/rtnv8n2.html#art6, noting
that he explored this area, but never really put two and two together (or
perhaps taking 5 and subtracting the 3 pathological scenes). And MacDonald
was not the only person to explore this area; Fussell and Subramanian also
did so, but their work never made it out past a technical report:
D. Fussell and K. R. Subramanian, "Fast ray tracing using K-D trees,"
Technical Report TR-88-07, U. of Texas, Austin, Dept. Of Computer Science,
Mar. 1988. http://www.cs.utexas.edu/ftp/pub/techreports/tr88-07.pdf
Their conclusion was that this method was one of the fastest known,
comparable to Arvo & Kirk's 5D method (a scheme that works well for
coherent rays but that bogs down otherwise, and has seen little use). But
they do not show their scheme was significantly faster than other schemes
overall. Is it because of coding style, or the processor type, or that
they used different termination criteria? We're unlikely to ever know.
The takeaway for me is that first discovering a result is important, but
interpreting its significance and validating it are equally valuable.
MacDonald found a useful result, but his test scenes
misled him to think the result was not of interest. Also, though his
advisor is well-known, MacDonald was working on his ray tracer on his own.
It is very difficult for a reader to evaluate the worth of any efficiency
work done by one coder in isolation: is his speedup due to a better
algorithm, or just better coding of the new work? As I mentioned before,
this is why I think Gordon's work is significant, as his group is
optimizing every element they can, and working with realistic test data,
thus making their results more solid.
----------------------------------------------------------------------------
Processor Level Optimization Verification, by Paul Kahler
(phkahler@wowway.com) and Anonymous
Paul Kahler writes:
I just read Stephen Bridgett's article in RTNv17n1
(http://www.acm.org/tog/resources/RTNews/html/rtnv17n1.html#art5) and
wanted to comment on optimizing the point in polygon test. I've found that
branching code is indeed the worst thing you can run on a modern
processor. I'm still using an Athlon 700, so I'm sure the problem has
gotten worse than I've seen. Anyway, while Stephen has eliminated one
branch from the test, there is still one left in his code fragment that can
be eliminated. Though it appears as an assignment, it involves evaluating a
boolean >= condition, which implies a conditional branch. Some processors
can handle this without branching, but only with the right compiler.
When doing a magnitude comparison (A