"The Ray Tracing News" email edition began after some ray tracing researchers
met at SIGGRAPH 87 and an address list was formed. Andrew Glassner started
"The Ray Tracing News", hardcopy edition, and soon thereafter we distributed
copies of the email address list to researchers.
This is the first archive collection of "The Ray Tracing News". I hope you
will get as much use out of it as I have,
Eric Haines, 3D/Eye Inc, 2359 N. Triphammer Rd, Ithaca, NY 14850
...!hpfcla!hpfcrs!eye!erich
----------------------------------------------------
I'm presently keeping the list up-to-date. As far as adding new people
to this mailing list, I'd personally like to see the list not grow without
bounds. Given that the Intro to Ray Tracing course had the highest attendance,
there's obviously a lot of people interested in ray-tracing. The group
presently consists of researchers and people building ray-tracing systems, and
it'd be nice to keep it at this level (and keep all of our long-distance e-mail
costs down).
First off, a quick announcement: if you didn't get a copy of the
"Intro. to Ray Tracing" course notes at SIGGRAPH 87 and would like a copy
(they sold out twice), send me $20 and I'll xerox them. There are only three
articles which are reprints - the rest is new stuff and pretty worthwhile.
[Skip the next offering if you read it in The Ray Tracing News]
SPD & NETLIB:
My news for the day is that netlib is now carrying my "standard"
procedural database generators (written in C). If you don't know about netlib,
here's the two minute explanation:
-------------------------------------------------------------------------------
Netlib has two addresses. One is:
...!hplabs!hpfcla!research!netlib
(There may be other quicker routes to netlib - you'll have to research that
yourself). The second one may be more convenient, as it is an arpa connection:
netlib@anl-mcs.arpa
So after you do "mail [...!]hplabs!hpfcla!research!netlib", the next step is to
request what you want, one line per request. For example, to get my databases,
simply type on a single line (and don't have anything else on the line):
send haines from graphics
and end the message. Netlib is computerized, and will automatically parse
your message and send you the 82K bytes of dox & code for my databases.
The best way to find out about netlib and what it has to offer is to
send it a request:
send index
and you'll get sent a listing of all the libraries available. It's mostly
numerical analysissy stuff (lots'o'matrix solvers), but there are some things
of interest. One particularly yummy database is the "polyhedron"
contributions. There are some 142 polyhedra descriptions (vertices, faces, &
edge descriptions and more). Some of these descriptions are buggy, but most
are good (as netlib says, "Anything free comes with no guarantee").
-----------------------------------------------------------------------------
As far as the question "what do the images look like?" goes, the
images will be published in IEEE CG&A in November.
SPLINE SURFACES:
A question which I want to answer is "which is the best way in
software to ray-trace bicubic spline patches?" In particular, I want to
intersect bicubics (also quadrics and linears, and any mix of the three, e.g.
bicubic u, quadric v) that can also be rational, and also have non-uniform
knots. As an added bonus, I'd like to do trimming curves. I am interested in
anyone's feelings or findings on this subject, especially any experiences with
actual implementation they may have had.
To kick off discussion of this problem, John Peterson, who is researching this
question at the University of Utah, was kind enough to spend some time on a
response to me. His reply follows (printed with his permission):
------------------------------------------------------------------------------
RE ray tracing splines.. I've sent a paper on ray tracing splines via
polygons to TOG, but since that's going to be stuck in the review process
for a while, here's an overview:
Most of the recent published stuff on this have been approaches using
numerical methods; which I would avoid like the plague. I recently
discovered that Whitted mentions surface subdivision very briefly in his
classic paper (CACM '80) and also in Rubin & Whitted (SIGGRAPH '80).
The technique they use was to subdivide the surface "on the fly", i.e.,
an area of surface is split only when rays come near it. Whitted
doesn't go into any detail in these papers though, just a couple of
paragraphs each.
However, Whitted's motivation for "subdivision on the fly" was lack of
memory on his PDP-11 - nowadays I think you're better off just to do all
the subdivision initially, then throw the surface away - much less
bookkeeping. The polygon/bounding volume database isn't that huge if you
use adaptive surface subdivision (more on that in a moment).
In terms of references, I'd highly recommend the "Killer B's" - "An
Introduction to the Use of Splines in Computer Graphics" by Bartels,
Beatty and Barsky. It appeared as SIGGRAPH tutorial notes in '85 and
'86, and I think a book version is coming out from Kaufmann(sp?) in
September. Another good reference is, "A Survey of Curve and Surface
Methods in CAGD", by Bohm, Farin and Kahmann, in "Computer Aided Geometric
Design", July 1984. Both of these give excellent background on all the
math you'll need for dealing with splines. If you need to know about
rationals, see Tiller's paper "Rational B-Splines for Curve and Surface
Representations" in CG&A, September '83.
The subdivision algorithm I used is based on the Oslo Algorithm (Cohen,
Lyche & Riesenfeld, Computer Graphics & Image Proc., Oct. 1980). This
is a little slower than some of the other subdivision algorithms, but
the win is that you're not restricted to specific orders or knot
vectors. Since the subdivision time is generally small compared to the
ray tracing time (like < 10%) I find it's worth the generality. (By
the way, the Killer B's description of the Oslo algorithm is much easier
reading than the CG&IP article. Sweeney's paper in the Feb. '86 CG&A
also has a good description of it). Other subdivision classics are Ed
Catmull's PhD thesis (U. Utah, '75) and Lane, Carpenter, Whitted &
Blinn's article in the Jan. '80 CACM.
A couple tricks are noteworthy. First, if you're doing adaptive surface
subdivision, you'll need a "flatness criteria" (to determine when to
quit splitting the surface). I've found a good approximation is to take
the bounding box of the control mesh, and find the point in the middle
of it. Then project the size of a pixel onto this point, and use this
distance as a flatness criteria.
Other trick: Crack prevention. If you split a surface into two parts,
one part may get subdivided more than the other. If this happens, along
the seam between the two halves you need to force the polygon vertices in the
side with more divisions to lie on the edge of the surface with fewer
subdivisions.
My reply to John follows:
Thanks much for taking the time to tell me about splines and your
findings. You leave me in a quandary, though. I'm interested in the
numerical techniques for bicubics, but I also want to take your warnings to
heart.
I have to admit, I have a great fear of throwing away the nice
compact spline description and blow it up into tons of polygons. From your
comments, you say that using adaptive techniques can help avoid this problem.
You seem to divide to the pixel level as your criteria - hmmm, I'll have to
think about that. Avoiding cracks sounds like a headache. Also, it seems to
me that you'll have problems when you generate reflection rays, since for these
rays the "flatness criteria" is not necessarily valid. Have you ever noticed
practical problems with this (one pathological example I can think of is a
lens in front of a spline patch: the lens magnifies the pixel sized patches
into much larger entities. However, almost everything has pathological
problems of some sort. Have you run into any problems due to your method on a
practical level)?
I may try subdividing on the fly to avoid all this. To this end, have
you looked at Ron Pulleyblank's routine for calculating bicubic patch
intersections (IEEE CG&A March 1987)? What do you think of his "on the fly"
subdivision algorithm?
Articles: thanks for the references. Have you seen the article by
Levner, Tassinari, and Marini, "A Simple Method for Ray Tracing Bicubic
Surfaces," in Computer Graphics 1987, T.L. Kunii editor, Springer-Verlag, Tokyo?
Sounded intriguing - someone's hopefully going to get me a copy of it soon
if you don't have it and would like a copy. If you do have a copy, is it any
good?
----------------------------------------------------------------------------
Now, here's John's response:
RE: Numerical techniques. I guess grim memories of round-off errors
and consistently inconsistent results may bias my opinion, but there
are some fundamental reasons for the problems with numerical methods.
Finding roots of systems of two equations is inherently an unstable
process (for a good description of this, see section 9.6 in "Numerical
Recipes" by William Press, et. al.). Another way to think about
iterative approximations in two variables is the chaotic patterns
you see Mandlebrot sets. It's (very roughly) the same idea. Chaos
and ray tracing don't mix...
Your comments about the flatness criteria are true, though in practice
I've only found one "practical" instance where it really poses a
problem. This is when a light source is very close to an object, and
casts a shadow on a wall some distance away. The shadow projection
magnifies the surface's silhouette onto the wall, and in some cases
you see some faceting in the shadow's edge. The work-around is to
have a per-surface "resolution factor" attribute. The flatness
criteria found by projecting the pixel is multiplied by this factor,
so a surface with a "res factor" of 0.5 may generate up to twice as
many polygons as it normally would (extra polygons are generated only
where the surface is really curved, though).
In order to get a feel for just how much data subdivision generates, I
tried the following experiment. I took the code for balls.c (from
the procedural database package you posted) and modified it to
generate a rational quadratic Bezier surface for each sphere
(as well as bounding volumes around each "group" of spheres). I
didn't do the formal benchmark case (too lazy), but just choose a view
where all the spheres (level 2 == 91 of them) just filled the screen.
The results look like this:
Image Size Triangles
(pixels) generated
128x128 7800
512x512 30400
The original spline surface definition wasn't small, each sphere has
45 rational (homogeneous) control points + knot vectors. My
philosophy at the moment is that the algorithms for handling lots of
trivial primatives win over those for handling a few complex ones.
Right now the "lots of little primatives" camp has a lot of strong
members (Octrees/Voxels, Kay/Kajiya/Caltech, Arvo & Co, etc). If you
just want to maximize speed, I think these are difficult to beat, but
they do eat lots of memory.
I'd be very interested in seeing a software implementation of Pulleyblank's
method. The method seemed very clever, but it was very hardware oriented
(lots of integer arithmetic, etc). I guess the question is whether or not
their subdivision algorithm is faster than just a database traversal.
Something like Kay/Kajiya or Arvo's traversal methods would probably scream
if you could get them to run in strictly integer arithmetic (not to mention
putting them in hardware...)
Cheers,
jp
----------------------------------------------------------------------------
Anywell, that's the discussion as far as it's gone. We can continue it in one
of two ways: (1) everyone writes to everyone on the subject (this is quick,
but can get confusing if there are a lot of answers), (2) send replies to me,
which I'll then send to all. I opt for (1) for now: if things get confusing
we can always shift to (2).
[Actually, we're shifting to (2) around now, though it seems worthwhile to
pass on your comments to all, if you're set up to do it. A summary of the
comments will (eventually, probably) get put in Andrew's ray-tracing
newsletter.]
More responses so far:
>From Jeff Goldsmith:
Re: flatness criterion
The definition that JP gave seems to be based on pixel geometry.
That doesn't seem right. Why not subdivide until you reach
subpatches that have preset limits in the change in their
tangent vector (bounded curvature?) Al Barr and Brian Von
Herzen have done some formal studies of that in a paper given
this year. (It wasn't applied to ray tracing, but it doesn't
matter.) I used that technique for creating polygonal representations
of superquadrics with fair to good success. The geometric
criterion makes sure that not much happens to the surface
within a patch, which is what you really want, anyway.
I, too, by the way, believe in the gobs of polygons vs. one
compicated object tradeoff. The two seem to be close in
speed, but polygons saves big time in that you never need
new code for your renderer. I hate writing (debugging)
renderer code because it takes so long. Modeling code
is much faster.
--Jeff
>From Tim Kay:
Subject: ray tracing bicubic patches
The discussion about subdivision was interesting. I just want to point
out that a paper in this year's proceedings (Snyder and Barr) did just
what the discussion suggested. The teapot was modeled with patches,
and John hacked them up into very small polygons. He also talked about
some of the problems that you run into.
Tim
------------------
>From Brian Barsky:
What numerical techniques is John referring to? He doesn't mean the
resolvent work, does he?
----------------------------
Response from John Peterson:
I was using a modified version of Sweeney's method. It was extended in
two ways; first, a more effective means was used to generate the bounding
volumes around the mesh, and it was able to handle surfaces with arbitrary
orders and knot vectors. I wrote up the results in a paper that appeared
in a very obscure proceedings (ACM Rocky Mnt. Regional Conference,
Santa Fe, NM, Nov. 1986)
------------------------------------------------------
ABNORMAL NORMALS:
>From Eric Haines:
My contribution for the week is an in-house memo I just wrote on transforming
normals, which is easier that it sounds. Some of you have undoubtedly dealt
with this long ago, but hopefully I'll notify some poor soul that all is not
simple with normal transforms. Pat Hanrahan mentioned this problem in his talk
at the SIGGRAPH '87 Intro to Ray Tracing course, but I didn't really understand
why he was saying "don't use the modeling matrix to transform normals!" Now
I do, and I thought I'd explain it in a fairly informal way. Any comments
and corrections are appreciated!
The file's in troff format, run by:
pic thisfile | troff -mm
(The document was written partly so that I could learn about troff and pic,
so it's pretty primitive).
All for now. The file follows, indented by 4 spaces so that no "."'s would
be in column one (which some e-mailers evidentally don't like).
----------------------cut here---------------------
.tr ~
.ds HF 3 3 2 2 2 2 2
.nr Hi 0
.HM I A 1 a
.nr Ht 1
.nr Hb 6
.nr Hs 6
.nr Cl 7
.na
.fi
.ad l
\f(CS
.ce
Abnormal Normals
.ce
Eric Haines, 3D/Eye Inc.
The problem: given a polygon and its normal(s) and a modeling matrix, how
do we correctly transform the polygon from model to world space? We assume
that the modeling matrix is affine (i.e. no perspective transformation is going
on).
This question turns out to be fraught with peril. The right answer is
to transform the vertices using the modeling matrix, and to transform the
normals using the transpose of the inverse (also known as the adjunct) of the
modeling matrix. However, no one believes this on first glance. Why do all
that extra work of taking the inverse and transposing it? So, we'll present
the wrong answers (which are commonly used in the graphics community
nonetheless, sometimes with good reason), then talk about why the right answer
is right.
Wrong answer #1: Transform the normals using the modeling matrix. What
this means is multiplying the normal [~x~y~z~0~] by the modeling matrix. This
actually works just fine if the modeling matrix is formed from translation
matrices (which won't affect the normal transformation, since translations
multiply the 'w' component of the vector, which is 0 for normals) and rotation
matrices. Scaling matrices are also legal, as long as the x, y, and z
components are the same (i.e. no "stretching" occurs). Reflection matrices
(where the object is flipped through a mirror plane - more about these later)
are also legal, as long as there is no stretching. Note that scaling will
change the overall length of the vector, but not the direction.
So what's wrong? Well, scaling matrices which stretch the object (i.e.
whose scaling factors are not all the same for x, y, and z) ruin this scheme.
Imagine you have a plane at a 45 degree tilt, formed by the equation
x~=~y (more formally, x~-~y~=~0). Looking down upon the x-y plane from the z
axis, the plane would appear as a line x~=~y. The plane normal is [~1~-1~0~]
(for simplicity don't worry about normalizing the vector), which would appear
to be a ray where x~=~-y, x~>~0. Now, say we scale the plane by stretching
it along the x axis by 2, i.e. the matrix:
[~2~0~0~0~]
.br
[~0~1~0~0~]
.br
[~0~0~1~0~]
.br
[~0~0~0~1~]
This would form a plane in world space where x~=~2y. Using the method of
multiplying the normal by this modeling matrix gives us a ray where x~=~-2y,
x~>~0. The problem with this ray is that it is not perpendicular to our
plane. In fact, the normal is now 2x~=~-y, x~>~0. Therefore, using the
modeling matrix to transform normals is wrong for the
stretching case.
.DS
.PS 6.3
# x-y grid
LX: arrow up up
"+y" at LX.end above
move to LX.end
move left 0.25
"before" "transform"
move to LX.start
LY: arrow right right
"+x" at LY.end ljust
move to LY.start
line left ; move to LY.start
line down ; move to LY.start
# plane
M: line up right up right
"plane" at M.end + (-0.05,0.0) rjust
move to M.start
line down left
move to M.start
N: arrow down right dashed
"normal" at N.end + (0.05,0.0) ljust
move to N.start
##############
move right 2.0
# x-y grid
LX: arrow up up
"+y" at LX.end above
move to LX.end
move left 0.25
"after" "transform"
move to LX.start
LY: arrow right right
"+x" at LY.end ljust
move to LY.start
line left ; move to LY.start
line down ; move to LY.start
# plane
M: line up right right
"plane" at M.end + (-0.05,0.0) rjust
move to M.start
line down 0.25 left
move to M.start
N: arrow down right right dashed
box invisible height 0.25 "bad" "normal" with .n at N.end
move to N.start
N: arrow down right 0.25 dotted
box invisible height 0.25 "correct" "normal" with .n at N.end
move to N.start
.PE
.ce
Figure 1 (a) & (b) - Stretching Transformation
.DE
.na
.fi
.ad l
Wrong answer #2: Transform the vertices, then calculate the normal. This
is a limited response to the wrongness of method #1, solving the stretching
problem. It's limited because this method assumes the normal is calculated
from the vertices. This is not necessarily the case. The normals could be
supplied by the user, given as a normal for the polygon, or on a normal per
vertex basis, or both. However, even if the system only allowed normals which
were computed from the vertices, there would still be a direction problem.
Say the method used to calculate the normal is to take the cross product of
the first two edges of the polygon (This is by far the most common method.
Most other methods based on the local geometry of the polygon will suffer from
the same problem, or else the problem in method #1). Say the vertices are
[~1~0~0~], [~0~0~0~], and [~0~-1~0~]. The edge vectors (i.e. the vector formed
from subtracting one vertex on the edge from the other vertex forming that edge)
are [~1~0~0~] and [~0~1~0~], in other words the two edge vectors are parallel
to the +x and +y axes. The normal is then [~0~0~1~], calculated from the cross
product of these vectors.
If we transform the points by the reflection matrix:
[~1~0~~0~0~]
.br
[~0~1~~0~0~]
.br
[~0~0~-1~0~]
.br
[~0~0~~0~1~]
the result is the same: none of the edges actually moved. However, when we
use a reflection matrix as a transform it is assumed that we want to reverse the
object's appearance. With the above transform the expected result is that
the normal will be reversed, thereby reversing which side is thought of as
the front face. Our method fails on these reflection transforms because it
does not reverse the normal: no points changed location, so the normal will
be calculated as staying in the same direction.
The right (?) answer: What (usually) works is to transform the normals
with the transpose of the inverse of the modeling matrix. Rather than trying
to give a full proof, I'll talk about the three types of matrices which are
relevant: rotation, reflection, and scaling (stretching). Translation was
already seen to have no effect on normals, so we can ignore it. Other more
obscure affine transformations (e.g. shearing) are avoided in the discussion,
though the method should also hold for them.
In the case of rotation matrices and reflection matrices, the transpose and
the inverse of these transforms are identical. So, the transpose of the
inverse is simply the original modeling matrix in this case. As we saw, using
the modeling matrix worked fine for these matrices in method #1. The problems
occurred with stretching matrices. For these, the inverse is not just a
transpose of the matrix, so the transpose of the inverse gives a different
kind of matrix. This matrix solves our problems. For example, with the bad
stretching case of method #1, the transpose of the inverse of the stretch
matrix is simply:
[~0.5~0~0~0~]
.br
[~~0~~1~0~0~]
.br
[~~0~~0~1~0~]
.br
[~~0~~0~0~1~]
(note that the transpose operation is not actually needed in this particular
case). Multiplying our normal [~1~-1~0~] by this matrix yields [~0.5~-1~0~],
or the equation 2x~=~-y, x~>~0, which is the correct answer.
The determinant: One problem with taking the inverse is that sometimes
it isn't defined for various transforms. For example, casting an object onto
a 2D x-y plane:
[~1~0~0~0~]
.br
[~0~1~0~0~]
.br
[~0~0~0~0~]
.br
[~0~0~0~1~]
does not have an inverse: there's no way to know what the z component should
turn back into, given that the above transform matrix will always set the z
component to 0. Essentially, information has been irretrievably destroyed
by this transform. The determinant of the upper-left 3x3 matrix (the only
part of the matrix we really need to invert for the normal transform) is 0,
which means that this matrix is not invertable.
An interesting property of the determinant is that it, coupled with method
#2, can make that method work. If the determinant of the 3x3 is positive, we
have not shifted into the mirror world. If it is negative, then we should
reverse the sign of the normal calculated as we have entered the mirror
universe).
It would be nice to get a normal for polygons which have gone through this
transform. All bets are off, but some interesting observations can be made.
The normal must be either [~0~0~1~] or its negative [~0~0~-1~] for this
transformation (or undefined, if all vertices are now on a single line).
Choosing which normal is a bit tricky. One OK method is to check the normal
before transform against [~0~0~1~]: if the dot product of the two is negative,
then reverse the normal so that it will point towards the original direction.
However, if our points went through the z-reflection matrix we used earlier,
then went through the transform above, the normals were reversed, then the
object was cast onto the x-y plane. In this case we would want to have the
reverse of the normal calculated from the edges. However, this reversal has
been lost by our casting transform: concatenating the reflection matrix with
the casting matrix yields the same casting matrix. One tricky way of
preserving this is to allow 0 and -0 to be separate entities, with the sign of
zero telling us whether to reverse the normal or not. This trick is rather
bizarre, though - it's probably easier to just do it the simple way and warn
whoever's using the system to avoid non-invertable transformations.
THE END:
Well, that's all for now. Do you have any comments? questions?
interesting offerings for the group? Either send your bit to everyone on the
list, or send a copy on to me and I'll post it to all. I realize this is
quite a deluge of info for one message, but all of this has accumulated over a
few months. The traffic so far has been quite mild: don't worry about future
flooding.
All for now,
Eric Haines
----------------------------------------
_ __ ______ _ __
' ) ) / ' ) )
/--' __. __ , --/ __ __. _. o ____ _, / / _ , , , _
/ \_(_/|_/ (_/_ (_/ / (_(_/|_(__<_/ / <_(_)_ / (_
Subject: Bounded ray tracing
Dear Sir,
The discussions so far is very interesting one and I have
several comments.
As I am charged for foreign mail (about $1 for 1K bytes, both
incoming and out going), it costs considerablely to mail everyone
on the list separately. So, I would like you to re-distribute
my transpacific mail to everyone else.
Masataka Ohta
My comment on the flatness criteria with reflections follows:
-----------------------------
Though I don't like subdividing patches into polygons for ray
tracing (it's incoherent and, for example, CSG objects are
difficult to render), good "flatness criteria" even with
reflection, refraction or shadowing can be given using ray
bound tracing.
The basic idea is simple. Ray bound is a combination of two
bounds: a bound of ray origins and a bound of ray directions.
A efficient bound can be formed by using a sphere for bounding
ray origins and using a circle (on a unit sphere, i.e. using
spherical geometry) for ray directions.
To begin with, bound a set of all rays which originates from
each pixel. Flatness of a patch for the first generation ray
should be computed against this ray bound, which is equivalent
to measure flatness with perspective transformation, because
rays are bounded by a pixel-sized cone.
As for the second generation rays, they can be bounded by a
certain ray bound which can be calculated form the first
generation ray bound. And those ray bounds should be used
for the flatness check.
For those further interested in ray bound tracing, I will
physically mail my paper titled "Bounded ray tracing for
perfect and efficient anti-aliasing".
-------------------------------------------------------------------------------
From: Eric Haines
Subject: Spline surface rendering, and what's wrong with octrees
Well, after all the discussion of spline surfaces, I finally went with turning
the spline surface into patches, putting an octree around these, and then do
Glassner/Kay/Fujimoto/etc octree ray-tracing (in reality I found Glassner's
article the most useful, though I didn't use his hashing scheme due to (a)
being pressed for time and (b) being pressed for memory space. This seems to
work fairly well, but I noticed some interesting problems with octrees that
I thought I'd pass on.
----------
[Note: this first problem is kinda boring if you've never implemented an octree
subdivision scheme before. Skip on to problem # 2, which I think is more
important].
The first problem is: How do I cleverly chose octree bounds? This problem was
first mentioned to me by Mike Kaplan, which I did not think about much until I
suddenly noticed that all available memory was getting gobbled by certain
polygonalized splines. The problem is that there are two parameters which are
commonly used to end the further subdivision of an octree cube into its eight
component "cubies".
One is a maximum number of primitives per octree cube. To make the octree in
the first place we have a bounding cube which contains the environment. If
the cube has more than a certain number of primitives in it, then octree
subdivision takes place. The octree cubies formed are then each treated in a
like fashion, subdividing until all leaf cubies contain less than or equal to
the number of primitives. The second parameter is the maximum tree depth,
which is the number of levels beyond which we will not subdivide cubes. This
parameter generally has precedence over the first parameter, i.e. if the
maximum level has been reached but the maximum number of primitives is still
exceeded, subdivision will nonetheless halt.
The trick is that you have to pay close attention to both parameters.
Originally I set these parameters to some reasonable numbers: 6 primitives and
8 levels being my maximums. What I found is that some objects would have
very deep octrees, all the way down to level 8, even though their number of
primitives was low. For example, an object with 64 patches would still have
some leaf nodes down at level 8 which had 7+ primitives in them. I was
pretty surprised by this problem.
My solution for spline surfaces was to keep the maximum number of primitives
at 6 and use another parameter to determine the maximum level. I use the
formula:
max level = round_up [ ln( primitives / K ) / ln( S ) ]
where K is the maximum number of primitives (i.e. 6) and S was a prediction of
how much an octree subdivision would cut down the number of primitives in an
octree. For example, in an environment consisting of a set of randomly
distributed points, one would expect that when the octree cube containing these
points was subdivided into eight octree cubies, each octree cubie would have
about 1/8th of the points inside it. For a spline surface I reasoned that
about four of the octree cubies might have some part of the surface in them,
which would give an S=4 (Note that the largest, original octree must have at
least four cubies filled by the surface. However, this is not necessarily true
for suceedingly smaller cubies). Another factor which had to be taken into
account was that there would also be some overlap: some primitives would appear
in two or more cubies. So, as a final reasonable guess I chose S=3.5 . This
seems to work fairly well in practice, though further testing would be very
worthwhile.
Coming up with some optimal way to chose a maximum octree depth still seems to
be an open question. Further study on how various environments actually fill
space would be worthwhile: how many octree nodes really are filled on the
average for each subdivision? More pragmatically, how do we determine the
best maximum depth for ray-tracing an environment? The problem with not
limiting the maximum level is primarily one of memory. If the octree grows
without reasonable bounds a simple scene could use all available memory. Also,
a large number of unnecessary octree nodes results in additional access time,
either through having to search through the octree or through having extraneous
objects in the hashing table.
A more intelligent approach might be to do adaptive subdivision: subdivide an
octree cube as usual, then see how many fewer primitives there are in each
cubie. If some cube has more than some percentage of primitives in it, the
subdivision could be deemed useless and so subdivision would end at this point.
If anyone knows a master's candidate looking for a project, this whole question
of when it is profitable to subdivide might make a worthwhile topic. Judging
from the interest in octrees by ray tracing researchers at last year's
roundtable, I think this will become more and more important as time goes on.
-------------
The second problem with octrees: I decided to go with octrees for spline
surfaces only because these objects would have fairly localized and even
distribution of primitives (i.e. quadrilateral patches). I feel that octree
efficiency techniques are probably horrible for ray tracing in general.
For example, imagine you have a football stadium made of, say, 5K primitives.
Sitting on a goal line is a shiny polygonalized teapot of 5K quadrilaterals
(note that the teapot is teapot sized compared to the stadium). You fill the
frame with the teapot for a ray trace, hoping to get some nice reflections of
the stadium on its surface.
If you use an octree for this scene, you'll run into an interesting problem.
The teapot is, say, a foot long. The stadium is 200 yards long. So, the
teapot is going to be only 1/600th the size of the stadium. Each octree
subdivision creates 8 cubies which are each half the length of the parent
cube. You could well subdivide down to 9 levels (with that 9th level cubie
having a length of 1/512th of the stadium length: about 14 inches) of octrees
and still have the whole teapot inside one octree cube, still undivided. If
you stopped at this 9th level of subdivision, your ray trace would take
forever. Why? Because whenever a ray would enter the octree cubie containing
the teapot (which most of the rays from your eye would do, along with all those
reflection and shadow rays), the cubie would contain a list of the 5K teapot
polygons. Each of these polygons would have to be tested against the ray,
since there is no additional efficiency structure to help you out. In this
case the octree has been a total failure.
Now, you may be in a position where you know that your environments will be
well behaved: you're ray tracing some specific object and the surrounding
environment is limited in size. However, the designer who is attempting to
create a system which can respond to any user's modeling requests is still
confronted by this problem. Further subdivision beyond level nine down to
level eighteen may solve the problem in this case. But I can always come up
with a worse pathological case. Some realistic examples are an animation
zooming in on a texture mapped earth into the Caltech campus: when you're
on the campus the sphere which represents the earth would create a huge
octree node, and the campus would easily fall within one octree cubie. Or
a user simply wants to have a realistic sun, and places a spherical light
source 93 million miles away from the scene being rendered. Ridiculous? Well,
many times I find that I will place positional light sources quite some
distance away from a scene, since I don't really care how far the light is,
but just the direction the light is coming from. If a primitive is associated
with that light source, the octree suddenly gets huge.
Solutions? Mine is simply to avoid the octree altogether and use Goldsmith's
automatic bounding volume generation algorithm (IEEE CG&A, May 1987). However,
I hate to give up all that power of the octree so easily. So, my question:
has anyone found a good way around this problem? One method might be to do
octree subdivision down to a certain level, then consider all leaf cubies that
have more than the specified number of primitives in their lists as "problem
cubies". For this list of primitives we perform Goldsmith's algorithm to get
a nice bounding volume hierarchy. This method reminds me of the SIGGRAPH 87
paper by John Snyder and Alan Barr, "Ray Tracing Complex Models Containing
Surface Tesselations". Their paper uses SEADS on the tesselated primitives and
hierarchy on these instanced SEADS boxes to get around memory constraints,
while my idea is to use the octree for the total environment so that the
quick cutoff feature of the octree can be used (i.e. if any primitive in an
octree cubie is intersected, then ray trace testing is done, versus having to
test the whole environment's hierarchy against the ray). Using bounding
volume hierarchy locally then gets rid of the pathological cases for the octree.
However, I tend to think the above just is not worthwhile. It solves the
pathological cases, but I think that automatic bounding volume hierarchy (let's
call it ABVH) methods will be found to be comparable in speed to octrees in
many cases. I think I can justify that assertion, but first I would like to
get your opinions about this problem.
-------------------------------------------------------------------------------
Top Ten Hit Parade of Computer Graphics Books
by Eric Haines
One of the most important resources I have as a computer graphics programmer
is a good set of books, both for education and for reference. However, there
are a lot of wonderful books that I learn about years after I could have first
used them. Alternately, I will find that books I consider classics are unknown
by others. So, I would like to collect a list of recommended reading and
reference from you all, to be published later in the year. I would especially
like a recommendation for good books on filtering and on analytic geometry.
Right now I am reading _Digital Image Processing_ by Gonzalez and Wintz and have
_A Programmer's Geometry_ on order, but am not sure these fit the bill.
_An Introduction to Splines for use in Computer Graphics and Geometric
Modeling_ by Bartels/Beatty/Barsky looks like a great resource on splines,
but I have read only four chapters so far so am leaving it off the list for
now.
Without further ado, here are my top ten book recommendations. Most should be
well known to you all, and so are listed mostly as a kernel of core books I
consider useful. I look forward to your additions!
_The Elements of Programming Style, 2nd Edition_, Brian W. Kernighan,
P.J. Plauger, 168 pages, Bell Telephone Laboratories Inc, 1978.
All programmers should read this book. It is truly an "Elements of
Style" for programmers. Examples of bad coding style are taken from
other textbooks, corrected, and discussed. Wonderful and pithy.
_Fundamentals of Interactive Computer Graphics_, James D. Foley, A. Van
Dam, 664 pages, Addison-Wesley Inc, 1982.
A classic, covering just about everything once over lightly.
_Principles of Interactive Computer Graphics, 2nd Edition_, William M.
Newman, R.F. Sproull, 541 pages, McGraw-Hill Inc, 1979.
The other classic. It's older (e.g. ray-tracing did not exist at this
point), but gives another perspective on various algorithms.
_Mathematical Elements for Computer Graphics_, David F. Rogers, J.A. Adams,
239 pages, McGraw-Hill Inc, 1976.
An oldie but goodie, its major thrust is a thorough coverage of 2D and
3D transformations, along with some basics on spline curves and
surfaces.
_Procedural Elements for Computer Graphics_, David F. Rogers, 433 pages,
McGraw-Hill Inc, 1985.
For information on how to actually implement a wide variety of
graphics algorithms, from Bresenham's line drawer on up through
ray-tracing, this is the best book I know. However, for complicated
algorithms I would recommend also reading the original papers.
_Numerical Recipes_, William H. Press, B.P. Flannery, S.A. Teukolsky,
W.T. Vetterling, 818 pages, Cambridge University Press, 1986.
Chock-full of information on numerical algorithms, including code
in FORTRAN and PASCAL (no "C", unfortunately). The best part of
this book is that they give good advice on what methods are appropriate
for different types of problems.
_A First Course in Numerical Analysis, 2nd Edition_, Anthony Ralston,
P. Rabinowitz, 556 pages, McGraw-Hill Inc, 1978.
Tom Duff's recommendation says it best: "This book is SO GOOD [<-these
words should be printed in italics] that some colleges refuse to use
it as a text because of the difficulty of finding exam questions that
are not answered in the book". It covers material in depth which
_Numerical Recipes_ glosses over.
_C: A Reference Manual_, Samuel P. Harbison, G.L. Steele Jr., 352 pages,
Prentice-Hall Inc, 1984.
A comprehensive and comprehensible manual on "C".
_The Mythical Man-Month_, Frederick P. Brooks Jr, 195 pages, Addison-Wesley
Inc, 1982.
A classic on the pitfalls of managing software projects, especially
large ones. A great book for beginning to learn how to schedule
resources and make good predictions of when software really is going
to be finished.
_Programming Pearls_, Jon Bentley, 195 pages, Bell Telephone Laboratories
Inc, 1986.
Though directed more towards systems and business programmers, there
are a lot of clever coding techniques to be learnt from this book.
Also, it's just plain fun reading.
As an added bonus, here's one more that I could not resist:
_Patterns in Nature_, Peter S. Stevens, 240 pages, Little, Brown and Co.
Inc, 1974.
The thesis is that simple patterns recur again and again in nature and
for good reasons. A quick read with wonderful photographs (my favorite
is the comparison of a turtle shell with a collection of bubbles
forming a similar shape). Quite a few graphics researchers have used
this book for inspiration in simulating natural processes.
---------------------------------
From Olin Lathrop:
Here goes another attempt to reach more people. I will now spare you all
a paragraph of griping about the e-mail system.
About the normal vector transform:
Eric, you are absolutely right. I also ran into this when some of my squashed
objects just didn't look right, about 4 years ago. I would just like to offer
a slightly different way of looking at the same thing. I find I have difficulty
with mathematical concepts unless I can attatch some sort of physical significance
to them. (I think of a 3x4 transformation matrix as three basis
vectors and a displacement vector instead of an amorphous pile of 12 numbers.)
My first attack at finding a transformed normal was to find two non-paralell
surface vectors at the point in question. These could be transformed regularly
and the transformed normal would be their cross product. This certainly
works, but is computationally slow. It seemed clear that there should exist
some 3x3 matrix that was the total transform the normal vector really went thru.
To simplify the thought experiment, what if the original normal vector was exactly
along the X axis? Well, the unit surface vectors would be the Y and Z axis
vectors. When these are sent thru the regular 3x3 transformation matrix,
they become the Y and Z basis vectors of that matrix. The final resulting
normal vector is therefore the cross product of the Y and Z basis vectors of the
regular matrix. This is then what the X basis vector of the normal vector
transformation matrix should be. In general, a basis vector in the normal
vector transformation matrix is the cross product of the other two basis
vectors of the regular transformation matrix. I wasn't until about a year
later that I realized that this resulting matrix was the inverse transpose
of the regular one.
This derivation results in exactly the same matrix that Eric was talking about,
but leaves me with more physical understanding of what it represents.
Now for a question: It has always bothered me that this matrix trashes the
vector magnitude. This usually implies re-unitizing the transformed normal
vector in practise. Does anyone avoid this step? I don't want to do any
more SQRTs than necessary. You can assume that the original normal vector
was of unit length, but that the result also needs to be.
About octrees:
1) I don't use Andrew's hashing scheme either. I transform the ray so that
my octree always lives in the (0,0,0) to (1,1,1) cube. To find the voxel
containing any one point, I first convert the coordinates to 24 bit integers.
The octree now sits in the 0 to 2**23 cube. Picking off the most significant
address bit for each coordinate yields a 3 bit number. This is used to select
one of 8 voxels at the top level. Now pick off the next address bit down
and chose the next level of subordinate voxel, etc, until you hit a leaf node.
This process is LOGn, and is very quick in practise. Finding a leaf voxel
given an integer coordinate seems to consume about 2.5% of the time for most
images. I store direct pointers to subordinate voxels directly in the parent
voxel data block. In fact, this is the ONLY way I have of finding all but the
top voxel.
2) Choosing subdivision criteria: First, the biggest win is to subdivide on
the fly. Never subdivide anything until you find there is a demand for it.
My current subdivision criteria in order of precidence (#1 overrides #2) are:
1) Do not subdivide if hit subdivision generation limit. This is the same
as what Eric talked about. I think everyone does this.
2) Do not subdivide if voxel is empty.
3) Subdivide if voxel contains more than one object.
4) Do not subdivide if less than N rays passed thru this voxel, but did
not hit anything. Currently, N is set to 4.
5) Subdivide if M*K < N. Where M is the number of rays that passed thru this
voxel that DID hit something, and K is a parameter you chose. Currently,
K is set to 2, but I suspect it should be higher. This step seeks to avoid
subdividing a voxel that may be large, but has a good history of producing
real intersections anyway. Keep in mind that for every ray that did hit
something, there are probably light source rays that did not hit anything.
(The shader avoids launching light rays if the surface is facing away from
the light source.) This can distort the statistics, and make a voxel appear
less "tight" than it really is, hence the need for larger values of K.
6) Subdivide.
Again, the most important point is lazy evaluation of the octree. The above rules
are only applied when a ray passes thru a leaf node voxel. Before any rays are
cast, my octree is exactly one leaf node containing all the objects.
3) First solution to teapot in stadium: This really cries out for nested objects.
Jim Arvo, Dave Kirk, and I submitted a paper last year on "The Ray Tracing Kernel"
which discussed applying object oriented programming to designing a ray tracer.
Jim just told me he is going to reply about this in detail so I will make this
real quick. Basically, objects are only defined implicitly by the results of
various standard operations they must be able to perform, like "intersect
yourself with this ray". The caller has no information HOW this is done. An
object can therefore be an "aggregate" object which really returns the result of
intersecting the ray with all its subordinate objects. this allows for easily
and elegantly mixing storage techniques (octrees, linear space, 5D structures,
etc.) in the same scene. More on this from JIM.
4) Second solution to teapot in stadium: I didn't understand why an octree
wouldn't work well here anyway. Suppose the teapot is completely enclosed
in a level 8 voxel. That would only "waste" 8x8=64 voxels in getting down
to the space you would have chosen for just the teapot alone. Reflection
rays actually hitting the rest of the stadium would be very sparse, so go
ahead and crank up the max subdivision limit. Am I missing something?
-----------------------------------------------
(This is a reply to Olin Lathrop. Summary: "well, maybe the octree is not so
bad after all...").
From: Eric Haines
Olin Lathrop writes:
> To simplify the thought experiment, what if the original normal vector was exactly
> along the X axis? Well, the unit surface vectors would be the Y and Z axis
> vectors. When these are sent thru the regular 3x3 transformation matrix,
> they become the Y and Z basis vectors of that matrix. The final resulting
> normal vector is therefore the cross product of the Y and Z basis vectors of the
> regular matrix. This is then what the X basis vector of the normal vector
> transformation matrix should be. In general, a basis vector in the normal
> vector transformation matrix is the cross product of the other two basis
> vectors of the regular transformation matrix. It wasn't until about a year
> later that I realized that this resulting matrix was the inverse transpose
> of the regular one.
The problem is the sign of the basis vector is unclear by this method.
I tried this approach, but it fails on mirror matrices. Suppose your
transformation matrix is:
[ -1 0 0 0 ]
[ 0 1 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]
This matrix definitely affects the surface normal in X, but your two vectors
in Y and Z are unaffected. This problem never occurs in the "real" world
because such a matrix is equivalent to twisting an object through 4D space
and making it go "through the looking glass". However, it happens in computer
graphics a lot: e.g. I model half a car body, then mirror reflect to get the
other half. If you have a two sided polygon laying in the YZ plane, with one
side red & the other blue, and apply the above transformation, no vertices
(and no tangent vectors) have any non-zero X components, and so will not change.
But the normal does reverse, and the sides switch colors. My conclusion was
that you have to use the transpose of the inverse to avoid this problem, since
surface normals fail for this case. (p.s. did you get a copy of Glassner's
latest (2nd edition) memo on this problem? He does a good job explaining the
math).
> About octrees:
>
> 1) I don't use Andrew's hashing scheme either. I transform the ray so that
> my octree always lives in the (0,0,0) to (1,1,1) cube...
Actually, this is the exact approach I finally took, also. I had rejected
the hashing scheme earlier, and forgotten why (and misremembered that it was
because of memory costs) - the correct reason for not hashing is that it's
faster to just zip through the octree by the above method; no hashing is
needed. It's pretty durn fast to find the right voxel, I agree.
Have you experimented with trying to walk up and down the octree, that is, when
you are leaving an octree voxel you go up to the parent and see if the address
is inside the parent? If not, you go to its parent and check the address, etc,
until you find that you can go back down. Should be faster than the straight
downwards traversal when the octree is deep: the neighboring voxels of the
parent of the voxel you're presently in account for 3 of the 6 directions the
ray can go, after all. You have 1/2 a chance of descending the octree if you
check the parent, 3/4ths if you go up two parents, etc. (Where did I read of
this idea, anyway? Fujimoto? Kaplan? Whatever the case, it's not original
with me).
Another idea that should be mentioned is one I first heard from Andrew
Glassner: putting quadtree-like structures on the cube faces of the octree
cubes. It's additional memory, but knowing which octree cube is the next would
be a faster process. Hopefully Andrew will write this up sometime.
The subdivision criteria ideas are great - makes me want to go and try them
out! When are you going to write it up and get it published somewhere? Lazy
subdivision sounds worthwhile: it definitely takes awhile for the octrees to
get set up under my present "do it all at the beginning" approach (not to
mention the memory costs). That was something I loved about the Arvo/Kirk
paper - without it the 5D scheme would appear to be a serious memory hog.
> 4) Second solution to teapot in stadium: I didn't understand why an octree
> wouldn't work well here anyway. Suppose the teapot is completely enclosed
> in a level 8 voxel. That would only "waste" 8x8=64 voxels in getting down
> to the space you would have chosen for just the teapot alone. Reflection
> rays actually hitting the rest of the stadium would be very sparse, so go
> ahead and crank up the max subdivision limit. Am I missing something?
There are two things which disturbed me about the use of the octree for this
problem. One was that if the maximum subdivision level was reached prematurely
then the octree falls apart. I mentioned that you could indeed subdivide down
another 9 levels and have an 18 level octree that would work. However, the
problem with this is knowing when to stop - why not go on to 24 levels? For
me it boils down to "when do I subdivide?". I suspect that your additional
criteria might solve a lot of the pathological cases, which is why I want
to test them out. Also note that there are built in maximum subdivision levels
in octree schemes which could be reached and still not be sufficient (though
admittedly your 24 levels of depth are probably enough. Of course, I once
thought 16 bits was enough for a z-buffer - now I'm not so sure. Say you have
a satellite which is 5 feet tall in an image, with the earth in the background.
We're now talking 23 levels of subdivision before you get within the realm
of subdividing the satellite. With 24 levels of depth being your absolute
maximum you've hit the wall, with only one subdivision level helping you out
on the satellite itself).
Good point that as far as memory goes it's really just 8x8 more voxels "wasted".
One problem is: say I'm 8 feet in each direction from the teapot, with me and
the teapot in diagonally opposite corners of a cube which is then made into an
octree. The only way to get through the 8 cubes in the containing box is to
travel through 4 of them (i.e. if I'm in Xhi, Yhi, Zhi and the teapot is in
Xlo, Ylo, Zlo, then I have to intersect my own box and then three other boxes
to move me through in each "lo" direction). In this case there are only 3
levels of octree cubes I have to go through before getting to the 1 foot cube
voxel which contains the teapot. The drawback of the octree is that I have to
then do 3x4=12 box intersections which must be done each ray and which are
useless. Minor, but now think of reflection rays from the teapot which try to
hit the stadium: each could go through up to 8 levels x 4 voxels per level =
32 voxels just to escape the stadium without hitting anything (not including
all the voxels needed to be traversed from the teapot to the one foot cube).
Seems like a lot of intersection and finding the next octree address and tree
traversal for hitting the background. I suspect less bounding volumes would be
hit using hierarchy, and the tests would be simpler (many of them being just
a quick "is the ray origin inside this box?": if so, check inside the box).
I guess it just feels cleaner to have to intersect only bounding volumes
which are needed, which is the feel which automatic bounding volume hierarchy
has to it. Boxes can be of any size, so that if someone adds a huge earth
behind a satellite all that is added is a box that contains both. With
hierarchy you can do some simple tricks to cut down on the number of
bounding volumes intersected. For example, by recording that the ray fired
at the last pixel hit such and so object, you can test this object first for
intersection. This quickly gets you a maximum depth that you need not go
beyond: if a bounding volume is intersected beyond this distance you don't have
to worry about intersecting its contents. This trick seems to gain you about
90% of the speed-up of the octree (i.e. not having to intersect any more
voxels once an intersection is found), while also allowing you the speed up
of avoiding needless octree voxel intersections. I call this the "ray
coherency" speedup - it can be used for all types of rays (and if you hit
when the ray is a shadow ray, you can immediately stop testing - this trick
will work for the octree, too! Simply save a pointer to the object which
blocked a particular shadow ray for a particular light last pixel and try it
again for the next shadow ray).
I still have doubts about the octree. However, with lazy evaluation I think
you get rid of one of my major concerns: subdividing too deep makes for massive
octrees which soak up tons of memory. Have you had to deal with this problem,
i.e. has the octree ever gotten too big, and do you have some way to free up
memory (some "least recently used" kind of thing)?
An interesting comment that I read by John Peterson on USENET news some months
ago was:
>> [John Watson @ Ames:]
>> Anyway, I know there have been a few variations of the constant-time
>> algorithms around, and what I need to know is, what is the _best_,
>> i.e. simplest, most effiecent, etc, ... version to implement.
>>
>> Could some of you wonderful people comment on these techniques in general,
>> and maybe give me some pointers on recent research, implementions, etc.
>
> This is an interesting question. Here at Utah, myself and Tom Malley
> implemented three different schemes in the same ray tracer; Whitted/Rubin,
> Kay/Kajiya, and an octree scheme (similar to the Glassner/Kaplan, camp, I
> think). The result? All three schemes were within 10-20% of each other
> speedwise. Now, we haven't tested these times extensively; I'm sure you could
> find wider variances for pathological cases. But on the few generic test
> cases we measured, there just wasn't much difference. (If we get the time,
> we plan on benchmarking the three algorithms more closely).
I suspect that this is probably the case, with octree working best when the
scene depth (i.e. the number of objects which are intersected by each ray,
regardless of distance) is high, the "ray coherency" method outlined above for
hierarchy fails, and so cutting off early is a big benefit. Automatic hierarchy
probably wins when there are large irregularities in the density of the
number of objects in space. (Of course, the SEADS method (equal sized voxels
and 3DDDA) is ridiculous for solving the "teapot in a stadium" kind of
problems, but it's probably great for machines with lots of memory ray tracing
scenes with a localized set of objects.
By the way, I found Whitted/Rubin vs. Kay/Kajiya to be about the same: Kay had
less intersections, but the sorting killed any time gained. I find the
coherency ray technique mostly does what Kay/Kajiya does: quickly gets you a
maximum intersection depth for cutoff.
Without the memory constraints limiting the effectiveness of the octree I can
believe it well could be the way of the future: it is ideal for hardware
solution (so those extra voxel intersection and traversal tests don't bother me
if they're real fast), sort of like how the z-buffer is the present winner in
hidden surface algorithms because of its simplicity.
So, how's that for a turnabout on my polemical anti-octree position?
Nonetheless, I'm not planning to change my hierarchy code in the near future -
not until the subdivision and memory management problems are more fully
understood.
All for now,
Eric Haines
--------------------------------------------------
SUBSPACES AND SIMULATED ANNEALING
I started out intending to write a very short reply to Eric Haines's
"teapot in a football stadium" example, but it turned out to be rather
long. At any rate, most of what's described here (except for some of
the very speculative stuff near the bottom) is a result of joint work
with Dave Kirk, Olin Lathrop, and John Francis.
One way that we've dealt with situations similar to Eric's teapot
example is to use a combination of spatial subdivision and bounding
volume techniques. For instance, we commonly mix two or three of the
following techniques into a "meta" hierarchy for ray tracing a single
environment:
1) Linear list
2) Bounding box hierarchy
3) Octrees (including BSP trees)
4) Linear grid subdivision
5) Ray Classification
We commonly refer to these as "subspaces". For us this means some
(convex) volume of space, a collection of objects in it, and some
technique for intersecting a ray with those objects. This technique is
part of an "aggregate object", and all the objects it manages are the
"children". Any aggregate object can be the child of any other
aggregate object, and appears simply as a bounding volume and
intersection technique to its parent. In other words, it behaves just
like a primitive object.
Encapsulating a subspace as just another "object" is very convenient.
This is something which Dave and Olin and I agreed upon in order to make
it possible to "mix and match" our favorite acceleration techniques
within the same ray tracer for testing, benchmarking, and development
purposes.
As an example of how we've used this to ray trace moderately complex
scenes I'll describe the amusement park scene which we animated. This
consisted of a number of rides spread throughout a park, each containing
quite a bit of detail. We often showed closeups of objects which
reflected the rest of the park (a somewhat scaled down version of the
teapot reflecting the stadium). There were somewhere around 10,000
primitive objects (not including fractal mountains), which doesn't
sound like much anymore, but I think it still represents a fairly
challenging scene to ray trace -- particularly for animating.
The organization of the scene suggested three very natural levels of
detail. A typical example of this is
I) Entire park ( a collection of rides, trees, and mountains )
II) Triple decker Merry-go-round ( one of the rides )
III) A character riding a horse ( a "detail" of a ride )
Clearly a single linear grid would not do well here because of the scale
involved. Very significant collections of primitives would end up
clumped into single voxels. Octress, on the other hand, can deal with
this problem but don't enjoy quite the "voxel walking" speed of the
linear grid. This suggests a compromise.
What we did initially was to place a coarse linear grid around the whole
park, then another linear grid (or octree) around each ride, and
frequently a bounding box hierarchy around small clusters of primitives
which would fall entirely with a voxel of even the second-level (usually
16x16x16) linear grid.
Later, we began to use ray classification at the top level because, for
one thing, it did some optimizations on first-generation rays. The
other levels of the hierarchy were kept in place for the most part
(simplified a bit) in order to run well on machines with < 16 MB of
physical memory. This effectively gave the RC (ray classification)
aggregate object a "coarser" world to deal with, and drastically cut
down the size of the candidate sets it built. Of course, it also "put
blinders" on it by not allowing it to distinguish between objects inside
these "black boxes" it was handed. It's obviously a space-time
trade-off. Being able to nest the subspaces provides a good deal of
flexibility for making trade-offs like this.
A small but sort of interesting additional benefit which falls out of
nesting subspaces is that it's possible to take better advantage of
"sparse" transformations. Obviously the same trick of transforming the
rays into a canonical object space before doing the intersection test
(and transform the normal on the way out) also works for aggregate
objects. Though this means doing possibly several transforms of a ray
before it even gets to a primitive object, quite often the transforms
which are lower in the hierarchy are very simple (e.g. scale and
translate). So, there are cases when a "dense" (i.e. expensive)
transform gets you into a subspace where most of the objects have
"sparse" (i.e. cheap) transforms. [I'll gladly describe how we take
advantage of matrix sparsity structures if anybody is interested.] If
you end up testing N objects before finding the closest intersection,
this means that (occasionally) you can do the job with one dense
transform and N sparse ones, instead of N dense transforms. This is
particularly appropriate when you build a fairly complex object from
many scaled and translated primitives, then rotate the whole mess into
some strange final orientation. Unfortunately, even in this case it's
not necessarily always a win. Often just pre-concatenating the
transforms and tossing the autonomous objects (dense transforms and all)
into the parent octree (or whatever) is the better thing to do. The
jury is still out on this one.
Currently, all of the "high level" decisions about which subspaces to
place where are all made manually and specified in the modeling
language. This is much harder to do well than we imagined initially.
The tradeoffs are very tricky and sometimes counter-intuitive. A
general rule of thumb which seems to be of value is to put an "adaptive"
subspace (e.g. an octree, RC) at the top level if the scene has tight
clusters of geometry, and a Linear grid if the geometry is fairly
uniform. Judicious placement of bounding box hierarchies within an
adaptive hierarchy is a real art. On the one hand, you don't want to
hinder the effectiveness of the adaptive subspace by creating large
clumps of geometry that it can't partition. On the other hand, a
little a priori knowledge about what's important and where bounding
boxes will do a good job can often make a big difference in terms of
both time and space (the space part goes quintuple for RC).
Now, the obvious question to ask is "How can this be done
automatically?" Something akin to Goldsmith and Salmon's automatic
bounding volume generation algorithm may be appropriate. Naturally, in
this context, we're talking about a heterogeneous mixture of "volumes,"
not only differing in shape and surface area, but also in "cost," both
in terms of space and time. Think of each subspace as being a function
which allows you to intersect a ray with a set of objects with a certain
expected (i.e. average) cost. This cost is very dependent upon the
spatial arrangement and characteristics of the objects in the set, and
each type of subspace provides different trade-offs. Producing an
optimal (or at least good) organization of subspaces is then a very
nasty combinatorial optimization problem.
An idea that I've been toying with for quite some time now is to use
"simulated annealing" to find a near-optimal subspace hierarchy, where
"optimality" can be phrased in terms of any desired objective function
(taking into account, e.g., both space and time). Simulated annealing
is a technique for probabilistically exploring the vast solution space
(typically) of a combinatorial optimization problem, looking for
incremental improvements WITHOUT getting stuck too soon in a local
minimum. It's very closely linked to some ideas in thermodynamics, and
was originally motivated by nature's ability to find near-optimal
solutions to mind-bogglingly complex optimization problems -- like
getting all the water molecules in a lake into a near-minimum-energy
configuration as the temperature gradually reaches freezing. It's been
fairly successfull at "solving" NP-hard problems such as the travaling
salesman and chip placement (which are practically the same thing).
This part about simulated annealing and subspace hierarchies is all
very speculative, mind you. It may not be practical at all. It's
easy to imagine the "annealing" taking three CPU-years to produce a
data structure which takes an hour to ray trace (if it's done as a
preprocessing step -- not lazily). There are many details which I
haven't discussed here -- largely because I haven't figured them out
yet. For example, one needs to get a handle on the distribution of rays
which will be intersected with the environment in order to estimate the
efficiency of the various subspaces. Assuming a uniform distribution is
probably a good first approximation, but there's got to be a better way
-- perhaps through incremental improvements as the scene is ray traced
and, in particular, between successive frames of an animation.
If this has any chance of working it's going to require an interesting
mix of science and "art". The science is in efficiently estimating the
effectiveness of a subspace (i.e. predicting the relevant costs) given a
collection of objects and a probability density function of rays
(probably uniform). The art is in selecting an "annealing schedule"
which will let the various combinations of hierarchies percolate and
gradually "freeze" into a near-optimal configuration. Doing this
incrementally for an animation is a further twist for which I've seen no
analogies in the simulated annealing literature.
If you've never heard of simulated annealing and you're interested in
reading about it, there's a very short description in "Numerical
Recipes." The best paper that I've found, though, is "Optimization by
Simulated Annealing," by S. Kirkpatrick, C. D. Gelatt, Jr., and M. P.
Vecchi, in the May 13, 1983 issue of Science.
Does this sound at all interesting to anybody? Is anyone else thinking
along these or similar lines?
-- Jim Arvo