_ __ ______ _ __
' ) ) / ' ) )
/' __. __ , / __ __. _. o ____ _, / / _ , , , _
/ \_(_/_/ (_/_ (_/ / (_(_/_(__<_/ / <_(_)_ / (_ Snell/Descartes
covers this...

Phil Dutre writes:
There are of course Durer's famous woodcuts (see an example on the cover of
the Global Illumination Compendium - shameless plug :) )
Also, Turner (not Whitted, but the British landscape artist) studied the
subject in the early 19th century. That is of course not the earliest work,
but it was the first 'course' on perspective and accurate drawing at the
Royal Society.
Also very interesting to read is David Hockney's "Secret Knowledge", where he
explains how Renaissance artists used optics to paint and achieve accurate
geometry (basically using a camera lucida).
I guess you can make a point that if some caveman was tracing a silhouette
cast by a campfire he was 'tracing rays' ;)
[Interestingly, in Greek mythology the origin of art is attributed to someone
(I forget who) tracing a silhouette on a wall. I forget the details, but
believe it's in "A Short History of the Shadow" by Victor Stoichita.  EAH]

Finally, here's a document entitled "The Origins of Raytracing?" by Jonathan
Maxwell, dated January 12, 2003, used with permission:
The question of who was the first raytracer is a fascinating one.
The Conventional History

I think as far as tracing rays in optical instruments is concerned, and
taking the law of refraction as opposed to the law of reflection as being the
key to it, the readily available standard literature regards Willebrord
Snellius (1591-1626) as its inventor in Holland in 1621. Having said this, I
accept that discovering the law of refraction is not necessarily the same as
tracing rays. Snell's law was not well known until Rene Descartes (1596-1650)
published it in 1638, and even then he did not make it clear that he was
following Snell, so for some time Descartes was regarded as the originator of
Snell's law.
As far as the explicit tracing of rays is concerned, it looks as though
William Gascoine (1612-1644) in England was doing this in the 1620's or
1630's, but again this was not widely known until some time later (1692) when
William Molyneux, the author of _Dioptrica Nova_, attributed his first clue
of how rigorous raytracing might be done to hearing of it from the Astronomer
Royal, John Flamsteed, who in turn attributed his knowledge of it to papers
by Gascoine in his possession.
Going Back Further

However, the history of raytracing, especially with regard to refraction and
also its more generalised application to image synthesis, and the creation of
shadows (Sciography) and the like, goes back a lot further; to the Greek
philosopher Thales of Miletus (circa 585 BC). Thales is thought to have
studied the geometry the Egyptians used and thereby came to understand what
we now call trigonometry. He not only used this information for astronomy,
but he also propounded a law of refraction, which turned out to be wrong.
Moving forward 1500 years, the Arab scientist Alhazan (Abu ibn al-Hasan ibn
al-Haitham, 965-1039 AD approx) wrote an important treatise on many aspects
of optics in 7 books, using the concept of rays with arrows on them to explain
geometrical optics. These books contained sections on aberrations, and even
diffraction, and were published in manuscript form in Arabic around about AD
984, and in Latin in 1269 (preceded by publication of his treatise on
rainbows in 1170). Finally they were published in print in Latin in 1572. I
do not know if Alhazan knew Snell's law; it wouldn't surprise me if he did,
and if he did, then it's Alhazan's law, rather than Snell's law.
References

The historical details of Snell's law and Gascoine's work can be gleaned from
books such as H.C. King's _History of the Telescope_ (Griffin, London, 1955)
and Isaac Asimov's _Eyes on the Universe_ (Imprint Magazine, Bombay 1977, but
published before that in the UK, I believe).
For the information on Alhazan I am indebted to a close friend, Tareq Al-Baho,
and also _The Oxford History of Science and Technology_.
Robert Temple's _The Crystal Sun_ (Century/Random House, London, 2000)
provided the information on Thales, and also further information on Alhazan.

Faster Yet Point in Polygon Test? by Stephen Bridgett (s.bridgett@qub.ac.uk)
[The CrossingsMultiplyTest() in the online code for Graphics Gems IV, at
http://www1.acm.org/pubs/tog/GraphicsGems/gemsiv/ptpoly_haines/, is the
fastest general test (without precomputes) for point in polygon testing. It's
a bit more efficient than the traditional test found in the
comp.graphics.algorithms FAQ
(http://www.faqs.org/faqs/graphics/algorithms-faq/), as it avoids division.
The idea, from Mark Haigh-Hutchinson (now at
Retro Studios), is that you can turn the division into a multiply by keeping
track of which direction your test edge is going.
I can't say if the following will help, but knocking a little off this key
algorithm can have a noticeable performance effect so it's worth a try if
you're doing a lot of ray/triangle testing.  EAH]
I'm writing to say a big thank-you for your article in Graphics Gems IV on
point in polygon testing. I'm a student in N. Ireland, and have been writing
a small program in my spare time using Borland Delphi, to display models of a
few engineering components, and had been looking for an efficient but easily
implemented point in polygon test for concave polygons, and the
'CrossingsMultiplyTest()' was just what I needed. So thank you very much.
When converting it to Pascal, I noticed a possible slight improvement, which
you've probably thought of already anyway (I hadn't - EAH). I'd read that on
newer processors, conditional jumps could cause a slight cache delay. So, I
was thinking that the 'if' after the multiply test could be replaced with an
'xor' test, as it has the same truth table, i.e.
Original 'if':
==============
ytest:= (m[p1].z-tz)*(m[p0].y-m[p1].y)
        >= (m[p1].y-ty)*(m[p0].z-m[p1].z);
if ytest = yflag1 then Result:= not Result;
(The original C version was:)
if ( ((vtx1[Y]-ty) * (vtx0[X]-vtx1[X]) >=
      (vtx1[X]-tx) * (vtx0[Y]-vtx1[Y])) == yflag1 ) {
    inside_flag = !inside_flag ;
}
Using 'xor' test instead:
=========================
ytest:= (m[p1].z-tz)*(m[p0].y-m[p1].y)
        >= (m[p1].y-ty)*(m[p0].z-m[p1].z);
Result:= (ytest=yflag1) xor Result;
I haven't tested it, and in practice it may not make much actual difference
to the processing time, or may even slow it down, but it's a thought anyway.
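For the curious, the combined trick is easy to try out in another language. Below is a sketch of the crossings test with the multiply trick and Bridgett's xor accumulation, transliterated into Python (the function and variable names are mine, not from the Gems IV code):

```python
def crossings_multiply(tx, ty, poly):
    """Point-in-polygon via the crossings test with the multiply trick.

    poly is a list of (x, y) vertices; returns True if (tx, ty) is inside.
    Instead of 'if test == yflag1 then flip inside', the result is
    accumulated with xor, as suggested above - same truth table, no branch.
    """
    inside = False
    x0, y0 = poly[-1]            # start with the closing edge
    yflag0 = y0 >= ty
    for x1, y1 in poly:
        yflag1 = y1 >= ty
        if yflag0 != yflag1:     # edge straddles the test point's Y level
            # multiply form of the crossing test (avoids a division),
            # xor'ed into the result instead of conditionally flipped
            inside ^= (((y1 - ty) * (x0 - x1) >=
                        (x1 - tx) * (y0 - y1)) == yflag1)
        x0, y0, yflag0 = x1, y1, yflag1
    return inside
```

For a 2x2 square with vertices (0,0), (2,0), (2,2), (0,2), the point (1,1) tests inside and (3,1) outside; concave polygons work as well, since this is still just a crossings count.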

BSP Plane Cost Function Revisited, by Eric Haines
We all know a BSP tree (or octree, or kd tree) helps in limiting the number
of ray/object intersections needed. But how much? MacDonald and Booth
[MacDonald89] already analyzed this problem fairly well, though due to space
limitations they did not include analysis of what happens when a splitting
plane hits one or more objects in a node. Also, their work, which has some
fascinating and useful results, appears to be less known than it should; this
article covers some of their contributions and goes a bit further.
MacDonald and Booth use the key fact that a child of a node has a chance of
being hit proportional to its surface area, one that I will also use
extensively (see RTNv10n1
http://turing.acm.org/pubs/tog/resources/RTNews/html/rtnv10n1.html#art4). As
a shorthand term, I'll call this the "area expectancy algorithm". MacDonald
calls it the "surface area heuristic", but at the time he probably didn't
realize there was a solid proof of this result in geometric probability, so
"heuristic" is a misnomer. For a formal justification, Fredo Durand
recommends _Integral Geometry and Geometric Probability_ by Luis Santalo,
Addison-Wesley, 1976.
What good does splitting a BSP node by a plane do, as far as ray tracing
goes? The obvious thing it does is (usually) cut down on the number of
objects to be tested against the ray. If a ray hits the node, the splitting
plane is compared against the ray. The ray can do one of two types of things:
it can progress through just one side of the splitting plane (i.e. one child
of the node), or it can progress through one side and then the other (both
children). If it progresses through just one side, you then need to test only
those primitives on that side. If it goes through both sides, the potential
savings are smaller (or nil): you have some chance of hitting a primitive on
the first side traversed; if you hit (and the intersection point really is in
the first part of the node), then no primitives on the other side need to be
intersected. This can be useful for eye/reflection/refraction rays, since you
want the closest intersection, but is usually little help in accelerating
shadow rays.
Great Expectations

The chance of hitting a bounding box is proportional to its surface area. Say
NX, NY, and NZ are the dimensions of the box. The chance of hitting a given
child node is:
node hit expectance == ENH = NX*NY + NX*NZ + NY*NZ
(Actually, there is a factor of 2 we leave out here and elsewhere, which
accounts for the area for 6 faces vs. just 3. Since we leave it out
everywhere, it is not important in this article. That said, it is important
to also then treat polygons as one-sided when computing their expectance.)
This "expectance" is not a percentage chance, but rather a surface area
measurement that can be compared with other surface areas to determine the
relative chance an object is hit. So, given a random ray, a convex object
with twice the surface of another convex object is twice as likely to be hit.
One principle of splitting a node is that you usually want to maximize the
chance that a ray will hit only one of the two children nodes. Say you split
the node on the X axis at R, where R = 0 at minimum X and R = 1 at maximum X.
The expectance of hitting each child box is then:
child A hit expectance = ((R*NX)*NY + (R*NX)*NZ + NY*NZ)
child B hit expectance = (((1-R)*NX)*NY + ((1-R)*NX)*NZ + NY*NZ)
If you divide either of these expectances by the expectance of hitting the
node itself, you get the relative chance (a value from 0% to 100%) that a
subnode will be traversed if the node is hit. The chance that one child or
the other is traversed of course varies with R; lower R, lower chance of
being hit. What's interesting is that the chance that both child nodes will
be traversed by a ray is independent of R! This chance depends on the ray
hitting the plane splitting the node into two children. The ratio of this
area (the BSP plane inside the node) vs. the surface area of the node itself
is this chance:
plane hit expectance == EPH = NY*NZ
plane hit relative chance == CPH = EPH / ENH = NY*NZ / (NX*NY + NX*NZ + NY*NZ)
(Notational note: a variable of the form "E*H" is always the area/2 of an
object, which I call its expectance, and a variable of the form "C*H" is a
relative chance of hitting something compared to hitting something else,
computed by dividing one area by another.)
As an example, say the cube 1x1x1 is split by a plane:
CPH = 1*1 / ( 1*1 + 1*1 + 1*1 ) = 1/3
So for a cube hit by a ray, an arbitrary plane splitting the cube has one
chance in three of being hit, regardless of the split location. Since this
axis-aligned splitting plane has the same area as a cube face, this also
means that a given cube face has a 1/3 chance of being hit by a ray. This is
surprising to me on an intuitive level. I would have thought that a plane
inside the middle of the cube would have a higher chance of being hit than a
cube face, but that's not the case.
To prove it to yourself, it's helpful to think about the area expectancy
principle in terms of orthographically projecting the object on to a plane
instead of shooting a ray at it. These are equivalent, in that you can think
of shooting a ray through a pixel and testing against an object as the same
as computing a pixel's chance of being covered by projecting the object into
a z-buffer. So, visualize a box and the splitting plane inside it and how
this would project orthographically on a screen. Slide the plane back and
forth along one axis of the box and it becomes clear that the screen area of
the splitting plane never changes. Therefore it's always equally likely to be
hit, regardless of its position in the cube.
The irrelevance of R on the splitting plane's chance of being hit makes it
easier to think about how best to split the node. Say the cube's dimension NX
is increased slightly, to 1.1. If your plane cuts the X axis, it now has 1
chance in 3.2 (0.3125) of getting hit by a ray hitting the box, less than a
third. If your plane cuts the Y or Z axis, it now has a 1.1 chance in 3.2
(0.34375) of getting hit, greater than a third. This analysis verifies what
we intuitively know, that splitting a box along its longest axis minimizes
the chance a ray will hit both nodes and so is more likely to improve
efficiency.
To compute a few more examples, say the box was 2x1x1: the values are then a
1/5th chance of both children being hit if a split is made on the X axis vs.
2/5ths for either other axis. For a 10x1x1 it's 1/21 along X vs. 10/21. For a
flat, squarish box (like a CD case), say 10x10x1, splitting along X or Y
gives 10/120 chance of hitting both boxes, while splitting along Z gives
100/120.
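These numbers are quick to check mechanically. Here is a small Python sketch (the helper names are mine) of the expectance and plane-hit chance computations, with the factor of 2 dropped as discussed above:

```python
def expectance(nx, ny, nz):
    # "area expectance" of an nx-by-ny-by-nz box: its surface area / 2
    return nx * ny + nx * nz + ny * nz

def plane_hit_chance(nx, ny, nz, axis):
    # relative chance CPH that a ray hitting the node also hits a splitting
    # plane perpendicular to `axis` (0 = X, 1 = Y, 2 = Z); note the answer
    # is independent of where along the axis the plane sits
    dims = [nx, ny, nz]
    other = [d for i, d in enumerate(dims) if i != axis]
    return other[0] * other[1] / expectance(nx, ny, nz)
```

This reproduces the examples above: 1/3 for any split of a cube, 1/5 vs. 2/5 for the 2x1x1 box, 1/21 vs. 10/21 for 10x1x1, and 10/120 vs. 100/120 for the 10x10x1 CD case.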
Childcare Costs

We want to learn something about the cost of traversing an unsplit node vs. a
split one. The time to test an unsplit node is something like this:
time for unsplit node = TK1 + M*TP
where TK1 is some constant time spent intersecting the node and accessing its
contents, M is the number of primitives in the node, and TP is the time per
primitive (which admittedly can vary, but let's ignore that fact for now;
we'll come back to it later). Compare this to:
time for split node = TK1 +
time to hit both children + time to hit one child
Let's subdivide this equation into the four possible cases that a ray can
have when it hits a node: hits child A then B, hits B then A, hits child A
only, hits child B only. Call the probabilities of each CabH, CbaH, CaH, CbH.
These must sum to 1. We also know that CabH+CbaH = CPH, the probability of
hitting the splitting plane. It turns out that CabH==CbaH, i.e. that the
chance of a ray going from child A to B is the same as the other direction,
regardless of the location of the splitting plane. This is clear if you think
about it, since a random ray is as likely to hit the splitting plane coming
from one side as the other. So:
CabH == CbaH == CPH/2.
We can compute CabH and CbaH given CPH. What we want is the chance that one
and only one child is hit. Basically, we want all random rays that pass
through a child, minus the rays that hit the BSP plane face. For example, if
R is 0, then child A has no thickness and so all rays that hit it will also
hit the plane making up its face, so its chance of being the only child hit
will be 0. As R increases, rays can enter the sides of the child and possibly
not hit the BSP plane. When R is 1, then child A is equivalent to the
original node. We know that there is a percentage chance CPH of hitting the
BSP plane, so for this case we know that (1-CPH) is the chance of hitting
child A but not the plane (i.e. not hitting both children).
At R=0 the relative area of child A is the same as that of the BSP plane, and
the chance of hitting only child A was zero, i.e. (EPH-EPH)/ENH=0. At R=1 the
relative area of child A is ENH, and the chance of hitting only child A was
(ENH-EPH)/ENH=(1-CPH). We also know that at R=0.5 the chance for child A and
child B is the same, (1-CPH)/2. The three values we know are on a line, and
the function is linear with R. Putting these facts together:
child A ONLY chance == CaH = (R*NX*NY + R*NX*NZ + NY*NZ) / ENH - CPH
child B ONLY chance == CbH = ((1-R)*NX*NY + (1-R)*NX*NZ + NY*NZ) / ENH - CPH
As a reality check, we'll test a 1x1x1 node. When R is 0, there is no chance
of hitting only child A and a 2/3rd chance of hitting only child B, leaving
the CPH chance of hitting the plane of 1/3rd. When R is 0.5, each child has a
1/3rd chance of being the only one hit.
These two functions simplify down to:
CaH = R*NX*( NY + NZ ) / ENH
CbH = (1-R)*NX*( NY + NZ ) / ENH
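As a sanity check, the three outcomes (only child A, only child B, or both children traversed) should always sum to 1. A quick Python sketch of these formulas (names mine):

```python
def split_chances(nx, ny, nz, r):
    # for a ray that hits the node, return the chances of traversing
    # (only child A, only child B, both children), for a split
    # perpendicular to X at fraction r of the way along NX
    enh = nx * ny + nx * nz + ny * nz      # node expectance (area / 2)
    cph = ny * nz / enh                    # both children (plane hit)
    cah = r * nx * (ny + nz) / enh         # only child A
    cbh = (1 - r) * nx * (ny + nz) / enh   # only child B
    return cah, cbh, cph
```

For the 1x1x1 node this gives (0, 2/3, 1/3) at R=0 and (1/3, 1/3, 1/3) at R=0.5, matching the reality check above; the three values sum to 1 for any box shape.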
With all this in mind, we can compute the average cost in time of a ray
traversing a node with two children leaf nodes:
node traversal cost = TK1 + TK2 + (1+CPH) * TK3 +
CabH * (Pa*TP + Ma*Pb*TP) +
CbaH * (Pb*TP + Mb*Pa*TP) +
CaH * (Pa*TP) +
CbH * (Pb*TP)
= TK1 + TK2 + (1+CPH) * TK3 +
TP * ( CabH * (Pa + Pb + Ma*Pb + Mb*Pa) +
CaH*Pa + CbH*Pb )
= TK1 + TK2 + (1+CPH) * TK3 +
TP * ( (CPH/2) * ( (Mb+1)*Pa + (Ma+1)*Pb ) +
CaH*Pa + CbH*Pb )
That's a lot of stuff; the last four terms are the four possible intersection
cases, A then B, B then A, A only, B only. TK2 is the cost of testing the ray
segment against the splitting plane and other fixed child access costs. TK3
is the cost of accessing a particular child node, multiplied by (1+CPH) since
one or two children's lists could be accessed. Pa is the number of primitives
inside only child A's box (Pb for child B). Ma is the chance of the ray
missing all the primitives in child A and so having to access child B. For
example, if there wasn't anything in child A, Ma would be 1.0, as the ray
will always reach child B if it were going that way.
Let's give this function a try. Say the TK values are pretty small overall,
maybe 0.2 of the time it costs to intersect a primitive, TP. Also assume that
a ray going from one child to the other has, say, a 20 percent chance of
hitting a primitive in the first child and so not needing to test the second,
so Ma = Mb = 0.8. Also say that the node is a 1x1x1 cube (though overall
scale does not matter), so CPH = 1/3. The formula for these starting
conditions is:
node traversal cost = 0.2*TP + 0.2*TP + (1+1/3) * 0.2*TP +
TP * ( (1/6) * ((0.8+1)*Pa + (0.8+1)*Pb) +
CaH*Pa + CbH*Pb )
= 0.666*TP +
TP * ( 0.3 * (Pa+Pb) +
CaH*Pa + CbH*Pb )
= 0.666*TP +
TP * ( (CaH + 0.3)*Pa +
(CbH + 0.3)*Pb )
= 0.666*TP +
TP * ( (R*NX*( NY + NZ )/ENH + 0.3)*Pa +
((1-R)*NX*( NY + NZ )/ENH + 0.3)*Pb )
= 0.666*TP +
TP * ( ((2/3)*R + 0.3)*Pa +
((2/3)*(1-R) + 0.3)*Pb )
So there's some fixed cost for traversing the BSP tree, in this case (2/3)*TP
(or whatever, depending how you set the TK values), and the main cost is
based on the number of primitives in each child times the chance of hitting
only that child, which depends on R, the relative location of the splitting
plane along the node's axis.
Let's say we put the splitting plane at R=0.5, the middle of the box, and
that there were 12 primitives and they nicely divide evenly into 6 in each
child (i.e. none are split by the BSP plane), so Pa = Pb = 6. We get:
example cost = 0.666*TP + TP * ( (1/3 + 0.3)*6 + (1/3 + 0.3)*6 )
= 0.666*TP + 7.6*TP
= 8.266*TP
Now, if you had not split the node and intersected all the primitives in it,
you'd have intersected 12 objects. So, by subdividing the node into two
children, you expect to spend the time equivalent to 8.266 primitives (and on
average you'll actually test the ray for intersection against 7.6 primitives,
plus traversal overhead). The savings, in this case, are from the fact that
there are two bounding boxes, and because there's a small chance that the ray
stops in the first child due to intersecting something. So, there's a savings
in subdividing a node by a BSP tree, but it doesn't cut the time in half,
since a ray potentially can hit both nodes.
If you go through this same analysis and assume the worst case, that the ray
will always miss everything in the first child tested (i.e. Ma = Mb = 1.0),
you'll find that the value 0.3 goes to 1/3 and the final answer is "0.666*TP
+ 8*TP". If you assume the unlikely event that a ray entering the first
child never escapes to enter the second (i.e. Ma = Mb = 0.0), the value 0.3
goes to 1/6 and the cost that comes out is "0.666*TP + 6*TP", which is
exactly what you
would expect, that any ray would test against either the 6 primitives in one
child or the other, since the ray is (somehow) guaranteed to not continue and
pass between the two children.
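All three cases drop out of a direct evaluation of the cost formula. A Python sketch (parameter names follow the text; the TK constants default to the 0.2*TP used above):

```python
def node_traversal_cost(nx, ny, nz, r, pa, pb, ma, mb,
                        tp=1.0, tk1=0.2, tk2=0.2, tk3=0.2):
    # average cost (in units of TP) for a ray traversing a node of
    # dimensions nx * ny * nz, split perpendicular to X at fraction r,
    # with pa/pb primitives per child and ma/mb the chances of missing
    # everything in the first child traversed
    enh = nx * ny + nx * nz + ny * nz
    cph = ny * nz / enh
    cah = r * nx * (ny + nz) / enh          # only-child-A chance
    cbh = (1 - r) * nx * (ny + nz) / enh    # only-child-B chance
    fixed = (tk1 + tk2 + (1 + cph) * tk3) * tp
    both = (cph / 2) * ((mb + 1) * pa + (ma + 1) * pb)  # A-then-B + B-then-A
    return fixed + tp * (both + cah * pa + cbh * pb)
```

With the example settings (1x1x1 node, R=0.5, Pa=Pb=6, Ma=Mb=0.8) this returns about 8.27; Ma=Mb=1.0 gives 0.666 + 8 and Ma=Mb=0.0 gives 0.666 + 6, matching the cases above.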
One minor idea (that I won't really pursue here) is to compute these
occlusion terms based on the areas of the objects inside the node. For
example, if we knew each of the six objects in child A has an expectance
(surface area) of EOH, and child A's area expectance is EaH, then
((EaH-EOH)/EaH)^6 is a rough estimate of the chance a ray will miss all of
these objects. For example, say a 3x3x3 node had six 1x1x2 object boxes fully
inside it somewhere. EaH is 27, EOH is 5 for each box, and the chance of
missing them all is (22/27)^6 = 0.293, so the occlusion term Ma = 0.293. This
assumes the objects in the node are randomly distributed, etc. So it's
possible to get a handle on the Ma and Mb occlusion chance terms, but it's
work.
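For what it's worth, this estimate is one line of code. A sketch (names mine):

```python
def miss_all_chance(node_exp, obj_exp, count):
    # rough chance that a ray through the node misses all `count` objects of
    # expectance obj_exp each, assuming convex objects randomly and
    # independently placed fully inside a node of expectance node_exp
    return ((node_exp - obj_exp) / node_exp) ** count
```

miss_all_chance(27, 5, 6) reproduces the (22/27)^6 = 0.293 of the 3x3x3 example.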
The occlusion term does matter: in our example, varying it changed the
expected number of objects tested per ray from 6 to 7.6. However, this term is
meaningless for shadow testing because the shadow ray stops being tested
whenever any object is found that blocks it. Testing the contents of child A
before child B (or vice versa) does not matter. All that's important is that
the BSP tree minimize the number of objects found near the ray - for this
purpose, BSP planes that are more parallel to the ray vs. chopping it are
much more important (the only exceptions are BSP planes that eliminate
testing objects beyond the endpoints of the test shadow ray line segment).
Our node traversal equation for shadow testing becomes considerably
different, and I won't try to write it out here. It basically depends more on
the chance of hitting each primitive in the node, since a hit ends the
process.
Analyze This

Is this derived formula reasonable? Let's do an analysis of what happens when
we chop our usual 1x1x1 cube of 1000 uniformly distributed points of stuff
by X, then by Y, then by Z, i.e. chopping it into 8 subnodes total. How much
does this save overall?
In the first chop each child has 500 objects. We know that there's a 1/3
chance of hitting both nodes, and of hitting only one of the two nodes. So
there are 333 objects always tested, and 333 more objects that are tested
once (in either one child or the other), for 666 total.
Now we divide each 0.5x1x1 (aka 1x2x2) child into two 1x1x2 subchildren. The
average chance of hitting both of these subchildren, for a ray hitting the
child, is 0.25, with an 0.375 chance of hitting only one child. For 1000
objects this would give 250+375=625 ray/object tests, on average. Splitting
these 1x1x2 subchildren down to 1x1x1, the average chance of hitting both is
0.2, with 0.4 for hitting only one. 200+400 is 600 ray/object tests average
when one of these is split and it contains 1000 objects. Note how the
performance improves at each level, because a split does more good the
proportionally less surface area the splitting plane has (and so the less
chance that both children are traversed). The more non-cubic a node is, the
better the improvement.
Go down the tree: one way to think about things is that there's an average
testing cost of 666 at the first level. Let's instead think of the number of
objects tested as being reduced to 0.666 of the total amount. The next level
of the tree causes the number of objects to go down to 0.625 of its current
level, and the next after that causes a drop by 0.600. Multiply these
together: 0.666*0.625*0.600 = 0.250. So we predict that only 250 "point"
objects will be tested against a ray when using a BSP tree of three levels.
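The per-level factors used here are just (1+CPH)/2 for an even split, so the prediction can be multiplied out exactly. A Python sketch (names mine; the child shapes are given up to scale, as in the text):

```python
def split_factor(nx, ny, nz, axis):
    # fraction of object tests remaining, on average, after splitting the
    # node in half perpendicular to `axis`: (1 + CPH) / 2
    dims = [nx, ny, nz]
    other = [d for i, d in enumerate(dims) if i != axis]
    cph = other[0] * other[1] / (nx * ny + nx * nz + ny * nz)
    return (1 + cph) / 2

# cube split by X, its 1x2x2-shaped children split along one long axis,
# then the resulting 1x1x2 subchildren split along their long axis
factors = [split_factor(1, 1, 1, 0),
           split_factor(1, 2, 2, 1),
           split_factor(1, 1, 2, 2)]
product = factors[0] * factors[1] * factors[2]
```

The factors come out 2/3, 5/8, and 3/5, and the product is exactly 1/4 - the 250 "point" objects predicted above.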
How does this analysis compare with just looking at the subsubchildren and
their average chance of being hit? At the bottom level each child has an
average of 125 real objects. Based on surface area, each of the eight
subchildren turns out to have one chance in four of a random ray hitting
them, so on average two subchildren are hit. So, three subdivisions gives
about a 4x speedup, with 250 objects expected to be tested. These predictions
match. That said, I believe neither is a perfect analysis, as both ignore the
geometric realities of the situation. If a ray hits both children, it cannot
also hit both subchildren of these children, it can hit only one additional
subchild. In other words, you can draw a straight line through the interiors
of only three, not four, of the squares in a 2x2 grid. The next level down
again adds only one subchild, i.e. a ray going through a 2x2x2 cube can hit
up to a maximum of 4 cells, not all 8. To be honest, I do not know how or
whether this matters to the analysis, but at least the exercise shows that
two independent analyses match. If anything, because there is an upper limit
on how many cells can be visited, this may lower the expected number of tests
further. I think this is a place where MacDonald's analysis may be faulty,
but am not really sure.
1000 Points of Stuff

Where things get interesting is in analyzing how this function responds to
various changes and conditions. One simple scene to analyze is the "1000
points of stuff" dataset. Imagine your 1x1x1 cube has 1000 objects in it,
each dimensionless (i.e. each just a point), uniformly distributed. The Ma
and Mb values, i.e. the chance of the first child traversed not stopping the
ray, are also 1. Furthermore, let's also set the time for testing a primitive
TP = 1, and ignore the fixed costs. This then gives us a function that shows
how many primitives have to be tested given various settings of the
dimensions, of R, and of the way objects in the scene are distributed. This
leaves the simple function:
simple NTC = CPH*(Pa+Pb) + CaH*Pa + CbH*Pb
This function is a fairly reasonable approximation of how the BSP tree would be
used in general. There's a little loss in efficiency from not caring that a
ray could be stopped in one child before reaching the other. There's also a
little overhead being ignored for traversing the BSP tree, so this sort of
balances out. By using this simpler function, we don't have to guess at what
the chance is that a ray will be blocked by a child, and we can set up
idealized situations and get "perfect" predictive results.
Expanding and simplifying this function out a bit:
simple NTC = (Pa+Pb) * CPH +
Pa * R*(1-CPH) +
Pb * (1-R)*(1-CPH)
= Pa * CaH(R) + Pb * CbH(R)
Here CaH(R) is the chance that child A is hit at all by a ray that hits the
node (i.e. its area divided by the area of the node; note this "hit at all"
chance differs from the hit-only-child-A chance CaH used earlier), and CbH(R)
is the corresponding chance for child B. This is, in fact, the formula that
MacDonald uses in his first paper [MacDonald89].
MacDonald gives two interesting related results. He evaluates the cost of the
spatial median strategy and the object median strategy. In the spatial median
strategy, R is always 0.5, i.e. each node is always split in half. In the
object median strategy, the objects are sorted along the axis and the plane
inserted at the median of this object list. For example, if there are 1000
points of stuff, the plane is inserted halfway between point 500 and 501 on
the sorted list; the corresponding R might be close to 0.5, but might also be
very far from it. MacDonald's result: these two strategies will always yield
the same cost. Surprising! Here's why.
For the spatial median strategy, CaH(0.5) = CbH(0.5), since the nodes are the
same size. The number of objects in the scene is N = Pa+Pb. Also,
CaH(R)+CbH(R)=1+CPH. So the cost function is:
SpatialNTC(0.5) = Pa * CaH(0.5) + Pb * CbH(0.5)
= Pa * CaH(0.5) + Pb * CaH(0.5)
= (Pa+Pb) * CaH(0.5)
= N * CaH(0.5)
= (N/2) * (CaH(0.5)+CaH(0.5))
= (N/2) * (CaH(0.5)+CbH(0.5))
= (N/2) * (1+CPH)
In other words, as we've seen before, the number of primitives intersected is
halved by splitting the node, but then CPH raises this value due to rays that
hit both nodes. Note how this result is independent of how many primitives
are actually in either node.
For the object median strategy, Pa = Pb, and so Pa+Pa=N, so the cost function
is:
ObjectNTC(R) = Pa * CaH(R) + Pb * CbH(R)
= Pa * CaH(R) + Pa * CbH(R)
= Pa * (CaH(R) + CbH(R))
= Pa * (1+CPH)
= ((Pa+Pa)/2) * (1+CPH)
= (N/2) * (1+CPH)
Again, notice how this cost function has no reference to the distribution of
the objects themselves, it is purely a function of the number of objects to
be intersected and the relative cross-sectional area of the node. MacDonald
notes that this equivalence is useful, in that the optimal splitting plane
must lie at or between these two R values (0.5 and whatever the object median
R is). This bears repeating and stressing: splitting the node in half (R=0.5)
gives the same average number of intersection tests as splitting the node at
the median of the sorted list of non-overlapping objects (R is computed by
the algorithm). The optimum R value lies between 0.5 and the median R value,
and this optimum R value can actually give a cost value better than these two
strategies.
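The equivalence is easy to see numerically. A Python sketch for a 1x1x1 node (names mine; points have no volume, so nothing is pierced): however lopsided the distribution, the object-median and spatial-median costs agree.

```python
def cah_all(r):    # chance child A is hit at all (1x1x1 node, X split at r)
    return (2 * r + 1) / 3

def cbh_all(r):    # chance child B is hit at all
    return (2 * (1 - r) + 1) / 3

def simple_ntc(r, xs):
    # expected ray/object tests for point objects at positions xs
    pa = sum(1 for x in xs if x <= r)
    return pa * cah_all(r) + (len(xs) - pa) * cbh_all(r)

# 1000 points crammed into [0, 0.2] - a very lopsided distribution
xs = sorted(i * 0.2 / 1000 for i in range(1000))
object_median_r = (xs[499] + xs[500]) / 2   # 500 points on each side
spatial_cost = simple_ntc(0.5, xs)
object_cost = simple_ntc(object_median_r, xs)
```

Both come out to 666.66, i.e. (N/2)*(1+CPH), even though the object median puts the plane near R=0.1.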
Unfortunately, this rule falls apart as soon as objects in the node are
pierced by the splitting plane, i.e. as soon as the objects have volume to
them and can get split. To skip ahead for a minute, imagine one object
spanning the interval [0,0.6] and another from [0.4,0.8]. The spatial median
is R=0.5, of course, and the object median is R=(0.3+0.6)/2=0.45, but both of
these planes pierce both objects (a pierced object is tested by every ray
that hits the node) and so have a cost value of 2. At R=0.4 and R=0.6,
neither of which is in the interval [0.45,0.5], the cost value goes down to
1.733. So, splitting objects throws off this concept.
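Here is a quick check of those numbers, counting a pierced object in both children, so every ray that hits the node tests it (the setup is for the 1x1x1 node; helper names are mine):

```python
def pierced_cost(r, intervals):
    # expected ray/object tests when objects span X-intervals of a 1x1x1
    # node split at r; a pierced object is tested by every ray in the node
    cost = 0.0
    for lo, hi in intervals:
        if lo < r < hi:
            cost += 1.0                    # pierced: always tested
        elif hi <= r:
            cost += (2 * r + 1) / 3        # entirely in child A
        else:
            cost += (2 * (1 - r) + 1) / 3  # entirely in child B
    return cost

objs = [(0.0, 0.6), (0.4, 0.8)]
```

This gives a cost of 2 for any R that pierces both objects (e.g. R=0.45), and 1.733 at R=0.4 or R=0.6, as in the text.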
As we saw with 12 items, splitting the 1x1x1 cube meant that 1/3rd of the
rays hit both children and 2/3rds hit just one. With 500 points in each child
when R=0.5, this gives an average of 666.66 objects that will be tested by a
random ray. With R=0 or 1, the function properly predicts that 1000 objects
will have to be tested (since no subdivision is actually taking place). The
graph as R goes from 0 to 1 is a parabola centered at R=0.5, pointing
upwards.
What if the distribution of points in the node is changed? Say we make it so
that the 1000 points are all in the area [0.0,0.2]*NX of the node, with the
rest of the [0.2,1.0]*NX volume empty. Do you have any guesses what the best
value for R will be? With the object median strategy, splitting the number of
objects so that half are in each child, you have R=0.1. You could also simply
split the node in half. Another strategy says that there's a large volume of
empty space, so the plane should be put right at R=0.2 so that one child is
entirely empty and rays hitting only it can quickly be discarded. The object
and spatial median strategies must give the same average, as we have seen,
and for 1000 objects this value is always 666.66, no matter what the
distribution - only the object median's R value varies. So we know the
optimum value is between R=0.1 and 0.5. You could find this minimum value of
R by expressing Pa and Pb as functions of R, substituting them in,
differentiating this new function, and looking for where the slope is 0 at
the bottom of a trough. Or you could be lazy like I am and write a Perl
program to brute force its way through values of R and find the best one. It
turns out the third, "bound the primitives" strategy is the best in this
case: R=0.2 gives a minimum of an average of 466.66 objects that need to be
intersected.
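The brute-force scan is only a few lines; here is the same idea in Python rather than Perl (names mine; the distribution is the idealized uniform fill of [0, 0.2] described above):

```python
def ntc_uniform(r, fill, n=1000):
    # expected tests for a 1x1x1 node with n point objects uniformly
    # filling [0, fill] along X, split at r
    pa = n * min(r / fill, 1.0)            # objects landing in child A
    pb = n - pa
    return pa * (2 * r + 1) / 3 + pb * (2 * (1 - r) + 1) / 3

# scan candidate split points and keep the cheapest
best_r = min((i / 1000 for i in range(1001)),
             key=lambda r: ntc_uniform(r, 0.2))
best_cost = ntc_uniform(best_r, 0.2)
```

The scan lands on R=0.2 with a cost of 466.66, the "bound the primitives" answer.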
Say we do the same experiment, but with the interval [0.0,0.5]*NX filled with
our 1000 objects, i.e. half the node is empty. The object median strategy
predicts R=0.25, and the "bound the primitives" method says R=0.5 might be
good, since it chops off the empty half of space. However, we know that the
spatial median strategy picks R=0.5 and will give the same cost as the object
median strategy, so the optimal answer lies between these two. Running the
analysis, the answer turns out to be that at R=0.375 the average is 625
objects tested against the ray. The split at 0.375 makes child A contain 750
objects and child B contain 250. So there's some point at which it's
worthwhile to include some objects in both children, but beyond that point
you should always make the splitting plane such that the one child is all
empty space.
For the 1x1x1 cube, this point turns out to be at R=0.333 (and at R=0.667, of
course), a third. What this means is that if your objects fill up a third or
less of the left or right end of a cubic node, then you want to put the
splitting plane so as to contain them all in one child. If the objects in the
node extend out further than this one third, you'll need to do some work to
find the best splitting plane.
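The Perl brute-forcer isn't shown, but the search is easy to reproduce. Below is a minimal Python sketch (my own reconstruction, not the original program); the hit probabilities CaH(R)=(1+2R)/3 and CbH(R)=(3-2R)/3 for a unit cube come from the children's surface areas:

```python
def cost(R, d, n=1000):
    # Unit cube split at x=R; child hit probabilities come from the
    # children's surface areas: CaH(R) = (1+2R)/3, CbH(R) = (3-2R)/3.
    cah = (1 + 2 * R) / 3
    cbh = (3 - 2 * R) / 3
    if R >= d:               # plane is past the data, child B is empty
        return n * cah
    in_a = n * R / d         # uniform density on [0,d] along X
    return in_a * cah + (n - in_a) * cbh

def best_split(d, steps=10000):
    # brute-force R over a fine grid, as the Perl program presumably did
    return min((i / steps for i in range(1, steps)), key=lambda R: cost(R, d))
```

For objects filling [0,0.2] the search lands on R=0.2 (466.66 average tests); for [0,0.5] it lands on R=0.375 (625 tests); and for [0,0.3], just under the one-third point, it snaps to R=0.3, as the rule above predicts.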
Now here is the surprising part: this is true for a node of any dimensions.
The node could be 100x20x3, and it will still be true that if all the objects
in that node are between say [0,0.2], you'll want to put R=0.2, and this is
true up to [0,0.333] for a node of any dimensions along any axis.
Let's look at the cost function if we say that there is a uniform density
along the interval [0,Rd], with [Rd,1] devoid of objects. The cost function
is then (ignoring the scaling factor of the number of objects):
F(R,Rd) = [ R*CaH(R) + (Rd-R)*CbH(R)   when R <= Rd
          [ Rd*CaH(R)                  when R > Rd
The second part of this function, Rd*CaH(R), is obviously minimized when
R=Rd; a higher R just increases the surface area of the child node A. This
represents putting the splitting plane to enclose the entire data set in one
child.
The first part of this function has a derivative with respect to R that can
be used to find the minimum point of the function. This is located where:
0 = 4R - Rd - 1
So:
R = ( Rd + 1 ) / 4
You can set Rd=R and solve: indeed, when Rd is 1/3rd, R is also 1/3rd. As Rd
increases, R then increases slower (1/4th as fast), so the ideal split point
moves from this 1/3rd point to 0.5 as Rd moves from 1/3rd to 1.0. So the
1/3rd answer is verified by the derivative equation, but I can't say I have a
good sense of why the value 1/3rd is significant.
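The derivative result can be double-checked numerically. Here is a quick grid search under the same unit-cube assumptions; the overall scale factor of objects-per-unit-length is dropped, as in the text, which doesn't move the minimum:

```python
def F(R, Rd):
    # cost for a uniform density on [0,Rd] of a unit cube, using the
    # first branch of the piecewise function (valid for R <= Rd)
    return R * (1 + 2 * R) / 3 + (Rd - R) * (3 - 2 * R) / 3

for Rd in (1 / 3, 0.5, 0.8, 1.0):
    grid = [i * Rd / 100000 for i in range(1, 100000)]
    R_best = min(grid, key=lambda R: F(R, Rd))
    assert abs(R_best - (Rd + 1) / 4) < 1e-3    # minimum sits at (Rd+1)/4
```

Scaling F(0.375, 0.5) back up by 1000 objects over a 0.5 interval also reproduces the 625 figure from the half-filled node above.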
Here's another example to ponder. Let's say we put 400 points in the interval
[0,0.4] and 600 points in the interval [0.6,1] (so there's nothing in
[0.4,0.6]). So, the density of points is 1.5 times as high in the second
interval vs. the first. Here's an illustration, showing the node, the
relative densities, and the two places to split:
+-------------------------------+
| #  #  #  #  |     | # # # # # |
| #  #  #  #  |     | # # # # # |
| #  #  #  #  |     | # # # # # |
| #  #  #  #  |     | # # # # # |
| #  #  #  #  |     | # # # # # |
| #  #  #  #  |     | # # # # # |
+-------------------------------+--> X axis
R=0.0        0.4   0.6       1.0
Where is the best place to split this set of points? It will probably be in
[0.4,0.6], the empty zone, but where? It turns out that R=0.6 is the best
place. The two choices of splitting planes are symmetrical, so the rule
appears to be to put the plane closer to the volume with the higher density
of objects in it -- this makes a certain sense. We could have guessed this
from the rule that the optimum is between 0.5 and the object median location
(which is at 0.666).
By the way, if the densities were the same, 500 points in each of the two
intervals, any plane in the interval [0.4,0.6] has the same minimal cost
function, so any would be an acceptable split. Note that there's no
preference for R=0.5 by the formula, i.e. the centerpoint is not superior to
where either set of data begins.
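This example is easy to brute-force as well. Here's a sketch with the same unit-cube hit probabilities (the points are dimensionless, so nothing is ever cut by the plane):

```python
def n_left(R):
    # 400 points uniform on [0,0.4], 600 points uniform on [0.6,1.0]
    return 400 * min(R, 0.4) / 0.4 + 600 * max(0.0, R - 0.6) / 0.4

def cost(R):
    nl = n_left(R)            # points in child A; the rest go to child B
    return nl * (1 + 2 * R) / 3 + (1000 - nl) * (3 - 2 * R) / 3

best = min((i / 1000 for i in range(1, 1000)), key=cost)
```

The search lands on R=0.6, with about 653 expected tests; with 500/500 loads the cost is flat across all of [0.4,0.6], matching the equal-density remark above.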
Stuck in the Middle with You
----------------------------
Up to this point our primitives have all been dimensionless points. The next
thing to do is to start looking at the effect of the BSP plane splitting
objects. This is where inefficiency comes into the BSP scheme. For example,
if your splitting plane intersects all 12 objects in the node, then both
children have 12 objects in them, and no efficiency is gained (in fact, it's
lost, since a ray has to waste time accessing the useless, no good, deadbeat
children). The function above does not show this fact.
Assume mailboxing gets used in the code, so that any primitives in both
children get tested only once by a particular ray. Also assume that the cost
of checking the mailbox is negligible (we could add it in, but it just
confuses things further). This is the formula with it all (well, almost --
it's also possible to vary TP per object, which I'll discuss later). From
here on in we'll be playing with this function, trying various simplifying
assumptions, and seeing what happens. Here it is:
node traversal cost = TK1 + TK2 + TK3 +
        CUH * (TK3 + (Pa+Pab)*TP + Ma*Pb*TP) +
        CUH * (TK3 + (Pb+Pab)*TP + Mb*Pa*TP) +
        CaH * ((Pa+Pab)*TP) +
        CbH * ((Pb+Pab)*TP)

    = TK1 + TK2 + (1+CPH) * TK3 +
        TP * ( CUH * (Pa+Pb+2*Pab + Ma*Pb+Mb*Pa) +
               CaH*(Pa+Pab) + CbH*(Pb+Pab) )

    = TK1 + TK2 + (1+CPH) * TK3 +
        TP * ( Pab +
               CUH * (Pa*(1+Mb) + Pb*(1+Ma)) +
               CaH*Pa + CbH*Pb )
where Pa are the primitives exclusively in child A, Pb only in B, and Pab are
those shared by both children. The last step of the derivation is something
of a jump. What happens is that we know that 2*CUH+CaH+CbH = 1, these are
just the probabilities of hitting two children or just one, and so TP*Pab
comes out of the function. In other words, any primitive cut by the splitting
plane is just as expensive to intersect as if there was no splitting plane at
all, since it will always be tested by any ray hitting the node.
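That last algebra step can be sanity-checked numerically. In this sketch CUH is the per-direction chance of passing through both children, and CaH, CbH are the only-one-child chances; the parameterization below is just one arbitrary way to satisfy 2*CUH + CaH + CbH = 1:

```python
import random

random.seed(1)
for _ in range(100):
    CPH = random.random()             # chance of hitting the splitting plane
    R = random.random()
    CUH = CPH / 2                     # both children, one traversal direction
    CaH = R * (1 - CPH)               # only child A
    CbH = (1 - R) * (1 - CPH)         # only child B
    assert abs(2 * CUH + CaH + CbH - 1) < 1e-12
    Pa, Pb, Pab = (random.randint(0, 50) for _ in range(3))
    Ma, Mb = random.random(), random.random()
    long_form = (CUH * (Pa + Pb + 2 * Pab + Ma * Pb + Mb * Pa)
                 + CaH * (Pa + Pab) + CbH * (Pb + Pab))
    short_form = (Pab + CUH * (Pa * (1 + Mb) + Pb * (1 + Ma))
                  + CaH * Pa + CbH * Pb)
    assert abs(long_form - short_form) < 1e-9
```

The TP*Pab term drops out exactly because the probabilities sum to one, regardless of the occlusion values Ma and Mb.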
This fact brings up an interesting idea for storage: any objects that
are split by the BSP plane do not really have to be stored in both children.
In other words, a split node could have three lists: objects in child A, in
child B, and shared (hit by the split plane). When you test a node for ray
intersection, you would first test its shared list, then traverse the two
children. This would save a little memory: each object on the shared list is
stored in only one list instead of in both children's lists. There's one
other noticeable savings: mailbox avoidance. Shared objects are tested only
once with this scheme. In the traditional approach, if a ray travelled
through both children, each shared object would have to be accessed twice
(once in each child's list) and the mailbox test would then avoid the second
intersection test. Anyway, this alternate scheme is probably a wash
performance-wise, and I'll bet someone's thought of it before I did, but it's
one that occurred to me while analyzing where the time goes due to a BSP split.
Hair Splitting
--------------
One classic question is, given the choice of splitting a node in half by a
plane along the X, Y, or Z axes, which axis should we pick? The functions
above address that question in a direct way: choose the one that gives the
lowest average number of ray intersections. Let's look at a 2x1x1 node with a
uniform distribution of 1000 points. We can all guess that splitting along the
center of the X axis is the best plan (so forming two 1x1x1 children), but
how much does this help? The answer turns out to be that an X split yields an
average of 600 ray/object tests, and a Y or Z split yields 700 ray/object
tests, with 1000 being the unsplit value. I thought this was interesting:
splitting along the obvious axis saves 33% more tests than splitting along
either other axis (400 vs. 300 fewer tests). The limit of savings is, of course, 500 ray/object
tests, which would happen with splitting something like a 1000x1x1 node,
where the splitting plane area is relatively tiny and so extremely few rays
hit both children.
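Those numbers follow from hit probabilities proportional to surface area. Here's a small sketch (my own, not from the article) that reproduces the 600 and 700 figures:

```python
def half_area(dims):
    x, y, z = dims
    return x * y + x * z + y * z      # half the surface area of a box

def expected_tests(dims, axis, n=1000):
    # split the box in half along `axis`; each child gets n/2 uniform points
    child = list(dims)
    child[axis] /= 2
    p_hit = half_area(child) / half_area(dims)   # P(hit child | hit node)
    return 2 * (n / 2) * p_hit
```

expected_tests((2,1,1), 0) gives 600, while axes 1 and 2 give 700; a long thin 1000x1x1 node split along X comes out just over the 500-test limit.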
What happens if we start paying attention to "Pab", the number of objects
chopped by the splitting plane? This changes everything, of course, since you
tend to want to minimize Pab.
Say you have a 1x1x1 node filled with 1000 straight strands of hair (or
uncooked spaghetti, say) extending along the X axis. It's obviously more
worthwhile to chop this node along the Y or Z axis, where Pab=0, because
chopping along the X axis will result in each child having the same list of
objects, Pab=1000 -- no help at all. But what happens if the chance of hitting
a strand of spaghetti varies? In other words, say space is filled with
strands of spaghetti, but each piece of spaghetti is 0.5 units long.
For an N-length spaghetti strand, the set of centers that overlap the node
is [-N/2, N/2+1], and the overlap with a plane at say 0.5 is
[0.5-N/2, 0.5+N/2]. So, if all the X-aligned spaghetti strands were 0.5
units long and randomly overlapped a 1x1x1 node, these would have centers
from X=-0.25 to X=1.25 and would overlap a plane at say 0.5 from center
X=0.25 to X=0.75. So the probability of being split by a plane at 0.5 is
(0.75-0.25)/(1.25-(-0.25)) = 0.333.
Taking the difference of the endpoints of the intervals, the general formula
is then simply P(N) = N/(N+1). So a strand 10 units long that is known to
overlap the node has a 10/11 chance that it will pierce the BSP plane
(regardless of where the BSP plane is placed, in fact -- the plane's location
is irrelevant).
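The N/(N+1) formula is easy to confirm with a Monte Carlo sketch: drop strand centers uniformly over the overlap range [-N/2, N/2+1] and count how many strands straddle a plane at x=0.5:

```python
import random

random.seed(42)

def prob_split(n_len, trials=200_000):
    lo, hi = -n_len / 2, 1 + n_len / 2   # centers overlapping the [0,1] node
    hits = 0
    for _ in range(trials):
        c = random.uniform(lo, hi)
        if c - n_len / 2 < 0.5 < c + n_len / 2:   # strand straddles the plane
            hits += 1
    return hits / trials
```

prob_split(0.5) comes out near 1/3, prob_split(1.0) near 1/2, and prob_split(10.0) near 10/11, and the answer is the same wherever the plane is placed inside the node.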
So, say we have spaghetti bits of length 1 throughout space. That will mean
that Pab=P(N)*1000=(1/2)*1000=500, and the other 500 1-unit spaghetti bits
will divide between the two children, on average 250 each. From our previous
analysis those 500 points cost 500*(2/3)=333 tests, so 333+500=833 is the
expected number of ray intersections that will have to be performed. In fact,
varying the length of the spaghetti bits gives
a function going from the "1000 points of stuff" minimum of 667 tests, up to
the worst case of 1000 tests, depending on the length of the spaghetti bits.
Let's look at the yield for a 1x1x1 node:
Object length   Ray/object tests   Savings
=============   ================   =======
0.0                  667             333
0.1                  697             303
0.2                  722             278
0.3                  744             256
0.4                  762             238
0.5                  778             222
0.7                  804             196
1.0                  833             167
1.5                  867             133
2.0                  889             111
3.0                  917              83
4.0                  933              67
5.0                  944              56
infinity            1000               0
If the objects we're trying to split up by a BSP plane are 0.1 of the
dimension of the 1x1x1 node, then we can save 30.3 percent of tests. If our
objects are about the size of the node itself, 1.0, we can save 16.7 percent
of tests, continuing to decrease as the object size increases. This implies
that as the node size decreases with each BSP cut there will likely be
diminishing returns in efficiency, since the relative object size increases
compared to the leaf node size.
This is the End, My Only Friend
-------------------------------
This section's for the only person to reach it, whoever he is. We
want to create the (nearly) ultimate optimal BSP-tree. We can split on any
plane, and we can put the split wherever we want. MacDonald suggests
searching between the object and spatial medians, sampling a number of times
along this interval. But say we're insane and we want to optimize like crazy,
which also means searching outside of this interval.
Here's one way to get the perfect (at least judged locally by the formula)
split at any level. The idea is that when a plane is inserted, the only
places you need to check are where something is happening, i.e. where an
object's bounds within the node are located. MacDonald realized this, but he
did not give an ultimate algorithm. So here we go:
For each axis X, Y, Z:
    Pab = Pa = 0
    Pb = number of objects
    Put the interval of each object that overlaps the node into a sorted list.
        If an object's minimum is outside the node,
            increment Pab and decrement Pb.
        If an object's maximum is outside the node, ignore that endpoint.
    Evaluate the cost function at every location on this list:
        If it's a start location, evaluate, then increment Pab and decrement Pb.
        If it's an end location, decrement Pab and increment Pa, then evaluate.
Use the splitting plane and location found to be optimal.
Dropping the constant TP, and including neither the fixed amounts of time
spent nor the occlusion of one child by another, the cost function can have a
few forms:
F(R) = Pab + Pa * CaH(R) + Pb * CbH(R)
     = Pab +
       (Pa+Pb) * CPH +
       Pa * R*(1-CPH) +
       Pb * (1-R)*(1-CPH)
These are almost the same as the 1000 points of stuff formula, with the
additional cost of objects hit by the split plane included. The first form
is more elegant, but hides the involved computations done by CaH(R) and
CbH(R), so we use the second in the examples that follow.
So, for example, say you have 3 objects overlapping a 10x10x10 node, going
from [0,10] in all directions. Along the X axis object number one goes from
[4,7], object two from [-2,8], and object three from [6,12].
object 1              **********
object 2 -2<***********************
object 3                    *************>12
          +-----+-----+-----+-----+-----+--> X axis
          X=0   2     4     6     8     10
          R=0.0 0.2   0.4   0.6   0.8   1.0
When the first object is inserted into the sorted list, the list is (4s,7e),
where the "s" means the location is where an object's interval starts, "e"
where it ends. Object two's minimum extent, -2, starts outside the cube, so
"Pab" increments up to 1 and "Pb" decrements down to 2. The sorted list is
now (4s,7e,8e). Object three's maximum extent, 12, is outside the node, so
can be ignored. The final sorted list of object starts and ends is
(4s,6s,7e,8e).
Now go through the list and evaluate the cost function with each R value. Pab
starts at 1 in this case (because of object two). Since the first entry is a
start ("s") location, we know that an object starts at this distance. So at
R=0.4 there are no objects that would be placed in child A, one object is
split, and two objects are in child B. So:
F(0.4) = 1 +
         (0+2) * 1/3 +
         0 * 0.4 * 2/3 +
         2 * 0.6 * 2/3
       = 2.4666
So if we split here, we would expect to test an average of 2.4666 objects
for intersection with a random ray. Once we are done computing with the "4s"
plane (and any
other start plane), we add one to Pab (now 2) and decrement Pb (now 1), since
we know that succeeding plane tests will hit two objects (one and two) until
we reach the end of one of them. Pa is still 0.
We then test "6s". F(R) is computed first, and turns out to be:
F(0.6) = 2 +
         (0+1) * 1/3 +
         0 * 0.6 * 2/3 +
         1 * 0.4 * 2/3
       = 2.6
Pab is then incremented to 3 and Pb decrements to 0. Note that in the
interval [6,7] all three objects would be penetrated by a plane, so splitting
the node would give no benefit.
We then hit "7e", and this signals the end of an object, so we first
decrement Pab, to 2, and add one to Pa, to 1, before computing F(R), which
turns out to be 2.8. Note that at any interval break, we always evaluate on
the "side" where Pab is lower, by evaluating before incrementing Pab and
after decrementing Pab.
The last entry on the sorted list is "8e", which first decrements Pab to 1
(only object three is left as being split), increments Pa to 2, and F(R) is
2.7333.
The smallest function value was at R=0.4, 2.4666, so a splitting plane is
best put at 0.4. Note that the intervals of the objects should all be grown
by some small epsilon, so that when the splitting plane is chosen it won't
intersect the object that is close to it. The Y and Z axes can be tested
similarly, and the axis plane with the smallest ray/object cost overall is
the one that is chosen.
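The sweep above can be sketched compactly for a single axis (my own reconstruction, not MacDonald's code). It assumes a cubical node, so the chance of a ray hitting the splitting plane is CPH=1/3, and it skips the epsilon-growing of intervals:

```python
def sweep_axis(intervals, node_lo, node_hi, cph=1/3):
    """Return (cost, R) for the best split along one axis.
    intervals: (lo, hi) extents of the objects overlapping the node."""
    events = []
    pa, pab, pb = 0, 0, len(intervals)
    for lo, hi in intervals:
        if lo <= node_lo:
            pab += 1                 # already straddles the node's low end
            pb -= 1
        else:
            events.append((lo, 0))   # 0 = start ("s")
        if hi < node_hi:
            events.append((hi, 1))   # 1 = end ("e"); otherwise ignore it
    events.sort()
    width = node_hi - node_lo

    def cost(r, pa, pab, pb):
        return (pab + (pa + pb) * cph
                + pa * r * (1 - cph) + pb * (1 - r) * (1 - cph))

    best = None
    for pos, kind in events:
        r = (pos - node_lo) / width
        if kind == 0:                # start: evaluate, then enter Pab
            c = cost(r, pa, pab, pb)
            pab += 1; pb -= 1
        else:                        # end: leave Pab first, then evaluate
            pab -= 1; pa += 1
            c = cost(r, pa, pab, pb)
        if best is None or c < best[0]:
            best = (c, r)
    return best
```

For the worked example (intervals [4,7], [-2,8], [6,12] in a [0,10] node) it returns cost 2.4666 at R=0.4, the same answer as the hand trace.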
Recall the rule that if objects are near one end or the other the optimal
splitting plane chops off the empty volume. So, a quick out on all this
testing is to first find the bounding box for the objects. If along an axis
the highest end value is less than R=1/3, or the lowest start value is
greater than R=2/3, then you can safely put the split plane at this end or
start value and be done with evaluating that axis.
Some other implementation notes: if the minimum value for a node comes back
as being close to the number of objects (say within 5 or 10 percent --
experiments are needed here, and your mileage will definitely vary), this
means that inserting a plane is probably not buying much performance and so
subdivision can halt. Alternately, you can fold in the fixed costs of
traversing children, as shown near the beginning of this article.
You don't really have to track all three of Pab, Pa, and Pb; you can track
just two and subtract these from the number of objects in the node to get the
third.
As usual, you can also have a maximum level beyond which you won't subdivide.
What's nice about the evaluation function is that it's pretty simple: you
plug in the box dimensions and how many primitives are to the right, left, or
pierced by the plane, and get an answer. You may also want to experiment with
setting the chances a child blocks a ray, Ma and Mb.
The basic algorithm treats all primitives as equally costly to intersect,
i.e. TP is a constant. If desired (i.e. you are truly out to massively
optimize as much as possible), it's straightforward to factor in the cost of
various types of primitives, assuming you have reasonable values for these.
Instead of simply incrementing and decrementing the various counts such as
Pab in the algorithm, you increment and decrement by the approximate amount
of time that the primitive costs to test.
Now the bad news. Recursively doing this split procedure should be fairly
near-optimal locally, but it will not necessarily be globally optimal. For a
given distribution of objects, it may be the case that a suboptimal split at
one level allows a number of perfectly optimal splits at lower levels,
giving a better tree overall. MacDonald discusses this problem and how
simple BSP-tree algorithms outperformed his plane-searching method under
certain circumstances.
Still, for "n" objects to be inserted into an BSPtree, this algorithm is O(n
log n) per level (due to the sort). It gives an interesting mix of the
externally imposed BSPtree mixed with splitting planes that try their best to
tightly bound at least one object in the node itself.
[MacDonald89] J. David MacDonald and Kellogg S. Booth, "Heuristics for Ray
Tracing Using Space Subdivision," Graphics Interface '89, pp. 152-163, 1989.
----------------------------------------------------------------------------
RTNews Errata, by Nikos Platis (nplatis@di.uoa.gr)
[Included here in case someone's mirroring these pages or printed them out or
whatever. I have fixed the web pages in question with this information. --
EAH]
I would like to inform you of some mistakes that I believe I found in
articles that I used.
In "Ray/Triangle Intersection with Barycentric Coordinates"
(http://www.acm.org/tog/resources/RTNews/html/rtnews5b.html#art3), the
denominators for s and t must be "(length N)^2". Also, in the picture, s and
t (or vertices 2 and 3) must be swapped: s seems to be the barycentric
coordinate with respect to vertex 2 and t the barycentric coordinate with
respect to vertex 3; the same is implied by the relations given.
In "Pluecker Coordinate Tutorial"
(http://www.acm.org/tog/resources/RTNews/html/rtnv11n1.html#art3), in the
second relation provided ( L = {P-Q:PxQ} ), I think it should be clarified
that the line is directed from Q to P; this is important since Pluecker
coordinates describe directed lines, and also in accordance with the next
relation, for which it would be U=P-Q.
----------------------------------------------------------------------------
END OF RTNEWS