
Benchmarking tweets

I asked what others did for benchmarking in my last post. Here are the replies on Twitter in a semi-coherent edited form. If I missed any replies, I blame Twitter, whose interface is a magical maze.

First there were some FPS vs. SPF comments:

Richard Mitton: If you’re not measuring in milliseconds then you’re doing it wrong.

Christer Ericson: Yes, ms, not FPS. FPS is not a linear unit for the artists (or anyone).

Marc Olano: FPS isn’t linear. Usual definition of median averages middle 2 for even samples = also wrong. Use ms.

Morgan McGuire notes: FPS *is* a good measure if what you care about is interaction or visual smoothness. SPF is good for computational efficiency.

I replied to Richard & Christer: I’m interested in your reaction to the use of median vs. mean. FPS vs. SPF irrelevant for relative performance.

I also changed the original post to talk about milliseconds instead of frames, to avoid this facet of the discussion.
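
To make the non-linearity concrete, here is a tiny worked example of my own (not from any of the tweets above): averaging frame times gives a sensible answer, while averaging the corresponding FPS values does not.

    #include <cstdio>

    int main() {
        // Two frames: one takes 10 ms, the next takes 30 ms.
        double msA = 10.0, msB = 30.0;

        // Averaging the times is meaningful: 20 ms per frame,
        // i.e. 1000/20 = 50 FPS.
        double meanMs = (msA + msB) / 2.0;       // 20 ms
        double fpsFromMeanMs = 1000.0 / meanMs;  // 50 FPS

        // Averaging the FPS values is not: (100 + 33.3)/2 = 66.7 FPS,
        // which overstates performance because FPS = 1/time is non-linear.
        double meanFps = (1000.0 / msA + 1000.0 / msB) / 2.0;  // ~66.7 FPS

        printf("mean time %.1f ms (%.1f FPS), mean of FPS values %.1f FPS\n",
               meanMs, fpsFromMeanMs, meanFps);
        return 0;
    }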

Christer Ericson: It’s important to catch the spikes, so in the context you’re talking about I would do max. Or mean+variance. Also, don’t think I’ve ever, for profiling reasons, looked at any average. You always look at a specific frame.

Timothy Lottes: I’m personally only interested in worst case ms/frame.

Cass Everitt: Agree with those that concentrate on worst times.

Eric Haines: Right, it depends what you’re looking for, e.g. don’t drop below 60 FPS. I’m mostly warning against using mean.

I added a note to the original post about tracking the max, which makes sense if you’re trying to guarantee a frame rate.

Tobias Berghoff, who benchmarks consoles:

I use min/max/med the most. Averages really only come into play when I need more digits. I spend a significant amount of time below the 0.5% mark when wearing my platform tuning hat. I don’t miss trying to get sensible numbers out of PC h/w. But this also comes into play when measuring very short processes. When something only takes a couple of microseconds, you often end up oscillating between states that make the distribution multi-modal. Median won’t catch small shifts.

cupe: Stacked color-coded graph of nested timings (or a subtree of it). Usually unfiltered for analysis, avg for comparisons. Hierarchy is on the left, tooltip displays e.g. “scene/fluid/poisson”, click to restrict. Horizontal lines are milliseconds, orange line is 16.6 ms.

[Image: cupe1]

E.g. click the big violet bar to see only post (and zoom in to stretch 4ms to screen):

[Image: cupe2]
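
As a rough sketch of how nested timings like cupe’s “scene/fluid/poisson” hierarchy might be gathered (my own hypothetical code, not cupe’s), a scoped timer can push its label on entry and accumulate the elapsed milliseconds under the full slash-separated path on exit:

    #include <chrono>
    #include <map>
    #include <string>
    #include <vector>

    // Accumulated milliseconds per path, e.g. "scene/fluid/poisson".
    static std::map<std::string, double> g_timings;
    static std::vector<std::string> g_stack;  // current nesting of labels

    struct ScopedTimer {
        std::chrono::steady_clock::time_point start;
        explicit ScopedTimer(const std::string& label)
            : start(std::chrono::steady_clock::now()) {
            g_stack.push_back(label);
        }
        ~ScopedTimer() {
            double ms = std::chrono::duration<double, std::milli>(
                std::chrono::steady_clock::now() - start).count();
            std::string path;
            for (const auto& s : g_stack)
                path += (path.empty() ? "" : "/") + s;
            g_timings[path] += ms;
            g_stack.pop_back();
        }
    };

    void solvePoisson() {
        ScopedTimer t("poisson");  // lands under "scene/fluid/poisson"
        // ... solver work ...
    }
    void simulateFluid() {
        ScopedTimer t("fluid");
        solvePoisson();
    }
    void renderScene() {
        ScopedTimer t("scene");
        simulateFluid();
    }

Each frame’s accumulated map is the sort of data a viewer like cupe’s would turn into stacked, color-coded bars.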

Javdev: We use a profiler, Adobe Scout, select multiple frames & see which code is most expensive & iterate it to prevent frame drops.

Björn Blissing: One option is to plot a histogram over the captured data. Reveals if your max/min are outliers or more common occurrences.
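
A minimal sketch of that idea (my own illustration, not Björn’s code): bucket the captured frame times into 1 ms bins and print a crude ASCII histogram, which makes it obvious whether the extremes are rare outliers or a second mode.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Print a crude ASCII histogram of frame times, bucketed in 1 ms bins.
    void printHistogram(const std::vector<double>& frameMs) {
        if (frameMs.empty()) return;
        double maxMs = *std::max_element(frameMs.begin(), frameMs.end());
        std::vector<int> bins(static_cast<size_t>(maxMs) + 1, 0);
        for (double ms : frameMs)
            ++bins[static_cast<size_t>(ms)];
        for (size_t i = 0; i < bins.size(); ++i) {
            printf("%3zu ms | ", i);
            for (int j = 0; j < bins[i]; ++j) printf("*");
            printf("\n");
        }
    }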

Michael Marcin: Try always running circular etwtrace and when frame time dips save and examine the trace.

Mikkel Gjoel: We filter in viewer. Options for all mentioned, and vsync (as that is what we are shipping).

[Image: Gjoel]

Fabian Giesen: General order statistics (percentiles etc.) are good. Just a plot of frame durations over frame # is helpful, too! And simply recording all frame durations over a few seconds, sorting them and plotting that is quite handy, too. That gives you all the percentiles (and median etc.) and gives you a feel for the shape of the distribution, which matters. (I’m not very happy with single-value summaries; they lose too much information.)
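
A sketch of that sort-and-read-off approach (the function is mine, purely illustrative): once the durations are sorted, any percentile is just an index into the array.

    #include <algorithm>
    #include <vector>

    // Return the given percentile of the recorded frame durations.
    // percentile(frameMs, 50) is the median; percentile(frameMs, 100) the worst frame.
    double percentile(std::vector<double> frameMs, double pct) {
        if (frameMs.empty()) return 0.0;
        std::sort(frameMs.begin(), frameMs.end());
        size_t idx = static_cast<size_t>((pct / 100.0) * (frameMs.size() - 1));
        return frameMs[idx];
    }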

Jaume Sanchez Elias: I like Chrome FPS meter: current, min, max; over time; frequency graph for each framerate

[Image: Elias]

Krzysztof Narkowicz: Min, max, avg and std dev. Percentiles and med would make a nice addition, but it’s a hassle to compute them.

Anton the Mighty: I always use the standard deviation or standard error and indicate what the sample size n is. Most gfx benchs=bad. It’s usually worth also eyeballing the actual data in detail, because repeating patterns show either cycles or error in timers. Most recently there was something a friend had with the power manager in Windows causing a cycling load on the CPU. I also visually check out timing for cpu+gpu functions across frames with apitrace etc., which is pretty neat.

All for now – feel free to email or tweet me with anything you want to add.

 

Don’t be mean

[Some on Twitter noted that I should be using milliseconds instead of FPS. This kind of misses the point, but let’s avoid distractions, here’s the article with that change. The sad part is that you then miss my hilarious joke about how I use FPS in the article, because if I used SPF you’d think I was talking about tanning. Which makes me think of another joke about rendering cows and the time it then takes to tan their hides. I’m full of great dad jokes.]

I think I’m reading “The Economist” too much, as I keep trying to come up with punny article titles. Sorry.

So, how do you measure a representative value for milliseconds per frame?

I don’t care about the mechanics, which timer call you use, etc. Just assume you successfully start timer/end timer and get some length of time in milliseconds for the frame. What do you do with these timings?

I usually see things such as an average, or a running average (average of last 20 or 50 or 100 or whatever frame times). I think this is mostly bad. As someone pointed out, almost everyone has more than the average number of legs. I find the same: in a given run there can sometimes be some frames where things noticeably slow down for whatever reason, some load on the computer. What you’re often trying to measure (as a graphics developer) is the performance of the rendering system itself, not the computer’s overall performance.

So, I currently use one of these two, or both: shortest time, or median time, over whatever set of frame times I have. Both have their uses. Shortest time is justifiable (to me, at least) because, assuming you have a very fine-grained timer, your best time is in some sense the “purest” measurement of the time a frame takes. Whatever other processes in your system are slowing down the other frames isn’t your concern. The timer doesn’t lie, you really did go that fast for one frame.

The other measure I’m OK with is the median. If your benchmarking system is going through a series of different frames (an animation or simulation is running, or the camera is orbiting, etc.), then grabbing the median frame is good. Choosing it instead of the average then doesn’t give so much weight to outliers. Better yet, graph the results and see whether the outliers are consistent.
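
Here is the kind of summary I have in mind, as a rough sketch (hypothetical code, just for illustration): collect the per-frame times for a run, then report the minimum and median next to the mean so you can see how far the outliers drag the mean.

    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    // Summarize a run's frame times: minimum, median, and (for comparison) mean.
    void reportFrameStats(std::vector<double> frameMs) {
        if (frameMs.empty()) return;
        std::sort(frameMs.begin(), frameMs.end());
        double minMs    = frameMs.front();
        double medianMs = frameMs[frameMs.size() / 2];
        double meanMs   = std::accumulate(frameMs.begin(), frameMs.end(), 0.0)
                          / frameMs.size();
        printf("min %.2f ms, median %.2f ms, mean %.2f ms\n",
               minMs, medianMs, meanMs);
    }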

Update: A number of game and VR developers pointed out that their major interest is maximum frame time. Makes sense: for a good experience (especially with VR) you don’t want to drop below your target of 30 FPS, 60 FPS, or 90 FPS.

My point is that the average, the mean, is not so good: often external slowdowns throw off the average enough and at random enough intervals that the average is very noisy and so, pretty useless. Taking the median, the central time of the sorted set, cuts out much of this variance, making each sample have an equal effect on the result.

Anyway, that’s where I’m at with benchmarking. What do you do? Comment here, tweet-reply, or email me at erich@acm.org and I’ll summarize.

p.s. pro tip: walk through your rendering pipeline every once in a while, watching each step. It’s hard to really know where the time goes without doing so. I did this last week while looking at another bug and found that a little logic error was causing a certain path to always do an additional post-process when it usually wasn’t needed. Free performance boost with a two-line fix! But, it’s not something discoverable by benchmarking, because the variance is too much to notice “just” a few frames of difference.

This happens every few years. My favorite lucky find was around 15 years ago, walking through code in an established project and seeing that it was rendering twice for each time it displayed. A one-line change gave us 2x performance.

60 Hz, 120 Hz, 240 Hz…

Update: first, take this 60 vs. 30 FPS test (sadly, now gone! Too much traffic, is my guess). I’ll assume it’s legit (I’ll be pretty entertained if it isn’t). If you get 11/11 consistently, what are you looking for?

A topic that came up in the Udacity forum for my graphics MOOC is 240 Hz displays. Yes, there are 240 Hz displays, such as the Eizo Foris FG2421 monitor. My understanding is that 60 Hz is truly the limit of human perception. To quote Principles of Digital Image Synthesis (which you can now download for free):

The effect of temporal smoothing leads to the way we perceive light that blinks, or flickers. When the blinking is slow, we perceive the individual flashes of light. Above a certain rate, called the critical flicker frequency (or CFF), the flashes fuse together into a single continuous image. Far below that rate we see simply a series of still images, without an objectionable sense of near-continuity.

Under the best conditions, the CFF for a human is around 60 Hz [389].

Reference 389 is:

Robert Sekuler and Randolph Blake. Perception. Alfred A. Knopf, New York, 1985.

This book has been updated since 1985; the latest edition is from 2005. Wikipedia confirms this number of 60 Hz, with the special-case exception of the “phantom array effect”.

The monitor review’s “Response Time and Gaming” section notes:

Eizo can drive the LCD panel at 240 Hz by either showing each frame twice or by inserting black frames between the pictures, which is known to significantly reduce blurring on LCD panels.

This is interesting: the 240 Hz is not that high because the eye can actually perceive 240 Hz. Rather, it is used to compensate for response problems with LCD panels. The very fact that an entirely black frame can be inserted every other frame means that our CFF is clearly way below 240 Hz.

So, my naive conclusions are that (a) 240 Hz could indeed be meaningful to the monitor, in that it can use a few frames that, combined by the visual system itself, give a better image; and (b) this Hz value of the monitor should not be confused with the Hz value of what the eye can perceive. You won’t have a faster reaction time with a 120 Hz monitor.

The thing you evidently can get out of a high-Hertz monitor is better overall image quality. I can imagine that, on some perfect monitor (assume no LCD response problem), if you have a game generating frames at 240 FPS you’re getting 4 rendered frames blended per “frame” your eye receives. Essentially it’s a very expensive form of motion blur; cheaper would be to generate 60 FPS with good motion blurring. Christer Ericson long ago informally noted how a motion-blurred 30 FPS looks better to more people than 60 FPS unblurred (and recall that most films are 24 FPS, though of course we don’t care about reaction time for films). What was interesting about the Eizo Foris review is that the reviewer wants all motion blur removed:

You probably already own a 120 Hz monitor if you are a gamer, but your monitor most likely does not have the black frame insertion technology, which means that motion blurring can still occur (even though there is not [sic] stuttering because of 120 Hz). These two factors are certainly not independent, but 120 Hz does not ensure zero motion blurring either, as some would have you believe.

The type of motion blur they describe here is an artifact, blending a bit of the previous frame with the current frame. This sort of blur I can imagine is objectionable: objects leaving (very short-lived) trails behind them. True (or computed) motion blurring happens within the frame itself, simulating the camera’s frame exposure length, not with some leftover from the previous frame. I’d like to know if gamers would prefer 60 FPS unblurred vs. 60 FPS “truly” blurred. If “unblurred” is in fact the answer, we can cross off a whole area of active research for interactive rendering. Kidding, researchers, kidding! There would still be other reasons to use motion blur, such as the desire to give a scene a cinematic feel.

For 30 vs. 60 FPS there is a “reaction time” argument, that with 60 FPS you get the information faster and can react more quickly. 60 vs. 120 vs. 240, no – you won’t react faster with 240 Hz, or even 120 Hz, as 60 Hz is essentially our perceptual maximum. My main concern as this monitor refresh speed metric increases is that it will be a marketing tool, the equivalent of Monster cables to audiophiles. Yes, there’s possibly a benefit to image quality. But statements such as “there is not [sic] stuttering because of 120 Hz” make it sound as if our perceptual system’s CFF is well above 60 Hz – it isn’t. The image quality may be higher at 120 or 240 Hz, and may even indirectly cause some sort of stuttering effect, but let’s talk about it in those terms, rather than the “this faster monitor will give you that split-second advantage to let you get off the shot faster than your opponent” discussion I sometimes run across.

That said, I’m no perception expert (but can read research by those who are), nor a hard-core gamer. If you have hard data to add to the discussion, please do! I’m happy to add edits to this post with any rigorous or even semi-rigorous results you cite. “I like my expensive monitor” doesn’t count.

p.s. I got 4/11 on the test, mainly because I couldn’t tell a darn bit of difference.

The evils of fps

I completely agree with this blog post by Humus on the uselessness of the performance numbers in most rendering papers.  This is something that often comes up when reviewing papers.  Frames-per-second (fps) numbers are less than useless, since they include extraneous information (the time taken to render parts of the scene not using the technique in question) and make it very difficult to do meaningful comparisons.  The performance measurement game developers care about is the time to execute the technique in milliseconds.
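
To make that concrete, converting a paper’s FPS numbers into a cost in milliseconds is simple arithmetic (the numbers below are hypothetical, not from Humus’s post): the technique’s cost is the difference in frame time, and the same FPS drop can mean wildly different costs.

    #include <cstdio>

    int main() {
        // Hypothetical paper numbers: 100 FPS without the technique, 80 FPS with it.
        double fpsWithout = 100.0, fpsWith = 80.0;

        double msWithout = 1000.0 / fpsWithout;  // 10.0 ms per frame
        double msWith    = 1000.0 / fpsWith;     // 12.5 ms per frame
        double costMs    = msWith - msWithout;   //  2.5 ms for the technique

        // The same 20 FPS drop starting from 40 FPS (25 ms per frame) would
        // cost 25 ms, ten times as much, which is why the FPS delta alone
        // says very little.
        printf("technique cost: %.1f ms per frame\n", costMs);
        return 0;
    }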

Some papers do get it right, for example this one.  The authors use milliseconds for detailed performance comparisons, only using fps to show how overall performance varies with camera and light position (which is a rare legitimate use of fps).