<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Real-Time Rendering &#187; pixel shader</title>
	<atom:link href="http://www.realtimerendering.com/blog/tag/pixel-shader/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.realtimerendering.com/blog</link>
	<description>Tracking the latest developments in interactive rendering techniques</description>
	<lastBuildDate>Sun, 12 May 2013 00:21:14 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<item>
		<title>Quick Gaussian Filtering</title>
		<link>http://www.realtimerendering.com/blog/quick-gaussian-filtering/</link>
		<comments>http://www.realtimerendering.com/blog/quick-gaussian-filtering/#comments</comments>
		<pubDate>Tue, 21 Sep 2010 01:17:42 +0000</pubDate>
		<dc:creator>Eric</dc:creator>
				<category><![CDATA[Resources]]></category>
		<category><![CDATA[filtering]]></category>
		<category><![CDATA[Gaussian]]></category>
		<category><![CDATA[pixel shader]]></category>

		<guid isPermaLink="false">http://www.realtimerendering.com/blog/?p=1718</guid>
		<description><![CDATA[There are two speed tricks with Gaussian filtering using the pixel shader. The first is that the Gaussian filter (along with the box filter) is separable: you can filter horizontally, then vertically (or vice versa, of course). So for a 9&#215;9 filter kernel you then have 18 texture samples in 2 passes instead of 81 [...]]]></description>
			<content:encoded><![CDATA[<p>There are two speed tricks with Gaussian filtering using the pixel shader. The first is that the Gaussian filter (along with the box filter) is separable: you can filter horizontally, then vertically (or vice versa, of course). So for a 9&#215;9 filter kernel you then have 18 texture samples in 2 passes instead of 81 samples in a single pass. The second trick is that each of the samples you use can actually be in between two texels, e.g. if you need to sample texels 1 through 9, you could sample just once in between 1 and 2 and use the GPU to linearly interpolate between the two, between 3 and 4, etc., for a total of 5 samples. So instead of 18 samples you could get by with 10 samples. This is old news, dating back to at least <a href="http://tog.acm.org/resources/shaderx/">ShaderX<sup>2</sup></a> and <a href="http://http.developer.nvidia.com/GPUGems/gpugems_ch21.html">GPU Gems</a>, and we talk about it in our 3rd edition from around page 469 on.</p>
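<p>To make the bilinear trick concrete, here is a small CPU-side sketch in Python (not shader code; <code>merge_taps</code> and the particular weight values are just for illustration) of how pairs of discrete taps collapse into single bilinear taps:</p>

```python
# Collapse pairs of adjacent discrete Gaussian taps into single bilinear taps.
# For two texels at offsets o1, o2 with weights w1, w2, one sample taken at
# offset (o1*w1 + o2*w2) / (w1 + w2) with weight w1 + w2 gives the same result,
# because the GPU's bilinear filter linearly interpolates between the texels.

def merge_taps(offsets, weights):
    """Merge adjacent (offset, weight) pairs; a leftover tap stays discrete."""
    merged = []
    i = 0
    while i < len(offsets):
        if i + 1 < len(offsets):
            w = weights[i] + weights[i + 1]
            o = (offsets[i] * weights[i] + offsets[i + 1] * weights[i + 1]) / w
            merged.append((o, w))
            i += 2
        else:
            merged.append((offsets[i], weights[i]))
            i += 1
    return merged

# One half of a 9-texel kernel: the center texel plus offsets 1..4
# (illustrative Gaussian-ish weights; the full kernel mirrors the right half).
weights = [0.2270270270, 0.1945945946, 0.1216216216, 0.0540540541, 0.0162162162]
offsets = [0.0, 1.0, 2.0, 3.0, 4.0]

# Keep the center tap discrete and merge the rest: 1 + 2 = 3 taps per half,
# i.e. 5 taps for the full 9-texel row instead of 9.
taps = [(0.0, weights[0])] + merge_taps(offsets[1:], weights[1:])
print(taps)
```

<p>In a shader you would then take <code>len(taps)</code> texture samples per pixel at those fractional offsets, scaled by the texel size, and sum them with the merged weights.</p>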
<p>Some bits I didn&#8217;t know about are discussed by Daniel Rákos in <a href="http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/">his article</a>, and also coded up by <a href="http://www.geeks3d.com/20100909/shader-library-gaussian-blur-post-processing-filter-in-glsl/">JeGX</a> in a GLSL shader demo collection. First, I hadn&#8217;t thought of using Pascal&#8217;s triangle numbers as the weights for the Gaussian (nice visualization <a href="http://en.wikipedia.org/wiki/De_Moivre%E2%80%93Laplace_theorem">here</a>). To be honest, I&#8217;m not 100% sure that&#8217;s right; it seems like you want the area under the Gaussian&#8217;s curve rather than discrete samples, but the numbers are in the ballpark. It&#8217;s also a heck of a lot easier than messing with the standard deviation; let&#8217;s face it, it&#8217;s a blur and we chop off the ends of the (infinite) Gaussian somewhat arbitrarily. That said, if a filtering expert wants to set me straight, please do.</p>
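<p>The Pascal&#8217;s triangle idea takes only a few lines to sketch (again a CPU-side Python sketch, not from the articles above): row <i>n</i> of the triangle, divided by 2<sup><i>n</i></sup>, gives a set of weights that sums to 1 and approximates a Gaussian, since it is the <i>n</i>-fold convolution of the box [1/2, 1/2] (that&#8217;s the de Moivre&#8211;Laplace theorem linked above):</p>

```python
# Binomial coefficients (a row of Pascal's triangle) as Gaussian-like weights.

def pascal_row(n):
    """Return row n of Pascal's triangle, e.g. n=4 -> [1, 4, 6, 4, 1]."""
    row = [1]
    for k in range(n):
        row.append(row[-1] * (n - k) // (k + 1))
    return row

# Weights for a 9-tap filter (row 8), normalized so they sum to 1.
row = pascal_row(8)
weights = [w / 2**8 for w in row]
print(row)           # [1, 8, 28, 56, 70, 56, 28, 8, 1]
print(sum(weights))  # 1.0
```

<p>One caveat with chopping the row this way: for small <i>n</i> the outermost weights (the 1&#8217;s here) are still a visible fraction of the total, which is part of why the result is only approximately Gaussian.</p>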
<p>The second tidbit: by using the linear interpolation trick, this shader was found to be 60% faster. Which sounds about right, if you assume that the taps are the main cost: the discrete version uses 9 taps, the interpolated version 5. Still, guessing and knowing are two different things, so I&#8217;m now glad to know this trick actually pays off for real, and by a significant amount.</p>
<p>The last interesting bit I learned was from a comment by heliosdev on Daniel&#8217;s article. He noted that computing the offset locations for the texture samples once (well, 4 times, one for each corner) in the vertex shader and passing these values to the pixel shader is a win. For him, it sped the process up by 10%-15% on his GPU; another commenter, Panos, verified this result with his own benchmarks. Daniel is planning on benchmarking this version himself, and I&#8217;ll be interested in what he finds. Daniel points out that it&#8217;s surprising that this trick actually gives any benefit. I was also under the impression that because texture fetches take so long compared to floating-point operations, you could do a few &#8220;free&#8221; flops (as long as they weren&#8217;t dependent on the texture&#8217;s result) in between taps.</p>
<p>Long and short, I thought this was a good little trick, though one you want to benchmark to make sure it&#8217;s helping. Certainly, you don&#8217;t want to pass constants from the VS to the PS; that sort of thing gets optimized away by the compiler (discussed <a href="http://developer.nvidia.com/object/gpu_programming_guide.html">here</a>, for example). But I can certainly imagine that computing values in the VS and passing them down could be more efficient &#8211; my main worry was that registering these values as PS inputs might add some overhead. You usually want to minimize the number of registers used in a PS, so that more fragments can be put in flight.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.realtimerendering.com/blog/quick-gaussian-filtering/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>