{"id":1718,"date":"2010-09-20T19:17:42","date_gmt":"2010-09-21T01:17:42","guid":{"rendered":"http:\/\/www.realtimerendering.com\/blog\/?p=1718"},"modified":"2010-09-20T19:28:42","modified_gmt":"2010-09-21T01:28:42","slug":"quick-gaussian-filtering","status":"publish","type":"post","link":"https:\/\/www.realtimerendering.com\/blog\/quick-gaussian-filtering\/","title":{"rendered":"Quick Gaussian Filtering"},"content":{"rendered":"<p>There are two speed tricks for Gaussian filtering in the pixel shader. The first is that the Gaussian filter (along with the box filter) is separable: you can filter horizontally, then vertically (or vice versa, of course). So a 9&#215;9 filter kernel needs 18 texture samples in 2 passes instead of 81 samples in a single pass. The second trick is that each sample can be taken in between two texels, letting the GPU&#8217;s linear interpolation blend the pair for you. For example, if you need to sample texels 1 through 9, you can sample just once between texels 1 and 2 (at a position chosen so the interpolation gives each texel its proper weight), once between 3 and 4, and so on, for a total of 5 samples. So instead of 18 samples, you can get by with 10. This is old news, dating back to at least <a href=\"http:\/\/tog.acm.org\/resources\/shaderx\/\">ShaderX<sup>2<\/sup><\/a> and\u00a0<a href=\"http:\/\/http.developer.nvidia.com\/GPUGems\/gpugems_ch21.html\">GPU Gems<\/a>, and we talk about it in our 3rd edition starting around page 469.<\/p>\n<p>Some bits I didn&#8217;t know about were discussed by Daniel R\u00e1kos in <a href=\"http:\/\/rastergrid.com\/blog\/2010\/09\/efficient-gaussian-blur-with-linear-sampling\/\">his article<\/a>, and also coded up by <a href=\"http:\/\/www.geeks3d.com\/20100909\/shader-library-gaussian-blur-post-processing-filter-in-glsl\/\">JeGX<\/a> in a GLSL shader demo collection. 
First, I hadn&#8217;t thought of using the numbers from Pascal&#8217;s triangle as the weights for the Gaussian (nice visualization <a href=\"http:\/\/en.wikipedia.org\/wiki\/De_Moivre%E2%80%93Laplace_theorem\">here<\/a>). To be honest, I&#8217;m not 100% sure that&#8217;s right: it seems like you&#8217;d want the area under the Gaussian&#8217;s curve over each texel, not discrete point samples, but the numbers are in the ballpark. It&#8217;s also a heck of a lot easier than messing with the standard deviation; let&#8217;s face it, it&#8217;s a blur, and we chop off the ends of the (infinite) Gaussian somewhat arbitrarily. That said, if a filtering expert wants to set me straight, please do.<\/p>\n<p>The second tidbit: using the linear interpolation trick made this shader 60% faster. That sounds about right if you assume the taps are the main cost: per pass, the discrete version uses 9 taps and the interpolated version 5, so a shader limited purely by taps could be at most 9\/5 = 1.8 times as fast. Still, guessing and knowing are two different things, so I&#8217;m glad to know this trick pays off in practice, and by a significant amount.<\/p>\n<p>The last interesting bit I learned was from a comment by heliosdev on Daniel&#8217;s article. He noted that computing the offset locations for the texture samples once in the vertex shader (well, 4 times, once per corner) and passing these values to the pixel shader is a win. For him, it sped up the blur by 10% to 15% on his GPU; another commenter, Panos, confirmed this result with his own benchmarks. Daniel plans to benchmark this version himself, and I&#8217;ll be interested to see what he finds. Daniel points out that it&#8217;s surprising this trick gives any benefit at all. 
I was also under the impression that, because texture fetches take so long compared to floating-point operations, you could do a few &#8220;free&#8221; flops in between taps (as long as they didn&#8217;t depend on the texture&#8217;s result).<\/p>\n<p>Long story short, I thought this was a good little trick, though one you want to benchmark to make sure it&#8217;s helping. Certainly there&#8217;s no point in passing constants from the VS to the PS; that sort of thing gets optimized by the compiler (discussed <a href=\"http:\/\/developer.nvidia.com\/object\/gpu_programming_guide.html\">here<\/a>, for example). But I can easily imagine that computing values in the VS and passing them down could be more efficient &#8211; my main worry was that the extra PS input registers holding these values might add some overhead. You usually want to minimize the number of registers used in a PS, so that more fragments can be in flight.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are two speed tricks with Gaussian filtering using the pixel shader. The first is that the Gaussian filter (along with the box filter) is separable: you can filter horizontally, then vertically (or vice versa, of course). 
So for a 9&#215;9 filter kernel you then have 18 texture samples in 2 passes instead of 81 [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[414,413,415],"class_list":["post-1718","post","type-post","status-publish","format-standard","hentry","category-resources","tag-filtering","tag-gaussian","tag-pixel-shader"],"_links":{"self":[{"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/posts\/1718","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/comments?post=1718"}],"version-history":[{"count":8,"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/posts\/1718\/revisions"}],"predecessor-version":[{"id":1726,"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/posts\/1718\/revisions\/1726"}],"wp:attachment":[{"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/media?parent=1718"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/categories?post=1718"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.realtimerendering.com\/blog\/wp-json\/wp\/v2\/tags?post=1718"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
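The separability trick described in the post can be checked with a minimal sketch in plain Python (standard library only). It blurs a small test image once with a full 9×9 kernel and once with a horizontal pass followed by a vertical pass, and compares the results. Assumptions for illustration: nested lists stand in for a texture, addressing is clamp-to-edge, and the 1D weights are row 8 of Pascal's triangle; the helpers `binomial_kernel`, `blur_2d`, `blur_pass` and `blur_separable` are hypothetical names, not taken from any of the linked articles or demos.

```python
import math


def binomial_kernel(row):
    """Hypothetical helper: row `row` of Pascal's triangle, normalized to sum to 1."""
    coeffs = [math.comb(row, k) for k in range(row + 1)]
    total = sum(coeffs)
    return [c / total for c in coeffs]


def clamp(v, lo, hi):
    """Clamp-to-edge addressing (an assumption for this sketch)."""
    return min(max(v, lo), hi)


def blur_2d(img, k):
    """Single pass with the full 2D kernel k[j]*k[i]: len(k)**2 fetches per pixel."""
    h, w, r = len(img), len(img[0]), len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(
                ky * kx * img[clamp(y + j - r, 0, h - 1)][clamp(x + i - r, 0, w - 1)]
                for j, ky in enumerate(k)
                for i, kx in enumerate(k)
            )
    return out


def blur_pass(img, k, horizontal):
    """One separable 1D pass: len(k) fetches per pixel."""
    h, w, r = len(img), len(img[0]), len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if horizontal:
                out[y][x] = sum(kx * img[y][clamp(x + i - r, 0, w - 1)]
                                for i, kx in enumerate(k))
            else:
                out[y][x] = sum(ky * img[clamp(y + j - r, 0, h - 1)][x]
                                for j, ky in enumerate(k))
    return out


def blur_separable(img, k):
    """Horizontal pass, then vertical pass: 2 * len(k) fetches per pixel."""
    return blur_pass(blur_pass(img, k, horizontal=True), k, horizontal=False)


k = binomial_kernel(8)  # 9 taps: 1 8 28 56 70 56 28 8 1, over 256
img = [[float((3 * x + 5 * y) % 7) for x in range(12)] for y in range(10)]
full, sep = blur_2d(img, k), blur_separable(img, k)
max_err = max(abs(a - b) for ra, rb in zip(full, sep) for a, b in zip(ra, rb))
print(len(k) ** 2, 2 * len(k), max_err < 1e-9)  # 81 18 True
```

The two results agree to floating-point round-off because the 2D kernel is just the outer product of the 1D kernel with itself, and clamping is applied per axis, so the double sum factors into two single sums. That factoring is what turns 81 fetches per pixel into 18.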
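The linear-sampling trick and the Pascal's-triangle weights from the post can be illustrated together with a minimal sketch, again in plain Python. It builds 9 weights from a row of Pascal's triangle, folds each pair of neighboring texels into a single interpolated tap, and checks that the resulting 5-tap blur matches the 9-tap discrete one. The folding rule follows from linear interpolation: texels at offsets o1 and o2 with weights w1 and w2 become one tap of weight w1 + w2 placed at (o1·w1 + o2·w2) / (w1 + w2). Assumptions: row 12 with two coefficients trimmed from each end (as far as I can tell, this reproduces the weights and offsets in Daniel's article, but check against it), clamp-to-edge addressing, and `fetch_linear` as a full-precision stand-in for the hardware lerp. All function names are hypothetical.

```python
import math


def pascal_weights(row, trim):
    """Hypothetical helper: row `row` of Pascal's triangle with `trim`
    coefficients dropped from each end, renormalized to sum to 1."""
    coeffs = [math.comb(row, k) for k in range(row + 1)]
    if trim:
        coeffs = coeffs[trim:-trim]
    total = sum(coeffs)
    return [c / total for c in coeffs]


def linear_taps(weights):
    """Fold a symmetric, odd-length kernel into linearly interpolated taps.

    Returns (offsets, tap_weights) for the center tap and one side; the
    other side mirrors it. Texels i and i+1 (weights w1, w2) become one tap
    of weight w1 + w2 at offset (i*w1 + (i+1)*w2) / (w1 + w2).
    """
    side = weights[len(weights) // 2:]  # side[0] is the center weight
    offsets, tap_weights = [0.0], [side[0]]
    i = 1
    while i < len(side):
        if i + 1 < len(side):  # merge texels i and i+1 into one tap
            w1, w2 = side[i], side[i + 1]
            offsets.append((i * w1 + (i + 1) * w2) / (w1 + w2))
            tap_weights.append(w1 + w2)
            i += 2
        else:  # odd texel left over: fetch it at its own center
            offsets.append(float(i))
            tap_weights.append(side[i])
            i += 1
    return offsets, tap_weights


def clamp_index(j, n):
    """Clamp-to-edge addressing for a signal of length n."""
    return min(max(j, 0), n - 1)


def fetch_linear(signal, x):
    """Emulate a linear-filtered 1D fetch at texel position x (full precision)."""
    i = math.floor(x)
    f = x - i
    n = len(signal)
    return signal[clamp_index(i, n)] * (1.0 - f) + signal[clamp_index(i + 1, n)] * f


def blur_discrete(signal, p, weights):
    """One output texel of the plain blur: len(weights) fetches."""
    r, n = len(weights) // 2, len(signal)
    return sum(w * signal[clamp_index(p + k - r, n)] for k, w in enumerate(weights))


def blur_linear(signal, p, offsets, tap_weights):
    """The same texel from folded taps: 1 + 2 * (len(offsets) - 1) fetches."""
    total = tap_weights[0] * fetch_linear(signal, p)
    for o, w in zip(offsets[1:], tap_weights[1:]):
        total += w * (fetch_linear(signal, p + o) + fetch_linear(signal, p - o))
    return total


weights = pascal_weights(12, 2)  # 66 220 495 792 924 792 495 220 66, over 4070
offsets, tap_weights = linear_taps(weights)
print([round(o, 4) for o in offsets])      # [0.0, 1.3846, 3.2308]
print([round(w, 4) for w in tap_weights])  # [0.227, 0.3162, 0.0703]
```

In a shader, each offset would be multiplied by the texel size along the pass direction. Real GPUs interpolate with limited sub-texel precision, so their output can differ very slightly from this full-precision emulation.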