The Mystery of the Convolution Kernel

Richard Jones wrote on 1/9/2010, 2:56 AM

Am I alone in being baffled by the Convolution Kernel?

Maybe it's just me, but I can't make out what it is, what it can be used for or how to use it.

I've seen in different places that both Douglas Spotted Eagle (I do hope he's continuing to make a good recovery) and Glen Chann have suggested that it can be used to sharpen an otherwise unsharp image, but I'm not sure how or why, or whether it has any other functions as well. The only more detailed reference I have been able to find was in this forum posting by Glen on 10 March 2008 at:

http://www.sonycreativesoftware.com/forums/ShowMessage.asp?ForumID=4&MessageID=582552

I have attempted to find an answer by trial and error, but with so many combinations on offer (three lines of three boxes under "Convolution Matrix," three more boxes under "Matrix Operation" and another box to be checked or unchecked) it is difficult to know where to start or how to follow up!

Any help will be much appreciated.

Richard

Comments

MarkWWW wrote on 1/9/2010, 7:35 AM

You can see the kinds of things it can do by just looking at the various presets that are provided - things like blur or sharpen, enhance edges, or create a bumpmap or embossed effect.

What the Convolution Kernel does is to create an output image by applying arithmetical processes to the values of the pixels in the input image.

Specifically, the value of each output pixel is calculated by taking the value of the corresponding input pixel, together with the values of the eight input pixels that surround it, and multiplying each of them by a fixed value (these fixed values are the numbers shown in the Convolution Matrix). These 9 multiplied values are then all added together and multiplied by a scaling factor (Scale), normally chosen to bring the result back into the normal range (to stop the result becoming too dark or too light). (This scaling factor will be worked out automatically for you if you tick the Auto Normalise box.) Then a final adjustment is applied by adding a specific fixed value (the Offset) to the value of the output pixel. (There is one further complication in that the matrix can be rotated with respect to the frame of the picture if required.)
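For anyone who finds code easier to follow than prose, here is a minimal sketch of that arithmetic in Python. It is only an illustration of the multiply, add, scale and offset steps described above, not how Vegas actually implements the filter. The function name `convolve3x3`, the greyscale list-of-rows image format, the way pixels beyond the border are handled and the final clamp to 0..1 are all my assumptions, and the rotation option is left out.

```python
def convolve3x3(image, matrix, scale=1.0, offset=0.0):
    """Apply a 3x3 matrix to a greyscale image.

    image  : list of rows, each a list of floats in the range 0..1
    matrix : 3 rows of 3 numbers (the "Convolution Matrix")
    scale  : multiplier applied to the sum of the 9 products
    offset : value added after scaling
    """
    height, width = len(image), len(image[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Assumption: beyond the border, reuse the nearest edge pixel.
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    total += image[ny][nx] * matrix[dy + 1][dx + 1]
            value = total * scale + offset
            # Assumption: the result is clamped to the displayable range.
            row.append(min(max(value, 0.0), 1.0))
        output.append(row)
    return output


# A matrix with 1 in the centre and 0 elsewhere leaves the picture unchanged.
IDENTITY = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
picture = [[0.1, 0.2],
           [0.3, 0.4]]
unchanged = convolve3x3(picture, IDENTITY)
```

The identity matrix is a handy sanity check: if the output differs from the input, something other than the matrix (Scale or Offset) is changing the picture.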

The result of these arithmetic processes is that each pixel in the output picture is generated from the information in the patch of 9 pixels surrounding its location in the input picture.

I think the easiest way to start to see what this means and how it works is to look at the Blur preset. Here you can see that the centre pixel is being multiplied by 5, and all its 8 neighbours are being multiplied by 4. Once these values are added together the total is then auto-scaled by a factor of 0.027 (8*4 + 5 = 37 and 0.027 = 1/37) to get the result back into the normal range, and I think it is pretty easy to see that what is going to appear at the output pixel is going to be an average of the values of the pixels in the neighbourhood of the input pixel (with a slight emphasis to the centre pixel). And if you imagine what will happen when you apply this averaging process for every pixel in the output image, it is pretty intuitive to see that the result will be a blurred representation of the original image.
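The Blur arithmetic can be checked with a few lines of Python. This is just a sketch working on a single 3x3 patch. The helper names `auto_scale` and `output_pixel` are made up for illustration, and treating Auto Normalise as "one divided by the sum of the matrix" is my reading of the 1/37 figure above rather than documented behaviour.

```python
BLUR = [[4, 4, 4],
        [4, 5, 4],
        [4, 4, 4]]


def auto_scale(matrix):
    """Assumed Auto Normalise rule: 1 / (sum of the matrix values)."""
    total = sum(sum(row) for row in matrix)
    # A matrix that sums to zero cannot be normalised this way; fall back to 1.
    return 1.0 / total if total != 0 else 1.0


def output_pixel(patch, matrix, scale, offset=0.0):
    """Weighted sum of a 3x3 patch, then scale and offset."""
    total = sum(patch[r][c] * matrix[r][c] for r in range(3) for c in range(3))
    return total * scale + offset


scale = auto_scale(BLUR)
print(round(scale, 3))  # 0.027, i.e. 1/37

# A uniform grey patch stays the same grey after averaging.
flat = [[0.5, 0.5, 0.5] for _ in range(3)]
print(round(output_pixel(flat, BLUR, scale), 3))  # 0.5

# A single white pixel on black is spread out: the centre drops to 5/37.
spike = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
print(round(output_pixel(spike, BLUR, scale), 3))  # 0.135
```

The last case shows the blur at work: a pixel that was fully white ends up at about 0.135, and each of its eight neighbours would pick up 4/37 of its brightness.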

Probably the next simplest preset to understand is the Emboss preset. Here you can see that we are only looking at the values of the pixels to the top left and bottom right of each pixel - the values of all the other 6 neighbours and even the value of the pixel itself in the input picture are ignored. If you consider what the effect of the multiplication and adding will be you can see that the value of the output pixel will be positive if the pixel to its upper left was brighter than that to its lower right, and negative if the opposite is the case. For all pixels where the upper left and lower right neighbours are of equal brightness, the value will be zero. (Since we need each value to be in the range between zero and one, we need to add an offset of 0.5 to make sure that this is the case.) Now if we consider what will happen when we apply this arithmetic to every pixel in the output image we can see that we will get large areas of roughly uniform grey (where there was little or no difference between TL and BR), but some areas of light or dark (where there was a difference between TL and BR, i.e. at edges and other features). That is, we will get an outline view of the edges/features in the original picture, such that the outlines are either light or dark depending on the direction of the light gradient at that point. And to the human eye/brain, this looks like an embossed image - the brighter edges are interpreted as being illuminated and the dark edges as being in shadow, under low angle illumination.
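Here is the same idea as a small Python sketch. The matrix values (+1 at top left, -1 at bottom right) and a Scale of 1 are assumptions chosen to match the description above, so check them against the numbers the preset actually shows. The function name `emboss_pixel` is made up for illustration.

```python
# Assumed values: only the top-left and bottom-right entries are non-zero.
EMBOSS = [[1, 0, 0],
          [0, 0, 0],
          [0, 0, -1]]
OFFSET = 0.5  # lifts "no difference" from 0 up to mid grey


def emboss_pixel(patch, matrix=EMBOSS, scale=1.0, offset=OFFSET):
    """Output value for the centre of a 3x3 patch of values in 0..1."""
    total = sum(patch[r][c] * matrix[r][c] for r in range(3) for c in range(3))
    # Assumption: the result is clamped to the displayable range 0..1.
    return min(max(total * scale + offset, 0.0), 1.0)


# No difference between top left and bottom right: mid grey.
flat = [[0.4, 0.4, 0.4] for _ in range(3)]

# Top left brighter than bottom right: lighter than mid grey.
lit = [[0.6, 0.5, 0.5],
       [0.5, 0.5, 0.5],
       [0.5, 0.5, 0.4]]

# Top left darker than bottom right: darker than mid grey.
shadow = [[0.4, 0.5, 0.5],
          [0.5, 0.5, 0.5],
          [0.5, 0.5, 0.6]]

for name, patch in (("flat", flat), ("lit", lit), ("shadow", shadow)):
    print(name, round(emboss_pixel(patch), 2))
```

Note that the centre pixel's own value never enters the sum, which is why the embossed picture loses the original tones and keeps only the edges.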

The Bump preset can be understood as a more extreme version of the Emboss effect, but with the original image information added back in - notice that the value for the centre pixel is now 1 rather than the zero in the Emboss preset. And the various Edge effects can be understood by generalising the Emboss effect to the case where we are looking in every direction to find a difference (gradient), not just in the TL-BR direction.
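To make the difference between these two concrete, here is a rough Python sketch. Both matrices are hypothetical stand-ins that I have made up to fit the description (a stronger TL-BR difference plus a centre of 1 for Bump, and a centre-versus-all-neighbours difference for Edge). They are not the values of the actual presets, so treat the numbers as illustrative only.

```python
# Hypothetical matrices, not the real preset values.
BUMP = [[2, 0, 0],
        [0, 1, 0],      # centre of 1 adds the original pixel back in
        [0, 0, -2]]
EDGE = [[-1, -1, -1],
        [-1, 8, -1],    # compares the centre with all 8 neighbours
        [-1, -1, -1]]


def apply_matrix(patch, matrix, scale=1.0, offset=0.0):
    """Output value for the centre of a 3x3 patch, clamped to 0..1."""
    total = sum(patch[r][c] * matrix[r][c] for r in range(3) for c in range(3))
    return min(max(total * scale + offset, 0.0), 1.0)


flat = [[0.5, 0.5, 0.5] for _ in range(3)]

# A vertical edge: dark column on the left, bright elsewhere.
step = [[0.2, 0.8, 0.8],
        [0.2, 0.8, 0.8],
        [0.2, 0.8, 0.8]]

# A gentle top-left to bottom-right gradient.
slope = [[0.6, 0.5, 0.5],
         [0.5, 0.5, 0.5],
         [0.5, 0.5, 0.4]]

print(round(apply_matrix(flat, BUMP), 2))   # flat area keeps its own value
print(round(apply_matrix(slope, BUMP), 2))  # gradient pushes the pixel brighter
print(round(apply_matrix(flat, EDGE), 2))   # flat area gives no edge response
print(round(apply_matrix(step, EDGE), 2))   # the edge gives a strong response
```

Because the hypothetical Bump matrix sums to 1, flat areas pass through unchanged and only gradients are exaggerated. The hypothetical Edge matrix sums to 0, so flat areas go to black and only the differences survive, whichever direction they run in.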

If you experiment with various values in the matrix, you will be able to create more variants of these kinds of effects. It's probably a good idea to leave the Auto Normalise box ticked unless you deliberately want to lighten or darken the output.

Hope that helps to demystify things a bit.

Mark

TeetimeNC wrote on 1/9/2010, 9:56 AM

Mark, that is very informative, even to someone who sorta thought they understood CK.

Jerry

Richard Jones wrote on 1/10/2010, 2:18 AM

Mark,

That's a fantastic explanation. I'm going to have some fun over the next few days playing around on the lines you suggest. Very many thanks.

Richard