Linear algebra and digital image processing. Part III. Affine transformations

October 21, 2016

In the previous post we presented some examples of the use of linear algebra in digital image processing, specifically in the design of filters. In this post we show more examples, this time related to transformations that do not alter the values of the pixels, but their positions inside the image.

The position of every pixel is determined by its column and row. Thus, for any image we can build a matrix where each entry is a two-dimensional vector (x, y) that describes the position of the corresponding pixel inside the image.

In digital image processing, there is a group of transformations which receive the coordinates of a pixel as input, and return the new coordinates where the pixel should be placed; so that, when computing these transformations for all the pixels of an image, a new image is obtained.
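As a minimal sketch of this idea, the following Python code moves every pixel of a small image to the position returned by a transformation. The names `transform_image`, `transform` and `fill` are assumptions made for this illustration, and the image is assumed to be a plain list of rows. Pushing pixels forward like this can leave holes in the result for some transformations, so practical implementations usually work backwards from the output pixels and interpolate.

```python
def transform_image(image, transform, fill=0):
    """Move every pixel of `image` to the position returned by `transform`.

    `image` is a list of rows: the pixel at column x, row y is image[y][x].
    `transform` (hypothetical callback) takes (x, y) and returns the new (x, y).
    """
    height, width = len(image), len(image[0])
    result = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = transform(x, y)
            new_x, new_y = round(new_x), round(new_y)
            # Pixels that land outside the output image are dropped.
            if 0 <= new_x < width and 0 <= new_y < height:
                result[new_y][new_x] = image[y][x]
    return result


image = [
    [1, 2, 3],
    [4, 5, 6],
]
# Example transformation: move every pixel one column to the right.
shifted = transform_image(image, lambda x, y: (x + 1, y))
print(shifted)  # [[0, 1, 2], [0, 4, 5]]
```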

Here, we are going to focus on one kind of these transformations, the affine transformations, which preserve collinearity (points that lie on a straight line still lie on a straight line after the transformation) and parallelism. Affine transformations map points into new points by applying a composition of translation, rotation, scaling and skewing operations.

Let's go through these basic operations one by one. Before explaining how to define them in terms of matrices, let's see how images can be interpreted geometrically.

Images can be seen as a set of points inside the Cartesian coordinate system, where the coordinates of every point can be determined by the column and row of the corresponding pixel in the image.

[Figure: Cartesian coordinate system with origin (0, 0) and axes x and y]

Instead of the usual Cartesian coordinate system, where y increases when moving up, it's common to use a system where the y-axis is reversed: the top-left corner of the image has the coordinates (0, 0), and x and y increase as we move right and down, respectively.

[Figure: image coordinate system with origin (0, 0), axes x and y, and a pixel at the point (a, b)]

So, geometrically, a pixel is considered as a point (a, b) in the xy-plane.

To represent affine transformations with matrices, we can use homogeneous (projective) coordinates. This means representing a 2-coordinate vector (x, y) as a 3-coordinate vector (x, y, 1), and similarly for higher dimensions. Using this notation, any affine transformation can be expressed as a matrix multiplication:

$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
= T \cdot
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$

Where T is a 3x3 matrix.
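The multiplication above can be sketched in a few lines of Python. The helper name `apply` and the representation of T as a list of three rows are assumptions made for this illustration only.

```python
def apply(T, point):
    """Apply the 3x3 matrix T to the 2D point (x, y), written as (x, y, 1)."""
    x, y = point
    vector = (x, y, 1)
    # Each output component is the dot product of a row of T with (x, y, 1).
    xp, yp, last = (sum(t * v for t, v in zip(row, vector)) for row in T)
    # For an affine matrix the last row is (0, 0, 1), so `last` is always 1.
    return (xp, yp)


identity = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(apply(identity, (3, 5)))  # (3, 5)
```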

Translation

Any translation can be decomposed into two simple movements: horizontal and vertical. The first moves points in the direction of the x-axis, and the second in the direction of the y-axis.

Algebraically, moving a vector along the x-axis is equivalent to adding a constant to its first component (the distance along the x-axis: positive if it's moved to the right, negative otherwise). In the same way, moving a vector along the y-axis is equivalent to adding a constant to its second component, and moving a vector in any direction is equivalent to adding two constants, a and b, to the first and second components of the vector, respectively. Using the 3-component notation, this can be expressed as:

$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} x + a \\ y + b \\ 1 \end{bmatrix}
$$

Thus, the transformation matrix is:

$$
T = \begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix}
$$
[Figure: an image translated by a along the x-axis and by b along the y-axis]
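A small Python sketch of the translation matrix follows. The function names `translation` and `apply` are assumptions made for this illustration.

```python
def translation(a, b):
    """Translation matrix that adds a to x and b to y."""
    return [
        [1, 0, a],
        [0, 1, b],
        [0, 0, 1],
    ]


def apply(T, point):
    """Multiply T by (x, y, 1) and return the first two components."""
    x, y = point
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])


# Move the point (10, 10) four units to the right and two units up
# (y decreases when moving up in image coordinates).
print(apply(translation(4, -2), (10, 10)))  # (14, 8)
```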

Rotation

The transformation matrix is:

$$
T = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

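Where θ is the rotation angle. Notice that when an image is rotated with this matrix, it's rotated about the origin of the coordinate system (0, 0). Because the y-axis points down, a positive θ appears as a clockwise rotation on screen.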

[Figure: an image rotated about the origin (0, 0)]
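As a quick check of the rotation matrix, here is a Python sketch that rotates the point (1, 0) by 90 degrees about the origin. The function names `rotation` and `apply` are assumptions made for this illustration.

```python
import math


def rotation(theta):
    """Rotation matrix about the origin; theta is in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, -s, 0],
        [s, c, 0],
        [0, 0, 1],
    ]


def apply(T, point):
    """Multiply T by (x, y, 1) and return the first two components."""
    x, y = point
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])


x, y = apply(rotation(math.radians(90)), (1, 0))
# The result is (0, 1) up to floating-point error: the point moves from
# the x-axis to the y-axis, which points down in image coordinates.
print(round(x, 6), round(y, 6))
```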

If we want to rotate the image around its centre, we first need to translate the image so that its centre coincides with the origin, then rotate it, and finally translate it back to its original position. The following series of figures illustrates the process.

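[Figures: the image translated so that its centre is at the origin, then rotated, then translated back to its original position]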

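Algebraically, it can be expressed as the multiplication of three matrices: one to translate the image, one to rotate it, and one to translate it back. The matrices appear in reverse order, because the matrix closest to the vector is applied first.

$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & w/2 \\ 0 & 1 & h/2 \\ 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} 1 & 0 & -w/2 \\ 0 & 1 & -h/2 \\ 0 & 0 & 1 \end{bmatrix}
\cdot
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$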

Multiplying the three matrices, we can get a single matrix to make the whole transformation:

 
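$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix}
\cos\theta & -\sin\theta & -\frac{w}{2}\cos\theta + \frac{h}{2}\sin\theta + \frac{w}{2} \\
\sin\theta & \cos\theta & -\frac{w}{2}\sin\theta - \frac{h}{2}\cos\theta + \frac{h}{2} \\
0 & 0 & 1
\end{bmatrix}
\cdot
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$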

Here, w and h are the width and height of the image, respectively.
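The composition can be checked numerically. The following Python sketch multiplies the three matrices and verifies that the centre of the image stays in place. All function names (`translation`, `rotation`, `matmul`, `rotation_about_centre`, `apply`) and the sample values of w, h and θ are assumptions made for this illustration.

```python
import math


def translation(a, b):
    return [[1, 0, a], [0, 1, b], [0, 0, 1]]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_about_centre(theta, w, h):
    # The rightmost matrix is applied first: move the centre to the origin,
    # rotate, then move the centre back.
    return matmul(translation(w / 2, h / 2),
                  matmul(rotation(theta), translation(-w / 2, -h / 2)))


def apply(T, point):
    """Multiply T by (x, y, 1) and return the first two components."""
    x, y = point
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])


w, h, theta = 200, 100, math.radians(30)
T = rotation_about_centre(theta, w, h)
# The centre of the image is the fixed point of the transformation,
# so it stays at (100, 50) up to floating-point error.
centre = apply(T, (w / 2, h / 2))
print(centre)
```

The third column of `T` should agree with the closed-form entries of the single matrix given above.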

Scaling

This transformation resizes an image up or down. The transformation matrix is:

$$
T = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

α and β are the scale factors along the x-axis and y-axis, respectively. For scale factors greater than 1, the image becomes larger along the corresponding axis, and for scale factors between 0 and 1, it becomes smaller.

Notice that scaling changes not only the dimensions of the image but also its position on the plane, since every coordinate is multiplied by the scale factor. So, if you want the resulting image to match up with the origin, you will need to apply a translation after the scale operation.

The following figure shows an example of a scale operation using 1.4 as the factor along the x-axis, and 0.8 along the y-axis.

[Figure: an image scaled by 1.4 along the x-axis and by 0.8 along the y-axis]
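Using the same factors as the figure, a Python sketch of the scale operation could look like this. The function names `scaling` and `apply` and the sample point are assumptions made for this illustration.

```python
def scaling(alpha, beta):
    """Scale matrix with factor alpha along x and beta along y."""
    return [
        [alpha, 0, 0],
        [0, beta, 0],
        [0, 0, 1],
    ]


def apply(T, point):
    """Multiply T by (x, y, 1) and return the first two components."""
    x, y = point
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])


# A point at (100, 50) ends up at approximately (140, 40): both its
# distance from the origin and the image dimensions are scaled.
scaled = apply(scaling(1.4, 0.8), (100, 50))
print(scaled)
```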

Flip

This transformation is useful when you want to change the direction of the image, and it can be done vertically or horizontally. Both can be seen as special cases of scaling (with a scale factor of -1) or, more intuitively, as reflections about one of the axes (the x-axis for a vertical flip, the y-axis for a horizontal flip). To place the resulting image so that it is superimposed on the original one, it's necessary to apply a translation at the end.

Transformation matrix for vertical flip:

$$
T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & h \\ 0 & 0 & 1 \end{bmatrix}
$$

Transformation matrix for horizontal flip:

$$
T = \begin{bmatrix} -1 & 0 & w \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

Here, w and h are also the width and height of the image, respectively.
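Both flip matrices can be tried out with a short Python sketch. The function names `vertical_flip`, `horizontal_flip` and `apply`, and the 200x100 image size, are assumptions made for this illustration.

```python
def vertical_flip(h):
    """Reflect about the x-axis, then translate down by the height h."""
    return [
        [1, 0, 0],
        [0, -1, h],
        [0, 0, 1],
    ]


def horizontal_flip(w):
    """Reflect about the y-axis, then translate right by the width w."""
    return [
        [-1, 0, w],
        [0, 1, 0],
        [0, 0, 1],
    ]


def apply(T, point):
    """Multiply T by (x, y, 1) and return the first two components."""
    x, y = point
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])


w, h = 200, 100
print(apply(horizontal_flip(w), (30, 40)))  # (170, 40)
print(apply(vertical_flip(h), (30, 40)))    # (30, 60)
```

Applying the same flip twice returns every point to its original position, which is a simple way to check the matrices.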

Skewing

When done in the x-axis direction, this transformation, also known as shearing, displaces each point horizontally by an amount proportional to its y coordinate. Graphically, lines parallel to the x-axis remain where they are, but vertical lines become oblique, depending on the skew factor, which is the tangent of the angle by which the vertical lines tilt, called the skew angle.

When done in the y-axis direction, the opposite occurs: lines parallel to the y-axis remain where they are, but horizontal lines become oblique.

The following figure shows an example of a 30 degree skew in the x-axis.

[Figure: an image skewed by 30 degrees in the x-axis direction]

The skew matrix in the x-axis direction is:

$$
T = \begin{bmatrix} 1 & \tan\theta & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

And the skew matrix in the y-axis direction is:

$$
T = \begin{bmatrix} 1 & 0 & 0 \\ \tan\theta & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

Notice that when skewing an image with either of the above matrices, it's done about the origin. If we want to skew an image around its centre, we need to follow the same procedure explained for the rotation.

In the next post we'll show some practical examples in JavaScript of all the theory seen so far.
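As an illustration of the skew in the x-axis direction, the following Python sketch applies a 30 degree skew to two points. The function names `skew_x` and `apply` and the sample points are assumptions made for this illustration.

```python
import math


def skew_x(theta):
    """Skew matrix in the x-axis direction; theta is in radians."""
    return [
        [1, math.tan(theta), 0],
        [0, 1, 0],
        [0, 0, 1],
    ]


def apply(T, point):
    """Multiply T by (x, y, 1) and return the first two components."""
    x, y = point
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])


T = skew_x(math.radians(30))
# A point on the x-axis (y = 0) does not move.
on_axis = apply(T, (50, 0))
# A point at y = 100 is displaced horizontally by 100 * tan(30 degrees),
# roughly 57.7, while its y coordinate is unchanged.
displaced = apply(T, (0, 100))
print(on_axis, displaced)
```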
