OpenCV & Python – Image Pyramids

OpenCV & Python - Image Pyramid header

In Image Analysis we often always operate on the same image, keeping its size constant. However, there are some particular processes that require the generation of the same image in a series of versions at different resolutions. It is often possible to notice differences in the behavior of the techniques used, such as object recognition, at different resolutions. This technique is called Image Pyramid.

The Image Pyramid technique

This technique takes this name because of the series of images generated with different resolution (and also different size) which, stacked one on top of the other, form a pyramid. There are two types of Image Pyramids techniques:

  • Gaussian Pyramid
  • Laplacian Pyramid

At the highest level of the pyramid we have the image with lower resolution, with a few pixels and smaller in size than the one below, and gradually going down we will have images with increasingly higher resolution, with larger dimensions until we reach the base of the pyramid which is precisely the starting image (the one with maximum resolution). But let’s see in detail how these two techniques work and how they differ from each other.

Gaussian Pyramid

For the processing of a Gaussian Pyramid, there are two work phases:

  • blurring
  • downsampling

Creating a Gaussian Pyramid begins by applying a blur filter to the original image, i.e. the Blurring phase. This filter is generally a Gaussian kernel that reduces the high-frequency information in the image. The formula for Gaussian blur is:

 G(x, y) = \frac{1}{16}\begin{bmatrix}1 & 2 & 1 \2 & 4 & 2 \1 & 2 & 1\end{bmatrix}\ast I(x, y)

Where I(x, y) is the intensity of the original image and \ast represents the convolution operation.

After blurring, the image is reduced in size through the downsampling operation. This step reduces the resolution of the image.

The formula for downsampling is:

 I_{\text{downsampled}}(x, y) = I(2x, 2y)

Where I_{\text{downsampled}}(x, y) is the resulting image after downsampling and I(2x, 2y) is the intensity of the image original in half indices.

Laplacian Pyramid

The Laplacian Pyramid is a further elaboration of the Gaussian one, and is instead based on the difference between the successive levels of the Gaussian Pyramid.

 L_i = G_i - \text{expand}(G_{i+1})

Where L_i is the corresponding level in the Laplacian Pyramid, G_i is the corresponding level in the Gaussian Pyramid, and \text{expand}(\cdot) is the upsampling operation.

The upsampling operation is the reverse process of downsampling. In the case of the Gaussian Pyramid, upsampling involves repeating the rows and columns, followed by a blur filter application.

The formula for upsampling is:

 \text{expand}(G_i) = 4 \ast G_i

Where \ast is the convolution operation.

In summary, the Gaussian Pyramid is obtained by applying blur and downsampling iteratively, while the Laplacian Pyramid is obtained by subtracting the expanded versions of the Gaussian Pyramid from the layers themselves. This process allows the image to be represented at different resolution scales.

Developing an example of the Gaussian Pyramid technique with Python

To better understand this technique there is nothing better than applying these concepts directly through Python code and the OpenCV library. First, let’s choose an image that we will use as the basis for this technique. Let’s take for example this image Torino.jpg which depicts the Mole Antonelliana in Turin. You can download the following image to use for examples.

Torino

At this point we start with a part of the code that will load the image into memory and prepare it for Gaussian Pyramid processing.

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Function to view images
def display_images(images, titles):
    plt.figure(figsize=(12, 6))
    for i in range(len(images)):
        plt.subplot(2, 3, i+1)
        plt.imshow(cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB))
        plt.title(titles[i])
        plt.xticks([]), plt.yticks([])
    plt.show()

img = cv2.imread('Torino.jpg')

In the code above we have defined a function that will be useful for showing 6 versions at different resolutions of our example image which will be the result of the Image Pyramid with the Gaussian method.

# Create a Gaussian Pyramid
gaussian_pyramid = [img]
for i in range(5):
    img = cv2.pyrDown(img)
    gaussian_pyramid.append(img)

# Display the Gaussian Pyramid
titles = ['Original Image'] + [f'Level {i}' for i in range(1, 7)]
display_images(gaussian_pyramid, titles)

The code to implement the Gaussian Pyramid is very simple with a 5-iteration for loop to produce a total of 6 images (one is the original resolution one). Everything is quite simple, because the calculation is all done with a single cv2.pyrDown() function which accepts the highest resolution image as an argument and then does a downsampling. At the end of the cycle we display all 6 imagesi.

Image Pyramid - Gaussian Pyramid result

Developing an example of the Laplacian Pyramid technique with Python

Now we can continue using the previous code to apply subsequent processing to the results obtained to obtain a Laplacian Pyramid.

# Create a Laplacian Pyramid
laplacian_pyramid = [gaussian_pyramid[5]]
for i in range(5, 0, -1):
    gaussian_expanded = cv2.pyrUp(gaussian_pyramid[i])
    
    # Scale the expanded image to match the dimensions of gaussian_pyramid[i-1]
    rows, cols, _ = gaussian_pyramid[i-1].shape
    gaussian_expanded = cv2.resize(gaussian_expanded, (cols, rows))
    
    laplacian = cv2.subtract(gaussian_pyramid[i-1], gaussian_expanded)
    laplacian_pyramid.append(laplacian)

# View the Laplacian Pyramid
titles = [f'Level {i}' for i in range(6)]
display_images(laplacian_pyramid, titles)

We therefore apply the subtractive method with the cv2.subtract() method, again through a for loop which will first make the dimensions of the images at the different levels of the pyramid identical and then carry out the upsampling.

Image Pyramid - Laplacian Pyramid result

Applications of the Image Pyramid method

can apply this method in some computer vision and image processing applications. The main purposes include:

  • Object Detection at Different Scales:
    Image Pyramids are often used for object detection at different scales within an image. For example, when using object detection algorithms such as keypoint detection or edge detection, working at different scales can increase the robustness of the detection.
  • Movement Tracking:
    In the analysis of video sequences, Image Pyramids can be used to track moving objects at different resolution scales. This can improve tracking reliability, especially when objects move closer or further away from the camera.
  • Image Compression:
    Image Pyramids can be used for image compression. For example, JPEG2000 uses a pyramid approach to represent different image resolutions, allowing for greater compression with the ability to decode and display portions of the image at different resolutions.
  • Image Reconstruction:
    The Laplacian Pyramid in particular is used for image decomposition and reconstruction. This can be useful in applications such as lossless compression and image editing.
  • Optimization and Multiresolution Filters:
    Image Pyramids are also involved in optimization algorithms and multi-resolution filters, where information at different scales is exploited to improve the quality of the results.

In general, the use of Image Pyramid allows you to effectively manage information at different resolution scales, helping to improve the performance of various image processing and computer vision algorithms.

Leave a Reply