Common Operations on Images
Image processing is fascinating. After spending some time with it, I realized that most of the image editing you do with third-party tools, and much more, can be done with OpenCV. Many special effects applied by your camera app can be reproduced with OpenCV in just a few lines of code.
This is my small attempt to describe the most common operations performed using OpenCV.
The “CV” in OpenCV stands for “Computer Vision”, and “Open” refers to it being open source. So OpenCV is Open Source Computer Vision. OpenCV is written in C++, but worry not: it has wrappers for Python, MATLAB, Java, etc.
Installation on Linux is very easy. If you just want to install OpenCV for Python (which will suffice in our case), this is the command:
pip3 install opencv-python
And that’s it!! You can start playing with your images now…
Let’s quickly go through reading and writing images through OpenCV:
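import cv2
# Reading (cv2.imread returns None if the file cannot be read)
input_image = cv2.imread("/path/to/input/image")
if input_image is None:
    raise FileNotFoundError("Could not read the input image")
# Perform some operations on the image. Suppose you get 'converted_image'
# Writing (the file extension of the output path selects the image format)
cv2.imwrite("/path/to/output/image", converted_image)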
Converting to Grayscale
Grayscale images are very useful when you are performing text extraction or OCR on document images. OCR engines often perform better on grayscale images than on colored ones.
gray_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2GRAY)
Here is what the input and output images look like:
A point to note here is that colored images (BGR) in OpenCV are represented as a 3-dimensional array (n x m x 3) in memory, while grayscale images are represented as a 2-dimensional array (n x m). You can access a pixel value by its row and column indices. For a BGR image, this returns an array of Blue, Green, Red values. For a grayscale image, just the corresponding intensity is returned.
# shape gives the dimensions of the image
print(input_image.shape)
print(gray_image.shape)
# This returns the [Blue, Green, Red] values of the pixel at row index 50 and column index 80 (indices start at 0)
bgr_px = input_image[50, 80]
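To make the array layout concrete without needing an image file, here is a small sketch in plain NumPy. The synthetic image and the helper name to_gray_sketch are my own illustrations, not part of OpenCV. The weights are the standard luma weights that OpenCV documents for BGR-to-gray conversion, though the library's exact rounding may differ slightly from this version.

```python
import numpy as np

# Synthetic 2 x 3 BGR image (rows x columns x channels), standing in for
# what cv2.imread would return. This array is an assumption for illustration.
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[0, 0] = (255, 0, 0)      # pure blue   (B, G, R)
bgr[0, 1] = (0, 255, 0)      # pure green
bgr[0, 2] = (0, 0, 255)      # pure red
bgr[1, :] = (255, 255, 255)  # a row of white pixels


def to_gray_sketch(bgr_image):
    """Hypothetical helper: weighted sum of the B, G, R channels."""
    b = bgr_image[..., 0].astype(np.float64)
    g = bgr_image[..., 1].astype(np.float64)
    r = bgr_image[..., 2].astype(np.float64)
    gray_float = 0.114 * b + 0.587 * g + 0.299 * r
    return np.round(gray_float).astype(np.uint8)


gray = to_gray_sketch(bgr)

print(bgr.shape)    # three dimensions: rows, columns, channels
print(gray.shape)   # two dimensions: rows, columns
print(bgr[0, 1])    # an array of three values for one pixel
print(gray[0, 1])   # a single intensity for the same pixel
```

Notice that green contributes the most to the gray value and blue the least, which is why a pure green pixel comes out much brighter than a pure blue one.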
Thresholding
In thresholding, every pixel of the image is assigned a value on the basis of a threshold. If the pixel value is greater than the threshold, it is set to a maximum value (generally 255); otherwise, it is set to 0. Thresholding is generally done on grayscale images to convert them to binary images (containing only the pixel values 0 and 255).
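# THRESH_OTSU picks the threshold automatically, so the 0 passed here is ignored;
# 'ret' holds the threshold that was chosen
ret, thresh1 = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("thresh.png", thresh1)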
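If you want to see the basic rule on its own, here is a minimal NumPy sketch of binary thresholding with a fixed threshold. The sample array and the value 127 are assumptions chosen for illustration. The comparison is strictly greater-than, which is how OpenCV's binary mode treats a pixel equal to the threshold.

```python
import numpy as np

# Tiny grayscale "image" made up for this example.
gray = np.array([[10, 120, 127],
                 [128, 200, 255]], dtype=np.uint8)

threshold = 127   # assumed fixed threshold
max_value = 255   # value given to pixels above the threshold

# Pixels strictly greater than the threshold become max_value, the rest become 0.
binary = np.where(gray > threshold, max_value, 0).astype(np.uint8)

print(binary)
```

The pixel with value 127 ends up black here, because it is not strictly greater than the threshold.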
Thresholding is used to separate the light and dark parts of an image. It is generally applied before running OCR on an image, and because it forces every pixel to pure black or white, it can also make soft, slightly blurred edges look crisper.
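The OpenCV call above uses the Otsu flag, which chooses the threshold for you. As a rough idea of what that means, here is a sketch of Otsu's method in NumPy: it tries every possible threshold and keeps the one that best separates the dark and light pixels (the largest between-class variance). The function name otsu_threshold_sketch and the test image are my own, and OpenCV's implementation may differ in details such as how ties are resolved.

```python
import numpy as np


def otsu_threshold_sketch(gray):
    """Hypothetical helper: return the threshold (0-255) that maximizes
    the between-class variance of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)

    count_low = np.cumsum(hist)             # pixels with value <= t
    count_high = hist.sum() - count_low     # pixels with value > t
    sum_low = np.cumsum(hist * levels)      # intensity sum of the low class
    sum_high = sum_low[-1] - sum_low        # intensity sum of the high class

    between = np.zeros(256)
    valid = (count_low > 0) & (count_high > 0)   # both classes must be non-empty
    mean_low = sum_low[valid] / count_low[valid]
    mean_high = sum_high[valid] / count_high[valid]
    between[valid] = count_low[valid] * count_high[valid] * (mean_low - mean_high) ** 2
    return int(np.argmax(between))


# Made-up image with two clear groups of pixels: dark (50) and light (200).
sample = np.array([[50, 50, 200, 200],
                   [50, 50, 200, 200]], dtype=np.uint8)

t = otsu_threshold_sketch(sample)
binary_otsu = np.where(sample > t, 255, 0).astype(np.uint8)
print(t)
print(binary_otsu)
```

Any threshold between the two groups separates them equally well, so the sketch simply returns the first one it finds.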
Inversion
Inversion simply means inverting the pixel values. It is generally done on binary images: after inversion, white pixels become black and black pixels become white.
We can invert an image using the bitwise_not function of OpenCV:
image_inv = cv2.bitwise_not(input_image)
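For 8-bit images, flipping every bit of a pixel is the same as subtracting its value from 255. Here is a quick NumPy check of that, using a made-up 2 x 2 binary image:

```python
import numpy as np

# Made-up binary image: only 0 (black) and 255 (white).
binary = np.array([[0, 255],
                   [255, 0]], dtype=np.uint8)

inverted_by_subtraction = 255 - binary        # arithmetic view of inversion
inverted_by_bits = np.bitwise_not(binary)     # bit-flipping view of inversion

print(inverted_by_subtraction)
print(np.array_equal(inverted_by_subtraction, inverted_by_bits))
```

The same identity holds for any 8-bit value, not just 0 and 255, so inverting a grayscale image gives you its negative.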
Blurring or Smoothing
You might be wondering why you would blur or smooth a sharp image. While blur is undesirable in the photos you capture with your camera, it is quite useful in image processing, for example to reduce noise before further steps such as thresholding.
Here is the syntax to perform average blurring:
blurImg = cv2.blur(input_image,(10,10))
It simply takes the average of all the pixels under the kernel area and replaces the central pixel with that average. Here we have defined the kernel size as (10, 10). You can try different values and compare the results.
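To see the averaging idea in isolation, here is a small NumPy sketch of a box blur on a grayscale array. The function name box_blur_sketch is hypothetical, and I am assuming an odd kernel size and mirrored borders to keep it short. cv2.blur has its own border handling and optimizations, so treat this as an illustration of the idea only.

```python
import numpy as np


def box_blur_sketch(gray, ksize=3):
    """Hypothetical helper: replace each pixel by the mean of the
    ksize x ksize neighborhood centered on it (ksize assumed odd)."""
    pad = ksize // 2
    # Mirror the image at its borders so edge pixels also have full neighborhoods.
    padded = np.pad(gray.astype(np.float64), pad, mode="reflect")
    h, w = gray.shape
    total = np.zeros((h, w), dtype=np.float64)
    # Add up the shifted copies of the image, one per kernel position.
    for dy in range(ksize):
        for dx in range(ksize):
            total += padded[dy:dy + h, dx:dx + w]
    return total / (ksize * ksize)


# Made-up image: a single bright pixel in the middle of a dark 5 x 5 image.
impulse = np.zeros((5, 5), dtype=np.uint8)
impulse[2, 2] = 9

blurred = box_blur_sketch(impulse, ksize=3)
print(blurred)
```

The single bright pixel gets spread evenly over its 3 x 3 neighborhood, which is exactly the smoothing effect you see on a real image.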
Morphological Operations
Morphological operations are a set of operations that process images based on shapes. They apply a structuring element or kernel to an input image and generate an output image. The two most common morphological operations are Erosion and Dilation. Erosion shrinks the bright (white) regions of an image, while dilation grows them.
Erosion
import cv2
import numpy as np
img = cv2.imread('input_img.png')
# Matrix of size 5 as the kernel
kernel = np.ones((5,5), np.uint8)
img_erosion = cv2.erode(img, kernel, iterations=1)
cv2.imwrite("eroded.png",img_erosion)
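Under the hood, erosion with a kernel of ones amounts to taking the minimum pixel value under the kernel. Here is a NumPy sketch of that idea on a single-channel image. The name erode_sketch, the 3 x 3 default size, and the choice to pad the border with white are my assumptions for illustration, not OpenCV's actual code.

```python
import numpy as np


def erode_sketch(image, ksize=3):
    """Hypothetical helper: each output pixel is the minimum of the
    ksize x ksize neighborhood around it (all-ones kernel, ksize odd)."""
    pad = ksize // 2
    # Pad with white so the image border itself does not erode anything.
    padded = np.pad(image, pad, mode="constant", constant_values=255)
    h, w = image.shape
    out = np.full((h, w), 255, dtype=image.dtype)
    for dy in range(ksize):
        for dx in range(ksize):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out


# Made-up image: a 3 x 3 white square on a black 7 x 7 background.
square = np.zeros((7, 7), dtype=np.uint8)
square[2:5, 2:5] = 255

eroded = erode_sketch(square)
print(eroded)
```

Only the center of the square survives, because it is the only pixel whose whole neighborhood is white. This is why erosion is handy for removing small white specks.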
Dilation
import cv2
import numpy as np
img = cv2.imread('input_img.png')
# Matrix of size 5 as the kernel
kernel = np.ones((5,5), np.uint8)
img_dilation = cv2.dilate(img, kernel, iterations=1)
cv2.imwrite("dilated.png",img_dilation)
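Dilation is the mirror image of erosion: with a kernel of ones, each pixel becomes the maximum value under the kernel. Here is the matching NumPy sketch. Again, the name dilate_sketch, the 3 x 3 default size, and the black border padding are assumptions made for this example.

```python
import numpy as np


def dilate_sketch(image, ksize=3):
    """Hypothetical helper: each output pixel is the maximum of the
    ksize x ksize neighborhood around it (all-ones kernel, ksize odd)."""
    pad = ksize // 2
    # Pad with black so the image border does not add any white pixels.
    padded = np.pad(image, pad, mode="constant", constant_values=0)
    h, w = image.shape
    out = np.zeros((h, w), dtype=image.dtype)
    for dy in range(ksize):
        for dx in range(ksize):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out


# Made-up image: one white pixel in the middle of a black 7 x 7 image.
dot = np.zeros((7, 7), dtype=np.uint8)
dot[3, 3] = 255

dilated = dilate_sketch(dot)
print(dilated)
```

The single white pixel grows into a 3 x 3 white square, which is why dilation is useful for closing small gaps in text or shapes.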
These were some basic operations on images using OpenCV. Hope you find them useful!