I’ve been working with a non-profit, GreenStand, helping them with image processing tasks and related machine learning features for the technology part of their mission.
GreenStand is (really) making the world a better place – walkin’ the walk. In short, they’re investing in people and local communities by encouraging them, financially and otherwise, to plant trees, because many of these communities have been deforested, for fuel and other reasons. You can read more about their mission here.
When I approached them about their needs, one of the first things that came up was how to identify “bad” photos. First, one has to define what a “bad” photo is. It turns out that the planters upload a photo of the newly planted seedling to demonstrate to the group that it has been planted. Since the cell phones used are often old, with low-resolution cameras, and the photos are submitted over a typically spotty cell phone connection, the photos can be out of focus or distorted. Sometimes people upload the wrong photo, so the photo might be of a car, or some random scene. In addition, the photos are not taken in a way that would make identification easier – with a consistent pose and focal length, or with stray vegetation removed from around the seedling.
Duplicate photos
Another problem to be solved – and this is where the machine learning / deep learning comes in – is caused by people uploading in batches, so the same photos are uploaded over and over again. Knowing the basic type of tree (if not the exact species) and the geolocation in the photo’s metadata, we can flag it as a potential duplicate and keep from overloading the server with data.
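To make the geolocation part of that concrete, here is a rough Matlab sketch of how a duplicate check might flag uploads by proximity – the function name, the 10 meter tolerance, and the haversine calculation are my own assumptions for illustration, not GreenStand’s actual pipeline:

function isDup = flagPotentialDuplicate(newLat, newLon, knownLat, knownLon)
% Rough sketch (illustrative only): flag an upload as a potential duplicate
% if its geotag falls within a few meters of a photo already on record.
radiusMeters = 10;                              % assumed tolerance, not a tuned value
R = 6371000;                                    % mean Earth radius in meters
dLat = deg2rad(knownLat - newLat);
dLon = deg2rad(knownLon - newLon);
% Haversine distance from the new photo to every known photo
a = sin(dLat/2).^2 + cos(deg2rad(newLat)) .* cos(deg2rad(knownLat)) .* sin(dLon/2).^2;
d = 2 * R * asin(sqrt(a));
isDup = any(d < radiusMeters);                  % true if any prior photo is this close
end

In practice the tree type from the metadata would narrow the candidate set before the distance check, so the comparison stays cheap.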
Image processing for any task is completely dependent on the data – the approaches you take, the algorithms you choose, and the results you can attain all depend on the data you are given. Fortunately, the folks at GreenStand have been up and running long enough to give me over 50,000 photos to start with. I’m going to be doing prototyping for this project with Matlab. Ultimately, to run on the server – likely a cloud-based solution – I will have to port the code over to an open-source stack, probably Python and OpenCV. But for rapid prototyping of image processing and machine learning, you really cannot beat Matlab, in my opinion. It’s pricey, but it saves a lot of time.
The latest version of Matlab that I’m using, R2018b, has an integrated image browser “app” that lets the user browse a folder of images and load them into various other apps, or just import them into the user’s Matlab workspace. I loaded up the images and browsed through them, getting an idea of what the “good” or “bad” image characteristics might be.
The image browser lets you load images directly into other “apps”, such as the color segmentation app. This will come in very handy later. But for now I’m just looking at the out-of-focus images.
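If you’d rather do that browsing from a script than from the app, something along these lines should work in recent Matlab releases (the folder path is just a placeholder):

% Open the Image Browser app on a folder of seedling photos
imageBrowser('C:\greenstand\photos');

% Or walk the same folder programmatically with an imageDatastore
imds = imageDatastore('C:\greenstand\photos', 'FileExtensions', {'.jpg', '.png'});
while hasdata(imds)
    img = read(imds);       % next image in the folder
    imshow(img);            % quick visual check
    pause(0.5);
end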
“Focus” algorithms
Determining if an image is out of focus is really determining the sharpness of the image – how “crisp” the image is. This means measuring contrast. Quite a few algorithms have been developed for this purpose. After reading up on the literature and testing a few, I decided on the “Brenner Focus Algorithm”.
https://cs.uwaterloo.ca/~vanbeek/Publications/spie2014.pdf
I implemented a variation on this algorithm: at each pixel, take whichever of the horizontal and vertical gradients has the larger magnitude, square it, and average the result over the image:
\[ R_r = \{1,\dots,\text{rows}\}, \qquad R_c = \{1,\dots,\text{cols}\} \]
\[ \forall r \in R_r,\ \forall c \in R_c: \quad \nabla_H(r,c) = I(r,c+2) - I(r,c), \qquad \nabla_V(r,c) = I(r+2,c) - I(r,c) \]
\[ M(r,c) = \max\big(\lvert\nabla_H(r,c)\rvert,\ \lvert\nabla_V(r,c)\rvert\big)^2 \]
\[ F = \operatorname{mean}(M) \]
Matlab Code
function [Measure] = brenners(Image)
% BRENNERS  Brenner-style focus measure for a single grayscale image.
%   Larger values indicate a sharper (better focused) image.
[M, N] = size(Image);
DH = zeros(M, N, 'single');          % horizontal gradient buffer
DV = zeros(M, N, 'single');          % vertical gradient buffer
DHG = gpuArray(DH);
DVG = gpuArray(DV);
IG = gpuArray(single(Image));        % cast to single so the differences can go negative
% Differences two pixels apart in each direction (Brenner gradient)
DVG(1:M-2,:) = IG(3:end,:) - IG(1:end-2,:);
DHG(:,1:N-2) = IG(:,3:end) - IG(:,1:end-2);
Mx = max(abs(DHG), abs(DVG));        % larger-magnitude gradient at each pixel
M2 = Mx.^2;
M2CPU = gather(M2);                  % bring the result back from the GPU
Measure = mean2(M2CPU);              % average over all pixels
end
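For completeness, here is how I would call the function from a script – the filename is a placeholder, and the grayscale conversion is my own assumption, since the metric expects a single intensity plane:

img = imread('seedling_0001.jpg');      % placeholder filename
if size(img, 3) == 3
    img = rgb2gray(img);                % collapse RGB to a single intensity plane
end
focusMeasure = brenners(img);           % larger value => sharper image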
Java Code
// Compute average of sum of squares of the gradient in H and V directions.
private static double brennersFocusMetric(int[][] input, int rows, int cols) {
    int[][] V = new int[rows][cols];
    int[][] H = new int[rows][cols];
    for (int row = 0; row < rows; row++) {
        for (int col = 0; col < cols - 2; col++) {
            int grad = input[row][col + 2] - input[row][col];
            H[row][col] = grad;
        }
    }
    for (int row = 0; row < rows - 2; row++) {
        for (int col = 0; col < cols; col++) {
            int grad = input[row + 2][col] - input[row][col];
            V[row][col] = grad;
        }
    }
    double sum = 0;
    for (int row = 0; row < rows; row++) {
        for (int col = 0; col < cols; col++) {
            double HRC = H[row][col];
            double VRC = V[row][col];
            sum += Math.abs(HRC) > Math.abs(VRC) ? HRC * HRC : VRC * VRC;
        }
    }
    return sum / (double) (rows * cols);
}
The above code finds the “bad” photos well – based on my tests, it eliminated all of the out-of-focus photos, with a false positive rate of less than 5 percent. Some of these false positives were the result of dual focus – the seedling was in focus, but the background was not. The algorithm returns a number that describes a measure, somewhat subjective, of the “goodness” of focus. One will need to experiment to find the range of values best suited to their own application. Of course, more parameters of what is “bad” can be used to refine the meaning of the term. The goal is to warn the user that their image of the seedling is not good, and they should retake the photo.
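One way to find that range is to score a whole folder of photos and look at the distribution. The sketch below is illustrative only – the folder path and the 10 percent cutoff are assumptions, not values I’m recommending:

% Score every photo in a folder and inspect the spread of focus measures
imds = imageDatastore('C:\greenstand\photos');
scores = zeros(numel(imds.Files), 1);
for k = 1:numel(imds.Files)
    img = readimage(imds, k);
    if size(img, 3) == 3
        img = rgb2gray(img);
    end
    scores(k) = brenners(img);
end
histogram(scores);                              % eyeball where blurry separates from sharp
sorted = sort(scores);
cutoff = sorted(ceil(0.10 * numel(sorted)));    % e.g. flag the lowest-scoring 10% for review
badIdx = find(scores < cutoff);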
Next….
In the next post of this four-part series, I’ll go into segmenting the images and preparing them for a Support Vector Machine that classifies the images into four basic types…