5/10/19

Simplified Algorithm to find largest Color Blob in a Grid (in Python and C++)

I have taught how to use algorithms and data structures for years as a professor. And many students think that just learning about algorithm and data structure is about choosing the correct implementation of a queue, search, standard graph algorithm, etc. from the either a library or canned code they learned in their class.

The best students can use a deep knowledge of how each algorithm and data structure works and then design variations or combinations of the original structures and algorithms to solve a problem. This shows an real application of knowledge and the ability to think out of the box that interviewers look for.

This article will show how I created a custom algorithm that combines a 2d array, and linked list, and some other 'tricks' to solve the problem of given a 2d grid where the cells of the grid are colored(based in integer in grid). I believe this is a very efficient, and with a small amount of code to solve this problem.

I was inspired to develop this method after the article: Bet you can't solve this Google interview question! by Kevin Ghadyani and then also by the Geeksforgeeks article: Largest connected component on a grid .

The Problem:

Find the size of the largest contiguous color area (the number cells in the color blob). For example given this grid with 4 different colors

[[1, 4, 4, 4, 4, 3, 3, 1], 
 [2, 1, 1, 4, 3, 3, 1, 1], 
 [3, 2, 1, 1, 2, 3, 2, 1], 
 [3, 3, 2, 1, 2, 2, 2, 2], 
 [3, 1, 3, 1, 1, 4, 4, 4], 
 [1, 1, 3, 1, 1, 4, 4, 4]]

... which represents this colored grid:

You can see the biggest color group are the underlined 1's, there are 9 of them

The approach:

There are a lot of ways to approach this problem. Most articles will create a graph and do depth first search recursivly or bredth first search. You need to keep track of what cells have been visited and have a way to pick the next un-processed cell.

When every you are working a problem with a grid that is represented by a 2d array, you don't need to use a generalized graph alorithm such as a graph library that keeps a map of nodes and list of adjacent nodes. The indexes to the 2d array serve as a constant time lookup, and the adjacentcy list is just the four directions left, right, top or bottom limited just limited at the outer bounds of the gird. You do need an effencient way to keep track of visited and to pick a unvisited cell to continue. You can easily do this by having each cell(node) have composit data of not just the color number, but a visited flag.

Below is both a python 3.6 version and a C++ version:

Python 3.6:

C++:

Hope you enjoyed this. -- Professor Jenkins

Check out my youtube channel https://www.youtube.com/user/gjenkinslbcc

No comments:

Post a Comment

Please comment or give suggestions for follow up articles