A Simple Implementation of Filtering Logic in Go

Sun Dec 09, 2018
go golang tutorial filtering

Introduction

I’ve come across several cases lately where I needed to filter a slice of records based on business logic requirements. At first pass, the simplest solution may be to filter in-line: loop through your set of records and discard whatever doesn’t match your business requirements. However, this can quickly become unmanageable when you need to apply completely different filters in different situations. Code gets duplicated all over the place, filters get messy, the flow of the program becomes difficult to follow, testing becomes much more difficult, and the next engineer on the project will be sad.

At this point, you may turn to one of the many Go libraries that provide map/reduce-esque functions with promises of making all your filtering woes vanish. Choosing a filtering library comes at a cost though - libraries have to be type-agnostic, so they generally accept interface{} values and use the reflect package under the hood to inspect types at runtime, which is not ideal for performance. For the cases I’ve come across, it’s simply not necessary to use a library to assist with filtering.

In this post, we’ll walk through a basic, modular filtering approach that I’ve used with success. It implements filters for a single type, so it’s inflexible in that regard, but each filter is easy to unit test, and each set of business requirements can chain filters together in whatever combination makes sense.

You can find all the source code for this example on my GitHub.

Filtering in Go

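The first thing that we’ll do is define several helper types (all of the snippets in this post belong to a single main package that imports fmt, strconv, and strings):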

// Filter is a filter function applied to a single record.
type Filter func(string) bool

// FilterBulk is a bulk filter function applied to an entire slice of records.
type FilterBulk func([]string) []string

Filter is a single filter function. Each record in a slice will be checked against the filter function and kept if it returns true or discarded if it returns false. The filter takes a string as its argument - supporting another record type would mean re-implementing these types and functions for that type.

FilterBulk is a similar type, but it applies to the entire slice of records rather than to an individual record. This is useful when the decision to keep a record depends on the other records in the set - the most obvious example is de-duplication.

Each one of our filters will implement one of these two types, and then we can chain them together to create sets of filters.

Next we need a way to apply these filters:

// ApplyFilters applies a set of filters to a record list.
// Each record will be checked against each filter.
// The filters are applied in the order they are passed in.
func ApplyFilters(records []string, filters ...Filter) []string {
	// Make sure there are actually filters to be applied.
	if len(filters) == 0 {
		return records
	}

	filteredRecords := make([]string, 0, len(records))

	// Range over the records and apply all the filters to each record.
	// If the record passes all the filters, add it to the final slice.
	for _, r := range records {
		keep := true

		for _, f := range filters {
			if !f(r) {
				keep = false
				break
			}
		}

		if keep {
			filteredRecords = append(filteredRecords, r)
		}
	}

	return filteredRecords
}

// ApplyBulkFilters applies a set of filters to the entire slice of records.
// Used when each record filter requires knowledge of the other records, e.g. de-duping.
func ApplyBulkFilters(records []string, filters ...FilterBulk) []string {
	for _, f := range filters {
		records = f(records)
	}

	return records
}

These two functions take the slice of records followed by a variadic list of Filter or FilterBulk values. ApplyFilters ranges over each record in the slice and passes that record to each filter in turn. If the record passes every filter, it is kept; as soon as one filter rejects it, the remaining filters are skipped and the record is discarded. ApplyBulkFilters instead ranges over the set of filters and passes the entire slice of records to each bulk filter, feeding the output of one into the next. I chose for loops rather than recursion here because recursion can be expensive in a situation like this: if you recurse through each filter and each record, every call adds a stack frame and allocations build up, and for large sets of filters or records this can begin to cause performance issues.
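To make the ordering and early-exit behavior concrete, here is a small sketch that repeats the two apply functions from above and runs them with a few made-up filters. The names notEmpty, noSpaces, and firstTwo are hypothetical and exist only for this illustration; they are not part of the original source.

```go
package main

import (
	"fmt"
	"strings"
)

// Filter and FilterBulk mirror the types defined earlier in the post.
type Filter func(string) bool
type FilterBulk func([]string) []string

// ApplyFilters keeps only the records that pass every filter.
func ApplyFilters(records []string, filters ...Filter) []string {
	if len(filters) == 0 {
		return records
	}
	kept := make([]string, 0, len(records))
	for _, r := range records {
		keep := true
		for _, f := range filters {
			if !f(r) {
				keep = false
				break // later filters are never called for this record
			}
		}
		if keep {
			kept = append(kept, r)
		}
	}
	return kept
}

// ApplyBulkFilters feeds the output of each bulk filter into the next.
func ApplyBulkFilters(records []string, filters ...FilterBulk) []string {
	for _, f := range filters {
		records = f(records)
	}
	return records
}

// notEmpty and noSpaces are hypothetical per-record filters.
func notEmpty(record string) bool { return record != "" }
func noSpaces(record string) bool { return !strings.Contains(record, " ") }

// firstTwo is a hypothetical bulk filter that keeps at most two records.
func firstTwo(records []string) []string {
	if len(records) > 2 {
		return records[:2]
	}
	return records
}

func main() {
	records := []string{"Cat", "", "two words", "Dog", "Emu"}

	// Wrap noSpaces so we can count how often the second filter runs.
	calls := 0
	counting := func(r string) bool { calls++; return noSpaces(r) }

	perRecord := ApplyFilters(records, notEmpty, counting)
	fmt.Println(perRecord)                             // [Cat Dog Emu]
	fmt.Println(calls)                                 // 4: the empty record never reached the second filter
	fmt.Println(ApplyBulkFilters(perRecord, firstTwo)) // [Cat Dog]
}
```

Because a rejected record skips the remaining filters, it can pay off to list the cheapest or most selective filters first.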

At this point, we have a well-defined way to run individual filters, and all we need is a simple way to group them. To do that, we’ll define one more type:

// FilterSet is a function that applies a set of filters and returns the filtered records.
type FilterSet func([]string) []string

FilterSet will let us define various functions that can take a slice of records and return a slice of the filtered records. This lets different business logic criteria get divided up based on certain identifiers or criteria. A set of filters could also be used in-line in whatever way makes sense, but for this example we’ll divide up some contrived business logic based on an ID:

var filters = map[int]FilterSet{
	1: FilterForAnimals,
	2: FilterForIDs,
}

filters will be our set of business logic requirements, and FilterForAnimals and FilterForIDs will be sets of filters that satisfy the FilterSet type. Let’s define what those two functions do now:

// FilterForAnimals applies a set of filters removing any non-animals.
func FilterForAnimals(records []string) []string {
	return ApplyBulkFilters(
		ApplyFilters(records,
			FilterMagicalCreatures,
			FilterStringLength,
			FilterInts,
			FilterWords,
		),
		FilterDuplicates,
	)
}

// FilterForIDs applies a set of filters removing any non-IDs.
func FilterForIDs(records []string) []string {
	return ApplyBulkFilters(
		ApplyFilters(records,
			FilterIDs,
		),
		FilterDuplicates,
	)
}

Each one of these functions applies various filters to the slice of records but with different filter requirements. In both cases, ApplyFilters is called first and then the result of that is passed to ApplyBulkFilters.
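Both filter sets end with FilterDuplicates, which is not listed in this post. As a rough idea of what a bulk filter like it might look like, here is a minimal sketch. It assumes that the first occurrence of each record should be kept and that the input order should be preserved; the version in the full source may differ.

```go
package main

import "fmt"

// FilterDuplicates is a sketch of a bulk filter. It keeps the first
// occurrence of each record and preserves the input order (both are
// assumptions, not confirmed by the original source).
func FilterDuplicates(records []string) []string {
	seen := make(map[string]struct{}, len(records))
	unique := make([]string, 0, len(records))

	for _, r := range records {
		if _, ok := seen[r]; ok {
			continue // already emitted this record
		}
		seen[r] = struct{}{}
		unique = append(unique, r)
	}

	return unique
}

func main() {
	fmt.Println(FilterDuplicates([]string{"Cat", "Dog", "Cat"})) // [Cat Dog]
}
```

This has to be a bulk filter rather than a per-record Filter because the decision for each record depends on which records came before it.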

Our individual filters are fairly contrived so we’ll keep the explanation to a minimum:

// FilterMagicalCreatures filters out common mythical creatures.
func FilterMagicalCreatures(record string) bool {
	magicalCreatures := []string{
		"Unicorn",
		"Dragon",
		"Griffin",
		"Minotaur",
	}

	for _, c := range magicalCreatures {
		if record == c {
			return false
		}
	}

	return true
}

// FilterStringLength removes any records longer than 75 bytes.
func FilterStringLength(record string) bool {
	return len(record) <= 75
}

// FilterInts removes any integers disguised as strings.
func FilterInts(record string) bool {
	// A parse error means the record is not an integer, so keep it.
	_, err := strconv.Atoi(record)
	return err != nil
}

// FilterWords removes any records with spaces.
func FilterWords(record string) bool {
	return !strings.Contains(record, " ")
}

// FilterIDs removes any records that are not two integers joined by a hyphen.
func FilterIDs(record string) bool {
	split := strings.Split(record, "-")
	if len(split) != 2 {
		return false
	}

	// Both halves must parse as integers; a parse error rejects the record.
	if _, err := strconv.Atoi(split[0]); err != nil {
		return false
	}
	if _, err := strconv.Atoi(split[1]); err != nil {
		return false
	}

	return true
}

Each filter does some sort of validation against the record and decides whether it is valid or not. Each filter also satisfies our Filter type that we defined way back at the beginning.
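Since every filter is just a function from a string to a bool, checking one in isolation is straightforward. The sketch below runs a table of cases against FilterInts. The filterCase type and checkFilter helper are hypothetical names made up for this illustration; in a real project the same table would more likely live in a _test.go file and report mismatches through testing.T.

```go
package main

import (
	"fmt"
	"strconv"
)

// Filter mirrors the type defined earlier in the post.
type Filter func(string) bool

// FilterInts rejects records that parse as integers, as in the post.
func FilterInts(record string) bool {
	_, err := strconv.Atoi(record)
	return err != nil
}

// filterCase is a hypothetical table entry: a record and whether the
// filter is expected to keep it.
type filterCase struct {
	record string
	want   bool
}

// checkFilter is a hypothetical helper that runs f against every case
// and returns a description of each mismatch.
func checkFilter(f Filter, cases []filterCase) []string {
	var failures []string
	for _, c := range cases {
		if got := f(c.record); got != c.want {
			failures = append(failures,
				fmt.Sprintf("%q: got %t, want %t", c.record, got, c.want))
		}
	}
	return failures
}

func main() {
	cases := []filterCase{
		{"Cat", true},       // not an integer, kept
		{"42", false},       // integer, removed
		{"-7", false},       // negative integer, removed
		{"3412-3241", true}, // hyphenated ID does not parse as an integer, kept
	}

	fmt.Println(len(checkFilter(FilterInts, cases))) // 0
}
```

The same table shape works for any Filter, so adding coverage for a new filter is mostly a matter of writing new cases.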

And lastly, a little program to see this all in action:

func main() {
	// Initialize some contrived records.
	records := []string{
		"Cat",
		"A sentence is not a valid record.",
		"Minotaur",
		"cd5169bf-3649-4091-862b-c7ec1de92fd9-cd5169bf-3649-4091-862b-c7ec1de92fd9-cd5169bf-3649-4091-862b-c7ec1de92fd9",
		"3412-3241",
		"Dragon",
		"Cat",
	}

	// Call the filter functions.
	// ApplyFilters will be applied first, in order from top to bottom.
	animals := filters[1](records)
	ids := filters[2](records)

	// What should be left is a single "Cat" plus "3412-3241", which none of the animal filters reject.
	fmt.Println("Animals:")
	for _, animal := range animals {
		fmt.Println(animal)
	}

	// The only thing left should be the ID "3412-3241".
	fmt.Println("IDs:")
	for _, id := range ids {
		fmt.Println(id)
	}
}

This program creates an example record set and applies each set of filters to it, printing the results.
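One thing to watch for when indexing the filters map directly: an ID with no entry yields a nil FilterSet, and calling a nil function panics. If the IDs come from outside the program, a guarded lookup may be worth having. The sketch below is one way to do it; applyFilterSet and keepAll are hypothetical names for this illustration and are not part of the original source.

```go
package main

import "fmt"

// FilterSet mirrors the type defined earlier in the post.
type FilterSet func([]string) []string

// keepAll is a hypothetical stand-in for a real filter set such as
// FilterForAnimals.
func keepAll(records []string) []string { return records }

var filters = map[int]FilterSet{
	1: keepAll,
}

// applyFilterSet is a hypothetical helper that looks up a filter set
// by ID and returns an error instead of panicking on an unknown ID.
func applyFilterSet(id int, records []string) ([]string, error) {
	set, ok := filters[id]
	if !ok {
		return nil, fmt.Errorf("no filter set registered for ID %d", id)
	}
	return set(records), nil
}

func main() {
	records := []string{"Cat"}

	if out, err := applyFilterSet(1, records); err == nil {
		fmt.Println(out) // [Cat]
	}

	if _, err := applyFilterSet(3, records); err != nil {
		fmt.Println(err) // no filter set registered for ID 3
	}
}
```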

In this tutorial, we walked through a basic implementation of filtering in Go - I hope it was helpful! You can find the full source code for this tutorial on my GitHub and you can follow me on Medium to get notified of new posts.

