Quantifier predicates

Predicates in SQL are often complex and difficult to understand. In this article, Joe Celko explains the logic behind a few of the predicates: EXISTS, SOME, ANY and ALL.

SQL is based on set theory and logic. The only bad news is that many programmers have never had a class on either of those topics. They muddle through using the Boolean operators in their programming language and think that’s all there is to formal logic.

Let’s flashback to the early days of logic and play catch up. We need to start with syllogisms. Syllogisms are logical forms made up of combinations of two statements about classes of things that lead to a conclusion. They were invented by the Greeks and written up by Aristotle in Prior Analytics. You might have run into them, If you had a philosophy class that included lectures on formal fallacies. The three forms of statements allowed are:

1) ALL x ARE y

2) SOME x ARE y

3) NO x ARE y

These simple tools let us take two statements and come to a valid conclusion:

All men are mortal

All Greeks are Men

therefore: All Greeks are mortal

The Ancient Greeks seem to like to use a minimal set of tools which we could expect from a culture that gave us only circles and straight edges in plane geometry. We know that our logic was valid, so we are confident he will die if we know that Aristotle is a Greek. A polysyllogism, or a sorites, is a set of three or more such constructs that can be chained together to come to a conclusion.

You can figure out syllogisms or sorites using Euler circles, Venn diagrams, or Lewis Carroll’s diagrams. And before you ask, yes, it is that Lewis Carroll. People tend to forget that he was one of the pioneers in the evolution of logic. You don’t hear more about him when you’re studying early logic because he was on the losing side of a philosophical argument called “existential import” in the literature. Briefly, this has to do with what “ALL x ARE y” means. Does it imply that “SOME x ARE y” and that “NO x ARE non-y”; that is, there are x’s in the universe. Or does it mean only that “NO x ARE non-y”? I will get back to this point shortly.

George Boole and his book “Laws of Thought” (available in reprint additions to this very day) gave us the Boolean algebra that dominates programming languages with the basic AND, OR, and NOT operators we’ve come to know and love over the decades. Our syllogisms mutated into quantifiers which could be applied to the Boolean algebra expressions. The two forms are For all x, P(x), and For some x, P(x). If you want to look up formulas in a textbook, the symbol for the universal quantifier is ∀, an inverted letter ‘A’, and the symbol for the existential quantifier is ∃, a rotated letter ‘E’.

Existential import lost the battle, and the modern convention is that “All men are mortal” has the same meaning as “There are no men who are immortal” but does not imply that any men exist at all.

The reasoning is that if I were to walk into a bar and announce that I can beat any pink elephant in the bar, that would be a true statement. The fact that there are no pink elephants in the bar merely shows that the problem is reduced to the minimum case. If this still seems unnatural, then consider the formal mathematical properties of these quantifiers:

1) (∀x)P(x) = ¬(∃x) ¬P(x)

2) (∃x)P(x) = ¬(∀x) ¬P(x)

In Boolean terms, you can think of the (∀x) quantifier as being a generalized AND which connects an infinite string of propositions. Likewise, you can think of the (∃x) quantifier as a generalized OR, which connects an infinite string of propositions. These generalizations are actually taken from DeMorgan’s Laws.

The EXISTS() predicate

SQL has had the EXISTS() predicate since the beginning. It’s interesting within SQL because it has to return either TRUE or FALSE but can never return UNKNOWN. This is an exception to most logical operations inside SQL’s three-valued logic. It’s also not quite the same as the syllogisms we have been talking about.

Consider the statement “Some salesmen are liars,” and one way we would write it with the EXISTS() predicate in SQL:

If we are more cynical about salesmen, we might want to formulate the predicate “All salesmen are liars” with the EXISTS predicate in SQL, using the transform rule just discussed:

This informally says “There are no salesmen who are not liars” in English. In this case, the IN() predicate can be changed into a JOIN, which might improve performance and be a bit easier to read.

As an aside, the subquery doesn’t need to evaluate the SELECT clause for the EXISTS() predicate to work. In one very early version of Oracle, however, the compiler would work out the expressions in the SELECT clause and then ignore the results. Today, the preferred form is SELECT * because the star represents a nonspecific row in several places in SQL. It is easy to find in the text, and the compiler will not waste any time with expressions in the SELECT list (in fairness, this optimization should not be a problem anymore).

Now, onto the fancy stuff!

The [SOME | ANY] <subquery> predicate

The predicate <value expression> <comparison op> [ANY | SOME] <table expression> is equivalent to taking each row of the table expression, si, (assume that they are numbered from 1 to n) of <table expression> and testing <value expression> <comparison op> si with ORs between the expanded expressions:

When you get a single TRUE result, the whole predicate is TRUE. As long as <table expression> has cardinality greater than zero and one non-NULL value, you will get a result of TRUE or FALSE. The keyword SOME is the same as ANY, and the choice is just a matter of style and readability.

If you think about it, the IN() predicate is equivalent to a simple = ANY predicate. In fact, that is how it is defined in the ANSI/ISO Standard. But the <comparison op> can be any of =, <, >, <>, <=, >=, and (in theory, but not always in practice) any other comparison operator (such as IS [NOT] DISTINCT FROM).

The ALL <subquery> predicate

The <value expression> <comp op> ALL <table expression> takes each row of <table expression>, call them s1, s2,.. sn, builds a search condition for each such row expression, tests <value expression> <comp op> si with ANDs between the search conditions, thus:

When you get a single FALSE result, the whole predicate is FALSE. As long as <table expression> has cardinality greater than zero and all non-NULL values, you will get a result of TRUE or FALSE.

That sounds reasonable so far. Now let Empty_Table be an empty table (no rows, cardinality zero) and Null_Table be a table with only NULLs in its rows and a cardinality greater than zero. The rules for SQL say that <value expression> <comp op> ALL Null_Table always returns UNKNOWN, and likewise <value expression> <comp op> ANY Null_Table always returns UNKNOWN. This makes sense because every row comparison test in the expansion would return UNKNOWN, so the series of OR and AND operators would behave in the usual way.

However, <value expression> <comp op> ALL Empty_Set always returns TRUE and <value expression> <comp op> ANY Empty_Set always returns FALSE. Most people have no trouble seeing why the ANY predicate works that way; you cannot find a match, so the result is FALSE. But most people have trouble seeing why the ALL predicate is TRUE. We are trying to preserve those generalized DeMorgan’s Laws.

The Foobar.x <comp op> ALL (SELECT y FROM Table2 WHERE <search condition>) predicate converts to:

The Foobar.x <comp op> ANY (SELECT y FROM Table2 WHERE <search condition>) predicate converts to

Of the two quantified predicates, the <comp op> ALL predicate is probably the more useful of the two since it cannot be written in terms of an IN() predicate. The trick with it is to make sure that its subquery defines the set of values in which you are interested. For example, to find the authors whose books all sell for $49.95 or less, you could write:

The best way to think of this is to reverse the usual English sentence “Show me all x that are y” in your mind so that it says “y is the value of all x” instead.

The ALL predicate and Extrema functions

At first, it is counter-intuitive that these two predicates are not the same in SQL:

You have to remember the rules for the extrema functions – they drop out all the NULLs before returning the greatest, least, or computed value the aggregate function wants. But the ALL operator does not drop NULLs so that you can get them in the results. (Editor’s note: If any NULLs are returned from the subquery, no rows are returned from the main query with ALL. While the ALL operator is not dropping NULL, NULL doesn’t show up in the results because the expression >= NULL will return UNKNOWN.)

The first expression uses an aggregate function, so it returns a numeric value. In effect, it works like this:

0) Create the data

1) We can now apply the aggregate function. Remove the rows with NULLs.

(‘Albert’, 42)

(‘Chuck’, 43)

2) Find the maximum value in the column of y in the remaining rows.

But look at the second expression. That ALL() predicate makes this a logical expression. Bob’s NULL is not removed as it would be with an aggregation. Instead, we build a chain of AND-ed predicates:

((‘Albert’, 42)

AND (‘Bob’, NULL)

AND (‘Chuck’, 43))

Which becomes in effect:

SQL’s three-valued logic comes into play here. We know that

TRUE AND UNKNOWN = UNKNOWN

FALSE AND UNKNOWN = FALSE

UNKNOWN AND UNKNOWN = UNKNOWN

As you can see, once you’ve got an UNKNOWN value in a chain of AND-ed predicates, you really can’t get it to resolve to a TRUE value. You would have to cast them to a known value, using COALESCE() or some similar function.

Conclusion

It’s always a good idea to take a little time out and learn some of the more obscure features in this very large language we call SQL. But when you do, have mercy on the poor guys that have to maintain your code after you’re gone. This is what comments are for.