Testing ANSI JOINs

Overview

The main purpose of this article is to show how Couchbase ANSI JOINs are tested. In Couchbase version 5.5, you can now execute joins using the ANSI standard syntax. This addition to joins makes them much more flexible, but also much more complex. This complexity makes it difficult for us to test in our normal framework, there are simply too many possibilities to manually write out for automation. Luckily one of our testing frameworks allows us to set the parameters of a query, and then generates random sets of queries based on those parameters. This framework is known as the Random Query Generator (RQG). In this article we will discuss how RQG is used to test the ANSI JOIN functionality that was implemented. First, let us take a brief look into what is different with ANSI JOINs.

In this article we will go over the following topics:

1. Introduction to ANSI JOINs

2. Overview of Random Query Generator(RQG)

3. USING RQG to test ANSI JOINs

4. Key Points

5. List of additional ANSI JOIN bugs found

6. Appendix

Introduction to ANSI JOINs

Previously in Couchbase you could only execute lookup joins and index joins. This was good for queries in which one side of the join can produce a document key of the other side of the join. However, the problem with this is that joins could not be executed on other fields of a key space. ANSI joins fix this problem and take it a step further, now joins can be executed on arbitrary fields or expressions that result in fields, or on multiple join conditions.

Old Join syntax:

Lhs-keyspace [join-type] JOIN rhs-keyspace

ON KEYS [expression resulting in a meta().id of one of the keyspaces]

ANSI JOIN syntax:

lhs-expression [join-type] JOIN rhs-keyspace ON [join-condition]

Old Join Syntax Query:

SELECT t_3.decimal_field1 , t_3.primary_key_id , t_3.varchar_field1 , t_3.char_field1 
FROM multiple_table_db_87317526_simple_table_4 t_2 
LEFT JOIN multiple_table_db_87317526_simple_table_1 t_3 
ON KEYS [ t_2.primary_key_id ]
WHERE (t_2.bool_field1 != true AND t_3.decimal_field1 <= 4980)

SELECT t_3.decimal_field1 , t_3.primary_key_id , t_3.varchar_field1 , t_3.char_field1

FROM multiple_table_db_87317526_simple_table_4 t_2

LEFT JOIN multiple_table_db_87317526_simple_table_1 t_3

ON KEYS [ t_2.primary_key_id ]

WHERE (t_2.bool_field1 != true AND t_3.decimal_field1 <= 4980)

Same query w/ ANSI JOINs Syntax:

SELECT t_3.decimal_field1 , t_3.primary_key_id , t_3.varchar_field1 , t_3.char_field1
FROM multiple_table_db_87317526_simple_table_4 t_2 
LEFT JOIN multiple_table_db_87317526_simple_table_1 t_3 
ON t_2.primary_key_id = t_3.primary_key_id
WHERE (t_2.bool_field1 != true AND t_3.decimal_field1 <= 4980)

SELECT t_3.decimal_field1 , t_3.primary_key_id , t_3.varchar_field1 , t_3.char_field1

FROM multiple_table_db_87317526_simple_table_4 t_2

LEFT JOIN multiple_table_db_87317526_simple_table_1 t_3

ON t_2.primary_key_id = t_3.primary_key_id

WHERE (t_2.bool_field1 != true AND t_3.decimal_field1 <= 4980)

The join types that are supported in N1QL are either INNER JOIN or LEFT/RIGHT OUTER JOIN. If you simply put JOIN this will be interpreted as INNER JOIN, and LEFT/RIGHT JOIN will be interpreted as OUTER JOIN. In the new join syntax, the left hand side of the join can be a keyspace, a nested n1ql query, a n1ql expression, or a join itself. The right hand side of the join must be a keyspace in the current implementation of ANSI JOINs inside of Couchbase. The ON-clause contains the conditions the join must meet, these conditions can be any predicate expression so long as there is a proper index on the right hand side keyspace. So you can do equality comparisons and non-equality comparisons (b1.field = b2.field or b1.field</>/=/!=/b2.field or b2.field is NOT NULL or complex expression = b2.field) This is a brief look into the new feature, for a deeper dive into the feature please check out this blog post written by Bingjie Miao(the developer who implemented the feature): https://www.couchbase.com/blog/ansi-join-support-n1ql/

Now we begin to understand that ANSI JOINs have very few limitations in comparison to our old joins. The left hand side of an ANSI join can be very complex, it can even be several joins itself, or a nested query that results in some json data, or several nested queries. Because of this level of complexity, it is a better idea to use something we call RQG, rather than coming up with individual queries to test. Here is a brief overview of how RQG works.

Overview of Random Query Generator(RQG)

RQG is a framework that was written by Couchbase that leverages another framework called Randgen. Basically we specify a set of parameters in a file called a grammar, and from those parameters Randgen will create SQL query templates (queries with placeholders for the field names and table names), then our framework processes these templates and converts them to N1QL formatted queries based off the dataset that is loaded into couchbase.

Example Query Template:

SELECT NUMERIC_FIELD_LIST 
FROM BUCKET_NAME 
INNER JOIN BUCKET_NAME 
ON ( PREVIOUS_TABLE.FIELD = CURRENT_TABLE.FIELD ) 
WHERE NOT (BOOL_FIELD != false);

SELECT NUMERIC_FIELD_LIST

FROM BUCKET_NAME

INNER JOIN BUCKET_NAME

ON ( PREVIOUS_TABLE.FIELD = CURRENT_TABLE.FIELD )

WHERE NOT (BOOL_FIELD != false);

This template is generated from the grammar file that contains the parameters, basically the grammar file tells randgen to generate a query that selects a list of numeric type data from a join of two arbitrary buckets on an equality comparison. The grammar will have various types of fields to be selected from and various join conditions that can be slotted into the ON/ WHERE predicates. Once the template is generated as shown above, our framework will go through and replace these arbitrary values with values based on the data loaded in. For NUMERIC_FIELD_LIST it will select a field that is a list of ints/bigints/floats/decimals/etc depending on what is available. The framework will replace BUCKET_NAME with the specific name of a bucket that exists on the server. PREVIOUS_TABLE.FIELD will be a field from the first bucket and CURRENT_TABLE.FIELD will be the same field from the second bucket. After all this is done, it will go through the query and create a copy of the query where any differences between N1QL and SQL are ironed out. For example, in the old join syntax we could not use the keyword ON, so instead RQG would convert ON to ON KEYS for the n1ql queries.

Example Converted Query:

SELECT t_3.decimal_field1 , t_3.int_field1 
FROM multiple_table_db_87317649_simple_table_4 t_2 
INNER JOIN multiple_table_db_87317649_simple_table_1 t_3 
ON KEYS [ t_2.primary_key_id ] 
WHERE (t_2.bool_field1 != false)

SELECT t_3.decimal_field1 , t_3.int_field1

FROM multiple_table_db_87317649_simple_table_4 t_2

INNER JOIN multiple_table_db_87317649_simple_table_1 t_3

ON KEYS [ t_2.primary_key_id ]

WHERE (t_2.bool_field1 != false)

Then both the n1ql queries and the sql queries are run against the couchbase server and an equivalent MySQL server. The results of the N1QL queries are compared to the results of the equivalent MySQL queries. Our goal is to have our results match up with MySQL results because we want N1QL to be functionally similar to SQL. If you want to read more about how RQG is laid out or how it generates templates and converts them to n1ql please check out this 5-part series on RQG: https://dzone.com/articles/testing-n1ql-sql-for-json

USING RQG to test ANSI JOINs

The way RQG is setup allows us to generate a variety of queries we may not have otherwise thought of ourselves. Here is an extremely complicated template generated by RQG:

SELECT NUMERIC_FIELD_LIST, STRING_FIELD_LIST 
FROM BUCKET_NAME 
INNER JOIN BUCKET_NAME 
ON ( PREVIOUS_TABLE.FIELD = CURRENT_TABLE.FIELD ) 
WHERE (((((STRING_FIELD != STRING_VALUES AND (NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE) OR (NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE))) AND (NUMERIC_FIELD IS NOT NULL)) AND ((NOT (((BOOL_FIELD AND (NUMERIC_FIELD IS NULL) OR (NUMERIC_FIELD >= NUMERIC_VALUE))) AND ((BOOL_FIELD != false AND NUMERIC_FIELD = NUMERIC_VALUE)) OR ((BOOL_FIELD != true AND NUMERIC_FIELD = NUMERIC_VALUE)) AND (NOT (BOOL_FIELD)))) AND (STRING_FIELD IS NULL))) AND (((BOOL_FIELD AND NUMERIC_FIELD != NUMERIC_VALUE)) AND ((BOOL_FIELD = true AND NUMERIC_FIELD IS NULL)))) OR (((NOT ((NUMERIC_FIELD < NUMERIC_VALUE) AND ((BOOL_FIELD != false OR STRING_FIELD < STRING_VALUES)))) AND (NOT (((STRING_FIELD IN ( LIST )) AND (STRING_FIELD LIKE STRING_VALUES) AND NUMERIC_FIELD IS NULL)))) OR (((BOOL_FIELD AND (NOT ((NUMERIC_FIELD = NUMERIC_VALUE) AND (NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE))) OR (NUMERIC_FIELD <= NUMERIC_VALUE))) OR (((NOT (((NOT ((BOOL_FIELD != true AND NUMERIC_FIELD > NUMERIC_VALUE))) AND ((((BOOL_FIELD != false OR NUMERIC_FIELD != NUMERIC_VALUE)) AND ((BOOL_FIELD != true AND NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE)) OR ((STRING_FIELD IS NOT NULL AND NUMERIC_FIELD > NUMERIC_VALUE)) AND ((NUMERIC_FIELD >= NUMERIC_VALUE AND STRING_FIELD = STRING_VALUES AND NOT (BOOL_FIELD)))) AND ((NOT (BOOL_FIELD) OR NUMERIC_FIELD IS NULL)))) AND (((NOT ((STRING_FIELD >= STRING_VALUES AND NUMERIC_FIELD IS NULL))) OR (NOT ((((BOOL_FIELD AND NUMERIC_FIELD != NUMERIC_VALUE)) AND ((BOOL_FIELD != false OR NUMERIC_FIELD IS NULL))) AND (((STRING_FIELD IN ( LIST )) AND ((STRING_FIELD <= STRING_VALUES AND NUMERIC_FIELD <= NUMERIC_VALUE))) AND ((NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE OR STRING_FIELD != STRING_VALUES)))))) OR ((STRING_FIELD <= STRING_VALUES AND NUMERIC_FIELD >= NUMERIC_VALUE))))) OR (NOT ((NOT ((BOOL_FIELD = false AND NUMERIC_FIELD IN ( LIST )))) OR (NOT ((BOOL_FIELD = true AND NUMERIC_FIELD < NUMERIC_VALUE)))))) AND ((((BOOL_FIELD != true AND STRING_FIELD NOT BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE)) OR (((BOOL_FIELD = false OR (STRING_FIELD < STRING_VALUES) OR ((STRING_FIELD <= STRING_VALUES) OR (STRING_FIELD >= STRING_VALUES)))) AND ((NOT (BOOL_FIELD) OR NUMERIC_FIELD IS NOT NULL)))) AND (((NOT (BOOL_FIELD) OR (STRING_FIELD <= STRING_VALUES) OR (STRING_FIELD != STRING_VALUES))) OR ((NUMERIC_FIELD < NUMERIC_VALUE OR STRING_FIELD IS NULL)))))));

SELECT NUMERIC_FIELD_LIST, STRING_FIELD_LIST

FROM BUCKET_NAME

INNER JOIN BUCKET_NAME

ON ( PREVIOUS_TABLE.FIELD = CURRENT_TABLE.FIELD )

WHERE (((((STRING_FIELD != STRING_VALUES AND (NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE) OR (NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE))) AND (NUMERIC_FIELD IS NOT NULL)) AND ((NOT (((BOOL_FIELD AND (NUMERIC_FIELD IS NULL) OR (NUMERIC_FIELD >= NUMERIC_VALUE))) AND ((BOOL_FIELD != false AND NUMERIC_FIELD = NUMERIC_VALUE)) OR ((BOOL_FIELD != true AND NUMERIC_FIELD = NUMERIC_VALUE)) AND (NOT (BOOL_FIELD)))) AND (STRING_FIELD IS NULL))) AND (((BOOL_FIELD AND NUMERIC_FIELD != NUMERIC_VALUE)) AND ((BOOL_FIELD = true AND NUMERIC_FIELD IS NULL)))) OR (((NOT ((NUMERIC_FIELD < NUMERIC_VALUE) AND ((BOOL_FIELD != false OR STRING_FIELD < STRING_VALUES)))) AND (NOT (((STRING_FIELD IN ( LIST )) AND (STRING_FIELD LIKE STRING_VALUES) AND NUMERIC_FIELD IS NULL)))) OR (((BOOL_FIELD AND (NOT ((NUMERIC_FIELD = NUMERIC_VALUE) AND (NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE))) OR (NUMERIC_FIELD <= NUMERIC_VALUE))) OR (((NOT (((NOT ((BOOL_FIELD != true AND NUMERIC_FIELD > NUMERIC_VALUE))) AND ((((BOOL_FIELD != false OR NUMERIC_FIELD != NUMERIC_VALUE)) AND ((BOOL_FIELD != true AND NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE)) OR ((STRING_FIELD IS NOT NULL AND NUMERIC_FIELD > NUMERIC_VALUE)) AND ((NUMERIC_FIELD >= NUMERIC_VALUE AND STRING_FIELD = STRING_VALUES AND NOT (BOOL_FIELD)))) AND ((NOT (BOOL_FIELD) OR NUMERIC_FIELD IS NULL)))) AND (((NOT ((STRING_FIELD >= STRING_VALUES AND NUMERIC_FIELD IS NULL))) OR (NOT ((((BOOL_FIELD AND NUMERIC_FIELD != NUMERIC_VALUE)) AND ((BOOL_FIELD != false OR NUMERIC_FIELD IS NULL))) AND (((STRING_FIELD IN ( LIST )) AND ((STRING_FIELD <= STRING_VALUES AND NUMERIC_FIELD <= NUMERIC_VALUE))) AND ((NUMERIC_FIELD BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE OR STRING_FIELD != STRING_VALUES)))))) OR ((STRING_FIELD <= STRING_VALUES AND NUMERIC_FIELD >= NUMERIC_VALUE))))) OR (NOT ((NOT ((BOOL_FIELD = false AND NUMERIC_FIELD IN ( LIST )))) OR (NOT ((BOOL_FIELD = true AND NUMERIC_FIELD < NUMERIC_VALUE)))))) AND ((((BOOL_FIELD != true AND STRING_FIELD NOT BETWEEN LOWER_BOUND_VALUE and UPPER_BOUND_VALUE)) OR (((BOOL_FIELD = false OR (STRING_FIELD < STRING_VALUES) OR ((STRING_FIELD <= STRING_VALUES) OR (STRING_FIELD >= STRING_VALUES)))) AND ((NOT (BOOL_FIELD) OR NUMERIC_FIELD IS NOT NULL)))) AND (((NOT (BOOL_FIELD) OR (STRING_FIELD <= STRING_VALUES) OR (STRING_FIELD != STRING_VALUES))) OR ((NUMERIC_FIELD < NUMERIC_VALUE OR STRING_FIELD IS NULL)))))));

As you can see by the sheer size of this query, it would have been very difficult to create this query on our own, and many such queries are generated and executed through RQG. This ability to create highly complex queries makes it perfectly suited to tackle the arbitrary complexity of ANSI JOINs. The grammar that we have defined to test ANSI JOINs attempts to capture the complexity of the left-hand side of a join as well as the complexity of the ON clause. We tell RQG to generate the left hand side of a join as one of three things, a normal keyspace, a n1ql subquery, or several nested joins. We tell RQG to generate the right hand side of a join as a keyspace. Then we tell RQG to generate the ON clause of a join as mostly equality predicates between two fields one from the left hand keyspace or left hand alias, and one from the right hand keyspace. The other things that can be generated in the ON clause could be non-equality comparisons (</>/=/!=) or simply checking attributes of the right hand keyspace field (b2.field is not null, b2.field is not missing, b2.field is in a list, etc). For ANSI JOINs, there is a small difference between n1ql and sql. That difference is that N1QL supports the comparison IS (NOT) MISSING, this is because of the non-relational data model. Some buckets can have documents that have a field that other documents in that bucket do not have. So IS (NOT) MISSING means that the field does not appear in the document(or in the case of IS NOT MISSING means that the field appears in the document). Technically, if we wanted we could also tell RQG to generate n1ql subqueries inside the ON clause and compare those to the right hand keyspace values, however this would exponentially increase the run-time of our queries and is not a common use case. Here are some sample ANSI JOIN queries generated from our set of parameters above:

SELECT OUTER_BUCKET_NAME.* 
FROM BUCKET_NAME 
INNER JOIN BUCKET_NAME 
ON ( PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD ) 
WHERE NUMERIC_FIELD = NUMERIC_VALUE;

SELECT OUTER_BUCKET_NAME.*

FROM BUCKET_NAME

INNER JOIN BUCKET_NAME

ON ( PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD )

WHERE NUMERIC_FIELD = NUMERIC_VALUE;

SELECT OUTER_BUCKET_NAME.* 
FROM BUCKET_NAME 
INNER JOIN BUCKET_NAME 
ON ( (PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD AND PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD) ) 
LEFT JOIN BUCKET_NAME 
ON ( (PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD OR PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD) ) 
INNER JOIN BUCKET_NAME 
ON ( NOT ((PREVIOUS_TABLE.NUMERIC_FIELD < CURRENT_TABLE.NUMERIC_FIELD AND PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD)) ) 
INNER JOIN BUCKET_NAME 
ON ( (PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD AND PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD AND PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD) ) 
INNER JOIN BUCKET_NAME 
ON ( (PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD AND PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD) ) 
INNER JOIN BUCKET_NAME 
ON ( (PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD AND PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD) ) 
INNER JOIN BUCKET_NAME 
ON ( NOT ((PREVIOUS_TABLE.NUMERIC_FIELD != CURRENT_TABLE.NUMERIC_FIELD OR PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD)) ) 
INNER JOIN BUCKET_NAME 
ON ( NOT ((PREVIOUS_TABLE.STRING_FIELD != CURRENT_TABLE.STRING_FIELD AND PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD)) ) 
WHERE NULL_NUM_FIELD IS NULL or NULL_NUM_FIELD IS MISSING;

SELECT OUTER_BUCKET_NAME.*

FROM BUCKET_NAME

INNER JOIN BUCKET_NAME

ON ( (PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD AND PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD) )

LEFT JOIN BUCKET_NAME

ON ( (PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD OR PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD) )

INNER JOIN BUCKET_NAME

ON ( NOT ((PREVIOUS_TABLE.NUMERIC_FIELD < CURRENT_TABLE.NUMERIC_FIELD AND PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD)) )

INNER JOIN BUCKET_NAME

ON ( (PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD AND PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD AND PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD) )

INNER JOIN BUCKET_NAME

ON ( (PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD AND PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD) )

INNER JOIN BUCKET_NAME

ON ( (PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD AND PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD) )

INNER JOIN BUCKET_NAME

ON ( NOT ((PREVIOUS_TABLE.NUMERIC_FIELD != CURRENT_TABLE.NUMERIC_FIELD OR PREVIOUS_TABLE.STRING_FIELD = CURRENT_TABLE.STRING_FIELD)) )

INNER JOIN BUCKET_NAME

ON ( NOT ((PREVIOUS_TABLE.STRING_FIELD != CURRENT_TABLE.STRING_FIELD AND PREVIOUS_TABLE.NUMERIC_FIELD = CURRENT_TABLE.NUMERIC_FIELD)) )

WHERE NULL_NUM_FIELD IS NULL or NULL_NUM_FIELD IS MISSING;

As you can see by these examples the ON clause can be multiple chained predicates, chained by AND/OR. We can also observe that multiple joins can be executed in one query, and that each join can have its own ON clause.

Now that it is understood how RQG generates the templates and what those templates look like, let us look at an example of a bug that we found in ANSI JOIN functionality that we probably would not have found otherwise. MB-27483, in this bug we had the following query:

SELECT t_2.* 
FROM multiple_table_db_492_simple_table_10 t_4 
INNER JOIN multiple_table_db_492_simple_table_4 t_2 
ON ( NOT ((t_4.char_field1 = t_2.char_field1 AND t_4.int_field1 = t_2.int_field1)))
WHERE (NOT ((t_2.bool_field1 = true AND t_2.decimal_field1 < 4878))) OR ((((t_2.bool_field1 != true AND t_4.int_field1 IN [ 0 , 12 , 21 , 32 , 39 ])) OR ((t_2.decimal_field1 != 4878) OR (t_4.decimal_field1 >= 4821))) AND (NOT (((NOT (t_4.bool_field1) OR t_2.char_field1 LIKE "N")) AND ((t_2.decimal_field1 BETWEEN 5 and 9985 AND t_2.varchar_field1 IS NULL AND t_2.bool_field1 = false))))) LIMIT 100

SELECT t_2.*

FROM multiple_table_db_492_simple_table_10 t_4

INNER JOIN multiple_table_db_492_simple_table_4 t_2

ON ( NOT ((t_4.char_field1 = t_2.char_field1 AND t_4.int_field1 = t_2.int_field1)))

WHERE (NOT ((t_2.bool_field1 = true AND t_2.decimal_field1 < 4878))) OR ((((t_2.bool_field1 != true AND t_4.int_field1 IN [ 0 , 12 , 21 , 32 , 39 ])) OR ((t_2.decimal_field1 != 4878) OR (t_4.decimal_field1 >= 4821))) AND (NOT (((NOT (t_4.bool_field1) OR t_2.char_field1 LIKE "N")) AND ((t_2.decimal_field1 BETWEEN 5 and 9985 AND t_2.varchar_field1 IS NULL AND t_2.bool_field1 = false))))) LIMIT 100

This query should have executed successfully given the way our server was setup, however instead of executing it was throwing an error that the proper index did not exist on the right hand side keyspace. The problem here was the WHERE clause was interfering with the index selection of the ON clause, causing the query to think it did not have the index it needed when that index did in fact exist. If we look into the WHERE clause we see it is very complex, this is not the level of complexity that we would normally have in the functional side of testing. Thus we would not have spotted this bug without using RQG.

Key Points

ANSI JOINs are quite flexible in terms of what is possible to join together, you can join arbitrary expressions with keyspaces and the ON clause can contain predicates that compare arbitrary expressions to fields. This level of complexity makes it impractical to try and come up with specific queries to test every scenario on our own through functional testing. Luckily we have a framework called RQG that is capable of generating queries based on a rule set that we can define. This functionality is very powerful and has allowed us to test ANSI JOINs to our satisfaction. It has even helped us find bugs that would be hard to find otherwise, for example if a large number of complex queries were being run, ANSI JOIN results would be off by 1. However, if any of those individual queries was run in isolation by itself, the results were correct. That is why we are confident that ANSI JOINs are in good shape and ready for customer use.

List of additional ANSI JOIN bugs found with RQG:

Bug Number	Bug Description
MB-27834	This query was not properly using the LIMIT query hint, it would execute the whole query before applying the LIMIT
MB-27763	This query was not properly using the LIMIT query hint, similar to the above bug
MB-27483	The query thought that the appropriate index didn’t exist, prompting the query to throw an error. However the index was actually present in the system
MB-27230	The wrong error was being returned by the query
MB-27201	This query was running into an error with backfill not working properly
MB-27109	Another query that thought the index it needed was unavailable, even though the index was present in the system
MB-27028	When a bunch of queries were executed in succession some of the query results were off by 1. When the queries were executed by themselves all the results were correct.
MB-26649	Query would intermittently hang

Appendix

Couchbase Server 5.5 GA Blog https://www.couchbase.com/blog/announcing-couchbase-server-5-5/
Index Joins https://dzone.com/articles/join-faster-with-couchbase-index-joins
ANSI JOINs Couchbase Blog https://www.couchbase.com/blog/ansi-join-support-n1ql/
Randgen https://launchpad.net/randgen
RQG Dzone article series https://dzone.com/articles/testing-n1ql-sql-for-json

Ajay Bhullar

Products

See How Capella Stacks Up

See How Capella Stacks Up

By Industry

By Need

Why NoSQL

What is NoSQL and why choose it?

Popular Docs

By Developer Role

Developer Playground

Start a Capella session

Resource Center

Education

Certification Exams 2023

Get Couchbase certified

About

Partnerships

Our Services

Partners: Register a Deal

Ready to register a deal with Couchbase?

Marriott

Testing ANSI JOINs

Overview

Introduction to ANSI JOINs

Overview of Random Query Generator(RQG)

USING RQG to test ANSI JOINs

Key Points

List of additional ANSI JOIN bugs found with RQG:

Appendix

Author

Posted by Ajay Bhullar

Leave a reply Cancel reply