Merge

Conditional Insert, Update or Delete


Apache DerbyBigQueryDb2 (LUW)H2MariaDBMySQLOracle DBPostgreSQLSQL ServerSQLite200720092011201320152017201920212023⊘ 3.5.7 - 3.45.0⚠ 2008R2 - 2022a✓ 15 - 16⊘ 8.3 - 14⚠ 11gR1 - 23cFREEabc⊘ 5.0 - 8.3⊘ 5.1 - 11.3⚠ 2.0.202 - 2.2.224ad⚠ 1.4.198 - 1.4.200abd⊘ 1.4.191 - 1.4.197⚠ 9.7 - 11.5.9b✓ 2.0⚠ 2.0⚠ 10.15.1.3 - 10.17.1.0be
  1. Not all operations
  2. Ambiguities not always reported
  3. Not all syntax variants
  4. ⚡Applies multiple rules
  5. No view or subqueries as source

SQL’s merge performs insert, update and delete operations based on when…then rules. It is useful for synchronizing the contents of a table with newer data. Instead of running a delete, and an update and an insert statement,0 a single merge statement takes care of everything…well, almost everything. Unfortunately, there is a scenario that standard SQL’s merge does not cover, so in the end you might still need two statements. Let’s start from the beginning…

Contents:

  1. The Base Syntax
  2. The When … Then Rules
  3. When … And Conditions
  4. Illogical Errors
  5. Scoping
  6. Not Matched by Source: Deleting Extra Rows
  7. JSON Documents as Source
  8. Related
  9. Supplemental Compatibility
  10. Normative References

The Base Syntax

Every merge statement must provide the target table (into clause), a source (using clause) and a join condition (on clause).

MERGE
 INTO <target>  [[AS] <new target name>]
USING <source>  [[AS] <new source name>]
   ON <join-condition>
      <when…then rules>

While the target is typically the name of a table,1 the source can be anything that returns a table—even subqueries or table functions such as json_table. The source and the target can be renamed just like in the from clause.✓✗

Apache DerbyBigQueryaDb2 (LUW)H2Oracle DBPostgreSQLaSQL ServeraBasetable as sourceView as sourceSubquery as sourcejson_table or similar
  1. Not with json_table, but with similar functionality provided by that system

The When … Then Rules

The base syntax is followed by the when…then rules. More precisely, by the when [not] matched…then… rules. Each rule applies either in the case that—for one source row—the on clause does identify corresponding rows in the target (when matched) or that it does not (when not matched). If there are corresponding rows in the target—no matter how many—they can be be updated or deleted. Otherwise, if there are no corresponding rows in the target, an insert can be done.

Apache DerbyBigQueryDb2 (LUW)H2Oracle DBabPostgreSQLSQL Serverwhen not matched then insertwhen     matched then updatewhen     matched then delete
  1. Cannot update columns used in the on clause
  2. Not to be confused with then update … delete [where]

The following example demonstrates the typical use case of merge: synchronizing the contents of a table with newer information provided from an external system. In the example, the external system is a web shop’s front end, which provides a user’s wish list. The database keeps the contents of all wish lists in the wish_lists table (user_id, product_id, qty).2 When a user edits a wish list, the front end provides the edited wish list to merge, which identifies and performs the required operations. For the time being we assume that the front end puts the edited wish list in the table my_wish_list (product_id, qty). We will directly process a JSON document later.

MERGE
 INTO wish_lists
USING my_wish_list
   ON wish_lists.user_id    = ?
  AND wish_lists.product_id = my_wish_list.product_id
 WHEN NOT MATCHED THEN INSERT (user_id, product_id, qty)
                       VALUES (      ?, product_id, qty)
 WHEN     MATCHED THEN UPDATE SET qty = my_wish_list.qty

The merge statement names the target table wish_lists and the source table my_wish_list. The on clause only takes those rows in the target table that belong to a specific user into consideration—whereas the user_id value is provided via a bind parameter (?). Furthermore, the on clause uses the product_id to assign the remaining rows of the wish_lists table to those in my_wish_list.

For example, if there is a row in my_wish_list for a specific product X, the on clause essentially checks the target table for corresponding rows. If there is no row in wish_lists for which user_id and product_id have the right values, the when not matched rule applies and the respective row is inserted. This happens if the user has put a new product on the wish list. Otherwise, if there is a corresponding row in the target table, the when matched clause applies and updates the qty column of the matched row(s).3 This happens if the user already had this product on the wish list.

The commands after the keyword then basically follow the syntax known from their stand-alone relatives—albeit in an abbreviated way. First of all, the then commands don’t specify a target—the merge into clause sets the target for all operations. Moreover, the commands are executed in the context of one source row and the corresponding target rows. Therefore, update and delete do not accept a where clause✓✗—they know which rows to work on anyway. Finally, the insert syntax is limited to a single-row values clause. You cannot use an insert…select nor can you use a values clause that produces more than a single row.

Support My Work

I offer SQL training, tuning and consulting. Buying my book “SQL Performance Explained” (from €9.95) also supports my work on this website.

Note that standard SQL’s merge is driven by the source table. Each row in the source is tested against the when…then rules. Rows in the target that have no corresponding source row are ignored. In the example, the result of the on clause is never true for other users, so the merge command will not affect those rows. That’s fine for our use case where we only want to update the wish list of one specific user. On the other hand, if that specific user removes a product from the list, we should also remove the respective entry from the wish_lists table. That is the idea behind synchronization. Unfortunately, standard SQL has no nifty syntax to act on extra rows in the target. We will look at this again later.

Before we continue with more advanced syntax, here’s one last fundamental thing to note. A merge statement modifies each target row at most once.4 If a single target row has multiple corresponding source rows, the merge statement raises an error✓✗ and aborts.5 That prevents the unfortunate situation that the order in which the source rows are processed affects the effect of merge. Look at the example above and ask yourself this: What happens if my_wish_list has several entries for a single product?6 If the qty values are not the same, then the order of execution makes a big difference.

When … And Conditions

The when conditions are not limited to the [not] matched keywords. After an and you can put an additional, arbitrarily complex condition. This allows us to add another when matched rule that deletes a row if the qty is 0.

 WHEN MATCHED AND my_wish_list.qty = 0 THEN DELETE
 WHEN MATCHED THEN UPDATE SET qty = my_wish_list.qty

Due to the additional condition, the first when matched rule only applies if the qty has been set to zero. The corresponding target row is deleted in this case. The remaining “matched” rows cause an update like before. This update could be limited by an additional condition as well: wish_lists.qty <> my_wish_list.qty. This prevents an update if the quantity is unchanged. That is semantically cleaner—in particular if triggers or change data capture (CDC) is involved—and can generally bring considerable performance improvements in some systems.

In case multiple when conditions are true, only the first of them takes effect—very much like how case works. The order of the rules is therefore significant.7 Source rows that don’t fulfill any when condition don’t cause any action.

Apache DerbyBigQueryDb2 (LUW)H2Oracle DBaPostgreSQLSQL Server…matched and … thenmultiple …then updatemultiple …then insert
  1. Alternative: where clause. See Notable Extensions

Note that there are systems that accept and conditions but are still limited to a single rule for each matched/not matched case.

Illogical Errors

Sometimes merge causes confusion by giving surprising error messages. That results from the fact that standard SQL’s merge first identifies all changes that must be done and then, in a second step, applies those changes. In other words, all when conditions are evaluated before any changes are made. For the last example, it means that rows deleted by the new rule do not activate any when not matched rule—not even if it appears later in the merge command. By the time insert, update, or delete starts, all decisions have already been made.8

Now consider this scenario:9 A merge statement evaluates the when not matched rules and identifies a row to be inserted. Meanwhile, before merge actually inserts the row, another transaction inserts a row with the same primary key values and commits the transaction. When merge eventually performs the operations, the insert fails with a constraint violation. This might come as a surprise because the when not matched then insert rule should not apply if the row exists. Merge first decides what to do, then it does it. In the meantime, the situation may have changed. Constraint and transaction isolation are preserved—even if they cause surprising errors.

The below mentioned “upsert” statements provided by some systems might behave less surprisingly in such scenarios.

Scoping

I’ve already explained some syntax differences that apply to insert, update and delete when used in merge as compared to their stand-alone cousins. However, there is a more subtle aspect to consider: Stand-alone insert, update, and delete statements have only one table in scope, but inside merge they have two tables in scope. The next example highlights four places where it is unclear which table’s column x is being referred to if both tables have that column.

MERGE
 INTO target
USING source
   ON x = x ⓵
WHEN MATCHED AND x > 0 ⓶ THEN UPDATE SET x = x + 1 ⓷
WHEN NOT MATCHED         THEN INSERT (x) VALUES (x)

Of course we can resolve these ambiguities by qualifying the column with the respective table name, e.g. source.x, but we would also expect that the system reports ambiguities if there are any—and that it doesn’t report them if there are none. It turns out that these totally reasonable expectations are not always met.

Apache DerbybdBigQuerybDb2 (LUW)bcH2bdOracle DBaabcdPostgreSQLSQL Serverbsource & target visible in onreports ambiguities in onsource & target visible in andreports ambiguities in andsource & target visible in setreports ambiguities in setwrong ambiguity in values
  1. See Notable Extensions
  2. The target name is also visible on the left-hand side, which is not covered by standard SQL’s BNF syntax: update set <target>.<col-name> = …
  3. ⚡No error raised: implementation-defined resolution of ambiguous column names
  4. Columns names that exist in both tables may cause errors even though only the source should be in scope—in those systems qualification resolves the problem, but that is not standard SQL

It is particularly worrisome that some systems don’t report the ambiguity labeled as . After all, it makes a big difference whether the update increments the value currently stored in the target (x = target.x + 1) or sets it to the value x = source.x + 1. These are two completely unrelated things, and some systems just pick one at their own discretion.

Not Matched by Source: Deleting Extra Rows

Standard SQL has no syntax to process rows in the target that have no corresponding row in the source, but that’s just standard SQL. In fact there are systems that offer such a syntax—beyond what is written in the standard.10

For demonstration, let’s continue the wish list example. The merge statement should also cover the case that an item that was on the wish list before is removed. Thus, a row in wish_lists should be removed if there is no corresponding entry in the source. The non-standard syntax for this is when not matched by source.

 WHEN NOT MATCHED THEN INSERT (user_id, product_id, qty)
                       VALUES (      ?, product_id, qty)
 WHEN NOT MATCHED BY SOURCE THEN DELETE

The new rule applies to those target rows that have no corresponding source row. The then clause deletes them. All of them. That is, in addition to the wish list items that the particular user removed, all wish list items of all other users are also removed! Look at the on clause: The condition on user_id was meant to limit the impact of the merge statement to one particular user. That works if only source rows can trigger actions. With when not matched by source, sole target rows can trigger actions too. As items of other users never have a corresponding source row, they are all deleted. The requirement not to affect other users has been criminally neglected.

Luckily, this requirement can be added straight into the when condition.

WHEN NOT MATCHED BY SOURCE AND user_id = ? THEN DELETE

Due to the and, the rule only applies to entries of the respective user.

Apache DerbyBigQueryDb2 (LUW)H2Oracle DBPostgreSQLSQL Server…when not matched by source…when not matched by target

In addition to when not matched by source, there is also a “by target” variant which corresponds to the standard SQL condition when not matched without any by qualifier.

Standard SQLs merge can also get the job done as you can use arbitrarily complex sub-queries as the source. That allows you to add the extra rows that need to be deleted to the source—using a full outer join or a union. From the perspective of merge, these rows will be “matched” rows so they can activate a when matched then delete rule. However, this approach loses almost all the beauty of a single merge statement. Using a separate delete statement might be a better option. I’ll leave this as an exercise for the readers and conclude by showing a better example for what you can do in the merge source.

JSON Documents as Source

You might have been wondering, and for a good reason, what’s the point of the wish list example if the front end has to insert into the my_with_list table first?11 After all, it might be easier to issue the required delete, update and insert statements—in particular if they are generated by an ORM tool. The point of this example is to demonstrate how combining different SQL features opens up interesting possibilities.

Realistically, a web front end would provide the edited content of the wish list as a JSON document—maybe like this:

[
 {"product_id":42 ,"qty":1}
,{"product_id":123,"qty":2}
]

Now, the json_table function can easily transform such a JSON document into a tabular form, suitable as a source of merge. Just put it into the using clause.

…
USING JSON_TABLE( ?
                , '$[*]'
          COLUMNS ( product_id INT PATH '$.product_id'
                  , qty        INT PATH '$.qty'
                  )
                ) my_wish_list

If the caller provides the JSON document as the value for the bind parameter (?), SQL will take care of everything else.

While we’re covering composability, let’s add a short consideration of a similar yet substantially different use case: placing an order. This use case is similar because it would use a similar JSON document. Still, there are three reasons why it is also substantially different: (1) The ordered products just need to be inserted—it is not a case for merge; (2) A second table needs to be filled—the table for the order itself, not the ordered products. Thus, we need two insert statements on different targets.12 Again, merge is no help here, but SQL feature T495, “Combined data change and retrieval”, is helpful. It allows you to embed insert, update, delete and merge statements via subqueries into select statements. Thus, you can arbitrarily combine them in a single SQL statement that takes care of all database operations required to place an order. That’s something for another article.

For now, we need to complete the list of three substantial differences between updating a wish list and placing an order: (3) The tabular (relational) storage of orders and ordered items is definitively beneficial for further processing. That is not necessarily the case for a wish list. We could also save the wish list JSON as it is. In case the contents of the wish lists ever needs to be analyzed, the respective SQL functions (like json_table) can still access the contents of the JSON document. After all, SQL does not dictate how you store the data; in fact, the contrary is true: SQL supports various approaches.13 We can—actually we must—choose the best approach for each use case.

Apache DerbyBigQueryDb2 (LUW)aH2bMariaDBMySQLOracle DBPostgreSQLSQL ServerSQLiteCombined data change and retrievalData-modifying statements in with
  1. Without merge • Only if top-level statement is select: insert into … select … final from (delete from …)
  2. Without merge

You can’t catch up on 20 years of SQL evolution in one day. Subscribe the newsletter via E-Mail, Twitter or RSS to gradually catch up and to keep modern-⁠sql.com on your radar.

About the Author

Photo of Markus Winand

Markus Winand provides insights into SQL and shows how different systems support it at modern-sql.com. Previously he made use-the-index-luke.com, which is still actively maintained. Markus can be hired as trainer, speaker and consultant via winand.at.

Supplemental Compatibility

Notable Extensions

Apache DerbyBigQueryDb2 (LUW)H2Oracle DBaPostgreSQLSQL Server…matched then do nothing…matched then … where ……when not matched by source…when not matched by target
  1. ⚡Implementation-defined resolution of ambiguous column names

Error Codes (SQLSTATE)

Apache DerbyBigQueryaDb2 (LUW)bH2aOracle DBaPostgreSQLSQL ServeraCardinality violation: 21000
  1. Wrong SQLSTATE class (not 21)
  2. Correct SQLSTATE class 21, but subclass is not 000

Syntax Variants

If you care about maximum compatibility, keep in mind that not all products support all syntax variants.

Apache DerbyaBigQueryDb2 (LUW)H2Oracle DBPostgreSQLSQL Server… into <target>      … into <target>    t1… into <target> as t1… using <source>      … using <source>    t1… using <source> as t1… on   <condition>  … on ( <condition> )…not matched by target…insert (<target>.col)…
  1. Only when not renaming the target table

Normative References

The merge statement is defined in ISO/IEC 9075-2:2023 as optional features F312, “MERGE statement”, F313, “Enhanced MERGE statement” and F314, “MERGE statement with DELETE branch”.

You can’t catch up on 20 years of SQL evolution in one day. Subscribe the newsletter via E-Mail, Twitter or RSS to gradually catch up and to keep modern-⁠sql.com on your radar.

About the Author

Photo of Markus Winand

Markus Winand provides insights into SQL and shows how different systems support it at modern-sql.com. Previously he made use-the-index-luke.com, which is still actively maintained. Markus can be hired as trainer, speaker and consultant via winand.at.

Buy the Book

Cover of “SQL Performance Explained”: Squirrel running on grass

The essence of SQL tuning in 200 pages

Buy now!
(paperback and/or PDF)

Paperback also available at Amazon.com.

Hire Markus

Markus offers SQL training and consulting for developers working at companies of all sizes.
Learn more »

Footnotes

  1. Sometimes the best approach is to just delete or truncate everything and copy the new data over. However, sometimes the functionality provided by merge is the better option. In particular, side effects like triggers or change data capture (CDC) work best with merge.

  2. Standard SQL explicitly allows views to be the target of merge to the same extent it allows views to be the target of insert, update, and delete.

  3. The underlined columns user_id and product_id form the primary key.

  4. It also happens if the qty is unchanged.

  5. Exception: the then update … delete syntax of the Oracle Database (not standard).

  6. ISO/IEC 9075-2:2016 §14.12 GR 6ai1BII: cardinality violation (21000).

  7. Assuming there is no constraint on my_wish_list that prevents it.

  8. Strictly speaking only the relative order of when matched rules among themselves and when not matched rules among themselves is significant.

  9. This approach also prevents paradoxical situations when updating columns used in the on clause.

  10. Assuming that the on clause uses the primary key and that the system uses multi-version concurrency control (aka snapshot isolation).

  11. And that’s totally fine. The standard just defines the required behavior for the standard syntax. Implementations are allowed to accept non-standard syntax and do whatever they want.

  12. A use case for a global temporary table, by the way.

  13. The multi-table insert of the Oracle Database is of limited help for this use case.

  14. JSON support was added to the SQL standard in 2016 and was substantially extended in the 2023 release. SQL also supports XML and nested tables, just to name a few more options.

Connect with Markus Winand

Subscribe mailinglistsSubscribe the RSS feedMarkus Winand on LinkedInMarkus Winand on XINGMarkus Winand on TwitterMarkus Winand on Bluesky
Copyright 2015-2024 Markus Winand. All righs reserved.
Legal | Contact | NO WARRANTY | Trademarks | Privacy and GDPR